
Filter Automation, Generic AI or Tailor-Made Machine Learning Moderation Solutions: Which Should You Use and When?

Feb 8 2017

Every forward-looking business is considering how to implement automation across its departments these days. It is not surprising that moderation, which traditionally required a lot of manual work, has been one of the first areas to be turned upside down by the emerging automation trend.

But smart business owners also realize that for automation to be efficient, it has to be applied right. Even though there are huge cost savings to be had by adding automation to your moderation process, you will only reap the monetary benefits if you choose the right approach.

Bill Gates famously said: 

“The first rule of any technology used in a business is that automation applied to an efficient operation will magnify the efficiency. The second is that automation applied to an inefficient operation will magnify the inefficiency.”

We agree with Gates and will also add:

“The third rule is that the wrong automation type applied to an efficient or inefficient operation will be very costly both to resources and potentially to brand image as well.”

Why? Because each type of automation has its own strengths and weaknesses, which will benefit businesses differently depending on volumes, budget and growth stage.

We will go through these characteristics now.

Filters

Filters were part of the first wave of automation efforts applied by site owners desperate to manage content and keep users safe.
In 2017, filters might seem like an inflexible and rigid approach to moderation, but they are still a very valuable tool in the moderation toolkit, both as a cheap entry point and as an additional layer working alongside AI to keep your site clean.

How Do Automation Filters Work and Where Are They Most Useful?

If applied and maintained correctly, filters can be quite powerful on their own. In Implio, filters, or rules as we call them, can be set up to be quite comprehensive. You can design them to look at price, IP, description or any other available data field. On top of this, our tool offers the option to create lists, allowing for easy updates without changing the rule structure. Filter automation is great for cases where you want to target very specific information, such as user data.

Filters can add a relatively inexpensive first layer of defence against scammers and other undesirable users. Filters can be created to target any unwanted content that follows a recurring pattern, which can be precisely described and thus formulated as a rule.
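To make the idea of a rule concrete, here is a minimal sketch in Python of how rules over listing fields and separately maintained lists could fit together. It is not how Implio is implemented; the field names, list contents, price threshold and the `apply_rules` helper are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical lists: they can be updated without touching the rule logic.
BLOCKED_PHRASES = {"western union", "wire transfer only"}
BLOCKED_IPS = {"203.0.113.7"}


@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]


def contains_blocked_phrase(item: dict) -> bool:
    # Case-insensitive substring match against the phrase list.
    text = item.get("description", "").lower()
    return any(phrase in text for phrase in BLOCKED_PHRASES)


def has_suspicious_price(item: dict) -> bool:
    # Assumed example pattern: a car listed for less than 100.
    price = item.get("price")
    return item.get("category") == "cars" and price is not None and price < 100


RULES = [
    Rule("blocked_phrase", contains_blocked_phrase),
    Rule("blocked_ip", lambda item: item.get("ip") in BLOCKED_IPS),
    Rule("suspicious_price", has_suspicious_price),
]


def apply_rules(item: dict) -> list[str]:
    """Return the names of all rules the item triggers (empty list = clean)."""
    return [rule.name for rule in RULES if rule.matches(item)]


listing = {
    "description": "Payment by Western Union only",
    "ip": "198.51.100.1",
    "category": "cars",
    "price": 50,
}
print(apply_rules(listing))  # ['blocked_phrase', 'suspicious_price']
```

Keeping the lists apart from the rules mirrors the point above: adding a new phrase or IP is a data change, not a change to the rule structure.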

Another benefit of filters is that they are very easy to set up and change. You don’t need huge datasets to train them; you just enter a new rule based on your needs and you are ready to go.

The drawback with filters is, of course, that you have to update them continuously, which means you are often in an open arms race with scammers and other users intent on misusing your site.

Generic Machine Learning Models

Most automation offerings on the market today that aren’t filters fall into this category. Generic machine learning models are algorithms created from general data and targeted at issues that are common to a wide variety of sites.

How Do Generic Machine Learning Models Work and Where Are They Most Useful?

The difference between generic and tailor-made machine learning models is primarily in the datasets used to train the algorithms. Whereas a tailor-made solution is built from your specific data, a generic model is trained on an artificially created dataset aimed at covering all variables of the problem it is trying to solve.

A generic model meant to deal with swear words would, for example, be trained on a dataset in which 50% of the documents are full of swear words and 50% are clean.

The main problem with generic models is, as you can probably imagine, that they have a tendency to be very broad, resulting in low accuracy rates.

One of the challenges in building datasets to train generic machine learning models is that there is no general agreement on what constitutes, for instance, bad language. Different communities have different levels of sensitivity towards profanities, and sometimes whether a word is bad or not can even be a matter of context. With a generic model you get an algorithm tuned to what is generally accepted as swear words, and that might not fit your site.

If your community isn’t highly specialised, generic machine learning models can, however, be good for getting the worst content off your site quickly and at a very affordable price. Just make sure that you are aware of the limitations and comfortable knowing that some bad content will slip through and some genuine posts will get caught. Generic machine learning models are bound to create a lot more false positives than tailor-made algorithms and customized filters.
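One way to check whether those limitations are acceptable is to compare a model's decisions with a manually reviewed sample. The sketch below is a hypothetical illustration, not a Besedo tool: the function names and the sample data are invented, and it simply counts how many genuine posts were caught (false positives) and how many bad posts slipped through (false negatives).

```python
from collections import Counter


def confusion_counts(predicted_bad, actually_bad):
    """Count true/false positives and negatives for paired decisions."""
    counts = Counter()
    for predicted, truth in zip(predicted_bad, actually_bad):
        if predicted and truth:
            counts["tp"] += 1   # bad content correctly caught
        elif predicted and not truth:
            counts["fp"] += 1   # genuine post wrongly caught
        elif not predicted and truth:
            counts["fn"] += 1   # bad content that slipped through
        else:
            counts["tn"] += 1   # genuine post correctly approved
    return counts


def error_rates(counts):
    """Share of genuine posts caught and share of bad posts missed."""
    genuine = counts["fp"] + counts["tn"]
    bad = counts["tp"] + counts["fn"]
    return {
        "false_positive_rate": counts["fp"] / genuine if genuine else 0.0,
        "false_negative_rate": counts["fn"] / bad if bad else 0.0,
    }


# Invented sample: model decisions vs. what a human moderator decided.
model_says_bad = [True, True, False, False, True, False, False, False]
human_says_bad = [True, False, False, False, True, True, False, False]

counts = confusion_counts(model_says_bad, human_says_bad)
rates = error_rates(counts)
print(counts["fp"], counts["fn"])  # 1 1
```

In this invented sample, one of five genuine posts is caught and one of three bad posts slips through. Which of the two rates matters more depends on your site.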

Tailor-made Machine Learning Models

As opposed to generic models, tailor-made machine learning algorithms are trained using data that is specific to your site.
This allows the model to operate at a very high accuracy level. To take the example of swearing, you avoid false positives from words that are generally labelled as profanity but are acceptable in the context of your community.

To illustrate the point, let’s say that you are running a site for medical professionals where they can buy and sell equipment while also discussing their trade and exchanging knowledge. On such a site, using words for genitalia wouldn’t be out of place or considered profane. Posts containing those words would, however, very likely get caught by a generic model. A tailor-made algorithm would know that those posts are generally allowed through, whereas those containing the f-word are still rejected as profane.

How Do Tailor-Made Machine Learning Models Work and Where Are They Most Useful?

When we create tailor-made machine learning models at Besedo, we work closely with our clients to ensure the best possible outcome. The bigger the dataset our client has available, the more accurate a solution we can create for them. Over time, tailor-made solutions can reach an accuracy of 99% with an automation rate of 90%, something completely unrivalled by either generic models or filter automation.

A tailor-made machine learning solution is able to reach very high accuracy and automation levels, but it does require quite large data volumes. If you run a small site and are just starting out, you may not have enough valid data to train the algorithms efficiently. In such cases the better choice might be filters or a generic machine learning solution.
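To put those two percentages into perspective, here is a rough back-of-the-envelope calculation. It assumes a site with 100,000 items per month, that the automation rate is the share of items decided without a human, and that the accuracy figure applies to those automated decisions. These are our assumptions for the sake of the example, and the function name is hypothetical.

```python
def estimate_workload(monthly_items, automation_rate, accuracy):
    """Rough split of a monthly volume into automated and manual decisions."""
    automated = round(monthly_items * automation_rate)
    manual = monthly_items - automated
    # Assumption: accuracy is measured on the automated decisions only.
    expected_errors = round(automated * (1 - accuracy))
    return {
        "automated": automated,
        "manual": manual,
        "expected_errors": expected_errors,
    }


estimate = estimate_workload(100_000, automation_rate=0.90, accuracy=0.99)
print(estimate)
# {'automated': 90000, 'manual': 10000, 'expected_errors': 900}
```

Under these assumptions, moderators would only need to look at 10,000 of the 100,000 items, and roughly 900 automated decisions per month would be wrong.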


Choosing the Automation Setup That Is Right for You

Now that we have been through the different types of automation solutions, it is time to find out which one fits your needs best.

At this point it probably comes as no surprise that the choice comes down to volumes, accuracy and budget.

In the table below we illustrate the elements that impact the choice of automation type. Based on this matrix you can find a good guideline for which automation type will work best for your specific business.

When Are Tailor-Made Machine Learning Models the Best Option?

If your monthly volumes are 100k items or more, then a tailor-made machine learning model is the way to go. With volumes of that size, there is enough data to create very exact models.

Furthermore, if you are currently moderating your content manually, you will make your money back very fast even with an investment in a tailor-made solution. Tailor-made machine learning models will allow you to manage your content with high accuracy.

When Is Filter Automation the Best Option?

If your volumes are less than 100k items per month and avoiding false positives is very important to you, then filter automation makes sense for your business. You will be able to tweak the wordlists and rules until they match your site rules, and you can even apply a level of manual moderation to review items that get caught in the filter to ensure a higher accuracy level.

When Are Generic Machine Learning Models the Best Option?

If your volumes are less than 100k items per month and you can live with an accuracy as low as 75%, generic machine learning models might be a good solution. In the event that you have to choose between no moderation and a generic algorithm, it is generally better to apply some level of moderation as long as the solution doesn’t catch too much content published by genuine users.

It might even be that the best solution for you is a combination of the three in a tailored setup to fit the specific and varied challenges your business faces.
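The rules of thumb above can be summed up in a few lines of Python. This is only a sketch of the guideline, not a replacement for a proper analysis: the function and parameter names are made up, and the final fallback (filters when volumes are low but 75% accuracy is not enough) is our assumption, since that case isn't spelled out above.

```python
def suggest_automation(monthly_items, false_positives_critical, min_acceptable_accuracy):
    """Map volume and accuracy needs to an automation type (rule of thumb)."""
    if monthly_items >= 100_000:
        # Enough data to train a model on the site's own content.
        return "tailor-made machine learning"
    if false_positives_critical:
        # Rules and wordlists can be tweaked until they match site policy.
        return "filters"
    if min_acceptable_accuracy <= 0.75:
        return "generic machine learning"
    # Assumption: fall back to filters, possibly with manual review.
    return "filters"


print(suggest_automation(250_000, False, 0.95))  # tailor-made machine learning
print(suggest_automation(40_000, True, 0.95))    # filters
print(suggest_automation(40_000, False, 0.75))   # generic machine learning
```

A real setup may well combine the options, so treat the output as a starting point for the discussion rather than a final answer.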

Are you still in doubt about which automation type fits your business best? One of our solution designers will be happy to analyze your business needs with you, to determine what option will benefit you the most.
