How to create accurate content moderation filters that work

Creating a filter for moderation is the easy part; ensuring that the filter produces the desired result is a science. Truly accurate content moderation filters don’t just catch what they are set up to find, they also work in a way that minimizes false positives.

With good filters alone, our clients have achieved up to 80% automation of their content moderation. So what is the secret to building a filter that is both accurate and efficient?

8 steps of filter moderation

Our expert filter manager Kevin Martinez shares his 8 steps for building content moderation filters that work.

In this example, we are using drugs as the target of the filter, but the process is the same for building anything from profanity filters to rules aimed at catching ads for endangered species.

1. Mission:
Define the goal of the filter. In this case, we want it to help us prevent illegal drugs from getting posted to our site. As such, we create a filter in Implio and name it “drugs”. It is always advisable to give your filters descriptive names: you are likely to end up with many filters, so it is good to be able to understand their function at a glance.

2. Local laws:
Check the laws of the country your site operates in. Laws can vary widely by country and sometimes even by region. In Spain, for instance, you are allowed to sell the growing box for cannabis and cannabis seeds, but not the plant itself.

3. Action:
Decide on the action you want the filter to take. Should it refuse the content, send it to manual moderation, or is it just a test filter that shouldn’t take any action other than highlighting the ads it would have caught? In Implio, the default action is to automatically accept any ads that don’t match the filter, but you have full control over what happens to content that matches your rules.
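The action choice above can be sketched as a simple dispatch. This is an illustration only, with hypothetical action names, not Implio's actual API:

```python
from enum import Enum

class Action(Enum):
    REFUSE = "refuse"                 # automatically reject matching ads
    MANUAL_REVIEW = "manual_review"   # queue matching ads for a moderator
    NO_ACTION = "no_action"           # test filter: only highlight matches

def apply_action(ad: dict, matched: bool, action: Action) -> str:
    """Return the moderation outcome for one ad (illustrative only)."""
    if not matched:
        return "accepted"             # default: accept non-matching ads
    if action is Action.REFUSE:
        return "refused"
    if action is Action.MANUAL_REVIEW:
        return "queued_for_review"
    return "accepted_with_highlight"  # test filter just flags the match

print(apply_action({"text": "example ad"}, True, Action.MANUAL_REVIEW))
```

Starting with a test filter (no action) is the safest choice while you are still tuning your keyword list, since it lets you inspect matches without affecting live content.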

4. List:
Create a list of all drug-related keywords (cocaine, heroin, cannabis, etc.). Make sure you also include any slang words your users are likely to use for drugs.

5. Rule:
Now it is time to set up your rule. Make sure that the rule pulls from the list and that you add exceptions to avoid false positives. For example, in step 2 we discovered that selling cannabis seeds is okay. As such, we need to make sure our filter excludes “cannabis” + “seeds”.
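The list-plus-exceptions logic from steps 4 and 5 can be sketched in plain Python. This is an illustration of the matching logic, not Implio's rule syntax; the keyword list and exception pairs are example values:

```python
import re

# Example keyword list and exception pairs (illustrative, not exhaustive).
DRUG_KEYWORDS = ["cocaine", "heroin", "cannabis"]
EXCEPTIONS = [("cannabis", "seeds")]  # "cannabis seeds" is legal to sell

def matches_filter(text: str) -> bool:
    """Return True if the ad text should be flagged by the drugs filter."""
    lowered = text.lower()
    for keyword in DRUG_KEYWORDS:
        # Whole-word match, so e.g. "scannabis" is not flagged.
        if re.search(rf"\b{re.escape(keyword)}\b", lowered):
            # Skip the match if an exception applies, e.g. "cannabis seeds".
            if any(kw == keyword and
                   re.search(rf"\b{re.escape(kw)}\s+{re.escape(ctx)}\b", lowered)
                   for kw, ctx in EXCEPTIONS):
                continue
            return True
    return False

print(matches_filter("Selling pure cocaine"))     # True
print(matches_filter("Cannabis seeds for sale"))  # False
```

Note how the exception is scoped to the specific phrase “cannabis seeds” rather than white-listing the word “seeds” everywhere, which is exactly the blanket-exception pitfall step 7 warns about.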

6. QA:
Once your filter is set up with the list and all relevant exclusions, it is time for the first quality check. Upload your data to Implio and review the matches. Are you getting any false positives?

7. Exceptions:
For every false positive, add an exception (also called white-listing specific content). Think your exceptions through so you don’t create a blanket white-list that lets unwanted content through.

8. Rinse & repeat:
Once you have added new exceptions, run your data through again to quality-check your updated filter. Repeat steps 6-8 as many times as needed to reach your target quality rate. At Besedo we aim for a minimum of 95% accuracy, and for most of our filters we reach higher.
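The quality check in the loop above boils down to comparing the filter's decisions against a manually reviewed sample. A minimal sketch, assuming you have parallel lists of filter decisions and ground-truth labels:

```python
def filter_accuracy(decisions: list[bool], ground_truth: list[bool]) -> float:
    """Fraction of ads where the filter's decision matches the manual
    review (True = flagged / should be flagged). Illustrative only."""
    correct = sum(d == g for d, g in zip(decisions, ground_truth))
    return correct / len(decisions)

# Example QA sample: five ads, one false positive.
decisions    = [True, True, False, False, True]
ground_truth = [True, False, False, False, True]
print(f"{filter_accuracy(decisions, ground_truth):.0%}")  # 80%
```

If the result falls below your target (e.g. the 95% minimum mentioned above), add exceptions and rerun until it clears the bar.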

Even though you have the option to refuse content automatically, we do not recommend it unless your filter reaches 100% accuracy. (That level is quite rare, but an example of a filter that could reach it would be an IP-related rule blocking posts from certain IPs.)

If you are not 100% certain that every ad matched by a refusal filter should be refused, send matches for manual review instead. Otherwise, you risk ruining your user experience.

If you follow Kevin’s 8 steps, you should be well equipped to create your own accurate filters. If you want to improve your filter management skills even further, we offer training sessions where our expert filter managers teach you all about regular expressions and rule crafting through step-by-step guides and exercises.

Learn more about filter training
