
How To Create Accurate Content Moderation Filters That Work


    Creating a filter for moderation is the easy part. Ensuring that a filter produces the desired result is a science. Accurate content moderation filters don’t just catch what they are set up to find; they also keep the number of false positives to a minimum.

    With good filters alone, our clients have achieved up to 80% automation of their content moderation. So what is the secret to building an accurate and efficient filter?

    8 steps of filter moderation

    Our expert filter manager, Kevin Martinez, shares his eight steps for building content moderation filters that work.

    In this example, we are using drugs as the target of the filter, but the process is the same for building anything from profanity filters to rules aimed at catching ads for endangered species.

    1. Mission:
    Define the goal of the filter. In this case, we want it to help us prevent illegal drugs from getting posted to our site. As such, we created a filter in Implio called “drugs”. It is always advisable to name your filters something descriptive. You are likely going to end up with a bunch of filters so it is good to be able to understand their function at a glance.

    2. Local:
    Check the laws of the country your site is operating in. Laws can vary widely depending on country and sometimes even by region. In Spain, for instance, you are allowed to sell the growing box for cannabis and cannabis seeds, but not the plant itself.

    3. Action:
    Decide on the action you want the filter to take. Should it refuse the content, send it to manual moderation, or is it just a test filter that shouldn’t take any action other than highlighting the ads that would have been caught? In Implio, the default action is to automatically accept any ads that don’t match the filter, but you have full control over what happens to content that matches your rules.

    4. List:
    Create a list of all drug-related keywords (cocaine, heroin, cannabis, etc.). Make sure you also include any slang words your users are likely to use for drugs.

    5. Rule:
    Now, it is time to set up your rule. Make sure that the rule pulls from the list and that you add exceptions to avoid false positives. For example, in step 2, we discovered that selling cannabis seeds is okay. As such, we must ensure our filter does not match when “cannabis” appears together with “seeds”.

    6. QA:
    Once your filter is set up with the list and all relevant exclusions, it is time for the first quality check. Upload your data to Implio and review the matches. Are you getting any false positives?

    7. Exceptions:
    For all false positives, add exceptions (also called whitelisting specific content). Think your exceptions through so you don’t create a blanket whitelist that lets unwanted content through.

    8. Rinse & repeat:
    Once you have added new exceptions, run your data through again to quality-check your updated filter. Repeat steps 6-8 as often as needed to reach your target quality rate. At Besedo, we aim for 95% accuracy as a minimum, and we reach higher for most of our filters.
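To make steps 4 to 8 concrete, here is a minimal Python sketch of a keyword list, a rule with exceptions, and a quality check. It is not Implio's rule syntax. The keyword list, the exception phrases, the sample ads, and every function name are hypothetical, and it assumes "accuracy" means the share of matched ads that a reviewer confirms as real violations (often called precision).

```python
import re

# Step 4 (hypothetical list): a real list would also include local slang.
DRUG_KEYWORDS = ["cocaine", "heroin", "cannabis", "weed"]


def build_pattern(terms):
    """Compile a case-insensitive, whole-word pattern matching any term."""
    ordered = sorted(terms, key=len, reverse=True)  # try longer phrases first
    alternatives = "|".join(re.escape(term) for term in ordered)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


def make_filter(keywords, exceptions):
    """Steps 5 and 7: match keywords, but ignore whitelisted phrases."""
    keyword_re = build_pattern(keywords)
    exception_re = build_pattern(exceptions) if exceptions else None

    def matches(text):
        if exception_re is not None:
            # Blank out excepted phrases so they cannot trigger a match.
            text = exception_re.sub(" ", text)
        return keyword_re.search(text) is not None

    return matches


def precision(matches, labelled_ads):
    """Step 6: share of matched ads that reviewers labelled as violations."""
    verdicts = [is_violation for text, is_violation in labelled_ads if matches(text)]
    return sum(verdicts) / len(verdicts) if verdicts else None


# Hypothetical, manually labelled sample: (ad text, is it really a violation?)
SAMPLE = [
    ("Selling cannabis seeds, discreet shipping", False),
    ("Pure cocaine for sale", True),
    ("Weed killer for your garden", False),
    ("Fresh heroin, message me", True),
    ("Vintage bicycle, good condition", False),
]

# First pass: only the exception found in step 2.
filter_v1 = make_filter(DRUG_KEYWORDS, ["cannabis seeds"])
# Step 8: after QA reveals the "weed killer" false positive, add an exception.
filter_v2 = make_filter(DRUG_KEYWORDS, ["cannabis seeds", "weed killer"])

print(precision(filter_v1, SAMPLE))
print(precision(filter_v2, SAMPLE))
```

On this tiny sample, the first version wrongly matches the weed killer ad, so only two of its three matches are correct. Adding the narrow "weed killer" exception removes that false positive without whitelisting "weed" in general, which is the kind of targeted exception step 7 asks for.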

    Even though you can refuse content automatically, we do not recommend doing so unless your filter reaches 100% accuracy. (That level is quite rare, but one example of a filter that could reach it is an IP-related rule that stops users at certain IP addresses from posting.)

    If you are not 100% certain that all ads matched by the refusal filter should be refused, then you should send matches for manual review instead. Otherwise, you run the risk of ruining your user experience.
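The recommendation above can be written as a small decision rule. The following Python sketch is only an illustration: the `Action` values and the `action_for_match` function are hypothetical names, not part of Implio, and the measured accuracy is assumed to come from your own quality checks.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical outcomes for content that matches a filter."""
    REFUSE = "refuse"
    MANUAL_REVIEW = "manual_review"
    NO_ACTION = "no_action"  # test filter: only highlight what was caught


def action_for_match(measured_accuracy, test_mode=False):
    """Pick an action for matched content based on the filter's measured accuracy.

    measured_accuracy is a fraction between 0.0 and 1.0 taken from QA runs.
    """
    if test_mode:
        return Action.NO_ACTION
    if measured_accuracy >= 1.0:
        # Only a filter that is never wrong should refuse automatically.
        return Action.REFUSE
    # Anything less than certain goes to a human moderator.
    return Action.MANUAL_REVIEW


print(action_for_match(0.95))  # Action.MANUAL_REVIEW
print(action_for_match(1.0))   # Action.REFUSE
```

Under this rule, a filter that meets the 95% target still sends its matches to manual review, and only a filter measured at 100% is allowed to refuse on its own.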

    If you follow Kevin’s eight steps, you should be well-equipped to create your own accurate filters.

    If you want to improve your abilities in filter management even further, we offer training sessions where our expert filter managers will teach you all about regular expressions and rules crafting through step-by-step guides and exercises.

    Learn more about filter training
