What does it take to build a state-of-the-art AI content moderation tool? We caught up with Besedo’s semantics expert and computational linguistics engineer, Evgeniya Bantyukova.
Interviewer: Nice to meet you! Tell us a little about yourself.
Evgeniya: I’m Evgeniya and I’m based in Besedo’s Paris office. I’m originally from Russia but I’ve been in France for the past five or so years. I started at ioSquare about a year and a half ago, and have continued to work there as part of Besedo since the two companies merged last year.
Interviewer: What do you do? What is your job title and what does it really mean?
Evgeniya: As a computational linguistics engineer, I guess you could describe me as part linguist and part computer programmer. The work I do bridges the gap between what people search for and post online and the way content is moderated.
I work with semantics. This means I spend a lot of time researching information and looking at the different ways words and phrases are presented and expressed. I also build filters to analyze and identify the information I’ve manually researched. It’s an iterative process of constant refinement that takes time to perfect.
The filters can then be used by us, on behalf of our clients, to identify when a piece of text using these terms and phrases is submitted to their site, before it gets posted. The ultimate aim is to ensure that incorrect, defamatory, or just plain rude information doesn’t get posted to our clients’ sites.
Interviewer: What kind of projects have you worked on? Could you give us an example?
Evgeniya: Sure. Recently I was tasked with creating a filter for profanity terms in several different languages – not just the words themselves, but variations on them, like different ways to spell them or alternative phrasings.
This also involved analyzing them and creating a program or model that could detect their use. There was a lot of data capture and testing involved, across millions of data points, which helped ensure the filters we built were as effective as possible.
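To make the idea of catching spelling variations concrete, here is a minimal Python sketch. It is an illustration only and not Besedo's actual filter: the substitution table, the placeholder word list, and the function names `normalize` and `find_blocked` are all assumptions made for this example. It lowercases each token, undoes a few common character swaps, strips separators, and collapses repeated letters before comparing against the list.

```python
import re

# Hypothetical character substitutions often used to disguise words.
# Illustrative only; a production filter would need a far richer table.
SUBSTITUTIONS = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})

# Placeholder blocklist with mild stand-in words (an assumption for the demo).
BLOCKLIST = {"darn", "heck"}


def normalize(token: str) -> str:
    """Reduce a token to a canonical form so spelling variants compare equal."""
    token = token.lower().translate(SUBSTITUTIONS)
    token = re.sub(r"[^a-z]", "", token)     # drop separators, e.g. d.a.r.n
    token = re.sub(r"(.)\1+", r"\1", token)  # collapse repeats, e.g. heeeck
    return token


# Normalize the blocklist too, so words with double letters still match.
NORMALIZED_BLOCKLIST = {normalize(word) for word in BLOCKLIST}


def find_blocked(text: str) -> list[str]:
    """Return the original tokens whose normalized form is on the blocklist."""
    return [raw for raw in text.split()
            if normalize(raw) in NORMALIZED_BLOCKLIST]


print(find_blocked("what the h3ck is this d.a.r.n thing"))
```

A sketch like this only works token by token and only for Latin-script text; covering several languages, as described above, would need per-language tables and much more testing.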
One thing I’m working on right now is a project tackling fake profiles on dating sites: analyzing scam messages and extracting the expressions and words that are most frequently used. One thing I have discovered in this process is that those posting fake profiles often use sequences of adjectives – words like ‘nice’, ‘honest’, or ‘cool’ – so now I’m looking at creating a model that finds profiles that fit that description. That approach on its own would create many false positives, but with discoveries like these we get a much more precise idea of what fake profiles look like, and that helps us create filters that limit the number that go live on our clients’ sites.
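The adjective-sequence observation could be turned into a simple signal along these lines. This is a rough sketch under stated assumptions, not the model described in the interview: the small adjective lexicon, the `min_run` threshold of three, and the function names are hypothetical, and a real system would more likely use a part-of-speech tagger than a fixed word list.

```python
import re

# Hypothetical mini-lexicon of adjectives; a stand-in for a real POS tagger.
ADJECTIVES = {"nice", "honest", "cool", "kind", "caring", "loyal", "sincere"}

# Words that may sit between adjectives without breaking a sequence.
CONNECTORS = {"and"}


def longest_adjective_run(text: str) -> int:
    """Length of the longest run of consecutive adjectives in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    best = run = 0
    for token in tokens:
        if token in ADJECTIVES:
            run += 1
            best = max(best, run)
        elif token in CONNECTORS:
            continue  # "nice, honest and cool" still counts as one run
        else:
            run = 0
    return best


def adjective_signal(text: str, min_run: int = 3) -> bool:
    """True when the text contains a run of at least min_run adjectives."""
    return longest_adjective_run(text) >= min_run


print(adjective_signal("I am a nice, honest and cool man looking for love"))
print(adjective_signal("I had a nice day at the cafe"))
```

As noted above, a check like this would produce many false positives on its own, so it is best read as one feature to combine with others rather than a verdict.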
Interviewer: How does the work you do feed into AI moderation?
Evgeniya: Crafting filters involves working on a set amount of data. The more data we have, the more accurate we can make our filters. It’s an iterative and human-driven process, but engineered to be very precise.
Filters like these, when used as verification models, can help improve the precision and quality of manual content moderation. And when used in combination with our machine learning/deep learning pipeline, they improve our AI’s overall accuracy and efficiency.
The filters I build are quite generic, so they are used as a framework for multiple clients, depending on their moderation needs. They can also be tailored to specific assignments as needed. On top of that, and to keep our filters “sharp”, we continuously update them as language evolves and new trends and words appear.
Interviewer: Do you have any heroes or role models that you admire in your field?
Evgeniya: Well, as you might imagine, role models in computational linguistics are kind of hard to come by. But I’m a big fan of theoretical linguists like Noam Chomsky.
Interviewer: What qualities do you need to succeed in your field?
Evgeniya: I think you need to be genuinely curious about the world in general. Every new trend and phenomenon should interest you, because each one brings new tendencies and words, and those will affect the filters you are crafting.
You also need to have a knack for languages, or at least for how different languages are structured.
Finally, you need to be open-minded and able to stay objective. When working on a profanity filter, it doesn’t help if you are continuously offended. You need to stay neutral and focus on the end goal: keeping people safe online.
This is why I enjoy my job so much. It is very rewarding knowing that you are making a difference – whether that’s ensuring that a site is secure for users or, more generally, seeing the positive impact of something you’ve done. Take dating sites, for instance: the fact that the work I do can help someone find love is the greatest reward I can think of. I guess I’m something of a hopeless romantic!
Evgeniya is a linguistic engineer at Besedo.
She combines her programming and linguistic skills in order to automatically process natural languages.
Her work allows Besedo to build better and more accurate filters and machine learning algorithms.