I am sure everyone is already tired of hearing yet another story about LLMs and what they can do. I’ve been fairly skeptical myself and stayed away from it all due to the hype. There are obvious limitations to the technology, but when used correctly it can be quite helpful. After a recent experiment in generating suggestions, I have now turned my attention to fighting spammers.
The problem
I develop and run a community website for fountain pen enthusiasts called Fountain Pen Companion. Sign-up is open to everyone, and every user gets a profile page where you can use Markdown to format your blurb (e.g. here’s mine).
While requiring new users to confirm their email address and solve a captcha has been successful at keeping out bots, I’ve recently had to deal with spammers who use the profile pages to host spam (curiously, almost exclusively in Vietnamese?!). Having to constantly check profile texts for changes and delete accounts is getting old, and I’d rather have a solution that classifies these accounts automatically, where I can double-check the results on my own time and without haste.
The solution
Inspired by listening to the episode with Obie Fernandez on the Maintainable podcast, I decided to give classification via an LLM a try. It was really very simple: give the model a few accounts that are spam (email address and profile text), a few that are not spam, and then ask it to classify the account you’re interested in (few-shot learning in the lingo, I think). More specifically, the prompt looks like this:
Given the following spam accounts:
email, name, blurb, time zone
/CSV FORMATTED DATA HERE/
And the following normal accounts:
email, name, blurb, time zone
/CSV FORMATTED DATA HERE/
Classify the following account as spam or normal:
email, name, blurb, time zone
/CSV FORMATTED DATA HERE/
Return only "spam" if spam account and "normal" if normal account.
The prompt cannot get arbitrarily long, so I am passing in the latest (up to) 50 spam accounts and a random selection of 50 non-spam accounts that have a profile text.
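For illustration, here is roughly how that selection and prompt assembly could look in Ruby. This is only a sketch: the `User` model, its `spam` and `blurb` columns, and the `to_csv_row` helper are assumptions I’m making for the example, not the actual Fountain Pen Companion schema.

```ruby
require "csv"

CSV_HEADER = "email, name, blurb, time zone"

# Turn one account into a single CSV line matching the header above.
def to_csv_row(user)
  CSV.generate_line([user.email, user.name, user.blurb, user.time_zone]).strip
end

def build_prompt(account)
  # The latest (up to) 50 known spam accounts...
  spam_examples = User.where(spam: true).order(updated_at: :desc).limit(50)
  # ...and a random selection of 50 normal accounts that have a profile text.
  normal_examples = User.where(spam: false)
                        .where.not(blurb: [nil, ""])
                        .order("RANDOM()")
                        .limit(50)

  <<~PROMPT
    Given the following spam accounts:
    #{CSV_HEADER}
    #{spam_examples.map { |u| to_csv_row(u) }.join("\n")}

    And the following normal accounts:
    #{CSV_HEADER}
    #{normal_examples.map { |u| to_csv_row(u) }.join("\n")}

    Classify the following account as spam or normal:
    #{CSV_HEADER}
    #{to_csv_row(account)}

    Return only "spam" if spam account and "normal" if normal account.
  PROMPT
end
```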
You can instruct ChatGPT to return JSON-formatted output, but here it’s just a simple binary decision, so string matching on the one word returned works well enough.
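A minimal sketch of the call itself, assuming the ruby-openai gem; the model choice and temperature here are my assumptions, not necessarily what the site uses:

```ruby
require "openai" # the ruby-openai gem

def spam?(account)
  client = OpenAI::Client.new(access_token: ENV.fetch("OPENAI_API_KEY"))
  response = client.chat(
    parameters: {
      model: "gpt-4o-mini", # assumption: any inexpensive chat model works here
      temperature: 0,       # deterministic output helps plain string matching
      messages: [{ role: "user", content: build_prompt(account) }]
    }
  )
  # No JSON mode needed: just match the single word the model returns.
  response.dig("choices", 0, "message", "content").to_s.strip.downcase == "spam"
end
```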
So, after every change to a user’s blurb that contains at least one link (no spam without links, right?), I run the classification, and the account gets added to a “to review” queue. Whenever the LLM flags an account as spam, it additionally gets locked and removed from the public part of the site. I can then go through the queue at my leisure, knowing that the site won’t host any spam while still being able to double-check the LLM’s work afterwards.
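As a sketch of how such a trigger could be wired up in Rails (the callback, the job, and the `review_pending`/`locked` columns here are all hypothetical, not the site’s actual code):

```ruby
class User < ApplicationRecord
  # Classify only when the blurb changed and now contains a link.
  after_save :queue_spam_check,
             if: -> { saved_change_to_blurb? && blurb.to_s.match?(%r{https?://}) }

  private

  def queue_spam_check
    SpamCheckJob.perform_later(id)
  end
end

class SpamCheckJob < ApplicationJob
  def perform(user_id)
    user = User.find(user_id)
    user.update!(review_pending: true)         # everything lands in the review queue
    user.update!(locked: true) if spam?(user)  # spam? from the sketch above
  end
end
```

Running the check in a background job keeps the (slow, occasionally failing) API call out of the request that saves the profile.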
Did it work then?
Sadly (?), the spammers haven’t been showing up much anymore, so I haven’t been able to classify that many accounts, but so far it has worked well! And given how little work I had to put into this, it was well worth doing. Actually, that’s the main attraction: not having to think too much about how the classification will work, but instead asking a system to just do it for you.
In summary, as this isn’t on the critical path and also not super important, it’s a perfect task for an LLM. It does a useful job, but when it makes a mistake no real harm is done and the mistake can easily be corrected.
Resources
All of this happened in the context of my open source project Fountain Pen Companion. Below is a chronological list of all relevant Pull Requests: