Website spam and moderation queues

Mollom is a web service that blocks website spam. Websites using Mollom send the data they want checked to Mollom, and Mollom replies with either a spam or ham classification. If Mollom is not certain, it returns an unsure classification, which typically prompts the website to ask Mollom's CAPTCHA server for an audio or visual CAPTCHA challenge to present to the user. In other words, Mollom uses a classifier with three states: ham, spam and unsure. We explained that in detail on the "How Mollom works" page.
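To make the three-state idea concrete, here is a minimal sketch in Python. It is not Mollom's actual API or algorithm: the numeric spam score, the two cut-off values, and the names `classify` and `handle_submission` are all assumptions made up for illustration. It only shows how a site might act on each of the three answers.

```python
from enum import Enum


class Verdict(Enum):
    HAM = "ham"
    SPAM = "spam"
    UNSURE = "unsure"


# Hypothetical cut-offs, chosen only for illustration.
HAM_BELOW = 0.2
SPAM_ABOVE = 0.8


def classify(spam_score: float) -> Verdict:
    """Map an assumed spam score in [0, 1] to one of three states."""
    if spam_score < HAM_BELOW:
        return Verdict.HAM
    if spam_score > SPAM_ABOVE:
        return Verdict.SPAM
    return Verdict.UNSURE


def handle_submission(spam_score: float, captcha_passed=None) -> str:
    """Decide what the website does with a submitted comment.

    captcha_passed is None until the user has attempted a CAPTCHA.
    """
    verdict = classify(spam_score)
    if verdict is Verdict.HAM:
        return "publish"
    if verdict is Verdict.SPAM:
        return "reject"
    # Unsure: let a CAPTCHA challenge decide instead of a human moderator.
    if captcha_passed is None:
        return "show_captcha"
    return "publish" if captcha_passed else "reject"
```

Calling `handle_submission(0.5)` asks for a CAPTCHA, and calling it again with `captcha_passed=True` publishes the comment. At no point does the comment wait for a human to review it.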

Over at the Mollom blog, Ben wrote a great post about why we believe this is a key difference, and how it allows Mollom to eliminate your moderation queue. A picture is worth more than a thousand words, so take a look at the plots below and read Ben's blog post for more details.

Spam versus ham
The plot illustrates that a binary classifier with only two states (ham and spam) is bound to make mistakes. The plot is generated from the actual data in Mollom's database.

As you can see on the first graph, a binary classifier with two states (ham, spam) is never going to be dead accurate, and will require a moderation queue so the user can manually deal with legitimate comments that were incorrectly classified as spam. Unfortunately, moderation queues are not fun, and they don't make you any more productive. You'll still find yourself wading through thousands of spam comments looking for ham. In other words, a moderation queue doesn't really solve the problem -- it just makes the problem look different.

Spam versus ham
The plot illustrates that having a classifier with three states avoids false positives and false negatives. The plot is generated from the actual data in Mollom's database.

Time for something better. As you can see on the second graph, a classifier with three states is going to be a lot more accurate. In fact, Mollom is so accurate that the Drupal module doesn't come with a moderation queue! It's an important distinction, and one of the many innovations that we have in store for you. Bye bye moderation queue!
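The difference between the two graphs can be mimicked with a toy example. The scores below are synthetic, invented purely for illustration; they are not taken from Mollom's database, and the thresholds are assumptions. The sketch counts the mistakes a single cut-off makes, then counts how many of the same samples a three-state classifier would set aside as unsure instead.

```python
# Synthetic (spam_score, is_spam) pairs, invented for illustration only.
SAMPLES = [
    (0.05, False), (0.10, False), (0.30, False), (0.55, False),
    (0.45, True), (0.70, True), (0.90, True), (0.95, True),
]


def binary_errors(samples, threshold=0.5):
    """Count mistakes when one threshold splits everything into ham or spam."""
    false_positives = sum(1 for s, spam in samples if s >= threshold and not spam)
    false_negatives = sum(1 for s, spam in samples if s < threshold and spam)
    return false_positives, false_negatives


def three_state_outcome(samples, ham_below=0.2, spam_above=0.8):
    """Count mistakes and unsure cases when the middle band is left undecided."""
    false_positives = sum(1 for s, spam in samples if s > spam_above and not spam)
    false_negatives = sum(1 for s, spam in samples if s < ham_below and spam)
    unsure = sum(1 for s, _ in samples if ham_below <= s <= spam_above)
    return false_positives, false_negatives, unsure


print(binary_errors(SAMPLES))        # (1, 1)
print(three_state_outcome(SAMPLES))  # (0, 0, 4)
```

On this made-up data the single threshold misfiles one ham and one spam, while the three-state version misfiles nothing and hands four borderline cases to the CAPTCHA. Real data will be messier, but it shows why the extra state moves the hard cases out of a moderation queue.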


Matthais (not verified):

Moderation queues actually hog valuable local database resources. You need extra columns or extra tables, extra queries which require validation, extra interfaces with forms, extra code, etc.

So I've noticed the lack of the whole moderation thingy in the mollom module... and there was much rejoicement over here :-) Of course, I wondered if moderation is something one should implement optionally, but then again, as you demonstrated, Mollom has no need for it anyways.

April 18, 2008
Larry Garfield (not verified):

You need a kitschy name for the "Unsure" category, something in keeping with the pork motif. Spam, Ham, and Bacon?

April 18, 2008
Robert Wetzlmayr (not verified):

I'm wondering whether a developer's API is ready yet, as I am planning to integrate Mollom with Textpattern's comment evaluator.

Any news on this?

April 18, 2008
Island Usurper (not verified):

@Larry: How about Sausage for the in between? Bacon is too good IRL to be associated with a bad thing. But there is good sausage, and then there's bad sausage.

Interestingly, this post was considered sausage.

April 18, 2008
Larry Garfield (not verified):

Sausage works!

Once this becomes an Internet meme, let's make sure people remember where it started. :-) Sausage, the "maybe spam".

April 18, 2008
Senpai (not verified):

Would that be 'maybe spam', or 'may be spam'? I taste a patch coming soon!

May 16, 2008
