Mollom

Python wrapper for Mollom

Andy Georges released a Python wrapper for Mollom. The wrapper can be used to integrate Mollom into your Python applications, but it also gets Mollom one step closer to the Django project and Google App Engine.

The Mollom API was released less than 10 days ago, and already Mollom is supported on PHP, Java, Python and Ruby. Sweet!

Ruby Gem for Mollom

Jan De Poorter of Openminds (my favorite hosting company) wrote a Ruby Gem for Mollom. You can get the Gem from http://mollom.rubyforge.org/. Jan told me that a Ruby on Rails plugin is on its way, so keep an eye on Jan's development blog. Thanks Jan!

PHP5 class for Mollom

Tijs Verkoyen of CR Solutions used the Mollom API documentation to create a PHP5 class for Mollom. Tijs' class is available under the terms of a BSD license, so you can use it to integrate Mollom into your PHP applications. We've added the class to the Mollom download page. Thanks Tijs!

Mollom API now available

In hours previously wasted sleeping, we've worked hard to finalize and document the Mollom API. We released the API to the public today, so now you can build Mollom's filtering capabilities into your own applications. Woot!

One million spam attempts blocked

Last weekend, just 3 weeks after we launched Mollom, Mollom blocked its one millionth spam attempt. That is a million tiny contributions toward making the web a nicer place. Incidentally, Mollom also got mentioned on TechCrunch that same weekend. Milestone weekend!

Website spam and moderation queues

Mollom is a web service that blocks website spam. Websites using Mollom send data they want checked to mollom.com, and Mollom replies with either a spam or ham classification. If Mollom is not certain, it will return an unsure classification, typically prompting websites to ask Mollom's CAPTCHA server for an audio or visual CAPTCHA challenge to present to the user. In other words, Mollom uses a classifier with three states: ham, spam and unsure. We explained that in detail on the "How Mollom works" page.
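To make the three-state idea concrete, here is a minimal Python sketch of how a site might act on a ham, spam or unsure result. It is not Mollom's implementation or API: the numeric spam score, the two thresholds (0.2 and 0.8) and the names `classify` and `handle_submission` are all assumptions made up for illustration.

```python
from enum import Enum


class Verdict(Enum):
    HAM = "ham"
    SPAM = "spam"
    UNSURE = "unsure"


def classify(spam_score, ham_below=0.2, spam_above=0.8):
    """Map a spam score in [0, 1] to one of three states.

    The score and both thresholds are hypothetical; the point is only
    that the middle band is reported as UNSURE instead of being forced
    into ham or spam.
    """
    if not 0.0 <= spam_score <= 1.0:
        raise ValueError("spam_score must be between 0 and 1")
    if spam_score < ham_below:
        return Verdict.HAM
    if spam_score > spam_above:
        return Verdict.SPAM
    return Verdict.UNSURE


def handle_submission(spam_score, captcha_passed=None):
    """Decide what a website does with a submission.

    captcha_passed is None until the visitor has been shown a CAPTCHA.
    """
    verdict = classify(spam_score)
    if verdict is Verdict.HAM:
        return "publish"
    if verdict is Verdict.SPAM:
        return "reject"
    # Unsure: fall back to a CAPTCHA challenge instead of a moderation queue.
    if captcha_passed is None:
        return "show_captcha"
    return "publish" if captcha_passed else "reject"
```

With these assumed thresholds, `handle_submission(0.5)` asks for a CAPTCHA, and only the answer to that challenge decides whether the content is published.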

Over at the Mollom blog, Ben wrote a great post about why we believe this is a key difference, and how that allows Mollom to eliminate your moderation queue. A picture is worth more than a thousand words, so take a look at the plots below and read Ben's blog post for more details.

Spam versus ham

The plot illustrates that a binary classifier with only two states (ham and spam) is bound to make mistakes. The plot is generated from the actual data in Mollom's database.

As you can see on the first graph, a binary classifier with two states (ham, spam) is never going to be perfectly accurate, and will require a moderation queue so the user can manually deal with legitimate comments that were incorrectly classified as spam. Unfortunately, moderation queues are not fun, and they don't make you any more productive. You'll still find yourself wading through thousands of spam comments looking for ham. In other words, a moderation queue doesn't really solve the problem; it just makes the problem look different.

Spam versus ham

The plot illustrates that a classifier with three states avoids false positives and false negatives. The plot is generated from the actual data in Mollom's database.

Time for something better. As you can see on the second graph, a classifier with three states is going to be a lot more accurate. In fact, Mollom is so accurate that the Drupal module doesn't come with a moderation queue! It is an important distinction, and one of the many innovations that we have in store for you. Bye bye moderation queue!

Spam, OpenID and Mollom

There is an interesting discussion about spam and OpenID going on at Matt Mullenweg's blog. The discussion was triggered by the policy decision of social bookmarking site Magnolia to restrict signups to OpenID users. According to the site, 75% of new accounts were being created at Magnolia by spammers using automated tools (our friends the 'spambots'). They say that by restricting access to OpenID users, the rate of spam-account creation decreased. In the discussion, there is a lot of talk about whether OpenID should be used to fight spam, and whether it could be an effective spam-fighting tool in the long term.

Here are my thoughts. Spammers can create OpenIDs too, and a single sign-on system might be many a spammer's wet dream. It gives them easy access to millions of sites in one fell swoop.

Now, OpenID by itself can't prevent spam. All it does is provide a globally unique identifier for any given user on the planet. This is where a tool like Mollom comes in. At Mollom we're already maintaining an internal reputation for each OpenID account we encounter while assessing submitted content. Combine an identity system (OpenID) with a reputation system (Mollom) and it becomes a lot easier to separate spam users from non-spam users. Simon Willison said it best: "a trust system requires identity first". A globally unique identifier combined with reputation tools gives us a powerful weapon to fight website spam. OpenID's attribute exchange might become Mollom's best friend ...
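As a rough illustration of pairing an identity with a reputation, the Python sketch below keeps a per-OpenID tally of past ham and spam and turns it into a score. How Mollom actually computes reputation is not described here; the `ReputationStore` class, the smoothing formula and the example OpenID URLs are purely hypothetical.

```python
from collections import defaultdict


class ReputationStore:
    """Hypothetical per-identity reputation keyed by OpenID URL."""

    def __init__(self):
        # For each OpenID, count how many submissions were ham or spam.
        self._counts = defaultdict(lambda: {"ham": 0, "spam": 0})

    def record(self, openid, was_spam):
        """Remember the outcome of one assessed submission."""
        self._counts[openid]["spam" if was_spam else "ham"] += 1

    def score(self, openid):
        """Return a reputation in (0, 1); higher means more trustworthy.

        Uses add-one smoothing, so an identity that has never been seen
        starts at a neutral 0.5 rather than at either extreme.
        """
        counts = self._counts.get(openid, {"ham": 0, "spam": 0})
        ham, spam = counts["ham"], counts["spam"]
        return (ham + 1) / (ham + spam + 2)


store = ReputationStore()
for _ in range(3):
    store.record("https://alice.example/", was_spam=False)
    store.record("https://spambot.example/", was_spam=True)
```

The identifier does the heavy lifting here: because the same OpenID shows up on every site, each new observation sharpens the same score, which is what a purely per-site account could not offer.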

Similarly, Tim Berners-Lee is experimenting with combining FOAF ("friend of a friend") and OpenID to fight spam: you can only comment on Tim's blog if you are no more than a certain number of degrees of friendship away from him. Of course, it is a widely accepted theory that we are only six degrees away from everyone in the world, so I do wonder how effective this would really be in the long run.
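The degrees-of-friendship rule can be sketched as a breadth-first search over a friendship graph. This is only an illustration of the idea, not how Tim's blog implements it: the graph is a plain dictionary rather than real FOAF data, links are followed in one direction only, and the names, the `may_comment` helper and the limit of 2 degrees are assumptions.

```python
from collections import deque


def degrees_of_separation(friends, start, target):
    """Return the number of friendship hops from start to target.

    friends maps a person to the people they list as friends.
    Returns None if target cannot be reached from start.
    """
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, distance = queue.popleft()
        for friend in friends.get(person, ()):
            if friend == target:
                return distance + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, distance + 1))
    return None


def may_comment(friends, owner, commenter, max_degrees=2):
    """Allow comments only from people close enough to the blog owner."""
    distance = degrees_of_separation(friends, owner, commenter)
    return distance is not None and distance <= max_degrees


# Hypothetical friendship graph for illustration.
FRIENDS = {
    "tim": ["alice"],
    "alice": ["bob"],
    "bob": ["carol"],
}
```

In this toy graph, `bob` is two hops from `tim` and may comment, while `carol` is three hops away and may not. Raising `max_degrees` toward six would let almost everyone through, which is exactly the concern about long-run effectiveness.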

It is still early days in these debates and experiments, but for now, Mollom can already protect your login and submission forms with an image or audio CAPTCHA.

Either way, it is an interesting discussion that makes you wonder. Where will OpenID be in 3 years? Where do you think the website spam problem will be in 3 years? How will this affect online communities?

I have my own thoughts and predictions, and they were among the principal reasons for co-founding Mollom ...

© 1999-2007 Dries Buytaert Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.
Drupal is a Registered Trademark of Dries Buytaert.