Mollom
The Industry Standard using Drupal and Mollom
The online media industry continues to face readership and revenue challenges. Publishers are burdened with the task of not only providing content but also drawing more user interaction in the form of reader comments. Reader comments benefit sites because they demonstrate an engaged readership and bring more eyeballs to that particular page or article. For publishers, more eyeballs mean more revenue.
The Industry Standard is a news and analysis site owned by IDG, a large publishing organization that publishes over 300 magazines in 85 countries!
The Industry Standard re-launched on Drupal in 2008 with the goal of engaging with new readers and encouraging them to contribute comments and content. They also wanted to allow readers to comment anonymously, something that most news sites do not do. The Industry Standard felt that anonymity gave readers more freedom to express their comments, and would encourage more frequent and detailed commentary while expanding traffic and tying the publication into the many other online conversations taking place around technology.
Ian Lamont, The Industry Standard's managing editor, had prior experience managing online communities, and knew that the relaunched publication would need a comment filter that could encourage quality comments while sifting out spam and trolls.
According to Lamont, having anonymous comments is hugely important to The Industry Standard. "We really believe that most people don't want to deal with the hassle of registration. Because we are relatively small, if we only had registered comments, there would be far less reader engagement on the site. As it is now, we can have dialogues with unregistered users, which is really important to building voice and an online identity."
The Industry Standard is using Mollom to remove the barrier to visitor participation, allowing readers to comment anonymously while eliminating spam vandalism. Since the re-launch in 2008, Mollom has blocked 800k spam messages in 539 days, an average of more than a thousand attempts a day, with peaks of up to several thousand a day. Cool!
Mollom status update and planned improvements
We recently wrote about the fact that the number of messages we've filtered has doubled in three months. All things considered, we're handling well over 200 million HTTP requests each month, making Mollom the largest web service I've ever helped build. Further, since each of these requests is dynamic, they're fairly expensive because we can't apply even simple caching techniques. Each request to Mollom retrieves data, invokes a parser, uses statistical classifiers, and updates reputation models, among other things.
While the response time of the service has always remained good, we've had some recent scalability issues that have affected our ability to react to the constantly changing behavior of spammers. To react well, we must constantly analyze our data and continually retrain our classifiers. We do this asynchronously, using background processes that are not part of the HTTP requests. When we started Mollom, it took ten minutes to analyze our dataset and to train a new classifier. With our current volume of data and frequency of requests, that same operation now takes at least 14 hours. Needless to say, that has affected our ability to deal effectively with spammers, and as a result, the quality of our classifiers has regressed. While that regression is only a fraction of a percent, it is more than we would like, and if you get hammered badly like many of our users, it is noticeable. Not good.
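The idea of retraining in the background while requests keep being served can be sketched as follows. This is a minimal Python illustration, not Mollom's actual backend: the SpamFilter class, its word-count "model", and retrain_in_background are hypothetical names invented for the example, and a real classifier is far more involved.

```python
import threading
from collections import Counter


class SpamFilter:
    """Toy filter: answers requests with the current model while a new one trains."""

    def __init__(self):
        self._lock = threading.Lock()
        self._model = Counter()  # hypothetical "model": counts of words seen in spam

    def classify(self, text):
        # Request path: only reads the current model, never trains.
        with self._lock:
            model = self._model
        return any(model[word] > 0 for word in text.lower().split())

    def retrain(self, labelled):
        # Background path: build a fresh model off to the side...
        new_model = Counter()
        for text, is_spam in labelled:
            if is_spam:
                new_model.update(text.lower().split())
        # ...then swap it in, so requests never see a half-trained model.
        with self._lock:
            self._model = new_model


def retrain_in_background(spam_filter, labelled):
    """Start retraining on a separate thread and return the thread."""
    worker = threading.Thread(target=spam_filter.retrain, args=(labelled,), daemon=True)
    worker.start()
    return worker
```

The point of the design is that the time it takes to train (ten minutes or 14 hours) never shows up in the response time of a request; it only determines how stale the model being served is.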
To deal with the pains of, frankly, our unexpected success and growth, we did (or are in the process of doing) the following three things.
First, with the help of hosting company OpenMinds (these guys rock!), we upgraded one of our existing servers in Europe (for vertical scaling), and launched our first server in the United States (for horizontal scaling). Because of our large volume of data, and since our analysis is very data intensive, much of the work we do is I/O-bound. So, we've added more RAM to our servers, configured the disks in RAID-1 to mirror their contents for redundancy and better read performance, and purchased 64GB solid state disk drives (SSD) that are providing random access times at least 150 times faster than our regular hard disks. With the extra RAM, the RAID-1 configuration, and the solid state disks, it is now much faster to train a new classifier; a significant improvement making us much more agile in fighting spammers. The hardware upgrades are almost complete. Solid state disks, by the way, are seriously hot stuff.
Second, when you're processing more than 200 million HTTP requests a month, it becomes really hard to figure out what is going on, and doubly hard to determine where and why classification mistakes are being made. Simply put, Ben and I started to feel like the characters in the story of the blind men and the elephant as we tried to figure out why some spam was slipping through. To cope, we've made important architectural changes to our backend software allowing it to learn faster and increasing our ability to debug it on the fly. We've worked on these changes for more than two months, and last weekend, we made an important breakthrough that allowed us to visualize all our data in a completely new way. We're now able to generate heat maps of our algorithms to identify the weaker areas or the areas that are currently under attack. Already, we've identified a number of areas where we will improve our algorithms to be more effective. In other words, expect Mollom's accuracy to improve over the next couple of weeks as we translate our new insights into algorithmic improvements.
Third, with the help of Damien Tournoud, we fixed an important bug in the Drupal Mollom module, while also improving its logging abilities. The bugfix should prevent incorrect CAPTCHA results from being accepted when (or if) a Mollom server is unavailable, and the improved logging makes it easier to understand specific attempts to circumvent Mollom CAPTCHAs on your site. With the new output, for example, we've already seen that some spammers have adjusted their scripts to specifically target Mollom-protected sites, and we've also learned that some caching modules cause conflicts with Mollom in some configurations. In addition, Dave Reid, our new co-maintainer for the Drupal Mollom module, has committed many smaller but no less important improvements, bugfixes and clean-ups to the Mollom module. Last night, we packaged all these changes into a new release of the Mollom module for Drupal 6. Upgrading is certainly recommended.
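The behavior the bugfix aims for (never accept a CAPTCHA answer that could not be verified, and log every outcome) can be sketched like this. It is an illustration in Python under stated assumptions, not the Drupal module's code, which is written in PHP: ServerUnavailable, verify_captcha, and the check_remote callback are hypothetical names.

```python
class ServerUnavailable(Exception):
    """Raised by the (hypothetical) client when no server can be reached."""


def verify_captcha(check_remote, session_id, answer, log):
    """Return True only when the remote service confirms the answer.

    check_remote is an assumed callback that asks the service whether
    `answer` solves the CAPTCHA for `session_id`.
    """
    try:
        ok = bool(check_remote(session_id, answer))
    except ServerUnavailable:
        # Fail closed: an answer that could not be checked is never accepted.
        log.append(("unavailable", session_id))
        return False
    # Log both outcomes so repeated failed attempts become visible.
    log.append(("correct" if ok else "incorrect", session_id))
    return ok
```

Failing closed trades a little convenience during an outage for the guarantee that an outage cannot be used to slip spam past the CAPTCHA.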
We believe that the combination of all these elements will significantly improve our ability to combat spam, and that they will form the platform that will carry Mollom to the next level. Stay tuned as we complete the roll-out of all our changes.
Mollom for Laconica
Laconica, billed as an open-source microblogging tool similar to Twitter or Jaiku, now has its own Mollom plugin to reduce comment and posting spam. Laconica is designed to allow people in a community, company or group to exchange short messages of 140 characters or fewer over the web. The Mollom plugin for Laconica is written in PHP and is available at http://gitorious.org/laconica-mollom-plugin/mainline/trees/master.
IIS module for Mollom
Zion Security, a Belgium-based company specializing in the security analysis of web sites and systems, has used Mollom's open API to develop a Microsoft IIS module utilizing Mollom to detect and prevent comment and posting spam.
This module is unique in that it is an HTTP module coded for Microsoft IIS, comparable to an Apache module, and allows Mollom to potentially expand to a number of ASP/IIS based systems.
The Mollom IIS module is available for download as a zip file and is listed on our downloads page. It checks any submitted form for spam using Mollom's spam detection analysis, and like other Mollom plugins, requires you to obtain a set of registration keys from mollom.com before it can be actively used to protect your ASP-based forms.
Because it is written as a module at the webserver layer, it may be possible to use Mollom's spam-detection and CAPTCHA challenge ability with existing web applications running on IIS (think SharePoint or DotNetNuke). It's an interesting approach and one we haven't really considered ourselves. It will be interesting to see how this develops, and if it sticks.
Hundred million spam attempts blocked
At Mollom, our spam-filtering startup targeted toward eliminating comment and post spam, we've just reached two important milestones: we blocked our 100,000,000th spam message, and we're now actively protecting over 10,000 websites.
It was only about three months ago that we celebrated our 50 million message milestone, and two months before that we reached twenty-five million. These milestones are coming fast now. Will we double again in the next three months? Only time will tell.
In fact, these statistics are for our public servers only, and don't include message processing on private servers we operate on behalf of our larger clients. Mollom filters about an additional 4 million messages each day for Netlog, for instance.
All things combined, we're processing up to 150 million messages a month! Since it can take multiple HTTP requests to process a single message, we're handling well over 200 million HTTP requests per month. Each of these requests is dynamic and fairly expensive; they retrieve data and invoke a parser, statistical classifiers, reputation models, etc. As you can imagine, that isn't exactly trivial at the volumes we are now seeing. (As a reference, that is 5 times more than a site like drupal.org, which serves fewer than 20 million dynamic pages a month.)
We're currently working on some important architectural changes to the Mollom backend to allow it to learn faster while making it easier for us to debug, analyze and oversee its actions. We're also busy upgrading our infrastructure to cope with our growth. It is a work in progress, but once completed, it should allow us to focus more on improving the effectiveness of our classifiers and adding new features.
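To put those monthly totals in perspective, here is a quick back-of-the-envelope calculation in Python. It assumes a 30-day month and an evenly spread load (real traffic has peaks), and it treats "up to 150 million" and "well over 200 million" as round figures, so the results are rough averages only.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: a 30-day month


def per_second(monthly_total):
    """Average rate per second for a monthly total, assuming an even load."""
    return monthly_total / SECONDS_PER_MONTH


requests_per_second = per_second(200_000_000)     # about 77 HTTP requests per second
messages_per_second = per_second(150_000_000)     # about 58 messages per second
requests_per_message = 200_000_000 / 150_000_000  # about 1.33 requests per message

print(round(requests_per_second), round(messages_per_second), round(requests_per_message, 2))
```

Since none of these requests can be served from a cache, each of them runs the full pipeline of parsing, classification, and reputation updates.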
One thing is for sure though -- we're going to keep doing what we're doing, and if you're a Mollom user, we're glad to have you along for the ride.
NowPublic using Mollom
NowPublic is a Vancouver-based news network that mobilizes an army of reporters to cover events around the world. During Hurricane Katrina, NowPublic had more reporters in affected areas than most news organizations have on their entire staff.
Unfortunately, NowPublic was up against as many as 25,000 spam attempts a day, so it needed a solution that would allow the site to grow faster and more effectively without being slowed by comment spam. About one year ago, NowPublic implemented Mollom to protect their site against spam. They use Drupal, so all they needed to do was install the Mollom module for Drupal.
Two major challenges arise from trying to control website spam. First, visitors may lose their motivation to comment or contribute content because they are so often required to register to prove that they are human and not a spambot. This erodes participation. Second, whether visitors are asked to register or not, site moderation becomes more time-consuming and expensive. Website moderators have to scan comments and other content to find spam instead of interacting with the community. Mollom differs from other spam protection solutions in that it tries to address both problems.
While Mollom is not perfect (it is a work in progress), it works really well for the vast majority of our users. In NowPublic's case, Mollom has prevented more than one million spam attempts since they started using it. Plus, because Mollom removed barriers to participation, they saw a 180% increase in the average number of comments posted per month by users since implementing Mollom's spam-filtering service. Last but not least, according to Jordan Yerman, NowPublic's Contributor Support Manager, Mollom saved NowPublic at least one hour per day that would otherwise be spent dealing with spam. So by the end of the first month, they had saved more money than Mollom cost them for the year.
Needless to say, NowPublic is one of my favorite Mollom success stories. Now that they are one year into using Mollom, it is rewarding to look back and see how well it has worked for them.
(Disclosure: I am an advisor to NowPublic.)
Jacksonville using Drupal
Jacksonville, the largest city in Florida, is using Drupal (and Mollom) at http://jacksonville.com. The Florida Times-Union is the major daily newspaper in Jacksonville and Jacksonville.com is its official website. Cool!