Spam, OpenID and Mollom

There is an interesting discussion about spam and OpenID going on at Matt Mullenweg's blog. The discussion was triggered by the policy decision of social bookmarking site Magnolia to restrict signups to OpenID users. According to the site, 75% of new accounts were being created at Magnolia by spammers using automated tools (our friends the 'spambots'). They say that by restricting access to OpenID users, the rate of spam-account creation decreased. In the discussion, there is a lot of talk about whether OpenID should be used to fight spam, and whether it could be an effective spam-fighting tool in the long term.

Here are my thoughts. Spammers can create OpenIDs too, and a single sign-on system might be many a spammer's wet dream. It gives them easy access to millions of sites in one fell swoop.

Now, OpenID by itself can't prevent spam. All it does is provide a globally unique identifier for any given user on the planet. This is where a tool like Mollom comes in. At Mollom we're already maintaining an internal reputation for each OpenID account we encounter while assessing submitted content. Combine an identity system (OpenID) with a reputation system (Mollom) and it becomes a lot easier to separate spam users from non-spam users. Simon Willison said it best: "a trust system requires identity first". A globally unique identifier combined with reputation tools gives us a powerful weapon to fight website spam. OpenID's attribute exchange might become Mollom's best friend ...
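
To make the "identity plus reputation" idea a bit more concrete, here is a minimal sketch in Python. It is purely illustrative and not how Mollom works internally: the class name `ReputationStore`, its methods, the smoothing formula and the example identifiers are all assumptions made up for this example.

```python
from collections import defaultdict


class ReputationStore:
    """Toy reputation tracker keyed by OpenID identifier.

    Hypothetical illustration only; this is not Mollom's API.
    """

    def __init__(self):
        # Maps an OpenID identifier to [ham_count, spam_count].
        self._counts = defaultdict(lambda: [0, 0])

    def record(self, openid, is_spam):
        """Remember the verdict for one submission by this identity."""
        self._counts[openid][1 if is_spam else 0] += 1

    def score(self, openid):
        """Estimate how likely the next submission is legitimate.

        Uses add-one (Laplace) smoothing, so an identity we have
        never seen scores a neutral 0.5.
        """
        ham, spam = self._counts[openid]
        return (ham + 1) / (ham + spam + 2)


store = ReputationStore()
for _ in range(3):
    store.record("https://alice.example/", is_spam=False)
store.record("https://bot.example/", is_spam=True)

print(store.score("https://alice.example/"))    # known good history
print(store.score("https://bot.example/"))      # known spam history
print(store.score("https://stranger.example/")) # never seen before
```

The point of the sketch is that the identifier alone decides nothing. It only gives the reputation system a stable key to attach history to.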

Similarly, Tim Berners-Lee is experimenting with combining FOAF ("friend of a friend") and OpenID to fight spam: you can only comment on Tim's blog if you are no more than a certain number of degrees of friendship away from him. Of course, it is a widely accepted theory that we are only six degrees away from everyone in the world so I do wonder how effective this would really be in the long run.
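
A "degrees of friendship" check boils down to a bounded breadth-first search over the friend graph. The sketch below assumes the FOAF data has already been loaded into a plain dictionary that maps each person to the people they list as friends. The function names, the sample graph and the default limit of two degrees are assumptions for illustration, not a description of what Tim actually runs.

```python
from collections import deque


def degrees_of_separation(friends, start, target, max_degrees):
    """Return the number of hops from start to target, or None if the
    target is not reachable within max_degrees hops.

    `friends` maps a person to an iterable of the people they list
    as friends (edges are followed in that direction only).
    """
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, dist = queue.popleft()
        if dist == max_degrees:
            continue  # do not expand beyond the allowed distance
        for friend in friends.get(person, ()):
            if friend == target:
                return dist + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, dist + 1))
    return None


def may_comment(friends, owner, commenter, max_degrees=2):
    """Allow a comment only if the commenter is close enough to the owner."""
    return degrees_of_separation(friends, owner, commenter, max_degrees) is not None


# Hypothetical friend graph: tim -> alice -> bob -> carol
graph = {"tim": ["alice"], "alice": ["bob"], "bob": ["carol"]}

print(may_comment(graph, "tim", "bob"))      # within two degrees
print(may_comment(graph, "tim", "carol"))    # three degrees away
print(may_comment(graph, "tim", "spambot"))  # not in the graph at all
```

The smaller the limit, the stricter the filter. Raising it towards six would, by the theory above, let in nearly everyone.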

It is still early days in these debates and experiments, but for now, Mollom can already protect your login and submission forms with an image or audio CAPTCHA.

Either way, it is an interesting discussion that makes you wonder. Where will OpenID be in 3 years? Where do you think the website spam problem will be in 3 years? How will this affect online communities?

I have my own thoughts and predictions and it was one of the principal reasons for co-founding Mollom ...

Comments

James Walker (not verified):

Dries - this post is *bang on*. I think, sadly, there is still a lot of misconception about what OpenID is - or more importantly what it *isn't*. But the fact is, Simon's comment - "a trust system requires identity first" - is the real point. Building the kind of trust networks and whitelisting systems that can really effectively nix spam (not to mention empower other interactions) requires identity. OpenID is a first step and a base layer to all of this.

I love that Mollom is on side. It says good things about the future of Mollom, IMO. I'm excited for it :)

April 03, 2008
Nick Vidal (not verified):

Hi Dries,

you can only comment on Tim's blog if you are no more than a certain number of degrees of friendship away from him

How about making that 1-degree? That would be 100% sure to be spam-free and the information that reaches you would be very personalized. But would this scenario limit the information that reaches you too much? Not necessarily, if the right architecture is in place.

The architecture I have in mind is ISS (Instant Syndicating Standards). I'm implementing it for Drupal using BuddyList and FeedAPI as a foundation.

It takes a while to understand how the architecture works, but it's worth studying it. I would love to hear your comments!

Best regards,
Nick Vidal

April 03, 2008
akahn (not verified):

"it is a widely accepted theory that we are only six degrees away from everyone in the world so I do wonder how effective this would really be in the long run."

This would be effective at least in stopping spam, since no legitimate reader/user on a site is 'friends' with a spambot. Real human users would only have other humans as their friends.

April 03, 2008
Matt Boehm (not verified):

Plus you could filter out the people who will inevitably add a spambot to their FOAF file as soon as the first instance of spam happens. Who let that guy in? Oh, it was Billy. Bye Billy. Bye Billy's friends. It might be a little heavy-handed, but if we're all related by 6 degrees, only the spambots and those that are their immediate friends will be left out, because the second-degree friends will inevitably be on someone else's FOAF file.

April 03, 2008
Kevin Fox (not verified):

Excellent post, I completely agree. OpenID (your identity) definitely needs a reputation component (among other things) to really start being useful.

In 3 years I expect OpenID will be under the hood of every website. With really nice tools and applications driving the technology, people won't even know they are using OpenID, much like POP3/SMTP today. My mom just gets her email...

Spammers will always be around, they have lots of time on their hands and greed is an excellent motivator for some people. Though things will definitely get more difficult for them. Same for griefers and trolls, they will always be around and it will be up to the online communities to deal with them, hopefully these tools will help them do that more efficiently.

Had not heard of Mollom before, am going to check it out.

April 03, 2008
Matt (not verified):

But the problem is there are numerous examples of very strong trust/identity systems, like the financial and credit card system, where there are still countless millions of frauds per year. Same for CAPTCHA: it's annoying to humans and has not proven a long-term barrier to spammers. Maintaining trust for an OpenID is not that much better or worse than maintaining it for an email address.

In the end you have to have behavior and content analysis for everything.

P.S. Just got - "We are sorry, but the spam filter on this site decided that your submission could be spam. Please fill in the CAPTCHA first." Is the CAPTCHA case-sensitive? I hate these things.

April 03, 2008
Dries:

I'm not sure I'm ready to compare "financial and credit card systems" with "comment and contact forms". The motivations and implications of attacking both are different.

That said, I agree that CAPTCHAs are annoying and that behavior and content analysis is preferred. However, there are plenty of situations where you can't apply behavior or content analysis (e.g. when a user creates an account or when a user resets his password because he can no longer log in).

Mollom tries to solve exactly that. We'll rely on behavior and content analysis when enough data is available and when we are confident in our decisions. We'll fall back to CAPTCHAs otherwise.
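
As a rough illustration of that fallback logic, here is a small Python sketch. It is not Mollom's actual code: the function name `decide`, the threshold values and the idea of passing `None` when there is nothing to analyze are all assumptions made for this example.

```python
def decide(spam_probability, low=0.1, high=0.9):
    """Pick an action for a submission.

    spam_probability is the classifier's estimate in [0, 1], or None
    when there is no content or behavior to analyze (for example an
    account registration form). The thresholds are illustrative only.
    """
    if spam_probability is None:
        return "captcha"  # nothing to analyze, so ask for a CAPTCHA
    if spam_probability <= low:
        return "accept"   # confident it is legitimate
    if spam_probability >= high:
        return "reject"   # confident it is spam
    return "captcha"      # unsure, so fall back to a CAPTCHA


for p in (0.02, 0.97, 0.5, None):
    print(p, "->", decide(p))
```

As the classifier gets more confident, fewer submissions land in the middle band, and fewer people see a CAPTCHA.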

The reason you were asked to fill out a CAPTCHA when trying to comment is exactly that. The system is still learning what is spam and what isn't. After a while, you should be able to comment without having to fill out CAPTCHAs. On average, only a fraction of the commenters need to fill out a CAPTCHA, and we expect that number to drop significantly.

It seems you are saying that everything should be solved by behavior and content analysis. Statistics will tell you that is prone to mistakes and errors.

CAPTCHAs are annoying but having to sort through hundreds of messages in your comment moderation queue to look for false positives is annoying too. In fact, it is the reason Defensio was created. Contrast this with Mollom where you don't need a moderation queue at all.

April 04, 2008
Carl (not verified):

I have to agree with Matt. Captcha has been proven mostly ineffective against spam, but VERY effective at irritating users. Sure, they'll reduce spam a bit, but they will not eliminate the problem.

However, I disagree with Matt when he says that OpenID is ineffective against spam. OpenID users should -not- be whitelisted, that's for sure; OpenID accounts are easier to create than email addresses.

The beauty of OpenID is that it gives us some useful information we wouldn't otherwise get. It's true that an anti-spam system can't make decisions just based on OpenID, but the additional info can surely tip it one way or another. It has been proven very effective for us at Defensio.

Btw, I think Magnolia's move is totally wrong. It won't solve their problem, and it might just make it worse in the long run.

PS: Me too, captcha. arg.

April 03, 2008
Chris Messina (not verified):

Hey Carl,

I might suggest you read Larry's post on the subject, as he'll tell you that they didn't make this change *only* to fight spam... it's just a fortunate benefit in the meantime until OpenID takes off more widely:

http://ma.gnolia.com/blog/2008/04/03/on-our-new-front-doors

April 04, 2008
Khürt Williams (not verified):

While I think that spammers would have a hard time using OpenID to create significant amounts of spam, I do think that adding reputation to OpenID is a great idea.

If I might use an analogy: in a town hall meeting one may have one or two people, hecklers, who are creating a lot of noise and disturbing the proceedings. We know who they are, and when they become a nuisance we can remove them from the building. If we knew about their reputation for causing trouble we would not have allowed them into the building in the first place.

Does that make sense?

April 04, 2008
Carl (not verified):

Chris: I really thought the only idea behind it was to combat spam. So I guess I was wrong in my statement. It's still an interesting move on their part, hope it works out for them. They do great things.

April 04, 2008
Dave (not verified):

i'm not so sure that i agree with any of this, though it all sounds wonderful.

in the real world, this analog physical thing that we deal with each day, when one opens up a storefront he or she allows for any person to enter - and while we reserve the right to refuse service to anybody, it is often not the 'terrifying looking' stranger who causes problems, it's the seemingly innocuous dude who accosts strangers or does something far worse...

and so how is the internet any different? sadly, people have come to believe that if they produce a content site (blog, drupal install or other), they reserve the right to implement draconian measures to keep out unwanted comments and contributions...

but that means different things to different people. if i'm on this very site right now hawking vitamins and mlm schemes, you'll catch my links and bar entry - just as you would stop me at the door of your store if i were carrying a ballpark vendor's tray around my neck trying to sell peanuts to your patrons...

...but what if i turn into that other kind of visitor? fuck this and that, hate slurs and so on, then where do you stand? will you one day move beyond mollom to implement "acceptable standards" for content contributions as yahoo and others have tried to do for years?

you're fighting an uphill battle. a mass ip registry with ip range blocks is likely to be the most effective long-term option, with a centralized store - similar to how society aggregates data about known offenders and makes such information available to a local community as well as enforcement agencies...but another captcha riddle-me-this approach? it will be reverse engineered and duped in no time - and because you're doing it in the open source community, it will be that much easier to find a back door...

either allow comments from only people you know (moderate all) or shut them off. if you're after the 'everybody contributes' model of every site out there, then think about more traditional approaches - eg wordpress' option to only allow a commenter to post if he/she has had a previously approved comment. that allows you to keep it all turned off, shut down and still for anybody to participate...

there's no easy answer. spam fighting tools are wonderful...but there's a reason why there are sooooo many, and there's a reason why there's another new one every month or two, with bold new claims, only to discover within months that it has been fooled and figured out...

April 06, 2008
sepeck (not verified):

What kind of an analogy is this?

and so how is the internet any different? sadly, people have come to believe that if they produce a content site (blog, drupal install or other), they reserve the right to implement draconian measures to keep out unwanted comments and contributions...

If I pay for a board and paint and make a sign and post it on my property, then I absolutely have a right to prevent others from defacing that property that I paid for and maintain. In local life, those rights include the ability to contact law enforcement and have them deal appropriately with vandals and other undesirable elements. I can also paint over their damage and destruction to my heart's content and obtain reimbursement for that damage in a court of law in many countries.

On the Internet, I have the right to determine what content is on my property and enforce the standards I find acceptable, whether they are consistent and published or whimsical and arbitrary.

Just because you can read something on someone's website doesn't give you the rights to do as you will on someone else's property.

April 12, 2008
Richard Marr (not verified):

Dries, and all,

I don't know if you guys are familiar with Doc Searls' VRM concept, but I think it could provide a compelling trust proposition to add to OpenID... specifically a proposition where people have control over their own data.

I've been thinking about it a lot over the past couple of weeks and it seems to me like a workable solution (in terms of privacy, use cases, etc), although I could easily have missed something. If you or your readers could give me some feedback I'd appreciate it.

Article here:
http://richmarr.wordpress.com/2008/03/31/can-vrm-answer-the-openid-trus…

Rich

April 07, 2008
Chris (not verified):

You guys are doing some interesting stuff with Mollom and I am keeping a close eye on how it advances.

One thing you mention in your post, is the image and audio CAPTCHA. One of the neatest modules I've seen for Drupal is the CAPTCHA one -- specifically how it allows you to completely customize the way it operates (I especially like how you could set the form to display a simple equation and then once the user has successfully entered the correct answer, they are no longer bothered with CAPTCHAs).

In my experience, CAPTCHA is a necessary evil (hindering the user experience) and Mollom looks like it could be the answer to fighting spam -- but it would be great to see it implemented in a way where it's stupidly simple to use and has the same kind of control as Drupal's CAPTCHA module.

Just my 5 cents :)

Cheers,
Chris

April 11, 2008
Mike (not verified):

I learned about OpenID on Monday, tried to read the documentation on Tuesday, found some applications on Wed, discovered they wouldn't work on my server, and modded them on Thu-Fri.

Worked out on Sat that it was largely a waste of time because it's not going to stop spam ... if anything, providing a single automated access point is going to increase spam!

Really, I still can't believe someone created a system with such a gaping hole in the concept.

The user is authenticated by having their key checked by the OpenID key server that created it ... but there is no way in the standard to authenticate the key provider. Unless I'd installed both a key server and lock without any need to authenticate either with an external site ... and used both on apparently good working sites.

It's the same as saying: "we are setting up an open-money system, whereby any shop can validate any money just by contacting the photocopy shop where it was produced".

... but then again, many shops produce gift vouchers!

The only way I can see me using OpenID on my site is to specify the key suppliers that I trust ... and micro$oft will not be one of them!

September 19, 2009
