
Wiki Spam

December 13, 2005

Posted by Andy Roberts in: Wiki

Wiki spam looks like a war of attrition, much like the contest between virus writers and anti-virus vendors. Big projects like Wikipedia can keep the problem under control through the sheer number of contributors, admins and developers. Obscure personal or small-group wikis can survive by escaping the attention of the worst spam bots. But there is a middle range where regular spam attacks can cause people to give up and abandon their wiki altogether.

Even Ward Cunningham (creator of the first wiki) became dispirited:

Less than a year ago, my site, which had been growing exponentially, got to the point where things were happening on the site, spam. Where a dedicated set of volunteers had been defending the site, the posting of spam links put a load on them that wore them out.

posted by Ross Mayfield on Corante

Last summer, a sustained spambot attack on the ukcider wiki while I was away caused some contributors to question whether an alternative method should be found. Some were in favour of enforced logins, moderated editing or other such restrictions, but in the end a technical solution was found: blocking the range of IPs operated by the one particularly destructive bot. I argued that disallowing anonymous edits would mean losing valuable passer-by contributions, and that the bots would simply develop the capability to create logins, as is now starting to happen.

Another ploy is to wrap their inserts in a ‘div’ element sized to one pixel, thus creating “invisible spam”, presumably in the hope that it won’t be noticed and reverted. But it still shows up in Recent Changes, and it doesn’t do their search ranking any good anyway, since the MediaWiki software marks external links as “nofollow” by default.

I can only guess that somebody somewhere manages to get paid for making it look as if they have publicised a load of sites even where it’s completely ineffective.

The other new development is the ability to switch IP addresses, not just within a range, but almost at random. That makes IP blocking pointless.

So the next step from the anti-spam side will be either to create obstacles to posting, such as requiring login and then using a captcha test to tell humans and robots apart, or to use a common blacklist of known spammers’ links.


8 Comments

Comment by Caspar
2005-12-14 01:19:24

Damn those spammers! These past few days I’ve seen an influx of them and I’m not even mid-range size-wise. I even had a ‘smart’ one registering first.

Somebody I know has made his wiki ‘registered-only’ to make it more maintenance-free. That’s one of the options I’m thinking of, if this lasts. Personally I don’t see that as a big obstacle, as most of today’s services require you to register first, combined with e-mail verification. Think about fora, Google Groups, etc. This hasn’t hindered ‘anonymous’, ‘cloned’ or ‘cloaked’ contributions though.

Besides obligatory registration there must be other ways of counter-attacking the automated spambots. Captcha being one.

 
Comment by Andy
2005-12-14 10:28:16

Wikipedia has gone registered users only for creating new pages now, but that’s because of vexatious complainants, not spammers. I’m going to resist that route, and avoid captcha as well if possible, because I wish to present as little obstacle as possible to the spontaneously motivated passer-by contributor.
So I think the blacklist of spammers’ links is the way to go, even though it will rapidly grow quite large (as in Movable Type). I just hope there isn’t a response time penalty.

http://meta.wikimedia.org/wiki/SpamBlacklist_extension
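
To make the blacklist idea concrete, here is a minimal Python sketch of checking the links in an edit against a list of spam domains. It is only an illustration and not how the SpamBlacklist extension itself is implemented; the function name `blacklisted_links` and the `.example` domains are made up for the example. Because the domains sit in a set, each lookup costs about the same however long the list grows, which is one way a large blacklist could avoid a response time penalty.

```python
import re
from urllib.parse import urlparse

# Rough pattern for external links in wikitext; deliberately simple.
URL_PATTERN = re.compile(r"https?://[^\s\]<>\"']+")


def blacklisted_links(wikitext, blacklist):
    """Return the URLs in wikitext whose host, or a parent domain, is blacklisted."""
    hits = []
    for url in URL_PATTERN.findall(wikitext):
        try:
            host = (urlparse(url).hostname or "").lower()
        except ValueError:
            # Malformed URL: skip it rather than reject the whole edit.
            continue
        parts = host.split(".")
        # Check the full host first, then each parent domain
        # (www.spam.example, then spam.example).
        for i in range(max(1, len(parts) - 1)):
            if ".".join(parts[i:]) in blacklist:
                hits.append(url)
                break
    return hits


if __name__ == "__main__":
    blacklist = {"cheap-pills.example"}  # hypothetical entry
    edit = "See [http://www.cheap-pills.example/buy pills] and http://cider.example/wiki"
    print(blacklisted_links(edit, blacklist))
```

An edit would then be rejected whenever the returned list is non-empty.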

Also worth considering:

http://meta.wikimedia.org/wiki/Proxy_blocking

 
Comment by Frankie Roberto
2005-12-14 12:31:23

The wiki I run has had sustained spam attacks too. However, this trick of adding the spam in an invisible div means that most of the spam is ‘non-destructive’, as casual surfers don’t see it, and the rel="nofollow" means that the spammers don’t get any Google credit for it. It does make Recent Changes harder to read and follow though.

I’m not a big fan of captchas, but I’m considering limiting new page creation to registered users, as reverting this spam (by deleting the page) can only be done by sysadmins…

Another wiki I know has gone into complete lock-down mode by only allowing registered users to edit, and with registrations requiring a forum posting history and human intervention. For this particular community it’s not too damaging as it’s quite a narrow interest topic anyway, but it’s a shame to have to go down this route…

Frankie

 
Comment by Frankie Roberto
2005-12-14 12:39:11

P.S. Can you document how to block an IP range?

 
Comment by Andy
2005-12-14 13:02:32

About Recent Changes: it may be worth deleting the revert and delete log entries from Recent Changes, but I haven’t done that since the database schema changed with version 1.5.

I documented my own success with IP range blocking here:

http://blog.ultralab.net/~blogger/andy/archives/001538.html

The extent of the range is defined as a number of binary places (bits), which you need to be careful about; see

http://meta.wikimedia.org/wiki/Range_blocks
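
As a rough illustration of why the number of binary places matters, the following Python sketch uses the standard `ipaddress` module to show how many addresses a given range covers. The helper names `describe_range` and `is_blocked` are made up for the example, and the addresses come from the reserved documentation block 192.0.2.0/24 rather than from any real spammer.

```python
import ipaddress
from typing import Tuple


def describe_range(cidr: str) -> Tuple[str, str, int]:
    """Return (first address, last address, number of addresses) for a CIDR range."""
    net = ipaddress.ip_network(cidr, strict=False)
    return str(net[0]), str(net[-1]), net.num_addresses


def is_blocked(ip: str, cidr: str) -> bool:
    """True if the address falls inside the blocked range."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr, strict=False)


if __name__ == "__main__":
    # Each bit removed from the prefix doubles the number of addresses blocked.
    for cidr in ("192.0.2.0/28", "192.0.2.0/24"):
        print(cidr, describe_range(cidr))
    print(is_blocked("192.0.2.77", "192.0.2.0/24"))
```

A /28 covers 16 addresses and a /24 covers 256, so getting the prefix wrong by a few bits can block far more (or far fewer) visitors than intended.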

 
Comment by Caspar
2005-12-15 02:00:48

Currently I’m going down the ‘Bad Behavior’ route as found on http://meta.wikimedia.org/wiki/Spam
It supposedly traps automated spambots.

I have no clue as to its effectiveness, because I just installed it. But I’ll let you know if it works. I’ll be keeping an eye on my ‘special stats’ these upcoming days… ;-) And keep on cleaning up afterwards. :-(

@Frankie: Blocking IP ranges is like shooting with hail. You’re punishing an entire block because one (zombie) client or two behaved badly. IMHO it’s not a good definitive solution. With IP spoofing, the perpetrator may be on an entirely different IP block than the one you are blocking.

As for letting the ‘invisible’ spam stay: although the spammer isn’t credited for his actions on Google, I think MSN and Yahoo! are still ignoring the rel="nofollow". Besides that, Google will index the spam keywords (try a search on your own domain). That leads to more spam, because vulnerable wikis are harvested from Google. (A couple of days ago I saw a referrer from Google with specific medications leading to my wiki.)

 
Comment by Andy
2005-12-15 11:10:59

‘Bad Behavior’ looks like it might tip the balance in your favour for quite a while. I hope so.

For a quick fix, I’ve gone for adding a regular expression match for the invisibles into LocalSettings.php:
$wgSpamRegex = "/overflow:\s*auto/";

A tip I read on the mediawiki-l mailing list.
It worked overnight, anyway.
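
To see what that pattern catches, here is a small Python sketch that applies the same regular expression to some sample text. The function name and the sample strings are invented for illustration, and Python’s `re` module only approximates how MediaWiki applies `$wgSpamRegex` to an edit.

```python
import re

# Same pattern as the $wgSpamRegex line, without the PHP delimiters.
SPAM_REGEX = re.compile(r"overflow:\s*auto")


def looks_like_invisible_spam(wikitext: str) -> bool:
    """True if the text contains the CSS declaration the pattern targets."""
    return SPAM_REGEX.search(wikitext) is not None


if __name__ == "__main__":
    # Hypothetical spam edit in the style described in the post.
    spam = '<div style="overflow: auto; height: 1px;">[http://spam.example pills]</div>'
    normal = "A note about pressing cider apples."
    print(looks_like_invisible_spam(spam))
    print(looks_like_invisible_spam(normal))
```

Note that the pattern as written is case-sensitive, so a bot that wrote the style in upper case would slip past it; adding the `i` modifier after the closing slash in the PHP pattern would be one way to tighten it.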

 
Comment by Caspar
2005-12-16 01:08:50

First day impression:
No wiki-spam, no blog-spam and no referrer-spam to be taken care of. The latter being a surprise. All attempts have been logged and whisked off with a ‘403’.
Will it pay off in the long run? Can’t say. And I can’t say if it blocked legitimate traffic as well.
So far, so good.

@Andy: I’m keeping that piece of spam regex in mind.

 
