
Robots.txt, MediaWiki and Google Sitemap

October 13, 2005

Posted by Andy Roberts in: learning, internet

I used to have my ukcider MediaWiki excluded from most search engines with a robots.txt file that looked like this:

User-agent: *
Disallow: /wiki/

but then I decided to have another go at letting Googlebot index some of the really useful content that has been building up there recently, so I removed the robots.txt file for a few days and monitored the results carefully.
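To see exactly what the original rule did, here is a small sketch using Python's standard-library urllib.robotparser. The host example.org and the sample paths are placeholders, not the real ukcider URLs. Disallow values are matched as path prefixes, so this rule blocks everything under /wiki/ for every crawler while leaving the rest of the site open.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the post: keep every crawler out of /wiki/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wiki/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical paths on a site laid out like the one in the post.
for path in ["/wiki/index.php?title=Main_Page", "/wiki/", "/index.html"]:
    allowed = parser.can_fetch("Googlebot", "http://example.org" + path)
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
```

Removing the file, as described above, simply means every one of those checks comes back as allowed.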

What appears to be happening is that Googlebot visits about once a day and spiders a little further into the wiki each time, using an ever-increasing amount of bandwidth as it does so, which is not good. So the list of French cider producers can already be found through a search, but the Asturian campsites are not searchable yet.

My own web stats and research told me that Googlebot can get caught up in a wiki, spidering all the previous versions, page histories, user contributions and so on; if you are paying for remote hosting, this needs to be avoided. So rather than disallowing /wiki/, I've disallowed “oldid” and “contributions” for now. Maybe I'll tweak it a bit later, or go fishing for the definitive robots.txt configuration for MediaWiki without pretty URLs.

Meanwhile, on my travels I came across a reference to Google Sitemaps, which should let me tame the over-eager Googlebot some more. I've included data stating that the site is updated weekly, which should help towards my goal: having deep-linked pages listed in search results without all the bandwidth being used up by spiders.
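A note on matching: classic robots.txt rules compare the Disallow value against the start of the URL path, so a rule has to begin with / to match anything. With non-pretty MediaWiki URLs, old revisions and page histories show up in the query string (for example index.php?title=Page&oldid=123 or &action=history), which a plain prefix cannot reach. Googlebot supports a * wildcard for this (later written into RFC 9309), so a rule like Disallow: /*oldid= catches old revisions wherever they appear. The sketch below is a simplified matcher for that syntax, assuming * means "any run of characters" and a trailing $ means "end of URL". It ignores Allow lines and rule precedence, and the rules and paths in it are hypothetical.

```python
import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex.

    '*' matches any run of characters and a trailing '$' anchors the
    end of the URL; everything else is matched literally as a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path: str, disallow_rules: list[str]) -> bool:
    """True if any Disallow pattern matches from the start of path+query."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)


# Hypothetical rules for a MediaWiki install without pretty URLs.
RULES = [
    "/*oldid=",                                     # old revisions
    "/*action=history",                             # page histories
    "/wiki/index.php?title=Special:Contributions",  # user contributions
]

for path in [
    "/wiki/index.php?title=Main_Page",
    "/wiki/index.php?title=Main_Page&oldid=1234",
    "/wiki/index.php?title=Main_Page&action=history",
    "/wiki/index.php?title=Special:Contributions/Andy",
]:
    print(path, "->", "blocked" if is_blocked(path, RULES) else "allowed")
```

Crawlers that only understand plain prefixes would treat /*oldid= as a literal path, so the wildcard rules are best placed in a User-agent group for crawlers known to support them.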
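For the sitemap side, here is a minimal Python sketch that writes a sitemap giving each URL a changefreq of weekly. It uses the later sitemaps.org 0.9 namespace rather than the http://www.google.com/schemas/sitemap/0.84 namespace Google used in 2005, and the URLs are placeholders. Bear in mind that changefreq is only a hint to crawlers, not a schedule they are obliged to follow.

```python
import xml.etree.ElementTree as ET

# Namespace of the sitemaps.org protocol, version 0.9.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str], changefreq: str = "weekly") -> str:
    """Return sitemap XML listing each URL with the given change frequency."""
    ET.register_namespace("", SITEMAP_NS)  # emit xmlns="..." without a prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc in urls:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}changefreq").text = changefreq
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical page URLs; a real sitemap would list the wiki's articles.
print(build_sitemap([
    "http://example.org/wiki/index.php?title=Main_Page",
    "http://example.org/wiki/index.php?title=French_Cider_Producers",
]))
```

ElementTree also escapes any & characters in the URLs as &amp;, which the sitemap format requires and which non-pretty MediaWiki URLs often contain.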

Googlebot is not the only search engine spider; there are many others (such as the enigmatically named “Inktomi Slurp”). It's just that Googlebot is probably the most important and also the most resource-hungry.
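Rules can also be set per crawler. The sketch below, again using Python's urllib.robotparser, gives Googlebot and Slurp (the robots.txt name of the Inktomi crawler, which later became Yahoo's) their own groups and shuts every other spider out of /wiki/. The Crawl-delay line is a non-standard extension that some crawlers, Slurp among them, have honored; Googlebot ignores it. The file contents and URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: named groups for two crawlers, and a
# catch-all group that keeps everyone else out of the wiki.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /wiki/index.php?title=Special:

User-agent: Slurp
Crawl-delay: 10
Disallow: /wiki/index.php?title=Special:

User-agent: *
Disallow: /wiki/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

page = "http://example.org/wiki/index.php?title=French_Cider_Producers"
for agent in ["Googlebot", "Slurp", "SomeOtherBot"]:
    print(agent, parser.can_fetch(agent, page), parser.crawl_delay(agent))
```

robotparser matches on the first product token of the name it is given, so pass the crawler's name (Slurp) rather than its full User-Agent header.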
