del.icio.us Bans Search Engine Spiders

by Colin Cochrane 2/16/2008 10:06:00 PM

It appears that within the past 2-3 days the popular social bookmarking site del.icio.us has started blocking the major search engine spiders from crawling its site. This isn't a simple robots.txt exclusion, but rather a 404 response that is now being served based on the requesting User-Agent.

While I was doing some Photoshop work for one of my sites tonight, I needed to grab some custom shapes to make some icons. I recalled having bookmarked a good resource for custom shapes in del.icio.us, but after searching my bookmarks with my del.icio.us add-in for Firefox I couldn't find it, so I went to my profile page on del.icio.us to do a search. To my surprise, I was greeted with this:

[Screenshot: del.icio.us 404 error with the User-Agent set to Googlebot]

After confirming I hadn't mistyped the URL, I checked the del.icio.us homepage and found that it loaded fine. However, when I tried to perform a search I got the same 404 error, and I received the same response when navigating to any page other than the homepage.

At this point I suspected some server issues at del.icio.us, but that didn't square with my Firefox add-in still showing my bookmarks. I then noticed that my User-Agent switcher add-in was active (not sending the default User-Agent header), and remembered that I had set it to Googlebot earlier today while checking whether another site was cloaking (it was).

I reset the User-Agent switcher so that it sent my normal User-Agent header, tried my del.icio.us page again, and this time it no longer responded with a 404 error. Puzzled, I looked at del.icio.us' robots.txt and found that it disallowed the following paths for Googlebot, Slurp, Teoma, and msnbot:

Disallow: /inbox
Disallow: /subscriptions
Disallow: /network
Disallow: /search
Disallow: /post
Disallow: /login
Disallow: /rss
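Fed to Python's standard-library `urllib.robotparser`, those rules only exclude the listed paths. The sketch below assumes the four bots share a single `User-agent` record (the post gives the bots and the `Disallow` lines, not the file's exact layout), and `/someuser` is a made-up profile path. Under those assumptions a profile page is still crawlable for every listed bot, which is why a robots.txt exclusion alone doesn't account for the 404s.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction: the bots and Disallow lines come from the post,
# but grouping all four User-agent lines into one record is an assumption.
ROBOTS_TXT = """\
User-agent: Googlebot
User-agent: Slurp
User-agent: Teoma
User-agent: msnbot
Disallow: /inbox
Disallow: /subscriptions
Disallow: /network
Disallow: /search
Disallow: /post
Disallow: /login
Disallow: /rss
"""

BOTS = ["Googlebot", "Slurp", "Teoma", "msnbot"]


def robots_allows(agent: str, url: str) -> bool:
    """Return True if the reconstructed robots.txt lets agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # "/someuser" is a hypothetical profile page, not a real account.
    for path in ["/", "/someuser", "/search", "/rss/someuser"]:
        url = "http://del.icio.us" + path
        allowed = [bot for bot in BOTS if robots_allows(bot, url)]
        print(f"{path:15} allowed for: {', '.join(allowed) or 'none'}")
```

Run as a script, `/` and `/someuser` come out allowed for all four bots, while `/search` and `/rss/someuser` are allowed for none of them.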

Seeing that the robots.txt singled out these search engine spiders, I tried accessing del.icio.us with my User-Agent switcher set to each of the disallowed User-Agents and received the same 404 response for each one. Thinking that some obscure issue with the add-in might be causing this behaviour, I opened Fiddler, a handy HTTP debugging proxy that I use to inspect HTTP headers. Fiddler lets you build HTTP requests manually, so I created a simple set of request headers and made HEAD and GET requests using each of the User-Agents listed in the robots.txt. The responses were the same as before.
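The manual HEAD/GET check can be approximated in Python with the standard-library `http.client`. In this sketch, `probe` sends a HEAD and a GET for one path with each User-Agent and records the status codes. So that it runs without contacting del.icio.us, it is pointed at `UABlockingHandler`, a hypothetical local stub that mimics the behaviour observed here (404 for the crawler User-Agents on any page but the homepage). That stub logic is an assumption, not del.icio.us code, and the User-Agent values are simplified placeholders.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Simplified User-Agent values, like those picked in a User-Agent switcher.
# Real crawler User-Agent strings are longer.
AGENTS = {
    "browser": "Mozilla/5.0 (Windows; U; en-US) Firefox/2.0",
    "Googlebot": "Googlebot",
    "Slurp": "Slurp",
    "Teoma": "Teoma",
    "msnbot": "msnbot",
}
BOT_TOKENS = ("googlebot", "slurp", "teoma", "msnbot")


def probe(host: str, port: int, path: str, agents: dict[str, str]) -> dict[tuple[str, str], int]:
    """Send HEAD and GET for path once per User-Agent; return the status codes."""
    results: dict[tuple[str, str], int] = {}
    for method in ("HEAD", "GET"):
        for label, user_agent in agents.items():
            conn = http.client.HTTPConnection(host, port, timeout=10)
            try:
                conn.request(method, path, headers={"User-Agent": user_agent})
                response = conn.getresponse()
                response.read()  # drain any body before closing
                results[(method, label)] = response.status
            finally:
                conn.close()
    return results


class UABlockingHandler(BaseHTTPRequestHandler):
    """Hypothetical stub: 404 for crawler User-Agents on every path except '/'."""

    def _status(self) -> int:
        user_agent = self.headers.get("User-Agent", "").lower()
        is_bot = any(token in user_agent for token in BOT_TOKENS)
        return 404 if is_bot and self.path != "/" else 200

    def _reply(self, with_body: bool) -> None:
        status = self._status()
        body = b"Not Found" if status == 404 else b"OK"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        if with_body:
            self.wfile.write(body)

    def do_HEAD(self) -> None:
        self._reply(with_body=False)

    def do_GET(self) -> None:
        self._reply(with_body=True)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), UABlockingHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address[:2]
    for (method, label), status in probe(host, port, "/someuser", AGENTS).items():
        print(f"{method:4} {label:9} -> {status}")
    server.shutdown()
    server.server_close()
```

To test a live site instead, call `probe` with its host, port 80 and a path, e.g. `probe("del.icio.us", 80, "/yourusername", AGENTS)`, where the username is a placeholder.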

[Screenshot: HEAD request using the Googlebot User-Agent]

My interest was definitely piqued at this point. I ran a site command against del.icio.us in Google, restricted to the past 24 hours, and found results as fresh as 15 hours old.

[Screenshot: recent Google search results for a site command run against del.icio.us]

Running a normal site command on del.icio.us turned up numerous results with cached copies in Google, many of them as recent as three days ago.

This evidence suggests that del.icio.us has recently started blocking the major search engine spiders from crawling its site, based on the requesting User-Agent. Given the recent crawl and cache dates, this appears to have started within the past 2-3 days. This raises some questions as to the intentions of del.icio.us, and perhaps Yahoo! With Yahoo! recently integrating del.icio.us bookmarks into its search results, this could be an attempt to enhance the effectiveness of that new feature by preventing competing search engines from indexing content from del.icio.us. Yahoo!'s Slurp bot is also blocked, but since Yahoo! actually owns del.icio.us, it's unlikely that it would need to crawl the content of one of its own sites.

What are your thoughts on this?


Search | Yahoo

Comments

2/17/2008 6:02:27 AM

ltdraper

Wow. Makes me wonder if it's worth the trouble of bookmarking in del.icio.us if it's only going to be indexed by Yahoo. Do they have a search engine? :}

ltdraper us

2/17/2008 8:23:15 PM

Samsara

This raises some questions as to the intentions of del.icio.us, and perhaps Yahoo! With Yahoo! recently integrating del.icio.us bookmarks into its search results, this could be an attempt to enhance the effectiveness of that new feature by preventing competing search engines from indexing content from del.icio.us.

Looks like that's what it is. It's annoying to have to learn about this from an accidental discovery rather than from an announcement by del.icio.us or Yahoo. But I guess an announcement would have made them look petty.

Samsara us

2/18/2008 4:45:04 AM

pingback

Pingback from creativesuit.co.uk

Creativesuit blog » Blog Archive » del.icio.us has started blocking Google

creativesuit.co.uk

2/18/2008 6:10:14 AM

Andy

Del.icio.us are busy working on the roll out of their next big update (according to their blog, anyway) so hopefully this is a temporary measure. Some other functionality, like linkrolls, also seems to be affected.

I've always found their search engine to be pretty slow, so I often rely on Google if I need to do lots of searches in quick succession. I'll be disappointed if Google can no longer index it, and I'm likely to switch to ma.gnolia or Diigo.

Andy gb

2/18/2008 7:31:49 AM

Michael VanDeMar

Colin, I commented on Sphinn but wanted to comment here as well. That robots.txt hasn't changed since Dec 24th, and only blocks the unimportant stuff.

I think Sebastian nailed it when he said that it was most likely blocked by IP, to only get people who spoofed bot user agents.

Michael VanDeMar us

2/18/2008 7:43:30 AM

pingback

Pingback from searchenginejournal.com

Yahoo Blocking Bots from Spidering Delicious Bookmarks

searchenginejournal.com

2/18/2008 8:46:03 AM

Colin Cochrane

It still raises the question of why they would go to the effort to block people who were spoofing those User-Agents.

Colin Cochrane

2/18/2008 9:19:11 AM

Michael VanDeMar

Perhaps, but a) that is a very reasonable thing to do, no reason not to block people who are spoofing, and b) it's a completely different issue than them preventing other search engines from spidering/indexing their content.

Michael VanDeMar us

2/18/2008 10:13:40 AM

Sebastian

It makes sense to block requests with spoofed crawler UA because some scrapers "emulate" crawlers. If in a week or so we can't find crawler fetches after Feb/13 that's worth further investigation.

Sebastian us

2/18/2008 10:22:53 AM

Colin Cochrane

My thoughts exactly Sebastian. It will be interesting to see if any new fetches start appearing in the next few days.

Colin Cochrane

2/18/2008 1:31:16 PM

pingback

Pingback from inanchor.com

In Anchor » SearchCap: The Day In Search, February 18, 2008

inanchor.com

2/18/2008 1:40:25 PM

pingback

Pingback from seofinance.com.au

Seofinance, seo web finance, search engine optimization services blog » SearchCap: The Day In Search, February 18, 2008

seofinance.com.au

2/18/2008 3:01:13 PM

pingback

Pingback from metamend.com

del.icio.us Find - Metamend SEO Notes Weird Robots.txt Files » Search Engine Optimization Blog

metamend.com

2/18/2008 3:39:29 PM

pingback

Pingback from ditii.com

Yahoo! blocking other search bots from spidering Delicious bookmarks » D' Technology Weblog: Technology, Blogging, Tips, Tricks, Computer, Hardware, Software, Tutorials, Internet, Web, Gadgets, Fashion, LifeStyle, Entertainment, News and more by Deepak Gupta.

ditii.com

2/18/2008 4:34:28 PM

pingback

Pingback from grantwatson.net

links for 2008-02-19

grantwatson.net

2/19/2008 12:35:15 AM

pingback

Pingback from domainbusiness.cc

Yahoo Blocking Bots from Spidering Delicious Bookmarks | DomainBusiness

domainbusiness.cc

2/19/2008 2:21:29 PM

Dan Thies

Colin, they wouldn't necessarily be trying to block *people* who are spoofing those user agents, they'd be trying to block proxy servers that are delivering their content to real bots under the proxy's URLs. Some background:
www.seofaststart.com/blog/google-proxy-hacking

They used to "cloak" a robots meta tag - unless you were an actual validated spider, you got noindex, nofollow.

Dan Thies us
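Michael and Sebastian suggest the block keyed on IP as well as User-Agent, and Dan describes del.icio.us earlier serving noindex, nofollow to anyone who wasn't a validated spider. How del.icio.us validated crawlers isn't stated. A common technique, and the one Google documents for Googlebot, is a reverse DNS lookup on the requesting IP, a check that the hostname is under googlebot.com or google.com, and a forward lookup that must return the same IP. The Python sketch below illustrates that technique with the standard `socket` module; `robots_meta_for` is a hypothetical version of the policy Dan describes, and the resolvers are parameters so the logic can be tried without live DNS.

```python
import socket
from typing import Callable

# Domains accepted by Google's published Googlebot verification procedure.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host: str) -> list[str]:
    return socket.gethostbyname_ex(host)[2]


def is_verified_googlebot(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], list[str]] = forward_lookup,
) -> bool:
    """Reverse-resolve ip, check the domain, then confirm the name maps back to ip."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # no PTR record, so it cannot be verified
        return False
    if not host.endswith(GOOGLE_SUFFIXES):
        return False  # e.g. a spoofed User-Agent from some other network
    try:
        return ip in forward(host)  # guards against a forged PTR record
    except OSError:
        return False


def robots_meta_for(
    user_agent: str, ip: str, verify: Callable[[str], bool] = is_verified_googlebot
) -> str:
    """Hypothetical policy: only a validated crawler gets an indexable page;
    browsers, proxies and spoofers all get noindex, nofollow."""
    if "googlebot" in user_agent.lower() and verify(ip):
        return "index, follow"
    return "noindex, nofollow"
```

Checking the User-Agent first means the DNS lookups only happen for requests that claim to be Googlebot.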

2/19/2008 2:32:46 PM

Chrisitan

Hi There Navneet,
I just wanted to add my two cents about something I noticed over the weekend as well. I did some searching on Yahoo on Sunday (over the same time period) and noticed that Yahoo was inserting the number of delicious "bookmarks" into its search results. For example, when I searched for "business blogs", I noticed that some listings actually showed their number of delicious bookmarks next to them. So I think it's fair to say that they have been experimenting with feeding delicious bookmarks into their search results to offer more comprehensive results, much like Google's Universal Search. However, I noticed that our site "iBlogbusiness.com" (which has several delicious bookmarks) did not have the little icon next to its listing. So I'm not sure how they are deciding which results get the bookmark icon at the moment, but it will be interesting to see what Yahoo decides to roll out. :)

Best Regards,

Christian

Chrisitan us

2/19/2008 5:54:03 PM

Colin Cochrane

Thanks for the feedback Dan. That link brought me up to speed on that angle of things.

Colin Cochrane ca

2/20/2008 3:50:24 AM

pingback

Pingback from blog.exaspring.com

..:: ExaSpring’s Blog ::.. Web Hosting, Web Designing and SEO Blog ::.. » Blog Archive » SearchCap: The Day In Search, February 18, 2008

blog.exaspring.com

2/21/2008 7:21:46 AM

pingback

Pingback from impnerd.com

Old Stuff, New Stuff, Internet Stuff

impnerd.com

2/21/2008 10:16:32 AM

pingback

Pingback from solditblogger.com

Crazy Guide to the Internet » Blog Archive » Yahoo Blocking Bots from Spidering Delicious Bookmarks

solditblogger.com

3/9/2008 5:50:17 PM

Samsara

...so does this mean delicious is *allowing* the indexing of users' bookmarks or not? I haven't been keeping up. A friend told me his "people who link to him" count went down, but I didn't bother checking his stats.

Samsara us

3/21/2008 12:33:55 AM

pingback

Pingback from kossatsch.wordpress.com

Links - Mar 20 « roxomatic links

kossatsch.wordpress.com

3/22/2008 2:22:17 AM

pingback

Pingback from lieblinks.wordpress.com

Del.icio.us bans Google « lieblinks

lieblinks.wordpress.com



All Content and Intellectual Property is under Copyright Protection | Colin Cochrane ©2007

About the author

Colin Cochrane


SEO and ASP.NET Developer.


Disclaimer

This is a personal weblog. The opinions expressed here represent my own and not those of my employer. © Copyright Colin Cochrane 2008
