Colin Cochrane

Colin Cochrane is a Software Developer based in Victoria, BC specializing in C#, PowerShell, Web Development and DevOps. Bans Search Engine Spiders

It appears that within the past 2-3 days the popular social book-marking site has started blocking the major search engine spiders from crawling their site.  This isn't a simple robots.txt exclusion, but rather a 404 response that is now being served based on the requesting User-Agent.

While I was doing some Photoshop work for a site of mine tonight I needed to grab some custom shapes to use to make some icons.  I recalled having bookmarked a good resource for custom shapes in, but after searching my bookmarks using my add-in for Firefix, I couldn't find it, so I pulled up my browser and went to my profile page on to do a search.  To my surprise, I was greeted with this: 404 Errors User Agent set to Googlebot

After confirming I hadn't mistyped the URL, I checked out the homepage and found that all was fine there.  However, upon trying to perform a search, I was confronted with the same 404 error, and received the same response when trying to navigate to any page other than the homepage. 

At this point I was thinking that there might have been some server issues going on with, but that didn't line up with my Firefox add-in still showing my bookmarks.  I then noticed that my User-Agent switcher add-in was active (not sending the default User-Agent header), and remembered that I had set it to switch my User-Agent to Googlebot earlier because I was checking another site earlier today to see if it was cloaking (it was). 

I reset the User-Agent switcher so it was sending my normal User-Agent header and tried accessing my page again and I was surprised to see that it was no longer responding with a 404 error.  Puzzled by this, I took a look at' robots.txt and found that it was disallowing Googlebot, Slurp, Teoma, and msnbot for the following:

Disallow: /inbox
Disallow: /subscriptions
Disallow: /network
Disallow: /search
Disallow: /post
Disallow: /login
Disallow: /rss

Seeing that the robots.txt was blocking these search engine spiders, I tried accessing with my User-Agent switcher set to each of the disallowed User-Agents and received the same 404 response for each one.  I thought that there might have been some obscure issue with the add-in that was leading to this behaviour, so I popped open Fiddler, a nifty HTTP debugging proxy that I use to sniff HTTP headers.  Fiddler has a convenient feature that allows you to create HTTP requests manually, so I created a simple set of request headers and made HEAD and GET requests using the different User-Agents listed in the robots.txt.  I received the same responses as before.

HEAD Request using Googlebot User-Agent

My interest was definitely piqued at this point.  I ran a site command against in Google restricted to the past 24 hours and found results as fresh as 15 hours old.

Recent Google Search Results for a site command ran against

Running a normal site command on revealed numerous results that Google had a cached version of, many of which were as fresh as only three days ago.

This evidence seems to be indicating that has recently started blocking the major search engine spiders from crawling their site, by way of the requesting User-Agent.  Given the recent crawl dates and cache dates, it looks like this started happening within the past 2-3 days.  This raises some questions as to the intentions of, and perhaps Yahoo!  With Yahoo! recently integrating bookmarks into its search results this could an attempt to enhance the effectiveness of that new feature by preventing competing search engines from indexing content from  While Yahoo!'s Slurp bot is also blocked, it's unlikely that Yahoo! would need to crawl the content of one of its own sites, as Yahoo! actually owns

What are your thoughts on this?