Monthly Archives: February 2011

Remove site from Google

We found that Google had indexed a site that shouldn’t have been indexed, so I set up a robots.txt file to deny all crawlers and locked the site down with HTTP basic auth. I also put in a request to have the URLs removed from the index and cache.
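For reference, the robots.txt that tells all well-behaved crawlers to stay away is just two lines:

User-agent: *
Disallow: /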

When I did this, a “site:www.site.com” search returned ~2,400 results. A few days later it was returning ~54,000. Today it is returning ~133,000.

I’m not sure how Google managed to mix up “remove my site” with “index it more”. Maybe this is just part of the removal process?

Update: Google is now up to 217,000 results for this site. Maybe removing your site from the index is good for SEO?

Deny access to website, but allow robots.txt

I had a problem where Googlebot was indexing a development site, so we locked it down using Apache basic HTTP auth. Googlebot was then being served a 401 when accessing the site, but because it couldn’t retrieve a robots.txt telling it to stop, it kept persistently trying to crawl the site.

The following configuration allows anyone to access robots.txt but denies access to the rest of the site:
<Directory "/home/username/www">
    AuthUserFile /home/username/.htpasswd
    AuthName "Client Access"
    AuthType Basic
    Require valid-user

    # Satisfy Any means a request passes if it meets either the
    # host-based access rules (which allow everyone by default)
    # or the auth requirement, so robots.txt stays public.
    <Files "robots.txt">
        Satisfy Any
    </Files>
</Directory>
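To sanity-check the setup you can use curl’s -I flag (a HEAD request that prints only the response headers) against the placeholder hostname from above; robots.txt should come back 200 while everything else gets the auth challenge:

curl -I http://www.site.com/robots.txt   # expect: HTTP/1.1 200 OK
curl -I http://www.site.com/             # expect: HTTP/1.1 401 Authorization Required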

Eventually Googlebot will get the hint and stop indexing the site, and we can remove the existing content using Webmaster Tools.