
Deny access to website, but allow robots.txt

February 15th, 2011

I had a problem where Googlebot was indexing a development site, so we locked it down using Apache HTTP Basic authentication. Googlebot was then served a 401 whenever it accessed the site, but because it had no stored robots.txt, it kept trying to crawl the site.

The following configuration allows anyone to access robots.txt but denies access to the rest of the site:
<Directory "/home/username/www">
AuthUserFile /home/username/.htpasswd
AuthName "Client Access"
AuthType Basic
require valid-user

<Files "robots.txt">
AuthType Basic
satisfy any
</Files>
</Directory>

Eventually Googlebot will get the hint and stop indexing the site, and we can then remove the existing content using Google Webmaster Tools.
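
The Satisfy directive used above belongs to the Apache 2.2 access-control model; on Apache 2.4 it is only available through mod_access_compat. A rough equivalent for 2.4 might look like the sketch below. It assumes the same paths as above and that no other access rules apply to the directory, and I have not tested it against this particular setup:

```apache
<Directory "/home/username/www">
    AuthType Basic
    AuthName "Client Access"
    AuthUserFile /home/username/.htpasswd
    # Everything in the directory needs a valid login...
    Require valid-user

    # ...except robots.txt, which anyone may fetch without credentials.
    <Files "robots.txt">
        Require all granted
    </Files>
</Directory>
```

The idea is the same as before: the inner Files section overrides the directory-wide login requirement for that one file only.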
