Deny access to website, but allow robots.txt
I had a problem where Googlebot was indexing a development site, so we locked it down using apache basic http auth. Now Googlebot was being served with a 401 when accessing the site, but because it had no stored robots.txt it was persistently trying to crawl the site.
Using the following allows anyone to access robots.txt but denies access to the rest of the site:
AuthName “Client Access”
Eventually Googlebot will get the hint and stop indexing the site and we can remove existing content using webmaster tools.