# robots.txt - This file is retrieved by search engines before they crawl a
# site. It tells them which URLs they should not index.
# See http://www.robotxt.org
# Last modified: 08/25/2003 by John Legato
# Validated with the robots.txt validator at http://www.searchengineworld.com/cgi-bin/robotcheck.cgi
#
# The /files/ tree came from what used to be our anonymous ftp trees.
# Anonymous ftp contains its own passwd file. This passwd file was
# being retrieved by search engines as well as by people who searched for
# passwd and were led to our site. This set off CIT's alarms. Excluding the
# /files tree solves the problem.
User-agent: *
Disallow: /file
#
# /cddb/search.php is the cddb search engine. When called with no arguments
# by a search engine crawling the site, it generates error messages. Blocking
# search engines avoids this.
Disallow: /cdd
#
# The LKEM Proteomics Database should not be crawled. An .htaccess file is
# in place which will block this, but additional protection can't hurt.
Disallow: /lke
#
# Rules to block crawling of database admin pages.
#
# MRB Publications Database Admin functions - Should not be crawled.
Disallow: /pubs/admi
#
# rpdb Database Admin functions - Should not be crawled.
Disallow: /rpdb-new/maintenan