SEO Tips

Robots.txt

The robots.txt file is normally found in the root directory of a website. It is used to tell search engine spiders which directories they can and cannot go into. There are many reasons why you might not want a spider to crawl and index a page or a directory: you may have private images you do not want to share, or you may want to disallow spiders from the CGI bin if it holds confidential information or special scripts.
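For example, for a site hosted at www.example.com (a placeholder address), spiders will look for the file at the root of the site:

http://www.example.com/robots.txt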

The file itself is a simple plain text file, with one difference: it should be saved with Unix line endings, so Notepad or an HTML editor may not be good enough. There are many good text editors that can do this for you, and many FTP clients can convert the file into the correct format when uploading.

Disallowing a spider from crawling a part of the site requires two lines: one for the user agent, and a second to specify the file or directory it should not index.

User-agent: *
Disallow: /images/

This one stops all spiders from indexing the images folder. It applies to all spiders because the wildcard character (*) has been used as the user agent.

User-agent: Googlebot
Disallow: /cgi-bin/

This stops Googlebot from indexing the CGI bin.
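The rules can also be combined in a single file, with one group of lines per user agent. A sketch, reusing the same /images/ and /cgi-bin/ directories from the examples above:

# All spiders are kept out of the images folder.
User-agent: *
Disallow: /images/

# A spider follows only the group that names it, so the images
# rule is repeated for Googlebot alongside the CGI bin rule.
User-agent: Googlebot
Disallow: /images/
Disallow: /cgi-bin/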

User-agent: *
Disallow: /search_engines/
Disallow: /images.html

This one stops all spiders from crawling the search_engines directory and the images.html page.

There has been talk of an Allow line in the robots.txt file. It has yet to be made part of the standard, and it is unclear whether all of the search engines recognise it.

If you decide not to use the file, you can upload a blank robots.txt instead; this cuts down on the number of 404 error pages served to spiders looking for it.
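Where the Allow line is recognised, it is typically used to open a sub-path back up inside a disallowed directory. A sketch, using a made-up /images/public/ sub-folder:

User-agent: *
Disallow: /images/
Allow: /images/public/

An explicit equivalent of the blank file is an empty Disallow line, which places no restrictions on any spider:

User-agent: *
Disallow: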