What is the robots.txt?
For those of you who don’t know what robots.txt is: it is a protocol designed to limit which areas of a site can be crawled by spiders, and to understand it completely you should have a rough idea of how spiders work. Used correctly, robots.txt optimisation carries major benefits. Forums, for example, regularly use it in general chatter sections when they do not want members’ off-topic discussions to confuse a search engine’s assessment of the site’s theme. Think of it as a nofollow tag for content, be it images, banners or entire pages.
How do I use robots.txt optimisation?
Well, it is a relatively simple piece of text. For example, if you wanted to prevent Googlebot from crawling the graphics on your site, the code you would place in the robots.txt file would be:
User-agent: Googlebot
Disallow: /graphics/
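If you wanted the same rule to apply to every well-behaved crawler rather than just Googlebot, the wildcard user agent does that (the /graphics/ path is just the illustrative directory from the example above):
User-agent: *
Disallow: /graphics/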
However, most of this can also be done using meta tags if for some reason you cannot upload a robots.txt file.
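As a rough sketch, the robots meta tag goes in the head section of the individual page you want to control; noindex and nofollow are the standard directives for keeping a page out of the index and telling spiders not to follow its links:
&lt;meta name="robots" content="noindex, nofollow"&gt;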
Another key use of the protocol is preventing on-site duplicate content penalties. For example, if you have two pages with similar or identical content, you can use the robots.txt file to tell the spiders to ignore the copy you do not wish to be the authoritative piece. This is very helpful when you have a large site in a competitive or highly plagiarised field.
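As a sketch of that, suppose printer-friendly copies of your articles duplicate the main pages; the /print/ directory here is invented purely for illustration. Blocking that directory steers spiders towards the authoritative versions:
User-agent: *
Disallow: /print/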
Limitations of the robots.txt
The robots.txt file is quite limited. For example, a search engine may still index a page that the robots.txt file is telling it not to crawl if another site links to that page. A more concerning issue is that the robots.txt file only affects “well behaved” robots; it will not stop a competitor from analysing or attacking your site.
How do I create a robots.txt file?
Robots.txt files are pretty easy to make: there are numerous online tools to help you generate them, or you can write one yourself in a text editor and upload it to the root directory of your site, so that crawlers can fetch it at yoursite.com/robots.txt.
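To tie the pieces together, here is a minimal sketch of a complete robots.txt, reusing the illustrative /graphics/ and /print/ directories from the earlier examples. Each User-agent line starts a new group of rules, and a blank line separates the groups:
User-agent: Googlebot
Disallow: /graphics/

User-agent: *
Disallow: /print/
Save it as robots.txt in plain text and place it at the top level of your site; spiders will not look for it anywhere else.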