Robots Exclusion Protocol

The robots.txt convention uses a plain-text file at the root of your site to list the pages and content that you would prefer search engine web crawlers not to crawl and index. It depends on the cooperation of the crawler. Google, Bing, Yahoo and most major search engines observe the exclusion protocol.
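To make the idea concrete, here is a minimal sketch of how a cooperating crawler might consult the file before fetching a page, using the urllib.robotparser module from Python's standard library. The rules, the "ExampleBot" name and the example.com URLs are assumptions made up for illustration, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /complaints/
Disallow: /drafts/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" is a made-up crawler name.
print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://www.example.com/complaints/2023"))
print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://www.example.com/products/"))
```

The first check should come back False and the second True, because only paths starting with /complaints/ or /drafts/ are excluded. Nothing forces a crawler to run a check like this, which is why the protocol relies on cooperation.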

It stands in contrast to a domain’s Sitemap, which could be termed the Inclusion Protocol. Now that you know what it is, you may wonder why you would have content on your site that you don’t want indexed.

Control Over Website Image

If parts of your website have recently started showing errors, or perhaps reflect poorly on your domain, excluding them is better than presenting poor-quality content. Examples of such content could be broken Flash animations or a page that is dedicated solely to responding to customer complaints.

This isn’t to say you don’t want people finding the page on your domain, just that you don’t want it presented as one of the primary functions of the website.

Offer Diverse Content While Stopping Theme Bleeding

If you have content on your website that you believe is contributing value to visitors but that you think web-crawlers may consider to be inconsistent with the thematic elements of the website (theme bleeding), robots exclusion will let you keep that content, while still optimising for rankings.

Make sure that the content isn’t too valuable though, as you may be missing out on some effective keyword opportunities just because you’re afraid indexing it may affect your rankings negatively.

Stop Rich Content Theft

This is an important factor for websites that display pictures and videos that have inherent proprietary value or are copyrighted.

You can have a function on your website that stops people from saving these pieces of content. Some websites just stop you from right-clicking (an easily circumvented tactic), while others apply watermarks so that the owner is permanently identified. But if the files are indexed, users can still save them from the search engine results page.

Note: With enough expertise anyone will be able to steal the image, but this may deter opportunists or people who are less au fait with technology.
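As a rough sketch of keeping rich content out of image search, the rules below exclude a gallery directory for an image crawler only. Googlebot-Image is the user-agent token Google uses for its image crawler. The /gallery/ path, the file names and the "ExampleBot" name are hypothetical. The check again uses Python's standard urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: only the image crawler is kept out of /gallery/.
IMAGE_RULES = """\
User-agent: Googlebot-Image
Disallow: /gallery/
"""


def may_crawl(robots_txt, user_agent, path):
    """Return True if the given rules permit user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


print(may_crawl(IMAGE_RULES, "Googlebot-Image", "/gallery/sunset.jpg"))
print(may_crawl(IMAGE_RULES, "Googlebot-Image", "/blog/post.html"))
# A crawler with no matching rule group is allowed by default.
print(may_crawl(IMAGE_RULES, "ExampleBot", "/gallery/sunset.jpg"))
```

Under these assumed rules, the image crawler is refused the gallery file but may still fetch the blog page, and any crawler not named in the file is unaffected.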

Dangers of the Robots.txt File

• The exclusion standard is purely advisory, meaning the content might still get indexed.
• If the file excludes too many pages, search engines may suspect you of cloaking.
• It can show malicious bots exactly what to strip from your site.
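
The last point follows from the file being publicly readable. As a minimal sketch, assuming made-up rules and paths, a few lines of Python are enough to list every path a site has asked crawlers to avoid:

```python
from typing import List

# Hypothetical robots.txt contents; the paths are illustrative only.
SAMPLE = """\
User-agent: *
Disallow: /private-gallery/   # proprietary images
Disallow: /customer-complaints/
Allow: /
"""


def disallowed_paths(robots_txt: str) -> List[str]:
    """Collect the value of every non-empty Disallow line."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


print(disallowed_paths(SAMPLE))
```

For the sample above this prints the two excluded directories, which is exactly the list a badly behaved bot would go looking for.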