Bring Us Your Toughest (SEO) Search Engine Optimization Challenges

Advanced Use of Robots.txt

Permalink 12/22/05 13:04, by darren, Categories: SEO

Link: http://www.kvcindia.com/blog

Advanced Robots.txt Commands and Features

While a basic robots.txt file is built from just two types of information (which robot a rule applies to, and what that robot may not crawl), there are some more advanced commands and features that can be used. I should let you know, however, that not all search engine spiders understand these commands. It’s important to know which ones do and which do not.

Crawl Delay

Some robots have been known to crawl web pages at lightning speed, forcing webmasters to ban the robots’ IP addresses or disallow them from crawling their websites. Some web servers have automatic flood triggers implemented, with automatic IP-banning software in place. If a search engine spider crawls too quickly, it can trigger these IP bans, blocking the subsequent crawling activities of the search engine. While some of these robots deserve a ban, there are others that you most likely do not want banned.

Instead of the following example, which bans the robot from crawling any of your pages, another solution was offered to this problem: the crawl delay command.

User-agent: MSNbot
Disallow: /
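
To see what this blanket ban means to a well-behaved crawler, the rule can be checked with Python's standard urllib.robotparser module. This is only a rough sketch: the example.com URL and the helper name is_allowed are made up for illustration, and real robots may interpret rules somewhat differently.

```python
from urllib.robotparser import RobotFileParser

# The blanket ban from the example above.
BAN_RULES = """\
User-agent: MSNbot
Disallow: /
"""


def is_allowed(rules: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    `is_allowed` is a hypothetical helper, not part of any library.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "http://www.example.com/page.html"  # hypothetical URL
    print("MSNbot allowed:", is_allowed(BAN_RULES, "MSNbot", url))
    print("Googlebot allowed:", is_allowed(BAN_RULES, "Googlebot", url))
```

The ban applies only to the named robot; any robot that is not listed is still free to crawl.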

MSNbot was probably the most notorious offender. In an SEO forum, “msndude” gave some insight into this: “With regards to aggressiveness of the crawl: we are definitely learning and improving. We take politeness very seriously and we work hard to make sure that we are fixing issues as they come up… I also want to make folks aware of a feature that MSNbot supports…what we call a crawl delay. Basically it allows you to specify via robots.txt an amount of time (in seconds) that MSNbot should wait before retrieving another page from that host. The syntax in your robots.txt file would look something like:

User-Agent: MSNbot
Crawl-Delay: 20

“This instructs MSNbot to wait 20 seconds before retrieving another page from that host. If you think that MSNbot is being a bit aggressive this is a way to have it slow down on your host while still making sure that your pages are indexed.”
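
From the robot's side, honoring this command could look roughly like the sketch below. It uses Python's standard urllib.robotparser module to read the delay and then pauses between requests. The helper names get_crawl_delay and polite_crawl, the fetch callback, and the example.com URLs are assumptions made for illustration; this is not how MSNbot itself is implemented.

```python
import time
from urllib.robotparser import RobotFileParser

# The crawl delay rule quoted above.
RULES = """\
User-Agent: MSNbot
Crawl-Delay: 20
"""


def get_crawl_delay(rules: str, agent: str, default: float = 0.0) -> float:
    """Return the crawl delay in seconds for `agent`, or `default` if none is set."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    delay = parser.crawl_delay(agent)  # None when no delay applies to this agent
    return default if delay is None else float(delay)


def polite_crawl(urls, delay, fetch, sleep=time.sleep):
    """Call fetch(url) for each URL, waiting `delay` seconds between requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay)  # wait before every request except the first
        results.append(fetch(url))
    return results


if __name__ == "__main__":
    delay = get_crawl_delay(RULES, "MSNbot")
    print("MSNbot should wait", delay, "seconds between pages")

    # Demonstration only: record the pauses instead of really sleeping,
    # and use a stand-in fetch function that just echoes the URL.
    pauses = []
    pages = ["http://www.example.com/a", "http://www.example.com/b"]
    polite_crawl(pages, delay, fetch=lambda url: url, sleep=pauses.append)
    print("Pauses taken:", pauses)
```

Note that the crawl_delay method is only available in Python 3.6 and later, and that it returns nothing for robots the file does not mention, which is why the sketch falls back to a default.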

Other search engine spiders that support this command are Slurp, Ocelli, Teoma/AskJeeves, Spiderline and many others. Googlebot does not officially support this command; however, it is usually fairly well-mannered and doesn’t need it. If you are not sure which robots understand this command, a simple question to the search engine’s support team should settle it. There is a good list of search engine robots at RobotsTxt.org, with contact information, if you are unsure how to reach them. It’s not always easy to know which search engine a robot belongs to. You may not know, for example, that Slurp belongs to Yahoo, or that Scooter belonged to AltaVista.


©2010 by Chetan Sharma