This article is based on my research into web crawlers and the standards for excluding them from a domain.
What is a web crawler?
Web crawlers are automated computer programs that browse the World Wide Web according to some algorithm. More precisely, a crawler is an automated robot that fetches pages on the WWW and adds them to its index, from which results are displayed when a user searches for a term.
So what's the big deal? Well, most websites are designed to be viewable by everyone, but a few want to protect their content and images (for example, from image search).
For example, I have a portal that shows rights-managed photographs, and I do not want these images to appear in any search engine results. In this scenario, web crawlers can become villains.
So is there a way to block web crawlers from crawling my website? Thankfully, the answer is yes.
Robots.txt
Most bots (web crawlers or robots) honor the robots.txt file. When a web crawler comes to your domain, it first looks for robots.txt at the root of the site (for example, /robots.txt) and then crawls according to the access rules and restrictions it finds there.
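To make that check concrete, here is a minimal sketch of what a polite crawler does before fetching a page, using Python's standard urllib.robotparser module. The helper names robots_url_for and may_crawl, the crawler name MyCrawler and the domain www.example.com are hypothetical, chosen only for illustration. The robots.txt text is passed in directly instead of being downloaded, so the sketch runs without network access; real search engines use their own parsers, which may differ in details.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt location for the site hosting page_url.

    The crawler always looks at the root of the host, no matter
    which page it actually wants to fetch.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def may_crawl(robots_txt: str, user_agent: str, page_url: str) -> bool:
    """Decide whether user_agent may fetch page_url under these rules.

    robots_txt stands in for the text a crawler would have downloaded
    from robots_url_for(page_url) (hypothetical helper, see above).
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    page = "http://www.example.com/private/photo.jpg"
    print(robots_url_for(page))                # http://www.example.com/robots.txt
    print(may_crawl(rules, "MyCrawler", page))  # False
    print(may_crawl(rules, "MyCrawler", "http://www.example.com/index.html"))  # True
```

Keep in mind that nothing enforces this check: a crawler that skips it can still download the files, which is why the text says most bots honor robots.txt, not all.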
How to write a robots.txt file
A robots.txt file is basically built from two directives:
1. User-agent:
2. Disallow:
User-agent: the name of the search engine's web crawler. For example, Google's crawler is known as Googlebot, AltaVista's as Scooter, and so on. The entire list can be seen at http://www.robotstxt.org/db.html.
Disallow: the path the named crawler is not allowed to access. Every file whose path starts with this value is blocked; an empty value blocks nothing.
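If you maintain rules for several crawlers, it can help to generate the file instead of editing it by hand. The following is a small hypothetical helper, build_robots_txt, that writes one User-agent line per group followed by its Disallow lines, emits an empty Disallow when a group has no blocked paths, and separates groups with a blank line. It is a sketch of the two-directive structure described above, not a standard tool.

```python
from __future__ import annotations


def build_robots_txt(groups: list[tuple[str, list[str]]]) -> str:
    """Build robots.txt text from (user_agent, disallowed_paths) pairs.

    build_robots_txt is a hypothetical helper for illustration only.
    """
    records = []
    for user_agent, disallowed in groups:
        lines = [f"User-agent: {user_agent}"]
        if disallowed:
            lines.extend(f"Disallow: {path}" for path in disallowed)
        else:
            # An empty Disallow value means nothing is off limits.
            lines.append("Disallow:")
        records.append("\n".join(lines))
    # Each group (record) is separated from the next by a blank line.
    return "\n\n".join(records) + "\n"


if __name__ == "__main__":
    # Block Google's image crawler from /images/, allow everyone else.
    print(build_robots_txt([("Googlebot-Image", ["/images/"]), ("*", [])]))
```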
Examples:
User-agent: *
Disallow:
This allows all web crawlers to access every file in the domain. * means all search engine robots, and an empty value after Disallow means nothing is disallowed.
User-agent: *
Disallow:/
This blocks every file from all web crawlers, because a forward slash after Disallow matches every path on the site.
User-agent: *
Disallow:/images/
This blocks all web crawlers from every file in the images folder.
User-agent: Googlebot-Image
Disallow:/images/
This blocks only the Googlebot-Image crawler from every file in the images folder; other crawlers are not affected.
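You can check how these four example files behave before uploading one. The sketch below feeds each of them, exactly as written above, to Python's urllib.robotparser and asks whether two crawlers may fetch two sample paths. The helper name allowed, the EXAMPLES dictionary and the sample paths are hypothetical; urllib.robotparser follows the basic robots.txt rules, and an individual search engine's own parser may differ in details.

```python
from urllib.robotparser import RobotFileParser

# The four example files from this article, verbatim.
EXAMPLES = {
    "allow_all": "User-agent: *\nDisallow:\n",
    "block_all": "User-agent: *\nDisallow:/\n",
    "no_images": "User-agent: *\nDisallow:/images/\n",
    "no_google_images": "User-agent: Googlebot-Image\nDisallow:/images/\n",
}


def allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if user_agent may fetch path under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for name, rules in EXAMPLES.items():
        for agent in ("Googlebot", "Googlebot-Image"):
            for path in ("/index.html", "/images/photo.jpg"):
                print(f"{name:17} {agent:16} {path:18} {allowed(rules, agent, path)}")
```

Only the last file treats the two Google crawlers differently: Googlebot may still fetch /images/photo.jpg, while Googlebot-Image may not. For a rights-managed photo portal like mine, the third file is the stricter choice, since it keeps every crawler out of the images folder.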
