robots.txt file for websites

This is a special-purpose file used exclusively by search engine robots. It must be located at the root, that is, at the highest level of your site. Search engine robots follow the instructions in this text file when crawling and indexing your pages. Note that for a page to be indexed it needs at least one link from an active page, so that search engines can discover it; the robots.txt file is not the place to add links to different sections of the site. A sitemap is the best place for that. We do not need to link to the robots.txt file from anywhere within or outside our site. If your site is www.mysite.com, then the URL of your robots.txt file will be www.mysite.com/robots.txt. Search engines request this page by default, so if the file does not exist you will see a "page not found" (404) error against this URL in your server log. We will discuss the purpose of this file first and then some sample code.
Purpose of robots.txt file

All major search engines obey the instructions the webmaster gives through this robots.txt file, so we can use it to communicate with them. One purpose is to tell crawling robots not to index a particular page or part of the website. We can address these rules to a specific crawler or to all search engine robots.
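For example, rules can target one crawler by its user-agent token while leaving everything open to the rest; the /private/ path here is purely illustrative:

```
# Rules for Google's crawler only
User-agent: Googlebot
Disallow: /private/

# Rules for every other robot: nothing is blocked
User-agent: *
Disallow:
```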
We may keep some area as an archive where we store copies of pages that exist in the main areas. We can prevent robots from crawling these duplicate pages by adding that path to the robots.txt file.
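Assuming, purely for illustration, that the duplicate copies live under an /archive/ directory, the corresponding entry would look like this:

```
User-agent: *
Disallow: /archive/
```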
However, it is clear that spam bots and other bad bots will not respect the directives in the robots.txt file as expected.
robots.txt is mostly used to tell robots what not to index rather than what to index.
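A quick way to see how a well-behaved crawler interprets such rules is Python's standard urllib.robotparser module; the site name and the /archive/ path below are illustrative only:

```python
from urllib.robotparser import RobotFileParser

# Parse a small rule set the way a compliant robot would
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /archive/",  # illustrative path: block the duplicate copies
])

# Pages under the disallowed path are off-limits...
print(rp.can_fetch("*", "http://www.mysite.com/archive/page.html"))  # False
# ...while everything else remains crawlable
print(rp.can_fetch("*", "http://www.mysite.com/index.html"))         # True
```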
Note that the robots.txt file is a public document, so anyone can open it and read its contents. If you have a private URL that you do not want to expose, it is not a good idea to hide it by listing it in the robots.txt file; any hacker or unauthorized user can exploit that listing. Google's webmaster tools include a robots.txt analysis tool for checking your file.

To allow all robots access to all directories and files
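A minimal robots.txt that allows every crawler everywhere looks like this (an empty Disallow value blocks nothing):

```
User-agent: *
Disallow:
```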
The code above allows all agents to crawl all pages. Now let us disallow all bots from indexing our site.
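Blocking every crawler from the entire site is done with a "/" in the Disallow line:

```
User-agent: *
Disallow: /
```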