
robots.txt file for websites

robots.txt is a special-purpose file read by search engine robots. It has to be located at the root (the highest level) of your site. Search engine robots follow the instructions given in this text file when they crawl and index your pages. Note that for a page to be indexed, at least one active page must link to it so that search engines can discover it. robots.txt is not the place to add links to different sections of the site; a site map is the best place for that purpose.

We need not specify the location of the robots.txt file anywhere within or outside our site. If your site name is www.mysite.com, then the URL of your robots.txt file will be www.mysite.com/robots.txt. Search engines request this URL by default, so if you don't have the file you will find a page not found error (404) against this URL in your server log. We will first discuss the purpose of this file and then look at some sample code.
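Because the file always sits at the root, its address can be worked out from any page URL on the site. A minimal Python sketch of that idea, using only the standard library; the function name robots_url and the sample page URL are made up for illustration:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the path is always /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page deep inside the site
print(robots_url("http://www.mysite.com/articles/seo/page.html?id=3"))
# prints http://www.mysite.com/robots.txt
```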

Purpose of robots.txt file

Well-behaved search engines obey the instructions the webmaster gives through the robots.txt file, so we can use it to communicate with them. One purpose is to tell crawling robots not to crawl or index some page or part of the website. We can address this to specific crawlers or to all search engine robots.

We may keep an archive area that stores copies of pages existing in the main areas. We can prevent robots from crawling these duplicate pages by adding that path to the robots.txt file.

However, spam bots and other bad bots will not respect the directives of the robots.txt file.

robots.txt is mostly used to tell robots what not to index rather than what to index.

Note that the robots.txt file is a public document, so anyone can open it and see its content. If you have a private URL which you don't want to expose, it is not a good idea to restrict indexing of that page by adding it to the robots.txt file. Any hacker or unauthorized user can exploit it.

To allow all robots to access all directories and files:

User-Agent: *
Allow: /

The above code will allow all agents to crawl all pages. Now let us disallow all bots from indexing our site.

User-Agent: *
Disallow: /
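One way to see the effect of these two rule sets is to feed them to a parser and ask whether a URL may be fetched. The sketch below uses Python's standard urllib.robotparser module; the helper name build_parser, the bot name AnyBot, and the sample URL are assumptions made for illustration, and real crawlers may interpret rules somewhat differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


ALLOW_ALL = "User-Agent: *\nAllow: /\n"
BLOCK_ALL = "User-Agent: *\nDisallow: /\n"

# Hypothetical page and crawler name
URL = "http://www.example.com/articles/page.html"

open_site = build_parser(ALLOW_ALL)
closed_site = build_parser(BLOCK_ALL)

print(open_site.can_fetch("AnyBot", URL))    # True
print(closed_site.can_fetch("AnyBot", URL))  # False
```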

Now let us tell all bots not to index one directory (the name of the directory is restrict-dir).

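User-Agent: *
# Disallow is listed first so that parsers which stop at the first matching rule still block the directory
Disallow: /restrict-dir/
Allow: /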

In the above code we have told robots not to index or crawl the restrict-dir directory.
We can also block or allow a specific user agent. Now let us block Googlebot from all pages.

User-Agent: Googlebot
Disallow: /

Now let us disallow Googlebot from one directory only.

User-Agent: Googlebot
Disallow: /restrict-dir/
Allow: /
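To illustrate how a rule group aimed at one user agent behaves, here is a small Python sketch with the standard urllib.robotparser module. The bot name OtherBot and the page URLs are hypothetical. Note that this particular parser applies the first rule that matches, so the order of the Disallow and Allow lines matters to it; other crawlers may resolve conflicts differently.

```python
from urllib.robotparser import RobotFileParser

# Same rules as above: Googlebot is kept out of one directory only
RULES = """\
User-Agent: Googlebot
Disallow: /restrict-dir/
Allow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Hypothetical URLs on the site
blocked_page = "http://www.example.com/restrict-dir/page.html"
open_page = "http://www.example.com/index.html"

print(parser.can_fetch("Googlebot", blocked_page))  # False
print(parser.can_fetch("Googlebot", open_page))     # True
# No group names OtherBot and there is no "*" group, so nothing is blocked for it
print(parser.can_fetch("OtherBot", blocked_page))   # True
```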

From here you can easily understand how to use the robots.txt file. You can use any of the many robots.txt generators available on the internet. Google has a robots.txt file generator inside its webmaster tools, but you must have a Google account to use it. Google's webmaster tools can also analyze your robots.txt file and tell you what is wrong with it. If you have accidentally blocked some part of your site, you can find out about it there.
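A similar check for accidentally blocked pages can be run locally before the file is uploaded. The following is a rough Python sketch using the standard urllib.robotparser module; the function name find_blocked, the rule, and the URLs are all invented for illustration. The example rule was meant to block only an /art/ directory, but because paths are matched by prefix it also blocks /articles/.

```python
from urllib.robotparser import RobotFileParser


def find_blocked(robots_txt, urls, user_agent="*"):
    """Return the URLs from urls that the given rules block for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical rule with a missing trailing slash
ROBOTS = "User-Agent: *\nDisallow: /art\n"

# Hypothetical pages that should stay crawlable, plus one that should not
PAGES = [
    "http://www.example.com/index.html",
    "http://www.example.com/articles/robots.html",
    "http://www.example.com/art/logo.png",
]

for url in find_blocked(ROBOTS, PAGES):
    print("blocked:", url)
# blocked: http://www.example.com/articles/robots.html
# blocked: http://www.example.com/art/logo.png
```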

Sitemap & robots.txt file

You can add the URL of your sitemap to the robots.txt file. This will help search engines pick up your sitemap file. Add this line at the end of your robots.txt file:

Sitemap: http://www.example.com/sitemap.xml
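As an illustration of how a program can pick up this line, Python's standard urllib.robotparser module exposes the listed sitemaps through site_maps() (available from Python 3.8). The robots.txt content below is a made-up example that combines a rule group with the Sitemap line shown above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with the Sitemap line at the end
ROBOTS = """\
User-Agent: *
Disallow: /restrict-dir/

Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())

# site_maps() returns a list of sitemap URLs, or None if the file lists none
print(parser.site_maps())
# ['http://www.example.com/sitemap.xml']
```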

Try to locate the robots.txt file of this site and see the text inside it.


©2000-2014 plus2net.com All rights reserved worldwide