Sunday, November 9, 2008

What Is the Robots Exclusion Standard?

Written by Ekta Varma

The robots exclusion standard, or robots.txt protocol, is a convention to prevent web spiders and other web robots from accessing all or part of a website. The parts that should not be accessed are listed in a file called robots.txt in the top-level directory of the website.

You can allow all robots to visit all files; the wildcard "*" matches all robots.

For example:

User-agent: *
Disallow:

You can also keep all robots out:

User-agent: *
Disallow: /

You can also tell a specific crawler not to enter one specific directory:

User-agent: BadBot
Disallow: /private/
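
To check how a compliant crawler would read this rule, here is a minimal sketch using Python's standard urllib.robotparser module (the page paths are hypothetical):

import urllib.robotparser

# Parse the example rules directly, without fetching anything over the network.
rules = [
    "User-agent: BadBot",
    "Disallow: /private/",
]
rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

# BadBot is kept out of /private/ but may fetch other pages;
# crawlers with other names are unaffected by this rule.
print(rp.can_fetch("BadBot", "/private/page.html"))    # False
print(rp.can_fetch("BadBot", "/index.html"))           # True
print(rp.can_fetch("GoodBot", "/private/page.html"))   # True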

However, you should not use the directive below, because wildcard matching in the Disallow field is not a stable, standardized extension and not all crawlers support it:

Disallow: *

Instead, you can use:

Disallow: /
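
Putting these rules together, a complete robots.txt file can contain several groups, each starting with its own User-agent line (the directory names below are only examples):

User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
Disallow: /tmp/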

HTML meta tags for robots:

HTML meta tags can be used to exclude robots on a per-page basis.
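
A typical robots meta tag, placed within the head section of the page, looks like this:

<meta name="robots" content="noindex, nofollow">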

By using the above code within the head section of an HTML document, you can tell search engines such as Google, Yahoo!, or MSN to exclude the page from their indexes and not to follow any links on the page for further indexing.

You can also use a robots.txt generator to create your robots.txt file. Then open a text editor, such as Windows Notepad, copy and paste the generated text into it, and save the file as robots.txt. Upload the file to your root directory (the same directory your index.htm or index.html file is in).
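
If you prefer to create the file with a script instead of a generator, a minimal Python sketch (the rules shown are only an example) would be:

# Write a basic robots.txt; upload the resulting file to your web root.
rules = "User-agent: *\nDisallow: /private/\n"
with open("robots.txt", "w") as f:
    f.write(rules)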

References: Wikipedia, The Free Encyclopedia.

Visit:
Halfvalue.com
[A unique shopping website]

Ekta Varma
