The Proper Way To Use The robots.txt
File Update
by Jimmy Whisenhunt
In my last article about the robots.txt file, I spelled the file
name wrong. It should have been robots.txt, not robot.txt. The
article should read like this:
When optimizing a web site, most webmasters don’t consider
using the robots.txt file. This is a very important file for your
site. It lets the spiders and crawlers know what they can and
cannot index. This is helpful for keeping them out of folders that you
do not want indexed, like the admin or stats folder.
Here is a list of the fields that you can include in a robots.txt
file and their meanings:
1) User-agent: In this field you name the specific robot that the
access policy applies to, or use “*” for all robots. The examples
further down show both.
2) Disallow: In this field you specify the files and folders not
to include in the crawl.
3) #: Starts a comment.
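As a rough sketch of how these three items fit together, the Python snippet here feeds a small rule set to the standard library's urllib.robotparser module. The robot name ExampleBot and the sample paths are made up for illustration; the stats folder is the one mentioned earlier.

```python
from urllib.robotparser import RobotFileParser

# A small robots.txt that uses all three items:
# a comment, a User-agent line, and a Disallow line.
RULES = """\
# Keep every robot out of the stats folder
User-agent: *
Disallow: /stats/
"""


def build_parser(text):
    """Return a parser loaded with the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(RULES)

# ExampleBot is a hypothetical robot name used only for this check.
stats_allowed = parser.can_fetch("ExampleBot", "/stats/today.html")
home_allowed = parser.can_fetch("ExampleBot", "/index.html")
print(stats_allowed, home_allowed)
```

The comment line is ignored by the parser, so only the User-agent and Disallow lines affect the result.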
Here are some examples of a robots.txt file:
User-agent: *
Disallow:
The above would let all spiders index all content.
Here is another example:
User-agent: *
Disallow: /cgi-bin/
The above would block all spiders from indexing the cgi-bin directory.
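One Disallow line can cover a whole directory because, under the original robots.txt convention, its value is compared against the beginning of each URL path. Here is a minimal sketch of that idea in Python. The helper name is_blocked and the sample paths are hypothetical, and real crawlers may support extensions that this sketch ignores.

```python
def is_blocked(path, disallow_values):
    """Return True if path starts with any non-empty Disallow value."""
    # An empty Disallow value blocks nothing, so it is skipped.
    return any(value and path.startswith(value) for value in disallow_values)


disallowed = ["/cgi-bin/"]

# A file inside the cgi-bin directory matches the prefix.
print(is_blocked("/cgi-bin/form.cgi", disallowed))

# A file outside the directory does not match.
print(is_blocked("/index.html", disallowed))

# An empty Disallow value, as in the first example, blocks nothing.
print(is_blocked("/index.html", [""]))
```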
User-agent: googlebot
Disallow:
User-agent: *
Disallow: /admin.php
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /stats/
In the above example googlebot can index everything, while all other
spiders cannot index admin.php or the cgi-bin, admin, and stats
directories. Notice that you can block single files like admin.php.
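If you want to check a rule set like this before uploading it, one option is Python's standard urllib.robotparser module. The sketch here loads the last example and asks what two robots may fetch. ExampleBot stands in for "any other spider" and is a made-up name, as are the sample page paths.

```python
from urllib.robotparser import RobotFileParser

# The last example from the article, copied as-is.
RULES = """\
User-agent: googlebot
Disallow:
User-agent: *
Disallow: /admin.php
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /stats/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Hypothetical paths used only to exercise the rules.
paths = ["/index.html", "/admin.php", "/admin/users.html", "/stats/"]

for robot in ("googlebot", "ExampleBot"):
    for path in paths:
        verdict = "allowed" if parser.can_fetch(robot, path) else "blocked"
        print(robot, path, verdict)
```

This only reports how Python's parser reads the file. Each spider applies its own reading of robots.txt, so treat the output as a sanity check rather than a guarantee.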
About The Author
Jimmy Whisenhunt is the webmaster at VIP Enterprises http://www.vipenterprises.org.
vipenter@vipenterprises.org