Search Engine Optimization requires that you focus your efforts on two parts of your website: the on-page optimization of phrases and how your pages look to the search engines. Two documents are critical in determining how the search engines view your site. The first is your sitemap.xml file and the second is your robots.txt file.

What is the robots.txt file?

Your sitemap.xml file tells the search engines all the pages your site contains, which pages have recently been updated, and the order of importance of your pages. By comparison, your robots.txt file tells search engines what they should not be looking at, from individual pages to whole folders and directories on your website. You can also allow or disallow specific robots, but most commonly the disallows apply to all robots.
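
For instance, a hypothetical file that keeps one specific crawler, Googlebot, out of a /print/ folder (the folder name is just an example) while leaving all other robots unrestricted would contain two records:

User-agent: Googlebot
Disallow: /print/

User-agent: *
Disallow:

A crawler obeys the most specific record that matches its name, so Googlebot follows the first record and every other robot falls through to the second, which blocks nothing.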

If at all possible, avoid naming specific pages in your robots.txt file and stick to folder names. Because the file is publicly readable, listing individual page names can be a security issue: it advertises the exact location of pages you would rather keep hidden, as the sketch below shows.
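
To illustrate the difference with hypothetical paths:

# Avoid: advertises the exact location of a sensitive page
Disallow: /admin/login.html

# Better: reveals only the folder name
Disallow: /admin/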

Where do you put your robots.txt file?

Your robots.txt file should be placed in your root folder. In other words, the search engine spiders will look for your robots.txt file at a URL like this:

http://www.yourwebsite.com/robots.txt

What folders should you disallow?

The most commonly disallowed folders and directories are those that are either protected or would cause duplicate-content or other content issues. These include print and email versions of your pages, as well as functional folders like a cgi-bin folder. A common robots.txt file looks like this:

User-agent: *
Disallow: /cgi-bin/
Disallow: /print/
Disallow: /email/
Disallow: /users/

Another highly recommended component of your robots.txt file is a reference to your sitemap. It is simply one more line of text and would look like this:

Sitemap: http://www.yourwebsite.com/sitemap.xml
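
Putting it all together, the common file shown earlier, with the sitemap reference appended, would read:

User-agent: *
Disallow: /cgi-bin/
Disallow: /print/
Disallow: /email/
Disallow: /users/
Sitemap: http://www.yourwebsite.com/sitemap.xml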

How do you know if your robots.txt file is written correctly?

One of the most tragic things you can do is write your robots.txt file incorrectly. With one wrong character, you can tell search engines to ignore your entire site (I’ve seen it happen). To avoid this, you should run your robots.txt file through a checker. There are many out there, but most only check for syntax and won’t warn you if you are accidentally excluding big chunks of your site. Google’s Webmaster Tools will let you test your robots.txt file and show you how many URLs are blocked by it (you do need to have your site verified in Webmaster Tools to use it, however). To test a site’s robots.txt file with Google’s Webmaster Tools, do the following:

1. On the Webmaster Tools Home page, click the site you want.
2. Under Health, click Blocked URLs.
3. If it’s not already selected, click the Test robots.txt tab.
4. Copy the content of your robots.txt file and paste it into the first box.
5. In the URLs box, list the URLs you want to test against the file.
6. In the User-agents list, select the user-agents you want.

Any changes you make in this tool will not be saved. To save any changes, you’ll need to copy the contents and paste them into your robots.txt file.
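
If you want a quick second check outside of Webmaster Tools, Python’s standard library ships with a robots.txt parser. Here is a minimal sketch, assuming a placeholder domain and placeholder test URLs, that fetches a live robots.txt file and reports which URLs its rules block:

from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt file (placeholder domain).
parser = RobotFileParser()
parser.set_url("http://www.yourwebsite.com/robots.txt")
parser.read()

# can_fetch() returns False for any URL the rules block
# for the given user-agent ("*" stands for any robot here).
test_urls = [
    "http://www.yourwebsite.com/",
    "http://www.yourwebsite.com/print/page.html",
]
for url in test_urls:
    status = "allowed" if parser.can_fetch("*", url) else "BLOCKED"
    print(url, "->", status)

If the output shows your homepage or other key sections as BLOCKED, you know a stray character has done more damage than intended.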
Your website can benefit from a well-crafted robots.txt file. Take the time to create one, test it, and upload it to your website’s root folder.