A Blog from Embitel Technologies (I) Pvt Ltd

Smart e-Commerce: e-Marketing Information

What is the robots.txt file?

Posted by Nagaraju on November 25, 2009 in SEO



Web robots, often referred to as crawlers, bots, or spiders, are software programs that constantly travel the web, indexing the information found on millions of websites every day. Some site owners, however, don’t want certain content indexed by search engines or accessed by these robots, so it’s important to know how to limit their access to your site. There are a number of reasons for wanting to prevent bot access to a page or a specific directory; the most common relate to security, privacy and duplicate content.

The Robots Exclusion Protocol, more commonly known by the name of its file, /robots.txt, lets webmasters give bots instructions about which parts of a site they may crawl. The file, which must reside in the domain’s root directory, limits the bots’ access to files and directories within that domain. A site is often made up of a large number of pages, but many of them (such as registration, login, 404 error, privacy policy and order confirmation pages) should not be indexed by search engines. The /robots.txt file is also particularly handy for webmasters who run a wide network of sites with identical privacy policies or terms and conditions, and for e-commerce sites that have checkout pages, shopping carts, etc.

 

Addressing Duplicate Content with /robots.txt

The /robots.txt file can also help with the duplicate content issues that arise with blogging software such as WordPress. With WordPress, and with most blogging software for that matter, the content of a blog post is published at the post URL itself, but copies of that content also appear on category pages and in tag and author archives. This inadvertently creates several pages of duplicate content. Since duplicate content can have a negative impact on a site’s ranking in the organic search results, using /robots.txt to keep bots away from those archive pages can reduce the risk to the site’s search marketing strategy.
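As a rough illustration of the idea above, the following Python sketch builds the text of a /robots.txt file that keeps bots away from archive pages. The function name build_robots_txt and the paths /category/, /tag/ and /author/ are assumptions made for this example; the archive URLs on your own blog depend on how its permalinks are configured.

```python
def build_robots_txt(disallowed_paths, user_agent="*"):
    """Return robots.txt text that disallows each given path for one user agent."""
    lines = [f"User-agent: {user_agent}"]
    lines.extend(f"Disallow: {path}" for path in disallowed_paths)
    return "\n".join(lines) + "\n"


# Hypothetical archive prefixes; check your own blog's URL structure first.
ARCHIVE_PATHS = ["/category/", "/tag/", "/author/"]

if __name__ == "__main__":
    print(build_robots_txt(ARCHIVE_PATHS), end="")
```

The post URLs themselves are not listed, so they stay open to bots while the archive copies are excluded.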

 

Understanding How to Use /robots.txt

In order to function properly, the /robots.txt file must reside in the domain’s root directory so that it is accessible at http://www.domain.com/robots.txt. The file itself should be created as a plain text document. Do NOT use Microsoft Word or another word processing program; a plain text editor such as Notepad on Windows or TextEdit (in plain text mode) on the Mac works best. The file must be named robots.txt, in lowercase, and uploaded directly to the domain’s root directory. The rules within the file can be as simple or as complex as your needs demand.
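Because the file always lives at the root of the domain, its address can be worked out from any page URL on the site. Here is a minimal Python sketch of that rule; the function name robots_txt_url and the sample page URL are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Hypothetical page deep inside the site.
    print(robots_txt_url("http://www.domain.com/shop/view_cart.asp?id=7"))
```

A copy of the file placed in a subdirectory such as /shop/ would not be at this address, so bots would not look for it there.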

The standard, generic /robots.txt file, one that does not limit access to anything on your domain, would be formatted like this:

User-agent: *
Disallow:

Blocking bot access to the entire domain requires adding only one character to the generic /robots.txt file:

User-agent: *
Disallow: /
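If you want to see the difference that single character makes, one option is Python’s standard urllib.robotparser module, which reads the rules the way a well-behaved bot would. The sketch below feeds it the two files shown above; the helper name load_rules and the page URL are assumptions for this example.

```python
from urllib.robotparser import RobotFileParser

ALLOW_ALL = """\
User-agent: *
Disallow:
"""

BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def load_rules(robots_txt):
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    page = "http://www.domain.com/products.asp"  # hypothetical page
    print("generic file allows page:", load_rules(ALLOW_ALL).can_fetch("*", page))
    print("blocking file allows page:", load_rules(BLOCK_ALL).can_fetch("*", page))
```

The empty Disallow line in the first file blocks nothing, while the lone slash in the second matches every path on the domain.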

What if you want to limit bot access only to certain subdirectories or specific pages of the site? Not a problem. You would simply add each individual subdirectory or URL to the /robots.txt file as follows:

User-agent: *
Disallow: /checkout.asp
Disallow: /add_cart.asp
Disallow: /view_cart.asp
Disallow: /error.asp
Disallow: /shipquote.asp
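Before uploading a file like this, it can be worth checking that it blocks what you expect and nothing more. The following Python sketch, again using the standard urllib.robotparser module, tests a handful of paths against the rules above. The helper blocked_paths and the pages /products.asp and /index.asp are hypothetical.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout.asp
Disallow: /add_cart.asp
Disallow: /view_cart.asp
Disallow: /error.asp
Disallow: /shipquote.asp
"""


def blocked_paths(robots_txt, paths, user_agent="*"):
    """Return the subset of paths that the rules disallow for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]


if __name__ == "__main__":
    # Two listed pages plus two hypothetical pages that should stay open.
    candidates = ["/checkout.asp", "/view_cart.asp", "/products.asp", "/index.asp"]
    print(blocked_paths(ROBOTS_TXT, candidates))
```

Only the pages named in a Disallow line come back as blocked; everything else on the site remains available to bots.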

 

The Robots.txt File Is Not Foolproof

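While the /robots.txt file does a good job of keeping well-behaved bots out of the areas you list, it isn’t foolproof: compliance is voluntary, and a blocked URL can still turn up in search results if other sites link to it. Each individual page you do not want indexed should also incorporate a properly formatted robots META tag. Keep in mind that a bot can only read the tag if it is allowed to fetch the page. The standard robots META tag is configured like this: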

<meta name="robots" content="index, follow" />

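To keep an individual URL out of the index, the robots META tag in the head of the page should look like one of the following. The first also tells bots not to follow the links on the page; the second lets them follow the links: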

<meta name="robots" content="noindex, nofollow" />

or

<meta name="robots" content="noindex, follow" />
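To confirm that a page really carries the tag you intended, you can read it back out of the HTML. The Python sketch below does this with the standard html.parser module; the class RobotsMetaParser and the functions robots_directives and may_index are names invented for this example, and the sample page is made up.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Directives are comma separated, e.g. "noindex, follow".
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of robots META directives present in the HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def may_index(html):
    """True unless the page asks bots not to index it."""
    return "noindex" not in robots_directives(html)


if __name__ == "__main__":
    # Hypothetical page head using one of the tags shown above.
    page = '<html><head><meta name="robots" content="noindex, follow" /></head></html>'
    print(sorted(robots_directives(page)), may_index(page))
```

A page with no robots META tag at all yields an empty set, which this sketch treats as permission to index.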

 

The Bottom Line

A /robots.txt file is a very useful tool and, unfortunately, an often overlooked and neglected aspect of web development. Now that you have a better understanding of what it is, what it does and how to use it, take some time to consider how your site may benefit from a properly configured /robots.txt file. In the meantime, start checking out the /robots.txt files of the sites you visit to familiarize yourself with the different ways it can be configured and used.

