noindex

The noindex value of an HTML robots meta tag requests that automated Internet bots avoid indexing a web page.[1][2] Reasons for using this meta tag include advising robots not to index a very large database, web pages that are very transitory, pages that one wishes to keep slightly more private, or the printer-friendly and mobile-friendly versions of pages. Because honoring a website's noindex tag is left to the author of the search robot, these tags are sometimes ignored. The interpretation of the noindex tag also differs slightly from one search engine company to the next.

Example

<html>
<head>
 <meta name="robots" content="noindex">
 <title>Don't index this page</title>
</head>
<body>
</body>
</html>

Possible values for the content attribute of the meta tag are "none", "all", "index", "noindex", "nofollow", and "follow". Values can also be combined in a comma-separated list,[1] for example:

 <meta name="robots" content="noindex, follow">
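As a rough illustration of how a crawler might interpret such a content value, the following Python sketch turns a comma-separated directive list into two flags. The function name parse_robots_content is hypothetical, and the choices made here (indexing and following by default, the restrictive value winning when values conflict) are assumptions for illustration; real search engines may behave differently.

```python
def parse_robots_content(content):
    """Return (index, follow) flags for a robots meta content string.

    Assumption: a page is indexed and its links followed unless a
    restrictive value ("noindex", "nofollow" or "none") is present.
    """
    # Split on commas, trim whitespace, and compare case-insensitively.
    tokens = {token.strip().lower() for token in content.split(",")}
    index = not (tokens & {"noindex", "none"})
    follow = not (tokens & {"nofollow", "none"})
    return index, follow
```

Under these assumptions, the value "noindex, follow" asks a robot not to index the page while still following its links, and "none" is treated as shorthand for "noindex, nofollow".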

Bot-specific directives

The noindex directive can be addressed to certain bots only, typically by setting the "name" attribute of the meta tag to the name of the bot instead of "robots".

For example, to specifically block Google's bot,[3] specify:

<meta name="googlebot" content="noindex">

Or, to block Yahoo!'s bot,[4] specify:

<meta name="slurp" content="noindex">

Or, to block MSN's bot, specify:

<meta name="msnbot" content="noindex">
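From the robot's side, the lookup of a bot-specific directive might be sketched in Python as follows. The names MetaCollector and content_for_bot are hypothetical, and the rule that a bot-specific tag takes precedence over the generic "robots" tag is an assumption made for this sketch; actual robots may combine the two differently.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect the content of every named meta tag on a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name:
            self.meta[name] = attributes.get("content") or ""


def content_for_bot(html, bot_name):
    """Return the directives addressed to bot_name, else the generic ones."""
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    # Assumption: a bot-specific tag overrides the generic "robots" tag.
    generic = collector.meta.get("robots", "")
    return collector.meta.get(bot_name.lower(), generic)
```

A robot identifying itself as "googlebot" would then read the tag named "googlebot" if one exists and fall back to the tag named "robots" otherwise.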

<noindex> for specific content on the page

To exclude navigation text from searches, the Russian search engine Yandex introduced a <noindex> tag, which prevents indexing only of the content between the tags rather than of the whole web page. Because <noindex> is not a standard HTML element, the comment form <!--noindex--> can be used instead to allow the source code to validate:[5]

<p>
  Do index this text.
  <noindex>Don't index this text.</noindex>
  <!--noindex-->Don't index this text.<!--/noindex-->
</p>

Other indexing spiders also recognize the <noindex> tag, including Atomz.[6]
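A minimal Python sketch of how an indexer could drop such regions before indexing is shown here. The function name strip_noindex is hypothetical, and the regular expression assumes that the markers are not nested and are written exactly as in the example above; it does not describe how Yandex or Atomz actually implement the feature.

```python
import re

# Matches both the element form and the comment form of the marker.
NOINDEX_PATTERN = re.compile(
    r"<noindex>.*?</noindex>|<!--noindex-->.*?<!--/noindex-->",
    re.IGNORECASE | re.DOTALL,
)


def strip_noindex(html):
    """Remove every region wrapped in noindex markers from the markup."""
    return NOINDEX_PATTERN.sub("", html)
```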

In 2007, Yahoo! introduced the same functionality for its spider. Yahoo!'s spider looks for the attribute value class="robots-nocontent" in HTML tags:[7]

<p>Do index this text.</p>
<div class="robots-nocontent">Don't index this text.</div>
<span class="robots-nocontent">Don't index this text.</span>
<p class="robots-nocontent">Don't index this text.</p>
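The class-based approach could be handled by an indexer along the following lines. This Python sketch is only an illustration: the names NoContentFilter and indexable_text are hypothetical, and it assumes well-formed markup in which every non-void element has an explicit closing tag.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag and so must not affect the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NoContentFilter(HTMLParser):
    """Collect text that lies outside class="robots-nocontent" elements."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside an excluded element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.skip_depth or "robots-nocontent" in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.chunks.append(text)


def indexable_text(html):
    """Return the text fragments that are not marked robots-nocontent."""
    parser = NoContentFilter()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Tracking a depth counter rather than a simple flag keeps nested elements inside an excluded block excluded until the outer element is closed.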

There is also a 2005 draft microformats specification which is similar to, but incompatible with, Yahoo!'s technique.[8]

The Google Search Appliance uses structured comments:[9]

<p>
  Do index this text.
<!--googleoff: all-->
  Don't index this text.
<!--googleon: all-->
</p>
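The structured comments act as a switch that turns indexing off and on again as the page is read. A Python sketch of that idea follows; the names GoogleOffFilter and text_outside_googleoff are hypothetical, only the "all" form shown above is handled, and the code is not a description of the appliance's own implementation.

```python
from html.parser import HTMLParser


class GoogleOffFilter(HTMLParser):
    """Collect text that is not between googleoff and googleon comments."""

    def __init__(self):
        super().__init__()
        self.excluded = False
        self.chunks = []

    def handle_comment(self, data):
        # data is the comment body without the surrounding <!-- and -->.
        marker = data.strip().lower()
        if marker == "googleoff: all":
            self.excluded = True
        elif marker == "googleon: all":
            self.excluded = False

    def handle_data(self, data):
        text = data.strip()
        if text and not self.excluded:
            self.chunks.append(text)


def text_outside_googleoff(html):
    """Return the text fragments that remain eligible for indexing."""
    parser = GoogleOffFilter()
    parser.feed(html)
    parser.close()
    return parser.chunks
```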

Google's main indexing spider, Googlebot, is not known to recognize any of these techniques.

References

  1. Robots and the META element, Official W3C specification
  2. About the Robots <META> tag
  3. Using meta tags to block access to your site, Google Webmasters Tools Help
  4. How to Prevent Yahoo! Search From Indexing Specific Pages, Yahoo! Search Help
  5. "Using HTML tags". webmaster → help. Yandex. Section: <noindex> tag. Retrieved March 25, 2013.
  6. "General Search FAQ". Help. Atomz. 2013. Section: How do I exclude parts of my site from being searched?. Retrieved March 23, 2013. "Need to prevent parts of individual pages from being searched? If you want to exclude portions of a page from indexing, surround the text with <noindex> and </noindex> tags. This is useful, for example, if you want to exclude navigation text from searches." (registration required)
  7. Garg, Priyank (May 2, 2007). "Introducing Robots-Nocontent for Page Sections". Yahoo! Search Blog. Yahoo!. Retrieved March 23, 2013.
  8. Janes, Peter (June 18, 2005). "Robot Exclusion Profile". Microformats. Retrieved March 24, 2013.
  9. "Administering Crawl: Preparing for a Crawl". Google Search Appliance. Google Inc. August 23, 2012. Section: Excluding Unwanted Text from the Index. Archived from the original on November 23, 2012. Retrieved March 23, 2013.