Robot Wars: Robots.txt vs. meta robots

Published June 18, 2013 by Brad Knutson

I recently came across a site that had conflicts between the Robots.txt file and the on-page meta robots tag. It got me wondering which of the two holds the ultimate authority when they conflict. This may seem like common sense, but it may not be intrinsically clear to even the most advanced SEOs and webmasters. I decided to do some research and consolidate my findings.

First, a short lesson.

What is a Robots.txt file anyways?

The Robots.txt file is a webmaster’s way to communicate with search engines and web crawlers and give them directions about how to handle your content. The file is always placed in the root directory of the domain and always has the same name, so crawlers know to look for it at /robots.txt on any site.
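Because the file always lives at the root of the domain, a crawler can work out where to look for it from any page URL. Here is a minimal Python sketch using the standard library’s urllib.parse; the function name robots_txt_url and the example.com URLs are illustrative assumptions, not taken from any particular crawler.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt location for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port), replace the path with
    # /robots.txt, and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://www.example.com/blog/robot-wars/?ref=feed"))
# → https://www.example.com/robots.txt
```

Note that the lookup is per host, so a subdomain such as blog.example.com would serve its own robots.txt.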

In a nutshell:

Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.

This file can be used to give general rules for all robots, or specific rules for specific robots or pages.
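Python’s standard library includes urllib.robotparser, which implements the basic Robots Exclusion Protocol. The sketch below shows how a well-behaved crawler might consult a robots.txt file before fetching pages, with one rule for all robots and a stricter rule for a single robot. The rules, the URLs and the ExampleBot user agent are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ExampleBot may not crawl anything, while every
# other robot may crawl the whole site except the /private/ directory.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
# parse() takes text we already have; read() would fetch it over HTTP instead.
parser.parse(ROBOTS_TXT.splitlines())

for agent, url in [
    ("Googlebot", "https://www.example.com/blog/post"),
    ("Googlebot", "https://www.example.com/private/report"),
    ("ExampleBot", "https://www.example.com/blog/post"),
]:
    print(agent, url, "allowed" if parser.can_fetch(agent, url) else "blocked")
```

With these rules, Googlebot falls through to the catch-all `*` group and may fetch /blog/post but not /private/report, while ExampleBot matches its own group and is blocked everywhere.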

What is the meta robots tag?

The meta robots tag is used on a page-by-page basis (it’s not a requirement) and gives robots instructions similar to those in the Robots.txt file.

In a nutshell:

You can use a special HTML tag to tell robots not to index the content of a page, and/or not scan it for links to follow.

The standard meta tag looks like this:

<meta name="robots" content="noindex, nofollow">

You could also explicitly ask robots to index and follow, but since these are the default settings if no meta tag exists, it’s unnecessary.
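To make the defaults concrete, here is a minimal Python sketch of how a crawler might interpret the content attribute. The function name robots_settings is hypothetical, and it only looks at the index/noindex and follow/nofollow values discussed here.

```python
def robots_settings(content=None):
    """Interpret a meta robots content value such as "noindex, nofollow".

    content=None models a page that has no meta robots tag at all.
    """
    directives = {d.strip().lower() for d in (content or "").split(",")}
    # Index and follow are the defaults; only the "no" forms change anything.
    return {
        "index": "noindex" not in directives,
        "follow": "nofollow" not in directives,
    }


print(robots_settings())                     # → {'index': True, 'follow': True}
print(robots_settings("index, follow"))      # → {'index': True, 'follow': True}
print(robots_settings("noindex, nofollow"))  # → {'index': False, 'follow': False}
```

The first two calls return the same result, which is why an explicit "index, follow" tag is unnecessary.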

So what happens if there is a conflict?

Robots.txt requests noindex, meta robots requests index

Result: Bots and web crawlers are instructed not to index the page, so they will not index it when crawling your site. However, a crawler may come across the page through an external link, bypass the Robots.txt file, see the meta robots tag, and decide to index the page. In rare cases, the page could then be included in search results. Google will correctly decide not to index the page, but other search engines may not use the same logic.

Robots.txt requests index, meta robots requests noindex

Result: Based on the Robots.txt file, search engine bots and web crawlers are given permission to crawl the page, but as soon as they hit the meta robots tag, they are instructed not to index the page and move on. Web crawlers that reach the page through external links will also stop at the meta robots tag, so the page will not show up in search results at all.
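The two scenarios above can be modeled in a short Python sketch, assuming the robots.txt request is expressed as a standard Disallow rule and that the crawler honours robots.txt on every request. The function crawl_decision, the ExampleBot user agent and the URLs are hypothetical, and the sketch does not cover the external-link case from the first scenario.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser


def crawl_decision(robots_txt: str, url: str, meta_content: Optional[str],
                   agent: str = "ExampleBot") -> str:
    """Model what a well-behaved crawler does with a single URL.

    meta_content is the content of the page's meta robots tag, or None if
    the page has none. "ExampleBot" is a hypothetical user agent.
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(agent, url):
        # The page is never fetched, so its meta robots tag is never read.
        return "not crawled"
    directives = {d.strip().lower() for d in (meta_content or "").split(",")}
    if "noindex" in directives:
        # Crawled, but the page-level noindex keeps it out of the index.
        return "crawled, not indexed"
    return "crawled and indexable"


URL = "https://www.example.com/blog/robot-wars/"
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_BLOG = "User-agent: *\nDisallow: /blog/\n"

print(crawl_decision(ALLOW_ALL, URL, "noindex"))  # robots.txt allows, meta says noindex
print(crawl_decision(BLOCK_BLOG, URL, "index"))   # robots.txt blocks, meta says index
```

The first call prints "crawled, not indexed" and the second "not crawled": for a compliant crawler, the meta tag on a blocked page never gets a say because the page is never downloaded.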

Meta robots requests noindex, then a second meta robots tag requests index

This might sound confusing, but what I mean by this is literally having two conflicting meta robots tags in the head section of your HTML like below.

<meta name="robots" content="noindex, nofollow">
<meta name="robots" content="index, follow">

Result: The page will not be indexed and will not be shown in any search results. Basically, if there is a “noindex” meta robots tag on the page, you cannot override it, even with another meta robots tag.
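A minimal Python sketch of this behavior, using the standard library’s html.parser to collect every meta robots tag on the page. The class and function names are hypothetical, and the sketch assumes the more restrictive value wins for follow/nofollow as well; the behavior described here is only established for noindex.

```python
from html.parser import HTMLParser

# The conflicting head section from the example above.
PAGE = """<head>
<meta name="robots" content="noindex, nofollow">
<meta name="robots" content="index, follow">
</head>"""


class RobotsMetaCollector(HTMLParser):
    """Gather the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # html.parser already lowercases names
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def effective_robots(html):
    collector = RobotsMetaCollector()
    collector.feed(html)
    collector.close()
    # A restrictive directive anywhere on the page wins, whatever the tag order.
    return {
        "index": "noindex" not in collector.directives,
        "follow": "nofollow" not in collector.directives,
    }


print(effective_robots(PAGE))  # → {'index': False, 'follow': False}
```

Swapping the order of the two tags gives the same result, which matches the point above: a later "index" tag cannot cancel an earlier "noindex".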

I once had a client with an in-house CMS, and for whatever reason (I suspect it was left over from a development or staging environment) the template files themselves had a “noindex” meta robots tag in the head section. My client noticed their site wasn’t being indexed (before employing my services) and attempted to fix this by explicitly adding an “index” meta robots tag. They didn’t realize at the time that their efforts were all for naught. This situation might not be as uncommon as you think.

What about following links?

This is where things get a little unclear.

Generally speaking, if a search engine is able to crawl a page, and the meta robots tag is either not set or set to “follow”, then the links on that page will be followed by search engines. This is the most common case.

If a page is set to “noindex” but the follow value is either not set or set to “follow”, then the links on the page will still be followed. You would encounter this situation with the phantom “noindex, follow” meta tag:

<meta name="robots" content="noindex, follow">

Matt Cutts addresses this situation in a video he posted to YouTube.

So basically, the page will not be indexed, but the links will be followed. This is good for SEOs to know: even if our links appear on pages that are not indexed (like archives, category pages, etc.), they are still followed.
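The “noindex, follow” case can be sketched in Python as a single pass over a fetched page that decides whether to index it and which links to queue. The class and function names and the archive page are hypothetical, and the sketch ignores rel="nofollow" attributes on individual links.

```python
from html.parser import HTMLParser


class PageScanner(HTMLParser):
    """Record meta robots directives and <a href> targets in one pass."""

    def __init__(self):
        super().__init__()
        self.directives = set()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))
        elif tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])


def process_page(html):
    """Return (add_to_index, links_to_follow) for a fetched page."""
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    add_to_index = "noindex" not in scanner.directives
    links_to_follow = [] if "nofollow" in scanner.directives else scanner.links
    return add_to_index, links_to_follow


# A hypothetical category archive page marked "noindex, follow".
ARCHIVE = """<head><meta name="robots" content="noindex, follow"></head>
<body>
  <a href="/blog/robot-wars/">Robot Wars</a>
  <a href="https://partner.example.net/">A partner site</a>
</body>"""

print(process_page(ARCHIVE))
# → (False, ['/blog/robot-wars/', 'https://partner.example.net/'])
```

The archive page itself stays out of the index, but both of its links are still queued for crawling.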

In Conclusion

On the most basic level, neither the meta robots tag nor the Robots.txt file has authority over the other; rather, the “noindex” request has authority over the “index” request.

What I’ve suggested to development teams I’ve worked with in the past is to set the Robots.txt file to allow all crawlers, then use the meta robots tag on a page-by-page basis to request that individual pages not be indexed. This is easy to remember, easy to update, and makes it easy to avoid conflicts and confusion.
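A minimal Python sketch of that setup, assuming a site template that asks a helper for each page’s meta robots tag. The NOINDEX_PATHS set and the robots_meta_tag helper are hypothetical; urllib.robotparser is used only to confirm that the allow-all file blocks nothing.

```python
from urllib.robotparser import RobotFileParser

# An allow-all robots.txt: an empty Disallow value blocks nothing.
ROBOTS_TXT = "User-agent: *\nDisallow:\n"

# Hypothetical pages the team does not want indexed.
NOINDEX_PATHS = {"/thank-you/", "/search/"}


def robots_meta_tag(path: str) -> str:
    """Return the meta robots tag for a page's <head>, or "" for the defaults."""
    if path in NOINDEX_PATHS:
        return '<meta name="robots" content="noindex">'
    return ""  # no tag needed: index, follow is the default


rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())
for path in ["/", "/blog/robot-wars/", "/thank-you/"]:
    # Every page is crawlable, so crawlers will always get to see the meta tag.
    assert rules.can_fetch("Googlebot", "https://www.example.com" + path)
    print(path, repr(robots_meta_tag(path)))
```

Keeping the noindex list in one place means the “index” decision lives only in the meta tag, so robots.txt and the page can never disagree.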

Founder at Inbounderish
Brad Knutson is a Web Developer in the Twin Cities area of Minnesota. He has experience working with WordPress and Drupal, and also has an interest in SEO and Inbound Marketing.
