If you want to influence which content on your website search engines crawl and index, you have several options. Two of them are robots meta tags and the robots.txt file. Whilst they may sound similar, they differ in important ways.
Of course, you can simply do nothing and leave the crawling and indexing of your website entirely to Google. However, this has potential disadvantages, especially with larger sites.
Crawling may take longer than necessary, and content may appear in the search results that should not be displayed there. Fortunately, both can be prevented, provided you choose the right tool for the job. This brings us to the choice between robots meta tags and robots.txt.
What are robots meta tags?
Robots meta tags are snippets that you place in the head section of a page. They look like this:
<meta name="robots" content="noindex" />
The name attribute specifies which crawler the tag addresses, and the content attribute specifies the desired action. In this example, the value robots addresses all search engines, so the tag prevents every one of them from indexing the page.
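To check which directives a page actually carries, the tag can be read programmatically. The following is a minimal sketch in Python using only the standard library; the class name RobotsMetaParser and the function robots_directives are hypothetical names chosen for illustration, and the sketch only looks at name="robots", not at crawler-specific names.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, nofollow".
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of robots directives found in the given HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noindex" /></head><body></body></html>'
print(robots_directives(page))  # {'noindex'}
```

An empty set means the page carries no robots meta tag, in which case search engines fall back to their default behaviour.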
What is robots.txt?
A robots.txt file (Robots Exclusion Protocol) is a text file that tells search engine crawlers which files or pages they may crawl. For crawlers to find it, you must upload it to the website’s root directory.
The search engine or its crawler is identified in the robots.txt file with the User-agent line. The Disallow and Allow rules specify which directories should and should not be crawled. You can also point to the location of a sitemap with the Sitemap line.
The result looks like this, for example:
# Group 1
User-agent: Googlebot
Disallow: /nogooglebot/
# Group 2
User-agent: *
Allow: /
Sitemap: http://www.example.com/sitemap.xml
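To see how a crawler might read the rules above, they can be fed to Python's standard-library urllib.robotparser. This is a sketch for illustration only: the function name load_rules, the crawler name OtherBot and the page URLs are assumptions, and Python's parser may not resolve overlapping Allow and Disallow rules exactly as Google's crawler does.

```python
from urllib.robotparser import RobotFileParser

# The example file from above, with blank lines between the groups.
ROBOTS_TXT = """\
# Group 1
User-agent: Googlebot
Disallow: /nogooglebot/

# Group 2
User-agent: *
Allow: /

Sitemap: http://www.example.com/sitemap.xml
"""


def load_rules(robots_txt):
    """Parse robots.txt content from a string (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Googlebot is blocked from the /nogooglebot/ directory only.
print(rules.can_fetch("Googlebot", "http://www.example.com/nogooglebot/page.html"))  # False
print(rules.can_fetch("Googlebot", "http://www.example.com/index.html"))  # True

# Every other crawler falls into group 2 and may crawl everything.
print(rules.can_fetch("OtherBot", "http://www.example.com/nogooglebot/page.html"))  # True

# The sitemap location is exposed as well (Python 3.8 or later).
print(rules.site_maps())  # ['http://www.example.com/sitemap.xml']
```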
Robots meta tags vs. robots.txt – when you should use what
The key difference between robots meta tags and robots.txt is as follows.
The robots.txt file is not suitable for safely excluding content from indexation. It only stops crawling: if other pages link to a blocked URL, search engines may still index that URL without ever having crawled its content.
Google therefore advises that you use the robots.txt file to manage crawling traffic and prevent image, video and audio files from appearing in search results.
By using robots meta tags with the noindex instruction, you reliably prevent pages from appearing in search results. However, you cannot use them to exclude individual image, audio or video files from indexation.
Tip: Make sure that the two measures do not interfere with each other. If a robots.txt file prohibits the crawling of a page, for example, the crawler never fetches the page and therefore cannot read its robots meta tags. A noindex instruction on that page then goes unnoticed, and the page can still end up in the index, which is exactly what you wanted to avoid.
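The conflict described in this tip can be detected automatically. The sketch below, again in Python with the standard library only, flags a page whose noindex tag is hidden behind a robots.txt block. The names NoindexFinder and noindex_is_hidden, the /private/ directory and the URLs are hypothetical and serve only as an illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class NoindexFinder(HTMLParser):
    """Sets self.noindex if a <meta name="robots"> tag contains noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = (attributes.get("content") or "").lower()
            if "noindex" in [part.strip() for part in content.split(",")]:
                self.noindex = True


def noindex_is_hidden(robots_txt, user_agent, url, html):
    """Return True if the page carries noindex but robots.txt blocks crawling it."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    finder = NoindexFinder()
    finder.feed(html)
    return finder.noindex and not rules.can_fetch(user_agent, url)


robots_txt = "User-agent: *\nDisallow: /private/\n"
html = '<head><meta name="robots" content="noindex" /></head>'

# Blocked by robots.txt, so the crawler never sees the noindex tag.
print(noindex_is_hidden(robots_txt, "Googlebot", "http://www.example.com/private/page.html", html))  # True

# Crawlable, so the noindex tag can be read and respected.
print(noindex_is_hidden(robots_txt, "Googlebot", "http://www.example.com/public/page.html", html))  # False
```

If the function returns True for a page, the usual remedy is to lift the robots.txt block for that URL so that the noindex instruction can take effect.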