Robots.txt and meta robots tags
Robots.txt and meta robots tags are used by search engine optimisation agencies and webmasters to give instructions to the crawlers traversing and indexing a website. They tell the search spider what to do with a specific web page: for example, that it should not crawl the page at all, or that it may crawl the page but should not include it in Google's index.
What is robots.txt?
Robots.txt, the file at the heart of the Robots Exclusion Protocol, is a text file used to instruct robots, or 'crawlers', which pages on a site they may visit. Search engine optimisation agencies that use robots.txt effectively can tell crawlers what to visit on a site, giving you control over how your site is crawled. Common directives include:
Noindex: This permits the crawling of the page, but not the indexing. It also tells the search engines that, if the page is currently in the index, it should be removed.
Disallow: Blocks crawling of the page (note that a disallowed page can still end up in the index if other sites link to it).
Nofollow: Tells the search engines not to follow links on the page. This is extremely important to search engine optimisation, and so we have gone into greater detail at Nofollow tags. The reverse of this directive is 'follow'.
Nocache: Tells the search engines not to save a copy of the web page in their cache.
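For illustration, a simple robots.txt file that blocks all crawlers from two directories might look like this (the paths are invented for the example; Disallow and Allow are the most widely supported rules, while directives such as Noindex are more reliably applied via meta robots tags):

```
User-agent: *
Disallow: /private/
Disallow: /tmp/
Allow: /
```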
You can view a site’s robots.txt file by going to: www.example.com/robots.txt.
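You can also check these rules programmatically. As a minimal sketch using Python's standard-library robots.txt parser (the rules and URLs below are invented for the example):

```python
# Sketch: checking robots.txt rules with urllib.robotparser.
from urllib import robotparser

rp = robotparser.RobotFileParser()
# In practice you would call rp.set_url("https://www.example.com/robots.txt")
# followed by rp.read(); here we parse an example file inline instead.
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# can_fetch(user_agent, url) reports whether crawling is permitted.
print(rp.can_fetch("*", "https://www.example.com/private/page.html"))  # False
print(rp.can_fetch("*", "https://www.example.com/index.html"))         # True
```

A crawler would run this check before requesting each page, skipping any URL for which can_fetch returns False.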
What are meta robots tags?
Meta robots tags are used in addition to the robots.txt file to target individual pages rather than the site as a whole. Meta robots tags let you control the behaviour of search bots at the page level, via a directive placed in the page's head section. This gives users what Google refers to as 'fine grain control' over a site.
The code looks like this:
<meta name="ROBOTS" content="NOINDEX, NOFOLLOW">