The Robots Exclusion Standard (robots.txt) is a text file placed at the root of a website. The standard was proposed by Martijn Koster in 1994. The main purpose of a robots.txt file is to tell web crawlers (robots) which pages of a website they may or may not index.
When a robot wants to visit a web page (e.g. http://www.techigniter.in), it first fetches the robots.txt file at the site root (e.g. http://www.techigniter.in/robots.txt) and checks whether it has permission to crawl the page.
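This permission check can be sketched with Python's standard-library `urllib.robotparser`. The rules below are a hypothetical robots.txt for illustration; a real crawler would download the file from the site root before requesting pages.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (a real crawler would fetch
# http://www.techigniter.in/robots.txt instead of hard-coding it).
rules = """\
User-agent: *
Disallow: /cgi-bin/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Pages outside the disallowed directory may be crawled.
print(parser.can_fetch("*", "http://www.techigniter.in/index.html"))   # True
# Pages under /cgi-bin/ are blocked for every user agent.
print(parser.can_fetch("*", "http://www.techigniter.in/cgi-bin/app"))  # False
```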
User-agent: <user agent name>
Disallow: <page/directory>
User-agent: Specifies which robot or crawler the rules that follow apply to.
Disallow: Tells the crawler not to visit the specified page or directory.
To block your entire website from all web crawlers, use the following rules:
User-agent: *
Disallow: /
To allow all robots full access:
User-agent: *
Disallow:
To block specific directories of your website from all crawlers:
User-agent: *
Disallow: /cgi-bin/
Disallow: /Images/
Disallow: /api/
To disallow a single bot from accessing your site:
User-agent: <Bot name>
Disallow: /
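The per-bot rules above can be sketched with `urllib.robotparser` as well. Here "BadBot" is a hypothetical crawler name blocked entirely, while every other user agent is allowed:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: "BadBot" may fetch nothing; all other bots
# get full access (an empty Disallow value allows everything).
rules = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("BadBot", "http://www.techigniter.in/page.html"))     # False
print(parser.can_fetch("Googlebot", "http://www.techigniter.in/page.html"))  # True
```

Note that the more specific `User-agent: BadBot` group takes precedence over the `*` group for that bot.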
You can also use meta tags in your web pages to tell a bot whether or not to index a page's content:
<meta name="robots" content="<directives>">
1. The crawler will not index web page content:
<meta name="robots" content="noindex, nofollow">
2. The crawler will index web page content:
<meta name="robots" content="index, follow">
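A crawler reads these directives by parsing the page's HTML. As a minimal sketch using only Python's standard-library `html.parser` (the class name and sample page below are made up for illustration):

```python
from html.parser import HTMLParser

# Minimal parser that pulls the "robots" meta directives out of a page,
# assuming the tag appears in the document <head>.
class RobotsMetaParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.directives = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            self.directives = attrs.get("content", "")

# Hypothetical page that forbids indexing and link-following.
page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
parser = RobotsMetaParser()
parser.feed(page)
print(parser.directives)  # noindex, nofollow
```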
Hope you find this article helpful.