Common Robots.txt Mistakes People Make And How To Avoid Them

Robots.txt is one of the most important files for SEO. It is the first thing a crawler checks when it visits your website, and it tells the crawler which parts of the site it may and may not crawl. A small mistake in any directive in this file can lead to poor crawlability, which directly affects website rankings.

In this blog post, we will cover some of the most common mistakes people make while creating a robots.txt file, and how to avoid them.

Common Robots.txt Mistakes

  • Not placing the file in the root directory

One of the common mistakes that people make is forgetting to place the file in the correct location. A robots.txt file should always be placed in the root directory of your website. Placing it within other sub-directories makes the file undiscoverable for the crawler when it visits your website.

Incorrect way – https://www.example.com/assets/robots.txt

Correct way –  https://www.example.com/robots.txt

  • Improper use of wildcards

Wildcards are special characters used in the directives defined for crawlers in a robots.txt file. Two wildcards can be used in a robots.txt file: * and $. The character * represents “all”, or zero or more instances of any valid character, and the character $ represents the end of a URL. Understand how wildcards work in the example below and use them wisely.

Example of correct implementation

User-Agent: *               (Here * is used to represent all types of user agents)

Disallow: /assets*       (Here * means that any URL whose path begins with “/assets” will be blocked)

Disallow: *.pdf$           (This directive denotes that any URL ending with the .pdf extension should be blocked)

Do not use wildcards unnecessarily, or you might end up blocking far more than the single URL you had in mind, as in the example below.
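For instance, suppose you only want to block a single print-friendly page. The paths below are purely illustrative, not directives from any real site:

Disallow: /print*              (too broad – this also blocks URLs such as /printers/ and /printing-guide)

Disallow: /print-view$     (tighter – this blocks only the URL whose path is exactly /print-view)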

  • Unnecessary use of trailing slash

Another common mistake is adding an unnecessary trailing slash while blocking or allowing a URL in robots.txt. For example, suppose you wish to block the URL https://www.example.com/category.

What happens if you add an unnecessary trailing slash?

User-Agent: * 

Disallow: /category/

This tells Googlebot not to crawl any URLs inside the “/category/” folder. However, this directive will not block the URL example.com/category, because that URL does not have a trailing slash and therefore does not match the /category/ prefix.

The ideal way to block the URL

User-Agent: * 

Disallow: /category
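If you want to double-check how a directive behaves before deploying it, Python’s built-in urllib.robotparser module can simulate the prefix matching described above. This is a minimal sketch; the example.com URLs and the /category rules are illustrations only, not directives from any real site.

from urllib import robotparser

def is_blocked(disallow_rule, url):
    # Build a tiny in-memory robots.txt with a single rule and test the URL against it
    rp = robotparser.RobotFileParser()
    rp.parse(["User-Agent: *", "Disallow: " + disallow_rule])
    return not rp.can_fetch("*", url)

print(is_blocked("/category/", "https://www.example.com/category"))        # False – not blocked
print(is_blocked("/category", "https://www.example.com/category"))         # True – blocked
print(is_blocked("/category", "https://www.example.com/category/books"))   # True – blocked

The first call shows the trailing-slash pitfall: the rule /category/ leaves the bare /category URL crawlable, while /category blocks both the page and everything under it.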

  • Using the NoIndex directive in robots.txt

This is an old practice that has now been discontinued. Google officially announced that the noindex directive would stop working in robots.txt files from September 1, 2019. If you are still using it, remove it. Instead, add a noindex rule in the robots meta tag of the URLs you don’t want indexed by Google.

[Image: example of the now-unsupported noindex directive in a robots.txt file]

Use the meta robots tag instead

<meta name="robots" content="noindex" />

Add this snippet to the <head> of the pages you want to block Google from indexing, rather than using a noindex directive in the robots.txt file.
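For non-HTML files such as PDFs, where a meta tag cannot be added to the page code, the same noindex signal can be sent as an HTTP response header instead (how you configure the header depends on your server):

X-Robots-Tag: noindex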

  • Not mentioning the sitemap URL

People often forget to mention the sitemap location in the robots.txt file. Specifying the sitemap location helps the crawler discover the sitemap from the robots.txt file itself, so Googlebot doesn’t have to spend time finding it. Making things easier for crawlers will always help your website.

How do you define the sitemap location in the robots.txt file?

Simply add the directive below to your robots.txt file to declare your sitemap.

Sitemap: https://www.example.com/sitemap.xml
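The Sitemap directive is independent of the User-Agent groups, so it can be placed anywhere in the file, and you can list more than one sitemap if your site has several. A sketch of how it might look (the /admin path and sitemap names are illustrative):

User-Agent: *
Disallow: /admin

Sitemap: https://www.example.com/sitemap.xml
Sitemap: https://www.example.com/news-sitemap.xml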

  • Blocking CSS and JS

People often think that CSS and JS files may get indexed by Googlebot and end up blocking them in robots.txt. Google’s John Mueller has advised against blocking JS and CSS files, as Googlebot needs to crawl them to render pages properly. If Googlebot is unable to render your pages, it is likely that those pages won’t be indexed or ranked. You can read more about Mueller’s advice here.
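If a folder you have already blocked also happens to contain CSS or JS files (the /assets/ path below is only an illustration), you don’t have to unblock the whole folder. More specific Allow rules can carve the render-critical files back out, since Google follows the most specific matching rule:

User-Agent: *
Disallow: /assets/
Allow: /assets/*.css
Allow: /assets/*.js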

  • Not creating a dedicated robots.txt file for each sub-domain

Every sub-domain of a website, including staging sub-domains, should have its own dedicated robots.txt file. Not doing so can lead to crawling and indexing of unwanted sub-domains (for example, staging or API sub-domains) and inefficient crawling of the important ones. Hence, ensure that a robots.txt file is defined and customized for each sub-domain.
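Crawlers only read the robots.txt served by the exact host they are crawling, so each sub-domain needs its own file at its own root. The sub-domain names below are only an illustration:

https://www.example.com/robots.txt          (rules for the main site)
https://blog.example.com/robots.txt         (separate rules for the blog sub-domain)
https://staging.example.com/robots.txt      (blocks everything, as shown in the next section)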

  • Not blocking crawlers from accessing the staging site

Changes to a website are usually tested on a staging or test site before being deployed to the main website. But people often forget that, to Googlebot, a staging website is just another website: it can be discovered, crawled, and indexed like any other. If you don’t block crawlers from your staging site, there is a high chance that staging URLs will get indexed and might even rank for some queries, which is the last thing you want.

People often copy the robots.txt file from their main website to the staging website, which is wrong. Always block crawlers from crawling your staging site by adding the following directives to the staging site’s robots.txt file:

User-Agent: *

Disallow: /

  • Ignoring case sensitivity

It is important to remember that URLs are case sensitive for crawlers. For example, https://www.example.com/category and https://www.example.com/Category are two different URLs to a crawler. Hence, when defining directives in the robots.txt file, make sure the paths match the case of the actual URLs you want to block or allow.

Let’s say you want to block the URL https://www.example.com/news

Incorrect approach

User-Agent: *

Disallow: /News

Correct approach

User-Agent: *

Disallow: /news
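If the same content happens to be reachable under both casings (ideally you would consolidate these with redirects; the paths below are only illustrative), each variant needs its own rule, because directives match the URL path exactly as written:

User-Agent: *
Disallow: /news
Disallow: /News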

Conclusion

These were some of the most common robots.txt mistakes that can seriously harm SEO. Robots.txt is a small yet very important file that is easy to set up, so take the utmost care when creating it and double-check every directive.

Have you made any of these errors? What was the impact? Let us know in the comments section below.
