Bots crawl your site all the time. Some are good bots from search engines like Google. Some are bad bots, like link spammers and hackers. The robots.txt file lets you give bots the directions they need to crawl your site efficiently, and keep them out of areas where they have no business being.
What is a robots.txt file?
Think of it as a roadmap of your site.
What can be in my robots.txt file?
This special file can be given a set of directives to tell bots:
- where the XML sitemap of your site is located
- which site files should be scanned and indexed
- which site files should be ignored
- how frequently bots can scan your site files
- what type of bot can access certain areas
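To make that list concrete, here is a minimal sketch in Python, using the standard library's urllib.robotparser, of how a well-behaved bot might read each kind of directive. The sample file contents, the example.com URL, the paths, and the BadBot name are all made up for illustration. They are not a recommendation for what your own file should contain.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
SAMPLE_ROBOTS_TXT = """\
Sitemap: https://www.example.com/sitemap_index.xml

User-agent: *
Crawl-delay: 10
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS_TXT.splitlines())

# Where the XML sitemap is located (site_maps() needs Python 3.8+).
sitemaps = parser.site_maps()

# How long a bot is asked to wait between requests, in seconds.
delay = parser.crawl_delay("Googlebot")

# Which files may be crawled, and by which type of bot.
blog_ok = parser.can_fetch("Googlebot", "/blog/")
admin_ok = parser.can_fetch("Googlebot", "/wp-admin/")
ajax_ok = parser.can_fetch("Googlebot", "/wp-admin/admin-ajax.php")
badbot_ok = parser.can_fetch("BadBot", "/blog/")

print(sitemaps, delay, blog_ok, admin_ok, ajax_ok, badbot_ok)
```

Python's parser applies the first rule that matches, which is why the Allow line sits above the broader Disallow line in this sample. Other bots may resolve overlapping rules differently.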
What should be in my robots.txt file?
First, be sure you have one!
Not all hosts install it for you when you set up your site.
If there is a default robots.txt file, then it likely only has a crawl delay.
It should at least have the URL path to your XML sitemap so bots can quickly find it.
If you are using the Yoast SEO plugin, I recommend using its XML sitemap generation tool. It will even give you an easy way to find the URL of the master map. You’ll want to submit that to Google Search Console too!
Where is my robots.txt file?
It should be in the same host folder where your site files are located. If you have one site, that will likely be in the public_html root.
How do I see my robots.txt file?
This is a publicly viewable file. In a browser, enter the home page URL of your site, followed by /robots.txt.
For BlogAid, that would be http://www.blogaid.net/robots.txt
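If you check a lot of sites, the same rule can be written as a small helper. This is just a sketch in Python: the function name robots_txt_url and the /some-post/ path are my own placeholders, and it only builds the address without fetching anything.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, and swap whatever path was given
    # for /robots.txt at the root of the site.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Any page on the site leads to the same robots.txt address.
print(robots_txt_url("http://www.blogaid.net/"))
print(robots_txt_url("http://www.blogaid.net/some-post/?ref=home"))
```

Both calls print http://www.blogaid.net/robots.txt, since the file always lives at the root of the host no matter which page you start from.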
How is robots.txt different from my .htaccess file?
What’s in the .htaccess file is the law.
All rules in it must be followed by everything that visits your site, both human and bot.
The robots.txt file is specifically for bots. And it contains directives, or suggestions.
Bots can, and do, choose to ignore those directives at will.
For the most part, well-behaved bots, like Googlebot, will obey the directives. Ill-behaved bots will not.
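Here is a rough sketch in Python of why that is. The check happens inside the bot's own code, so obeying the file is entirely up to whoever wrote the bot. The rules, the paths, and the names polite_bot, rude_bot, and PoliteBot are all hypothetical, and nothing here makes a real request.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: ask every bot to stay out of /private/.
RULES = """\
User-agent: *
Disallow: /private/
"""

PATHS = ["/", "/blog/", "/private/report.html"]


def polite_bot(paths, rules_text, agent="PoliteBot"):
    """Return only the paths the rules permit for this agent."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return [p for p in paths if parser.can_fetch(agent, p)]


def rude_bot(paths):
    """Never consult robots.txt; the file cannot stop this bot."""
    return list(paths)


print(polite_bot(PATHS, RULES))  # the /private/ path is skipped
print(rude_bot(PATHS))           # every path would be requested
```

The only difference between the two bots is whether they bother to look. Actually blocking a visitor takes a server-side rule, such as one in .htaccess.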
What else do I need in my robots.txt file?
I don’t mean to give you a vague answer on this. But this is one of those questions where, if you ask 10 different pro SEO folks, you’re going to get at least 6 different answers (a few of them will actually agree with one another).
Here’s how I’ve determined what should be in my robots.txt file.
- I have pages that I don’t want to be indexed by search engines
- I have performance test data from my sites and hundreds of site audit client sites showing that the elements in my file improve site load time and reduce the load on host server resources
- Google has no issues with my robots.txt file
How a robots.txt file can harm your site rankings
You need to be especially careful with any blocking directives in this special file.
If Google determines that it needs access to areas of your site that you have blocked, it could penalize your rankings.
If you set too high a crawl delay to improve performance, Google could decide that it takes too long to index your site and slap you on the SEO wrist.
And Google does move the cheese on this. I saw this when the Google Mobile Friendly tester launched. It checked different site elements and places than Google PageSpeed Insights checked. And the mobile render of the site in each of those was different too.
I had to change my robots.txt file to suit Google’s new testers.
What’s good for Google may not be good for your site!
Here’s the thing. Google would prefer that you have no robots.txt and/or that you allow it any access it desires.
The problem is, Google is not the only army of bots crawling your site.
I’ve seen performance issues when bots are allowed to just sit there and chew on the site unabated.
Do your homework
So, do your own research and determine what directives are best for your site.
Make sure you read PLENTY of posts on it!
No matter what advice you find in one, you’ll find another post that contradicts it. And both will have data to back up their claims.
The robots.txt file is just one of the security checks I do in site audits.
In my Webmaster Training I teach designers and technical VAs how to install and maintain secure sites, including exactly what should be in the robots.txt file.