How to use robots.txt

What is robots.txt?

robots.txt is a plain text file placed in the root directory of your website, so it is reachable at a URL like https://example.com/robots.txt. It tells search engine crawlers which pages or files they can or can't request from your site.

How to create a robots.txt file

To create a robots.txt file, you can use a text editor like Notepad or TextEdit. The file should be named robots.txt and placed in the root directory of your website.

Here is an example of a simple robots.txt file:

User-agent: *

Allow: /
Allow: /archives/
Allow: /page/
Allow: /schedule/
Allow: /tags/
Allow: /categories/
Allow: /search/
Allow: /about/
Allow: /links/
Allow: /friends/
Allow: /sitemap.xml
Allow: /sitemap.txt
Allow: /atom.xml
Allow: /feed.xml
Allow: /ads.txt
Allow: /manifest.json

Disallow: /js/
Disallow: /css/
Disallow: /images/

Sitemap: https://models.net.cn/sitemap.xml
Sitemap: https://models.net.cn/sitemap.txt

In the example above, the User-agent: * line states that the rules apply to all search engine crawlers. The Allow lines list directories and files the crawler may access, while the Disallow lines list directories and files it must not request. Because crawlers may access anything that is not explicitly disallowed, Allow rules are mainly useful for carving out exceptions to a broader Disallow rule. The Sitemap lines tell crawlers where to find the site's sitemap files.
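
To see how a crawler evaluates these rules, here is a minimal sketch using Python's standard urllib.robotparser module. It parses a trimmed, blank-line-free subset of the rules above (Python's parser treats a blank line as the end of a group, unlike Google's more lenient parser) and checks a couple of paths. The "Googlebot" user agent string and the test URLs are only illustrative.

from urllib.robotparser import RobotFileParser

# A trimmed subset of the example rules, without blank lines inside the group.
rules = """\
User-agent: *
Allow: /archives/
Disallow: /js/
Disallow: /css/
Disallow: /images/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)  # parse in-memory rules; nothing is fetched

# Ask whether a given user agent may fetch specific URLs.
for url in ("https://models.net.cn/archives/",
            "https://models.net.cn/js/app.js"):
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(url, "->", verdict)

Note that urllib.robotparser applies rules in file order, while Google uses the most specific (longest) matching rule; for a simple rule set like this one the results are the same.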

How to test your robots.txt file

To test your robots.txt file, you can use the robots.txt report in Google Search Console (it replaced the older robots.txt Tester tool). The report shows the robots.txt file Google has fetched for your site and flags any lines it could not parse.

To use it, follow these steps:

  1. Go to Google Search Console and sign in.
  2. Select the property whose robots.txt file you want to check.
  3. Open the robots.txt report from the property's Settings page.
  4. Review the fetched file and fix any warnings or parse errors that are reported.
  5. To check a specific page, enter its URL in the URL Inspection tool; the result indicates whether crawling is blocked by robots.txt.
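
You can also test the deployed file outside of Search Console. The sketch below, again using Python's urllib.robotparser, fetches the live robots.txt from the site root and checks a few paths. The domain and paths are just the ones from the example, and site_maps() requires Python 3.8 or newer.

from urllib.robotparser import RobotFileParser

# Fetch and parse the robots.txt that is actually being served.
parser = RobotFileParser()
parser.set_url("https://models.net.cn/robots.txt")
parser.read()

# Check a few representative URLs for a generic crawler.
for path in ("/archives/", "/css/style.css"):
    url = "https://models.net.cn" + path
    verdict = "allowed" if parser.can_fetch("*", url) else "blocked"
    print(url, "->", verdict)

# List the Sitemap URLs declared in the file (Python 3.8+).
print(parser.site_maps())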

robots.txt best practices

Here are some best practices for using robots.txt:

  1. Make sure your robots.txt file is located in the root directory of your website.
  2. Use the User-agent: * line to apply rules to all search engine crawlers.
  3. Use the Allow and Disallow lines to specify which directories or files the crawler can or can’t access.
  4. Use the Sitemap line to specify the location of the sitemap file for the website.
  5. Test your robots.txt file using the robots.txt report in Google Search Console.

By following these best practices, you can ensure that search engine crawlers are able to crawl your website effectively and efficiently.
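
As a quick sanity check for points 1 and 4, the following sketch fetches the file from the site root and confirms that it is served successfully and declares at least one Sitemap. It uses only the Python standard library; the domain is the one from the example and should be replaced with your own.

import urllib.request

# Verify that robots.txt is reachable at the site root and declares a sitemap.
url = "https://models.net.cn/robots.txt"  # example domain; use your own site

with urllib.request.urlopen(url) as response:
    body = response.read().decode("utf-8", errors="replace")
    print("HTTP status:", response.status)  # expect 200
    has_sitemap = any(line.lower().startswith("sitemap:")
                      for line in body.splitlines())
    print("Declares a sitemap:", has_sitemap)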
