Create a properly formatted robots.txt file to control how search engines crawl your website. Add rules, sitemaps, and crawl delays.
robots.txt is a text file placed in your website's root directory that tells search engine crawlers which pages or sections they can or cannot access. It's part of the Robots Exclusion Protocol (REP) standard. While well-behaved bots respect robots.txt, it is NOT a security measure; it's merely a suggestion.
| Directive | Purpose | Example |
|---|---|---|
| User-agent | Target specific crawler | User-agent: Googlebot |
| Disallow | Block path from crawling | Disallow: /admin/ |
| Allow | Override disallow rule | Allow: /admin/public/ |
| Sitemap | Point to sitemap URL | Sitemap: https://example.com/sitemap.xml |
| Crawl-delay | Seconds to wait between requests (not honored by Googlebot) | Crawl-delay: 10 |
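Putting the directives together, a minimal robots.txt might look like the sketch below (the paths and sitemap URL are placeholders, not a recommendation):

```txt
# Applies to every crawler not matched by a more specific group
User-agent: *
Disallow: /admin/
Allow: /admin/public/
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
```

Rules are grouped by User-agent; a crawler follows the group whose name most specifically matches it and ignores the others.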
robots.txt only prevents crawling, not indexing. If other sites link to a blocked page, Google may still index it, showing the bare URL without a snippet. To keep a page out of search results, use the robots "noindex" directive instead.
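For reference, the noindex directive can be delivered in the page markup or, for non-HTML resources, as an HTTP response header:

```txt
<!-- In the page's <head> -->
<meta name="robots" content="noindex">

# Or as an HTTP response header
X-Robots-Tag: noindex
```

Note that crawlers can only see either signal if the page is NOT blocked in robots.txt, since a blocked page is never fetched.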
Whether to block AI crawlers is a matter of preference. Blocking GPTBot prevents OpenAI from using your content for model training. Many publishers now block AI crawlers while keeping search engine bots allowed. This tool lets you selectively block specific AI bots.
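To see how selective blocking behaves, Python's standard-library `urllib.robotparser` can evaluate a rule set the same way a compliant crawler would. The rules below are a made-up example (GPTBot is OpenAI's published crawler name; the URLs are placeholders):

```python
from urllib import robotparser

# Hypothetical robots.txt: block GPTBot entirely,
# keep only /admin/ off-limits for everyone else.
rules = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("GPTBot", "https://example.com/article"))     # False: blocked
print(rp.can_fetch("Googlebot", "https://example.com/article"))  # True: allowed
print(rp.can_fetch("Googlebot", "https://example.com/admin/x"))  # False: blocked by *
```

Because GPTBot matches its own User-agent group, the wildcard group does not apply to it; search engine bots fall through to the `*` group and keep normal access.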