A single misplaced Disallow rule in your robots.txt can cut your entire website off from search engines.
Robots.txt is a plain-text file placed at the root of your website that tells crawlers (search engines, AI systems, and other bots) which pages they are allowed to visit. It is one of the simplest files on the web, yet a single mistake can have serious consequences: if you accidentally block Google from your entire site, Googlebot stops crawling it, and your pages can start losing their search listings within days.
This guide covers the correct robots.txt syntax, the most common mistakes, how to block AI crawlers like GPTBot and ClaudeBot, and how to test your file before deploying it.
Robots.txt is not a security mechanism — it is a courtesy protocol. Compliant crawlers (Google, Bing, and most reputable bots) read the file before crawling your site and follow its rules. Non-compliant crawlers ignore it entirely. You cannot use robots.txt to prevent a determined bad actor from accessing your content; it only governs bots that choose to respect it.
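The file must live at yourdomain.com/robots.txt: exactly that URL, not in a subdirectory. It applies only to the host, protocol, and port it is served from, so a subdomain such as blog.yourdomain.com needs its own file. Google generally caches the file for up to 24 hours (longer if it cannot fetch a fresh copy), so changes may take a day to take effect.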
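Because the location is fixed, a crawler can derive the robots.txt URL from any page URL by keeping the scheme and host and dropping everything else. The Python sketch below illustrates that rule with a hypothetical `robots_url` helper built on the standard library's `urllib.parse`; it is an illustration, not code taken from any particular crawler.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (hypothetical helper).

    The scheme and network location (host plus any port) are kept;
    the path, query string, and fragment are replaced by /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    for url in (
        "https://example.com/blog/post?id=7",
        "https://shop.example.com/cart",
        "http://example.com:8080/admin/",
    ):
        print(url, "->", robots_url(url))
```

Note that the subdomain and the non-default port each resolve to a different file.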
The file is made up of blocks called “records,” each beginning with a User-agent line that identifies which bot the rules apply to, followed by one or more Disallow or Allow lines.
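The record structure can be sketched as a small Python parser. `Record` and `parse_records` are hypothetical names chosen for this example, and the sketch deliberately ignores wildcards, rule precedence, and non-standard directives such as Crawl-delay; it only shows how lines group into records and how Sitemap lines sit outside them.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Record:
    """One record: the bots it names and the rules that apply to them."""
    user_agents: list[str] = field(default_factory=list)
    rules: list[tuple[str, str]] = field(default_factory=list)  # (directive, path)


def parse_records(text: str) -> tuple[list[Record], list[str]]:
    """Group robots.txt lines into records; collect Sitemap lines separately."""
    records: list[Record] = []
    sitemaps: list[str] = []
    current: Record | None = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue  # blank or malformed line
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            # Consecutive User-agent lines share one record; a User-agent
            # line that follows rules starts a new record.
            if current is None or current.rules:
                current = Record()
                records.append(current)
            current.user_agents.append(value)
        elif key in ("allow", "disallow") and current is not None:
            current.rules.append((key, value))
        elif key == "sitemap":
            sitemaps.append(value)  # Sitemap is not tied to any record
    return records, sitemaps


EXAMPLE = """\
User-agent: *
Disallow: /admin/
Allow: /

User-agent: BadBot
Disallow: /
Sitemap: https://example.com/sitemap.xml
"""

if __name__ == "__main__":
    records, sitemaps = parse_records(EXAMPLE)
    for record in records:
        print(record.user_agents, record.rules)
    print(sitemaps)
```

Running it prints two records, one for * and one for BadBot, plus the sitemap list. Several consecutive User-agent lines would share a single record.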
User-agent: *
Disallow:
Sitemap: https://example.com/sitemap.xml
An empty Disallow value means no pages are blocked. This is the correct configuration for most public-facing websites. Adding a Sitemap directive helps crawlers discover your content faster.
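You can confirm the meaning of an empty Disallow with Python's built-in `urllib.robotparser` module. The sketch below parses the allow-all file from a string rather than fetching it, so nothing needs to be deployed; the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

ALLOW_ALL = """\
User-agent: *
Disallow:
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ALLOW_ALL.splitlines())  # parse() takes an iterable of lines

# An empty Disallow value blocks nothing, so every path is fetchable.
for path in ("/", "/blog/post-1", "/admin/settings"):
    print(path, parser.can_fetch("Googlebot", "https://example.com" + path))

print(parser.site_maps())  # Sitemap URLs found in the file (Python 3.8+)
```

Every path prints True, and `site_maps()` returns the Sitemap URL as a one-item list.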
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /
Sitemap: https://example.com/sitemap.xml
User-agent: BadBot
Disallow: /
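The two examples above can be checked the same way. One caveat to keep in mind: Python's parser applies the first matching rule in a record, whereas Google applies the most specific (longest) matching rule, so the two can disagree on files that mix overlapping Allow and Disallow lines. For this file they agree. The URLs and the Googlebot token used for testing are placeholders.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /
Sitemap: https://example.com/sitemap.xml

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

checks = [
    ("Googlebot", "https://example.com/blog/"),        # matches only Allow: /
    ("Googlebot", "https://example.com/admin/users"),  # matches Disallow: /admin/
    ("Googlebot", "https://example.com/private/x"),    # matches Disallow: /private/
    ("BadBot", "https://example.com/blog/"),           # BadBot has its own record
]
for agent, url in checks:
    print(f"{agent:10} {url:35} allowed={parser.can_fetch(agent, url)}")
```

BadBot is refused everywhere because a crawler follows only the record that names it and ignores the * record entirely.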
Several AI companies have deployed their own web crawlers to collect training data. If you want to prevent your content from being used to train AI models, you can block these crawlers by name in robots.txt:
User-agent: GPTBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
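Important: these rules only affect compliant crawlers. OpenAI's GPTBot and Anthropic's ClaudeBot are documented as respecting robots.txt, but many third-party scrapers that may supply AI training data do not. CCBot belongs to Common Crawl, whose public dataset is widely used for AI training. Google-Extended is not a separate crawler: it is a token that controls whether content Google crawls may be used to train its Gemini models, and blocking it does not affect Google Search. Robots.txt is a meaningful signal, not a guarantee of enforcement.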
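A quick way to confirm that the AI-crawler file blocks only the bots you named is to run each user-agent token through `urllib.robotparser`. This checks the file's logic only; it says nothing about whether a given company's crawler actually honors it. The page URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Same layout as the example above: no blank lines between records.
AI_BLOCK = """\
User-agent: GPTBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(AI_BLOCK.splitlines())

url = "https://example.com/articles/robots-txt-guide"
for agent in ("GPTBot", "CCBot", "Google-Extended", "ClaudeBot", "Googlebot", "Bingbot"):
    print(f"{agent:16} allowed={parser.can_fetch(agent, url)}")
```

The four AI tokens print False, while Googlebot and Bingbot fall through to the * record and print True.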
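This is one of the most common points of confusion in SEO: robots.txt controls crawling, not indexing. A URL blocked in robots.txt can still appear in search results, usually without a description, if other pages link to it.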
Never use robots.txt to block pages you also want deindexed. Use <meta name="robots" content="noindex"> on those pages instead, and leave robots.txt open so Googlebot can read the noindex instruction.
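This conflict can be detected automatically. The sketch below is a hypothetical audit helper, not part of any SEO tool: `noindex_is_hidden` reports True when a page carries a noindex meta tag but robots.txt prevents the crawler from fetching the page, which means the tag will never be seen. It uses only `html.parser` and `urllib.robotparser` from the standard library, and it does not look at the X-Robots-Tag HTTP header.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag != "meta":
            return
        attributes = {name: value or "" for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def noindex_is_hidden(robots_txt: str, url: str, html: str, agent: str = "Googlebot") -> bool:
    """True if the page asks for noindex but robots.txt stops `agent` from fetching it."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives and not rules.can_fetch(agent, url)


PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'

print(noindex_is_hidden("User-agent: *\nDisallow: /old-page\n", "https://example.com/old-page", PAGE))  # True
print(noindex_is_hidden("User-agent: *\nDisallow:\n", "https://example.com/old-page", PAGE))  # False
```

The first call flags the problem; the second shows the fix, where robots.txt stays open so the noindex tag can be read.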
Don't block the CSS, JavaScript, and image files Google needs to render your pages, such as /assets/, /static/, or similar resource directories.

Disallow: /admin and Disallow: /admin/ behave differently. Without a trailing slash, the rule applies to any URL starting with /admin, including /administrator. Use trailing slashes for directories.

The SlugGenius Robots.txt Generator includes preset configurations for the most common scenarios: allow all crawlers, block all crawlers, or block AI crawlers specifically. You can select multiple user-agents, add a crawl delay (honored by Bing but ignored by Googlebot), and set your sitemap URL; the tool outputs a valid, ready-to-deploy robots.txt file with no syntax errors.
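The trailing-slash difference described above is easy to verify. This sketch compares the two rules for a few paths using `urllib.robotparser`; `allowed` is a hypothetical helper defined here, and example.com is a placeholder.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules: str, path: str, agent: str = "Googlebot") -> bool:
    """Check one path against a robots.txt body (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, "https://example.com" + path)


NO_SLASH = "User-agent: *\nDisallow: /admin\n"
WITH_SLASH = "User-agent: *\nDisallow: /admin/\n"

for path in ("/admin/", "/admin/users", "/administrator", "/admin-guide.html"):
    # Each rule is a plain prefix match against the URL path.
    print(f"{path:18} /admin -> {allowed(NO_SLASH, path)!s:5}  /admin/ -> {allowed(WITH_SLASH, path)}")
```

/administrator and /admin-guide.html are blocked by Disallow: /admin but allowed by Disallow: /admin/, while both rules block /admin/ and /admin/users.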
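After deploying your file, open the robots.txt report in Google Search Console (Settings › robots.txt). It shows the version Google last fetched, when it was fetched, and any parsing errors or warnings, and it lets you request a recrawl after an urgent fix. Google retired its older interactive robots.txt Tester in late 2023, so to check whether a specific URL is blocked, use the URL Inspection tool; to test rules before deploying, run them through a parser such as Google's open-source robots.txt library.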
A correct robots.txt is invisible — it causes no problems and you never think about it. An incorrect one can silently destroy months of SEO work. Get it right once and revisit it any time you restructure your site.