Guide · 5 min read · April 16, 2026

Robots.txt for AI Crawlers: Allow GPTBot, ClaudeBot & More (2026)

Which AI bots crawl the web, how to configure your robots.txt to allow or block them, and how to test your configuration.

Surfacedd Team

Your robots.txt file tells AI crawlers whether they may access your site. If GPTBot is blocked, OpenAI's models learn less about your content. If OAI-SearchBot is blocked, ChatGPT search cannot cite it. If PerplexityBot is blocked, Perplexity cannot include you in search results. Many brands unknowingly block AI crawlers with overly restrictive robots.txt rules and lose AI visibility as a result. This guide lists the major AI crawlers, shows how to configure robots.txt correctly, and explains how to test it.

AI Crawlers You Need to Know

Here are the AI bots actively crawling the web in 2026:

OpenAI Bots

GPTBot — Used to improve and train OpenAI models. Allowing it increases the chance ChatGPT knows about your brand
OAI-SearchBot — Used specifically for ChatGPT's search feature and ChatGPT Shopping
ChatGPT-User — Makes real-time requests when a ChatGPT user asks for current information

Anthropic Bots

ClaudeBot — Used by Anthropic to train Claude models
anthropic-ai — Secondary identifier for Anthropic crawling

Perplexity Bots

PerplexityBot — Crawls pages in real time to generate answers for Perplexity search

Google AI Bots

Googlebot — Used for both traditional search and AI Overviews (they share the same crawler)
Google-Extended — Controls whether your content is used for Gemini and AI training (separate from search indexing)

Other AI Bots

CCBot — Common Crawl bot, used as training data by many AI models
Bytespider — TikTok/ByteDance crawler, used for AI features
Applebot-Extended — Apple's crawler for AI features including Apple Intelligence
cohere-ai — Cohere's web crawler
Meta-ExternalAgent — Meta's AI training crawler

Recommended robots.txt Configuration

For brands that want maximum AI visibility, allow all major AI crawlers:

# Allow AI crawlers for maximum AI visibility
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: cohere-ai
Allow: /

# Standard crawlers
User-agent: Googlebot
Allow: /

User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
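If you maintain the allow list in code rather than by hand, a permissive file like this one can be generated and sanity-checked before it is deployed. The following is a minimal Python sketch, not a definitive implementation: the names AI_CRAWLERS, build_robots_txt, and parse_robots are hypothetical helpers introduced for illustration, and the check relies on Python's standard-library urllib.robotparser, which only approximates how each crawler interprets rules.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens taken from the list in this guide (assumed current).
AI_CRAWLERS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "anthropic-ai",
    "PerplexityBot", "Google-Extended", "Applebot-Extended", "cohere-ai",
]


def build_robots_txt(allowed_bots, sitemap_url):
    """Return robots.txt text with one Allow group per bot, a permissive
    default group, and a Sitemap line."""
    lines = ["# Allow AI crawlers for maximum AI visibility"]
    for bot in allowed_bots:
        # A blank line after each group keeps the groups visually separate.
        lines += [f"User-agent: {bot}", "Allow: /", ""]
    lines += ["# Standard crawlers", "User-agent: *", "Allow: /", ""]
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


def parse_robots(robots_txt):
    """Parse robots.txt text with the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    text = build_robots_txt(AI_CRAWLERS, "https://yourdomain.com/sitemap.xml")
    parser = parse_robots(text)
    for bot in AI_CRAWLERS:
        print(bot, parser.can_fetch(bot, "/"))
```

Generating the file from a single list keeps the allow rules and the verification step in sync, so a crawler added to the list is both written to the file and checked.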

Selective Access: Allow Search, Block Training

Some brands want to appear in AI search results but do not want their content used for model training. Here is how to configure that:

Allow OAI-SearchBot (ChatGPT search) but block GPTBot (training)
Allow Googlebot (search and AI Overviews) but block Google-Extended (Gemini training)
Allow PerplexityBot (search only — Perplexity does not use content for training)
Allow ChatGPT-User (real-time user requests)

# Allow AI search crawlers
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

Note: Blocking training crawlers means future model versions may have less knowledge of your brand. This is a tradeoff each brand must evaluate.
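A selective policy is easy to get subtly wrong, so it can help to write down the intended outcome per crawler and compare it against what the file actually says. The sketch below assumes the selective configuration from this section and uses Python's urllib.robotparser. The names EXPECTED_ACCESS and find_mismatches are hypothetical, and the standard-library parser is only a stand-in for each vendor's own rule matching.

```python
from urllib.robotparser import RobotFileParser

SELECTIVE_ROBOTS_TXT = """\
# Allow AI search crawlers
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /
"""

# Intended policy: True means the crawler should be able to fetch the site.
EXPECTED_ACCESS = {
    "OAI-SearchBot": True,
    "ChatGPT-User": True,
    "PerplexityBot": True,
    "Googlebot": True,        # no group of its own and no default: allowed
    "GPTBot": False,
    "Google-Extended": False,
    "ClaudeBot": False,
    "CCBot": False,
}


def find_mismatches(robots_txt, expected, path="/"):
    """Return {bot: actual_access} for every bot whose access differs
    from the intended policy."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    mismatches = {}
    for bot, should_allow in expected.items():
        actual = parser.can_fetch(bot, path)
        if actual != should_allow:
            mismatches[bot] = actual
    return mismatches


if __name__ == "__main__":
    print(find_mismatches(SELECTIVE_ROBOTS_TXT, EXPECTED_ACCESS))
```

An empty result means the file matches the stated policy. Any entry in the result names a crawler whose access is the opposite of what was intended.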

Common Mistakes

Blanket Disallow: / for unknown bots: Many robots.txt files include a catch-all group (User-agent: * followed by Disallow: /), which blocks every AI crawler that does not have its own group with an Allow rule
Blocking GPTBot but expecting full ChatGPT visibility: GPTBot gathers training data for the models behind ChatGPT. Blocking it can reduce your brand's presence in answers drawn from model knowledge, even when OAI-SearchBot is still allowed
Not specifying AI bots at all: If your robots.txt only addresses Googlebot and has a permissive default, you are fine. But if it has a restrictive default, AI bots are blocked by implication
Forgetting to update after CMS migrations: Many site migrations introduce new robots.txt files that inadvertently block AI crawlers
Confusing Googlebot and Google-Extended: Blocking Googlebot removes you from Google Search entirely. Google-Extended only controls AI training use
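The catch-all mistake is the easiest one to detect automatically. As a rough sketch, the Python below probes a robots.txt with a made-up user-agent to see whether the default is restrictive, then lists which crawlers are blocked as a result. RESTRICTIVE_ROBOTS_TXT is an invented example file, and has_restrictive_default and bots_blocked are hypothetical helper names. Python's parser applies the first matching rule in a group, whereas Google documents a most-specific-match rule, so treat the result as an approximation for files with overlapping paths.

```python
from urllib.robotparser import RobotFileParser

# Invented example: Googlebot is allowed, everything else is blocked.
RESTRICTIVE_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

AI_BOTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot"]


def _load(robots_txt):
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def has_restrictive_default(robots_txt):
    """True if a crawler with no group of its own is blocked at the root."""
    probe = "hypothetical-unlisted-bot"  # made-up token, matches no real group
    return not _load(robots_txt).can_fetch(probe, "/")


def bots_blocked(robots_txt, bots):
    """Return the bots from `bots` that cannot fetch the site root."""
    parser = _load(robots_txt)
    return [bot for bot in bots if not parser.can_fetch(bot, "/")]


if __name__ == "__main__":
    print(has_restrictive_default(RESTRICTIVE_ROBOTS_TXT))
    print(bots_blocked(RESTRICTIVE_ROBOTS_TXT, ["Googlebot"] + AI_BOTS))
```

Running a check like this after a CMS migration is one way to catch a replaced robots.txt before it affects crawler access.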

How to Test Your robots.txt

Manual check: Visit https://yourdomain.com/robots.txt and review the rules for each AI bot user-agent
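Google Search Console: The robots.txt report shows which robots.txt files Google found for your site, when they were last fetched, and any warnings or errors. It covers Google's crawlers only, so check other AI bots separately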
Server log analysis: Check your server logs for requests from GPTBot, ClaudeBot, and PerplexityBot to confirm they are reaching your site
AI query test: Ask ChatGPT, Perplexity, and Claude about your brand. If they have no information or outdated information, crawler access may be the issue
Use the AI Visibility Checker to verify that AI platforms can access and cite your content
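For the server log step, a short script can count how often each AI crawler appears in your access logs. The Python sketch below assumes logs in the common "combined" format, where the user-agent is the last quoted field on each line. AI_BOT_TOKENS and count_ai_bot_hits are hypothetical names, and the sample lines in the usage section are made up for illustration. User-agent strings can be spoofed, so counts show claimed identity rather than verified identity.

```python
import re
from collections import Counter

# Tokens to look for inside the user-agent field (from this guide's list).
AI_BOT_TOKENS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User",
                 "ClaudeBot", "PerplexityBot"]

# Assumes the user-agent is the final double-quoted field on the line.
USER_AGENT_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_ai_bot_hits(log_lines):
    """Count requests per AI bot token across access-log lines."""
    hits = Counter()
    for line in log_lines:
        match = USER_AGENT_FIELD.search(line)
        if match is None:
            continue  # line is not in the expected format
        user_agent = match.group(1).lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in user_agent:
                hits[token] += 1
                break  # count each request once
    return hits


if __name__ == "__main__":
    # Made-up sample lines; real user-agent strings are longer.
    sample = [
        '203.0.113.5 - - [16/Apr/2026:10:00:00 +0000] "GET / HTTP/1.1" '
        '200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
        '203.0.113.9 - - [16/Apr/2026:10:01:00 +0000] "GET /pricing HTTP/1.1" '
        '200 734 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    ]
    print(count_ai_bot_hits(sample))
```

To run it against a real file, pass the open file object directly, for example count_ai_bot_hits(open("access.log")), adjusting the path to wherever your server writes its logs. A crawler that is allowed in robots.txt but never appears in the logs may be blocked elsewhere, such as at a firewall or CDN.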

robots.txt and llms.txt: Working Together

robots.txt controls access — which bots can crawl your site. llms.txt controls understanding — what those bots should know about your brand. Both files should be configured together:

Ensure AI crawlers are allowed in robots.txt
Create an llms.txt at your domain root with your brand summary
Implement structured data on key pages for detailed product and content information

This three-layer approach — access, summary, and structured data — gives AI crawlers everything they need to accurately represent your brand.

Next Steps

Check your current robots.txt at yourdomain.com/robots.txt
Add explicit Allow rules for GPTBot, ClaudeBot, PerplexityBot, and OAI-SearchBot
Deploy an llms.txt file to provide brand context
Add structured data to your key pages
Run an AI visibility audit to measure the impact of your changes
Review the AI Advertising Getting Started guide if you want to supplement organic visibility with paid placements
