Guide · 5 min read · April 16, 2026

Robots.txt for AI Crawlers: Allow GPTBot, ClaudeBot & More (2026)

Which AI bots crawl the web, how to configure your robots.txt to allow or block them, and how to test your configuration.

Surfacedd Team

Your robots.txt file tells AI crawlers which parts of your site they may access. If OAI-SearchBot is blocked, ChatGPT search cannot crawl your pages to cite them, and if GPTBot is blocked, future OpenAI models are less likely to learn about your brand. If PerplexityBot is blocked, Perplexity cannot include you in search results. Many brands unknowingly block AI crawlers with overly restrictive robots.txt rules and lose AI visibility as a result. This guide lists the major AI crawlers, shows how to configure robots.txt correctly, and explains how to test it.

AI Crawlers You Need to Know

Here are the AI bots actively crawling the web in 2026:

OpenAI Bots

    1. GPTBot — Used to improve and train OpenAI models. Allowing it increases the chance ChatGPT knows about your brand
    2. OAI-SearchBot — Used specifically for ChatGPT's search feature and ChatGPT Shopping
    3. ChatGPT-User — Makes real-time requests when a ChatGPT user asks for current information

Anthropic Bots

    1. ClaudeBot — Used by Anthropic to train Claude models
    2. anthropic-ai — Secondary identifier for Anthropic crawling

Perplexity Bots

    1. PerplexityBot — Crawls and indexes pages so they can appear in Perplexity search answers

Google AI Bots

    1. Googlebot — Used for both traditional search and AI Overviews (they share the same crawler)
    2. Google-Extended — A robots.txt control token rather than a separate crawler. It controls whether your content is used for Gemini and AI training (separate from search indexing)

Other AI Bots

    1. CCBot — Common Crawl's bot. The Common Crawl dataset is used as training data by many AI models
    2. Bytespider — TikTok/ByteDance crawler, used for AI features
    3. Applebot-Extended — Apple's control for AI features including Apple Intelligence
    4. cohere-ai — Cohere's web crawler
    5. Meta-ExternalAgent — Meta's AI training crawler

For brands that want maximum AI visibility, allow all major AI crawlers:

# Allow AI crawlers for maximum AI visibility
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: cohere-ai
Allow: /

# Standard crawlers
User-agent: Googlebot
Allow: /

User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
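
One way to sanity-check a configuration like the one above is Python's standard-library urllib.robotparser. The sketch below is illustrative rather than an official tool: the helper name check_access and the sample rules are assumptions made for this example. robotparser applies the first matching rule in a group, which is simpler than the longest-match logic some crawlers use, so treat the result as a rough check.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from the configuration above.
AI_AGENTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "anthropic-ai",
    "PerplexityBot", "Google-Extended", "Applebot-Extended", "cohere-ai",
]


def check_access(robots_txt, agents=AI_AGENTS, url="https://yourdomain.com/"):
    """Return {agent: True/False} for whether each agent may fetch `url`."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    # Hypothetical rules: GPTBot is blocked, every other bot uses the default.
    sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
    for agent, allowed in check_access(sample).items():
        print(f"{agent:20} {'allowed' if allowed else 'BLOCKED'}")
```

To check a live site instead of a string, RobotFileParser also offers set_url() and read(), which download the file before can_fetch() is called.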

Selective Access: Allow Search, Block Training

Some brands want to appear in AI search results but do not want their content used for model training. Here is how to configure that:

  1. Allow OAI-SearchBot (ChatGPT search) but block GPTBot (training)
  2. Allow Googlebot (search and AI Overviews) but block Google-Extended (Gemini training)
  3. Allow PerplexityBot (Perplexity states that it does not use crawled content to train foundation models)
  4. Allow ChatGPT-User (real-time user requests)

# Allow AI search crawlers
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

Note: Blocking training crawlers means future model versions may have less knowledge of your brand. This is a tradeoff each brand must evaluate.
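
If the allow and block lists change often, the file can be generated from a small policy rather than edited by hand. The following is a minimal sketch under that assumption. The function name build_robots_txt and its parameters are hypothetical, and it only emits whole-site Allow: / and Disallow: / groups like the ones shown above.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(allow, block, sitemap=None):
    """Render one group per user agent: Allow: / or Disallow: / for the whole site."""
    overlap = set(allow) & set(block)
    if overlap:
        raise ValueError(f"agents listed as both allowed and blocked: {sorted(overlap)}")
    groups = [f"User-agent: {agent}\nAllow: /" for agent in allow]
    groups += [f"User-agent: {agent}\nDisallow: /" for agent in block]
    text = "\n\n".join(groups) + "\n"
    if sitemap:
        text += f"\nSitemap: {sitemap}\n"
    return text


if __name__ == "__main__":
    # The search/training split described in this section.
    text = build_robots_txt(
        allow=["OAI-SearchBot", "ChatGPT-User", "PerplexityBot"],
        block=["GPTBot", "Google-Extended", "ClaudeBot", "CCBot"],
    )
    print(text)

    # Round-trip through the standard-library parser as a quick self-check.
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    print("OAI-SearchBot allowed:", parser.can_fetch("OAI-SearchBot", "/"))
    print("GPTBot allowed:", parser.can_fetch("GPTBot", "/"))
```

Raising an error when an agent appears in both lists keeps the generated file from containing contradictory groups.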

Common Mistakes

  1. Blanket Disallow: / for unknown bots: Many robots.txt files include a catch-all group (User-agent: * followed by Disallow: /), which blocks every AI crawler that does not have its own group with an Allow rule
  2. Blocking GPTBot but expecting ChatGPT visibility: GPTBot collects training data for the models behind ChatGPT. Blocking it can reduce your brand's presence in ChatGPT answers
  3. Not specifying AI bots at all: If your robots.txt only addresses Googlebot and has a permissive default, you are fine. But if it has a restrictive default, AI bots are blocked by implication
  4. Forgetting to update after CMS migrations: Many site migrations introduce new robots.txt files that inadvertently block AI crawlers
  5. Confusing Googlebot and Google-Extended: Blocking Googlebot removes you from Google Search entirely. Google-Extended only controls AI training use
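
Mistakes 1 and 3 are easy to miss because the AI bot is never named in the file. The sketch below is a rough heuristic for telling an explicit block apart from one inherited from the catch-all group. The helper names named_agents and classify are assumptions for illustration, and the check relies on Python's urllib.robotparser, whose matching may differ from a given crawler's.

```python
from urllib.robotparser import RobotFileParser


def named_agents(robots_txt):
    """Return the lowercased tokens that appear on a User-agent line."""
    names = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        field, _, value = line.partition(":")
        if field.strip().lower() == "user-agent":
            names.add(value.strip().lower())
    return names


def classify(robots_txt, agent, url="/"):
    """Label an agent as allowed, blocked explicitly, or blocked by the default rule."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if parser.can_fetch(agent, url):
        return "allowed"
    if agent.lower() in named_agents(robots_txt):
        return "blocked explicitly"
    return "blocked by default rule"


if __name__ == "__main__":
    # Hypothetical file: only Googlebot is named, and the default is restrictive.
    restrictive = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""
    for agent in ("Googlebot", "GPTBot", "ClaudeBot", "PerplexityBot"):
        print(f"{agent:15} {classify(restrictive, agent)}")
```

An agent reported as "blocked by default rule" is a candidate for its own explicit group.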

How to Test Your robots.txt

  1. Manual check: Visit https://yourdomain.com/robots.txt and review the rules for each AI bot user-agent
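  2. Google Search Console's robots.txt report: Shows the robots.txt files Google found for your site and any parsing errors or warnings. It replaced the older robots.txt tester and reflects Google's crawlers only, so it does not cover other AI bots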
  3. Server log analysis: Check your server logs for requests from GPTBot, ClaudeBot, and PerplexityBot to confirm they are reaching your site
  4. AI query test: Ask ChatGPT, Perplexity, and Claude about your brand. If they have no information or outdated information, crawler access may be the issue
  5. Use the AI Visibility Checker to verify that AI platforms can access and cite your content
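
For the server log analysis in step 3, a short script can count requests per AI bot. This is a minimal sketch that assumes access logs in the common "combined" format, where the user agent is the last quoted field. The function name count_ai_bot_hits and the sample log lines are made up for illustration. User-agent strings can be spoofed, so the counts indicate claimed identities, not verified ones.

```python
import re
from collections import Counter

# Tokens to look for inside the User-Agent header (case-insensitive).
AI_BOT_TOKENS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot")

# In the combined log format the user agent is the final quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_ai_bot_hits(log_lines):
    """Count log lines whose user agent contains one of the AI bot tokens."""
    hits = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # line is not in the expected format
        user_agent = match.group(1).lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in user_agent:
                hits[token] += 1
                break
    return hits


if __name__ == "__main__":
    # Made-up sample lines; real user-agent strings are longer.
    sample = [
        '203.0.113.5 - - [16/Apr/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
        '203.0.113.6 - - [16/Apr/2026:10:00:05 +0000] "GET /pricing HTTP/1.1" 200 734 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
        '203.0.113.7 - - [16/Apr/2026:10:00:09 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    ]
    for bot, count in count_ai_bot_hits(sample).items():
        print(f"{bot}: {count}")
```

A bot that is allowed in robots.txt but never appears in the logs may be blocked elsewhere, for example by a firewall or CDN rule.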

robots.txt and llms.txt: Working Together

robots.txt controls access — which bots can crawl your site. llms.txt controls understanding — what those bots should know about your brand. Both files should be configured together:

  1. Ensure AI crawlers are allowed in robots.txt
  2. Create an llms.txt at your domain root with your brand summary
  3. Implement structured data on key pages for detailed product and content information

This three-layer approach (access, summary, and structured data) gives AI crawlers everything they need to accurately represent your brand.

Next Steps

  1. Check your current robots.txt at yourdomain.com/robots.txt
  2. Add explicit Allow rules for GPTBot, ClaudeBot, PerplexityBot, and OAI-SearchBot
  3. Deploy an llms.txt file to provide brand context
  4. Add structured data to your key pages
  5. Run an AI visibility audit to measure the impact of your changes
  6. Review the AI Advertising Getting Started guide if you want to supplement organic visibility with paid placements