Robots.txt for AI Crawlers: Allow GPTBot, ClaudeBot & More (2026)
Which AI bots crawl the web, how to configure your robots.txt to allow or block them, and how to test your configuration.
Your robots.txt file tells AI crawlers which parts of your site they may access, and well-behaved crawlers follow it. If OAI-SearchBot is blocked, ChatGPT search cannot surface your pages, and if GPTBot is blocked, future OpenAI models are less likely to know your content. If PerplexityBot is blocked, Perplexity cannot include you in search results. Many brands unknowingly block AI crawlers with overly restrictive robots.txt rules and lose AI visibility as a result. This guide lists the major AI crawlers, shows how to configure robots.txt correctly, and explains how to test it.
AI Crawlers You Need to Know
Here are the AI bots actively crawling the web in 2026:
OpenAI Bots
- GPTBot: Used to improve and train OpenAI models. Allowing it increases the chance ChatGPT knows about your brand
- OAI-SearchBot: Used specifically for ChatGPT's search feature and ChatGPT Shopping
- ChatGPT-User: Makes real-time requests when a ChatGPT user asks for current information
Anthropic Bots
- ClaudeBot: Used by Anthropic to train Claude models
- anthropic-ai: Secondary (older) identifier associated with Anthropic crawling
Perplexity Bots
- PerplexityBot: Crawls and indexes pages so Perplexity search can surface and link them in answers
Google Bots
- Googlebot: Used for both traditional search and AI Overviews (they share the same crawler)
- Google-Extended: Controls whether your content is used for Gemini and AI training (separate from search indexing)
Other AI Crawlers
- CCBot: Common Crawl bot, whose data is used as training data by many AI models
- Bytespider: TikTok/ByteDance crawler, used for AI features
- Applebot-Extended: Apple's crawler for AI features including Apple Intelligence
- cohere-ai: Cohere's web crawler
- Meta-ExternalAgent: Meta's AI training crawler
Recommended robots.txt Configuration
For brands that want maximum AI visibility, allow all major AI crawlers:
# Allow AI crawlers for maximum AI visibility
User-agent: GPTBot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: anthropic-ai
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: Applebot-Extended
Allow: /
User-agent: cohere-ai
Allow: /
# Standard crawlers
User-agent: Googlebot
Allow: /
User-agent: *
Allow: /
Sitemap: https://yourdomain.com/sitemap.xml
Selective Access: Allow Search, Block Training
Some brands want to appear in AI search results but do not want their content used for model training. Here is how to configure that:
- Allow OAI-SearchBot (ChatGPT search) but block GPTBot (training)
- Allow Googlebot (search and AI Overviews) but block Google-Extended (Gemini training)
- Allow PerplexityBot (search only; Perplexity states that it does not use this crawler to collect training data)
- Allow ChatGPT-User (real-time user requests)
# Allow AI search crawlers
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: PerplexityBot
Allow: /
# Block AI training crawlers
User-agent: GPTBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: CCBot
Disallow: /
Note: Blocking training crawlers means future model versions may have less knowledge of your brand. This is a tradeoff each brand must evaluate.
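If you manage this policy in more than one place, it can help to generate the file from a single mapping instead of editing it by hand. The following is a minimal sketch, not an official tool: the function name build_robots_txt and the SELECTIVE_POLICY mapping are hypothetical names chosen for illustration, and the output simply mirrors the selective configuration above.

```python
def build_robots_txt(policy, sitemap_url=None):
    """Render a robots.txt string from a {user_agent: allowed} mapping.

    Hypothetical helper: True becomes 'Allow: /', False becomes 'Disallow: /'.
    """
    lines = []
    for agent, allowed in policy.items():
        lines.append(f"User-agent: {agent}")
        lines.append("Allow: /" if allowed else "Disallow: /")
        lines.append("")  # blank line between groups, for readability only
    if sitemap_url:
        lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines).rstrip("\n") + "\n"


# Assumed policy: allow AI search crawlers, block AI training crawlers.
SELECTIVE_POLICY = {
    "OAI-SearchBot": True,
    "ChatGPT-User": True,
    "PerplexityBot": True,
    "GPTBot": False,
    "Google-Extended": False,
    "ClaudeBot": False,
    "CCBot": False,
}

robots_txt = build_robots_txt(SELECTIVE_POLICY, "https://yourdomain.com/sitemap.xml")
print(robots_txt)
```

Keeping the policy in one mapping makes the tradeoff explicit: flipping a single value from False to True is the whole change if you later decide to allow a training crawler.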
Common Mistakes
- Blanket Disallow: / for unknown bots: Many robots.txt files include a catch-all rule like User-agent: * / Disallow: /, which blocks all AI crawlers not explicitly allowed
- Blocking GPTBot but expecting ChatGPT visibility: GPTBot feeds ChatGPT's knowledge base. Blocking it reduces your brand's presence in ChatGPT answers
- Not specifying AI bots at all: If your robots.txt only addresses Googlebot and has a permissive default, you are fine. But if it has a restrictive default, AI bots are blocked by implication
- Forgetting to update after CMS migrations: Many site migrations introduce new robots.txt files that inadvertently block AI crawlers
- Confusing Googlebot and Google-Extended: Blocking Googlebot removes you from Google Search entirely. Google-Extended only controls AI training use
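The restrictive-default mistake is easy to reproduce locally. This sketch uses Python's standard urllib.robotparser module to evaluate a hypothetical robots.txt that names only Googlebot and disallows everything else. The blocked_agents helper and the RESTRICTIVE sample are illustrative assumptions, and Python's parser only approximates how each real crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a restrictive default: only Googlebot is named.
RESTRICTIVE = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

AI_AGENTS = [
    "GPTBot",
    "OAI-SearchBot",
    "ChatGPT-User",
    "ClaudeBot",
    "PerplexityBot",
    "Google-Extended",
]


def blocked_agents(robots_txt, agents, url="https://yourdomain.com/"):
    """Return the agents that the given robots.txt text does not allow to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Every AI agent falls through to the catch-all group and is blocked,
# while Googlebot stays allowed by its own group.
print(blocked_agents(RESTRICTIVE, AI_AGENTS))
print(blocked_agents(RESTRICTIVE, ["Googlebot"]))
```

Note that Google-Extended lands in the blocked list even though Googlebot is allowed, which is the same distinction the last bullet describes.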
How to Test Your robots.txt
- Manual check: Visit https://yourdomain.com/robots.txt and review the rules for each AI bot user-agent
- Google Search Console: The robots.txt report (which replaced the older robots.txt tester) shows whether Google can fetch and parse your file. It covers Google's crawlers only, so test other AI bots separately
- Server log analysis: Check your server logs for requests from GPTBot, ClaudeBot, and PerplexityBot to confirm they are reaching your site
- AI query test: Ask ChatGPT, Perplexity, and Claude about your brand. If they have no information or outdated information, crawler access may be the issue
- Use the AI Visibility Checker to verify that AI platforms can access and cite your content
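For the server log check, a short script can count requests per AI bot. This is a rough sketch that assumes your server writes the common "combined" access log format, where the user agent is the last quoted field. The count_ai_bot_hits function, the token list, and the sample lines (including their user-agent strings and IP addresses) are made up for illustration and are not the exact strings any vendor sends.

```python
import re
from collections import Counter

# Substrings to look for in the user-agent field (taken from the bots listed above).
AI_BOT_TOKENS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot"]


def count_ai_bot_hits(log_lines, tokens=AI_BOT_TOKENS):
    """Count log lines whose user-agent field contains each bot token."""
    counts = Counter()
    for line in log_lines:
        # Assumption: combined log format, so the last quoted field is the user agent.
        quoted = re.findall(r'"([^"]*)"', line)
        if not quoted:
            continue
        user_agent = quoted[-1].lower()
        for token in tokens:
            if token.lower() in user_agent:
                counts[token] += 1
    return counts


# Illustrative sample lines; replace with lines read from your own access log.
SAMPLE_LOG = [
    '203.0.113.5 - - [12/Jan/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [12/Jan/2026:10:01:00 +0000] "GET /pricing HTTP/1.1" 200 904 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '192.0.2.10 - - [12/Jan/2026:10:02:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '203.0.113.5 - - [12/Jan/2026:10:03:00 +0000] "GET /blog HTTP/1.1" 200 733 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
]

hits = count_ai_bot_hits(SAMPLE_LOG)
for token in AI_BOT_TOKENS:
    print(token, hits[token])
```

A count of zero for a bot you have allowed does not prove it is blocked, since it may simply not have visited yet. User-agent strings can also be spoofed, so treat these counts as a first signal and not as proof of identity.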
robots.txt and llms.txt: Working Together
robots.txt controls access — which bots can crawl your site. llms.txt controls understanding — what those bots should know about your brand. Both files should be configured together:
- Ensure AI crawlers are allowed in robots.txt
- Create an llms.txt at your domain root with your brand summary
- Implement structured data on key pages for detailed product and content information
Next Steps
- Check your current robots.txt at yourdomain.com/robots.txt
- Add explicit Allow rules for GPTBot, ClaudeBot, PerplexityBot, and OAI-SearchBot
- Deploy an llms.txt file to provide brand context
- Add structured data to your key pages
- Run an AI visibility audit to measure the impact of your changes
- Review the AI Advertising Getting Started guide if you want to supplement organic visibility with paid placements