
Robots Ai

Analyze and generate robots.txt files with AI crawler awareness. Detect which AI bots (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, etc.) are blocked o...
Author: sharozdawa
Uncategorized · clawhub · v1.0.0 · 1 version · 100000 · Key: not required
Stars: 0 · Downloads: 386 · Installs: 2 · Versions: 1 (#latest)

Overview

robots-ai

Analyze, audit, and generate robots.txt files with full awareness of 20+ AI crawlers.

Capabilities

  • Analyze any website's robots.txt to see which AI bots are blocked/allowed
  • Generate a robots.txt with toggleable AI bot blocking
  • Audit existing robots.txt for completeness and issues
  • List all known AI crawlers with their user-agents, companies, and documentation links

AI Bots Database

You know about these AI crawlers and their user-agents:

| Bot | User-Agent | Company | Type |
|---|---|---|---|
| GPTBot | GPTBot | OpenAI | AI Crawler |
| ChatGPT-User | ChatGPT-User | OpenAI | AI Search |
| OAI-SearchBot | OAI-SearchBot | OpenAI | AI Search |
| ClaudeBot | ClaudeBot | Anthropic | AI Crawler |
| anthropic-ai | anthropic-ai | Anthropic | AI Crawler |
| Google-Extended | Google-Extended | Google | AI Crawler |
| PerplexityBot | PerplexityBot | Perplexity | AI Search |
| CCBot | CCBot | Common Crawl | AI Crawler |
| Bytespider | Bytespider | ByteDance | AI Crawler |
| Diffbot | Diffbot | Diffbot | AI Crawler |
| cohere-ai | cohere-ai | Cohere | AI Crawler |
| Amazonbot | Amazonbot | Amazon | AI Crawler |
| Meta-ExternalAgent | Meta-ExternalAgent | Meta | AI Crawler |
| Meta-ExternalFetcher | Meta-ExternalFetcher | Meta | AI Crawler |
| Applebot-Extended | Applebot-Extended | Apple | AI Crawler |
| YouBot | YouBot | You.com | AI Search |
| Timpibot | Timpibot | Timpi | AI Crawler |
| img2dataset | img2dataset | Open Source | AI Crawler |
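If the skill is ever backed by code rather than by the model's own knowledge, the table maps naturally onto a lookup keyed by the robots.txt token. This is a minimal sketch; the names `AIBot`, `AI_BOTS`, and `lookup` are hypothetical and not part of the skill. Because the Bot and User-Agent columns are identical, one field holds both.

```python
from __future__ import annotations

from typing import NamedTuple


class AIBot(NamedTuple):
    """One row of the table; the Bot and User-Agent columns are identical."""

    user_agent: str  # token used in robots.txt User-agent lines
    company: str
    kind: str  # "AI Crawler" or "AI Search", as labelled in the table


AI_BOTS: tuple[AIBot, ...] = (
    AIBot("GPTBot", "OpenAI", "AI Crawler"),
    AIBot("ChatGPT-User", "OpenAI", "AI Search"),
    AIBot("OAI-SearchBot", "OpenAI", "AI Search"),
    AIBot("ClaudeBot", "Anthropic", "AI Crawler"),
    AIBot("anthropic-ai", "Anthropic", "AI Crawler"),
    AIBot("Google-Extended", "Google", "AI Crawler"),
    AIBot("PerplexityBot", "Perplexity", "AI Search"),
    AIBot("CCBot", "Common Crawl", "AI Crawler"),
    AIBot("Bytespider", "ByteDance", "AI Crawler"),
    AIBot("Diffbot", "Diffbot", "AI Crawler"),
    AIBot("cohere-ai", "Cohere", "AI Crawler"),
    AIBot("Amazonbot", "Amazon", "AI Crawler"),
    AIBot("Meta-ExternalAgent", "Meta", "AI Crawler"),
    AIBot("Meta-ExternalFetcher", "Meta", "AI Crawler"),
    AIBot("Applebot-Extended", "Apple", "AI Crawler"),
    AIBot("YouBot", "You.com", "AI Search"),
    AIBot("Timpibot", "Timpi", "AI Crawler"),
    AIBot("img2dataset", "Open Source", "AI Crawler"),
)

# robots.txt matches user-agent tokens case-insensitively, so index by lower case.
_BY_TOKEN = {bot.user_agent.lower(): bot for bot in AI_BOTS}


def lookup(token: str) -> AIBot | None:
    """Return the table row for a user-agent token, or None if it is not listed."""
    return _BY_TOKEN.get(token.strip().lower())
```

For example, `lookup("claudebot")` returns the ClaudeBot row, while `lookup("Googlebot")` returns `None` because Googlebot is a search crawler, not one of the AI bots in this table.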

Important Notes

  • Google-Extended controls Gemini training access but does NOT affect Google Search indexing
  • Blocking Googlebot stops Google from crawling the site and effectively removes it from Google Search; never do this unless explicitly asked
  • CCBot feeds Common Crawl, which is used by many AI companies for training data
  • Bytespider (ByteDance) and Timpibot are commonly blocked by default due to aggressive crawling
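The Googlebot rule above is easy to enforce mechanically before any file is generated. The sketch below is hypothetical: `validate_block_list`, its `explicitly_requested` flag, and the `PROTECTED_SEARCH_CRAWLERS` set are illustrative names, and only Googlebot is taken from the notes.

```python
from __future__ import annotations

# Hypothetical policy: search-engine crawlers that must never be blocked implicitly.
# Only Googlebot comes from the notes above; extend the set as needed.
PROTECTED_SEARCH_CRAWLERS = {"googlebot"}


def validate_block_list(bots: list[str], explicitly_requested: bool = False) -> list[str]:
    """Normalize a requested block list and refuse to block Googlebot by accident.

    Duplicates are dropped case-insensitively, keeping the first spelling seen.
    Blocking Google-Extended is fine: it does not affect Google Search indexing.
    """
    result: list[str] = []
    seen: set[str] = set()
    for bot in bots:
        name = bot.strip()
        key = name.lower()
        if not key or key in seen:
            continue  # skip blanks and case-insensitive duplicates
        if key in PROTECTED_SEARCH_CRAWLERS and not explicitly_requested:
            raise ValueError(
                f"Refusing to block {name}: it would remove the site from search "
                "results. Pass explicitly_requested=True only if the user asked."
            )
        seen.add(key)
        result.append(name)
    return result
```

Raising instead of silently dropping Googlebot forces the caller to confirm with the user, which mirrors "never do this unless explicitly asked".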

How to Analyze

When asked to analyze a robots.txt:

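  1. Fetch robots.txt from the root of the host: keep only the scheme and host of the given URL and append /robots.txt (e.g., https://example.com/blog/post becomes https://example.com/robots.txt)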
  2. Parse all User-agent directives and their Allow/Disallow rules
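  3. Check each AI bot against the rules: use the group whose User-agent matches the bot's token (case-insensitive), fall back to the User-agent: * group if none matches, and let the longest matching Allow/Disallow path win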
  4. Report: which bots are blocked, which are allowed, and any issues found
  5. Suggest improvements if relevant
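The analysis steps can be sketched in Python as a small checker that follows RFC 9309 (group selection by token, `*` fallback, longest match with Allow winning ties, `*` and `$` in paths). This is an illustrative sketch, not the skill's implementation: all function names are hypothetical, bare domains are assumed to use HTTPS, fetching is left out (pass in the downloaded text), and only the site root `/` is checked by default.

```python
from __future__ import annotations

import re
from urllib.parse import urlsplit


def robots_url(url: str) -> str:
    """robots.txt lives at the root of the host, whatever page URL is given."""
    if "://" not in url:
        url = "https://" + url  # assumption: bare domains are served over HTTPS
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def parse_groups(text: str) -> dict[str, list[tuple[bool, str]]]:
    """Map each lower-cased User-agent token to its (is_allow, path) rules.

    Consecutive User-agent lines share the rules that follow them, and groups
    naming the same token are merged, as RFC 9309 specifies.
    """
    groups: dict[str, list[tuple[bool, str]]] = {}
    agents: list[str] = []
    in_rules = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow") and agents:
            in_rules = True
            if value:  # an empty Disallow (or Allow) matches nothing
                for agent in agents:
                    groups[agent].append((field == "allow", value))
    return groups


def _matches(pattern: str, path: str) -> bool:
    """Match a path against a rule that may use '*' (any run) and a trailing '$'."""
    anchored = pattern.endswith("$")
    body = re.escape(pattern[:-1] if anchored else pattern).replace(r"\*", ".*")
    return re.match(body + ("$" if anchored else ""), path) is not None


def is_allowed(
    groups: dict[str, list[tuple[bool, str]]], agent: str, path: str = "/"
) -> bool:
    """Longest matching rule wins; on a tie Allow wins; no match means allowed."""
    rules = groups.get(agent.lower())
    if rules is None:  # no group names this bot, so the * group applies
        rules = groups.get("*", [])
    best_len, allowed = -1, True
    for is_allow, pattern in rules:
        if _matches(pattern, path):
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


def report(text: str, bots: list[str]) -> dict[str, str]:
    """Classify each bot by whether it may fetch the site root."""
    groups = parse_groups(text)
    return {bot: "allowed" if is_allowed(groups, bot) else "blocked" for bot in bots}
```

Feeding `report` the text downloaded from `robots_url("example.com/blog/post")` (that is, `https://example.com/robots.txt`) yields one `"blocked"` or `"allowed"` verdict per bot. Longest-match matters in practice: with `Allow: /` listed before `Disallow: /admin/`, a reader that applies the first matching rule would wrongly allow `/admin/`.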

How to Generate

When asked to generate a robots.txt:

  1. Ask which AI bots to block (or accept "block all AI" / "allow all AI")
  2. Ask for sitemap URL(s)
  3. Ask for any custom rules (e.g., Disallow: /admin/)
  4. Generate clean robots.txt with comments explaining each section
  5. Always include User-agent: * with Allow: / as the default
  6. Group blocked AI bots together with comments
  7. Add sitemap directives at the end
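The generation steps map onto a short string builder. This is a minimal sketch that assumes the questions in steps 1 to 3 have already been answered; `generate_robots_txt` and its parameters are hypothetical, and custom rules are assumed to be Disallow paths for the default group. Those paths are written before `Allow: /` so that parsers applying the first matching rule (rather than the longest) still honour them.

```python
from __future__ import annotations


def generate_robots_txt(
    blocked_bots: list[str],
    sitemaps: list[str],
    custom_disallows: list[str] | None = None,
) -> str:
    """Build a commented robots.txt following the generation steps above."""
    lines = ["# Allow all crawlers by default", "User-agent: *"]
    # Custom rules go before Allow: / so first-match parsers also honour them.
    lines += [f"Disallow: {path}" for path in custom_disallows or []]
    lines.append("Allow: /")
    if blocked_bots:
        lines += ["", "# Block AI crawlers"]
        for bot in blocked_bots:
            lines += [f"User-agent: {bot}", "Disallow: /", ""]
        lines.pop()  # drop the blank line after the last blocked group
    if sitemaps:
        lines += ["", "# Sitemap"] + [f"Sitemap: {url}" for url in sitemaps]
    return "\n".join(lines) + "\n"
```

Calling `generate_robots_txt(["GPTBot", "ClaudeBot"], ["https://example.com/sitemap.xml"])` produces the same layout as the example under Output Format, with each blocked bot in its own group.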

Output Format

Always format the generated robots.txt in a code block with syntax highlighting. Add comments explaining what each section does. Example:

# Allow all crawlers by default
User-agent: *
Allow: /

# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

# Sitemap
Sitemap: https://example.com/sitemap.xml
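Before handing a generated file over, it can be smoke-tested with Python's standard-library `urllib.robotparser`. The snippet checks the example above; the bot names and URL are the only inputs. Note that this parser's matching is simpler than RFC 9309 (it applies the first matching rule and matches user-agent names by substring), so treat it as a quick sanity check rather than a conformance test.

```python
from urllib.robotparser import RobotFileParser

EXAMPLE = """\
# Allow all crawlers by default
User-agent: *
Allow: /

# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

# Sitemap
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(EXAMPLE.splitlines())  # parse() takes an iterable of lines

for agent in ("GPTBot", "ClaudeBot", "Googlebot", "PerplexityBot"):
    verdict = "allowed" if parser.can_fetch(agent, "https://example.com/") else "blocked"
    print(f"{agent}: {verdict}")

print(parser.site_maps())  # Sitemap lines collected by the parser (Python 3.8+)
```

It reports GPTBot and ClaudeBot as blocked, Googlebot and PerplexityBot as allowed, and lists the single sitemap URL.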

Version history

1 version in total

  • v1.0.0 (current)
    2026-03-31 02:55 · Security: safe

Security scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk

🔗 Related recommendations

it-ops-security

MoltGuard - Security & Antivirus & Guardrails

thomaslwang
MoltGuard: an OpenClaw security guard provided by OpenGuardrails. Once installed, it protects you and your users from prompt injection, data leakage, and malicious behavior.
★ 116 📥 31,031
it-ops-security

OpenClaw Backup

alex3alex
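Back up and restore OpenClaw data. Use it to create backups, set up automated backup schedules, restore from a backup, or manage backup rotation. Archives the ~/.openclaw directory with appropriate exclusion rules.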
★ 90 📥 31,095
it-ops-security

Free Ride - Unlimited free AI

shaivpidadi
Manages free OpenRouter AI models for OpenClaw, automatically ranks models by quality, configures rate-limit fallbacks, and updates opencla...
★ 472 📥 78,674