
Robots.txt Generator

Generate, validate, and analyze robots.txt files for websites. Use when creating robots.txt from scratch, validating existing robots.txt syntax, checking if...
johnnywang2001
Data Analysis clawhub v1.0.0 1 version 99829.9 Key: not required
#crawler #latest #robots #seo #web

Overview

robots-txt-gen

Generate, validate, and test robots.txt files from the command line.

Quick Start

# Generate a robots.txt for a platform
python3 scripts/robots_txt_gen.py generate --preset nextjs --sitemap https://example.com/sitemap.xml

# Validate an existing robots.txt
python3 scripts/robots_txt_gen.py validate --file robots.txt

# Validate a remote robots.txt
python3 scripts/robots_txt_gen.py validate --url https://example.com/robots.txt

# Test if a URL is allowed for a user-agent
python3 scripts/robots_txt_gen.py test --file robots.txt --url /admin/dashboard --agent Googlebot

# Generate with custom rules
python3 scripts/robots_txt_gen.py generate --allow "/" --disallow "/admin" --disallow "/api" --disallow "/private" --sitemap https://example.com/sitemap.xml --agent "*"

Commands

generate

Create a robots.txt file with custom rules or platform presets.

Options:

  • --preset — Use a platform preset: wordpress, nextjs, django, rails, laravel, static, spa, ecommerce
  • --agent — User-agent (default: *). Repeat for multiple agents.
  • --allow — Allow path. Repeatable.
  • --disallow — Disallow path. Repeatable.
  • --sitemap — Sitemap URL. Repeatable.
  • --crawl-delay — Crawl delay directive.
  • --block-ai — Add rules to block common AI crawlers (GPTBot, ChatGPT-User, CCBot, Google-Extended, anthropic-ai, etc.)
  • --output — Write to file instead of stdout.
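
To make the options above concrete, here is a minimal sketch of how agents, allow/disallow paths, a crawl delay, and sitemap URLs could be assembled into robots.txt text. The function `build_robots_txt` and its parameters are hypothetical illustrations and are not taken from scripts/robots_txt_gen.py; the script's actual output layout may differ.

```python
from typing import Iterable, Optional


def build_robots_txt(
    agents: Iterable[str] = ("*",),
    allow: Iterable[str] = (),
    disallow: Iterable[str] = (),
    sitemaps: Iterable[str] = (),
    crawl_delay: Optional[int] = None,
) -> str:
    """Hypothetical helper: render one rule group plus optional sitemap lines."""
    sitemap_list = list(sitemaps)
    lines = [f"User-agent: {agent}" for agent in agents]
    lines += [f"Allow: {path}" for path in allow]
    lines += [f"Disallow: {path}" for path in disallow]
    if crawl_delay is not None:
        lines.append(f"Crawl-delay: {crawl_delay}")
    if sitemap_list:
        lines.append("")  # blank line between the rule group and sitemap lines
        lines += [f"Sitemap: {url}" for url in sitemap_list]
    return "\n".join(lines) + "\n"
```

Calling `build_robots_txt(allow=["/"], disallow=["/admin", "/api", "/private"], sitemaps=["https://example.com/sitemap.xml"])` mirrors the custom-rules command from the Quick Start, under the assumption that all rules belong to a single user-agent group.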

validate

Check a robots.txt file for syntax errors and best-practice warnings.

Options:

  • --file — Local file to validate.
  • --url — Remote robots.txt URL to fetch and validate.
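
The source does not list which checks `validate` performs. As a rough idea of what syntax checking for robots.txt can look like, the sketch below flags lines without a `:` separator, unknown directives, rules that appear before any `User-agent` line, and paths that do not start with `/`. The name `lint_robots_txt` and the chosen checks are assumptions for illustration only.

```python
from typing import List

# Directives this sketch recognizes; the real tool may accept more.
KNOWN_DIRECTIVES = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def lint_robots_txt(text: str) -> List[str]:
    """Hypothetical linter: return one human-readable message per problem."""
    problems = []
    seen_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append(f"line {number}: missing ':' separator")
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        name = field.lower()
        if name not in KNOWN_DIRECTIVES:
            problems.append(f"line {number}: unknown directive '{field}'")
        elif name == "user-agent":
            seen_agent = True
        elif name in {"allow", "disallow"}:
            if not seen_agent:
                problems.append(f"line {number}: rule before any User-agent")
            if value and not value.startswith(("/", "*")):
                problems.append(f"line {number}: path should start with '/'")
    return problems
```

An empty list means the sketch found nothing to report; it says nothing about best-practice warnings, which the real command also covers.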

test

Test whether a specific URL path is allowed or disallowed for a given user-agent.

Options:

  • --file — robots.txt file to test against.
  • --url — URL path to test (e.g., /admin/login).
  • --agent — User-agent to test as (default: Googlebot).
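
For comparison, the same kind of allow/disallow check can be sketched with Python's standard-library `urllib.robotparser`. This is not necessarily how the script implements `test`; the helper `is_allowed` and the sample rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Sample rules used only for this illustration.
ROBOTS = """\
User-agent: *
Disallow: /admin
Allow: /
"""


def is_allowed(robots_text: str, path: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())  # parse in-memory text, no network
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for candidate in ("/admin/dashboard", "/blog/post"):
        print(candidate, is_allowed(ROBOTS, candidate))
```

Note that the standard-library parser applies the first matching rule in file order, whereas some crawlers prefer the most specific (longest) match, so results can differ for files that mix overlapping Allow and Disallow rules.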

Platform Presets

| Preset | What it blocks | Notes |
|--------|----------------|-------|
| wordpress | /wp-admin/, /wp-includes/, query params | Allows /wp-admin/admin-ajax.php |
| nextjs | /_next/static/, /api/, /.next/ | Standard Next.js paths |
| django | /admin/, /static/admin/, /media/private/ | Django admin and private media |
| rails | /admin/, /assets/, /tmp/ | Rails conventions |
| laravel | /admin/, /storage/, /vendor/ | Laravel conventions |
| static | Nothing blocked | Simple allow-all with sitemap |
| spa | /api/, /assets/ | Single-page app pattern |
| ecommerce | /cart/, /checkout/, /account/, /search? | Prevents crawling user sessions |
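
One way to picture a preset is as a named bundle of allow and disallow paths. The sketch below encodes three rows of the table in a hypothetical `PRESETS` mapping and renders them; the data structure, the `render_preset` name, and the `Allow: /` line for `static` are assumptions, and the WordPress "query params" rule is left out because the source does not give its exact pattern.

```python
# Hypothetical preset table; paths are copied from the table above.
PRESETS = {
    "wordpress": {
        "allow": ["/wp-admin/admin-ajax.php"],
        "disallow": ["/wp-admin/", "/wp-includes/"],
    },
    "nextjs": {
        "allow": [],
        "disallow": ["/_next/static/", "/api/", "/.next/"],
    },
    "static": {"allow": ["/"], "disallow": []},
}


def render_preset(name: str, agent: str = "*") -> str:
    """Render a single rule group for the named preset."""
    try:
        rules = PRESETS[name]
    except KeyError:
        raise ValueError(f"unknown preset: {name}") from None
    lines = [f"User-agent: {agent}"]
    lines += [f"Allow: {path}" for path in rules["allow"]]
    lines += [f"Disallow: {path}" for path in rules["disallow"]]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_preset("wordpress"))
```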

AI Crawler Blocking

The --block-ai flag adds disallow rules for known AI crawlers:

  • GPTBot, ChatGPT-User (OpenAI)
  • Google-Extended (Google AI)
  • CCBot (Common Crawl)
  • anthropic-ai, ClaudeBot (Anthropic)
  • Bytespider (ByteDance)
  • FacebookBot (Meta)
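
A plausible shape for these rules is one group per crawler, each disallowing the whole site. The sketch below builds that text from the user-agent names listed above; the function `ai_block_rules` is hypothetical, and the exact lines emitted by --block-ai are not shown in the source.

```python
from typing import Sequence

# User-agent tokens taken from the list above.
AI_CRAWLERS = (
    "GPTBot",
    "ChatGPT-User",
    "Google-Extended",
    "CCBot",
    "anthropic-ai",
    "ClaudeBot",
    "Bytespider",
    "FacebookBot",
)


def ai_block_rules(agents: Sequence[str] = AI_CRAWLERS) -> str:
    """Return one 'Disallow: /' group per agent, separated by blank lines."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(ai_block_rules())
```

Rules like these only affect crawlers that choose to honor robots.txt.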

Version History

1 version in total

  • v1.0.0 (current)
    2026-03-19 20:58 (Safe, Safe)

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
