
DocsForAI

Crawl and read documentation websites using DocsForAI. Use when you need to learn a new library, framework, or tool by reading its official docs; when you wa...
dx2331lxz
Uncategorized · clawhub · v0.7.0 · 1 version · Key: not required
Overview

DocsForAI — Documentation Crawler Skill

Crawl any documentation website into structured, persistent Markdown files and read them on demand — so you always work from accurate, up-to-date documentation rather than training-data guesses.

Source: https://pypi.org/project/docsforai/ | https://github.com/dx2331lxz/DocsForAI | Latest: 0.6.0


Install (one-time)

uv tool install docsforai   # recommended: isolated, no system Python pollution
pip install --break-system-packages docsforai  # fallback if uv unavailable

Verify: docsforai --version


Core Principles

Always use multi-md format. It preserves the site's original chapter hierarchy as individual files, so you can navigate to exactly the section you need without loading the entire documentation into context.

Output rule: docsforai writes directly to <output-dir>/<site-name>/; no extra subdirectory is created.

Docs are persistent. Once crawled, they live on disk across sessions. Check before crawling; never re-crawl what already exists.


Workflow

Step 1 — Check if docs already exist

Before doing anything else, check both the local filesystem and MEMORY.md:

ls ~/.openclaw/workspace/skills/docsforai/docs/

Also look up the 「已下载文档(DocsForAI)」 ("Downloaded docs (DocsForAI)") section in MEMORY.md for a record of previously crawled sites and their paths.

If the site folder already exists → skip to Step 3.
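The existence check can be wrapped in a small helper. This is a minimal sketch, not part of DocsForAI: the function name `docs_exist` and the `DOCS_ROOT` override are hypothetical, and it treats a site as "already crawled" only if its folder contains at least one Markdown file.

```shell
# Hypothetical helper (not shipped with DocsForAI).
# Succeeds if <docs-root>/<site-name>/ exists and holds at least one .md file.
docs_exist() {
  site="$1"
  # DOCS_ROOT is an assumed override; the default is the path used in this skill.
  root="${DOCS_ROOT:-$HOME/.openclaw/workspace/skills/docsforai/docs}"
  [ -n "$site" ] && [ -d "$root/$site" ] &&
    [ -n "$(find "$root/$site" -name '*.md' | head -n 1)" ]
}
```

Usage: `docs_exist vitepress && echo "already crawled, skip to Step 3"`.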

Step 2 — Crawl (only if not already downloaded)

Always pass the skill's docs/ directory as -o. DocsForAI creates <site-name>/ inside it automatically.

docsforai crawl <URL> -f multi-md \
  -o ~/.openclaw/workspace/skills/docsforai/docs

Common examples:

| URL | Site name | Final path |
|---|---|---|
| https://vitepress.dev/guide | vitepress | docs/vitepress/ |
| https://docs.pydantic.dev | pydantic | docs/pydantic/ |
| https://docusaurus.io/docs | docusaurus | docs/docusaurus/ |
| https://react.dev/learn | react | docs/react/ |
| https://docs.python.org/3 | python | docs/python/ |

After crawling completes, proceed to Step 2b.

Step 2b — Record to MEMORY.md (required)

Append a row to the 「已下载文档(DocsForAI)」 section in MEMORY.md. Create the section if it doesn't exist yet:

## 已下载文档(DocsForAI)

| Site | Local path | Crawled |
|---|---|---|
| vitepress | ~/.openclaw/workspace/skills/docsforai/docs/vitepress/ | 2026-04-02 |

Never overwrite existing rows — always append.
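The append-only rule can be sketched as a helper. This is an illustration under stated assumptions: the name `record_crawl` is hypothetical, MEMORY.md is taken from the second argument (defaulting to the current directory), and the row is appended at the end of the file, so it only lands inside the table when that section is the last thing in MEMORY.md.

```shell
# Hypothetical helper (not part of DocsForAI): append one row, never rewrite.
# Assumes the section, if present, is the last block in the file.
record_crawl() {
  site="$1"
  memory_file="${2:-MEMORY.md}"
  heading='## 已下载文档(DocsForAI)'
  # Kept as a literal "~" so rows match the path style used in the table above.
  docs_root='~/.openclaw/workspace/skills/docsforai/docs'

  # Create the section and table header only when the heading is missing.
  if ! grep -qF "$heading" "$memory_file" 2>/dev/null; then
    printf '\n%s\n\n| Site | Local path | Crawled |\n|---|---|---|\n' \
      "$heading" >> "$memory_file"
  fi

  # ">>" appends, so existing rows are left untouched.
  printf '| %s | %s/%s/ | %s |\n' \
    "$site" "$docs_root" "$site" "$(date +%Y-%m-%d)" >> "$memory_file"
}
```

Usage: `record_crawl vitepress ./MEMORY.md` right after the crawl finishes.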

Step 3 — Map the structure

Before reading any file, get a full picture of the directory tree:

find ~/.openclaw/workspace/skills/docsforai/docs/<site-name> -name "*.md" | sort

Scan the output. Identify which subdirectories and files correspond to the topic you need. This costs nothing and saves you from loading irrelevant chapters.

Step 4 — Read on demand (the most important step)

Load only what is directly relevant to the current task. Follow this decision tree:

4a. You need a quick orientation

Read the top-level index first:

read ~/.openclaw/workspace/skills/docsforai/docs/<site-name>/index.md

4b. You know roughly what you need

Read the specific chapter file directly:

read ~/.openclaw/workspace/skills/docsforai/docs/<site-name>/guide/configuration.md
read ~/.openclaw/workspace/skills/docsforai/docs/<site-name>/reference/api.md

4c. You need to find where something is documented

Search across all files for a keyword, then read only the matching file:

# Find which files cover a specific topic (-E: extended regex, so "|" means OR)
grep -rlE "defineConfig|plugin|vite" \
  ~/.openclaw/workspace/skills/docsforai/docs/<site-name>/ | head -10
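When a keyword matches many files, ranking them by the number of matching lines helps pick the one file worth reading. The sketch below is a hypothetical helper (`rank_matches` is not a DocsForAI command); it relies on `grep -rc --include`, which GNU and BSD grep both support, and assumes file paths contain no colons.

```shell
# Hypothetical helper: list the .md files with the most matching lines first.
# Usage: rank_matches <extended-regex> <docs-dir> [max-results]
rank_matches() {
  pattern="$1"
  dir="$2"
  limit="${3:-5}"
  # grep -c prints "path:count" per file; drop zero-count files, then sort
  # numerically on the count (field 2) in descending order.
  grep -rcE --include='*.md' -e "$pattern" "$dir" |
    grep -v ':0$' |
    sort -t: -k2,2nr |
    head -n "$limit"
}
```

Usage: `rank_matches 'defineConfig|plugin' ~/.openclaw/workspace/skills/docsforai/docs/vitepress 3`, then read only the top result.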

4d. You need to understand a full feature area

Read the section index, then follow up with the specific sub-pages you need:

# Read section overview
read ~/.openclaw/workspace/skills/docsforai/docs/<site-name>/guide/index.md

# Then read only the sub-pages that apply
read ~/.openclaw/workspace/skills/docsforai/docs/<site-name>/guide/routing.md

Rules:

  • Never read the entire docs tree in one go
  • Stop reading once you have enough to proceed
  • If you read something and it's not what you needed, search more precisely rather than loading more files

When to Consult Docs (decision guide)

Use this skill proactively whenever you are about to:

| Situation | Action |
|---|---|
| Use an API you haven't used in this session | Read the relevant API reference page |
| Write configuration for a framework | Read the configuration guide |
| Debug an unexpected behavior | Search docs for the error or behavior, read matching section |
| Use a CLI tool you're unfamiliar with | Read the CLI reference page |
| Implement a non-trivial feature | Read the feature's guide page before writing code |
| Upgrade a library version | Check migration or changelog docs first |

Do not guess at API signatures, config options, or CLI flags when the docs are available on disk. A 2-second read beats a hallucinated parameter.


CLI Reference

# Standard crawl
docsforai crawl <URL> -f multi-md -o <output-dir>

# Force framework type (skip auto-detection)
docsforai crawl <URL> --type nextdocs -f multi-md -o <output-dir>
docsforai crawl <URL> --type mkdocs -f multi-md -o <output-dir>

# Polite crawling (for rate-sensitive sites)
docsforai crawl <URL> -f multi-md --concurrency 2 --delay 0.5 -o <output-dir>

# Limit pages (generic mode only)
docsforai crawl <URL> -f multi-md --max-pages 100 -o <output-dir>

Supported Frameworks (auto-detected)

| Framework | Detection signal |
|---|---|
| VitePress | .VPSidebar CSS class / generator meta |
| Docsify | $docsify global variable; fetches raw .md source |
| Mintlify | x-llms-txt response header; single request for full content |
| Docusaurus | generator meta / .theme-doc-sidebar-container |
| mdBook | #mdbook-sidebar / ol.chapter |
| MkDocs | generator meta / .md-nav--primary (Material + default themes) |
| Starlight | #starlight__sidebar / .sl-markdown-content |
| GitBook | generator meta GitBook / sitemap-based discovery |
| NextDocs | /_next/ assets + .mdx-content; sitemap discovery + sidebar fallback |
| Feishu Docs | open.feishu.cn domain; internal API |
| Generic | BFS link traversal; fallback for any other site |

Tips

  • Mintlify sites fetch everything in one request — near-instant
  • Cloudflare-protected sites — DocsForAI auto-retries with system curl
  • Count total pages: find ~/.openclaw/workspace/skills/docsforai/docs/ -name "*.md" | wc -l
  • Re-crawl to refresh: delete the site folder first, then crawl again
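The refresh tip (delete the site folder, then crawl again) can be sketched as one guarded step. The helper name `refresh_docs` and the `DOCS_ROOT` override are hypothetical; the crawl command itself is the standard one from the CLI Reference. The guard rejects empty or slash-containing site names so the `rm -rf` cannot reach outside the docs directory.

```shell
# Hypothetical helper: drop one site's folder and crawl it again.
# Usage: refresh_docs <URL> <site-name>
refresh_docs() {
  url="$1"
  site="$2"
  # DOCS_ROOT is an assumed override; the default is the path used in this skill.
  root="${DOCS_ROOT:-$HOME/.openclaw/workspace/skills/docsforai/docs}"

  # Refuse empty names and anything containing "/" (e.g. "..", "a/b").
  case "$site" in
    ''|*/*|.|..) echo "refresh_docs: invalid site name" >&2; return 2 ;;
  esac

  rm -rf -- "$root/$site"
  docsforai crawl "$url" -f multi-md -o "$root"
}
```

Usage: `refresh_docs https://vitepress.dev/guide vitepress`. Update the crawl date in MEMORY.md separately if you track it there.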

Version history

1 version in total

  • v0.7.0 (current)
    2026-05-03 08:27, marked safe
