
Markdown Docs Full-Text Search

Full-text search across structured Markdown documentation archives using SQLite FTS5. Use when you need to search large collections of Markdown articles quickly and efficiently.
carev01
Data Analysis · clawhub v1.0.2 · 1 version · 100000 · API key: not required
★ 0 stars · 📥 853 downloads · 💾 15 installs
#latest

Overview

Markdown Documentation Full-Text Search

Fast, indexed full-text search across Markdown documentation archives using SQLite FTS5 with BM25 relevance ranking.

When to Use

  • Searching documentation archives for specific features, capabilities, or information
  • Finding official source URLs to cite in reports
  • Looking up technical specifications or configuration details
  • Research across multiple documentation sources

Document Format Expected

Articles are separated by a --- line. Each article starts with a # Title heading followed by a *Source: URL* line:

# Article Title

*Source: https://docs.example.com/path/to/article.html*

Article content here...

---

# Next Article Title

*Source: https://docs.example.com/another/article.html*

More content...
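To make the expected format concrete, here is a minimal Python sketch showing how such an archive could be split into articles. It is not taken from docs.py. The regexes and the Article type are assumptions based only on the format above. A literal --- line inside an article (for example, a Markdown horizontal rule) would also be treated as a separator.

```python
import re
from dataclasses import dataclass

# Assumed patterns, derived from the example format above (not from docs.py).
SEPARATOR_RE = re.compile(r"^---\s*$", re.MULTILINE)       # line containing only ---
TITLE_RE = re.compile(r"^#\s+(.+?)\s*$", re.MULTILINE)     # "# Article Title"
SOURCE_RE = re.compile(r"^\*Source:\s*(\S+?)\*\s*$", re.MULTILINE)  # "*Source: URL*"


@dataclass
class Article:
    title: str
    url: str
    body: str


def parse_articles(text: str) -> list[Article]:
    """Split a Markdown archive into articles on '---' separator lines."""
    articles = []
    for chunk in SEPARATOR_RE.split(text):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip empty chunks, e.g. from a trailing separator
        title_m = TITLE_RE.search(chunk)
        source_m = SOURCE_RE.search(chunk)
        articles.append(Article(
            title=title_m.group(1) if title_m else "",
            url=source_m.group(1) if source_m else "",
            body=chunk,
        ))
    return articles
```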

Quick Start

# 1. Index the documentation (one-time or when docs change)
scripts/docs.py index ./docs

# 2. Search
scripts/docs.py search "kubernetes backup" --max 5

# 3. Check index status
scripts/docs.py status

Primary Tool: docs.py

The unified CLI handles all operations:

Indexing

# Index documentation directory
scripts/docs.py index ./docs

# Force full rebuild
scripts/docs.py index ./docs --rebuild

# Custom database location
scripts/docs.py index ./docs --db /path/to/custom.db
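Conceptually, indexing loads each article into an SQLite FTS5 virtual table. The following minimal sketch performs a full rebuild, similar in spirit to --rebuild, using Python's built-in sqlite3 module. The table name articles and its columns are hypothetical, because docs.py's actual schema is not documented here. SQLite must also be compiled with FTS5, which most Python builds are.

```python
import sqlite3


def build_index(db_path, articles):
    """Rebuild an FTS5 index from (title, url, body) tuples.

    Hypothetical schema: docs.py may store different columns.
    """
    conn = sqlite3.connect(db_path)
    conn.execute("DROP TABLE IF EXISTS articles")
    # url is stored for citations but marked UNINDEXED, so MATCH never searches it.
    conn.execute(
        "CREATE VIRTUAL TABLE articles USING fts5(title, url UNINDEXED, body)"
    )
    conn.executemany(
        "INSERT INTO articles(title, url, body) VALUES (?, ?, ?)", articles
    )
    conn.commit()
    return conn
```

Usage: build_index("docs.db", [("Kubernetes backup", "https://docs.example.com/k8s.html", "...")]).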

Searching

# Basic search
scripts/docs.py search "kubernetes backup"

# Boolean operators
scripts/docs.py search "AWS AND S3 AND snapshot"

# Phrase search
scripts/docs.py search '"exact phrase match"'

# Prefix search
scripts/docs.py search "kube*"

# Exclude terms
scripts/docs.py search "backup NOT restore"

# Title-only search
scripts/docs.py search "kubernetes" --title-only

# Output formats
scripts/docs.py search "kubernetes" --format json
scripts/docs.py search "kubernetes" --format markdown

# More context around matches
scripts/docs.py search "kubernetes" --context 400

# Include full content in JSON
scripts/docs.py search "kubernetes" --format json --full-content
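Each of these searches can be expressed as a single FTS5 query. MATCH filters the rows, bm25() ranks them (lower scores are better matches), and snippet() extracts highlighted context. This sketch assumes the hypothetical articles(title, url, body) table. Note that snippet() measures context in tokens, up to a maximum of 64. It is therefore not a drop-in equivalent of --context, whose unit this page does not state.

```python
import sqlite3


def search(conn, query, max_results=10, context_tokens=32):
    """Run an FTS5 query against a hypothetical articles(title, url, body) table."""
    rows = conn.execute(
        """
        SELECT title, url,
               snippet(articles, 2, '[', ']', '...', ?) AS excerpt,  -- column 2 = body
               bm25(articles) AS score                               -- lower is better
        FROM articles
        WHERE articles MATCH ?
        ORDER BY score
        LIMIT ?
        """,
        (context_tokens, query, max_results),
    ).fetchall()
    return [
        {"title": t, "url": u, "excerpt": e, "score": s}
        for t, u, e, s in rows
    ]
```

Because each result is a plain dict, json.dumps(search(conn, "kubernetes backup", 5)) produces JSON output comparable to --format json.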

FTS5 Query Syntax

| Syntax | Meaning |
|---|---|
| term1 term2 | Documents with both terms (standard FTS5 treats whitespace as an implicit AND), ranked by BM25 |
| term1 AND term2 | Documents with both terms |
| term1 OR term2 | Documents with either term |
| "exact phrase" | Exact phrase match |
| prefix* | Words starting with prefix |
| term1 NOT term2 | term1 without term2 |
| title:term | Search only titles |
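To check how a query pattern behaves before running it against a large archive, you can try it on a throwaway in-memory table. The sketch below does this with Python's sqlite3 module. The table name docs and the sample rows are hypothetical. The sketch also illustrates two FTS5 details: operators must be uppercase, and phrases use double quotes (single quotes are not phrase delimiters in FTS5).

```python
import sqlite3

# Throwaway in-memory table with hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
conn.executemany(
    "INSERT INTO docs(title, body) VALUES (?, ?)",
    [
        ("Kubernetes backup", "Back up kubernetes clusters to S3"),
        ("AWS snapshots", "S3 snapshot and restore"),
        ("Limits", "This feature is not supported on Windows"),
    ],
)


def titles(query):
    """Return titles matching an FTS5 query, best BM25 match first."""
    rows = conn.execute(
        "SELECT title FROM docs WHERE docs MATCH ? ORDER BY rank", (query,)
    )
    return [r[0] for r in rows]


for q in ["kubernetes s3", "kubernetes OR snapshot", "s3 NOT restore",
          "kube*", '"not supported"', "title:backup"]:
    print(f"{q!r:28} -> {titles(q)}")
```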

Getting Specific Articles

# Get article by partial URL or title
scripts/docs.py get "system_requirements" --full

# Find all matching articles
scripts/docs.py get "backup" --all
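A lookup by partial URL or title does not need full-text matching at all, because a plain LIKE filter is enough. This hypothetical sketch uses the same assumed articles table. It escapes the underscore in system_requirements, because _ is a single-character wildcard in SQL LIKE and would otherwise match unintended URLs.

```python
import sqlite3


def get_articles(conn, needle, all_matches=False):
    """Find articles whose URL or title contains `needle` (ASCII case-insensitive).

    Assumes a hypothetical articles(title, url, body) table.
    """
    # Escape LIKE wildcards so "system_requirements" is matched literally.
    escaped = needle.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    pattern = f"%{escaped}%"
    rows = conn.execute(
        "SELECT title, url, body FROM articles "
        "WHERE url LIKE ? ESCAPE '\\' OR title LIKE ? ESCAPE '\\'",
        (pattern, pattern),
    ).fetchall()
    # Mirror the page's behaviour: first match by default, every match with --all.
    return rows if all_matches else rows[:1]
```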

Status

# Check index statistics
scripts/docs.py status
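The Example Session output reports both files and articles indexed. Suppose the index also stored each article's originating file in a column. This sketch calls that column source_file, a hypothetical name. Under that assumption, both numbers reduce to two COUNT queries:

```python
import sqlite3


def index_status(conn):
    """Summarize a hypothetical articles table with a source_file column."""
    articles = conn.execute("SELECT count(*) FROM articles").fetchone()[0]
    files = conn.execute(
        "SELECT count(DISTINCT source_file) FROM articles"
    ).fetchone()[0]
    return f"Files indexed: {files}, Articles indexed: {articles}"
```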

Workflow for Research Tasks

Discovery Phase

# Check what's indexed
scripts/docs.py status

# Explore topics with broad searches
scripts/docs.py search "<feature>" --max 20

Research Phase

# Narrow down with boolean operators
scripts/docs.py search "<feature> AND <platform>"

# Find specific information
scripts/docs.py search 'limitation OR restriction OR "not supported"'

Citation Phase

Every search result includes its Source: URL. Cite it in your reports:

According to the documentation, [finding]...

Source: https://docs.example.com/path/to/article.html
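When a report draws on many search results, the same article is often cited more than once. The following hypothetical helper, which is not part of docs.py, turns (finding, url) pairs into the citation pattern above. It numbers the sources so that each URL is listed only once.

```python
def cite(findings):
    """Build report text from (finding, source_url) pairs, listing each URL once."""
    urls = []
    lines = []
    for text, url in findings:
        if url not in urls:
            urls.append(url)
        # Reference the numbered source instead of repeating the URL.
        lines.append(f"According to the documentation, {text} [{urls.index(url) + 1}]")
    lines.append("")
    lines.extend(f"[{i}] Source: {u}" for i, u in enumerate(urls, 1))
    return "\n".join(lines)
```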

Multi-Source Setup

Each agent or project can have its own documentation and index:

~/docs/VendorA/
    ├── docs_part_01.md
    ├── docs.db      # Index lives with docs
    └── ...

~/docs/VendorB/
    ├── docs.md
    ├── docs.db
    └── ...

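The docs.py script auto-detects the database location, so each documentation set can keep its own docs.db next to its Markdown files. To store the index elsewhere, pass --db when indexing.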
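This page does not say how the auto-detection works. One plausible approach that fits the layout above is to walk upward from the current directory until a docs.db file is found. The sketch below implements that approach. It is an assumption, not docs.py's documented behaviour.

```python
from pathlib import Path


def find_db(start, name="docs.db"):
    """Return the first `name` found in `start` or any parent directory, else None.

    Hypothetical detection strategy; docs.py may use a different rule.
    """
    start = Path(start).resolve()
    for directory in (start, *start.parents):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None
```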

Advanced Scripts

For specialized needs:

  • scripts/fts_search.py — Direct FTS5 search with more options
  • scripts/index_docs.py — Standalone indexing
  • scripts/list_sources.py — List all source URLs
  • scripts/get_article.py — Direct article retrieval
  • scripts/search_docs.py — Regex-based search (no index needed)
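The regex-based option trades speed for flexibility. It needs no index, but every query rescans every file, so its cost grows with the size of the archive. In return, it accepts patterns that FTS5 cannot express. This is a minimal sketch of the idea, not the actual search_docs.py, whose options are not documented here.

```python
import re
from pathlib import Path


def regex_search(root, pattern, flags=re.IGNORECASE):
    """Scan every .md file under root and yield (path, line_no, line) for matches."""
    rx = re.compile(pattern, flags)
    for path in sorted(Path(root).rglob("*.md")):
        with path.open(encoding="utf-8", errors="replace") as fh:
            for line_no, line in enumerate(fh, 1):
                if rx.search(line):
                    yield path, line_no, line.rstrip("\n")
```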

Research Patterns

For common search patterns (feature research, architecture, security, etc.), see references/search-patterns.md.

Example Session

# What's available?
scripts/docs.py status
# Output: Files indexed: 37, Articles indexed: 32065

# Find information
scripts/docs.py search "kubernetes backup" --max 5

# Narrow to specific platform
scripts/docs.py search "kubernetes AND AWS" --max 5

# Find limitations
scripts/docs.py search 'limitation OR "not supported"'

# Get full article for citation
scripts/docs.py get "system_requirements" --full

Best Practices

  1. Index once, search many times: FTS5 is fast because queries use a prebuilt index instead of scanning files
  2. Use boolean operators: AND, OR, NOT (uppercase) for precision
  3. Use phrase search for exact terms: wrap them in double quotes, e.g. "exact match"
  4. Always cite sources: include Source: URLs in reports
  5. Rebuild periodically: re-index when the documentation is updated
  6. Use JSON for analysis: pipe --format json output to jq or other tools
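One way to decide when a rebuild is due is to compare file modification times. In this hypothetical sketch, the index counts as stale if it is missing or if any Markdown file changed after docs.db was last written. It is not a docs.py feature.

```python
from pathlib import Path


def index_is_stale(docs_dir, db_path):
    """True if db_path is missing or older than any .md file under docs_dir."""
    db = Path(db_path)
    if not db.exists():
        return True
    db_mtime = db.stat().st_mtime
    return any(p.stat().st_mtime > db_mtime for p in Path(docs_dir).rglob("*.md"))
```

A shell wrapper could then run scripts/docs.py index ./docs only when this returns True.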

Version History

1 version in total

  • v1.0.2 (current)
    2026-03-29 20:22 · Security: safe

Security Scans

Tencent Cloud Security (Keen)

Safe, no risk detected

Tencent Cloud Security (Sanbu)

Safe, no risk detected

🔗 Related Skills

data-analysis

Excel / XLSX

ivangdavila
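Create, inspect, and edit Microsoft Excel workbooks and XLSX files, with reliable handling of formulas, dates, types, formatting, recalculation, and template preservation.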
★ 368 📥 140,628
data-analysis

Data Analysis

ivangdavila
Data analysis and visualization. Query databases, generate reports, automate spreadsheets, and turn raw data into clear, actionable insights. Use when: (1) you...
★ 199 📥 65,181
data-analysis

Stock Analysis

udiedrichsen
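Analyze stocks and cryptocurrencies using Yahoo Finance data. Supports portfolio management, watchlist alerts, dividend analysis, 8-dimension scoring, trending-stock scans, and rumor/early-signal detection. Use for stock analysis, position tracking, earnings moves, crypto monitoring, hot-stock tracking, or spotting off-mainstream rumors early.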
★ 270 📥 57,003