← 返回
数据分析 中文

Boof

Convert PDFs and documents to markdown, index them locally for RAG retrieval, and analyze them token-efficiently. Use when asked to: read/analyze/summarize a...
将PDF和文档转换为Markdown,在本地建立索引用于RAG检索,并以token高效方式进行分析。适用于需要读取/分析/摘要等任务。
chiefsegundo
数据分析 clawhub v4.0.0 2 版本 99934.1 Key: 无需
★ 0
Stars
📥 1,516
下载
💾 84
安装
2
版本
#documents#latest#pdf#rag#research

概述

Boof 🍑

Local-first document processing: PDF → markdown → RAG index → token-efficient analysis.

Documents stay local. Only relevant chunks go to the LLM. Maximum knowledge absorption, minimum token burn.

Powered by opendataloader-pdf — #1 in PDF parsing benchmarks (0.90 overall, 0.93 table accuracy). CPU-only, no GPU required.

Quick Reference

Convert + index a document

bash {SKILL_DIR}/scripts/boof.sh /path/to/document.pdf

Convert with custom collection name

bash {SKILL_DIR}/scripts/boof.sh /path/to/document.pdf --collection my-project

Query indexed content

qmd query "your question" -c collection-name

Core Workflow

  1. Boof it: Run boof.sh on a PDF. This converts it to markdown via opendataloader-pdf (local Java engine, no API, no GPU) and indexes it into QMD for semantic search.
  1. Query it: Use qmd query to retrieve only the relevant chunks. Send those chunks to the LLM — not the entire document.
  1. Analyze it: The LLM sees focused, relevant excerpts. No wasted tokens, no lost-in-the-middle problems.

When to Use Each Approach

"Analyze this specific aspect of the paper" → Boof + query (cheapest, most focused)

"Summarize this entire document" → Boof, then read the markdown section by section. Summarize each section individually, then merge summaries. See advanced-usage.md.

"Compare findings across multiple papers" → Boof all papers into one collection, then query across them.

"Find where the paper discusses X"qmd search "X" -c collection for exact match, qmd query "X" -c collection for semantic match.

Output Location

Converted markdown files are saved to knowledge/boofed/ by default (override with --output-dir).

Setup

If boof.sh reports missing dependencies, see setup-guide.md for installation instructions (Java + opendataloader-pdf + QMD).

Environment

  • ODL_ENV — Path to opendataloader-pdf Python venv (default: ~/.openclaw/tools/odl-env)
  • QMD_BIN — Path to qmd binary (default: ~/.bun/bin/qmd)
  • BOOF_OUTPUT_DIR — Default output directory (default: ~/.openclaw/workspace/knowledge/boofed)

版本历史

共 2 个版本

  • v4.0.0 当前
    2026-05-03 02:52 安全 安全
  • v1.0.0
    2026-03-29 04:11 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

data-analysis

Data Analysis

ivangdavila
{"answer":"数据分析与可视化。查询数据库、生成报告、自动化电子表格,将原始数据转化为清晰可行的见解。适用于:(1) 您……"}
★ 199 📥 65,226
data-analysis

A股量化 AkShare

mbpz
A股量化数据分析工具,基于AkShare库获取A股行情、财务数据、板块信息等。用于回答关于A股股票查询、行情数据、财务分析、选股等问题。
★ 166 📥 60,181
data-analysis

Excel / XLSX

ivangdavila
创建、检查和编辑 Microsoft Excel 工作簿及 XLSX 文件,支持可靠的公式、日期、类型、格式、重算及模板保留功能。
★ 368 📥 140,750