Accounting Skill

Process accounting documents: invoices (hóa đơn GTGT), purchase orders, and bank statements. Extract structured data from PDF (digital and scanned), JPG, and other formats.
Author: dvnghiem

Overview

Extract structured data from accounting documents (invoices, POs, bank statements) into Excel tracking sheets with JSON backups. Handles digital PDFs, scanned PDFs, and images via automatic OCR.

Prerequisites

Install the system OCR dependencies before first use. See {baseDir}/references/ocr-setup.md for the full guide.

# Ubuntu / Debian
sudo apt install tesseract-ocr tesseract-ocr-vie poppler-utils

# Verify
uv run {baseDir}/scripts/ocr_utils.py check

Quick Start

1. Classify an unknown document

uv run {baseDir}/scripts/classify_document.py /path/to/document.pdf

Returns JSON with type (invoice / po / statement / other), confidence, and a ready-to-run extraction command.

2. Extract an invoice

uv run {baseDir}/scripts/extract_invoice.py /path/to/invoice.pdf -o invoice_tracking.xlsx

Appends to the Excel tracking sheet. Use --dry-run to preview parsed data without writing.

3. Extract a bank statement

uv run {baseDir}/scripts/extract_statement.py /path/to/statement.pdf

Creates statement_{bank}_{date}.xlsx with transactions. Use -o to specify output path.

4. Extract a purchase order

uv run {baseDir}/scripts/extract_po.py /path/to/po.pdf -o po_tracking.xlsx

Tracks delivery dates and flags overdue/urgent POs.
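The overdue/urgent flagging can be pictured with a small sketch. This is not the logic of extract_po.py itself; the function name, the status labels, and the 3-day "urgent" window are all assumptions made for illustration (see {baseDir}/references/po-fields.md for the real delivery status rules).

```python
from datetime import date, timedelta

# Hypothetical window: the source does not say how close a delivery date
# must be before a PO counts as "urgent".
URGENT_WINDOW = timedelta(days=3)


def delivery_status(delivery_date: date, today: date, delivered: bool = False) -> str:
    """Classify a PO as delivered, overdue, urgent, or on_track (labels are assumed)."""
    if delivered:
        return "delivered"
    if delivery_date < today:
        return "overdue"
    if delivery_date - today <= URGENT_WINDOW:
        return "urgent"
    return "on_track"


today = date(2026, 3, 30)
print(delivery_status(date(2026, 3, 25), today))  # overdue
print(delivery_status(date(2026, 4, 1), today))   # urgent
print(delivery_status(date(2026, 5, 1), today))   # on_track
```

Passing today explicitly keeps the check deterministic, which makes it easier to test than calling date.today() inside the function.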

5. Generate empty Excel templates

uv run {baseDir}/scripts/generate_templates.py all -o ~/accounting/

Creates blank tracking sheets: invoice_tracking.xlsx, po_tracking.xlsx, statement_template.xlsx.

Common Options (all extractors)

Flag                        Description
--------------------------  ---------------------------------------------
--format excel|json|both    Output format (default: both)
--dry-run                   Parse and validate only, print JSON to stdout
--json-dir DIR              Directory for JSON backup files
-o FILE                     Output Excel file path
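When the extractors are driven from another script, the shared flags can be assembled programmatically. The helper below is a minimal sketch: build_extract_command is a hypothetical name, and it only uses the flags listed above. It builds the argument list without running anything.

```python
from typing import List, Optional

VALID_FORMATS = {"excel", "json", "both"}


def build_extract_command(
    script: str,
    document: str,
    output: Optional[str] = None,
    fmt: str = "both",
    json_dir: Optional[str] = None,
    dry_run: bool = False,
) -> List[str]:
    """Assemble an argument list for one extractor using the shared flags."""
    if fmt not in VALID_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    cmd = ["uv", "run", script, document, "--format", fmt]
    if output is not None:
        cmd += ["-o", output]
    if json_dir is not None:
        cmd += ["--json-dir", json_dir]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


cmd = build_extract_command(
    "scripts/extract_invoice.py",  # stands in for {baseDir}/scripts/extract_invoice.py
    "invoice.pdf",
    output="invoice_tracking.xlsx",
    dry_run=True,
)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True); keeping the arguments as a list avoids shell quoting problems with file names that contain spaces.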

Workflow

Single Document

File → classify_document.py → route → extract_*.py → Excel + JSON

Batch Processing

For a folder of mixed documents, classify first, then route:

for f in /path/to/docs/*; do
  uv run {baseDir}/scripts/classify_document.py "$f" --output-dir ~/accounting/
done

Then run the suggested extraction commands from each classification result.
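The routing step can also be sketched in Python. The snippet assumes the classification JSON exposes a "type" key with the values listed earlier (invoice / po / statement / other); the exact key names and the sample payload are assumptions, and in practice the ready-to-run command included in the classifier output is the safer thing to execute.

```python
import json
from typing import List, Optional

# Extractor scripts named in this document, keyed by document type.
EXTRACTORS = {
    "invoice": "extract_invoice.py",
    "po": "extract_po.py",
    "statement": "extract_statement.py",
}


def route(result: dict, document: str, scripts_dir: str) -> Optional[List[str]]:
    """Return the extraction command for a classification result, or None for 'other'."""
    script = EXTRACTORS.get(result.get("type", "other"))
    if script is None:
        return None  # unknown or "other": leave for manual handling
    return ["uv", "run", f"{scripts_dir}/{script}", document]


# Hypothetical classifier output; the real JSON schema may differ.
raw = '{"type": "invoice", "confidence": 92.5}'
cmd = route(json.loads(raw), "/path/to/docs/a.pdf", "scripts")
print(cmd)
```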

OCR Strategy

All scripts share {baseDir}/scripts/ocr_utils.py which auto-selects the best extraction method:

  1. Digital PDFs → pdfplumber (fast, no OCR needed)
  2. Scanned PDFs → pdf2image + pytesseract at 300 DPI (fallback when pdfplumber gets <50 chars/page)
  3. Images (JPG/PNG/TIFF) → pytesseract with grayscale preprocessing

Each result includes ocr_confidence and extraction_confidence percentages. Documents below 85% are automatically flagged needs_review.
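The selection and flagging rules above can be summarized in a short sketch. It does not reproduce ocr_utils.py: the function names are hypothetical, the list of image suffixes is a guess at what "JPG/PNG/TIFF" covers, and taking the lower of the two confidence scores for the 85% check is an assumption, since the text does not say which score is compared.

```python
from pathlib import Path

MIN_CHARS_PER_PAGE = 50   # from the text: fewer chars per page triggers the OCR fallback
REVIEW_THRESHOLD = 85.0   # from the text: documents below 85% are flagged

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}  # assumed suffix list


def choose_method(filename: str, chars_per_page: float = 0.0) -> str:
    """Pick an extraction method from the file type and the pdfplumber text yield."""
    suffix = Path(filename).suffix.lower()
    if suffix in IMAGE_SUFFIXES:
        return "pytesseract"
    if suffix == ".pdf":
        if chars_per_page >= MIN_CHARS_PER_PAGE:
            return "pdfplumber"
        return "pdf2image+pytesseract"
    raise ValueError(f"unsupported file type: {suffix!r}")


def needs_review(ocr_confidence: float, extraction_confidence: float) -> bool:
    """Flag a document when either score is under the threshold (assumed rule)."""
    return min(ocr_confidence, extraction_confidence) < REVIEW_THRESHOLD


print(choose_method("invoice.pdf", chars_per_page=1200))  # pdfplumber
print(choose_method("scan.pdf", chars_per_page=12))       # pdf2image+pytesseract
print(choose_method("receipt.JPG"))                       # pytesseract
print(needs_review(91.0, 80.0))                           # True
```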

Validation Rules

  • Invoices: Subtotal + VAT = Total (auto-checks math), duplicate detection by invoice number + vendor
  • Bank statements: Opening balance + credits − debits = closing balance
  • POs: Delivery date tracking with overdue/urgent alerts
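The arithmetic behind the invoice and bank statement rules is simple enough to show directly. The sketch below uses Decimal to avoid floating-point rounding on currency amounts; the function names, the optional tolerance, and the case-insensitive duplicate key are illustrative assumptions, not the scripts' actual implementation.

```python
from decimal import Decimal
from typing import Iterable, Set, Tuple


def invoice_total_ok(subtotal: Decimal, vat: Decimal, total: Decimal,
                     tolerance: Decimal = Decimal("0")) -> bool:
    """Check Subtotal + VAT = Total, within an optional rounding tolerance."""
    return abs(subtotal + vat - total) <= tolerance


def statement_balance_ok(opening: Decimal, credits: Iterable[Decimal],
                         debits: Iterable[Decimal], closing: Decimal) -> bool:
    """Check opening balance + credits - debits = closing balance."""
    return opening + sum(credits, Decimal("0")) - sum(debits, Decimal("0")) == closing


def is_duplicate(invoice_number: str, vendor: str, seen: Set[Tuple[str, str]]) -> bool:
    """Detect a repeated (invoice number, vendor) pair; normalization is assumed."""
    key = (invoice_number.strip().upper(), vendor.strip().upper())
    if key in seen:
        return True
    seen.add(key)
    return False


# 1,000,000 subtotal with 100,000 VAT should total 1,100,000.
print(invoice_total_ok(Decimal("1000000"), Decimal("100000"), Decimal("1100000")))  # True

# 5,000,000 + (2,000,000 + 500,000) - 1,500,000 = 6,000,000
print(statement_balance_ok(
    Decimal("5000000"),
    [Decimal("2000000"), Decimal("500000")],
    [Decimal("1500000")],
    Decimal("6000000"),
))  # True

seen: Set[Tuple[str, str]] = set()
print(is_duplicate("0000123", "Cong ty ABC", seen))  # False
print(is_duplicate("0000123", "cong ty abc", seen))  # True
```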

Reference Documents

Read these for field schemas, Vietnamese format details, and validation logic:

  • {baseDir}/references/invoice-fields.md — Vietnamese VAT invoice fields, tax rates, patterns
  • {baseDir}/references/bank-formats.md — Vietnamese bank names, transaction formats, amount patterns
  • {baseDir}/references/po-fields.md — PO fields, delivery status logic, payment terms
  • {baseDir}/references/ocr-setup.md — OCR installation, troubleshooting, confidence scoring

Version History

1 version in total

  • v0.1.0 (current)
    2026-03-30 22:54, security status: safe

Security Scan

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
