
CA File Processor

Process financial documents for Indian CA firms. Use when any PDF, Excel (.xlsx/.xls), CSV, JPG, or PNG file is received or uploaded — including GST returns,...
Author: purvik6062
Category: uncategorized · Source: clawhub · Version: v1.0.3 (1 version) · Key: not required

Overview

CA File Processor

This skill processes the four most common file formats used by Indian CA firms and extracts structured information from them for analysis, summarisation, and answering queries.

Supported formats

  • PDF — GST returns, ITR acknowledgements, audit reports, scanned invoices (text-layer and scanned via OCR)
  • Excel (.xlsx / .xls) — Trial balance, P&L, balance sheets, payroll registers, GST workings
  • CSV — Bank statement exports (HDFC, ICICI, SBI), GSTR-2B downloads, Tally exports
  • Images (.jpg / .png) — WhatsApp invoice photos, scanned Form 16, cheque images

How to use

When a file is attached or uploaded, run the router script on it:

python3 scripts/skill_router.py <file_path>

The router auto-detects the file type, calls the matching processor, and returns the result as a structured JSON object.
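As a rough illustration of the auto-detection step, the sketch below maps a file extension to a processor name. The `PROCESSORS` table and `detect_processor` function are hypothetical names introduced here; the real `scripts/skill_router.py` may detect types differently (for example by inspecting file contents).

```python
from pathlib import Path

# Hypothetical extension table covering the formats listed above.
# The actual router may use different names or content sniffing.
PROCESSORS = {
    ".pdf": "pdf",
    ".xlsx": "excel",
    ".xls": "excel",
    ".csv": "csv",
    ".jpg": "image",
    ".png": "image",
}


def detect_processor(file_path: str) -> str:
    """Return the processor name for a file, based only on its extension."""
    suffix = Path(file_path).suffix.lower()  # case-insensitive: ".PDF" == ".pdf"
    if suffix not in PROCESSORS:
        raise ValueError(f"Unsupported file type: {suffix or '(no extension)'}")
    return PROCESSORS[suffix]
```

Rejecting unknown extensions with an explicit error, rather than guessing, keeps unsupported uploads from being silently misread.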

What to do with the output

Once the script returns output, use it to:

  1. Answer the user's question about the document
  2. Extract specific fields they asked for (GSTIN, totals, dates)
  3. Summarise the document in plain language
  4. Flag anomalies or missing information
  5. Compare figures across multiple documents

Field extraction — what gets detected automatically

For invoices and PDFs:

  • GSTIN (supplier and recipient)
  • Invoice number and date
  • Total amount / grand total
  • PAN
  • Email and phone
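One plausible way to detect GSTIN and PAN values in extracted text is pattern matching on their published formats: a PAN is five letters, four digits and a letter, and a GSTIN is a two-digit state code, the ten-character PAN, an entity code, the letter Z and a check character. The sketch below is an assumption about how such detection could work, not the skill's actual code; `extract_ids` is a hypothetical helper and it does not verify the GSTIN check character.

```python
import re

# Format-only patterns; neither validates checksums or state codes.
GSTIN_RE = re.compile(r"\b\d{2}[A-Z]{5}\d{4}[A-Z][1-9A-Z]Z[0-9A-Z]\b")
PAN_RE = re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b")


def extract_ids(text: str) -> dict:
    """Return unique GSTIN and PAN candidates in order of first appearance."""
    gstins = list(dict.fromkeys(GSTIN_RE.findall(text)))
    # The word boundaries stop the PAN pattern from matching the PAN
    # embedded inside a GSTIN, so only standalone PANs are reported.
    pans = list(dict.fromkeys(PAN_RE.findall(text)))
    return {"gstin": gstins, "pan": pans}


sample = (
    "Supplier GSTIN: 27AAPFU0939F1ZV  "
    "Recipient GSTIN: 29AAGCB7383J1Z4  PAN: AAPFU0939F"
)
ids = extract_ids(sample)
```

Which GSTIN belongs to the supplier and which to the recipient cannot be decided by the pattern alone; that depends on nearby labels in the document.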

For bank statements (CSV):

  • Total debits and credits
  • Date range of transactions
  • Detected bank format
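The bank-statement totals can be pictured as a single pass over the CSV rows. The sketch below is a minimal illustration using only the standard library (the skill itself lists pandas among its dependencies); the column names `Date`, `Debit` and `Credit`, the `DD/MM/YYYY` date format, and the function `summarise_statement` are assumptions, since real HDFC, ICICI and SBI exports each use their own headers.

```python
import csv
import io
from datetime import datetime
from decimal import Decimal


def _amount(cell):
    """Parse an amount such as '1,500.00'; blank cells count as zero."""
    cleaned = (cell or "").replace(",", "").strip()
    return Decimal(cleaned) if cleaned else Decimal("0")


def summarise_statement(csv_text, date_col="Date", debit_col="Debit",
                        credit_col="Credit", date_format="%d/%m/%Y"):
    """Total the debit and credit columns and report the date range covered."""
    total_debits = Decimal("0")
    total_credits = Decimal("0")
    dates = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        dates.append(datetime.strptime(row[date_col].strip(), date_format).date())
        total_debits += _amount(row.get(debit_col))
        total_credits += _amount(row.get(credit_col))
    return {
        "total_debits": total_debits,
        "total_credits": total_credits,
        "date_from": min(dates).isoformat() if dates else None,
        "date_to": max(dates).isoformat() if dates else None,
    }


sample_csv = """Date,Narration,Debit,Credit
01/04/2024,UPI payment,"1,500.00",
03/04/2024,NEFT receipt,,"25,000.00"
15/04/2024,Bank charges,59.00,
"""
summary = summarise_statement(sample_csv)
```

`Decimal` is used instead of `float` so that rupee amounts add up without binary rounding error.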

For Excel files:

  • Document type (trial balance / P&L / balance sheet / payroll / GST workings / ledger)
  • Sheet names and row counts
  • Preview of header rows
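Document-type detection for spreadsheets could, for instance, score the header row against keyword lists. The sketch below is only an illustration of that idea; the keyword table and the `classify_sheet` function are hypothetical and are not taken from the skill's source.

```python
# Hypothetical keyword lists per document type; a real classifier would
# likely need a longer, firm-specific vocabulary.
DOC_TYPE_KEYWORDS = {
    "trial balance": ["trial balance", "opening balance", "closing balance"],
    "P&L": ["profit and loss", "profit & loss", "revenue", "expenses"],
    "balance sheet": ["balance sheet", "liabilities", "assets"],
    "payroll": ["employee", "basic salary", "net pay"],
    "GST workings": ["cgst", "sgst", "igst", "taxable value"],
    "ledger": ["ledger", "voucher", "particulars"],
}


def classify_sheet(headers):
    """Return the document type whose keywords appear most often in the headers."""
    text = " | ".join(str(h).lower() for h in headers)
    scores = {
        doc_type: sum(keyword in text for keyword in keywords)
        for doc_type, keywords in DOC_TYPE_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)  # ties resolve to the earliest entry
    return best if scores[best] > 0 else "unknown"
```

Returning "unknown" when nothing matches is deliberate: a wrong label on a financial workbook is more harmful than no label.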

OCR notes

  • Text-layer PDFs are read directly (fast, accurate)
  • Scanned PDFs and images go through Tesseract OCR (English + Hindi)
  • Confidence is rated high / medium / low in the output
  • Always flag low-confidence results to the user and ask for confirmation on numeric fields
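The confidence rating and the confirmation rule above can be sketched as two small functions. The thresholds (85 and 60 on a 0 to 100 scale) and the names `rate_confidence` and `fields_needing_confirmation` are assumptions made for illustration; the skill's own cut-offs are not documented here.

```python
def rate_confidence(mean_conf: float) -> str:
    """Bucket a 0-100 mean OCR confidence into high / medium / low.

    The 85 and 60 cut-offs are illustrative assumptions.
    """
    if mean_conf >= 85:
        return "high"
    if mean_conf >= 60:
        return "medium"
    return "low"


def _is_numeric(value) -> bool:
    """True for values like '1,18,000.00' that parse as a number."""
    try:
        float(str(value).replace(",", ""))
        return True
    except ValueError:
        return False


def fields_needing_confirmation(fields: dict, rating: str) -> list:
    """List the numeric fields a user should confirm when OCR confidence is low."""
    if rating != "low":
        return []
    return [name for name, value in fields.items() if _is_numeric(value)]


extracted = {
    "total_amount": "1,18,000.00",
    "invoice_date": "12/04/2024",
    "supplier_name": "Acme Traders",
}
to_confirm = fields_needing_confirmation(extracted, rate_confidence(42.0))
```

Only amounts are singled out here because a misread digit in a total changes the books, whereas a misread name is usually obvious to the reader.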

Trust statement

This skill runs entirely locally on your server. No data is sent to any external service. All processing happens via open-source Python libraries (PyMuPDF, pytesseract, openpyxl, pandas).

Version history

1 version in total

  • v1.0.3 (current)
    2026-05-03 09:56 · Safe

Security checks

  • Tencent Cloud Security (Keen): safe, no risk
  • Tencent Cloud Security (Sanbu): safe, no risk
