← 返回
未分类 中文

universal-data-analyst-en

Performs automated, LLM-driven data analysis including loading, validation, method selection, script generation, execution, and comprehensive reporting for d...
执行自动化、LLM驱动的数据分析,包括数据加载、验证、方法选择、脚本生成、执行及综合报告。
yamaz49 yamaz49 来源
未分类 clawhub v1.0.3 1 版本 100000 Key: 无需
★ 0
Stars
📥 429
下载
💾 0
安装
1
版本
#latest

概述

Universal Data Analyst (通用数据分析专家)

Introduction

An intelligent data analysis skill based on Data Ontology. Unlike keyword-based approaches, this skill uses LLM reasoning for every analysis, automatically identifying data types, selecting analysis methods, generating scripts, and outputting reports.

Supports both economic data (retail, subscription, finance, etc.) and non-economic data (scientific measurements, social networks, text, etc.), handling multiple formats including CSV, Excel, Parquet, JSON, and more.


How to Trigger

Simply upload a data file or send any of these types of messages:

  • "Help me analyze this data"
  • "What patterns are in this CSV?"
  • "Explore this dataset"
  • "Check the data quality for me"
  • Directly upload .csv / .xlsx / .parquet / .json files

Core Design: Four-Layer Analysis Framework

Layer 1: Data Ontology
        ↓  What kind of existence is this? Entity type? Generation mechanism?
Layer 2: Problem Typology
        ↓  Descriptive / Diagnostic / Predictive / Prescriptive / Causal?
Layer 3: Methodology Mapping
        ↓  Match domain-recognized analysis frameworks
Layer 4: Validation & Output
           Data quality report + Analysis scripts + HTML/MD reports

Each layer invokes LLM reasoning without any hardcoded rules.


Analysis Workflow (7 Steps)

StepContentDescription
----------------------------
1Data LoadingAuto-recognize formats, support multiple file types
2Ontology RecognitionLLM judges entity type and generation mechanism
3Quality ValidationAuto-detect missing values, outliers, duplicates, output quality score
4Plan GenerationLLM selects analysis framework and path based on user intent
5Script GenerationLLM generates executable Python analysis scripts
6Execute AnalysisRun scripts, generate charts and numerical results
7Comprehensive ReportOutput HTML + Markdown dual-format reports

Flow Health Monitoring (NEW)

Each step has status tracking and error handling:

  • Step Dependency Check - Automatically prevents subsequent steps when prerequisites fail
  • Clear Error Messages - Provides explicit failure reasons and fix suggestions
  • Flow Health Report - Outputs complete execution status and issue summary

If a step fails, you'll see:

⚠️ Flow Interrupted!
   Reason: Critical step 'Data Loading' failed: Encoding error

Fix Suggestions:
  1. File encoding may not be UTF-8, try manually specifying encoding parameter
  2. Common Chinese encodings: gbk, gb2312, gb18030

Supported Data Types

Economic Data

Data CharacteristicsRecognized AsAuto-matched Framework
----------------------------------------------------------
Orders + Price + SKURetail EconomyValue Chain / ABC-XYZ / RFM
User + Subscription Cycle + ChurnSubscription EconomyLTV / Cohort / Retention Curves
Click / Add-to-cart / Purchase EventsAttention EconomyFunnel Analysis / AARRR
GMV + Platform MatchingCommission EconomyTwo-sided Network Effects / Unit Economics
Position + Skills + SalaryLabor MarketSkill Premium / Experience Elasticity
OHLCV Price DataFinancial Time SeriesTechnical Analysis / Volatility Models

Non-Economic Data

Data TypeAuto-matched Framework
---------------------------------
Sensors / Time Series ContinuousTime Series Decomposition, Extreme Value Analysis
Social / Network RelationshipCentrality Analysis, Community Detection
Geographic / SpatialSpatial Autocorrelation, Hotspot Analysis
Text CorpusTopic Modeling, Sentiment Analysis
BiomedicalSurvival Analysis, Differential Expression

Supported File Formats

  • CSV / TSV (.csv, .tsv, .txt) - Auto encoding detection, supports utf-8, gbk, latin1, etc.
  • Excel (.xlsx, .xls)
  • Parquet (.parquet, .pq)
  • JSON (.json)
  • SQL Database (via connection string)

Encoding Fault Tolerance

CSV loading automatically tries multiple encodings:

  • Auto encoding detection (if chardet library available)
  • Fallback encodings: utf-8, utf-8-sig, gbk, gb2312, gb18030, latin1, etc.
  • Engine fallback: Auto-switches to Python engine when C engine fails, skipping corrupted rows

Output Contents

Each analysis generates:

session_YYYYMMDD_HHMMSS/
├── step2_ontology_prompt.txt     # Ontology recognition prompts (reusable)
├── step3_validation_report.json  # Data quality report
├── step3_cleaning_report.txt     # Data cleaning recommendations
├── step4_planning_prompt.txt     # Analysis planning prompts (reusable)
├── step5_script_prompt.txt       # Script generation prompts (reusable)
├── analysis_report.html          # Comprehensive HTML report (with charts)
├── analysis_report.md            # Markdown report
└── charts/                       # All analysis charts (PNG)

Usage Examples

Example 1: Analyzing E-commerce Sales Data

User: Help me analyze this sales data, want to know which products sell well and which customers are high-value

[Upload orders.csv]

Skill automatically:

  1. Recognizes as "Retail Economy × Transaction/Event Data"
  2. Selects RFM Customer Value Analysis + ABC Product Classification framework
  3. Generates and executes analysis scripts
  4. Outputs customer segmentation distribution, product sales ranking, RFM heatmap, and HTML report

Example 2: Analyzing User Behavior Logs

User: This is our App's user behavior log, want to see the user conversion funnel

[Upload events.csv]

Skill automatically:

  1. Recognizes as "Attention/Conversion Economy × Event Sequence Data"
  2. Selects Funnel Analysis + Session Sequence Mining framework
  3. Outputs conversion rates at each step, churn node analysis, user path Sankey diagram

Example 3: Analyzing Meteorological Observation Data

User: Help me analyze this weather station observation record, understand temperature and precipitation patterns

[Upload weather.csv]

Skill automatically:

  1. Recognizes as "Earth Science × Time Series/Trajectory Data × Instrument Measurement"
  2. Selects Time Series Decomposition + Seasonality Analysis + Extreme Value Statistics framework
  3. Outputs trend charts, seasonal decomposition charts, outlier reports

Dependencies

pandas >= 1.3
numpy >= 1.21
matplotlib >= 3.4
seaborn >= 0.11
scipy >= 1.7
openpyxl >= 3.0   # Excel support
chardet >= 4.0    # Auto encoding detection (optional but recommended)
pyarrow >= 6.0    # Parquet support (optional)
sqlalchemy >= 1.4 # SQL support (optional)

Version

v1.1.0 · Author: Claude · License: CC BY-NC-SA 4.0

v1.1.0 Updates (2026-03-23)

  1. Flow Health Monitoring - Added step status tracking, dependency checks, error messages
  2. Enhanced Encoding Fault Tolerance - Auto-try multiple encodings for CSV/TSV (utf-8, gbk, latin1, etc.)
  3. Engine Fallback - Auto-switches to Python engine when C engine fails, skipping corrupted rows

v1.0.0

  • Initial version: Four-layer analysis framework + 7-step analysis workflow

版本历史

共 1 个版本

  • v1.0.3 当前
    2026-05-03 09:06 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

data-analysis

Data Analysis

ivangdavila
{"answer":"数据分析与可视化。查询数据库、生成报告、自动化电子表格,将原始数据转化为清晰可行的见解。适用于:(1) 您……"}
★ 207 📥 67,152
data-analysis

AdMapix

fly0pants
AdMapix 原始数据层,提供广告创意、应用、排名、下载/收入及市场元数据。返回 AdMapix API 的结构化 JSON;调用方...
★ 296 📥 138,204
data-analysis

Tavily 搜索

jacky1n7
通过 Tavily API 进行网页搜索(Brave 替代方案)。当用户要求搜索网页、查找来源或链接,且 Brave 网页搜索不可用时使用。
★ 272 📥 100,032