
Data

Work with data across the full lifecycle, from extraction and cleaning to analysis, visualization, and reporting.
By ivangdavila (source)
Uncategorized · clawhub · v1.0.1 · 99809 · API key: not required
★ 2 stars · 📥 1,005 downloads · 💾 2 installs · 1 version · #latest

Overview

When to Use

Use this skill when the user needs to extract data from sources (databases, APIs, files), clean and transform messy datasets, analyze data and find patterns, visualize results, or automate recurring data tasks. The agent handles the full data workflow.

Quick Reference

| Area | File | Focus |
|---|---|---|
| Querying & Extraction | querying.md | SQL generation, API fetching, multi-source |
| Cleaning & Transformation | cleaning.md | Nulls, duplicates, normalization, joins |
| Analysis & Statistics | analysis.md | EDA, statistical tests, insights |
| Visualization & Reporting | visualization.md | Charts, dashboards, exports |
| Quality & Validation | quality.md | Data checks, anomaly detection, drift |
| Workflow Patterns | patterns.md | Common data workflows, automation |

Core Operations

Query generation: User describes what data they need → Agent writes the SQL or other query, handling joins, filters, and aggregations → Returns the results or explains the execution plan.
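The sketch below shows this flow under a few assumptions. It uses an in-memory SQLite database with hypothetical `customers` and `orders` tables that are not part of the skill. The query joins, filters, and aggregates for a request like "total 2024 revenue per region". `EXPLAIN QUERY PLAN` then shows how SQLite would execute the query without running it.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                     amount REAL, placed_on TEXT);
INSERT INTO customers VALUES (1, 'EU'), (2, 'US'), (3, 'EU');
INSERT INTO orders VALUES
  (1, 1, 120.0, '2024-01-05'),
  (2, 1,  80.0, '2024-02-10'),
  (3, 2, 150.0, '2024-01-20'),
  (4, 3,  50.0, '2023-12-31');
""")

# "Total 2024 revenue per region" -> join, filter, aggregate.
QUERY = """
SELECT c.region, COUNT(*) AS n_orders, SUM(o.amount) AS revenue
FROM orders AS o
JOIN customers AS c ON c.id = o.customer_id
WHERE o.placed_on >= ?
GROUP BY c.region
ORDER BY revenue DESC
"""

# Bind parameters with '?' placeholders instead of formatting them into the SQL.
rows = conn.execute(QUERY, ("2024-01-01",)).fetchall()
for region, n_orders, revenue in rows:
    print(region, n_orders, revenue)

# Ask SQLite how it would execute the query instead of running it.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("2024-01-01",)).fetchall()
for step in plan:
    print(step[-1])  # the last column holds the human-readable plan detail
```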

Data cleaning: Load messy dataset → Detect issues (nulls, duplicates, outliers, inconsistent formats) → Apply appropriate fixes → Document transformations.
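Here is a minimal sketch of the detect, fix, and document loop, using plain Python records. It assumes hypothetical `email` and `country` fields and an illustrative alias map. It covers nulls, exact duplicates, and inconsistent formats. Outlier handling is left out. Missing values are flagged rather than imputed, so the chosen strategy stays visible in the log.

```python
from collections import Counter

# Hypothetical messy records: a null, an exact duplicate, and inconsistent formats.
raw = [
    {"id": "1", "email": "Ana@Example.com ", "country": "us"},
    {"id": "2", "email": "bo@example.com", "country": "USA"},
    {"id": "2", "email": "bo@example.com", "country": "USA"},
    {"id": "3", "email": None, "country": "U.S."},
]

# Illustrative alias map; a real one would come from the variants found in the data.
COUNTRY_ALIASES = {"us": "US", "usa": "US", "u.s.": "US"}

def clean(records):
    """Return cleaned records plus a log of every transformation applied."""
    log = []

    # 1. Detect: count issues before changing anything.
    nulls = sum(1 for r in records if any(v is None for v in r.values()))
    counts = Counter(tuple(sorted(r.items())) for r in records)
    dupes = sum(n - 1 for n in counts.values())
    log.append(f"detected: {nulls} row(s) with nulls, {dupes} exact duplicate(s)")

    # 2. Fix: drop exact duplicates, keeping the first occurrence.
    seen, deduped = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            deduped.append(r)
    log.append(f"dropped {len(records) - len(deduped)} duplicate row(s)")

    # 3. Fix: normalize formats; flag missing emails instead of dropping rows.
    out = []
    for r in deduped:
        email = r["email"].strip().lower() if r["email"] is not None else None
        country = COUNTRY_ALIASES.get(r["country"].strip().lower(), r["country"])
        out.append({**r, "email": email, "country": country,
                    "email_missing": email is None})
    log.append("normalized email (trim, lowercase) and country aliases; "
               "flagged missing email")
    return out, log

cleaned, log = clean(raw)
for line in log:
    print(line)
```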

Exploratory analysis: New dataset arrives → Generate descriptive stats, distributions, correlations → Surface interesting patterns and anomalies → Produce summary with key findings.

Visualization: Analysis complete → Generate appropriate chart type → Export in requested format (PNG, SVG, interactive HTML) → Ready for stakeholders.
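This sketch covers two steps: choosing a chart type and exporting the chart. It uses only the standard library. The chart-type mapping in `pick_chart` is a simplified heuristic assumed here, not a fixed rule. The SVG writer assumes non-negative values. A real workflow would more likely use a plotting library for PNG or interactive HTML output.

```python
import tempfile
from pathlib import Path
from xml.sax.saxutils import escape

def pick_chart(x_kind, y_kind="continuous"):
    """Rough chart-type heuristic (an assumption, not a standard)."""
    if x_kind == "time":
        return "line"
    if x_kind == "categorical":
        return "bar"
    if x_kind == "continuous" and y_kind == "continuous":
        return "scatter"
    return "histogram"

def bar_chart_svg(labels, values, width=400, height=200, pad=30):
    """Render a minimal bar chart as an SVG string; assumes values >= 0."""
    bar_w = (width - 2 * pad) / len(labels)
    top = max(values) or 1  # avoid dividing by zero when all values are 0
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{width}" height="{height}">']
    for i, (label, v) in enumerate(zip(labels, values)):
        h = (height - 2 * pad) * v / top
        x = pad + i * bar_w
        y = height - pad - h
        parts.append(f'<rect x="{x:.1f}" y="{y:.1f}" width="{bar_w * 0.8:.1f}" '
                     f'height="{h:.1f}" fill="steelblue"/>')
        parts.append(f'<text x="{x + bar_w * 0.4:.1f}" y="{height - pad / 3:.1f}" '
                     f'font-size="10" text-anchor="middle">{escape(label)}</text>')
    parts.append("</svg>")
    return "\n".join(parts)

# Hypothetical categorical result from an earlier analysis step.
labels, values = ["EU", "US", "APAC"], [200, 150, 90]
print(pick_chart("categorical"))  # categorical x-axis -> bar chart
svg = bar_chart_svg(labels, values)
out = Path(tempfile.gettempdir()) / "revenue_by_region.svg"  # any path works
out.write_text(svg, encoding="utf-8")
```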

Recurring reports: Define report once → Agent runs on schedule → Updates charts and metrics → Delivers summary with highlights.
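The scheduling itself would come from an external scheduler, such as cron. The sketch below covers only the "update metrics, deliver summary with highlights" step. `build_report` and its 10% highlight threshold are hypothetical choices. The function compares each metric with the previous period and calls out large moves.

```python
from datetime import date

def build_report(current, previous, threshold=0.10, as_of=None):
    """Compare this period's metrics with the previous period and flag big moves.

    `threshold` (a hypothetical default of 10%) decides what counts as a highlight.
    """
    as_of = as_of or date.today()
    lines = [f"Report as of {as_of.isoformat()}"]
    highlights = []
    for name, value in current.items():
        prev = previous.get(name)
        if prev in (None, 0):
            lines.append(f"{name}: {value} (no comparable previous value)")
            continue
        change = (value - prev) / prev
        lines.append(f"{name}: {value} ({change:+.1%} vs previous)")
        if abs(change) >= threshold:
            highlights.append(f"{name} moved {change:+.1%}")
    lines.append("Highlights: " + ("; ".join(highlights) if highlights else "none"))
    return "\n".join(lines)

report = build_report(
    current={"revenue": 120_000, "orders": 1_010, "refunds": 45},
    previous={"revenue": 100_000, "orders": 1_000, "refunds": 60},
    as_of=date(2024, 4, 1),
)
print(report)
# Report as of 2024-04-01
# revenue: 120000 (+20.0% vs previous)
# orders: 1010 (+1.0% vs previous)
# refunds: 45 (-25.0% vs previous)
# Highlights: revenue moved +20.0%; refunds moved -25.0%
```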

Critical Rules

  • Always preview transformations before applying — show sample of what will change
  • Document every data transformation with source, operation, and rationale
  • Validate data types and ranges before analysis — garbage in, garbage out
  • Use appropriate statistical tests — check assumptions first
  • Generate reproducible outputs — include seeds, versions, timestamps
  • Handle missing data explicitly — document chosen strategy (drop, impute, flag)
  • Match chart type to data type — categorical, continuous, time series
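Several of these rules can be combined in one small sketch: previewing a change on a sample, applying it only afterwards, and logging it with source, operation, rationale, seed, version, and timestamp. The `sales.csv` source name, the `price` column, and the helper functions are hypothetical illustrations. The preview sample is drawn with a seeded `random.Random`, so repeated runs show the same rows.

```python
import json
import platform
import random
from datetime import datetime, timezone

def preview_change(rows, column, fn, n=3, seed=0):
    """Return a reproducible sample of (before, after) pairs and the total changed."""
    changed = [(r[column], fn(r[column])) for r in rows if fn(r[column]) != r[column]]
    sample = random.Random(seed).sample(changed, min(n, len(changed)))
    return sample, len(changed)

def log_transformation(source, operation, rationale, seed):
    """Record what was done, why, and enough context to reproduce it."""
    return {
        "source": source,
        "operation": operation,
        "rationale": rationale,
        "seed": seed,
        "python": platform.python_version(),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

def strip_commas(value):
    # Hypothetical fix: remove thousands separators from numbers stored as text.
    return value.replace(",", "")

SEED = 42
rows = [{"price": "1,200"}, {"price": "950"}, {"price": "2,500"}]

# 1. Preview: show what would change before touching the data.
sample, total = preview_change(rows, "price", strip_commas, seed=SEED)
print(f"{total} row(s) would change; sample: {sample}")

# 2. Apply only after the preview looks right, then document it.
for r in rows:
    r["price"] = strip_commas(r["price"])
entry = log_transformation("sales.csv", "strip thousands separators from price",
                           "prices stored as text with commas", SEED)
print(json.dumps(entry, indent=2))
```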

User Modes

| Mode | Focus | Trigger |
|---|---|---|
| Analyst | SQL, exploration, insights | "What does this data tell us?" |
| Engineer | Pipelines, transformations, quality | "Clean this and load it there" |
| Business | KPIs, dashboards, plain language | "How are we doing vs last quarter?" |
| Researcher | Statistical rigor, reproducibility | "Is this difference significant?" |
| Developer | Schema design, API data, types | "Generate types from this JSON" |

See patterns.md for workflows per mode.
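For the Researcher trigger "Is this difference significant?", one assumption-light option is a two-sided permutation test on the difference in means. This is a swapped-in illustration, not necessarily what patterns.md prescribes. The test needs no normality assumption, but it does assume the observations are exchangeable under the null hypothesis, for example independent samples. The data below are hypothetical, and the fixed seed makes the p-value reproducible.

```python
import random
import statistics

def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    No normality assumption, but the observations must be exchangeable
    under the null hypothesis (e.g. independent samples).
    """
    rng = random.Random(seed)  # fixed seed -> reproducible p-value
    observed = statistics.fmean(a) - statistics.fmean(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):])
        if abs(diff) >= abs(observed):
            extreme += 1
    # +1 in numerator and denominator counts the observed split itself.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value

# Hypothetical task-completion times (seconds) for two page variants.
variant_a = [12.1, 11.8, 13.0, 12.4, 12.9, 11.5, 12.2]
variant_b = [10.9, 11.2, 10.5, 11.0, 11.7, 10.8, 11.1]
diff, p = permutation_test(variant_a, variant_b)
print(f"mean difference = {diff:.2f}, p = {p:.4f}")
```

A small p-value supports a real difference. It does not measure how large or important that difference is, so report the effect size alongside it.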

On First Use

  1. Identify data source (database, file, API)
  2. Establish connection or load file
  3. Initial EDA — shape, types, quality issues
  4. Clean and transform as needed
  5. Analyze or visualize per user goal
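Step 3, the initial EDA pass, can be sketched as a quick profile of shape, inferred types, and missing values. Inline CSV text stands in for a real file here. The type inference is a simple heuristic assumed for illustration: a column is int if every non-empty value parses as one, otherwise float, otherwise str. Real libraries infer types differently.

```python
import csv
import io

def infer_type(values):
    """Narrowest type that fits every non-empty value: int, float, or str."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "empty"
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "str"

def profile(csv_text):
    """Shape, inferred column types, missing counts, and distinct non-empty values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []
    report = {"shape": (len(rows), len(columns)), "columns": {}}
    for col in columns:
        values = [r[col] or "" for r in rows]  # short rows yield None -> ""
        report["columns"][col] = {
            "type": infer_type(values),
            "missing": sum(1 for v in values if v.strip() == ""),
            "distinct": len({v for v in values if v.strip() != ""}),
        }
    return report

# Hypothetical file contents.
SAMPLE = """order_id,amount,region
1,19.99,EU
2,,US
3,5,EU
"""
result = profile(SAMPLE)
print(result)
```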

Version History

1 version in total

  • v1.0.1 (current)
    2026-05-12 05:23 · Safe

Security Scans

Tencent Cloud Security (Keen)

Safe, no risk found

Tencent Cloud Security (Sanbu)

Safe, no risk found

🔗 Related Skills

data-analysis

Tavily Search

jacky1n7
Web search through the Tavily API (an alternative to Brave). Use when the user asks to search the web or find sources or links and Brave web search is unavailable.
★ 273 📥 100,256
office-efficiency

Word / DOCX

ivangdavila
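Create, inspect, and edit Microsoft Word documents and DOCX files, with support for styles, numbering, tracked changes, tables, section breaks, and compatibility checks.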
★ 457 📥 152,042
ai-agent

Self-Improving + Proactive Agent

ivangdavila
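Self-reflection + self-critique + self-learning + self-organizing memory. The agent evaluates its own work, finds mistakes, and keeps improving.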
★ 1,396 📥 322,719