
Data Analyzer

Load structured CSV, Excel, or JSON data to compute statistics, detect anomalies, analyze trends and correlations, and generate summary reports with chart suggestions.
zhaocaixia888 · Source
Uncategorized · clawhub · v1.1.1 · 1 version · 100000 · Key: not required

Overview

Data Analyzer: Data Analysis Tool

Load, analyze, and report on structured data from CSV, Excel, and JSON files. Compute statistics, detect anomalies, identify trends, and generate reports with visualization recommendations.

Workflow

1. Load data     → Read the file, inspect structure
2. Profile       → Column types, missing values, basic stats
3. Analyze       → Statistics, trends, anomalies, correlations
4. Report        → Summary with visual recommendations

Step 1 — Data Loading

Supported Formats

| Format | How to Read | Notes |
| :--- | :--- | :--- |
| CSV | Read the file directly, parse header row + data rows | Check delimiter (comma, tab, semicolon). Handle quoted fields. |
| Excel (.xlsx) | Read via openpyxl or pandas. If unavailable, convert to CSV first. | Handle multiple sheets. Note which sheet was used. |
| JSON | Parse as structured objects. Detect if array-of-objects or object-of-arrays. | Flatten nested structures where possible. |
| TSV | Same as CSV with tab delimiter. | |
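
The JSON row above distinguishes two layouts. A minimal sketch of how both could be normalized into a list of row dictionaries, using only the standard library. The helper names `flatten` and `to_records` are hypothetical, joining nested keys with a dot is just one possible flattening convention, and array elements are assumed to be objects:

```python
import json

def flatten(obj, prefix=""):
    """Flatten nested dicts into one level, joining keys with '.'."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat

def to_records(data):
    """Return a list of row dicts from either supported JSON layout."""
    if isinstance(data, list):
        # array-of-objects: [{"a": 1, "b": 2}, ...] (elements assumed to be dicts)
        return [flatten(row) for row in data]
    if isinstance(data, dict) and data and all(isinstance(v, list) for v in data.values()):
        # object-of-arrays: {"a": [1, 2], "b": [3, 4]}; short columns are padded with None
        length = max(len(v) for v in data.values())
        return [
            {key: (col[i] if i < len(col) else None) for key, col in data.items()}
            for i in range(length)
        ]
    raise ValueError("unrecognized JSON layout")

rows_a = to_records(json.loads('[{"id": 1, "geo": {"city": "Oslo"}}, {"id": 2, "geo": {"city": "Rome"}}]'))
rows_b = to_records(json.loads('{"id": [1, 2], "city": ["Oslo", "Rome"]}'))
print(rows_a)
print(rows_b)
```

Both calls return one dictionary per row, so the later profiling and analysis steps can treat CSV and JSON input the same way.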

If Python is available (recommended for large datasets):

pip install pandas openpyxl  # if missing
python3 -c "
import pandas as pd
df = pd.read_csv('data.csv')
df.info()  # prints its own summary and returns None, so no print() wrapper
print(df.describe())
print(df.head())
"

If Python is not available, parse manually:

  1. Read the file line by line
  2. Identify headers (first row)
  3. Identify column types (numeric vs text vs date)
  4. Store as an array of rows or objects
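
The four manual steps above can be sketched with the standard library alone. This is an illustration under assumptions, not a full parser: `guess_delimiter`, `infer_type`, and `parse_table` are hypothetical names, the delimiter is guessed from the header line only, and dates are recognized only in ISO `YYYY-MM-DD` form.

```python
import csv
import io
from datetime import date

def guess_delimiter(header_line, candidates=(",", "\t", ";")):
    """Pick the candidate delimiter that occurs most often in the header line."""
    return max(candidates, key=header_line.count)

def infer_type(values):
    """Classify a column as 'numeric', 'date' (ISO YYYY-MM-DD), or 'text'."""
    present = [v for v in values if v != ""]
    if not present:
        return "text"
    for label, parse in (("numeric", float), ("date", date.fromisoformat)):
        try:
            for v in present:
                parse(v)
            return label
        except ValueError:
            continue  # at least one value did not parse; try the next type
    return "text"

def parse_table(text):
    """Return (rows as dicts, column name -> inferred type)."""
    delimiter = guess_delimiter(text.splitlines()[0])
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = list(reader)  # csv handles quoted fields that contain the delimiter
    types = {name: infer_type([r[name] or "" for r in rows]) for name in reader.fieldnames}
    return rows, types

sample = 'day;city;sales\n2024-01-05;"Oslo; Norway";120.5\n2024-01-06;Rome;\n'
rows, types = parse_table(sample)
print(types)
```

Blank cells are ignored during type inference, so a numeric column with gaps is still reported as numeric.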

Initial Inspection

After loading, always answer these questions:

  • Shape: How many rows and columns?
  • Column names: What are they and what data types?
  • Missing values: Which columns have gaps, and how many?
  • Date/time columns: Are they parsed as datetime objects?
  • Unique values: For categorical columns, how many unique categories?
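
A minimal sketch of how the shape, missing-value, and unique-value questions could be answered for rows held as a list of dictionaries. The function name `profile_rows` is hypothetical, and treating both `None` and the empty string as missing is an assumption; the date/time check is not covered here.

```python
def profile_rows(rows):
    """Summarize shape, missing values, and unique counts for a list of row dicts."""
    columns = list(rows[0]) if rows else []
    is_missing = lambda v: v in (None, "")
    missing = {c: sum(1 for r in rows if is_missing(r.get(c))) for c in columns}
    unique = {c: len({r[c] for r in rows if not is_missing(r.get(c))}) for c in columns}
    return {
        "shape": (len(rows), len(columns)),
        "columns": columns,
        "missing": missing,
        "unique": unique,
    }

rows = [
    {"city": "Oslo", "sales": "120.5"},
    {"city": "Rome", "sales": ""},
    {"city": "Oslo", "sales": "98"},
]
profile = profile_rows(rows)
print(profile)
```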

Step 2 — Descriptive Statistics

Numeric Columns

Compute and report:

| Statistic | What It Tells You |
| :--- | :--- |
| Count | Number of non-null values |
| Mean | Average value |
| Median | Midpoint (50th percentile); more robust than the mean for skewed data |
| Std Dev | Spread around the mean |
| Min / Max | Full range |
| 25th / 75th Percentile | Interquartile range bounds |
| Skewness | Symmetry of the distribution. Positive = right tail, negative = left tail. |

Formula reference (manual calculation):

Mean       = sum(x) / n
Median     = middle value when sorted (average of the two middle values if n is even)
Std Dev    = sqrt(sum((x - mean)^2) / (n-1))
Percentile = sort values, take the value at 0-based position p/100 * (n-1), interpolating between neighbors
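
The formulas above translate directly into code. The sketch below uses the sample standard deviation (n-1) and, as an assumption, linear interpolation at 0-based position p/100 * (n-1) for percentiles; other percentile conventions exist and give slightly different values. The function names are illustrative.

```python
import math
import statistics

def mean(xs):
    return sum(xs) / len(xs)

def median(xs):
    s = sorted(xs)
    mid = len(s) // 2
    # odd n: the middle value; even n: average of the two middle values
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

def sample_std(xs):
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def percentile(xs, p):
    """Linear interpolation at 0-based position p/100 * (n - 1)."""
    s = sorted(xs)
    pos = p / 100 * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(data), median(data), round(sample_std(data), 3))  # 5.0 4.5 2.138
print(percentile(data, 25), percentile(data, 75))            # 4.0 5.5
# Cross-check against the standard library's sample standard deviation
print(math.isclose(sample_std(data), statistics.stdev(data)))  # True
```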

Categorical Columns

| Statistic | What It Tells You |
| :--- | :--- |
| Count | Total non-null values |
| Unique | Number of distinct categories |
| Top | Most frequent category |
| Frequency | How often the top category appears |
| Distribution | Share of each category (as percentages) |
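
A short sketch of these categorical statistics using `collections.Counter`. The name `categorical_summary` is hypothetical, and treating `None` and empty strings as null is an assumption.

```python
from collections import Counter

def categorical_summary(values):
    """Count, unique, top, frequency, and percentage distribution of a column."""
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    top, freq = counts.most_common(1)[0] if counts else (None, 0)
    distribution = {k: round(100 * n / len(present), 1) for k, n in counts.most_common()}
    return {
        "count": len(present),
        "unique": len(counts),
        "top": top,
        "frequency": freq,
        "distribution": distribution,
    }

summary = categorical_summary(["a", "b", "a", None, "c", "a"])
print(summary)
```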

Step 3 — Analysis

3a. Anomaly Detection

Method: IQR (Interquartile Range)

Q1 = 25th percentile
Q3 = 75th percentile
IQR = Q3 - Q1
Lower fence = Q1 - 1.5 * IQR
Upper fence = Q3 + 1.5 * IQR
Anomaly = any value outside [Lower fence, Upper fence]

Method: Z-Score (for approximately normal distributions)

z = (x - mean) / std_dev
Anomaly = |z| > 3 (values more than 3 std devs from mean)

Output anomalies: For each detected anomaly, report:

  • Row index
  • Column name
  • Anomalous value
  • Distance from expected (how many IQRs or std devs)
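
Both methods, with the four reported fields, could look roughly like the sketch below. The function names and the dictionary keys are hypothetical, and the quartiles use linear interpolation at position p * (n - 1), which is one convention among several.

```python
import math

def quartiles(xs):
    """Q1 and Q3 by linear interpolation at position p * (n - 1)."""
    s = sorted(xs)
    def at(p):
        pos = p * (len(s) - 1)
        lo = math.floor(pos)
        hi = min(lo + 1, len(s) - 1)
        return s[lo] + (pos - lo) * (s[hi] - s[lo])
    return at(0.25), at(0.75)

def iqr_anomalies(xs, column, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR], with distance past the fence in IQRs."""
    q1, q3 = quartiles(xs)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    found = []
    for i, x in enumerate(xs):
        if x < lower or x > upper:
            fence = lower if x < lower else upper
            distance = abs(x - fence) / iqr if iqr else math.inf
            found.append({"row": i, "column": column, "value": x,
                          "iqrs_beyond_fence": round(distance, 2)})
    return found

def zscore_anomalies(xs, column, threshold=3.0):
    """Values whose |z| exceeds the threshold, using the sample standard deviation."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    if sd == 0:
        return []  # constant column: no value deviates from the mean
    return [{"row": i, "column": column, "value": x, "z": round((x - m) / sd, 2)}
            for i, x in enumerate(xs) if abs(x - m) / sd > threshold]

values = [10, 12, 11, 13, 12, 11, 100]
print(iqr_anomalies(values, "sales"))
print(zscore_anomalies(values, "sales"))
```

On this seven-value sample the IQR method flags the value 100, while the z-score method flags nothing: with the sample standard deviation, |z| cannot exceed (n - 1) / sqrt(n), which is about 2.27 for n = 7. For small samples the IQR method is therefore the safer default.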

3b. Trend Analysis

For time-series data (data with a date/time column):

  1. Identify the time column — Sort by date
  2. Aggregate by period — Group by day/week/month/quarter/year
  3. Direction — Is the metric increasing, decreasing, or flat?
  4. Rate of change — Period-over-period percentage change
  5. Seasonality — Recurring patterns (monthly, quarterly, yearly)
  6. Breakout — Sudden jumps or drops (potential regime changes)
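
Steps 1 to 4 can be sketched as follows, using only the standard library. The helper names are hypothetical, the input is assumed to be `(ISO date string, value)` pairs, aggregation is by calendar month, and direction is taken from the sign of a least-squares slope over the period index. Seasonality and breakout detection are not covered by this sketch.

```python
from collections import defaultdict
from datetime import date

def monthly_totals(records):
    """Sum (iso_date, value) pairs per calendar month, sorted chronologically."""
    totals = defaultdict(float)
    for iso_day, value in records:
        d = date.fromisoformat(iso_day)
        totals[f"{d.year}-{d.month:02d}"] += value
    return dict(sorted(totals.items()))

def pct_changes(series):
    """Period-over-period percentage change; None where the previous value is 0."""
    values = list(series.values())
    return [None if prev == 0 else (cur - prev) * 100 / prev
            for prev, cur in zip(values, values[1:])]

def direction(series, tolerance=1e-9):
    """'Up', 'Down', or 'Flat' from the least-squares slope over the period index."""
    ys = list(series.values())
    n = len(ys)
    if n < 2:
        return "Flat"
    x_mean, y_mean = (n - 1) / 2, sum(ys) / n
    slope = (sum((i - x_mean) * (y - y_mean) for i, y in enumerate(ys))
             / sum((i - x_mean) ** 2 for i in range(n)))
    if abs(slope) <= tolerance:
        return "Flat"
    return "Up" if slope > 0 else "Down"

records = [("2024-01-05", 120), ("2024-01-20", 80), ("2024-02-10", 250), ("2024-03-01", 300)]
series = monthly_totals(records)
print(series)               # {'2024-01': 200.0, '2024-02': 250.0, '2024-03': 300.0}
print(pct_changes(series))  # [25.0, 20.0]
print(direction(series))    # Up
```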

Output format:

📈 Trend: [Metric Name]
Period: [Date Range]
Direction: [Up/Down/Flat] (slope: ±X%)
Key Points:
- [Date]: Value = X (↗/↘/→)
- Highest point: [Date] = X
- Lowest point: [Date] = X

For non-time-series data, analyze rank order and distribution shape:

Top 5 by [metric]:
1. [Category] = X
2. [Category] = Y
...
Bottom 5 by [metric]:
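
A small sketch of the rank-order listing, assuming the metric has already been aggregated into a `category -> value` dictionary. `top_and_bottom` is a hypothetical helper; the bottom list is returned lowest first.

```python
def top_and_bottom(totals, n=5):
    """Return (top n, bottom n) as lists of (category, value) pairs."""
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n], ranked[-n:][::-1]

totals = {"A": 50, "B": 20, "C": 90, "D": 10}
top, bottom = top_and_bottom(totals, n=2)

print("Top 2 by sales:")
for rank, (category, value) in enumerate(top, start=1):
    print(f"{rank}. {category} = {value}")
print("Bottom 2 by sales:")
for rank, (category, value) in enumerate(bottom, start=1):
    print(f"{rank}. {category} = {value}")
```

When there are fewer than 2n categories the two lists overlap, so it may be clearer to print a single ranked list in that case.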

3c. Correlation Analysis

Pearson correlation coefficient (for linear relationships between two numeric variables):

r = sum((x - mean_x) * (y - mean_y)) / ((n - 1) * std_x * std_y)
(std_x and std_y are the sample standard deviations, computed with n-1 as in Step 2)

Interpretation:

| r value | Strength | Direction |
| :--- | :--- | :--- |
| 0.7 to 1.0 | Strong | Positive (both rise together) |
| 0.3 to 0.7 | Moderate | Positive |
| 0 to 0.3 | Weak | Positive |
| -0.3 to 0 | Weak | Negative (one rises, other falls) |
| -0.7 to -0.3 | Moderate | Negative |
| -1.0 to -0.7 | Strong | Negative |

Caveats:

  • Correlation ≠ causation. Always note this.
  • Pearson only captures linear relationships.
  • Outliers can distort correlation heavily — check after removing anomalies.
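
A minimal sketch of the Pearson coefficient and the interpretation bands. It computes r from sums of deviations, so the normalization factor cancels and the result does not depend on whether n or n-1 is used for the standard deviation. The function names are hypothetical, and assigning the boundary values 0.3 and 0.7 to the stronger band is an assumption, since the table's ranges overlap at those points.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sum(a * b for a, b in zip(dx, dy)) / denom

def describe(r):
    """Map r to (strength, direction) following the interpretation table."""
    strength = "Strong" if abs(r) >= 0.7 else "Moderate" if abs(r) >= 0.3 else "Weak"
    direction = "Positive" if r > 0 else "Negative" if r < 0 else "None"
    return strength, direction

r_up = pearson([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
r_down = pearson([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])
print(r_up, describe(r_up))
print(r_down, describe(r_down))
```

To follow the outlier caveat, one option is to compute r twice, once on all rows and once with the flagged anomalies removed, and report both values when they differ noticeably.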

Step 4 — Report Generation

Visualization Recommendations

For each finding, recommend the best chart type:

| Analysis Type | Recommended Chart | Why |
| :--- | :--- | :--- |
| Distribution of one variable | Histogram | Shows shape, skew, peaks |
| Comparison across categories | Bar chart | Easy to compare magnitudes |
| Trend over time | Line chart | Emphasizes direction and continuity |
| Relationship between 2 variables | Scatter plot | Shows correlation, clusters, outliers |
| Part of a whole | Pie / Donut chart | Use only for 2-5 categories |
| Composition over time | Stacked area chart | Shows both total and parts |
| Rank order | Horizontal bar chart | Easy to read sorted values |
| Comparing multiple distributions | Box plot | Shows median, IQR, outliers |
| Correlations among many variables (correlation matrix) | Heatmap | Quick visual of many correlations |

Full Report Template

# Data Analysis Report: [Dataset Name]
Date: [YYYY-MM-DD]

## 1. Overview
- Rows: X | Columns: Y
- Missing data: X cells (X%)
- Key columns: [list with types]

## 2. Descriptive Statistics
### Numeric Columns
[Table: col_name, count, mean, median, std, min, 25%, 75%, max]

### Categorical Columns
[Table: col_name, unique_count, top_value, frequency%]

## 3. Key Findings

### Finding 1: [Title]
[Description of finding]
📊 Recommended chart: [Chart type]
Supporting data: [stats/view]

### Finding 2: [Title]
...

## 4. Anomalies Detected
[Table: row, column, value, severity]

## 5. Correlations
[Notable correlations: |r| > 0.3]

## 6. Recommendations
[Data-driven suggestions based on analysis]

One-Page Summary (Quick)

For quick results, use this compact format:

📊 [Dataset]: [N] rows × [M] cols

📈 Key metrics:
- [metric1]: mean=X, median=Y, range=[min, max]
- [metric2]: ...

🔍 Top findings:
1. [Finding] — [chart recommendation]
2. [Finding] — [chart recommendation]

⚠️ Anomalies: X detected

Python Script (Optional)

For complex analysis, create and run a Python script:

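import csv
import statistics

# Load data
with open('data.csv', newline='') as f:
    reader = csv.DictReader(f)
    rows = list(reader)
    columns = reader.fieldnames or []

# Get numeric columns (column name → list of float values, filtering out blanks)
numeric = {}
for name in columns:
    try:
        numeric[name] = [float(r[name]) for r in rows if r.get(name) not in (None, '')]
    except ValueError:
        pass  # column contains non-numeric text; skip it

# Compute mean, median, stdev, quartiles; detect outliers via IQR; print formatted results
for name, xs in numeric.items():
    if len(xs) < 2:
        continue  # stdev and quantiles need at least two values
    q1, _, q3 = statistics.quantiles(xs, n=4)
    fence = 1.5 * (q3 - q1)
    outliers = [x for x in xs if x < q1 - fence or x > q3 + fence]
    print(f"{name}: mean={statistics.mean(xs):.2f}, median={statistics.median(xs):.2f}, "
          f"stdev={statistics.stdev(xs):.2f}, Q1={q1:.2f}, Q3={q3:.2f}, outliers={outliers}")

# Compute correlations between pairs (placeholder: drop rows with a blank in either column first)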

Run with:

python3 analysis.py

Version History

1 version in total

  • v1.1.1 (current)
    2026-06-04 14:20
