Pandas

Analyze, transform, and clean DataFrames with efficient patterns for filtering, grouping, merging, and pivoting.
ivangdavila
Data Analysis · clawhub v1.0.1 · 99786 · Key: not required
★ 0 stars · 📥 1,865 downloads · 💾 230 installs · 1 version · #latest

Overview

Setup

On first use, create ~/pandas/ and read setup.md for initialization. User preferences are stored in ~/pandas/memory.md — users can view or edit this file anytime.

When to Use

Use when the user needs to work with tabular data in Python. The agent handles DataFrame operations, data cleaning, aggregations, merges, pivots, and exports.

Architecture

Memory lives in ~/pandas/. See memory-template.md for structure.

~/pandas/
├── memory.md     # User preferences and common patterns
└── snippets/     # Saved code patterns (optional)

Quick Reference

Topic            File
---------------  ------------------
Setup process    setup.md
Memory template  memory-template.md

Core Rules

1. Use Vectorized Operations

  • NEVER iterate with for loops over DataFrame rows
  • Use .apply() only when vectorized alternatives don't exist
  • Prefer df['col'].str.method() over apply(lambda x: x.method())
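The following is a minimal sketch that contrasts a row loop with its vectorized and `.str` equivalents. The `name`, `price`, and `qty` columns are made-up sample data.

```python
import pandas as pd

# Hypothetical sample data for illustration only
df = pd.DataFrame({
    "name": ["  alice ", "BOB", "carol"],
    "price": [10.0, 2.5, 4.0],
    "qty": [3, 4, 10],
})

# Slow: Python-level loop over rows (what rule 1 warns against)
totals_loop = [row.price * row.qty for row in df.itertuples()]

# Fast: a single vectorized column operation
df["total"] = df["price"] * df["qty"]

# String cleanup via the .str accessor instead of apply(lambda x: x.strip().title())
df["name"] = df["name"].str.strip().str.title()

print(df)
```

Both approaches compute the same totals. The vectorized version runs in compiled code instead of executing Python bytecode once per row.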

2. Chain Methods for Readability

# Good: method chaining
result = (df
    .query('age > 30')
    .groupby('city')
    .agg({'salary': 'mean'})
    .reset_index())

# Bad: intermediate variables everywhere
filtered = df[df['age'] > 30]
grouped = filtered.groupby('city')
result = grouped.agg({'salary': 'mean'}).reset_index()

3. Handle Missing Data Explicitly

  • Always check df.isna().sum() before analysis
  • Choose strategy: dropna(), fillna(), or interpolation
  • Document WHY missing values exist before removing them
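Below is a sketch of the inspect-then-choose workflow, using hypothetical sensor data. The per-column strategies are illustrative choices, not a recommendation for any particular dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor readings with gaps
df = pd.DataFrame({
    "sensor": ["a", "a", "b", "b", None],
    "reading": [1.0, np.nan, 3.0, np.nan, 5.0],
})

# 1. Inspect: count missing values per column before any analysis
missing = df.isna().sum()
print(missing)

# 2. Choose a strategy per column rather than one blanket rule
dropped = df.dropna(subset=["sensor"])                 # rows without a sensor id are unusable
filled = df.fillna({"reading": df["reading"].mean()})  # impute readings with the column mean
interpolated = df["reading"].interpolate()             # linear fill between neighbouring values
```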

4. Use Categorical for Repeated Strings

# Memory savings for columns with few unique values
df['status'] = df['status'].astype('category')
df['country'] = df['country'].astype('category')
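To check the savings on your own data, compare `memory_usage(deep=True)` before and after the conversion. The column below is a hypothetical example. Exact byte counts vary by pandas version and platform, so no numbers are shown.

```python
import pandas as pd

# Hypothetical column: 300,000 rows but only 3 distinct values
status = pd.Series(["active", "inactive", "pending"] * 100_000)

# deep=True counts the string objects themselves, not just the pointers
as_object = status.memory_usage(deep=True)
as_category = status.astype("category").memory_usage(deep=True)

print(f"object: {as_object:,} bytes, category: {as_category:,} bytes")
```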

5. Merge with Validation

# Always specify how and validate
result = pd.merge(
    df1, df2,
    on='id',
    how='left',
    validate='m:1'  # many-to-one: raises MergeError if 'id' repeats in df2
)
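This sketch shows what `validate` catches, using small hypothetical `orders` and `customers` tables. It also uses `indicator=True`, which adds a `_merge` column you can use to audit which rows matched.

```python
import pandas as pd

orders = pd.DataFrame({"id": [1, 1, 2], "amount": [10, 20, 30]})
customers = pd.DataFrame({"id": [1, 2], "name": ["Ann", "Ben"]})

# Passes: each id appears at most once on the right-hand side
result = pd.merge(orders, customers, on="id", how="left",
                  validate="m:1", indicator=True)
print(result["_merge"].value_counts())

# Fails loudly: a duplicated key on the right breaks the m:1 assumption
dupes = pd.concat([customers, customers.iloc[[0]]])
rejected = False
try:
    pd.merge(orders, dupes, on="id", how="left", validate="m:1")
except pd.errors.MergeError as exc:
    rejected = True
    print("Merge rejected:", exc)
```

Without `validate`, the second merge would quietly produce extra rows for the duplicated customer.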

6. Prefer query() for Complex Filters

# Readable
df.query('age > 30 and city == "NYC" and salary < 100000')

# Hard to read
df[(df['age'] > 30) & (df['city'] == 'NYC') & (df['salary'] < 100000)]
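`query()` can also reference local Python variables with the `@` prefix, so thresholds do not have to be hard-coded into the string. Here is a small sketch on hypothetical data that checks the result against the equivalent boolean mask.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [25, 35, 45, 52],
    "city": ["NYC", "NYC", "LA", "NYC"],
    "salary": [70_000, 90_000, 120_000, 150_000],
})

# Local variables are referenced inside query() with the @ prefix
min_age, target_city, max_salary = 30, "NYC", 100_000
via_query = df.query("age > @min_age and city == @target_city and salary < @max_salary")

# Equivalent boolean mask: each comparison needs its own parentheses
via_mask = df[(df["age"] > min_age) & (df["city"] == target_city) & (df["salary"] < max_salary)]
```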

7. Set Index When Appropriate

# Faster lookups, cleaner merges
df = df.set_index('user_id')
user_data = df.loc[12345]  # hash-based lookup (roughly O(1)) when the index is unique
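The fast lookup assumes the index is unique. This sketch uses `verify_integrity=True` to enforce uniqueness at `set_index` time, then joins on the index without an `on=` argument. The `users` and `scores` tables are hypothetical.

```python
import pandas as pd

users = pd.DataFrame({"user_id": [12345, 67890], "name": ["Ann", "Ben"]})

# verify_integrity=True raises ValueError if user_id contains duplicates
users = users.set_index("user_id", verify_integrity=True)
print(users.index.is_unique)

row = users.loc[12345]          # full row for one user, as a Series
name = users.at[12345, "name"]  # fast scalar access by label

# Joining on the index needs no on= argument
scores = pd.DataFrame({"score": [0.9, 0.4]},
                      index=pd.Index([12345, 67890], name="user_id"))
joined = users.join(scores)
```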

Common Traps

  • SettingWithCopyWarning → Use .loc[] for assignment: df.loc[mask, 'col'] = value
  • Slow loops → Replace iterrows() with vectorized ops or apply()
  • Memory explosion → Use dtype in read_csv(): pd.read_csv(f, dtype={'id': 'int32'})
  • Silent data loss → Check shape before/after merge: print(f"Before: {len(df1)}, After: {len(result)}")
  • Index confusion → Use reset_index() after groupby() to get clean DataFrame
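  • Chained indexing → df['a']['b'] selects label 'b' from column 'a' in two steps, and assigning through it (df['a']['b'] = v) may leave df unchanged; use a single df.loc['b', 'a'] instead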
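Two of the traps above can be avoided together: declare dtypes when reading, and write through a single `.loc` call. This is a minimal sketch. The inline CSV text stands in for a real file and is made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "id,status,amount\n1,new,9.5\n2,paid,12.0\n3,new,3.25\n"

# Declare dtypes up front instead of letting pandas infer int64/object
df = pd.read_csv(io.StringIO(csv_text), dtype={"id": "int32", "status": "category"})

# One .loc call with row mask and column together, so the write updates df itself
df.loc[df["status"] == "new", "amount"] = 0.0

# The chained form df[df["status"] == "new"]["amount"] = 0.0 would assign into
# a temporary object and leave df unchanged
print(df.dtypes)
print(df)
```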

Security & Privacy

Data storage:

  • User preferences stored in ~/pandas/memory.md
  • All DataFrame operations run locally
  • No data is sent externally

This skill does NOT:

  • Upload data to any service
  • Access files outside ~/pandas/ and the working directory
  • Modify source data files without explicit instruction

User control:

  • View stored preferences: cat ~/pandas/memory.md
  • Clear all data: rm -rf ~/pandas/

Related Skills

Install with clawhub install if the user confirms:

  • data-analysis — general data analysis patterns
  • csv — CSV file handling
  • sql — database queries
  • excel-xlsx — Excel file operations

Feedback

  • If useful: clawhub star pandas
  • Stay updated: clawhub sync

Version history

1 version in total

  • v1.0.1 (current)
    2026-03-29 07:50 · Safe · Safe

Security scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk

🔗 Related recommendations

data-analysis

Stock Analysis

udiedrichsen
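Analyzes stocks and cryptocurrencies using Yahoo Finance data. Supports portfolio management, watchlist alerts, dividend analysis, 8-dimension scoring, trending-topic scans, and rumor/early-signal detection. Suited to stock analysis, position tracking, earnings movers, crypto monitoring, following trending stocks, or surfacing off-the-radar rumors early.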
★ 270 📥 56,928
data-analysis

A股量化 AkShare (A-Share Quant, AkShare)

mbpz
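A quantitative data analysis tool for China A-shares, built on the AkShare library to fetch A-share market quotes, financial data, sector information, and more. Use it to answer questions about A-share stock lookups, market data, financial analysis, stock screening, and similar topics.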
★ 164 📥 59,860
ai-intelligence

Self-Improving + Proactive Agent

ivangdavila
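Self-reflection + self-critique + self-learning + self-organizing memory. The agent evaluates its own work, finds mistakes, and keeps improving.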
★ 1,355 📥 317,996