← 返回
开发者工具 中文

Etl

Build ETL pipelines with data ingestion, cleaning, and validation steps. Use when ingesting sources, transforming formats, validating data, or scheduling loads.
构建包含数据摄入、清洗和验证步骤的ETL管道。适用于摄入数据源、转换格式、验证数据或调度加载任务。
bytesagain1
开发者工具 clawhub v2.0.1 2 版本 100000 Key: 无需
★ 0
Stars
📥 741
下载
💾 7
安装
2
版本
#latest

概述

ETL

Extract-Transform-Load data toolkit (v2.0.0). Record and manage data pipeline activities across the full ETL lifecycle — ingest, transform, query, filter, aggregate, visualize, export, sample, schema definition, validation, pipeline orchestration, and data profiling. Each command logs timestamped entries to its own log file, giving you a structured record of all data operations.

Commands

CommandDescription
----------------------
etl ingest Record a data ingestion event (source, format, row count, etc.). Without args, shows recent ingest entries.
etl transform Log a transformation step (column rename, type cast, normalization, etc.). Without args, shows recent transforms.
etl query Record a query operation or SQL statement. Without args, shows recent queries.
etl filter Log a filtering rule or condition applied to data. Without args, shows recent filters.
etl aggregate Record an aggregation step (GROUP BY, SUM, AVG, etc.). Without args, shows recent aggregations.
etl visualize Log a visualization request or chart configuration. Without args, shows recent visualizations.
etl export Record an export operation (destination, format, row count). Without args, shows recent exports.
etl sample Log a data sampling step (sample size, method, seed). Without args, shows recent samples.
etl schema Record a schema definition or schema change. Without args, shows recent schema entries.
etl validate Log a data validation rule or result. Without args, shows recent validations.
etl pipeline Record a pipeline configuration or execution step. Without args, shows recent pipeline entries.
etl profile Log a data profiling result (null counts, distributions, anomalies). Without args, shows recent profiles.
etl statsShow summary statistics: entry counts per category, total entries, data size, and earliest record date.
etl export Export all logged data to a file. Supported formats: json, csv, txt. (Note: this is a different code path from the export log command — it exports the tool's own data.)
etl search Search across all log files for a keyword (case-insensitive).
etl recentShow the 20 most recent entries from the activity history log.
etl statusHealth check: version, data directory, total entries, disk usage, last activity.
etl helpShow the built-in help with all available commands.
etl versionPrint the current version (v2.0.0).

Data Storage

All data is stored as plain-text log files in ~/.local/share/etl/:

  • Per-command logs — Each command (ingest, transform, query, etc.) writes to its own .log file (e.g., ingest.log, transform.log).
  • History log — Every operation is also appended to history.log with a timestamp and command name.
  • Export files — Generated in the same directory as export.json, export.csv, or export.txt.

Entries are stored in timestamp|value format, making them easy to grep, parse, or pipe into downstream tools.

Requirements

  • Bash 4.0+ (uses set -euo pipefail)
  • coreutilsdate, wc, du, head, tail, grep, basename, cut
  • No external dependencies, API keys, or network access required
  • Works fully offline on any POSIX-compatible system

When to Use

  1. Logging data pipeline steps — Record each stage of your ETL process (ingest → transform → validate → export) with timestamps, creating a complete audit trail of data movements.
  2. Schema management and validation — Use schema to document table structures and validate to log data quality rules and their pass/fail results.
  3. Data profiling and exploration — Use profile to record column statistics, null rates, and distribution anomalies; use sample to log sampling parameters for reproducibility.
  4. Pipeline orchestration tracking — Use pipeline to record multi-step workflow configurations, execution order, and dependencies between ETL stages.
  5. Cross-team data operations review — Run stats for aggregate counts, search to find specific operations by keyword, and export json to share pipeline logs with team members or load into dashboards.

Examples

# Log a data ingestion from S3
etl ingest "s3://data-lake/raw/users_2024.csv — 1.2M rows, CSV format"

# Record a transformation step
etl transform "Normalize email to lowercase, cast created_at to UTC timestamp"

# Log a validation rule
etl validate "NOT NULL check on user_id: 0 violations out of 1,200,000 rows"

# Record schema for a new table
etl schema "users_dim: id INT PK, email VARCHAR(255), created_at TIMESTAMP, country CHAR(2)"

# Define a pipeline
etl pipeline "daily_user_load: ingest(s3) -> dedupe -> validate -> load(postgres)"

# Search for anything related to 'users'
etl search users

# Export all ETL logs to CSV for analysis
etl export csv

# View summary statistics
etl stats

# Check system health
etl status

Tips

  • Run any data command without arguments to see recent entries (e.g., etl ingest shows the last 20 ingest entries).
  • Use etl recent for a quick overview of all activity across all categories.
  • Combine with cron to auto-log pipeline runs: 0 2 * etl pipeline "nightly_load completed at $(date)"
  • Back up your data by copying ~/.local/share/etl/ to your preferred backup location.

Powered by BytesAgain | bytesagain.com | hello@bytesagain.com

版本历史

共 2 个版本

  • v2.0.1 当前
    2026-03-29 15:00 安全 安全
  • v1.0.5
    2026-03-19 08:40

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

Cad

bytesagain1
CAD参考工具
★ 0 📥 2,749
developer-tools

CodeConductor.ai

larsonreever
AI驱动平台,提供快速全栈开发、智能体、工作流自动化及低代码AI集成的可扩展产品创建。
★ 68 📥 180,488
developer-tools

Github

steipete
使用 `gh` CLI 与 GitHub 交互,通过 `gh issue`、`gh pr`、`gh run` 和 `gh api` 管理议题、PR、CI 运行及高级查询。
★ 672 📥 324,544