概述

ETL

Extract-Transform-Load data toolkit (v2.0.0). Record and manage data pipeline activities across the full ETL lifecycle — ingest, transform, query, filter, aggregate, visualize, export, sample, schema definition, validation, pipeline orchestration, and data profiling. Each command logs timestamped entries to its own log file, giving you a structured record of all data operations.

Commands

Command	Description
---------	-------------
`etl ingest`	Record a data ingestion event (source, format, row count, etc.). Without args, shows recent ingest entries.
`etl transform`	Log a transformation step (column rename, type cast, normalization, etc.). Without args, shows recent transforms.
`etl query`	Record a query operation or SQL statement. Without args, shows recent queries.
`etl filter`	Log a filtering rule or condition applied to data. Without args, shows recent filters.
`etl aggregate`	Record an aggregation step (GROUP BY, SUM, AVG, etc.). Without args, shows recent aggregations.
`etl visualize`	Log a visualization request or chart configuration. Without args, shows recent visualizations.
`etl export`	Record an export operation (destination, format, row count). Without args, shows recent exports.
`etl sample`	Log a data sampling step (sample size, method, seed). Without args, shows recent samples.
`etl schema`	Record a schema definition or schema change. Without args, shows recent schema entries.
`etl validate`	Log a data validation rule or result. Without args, shows recent validations.
`etl pipeline`	Record a pipeline configuration or execution step. Without args, shows recent pipeline entries.
`etl profile`	Log a data profiling result (null counts, distributions, anomalies). Without args, shows recent profiles.
`etl stats`	Show summary statistics: entry counts per category, total entries, data size, and earliest record date.
`etl export`	Export all logged data to a file. Supported formats: `json`, `csv`, `txt`. (Note: this is a different code path from the `export` log command — it exports the tool's own data.)
`etl search`	Search across all log files for a keyword (case-insensitive).
`etl recent`	Show the 20 most recent entries from the activity history log.
`etl status`	Health check: version, data directory, total entries, disk usage, last activity.
`etl help`	Show the built-in help with all available commands.
`etl version`	Print the current version (v2.0.0).

Data Storage

All data is stored as plain-text log files in ~/.local/share/etl/:

Per-command logs — Each command (ingest, transform, query, etc.) writes to its own .log file (e.g., ingest.log, transform.log).
History log — Every operation is also appended to history.log with a timestamp and command name.
Export files — Generated in the same directory as export.json, export.csv, or export.txt.

Entries are stored in timestamp|value format, making them easy to grep, parse, or pipe into downstream tools.

Requirements

Bash 4.0+ (uses set -euo pipefail)
coreutils — date, wc, du, head, tail, grep, basename, cut
No external dependencies, API keys, or network access required
Works fully offline on any POSIX-compatible system

When to Use

Logging data pipeline steps — Record each stage of your ETL process (ingest → transform → validate → export) with timestamps, creating a complete audit trail of data movements.
Schema management and validation — Use schema to document table structures and validate to log data quality rules and their pass/fail results.
Data profiling and exploration — Use profile to record column statistics, null rates, and distribution anomalies; use sample to log sampling parameters for reproducibility.
Pipeline orchestration tracking — Use pipeline to record multi-step workflow configurations, execution order, and dependencies between ETL stages.
Cross-team data operations review — Run stats for aggregate counts, search to find specific operations by keyword, and export json to share pipeline logs with team members or load into dashboards.

Examples

# Log a data ingestion from S3
etl ingest "s3://data-lake/raw/users_2024.csv — 1.2M rows, CSV format"

# Record a transformation step
etl transform "Normalize email to lowercase, cast created_at to UTC timestamp"

# Log a validation rule
etl validate "NOT NULL check on user_id: 0 violations out of 1,200,000 rows"

# Record schema for a new table
etl schema "users_dim: id INT PK, email VARCHAR(255), created_at TIMESTAMP, country CHAR(2)"

# Define a pipeline
etl pipeline "daily_user_load: ingest(s3) -> dedupe -> validate -> load(postgres)"

# Search for anything related to 'users'
etl search users

# Export all ETL logs to CSV for analysis
etl export csv

# View summary statistics
etl stats

# Check system health
etl status

Tips

Run any data command without arguments to see recent entries (e.g., etl ingest shows the last 20 ingest entries).
Use etl recent for a quick overview of all activity across all categories.
Combine with cron to auto-log pipeline runs: 0 2 * etl pipeline "nightly_load completed at $(date)"
Back up your data by copying ~/.local/share/etl/ to your preferred backup location.

Powered by BytesAgain | bytesagain.com | hello@bytesagain.com

版本历史

共 2 个版本

v2.0.1 当前

2026-03-29 15:00 安全安全
v1.0.5

2026-03-19 08:40

安全检测

腾讯云安全 (Keen)

安全，无风险

查看报告

腾讯云安全 (Sanbu)