Overview

data-pipeline-builder

> Build data pipelines without framework expertise. Extract from any supported source, transform with code, and load to any supported destination, all with natural-language commands.

What It Does

Extract data — From databases, APIs, files, S3, GCS, Kafka
Transform — Filters, mappings, aggregations, joins, custom code
Load — To databases, data warehouses, files, APIs
Schedule — Cron-based or event-triggered execution
Monitor — Pipeline status, throughput, error rates
Validate — Schema checks, data quality rules
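The Validate capability mentions schema checks and data quality rules. As a rough illustration of what such checks usually mean (not the tool's own rule syntax), the sketch below checks rows represented as Python dicts against a hypothetical `SCHEMA` and two example quality rules. Every name, column, and rule here is an assumption.

```python
import re
from typing import Any, Dict, List

# Hypothetical schema: column name -> expected Python type (None is allowed).
SCHEMA: Dict[str, type] = {"id": int, "email": str, "age": int}
# Deliberately loose email check: something@something.tld
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def schema_errors(row: Dict[str, Any], schema: Dict[str, type]) -> List[str]:
    """Return schema violations: missing columns or values of the wrong type."""
    errors = []
    for column, expected in schema.items():
        if column not in row:
            errors.append(f"missing column {column!r}")
        elif row[column] is not None and not isinstance(row[column], expected):
            errors.append(f"{column!r} should be {expected.__name__}")
    return errors


def quality_errors(row: Dict[str, Any]) -> List[str]:
    """Example data quality rules: age must be non-negative, email well formed."""
    errors = []
    age = row.get("age")
    if isinstance(age, int) and age < 0:
        errors.append("age must be >= 0")
    email = row.get("email")
    if isinstance(email, str) and not EMAIL_RE.match(email):
        errors.append("invalid email format")
    return errors
```

Keeping structural (schema) checks separate from business rules makes it easy to reject malformed rows early and report rule failures individually.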

Quick Start

# 1. Create a simple pipeline
create pipeline from mysql users to postgres users_backup

# 2. Add transformation
add transform to users-backup: filter where active = true

# 3. Schedule it
schedule users-backup daily at 2:00 AM

# 4. Run and monitor
run pipeline users-backup
check pipeline status
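Steps 1 and 2 describe a classic extract, filter, load flow. As a rough illustration of what that pipeline does (not how the builder is implemented), the sketch below uses Python's built-in `sqlite3` as a stand-in for both MySQL and PostgreSQL. The `id`, `name`, and `active` columns and the `run_users_backup` function are hypothetical.

```python
import sqlite3


def run_users_backup(source: sqlite3.Connection, dest: sqlite3.Connection) -> int:
    """Copy active users from source.users into dest.users_backup; return rows loaded."""
    # Extract: read every row from the source table.
    rows = source.execute("SELECT id, name, active FROM users").fetchall()
    # Transform: keep only active users (mirrors "filter where active = true").
    active_rows = [row for row in rows if row[2]]
    # Load: create the destination table if needed, then upsert the filtered rows.
    dest.execute(
        "CREATE TABLE IF NOT EXISTS users_backup "
        "(id INTEGER PRIMARY KEY, name TEXT, active INTEGER)"
    )
    dest.executemany("INSERT OR REPLACE INTO users_backup VALUES (?, ?, ?)", active_rows)
    dest.commit()
    return len(active_rows)


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, active INTEGER)")
    src.executemany("INSERT INTO users VALUES (?, ?, ?)", [(1, "ana", 1), (2, "bo", 0), (3, "cy", 1)])
    dst = sqlite3.connect(":memory:")
    print(run_users_backup(src, dst))  # prints 2
```

Filtering in Python keeps the transform step visible; against a real database it is usually cheaper to push the condition into the source query (`WHERE active = TRUE`) so inactive rows are never transferred. `INSERT OR REPLACE` keeps reruns idempotent.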

Common Use Cases

🔄 Database Synchronization

# Sync production to analytics warehouse
create pipeline from mysql production.orders \
  to bigquery analytics.orders

# Run incremental sync every hour
schedule orders-sync hourly
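The hourly schedule above is described as an incremental sync. A common way to implement one is a high-water mark: remember the newest `updated_at` value copied so far and fetch only rows changed after it. The sketch below assumes an `orders` table with `id`, `amount`, and `updated_at` (ISO-8601 text) columns and uses `sqlite3` in place of MySQL and BigQuery. It illustrates the technique, not how the builder tracks sync state; `incremental_sync` is a hypothetical name.

```python
import sqlite3


def incremental_sync(source: sqlite3.Connection, dest: sqlite3.Connection, last_watermark: str) -> str:
    """Copy orders changed after last_watermark into dest.orders; return the new watermark.

    Assumes dest already has an orders table with id as its primary key.
    """
    # Extract only rows modified after the previous run's high-water mark.
    rows = source.execute(
        "SELECT id, amount, updated_at FROM orders WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # Upsert by primary key so a re-sent row overwrites its earlier copy.
    dest.executemany(
        "INSERT OR REPLACE INTO orders (id, amount, updated_at) VALUES (?, ?, ?)", rows
    )
    dest.commit()
    # Advance the watermark to the newest timestamp seen (unchanged if nothing new).
    return rows[-1][2] if rows else last_watermark
```

ISO-8601 strings in one consistent format sort chronologically, so plain text comparison works here. A strict `>` can skip a row committed later with the same timestamp as the watermark; real syncs often re-read a small overlap window to guard against that.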

📊 API Data Extraction

# Pull data from REST API
create pipeline from api https://api.shop.com/orders \
  to postgres analytics.orders

# Add authentication
set source auth: bearer token xxx
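Under the standard HTTP bearer scheme, attaching a bearer token means sending an `Authorization: Bearer <token>` header with each request. A minimal standard-library sketch follows; the function names are hypothetical, it assumes the endpoint returns JSON, and it says nothing about how the builder stores credentials.

```python
import json
import urllib.request


def build_request(url: str, token: str) -> urllib.request.Request:
    """Build a GET request carrying a bearer token in the Authorization header."""
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )


def fetch_json(url: str, token: str, timeout: float = 30.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(url, token), timeout=timeout) as response:
        return json.load(response)
```

Read the token from the environment or a secrets store (for example a hypothetical `SHOP_API_TOKEN` variable) rather than hard-coding it in the pipeline definition.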

🧹 Data Cleaning

# Clean and transform data
create pipeline from csv raw_data.csv to postgres clean_data

add transform: \
  remove duplicates on email \
  fill nulls in age with 0 \
  validate email format
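The three cleaning steps can be read as ordinary row operations. The sketch below applies them in the listed order to rows represented as dicts. The regular expression is a deliberately loose format check, and normalizing emails to trimmed lowercase before comparing them is an assumption, not documented behavior of the tool.

```python
import re
from typing import Any, Dict, List

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def clean(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Deduplicate on email, fill null ages with 0, and drop malformed emails."""
    seen_emails = set()
    cleaned = []
    for row in rows:
        # Assumption: compare emails case-insensitively, ignoring surrounding spaces.
        email = (row.get("email") or "").strip().lower()
        # remove duplicates on email: keep the first row for each address
        if email in seen_emails:
            continue
        seen_emails.add(email)
        # fill nulls in age with 0
        age = row.get("age")
        fixed = {**row, "email": email, "age": 0 if age is None else age}
        # validate email format: drop rows whose address is malformed
        if EMAIL_RE.match(email):
            cleaned.append(fixed)
    return cleaned
```

Order matters: deduplicating before validating means the first row for an address wins even if a later duplicate has better data.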

📈 Analytics Preparation

# Aggregate for dashboards
create pipeline from postgres transactions \
  to postgres daily_summary

add transform: \
  group by date, product \
  aggregate sum(revenue), count(*) \
  where date >= yesterday
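This transform groups by `date` and `product`, sums `revenue`, counts rows, and keeps only dates from yesterday onward. As in SQL, the `where` condition is a row filter applied before grouping even though it is written last. A pure-Python sketch of the same logic, assuming each transaction is a dict with `date` (a `datetime.date`), `product`, and `revenue` keys; `daily_summary` and the output key names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Iterable, List, Tuple


def daily_summary(transactions: Iterable[dict], today: date) -> List[dict]:
    """Sum revenue and count rows per (date, product) for dates >= yesterday."""
    cutoff = today - timedelta(days=1)  # "yesterday"
    revenue: Dict[Tuple[date, str], float] = defaultdict(float)
    count: Dict[Tuple[date, str], int] = defaultdict(int)
    for t in transactions:
        # where date >= yesterday: filter rows before grouping
        if t["date"] < cutoff:
            continue
        key = (t["date"], t["product"])  # group by date, product
        revenue[key] += t["revenue"]     # sum(revenue)
        count[key] += 1                  # count(*)
    return [
        {"date": d, "product": p, "sum_revenue": revenue[(d, p)], "count": count[(d, p)]}
        for (d, p) in sorted(revenue)
    ]
```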

All Commands

| Command | Purpose |
|---------|---------|
| `create pipeline from <source> to <destination>` | Define a new pipeline |
| `add transform` | Add a transformation step |
| `schedule` | Set the run schedule |
| `run pipeline` | Execute immediately |
| `check pipeline status` | View running pipelines |
| `pause pipeline` | Stop scheduled runs |
| `view logs` | See execution history |
| `validate` | Test without executing |
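The `schedule` command takes phrases such as `daily at 2:00 AM` and `hourly`. In cron notation these correspond to `0 2 * * *` and `0 * * * *`. The sketch below computes the next run time for those two cases using naive local datetimes. It illustrates the cron semantics only, not the tool's scheduler, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def next_daily_run(now: datetime, hour: int, minute: int = 0) -> datetime:
    """Next time strictly after `now` that the clock reads hour:minute (cron 'M H * * *')."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so run tomorrow at the same time.
        candidate += timedelta(days=1)
    return candidate


def next_hourly_run(now: datetime) -> datetime:
    """Next top of the hour strictly after `now` (cron '0 * * * *')."""
    return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
```

Naive datetimes ignore daylight-saving transitions; a production scheduler should run in a fixed time zone such as UTC or use timezone-aware datetimes.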

Supported Sources & Destinations

Databases: MySQL, PostgreSQL, MongoDB, Redis, SQLite

Cloud Storage: S3, GCS, Azure Blob

Data Warehouses: BigQuery, Snowflake, Redshift

Streaming: Kafka, Kinesis, Pub/Sub

Files: CSV, JSON, Parquet, Excel

Requirements

Node.js 18+ or Python 3.8+
Source/destination connectors (auto-installed)
Optional: Airflow or Dagster for orchestration

Version History

1 version in total

v1.0.1 (current)

2026-05-07 19:35 (Safe)

Security Scan

No security scan report available yet

Skylv Data Pipeline Builder