
Skylv Data Pipeline Builder

Build ETL/data pipelines with natural language. Extract from databases/APIs, transform with code, load to destinations. No pipeline framework expertise needed.
sky-lv
Uncategorized · clawhub · v1.0.1 · 100000 · Key: not required
★ 0 stars
📥 360 downloads
💾 1 install

Overview

data-pipeline-builder

> Build data pipelines without framework expertise. Extract from any source, transform with code, load to any destination — all with natural language commands.

What It Does

  • Extract data — From databases, APIs, files, S3, GCS, Kafka
  • Transform — Filters, mappings, aggregations, joins, custom code
  • Load — To databases, data warehouses, files, APIs
  • Schedule — Cron-based or event-triggered execution
  • Monitor — Pipeline status, throughput, error rates
  • Validate — Schema checks, data quality rules
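
"Cron-based" scheduling presumably means that phrases such as `daily at 2:00 AM` end up as standard five-field cron expressions. The sketch below shows one way such a mapping might work. `schedule_to_cron` is a hypothetical helper, not part of the tool, and it only handles the two phrasings that appear in this document.

```python
import re

def schedule_to_cron(phrase: str) -> str:
    """Translate a small set of schedule phrases into 5-field cron expressions.

    Hypothetical helper for illustration; the tool's real parser is not
    documented here.
    """
    text = phrase.strip().lower()
    if text == "hourly":
        return "0 * * * *"  # minute 0 of every hour
    match = re.fullmatch(r"daily at (\d{1,2}):(\d{2})\s*(am|pm)", text)
    if match is None:
        raise ValueError(f"unsupported schedule phrase: {phrase!r}")
    hour, minute, meridiem = int(match[1]), int(match[2]), match[3]
    if not (1 <= hour <= 12 and 0 <= minute <= 59):
        raise ValueError(f"invalid time in schedule phrase: {phrase!r}")
    # Convert the 12-hour clock to 24-hour: 12 AM -> 0, 12 PM -> 12.
    hour = hour % 12 + (12 if meridiem == "pm" else 0)
    return f"{minute} {hour} * * *"

print(schedule_to_cron("daily at 2:00 AM"))
print(schedule_to_cron("hourly"))
```

Rejecting unknown phrases with an error, rather than guessing, keeps a mistyped schedule from silently running at the wrong time.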

Quick Start

# 1. Create a simple pipeline
create pipeline from mysql users to postgres users_backup

# 2. Add transformation
add transform to users-backup: filter where active = true

# 3. Schedule it
schedule users-backup daily at 2:00 AM

# 4. Run and monitor
run pipeline users-backup
check pipeline status
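
To make the Quick Start concrete, here is a minimal sketch of what a pipeline like `users-backup` might do once the commands are interpreted: extract rows, keep the active ones, and load them into the backup table. It is not the tool's actual implementation. In-memory SQLite stands in for both MySQL and PostgreSQL so the example runs anywhere, and the column names (`id`, `email`, `active`) and the function `run_users_backup` are assumptions.

```python
import sqlite3

def run_users_backup(source: sqlite3.Connection, dest: sqlite3.Connection) -> int:
    """Copy active users from source.users to dest.users_backup; return rows loaded."""
    # Extract + transform: the "active = true" filter is pushed into the query.
    rows = source.execute(
        "SELECT id, email, active FROM users WHERE active = 1"
    ).fetchall()
    # Load: replace the previous backup inside one transaction.
    with dest:
        dest.execute(
            "CREATE TABLE IF NOT EXISTS users_backup "
            "(id INTEGER PRIMARY KEY, email TEXT, active INTEGER)"
        )
        dest.execute("DELETE FROM users_backup")
        dest.executemany("INSERT INTO users_backup VALUES (?, ?, ?)", rows)
    return len(rows)

# In-memory SQLite stands in for the MySQL source and the PostgreSQL destination.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, active INTEGER)")
source.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "a@example.com", 1), (2, "b@example.com", 0), (3, "c@example.com", 1)],
)
dest = sqlite3.connect(":memory:")

loaded = run_users_backup(source, dest)
print(f"loaded {loaded} active users")
```

Applying the filter in the source query means inactive rows never leave the source database, which is usually cheaper than filtering after extraction.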

Common Use Cases

🔄 Database Synchronization

# Sync production to analytics warehouse
create pipeline from mysql production.orders \
  to bigquery analytics.orders

# Run incremental sync every hour
schedule orders-sync hourly
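
The listing does not say how the incremental sync decides what to copy. A common approach, sketched here as an assumption, is a high-water mark: remember the newest `updated_at` value seen so far and fetch only rows newer than that on the next run. The `updated_at` and `total` columns and the function `sync_orders` are hypothetical, and in-memory SQLite again stands in for MySQL and BigQuery.

```python
import sqlite3

def sync_orders(source: sqlite3.Connection, dest: sqlite3.Connection, last_seen: str) -> str:
    """Copy orders changed after `last_seen`; return the new high-water mark."""
    rows = source.execute(
        "SELECT id, total, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    with dest:
        # Upsert by primary key so an updated order overwrites its old copy.
        dest.executemany(
            "INSERT OR REPLACE INTO orders (id, total, updated_at) VALUES (?, ?, ?)",
            rows,
        )
    return rows[-1][2] if rows else last_seen

ddl = "CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL, updated_at TEXT)"
source = sqlite3.connect(":memory:")
dest = sqlite3.connect(":memory:")
source.execute(ddl)
dest.execute(ddl)
# ISO-8601 timestamps stored as text compare correctly as strings.
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2024-01-01T00:00:00"), (2, 25.5, "2024-01-01T01:00:00")],
)

watermark = sync_orders(source, dest, "")  # first run copies everything
source.execute(
    "UPDATE orders SET total = 12.0, updated_at = '2024-01-01T02:00:00' WHERE id = 1"
)
watermark = sync_orders(source, dest, watermark)  # second run copies only order 1
```

This approach depends on the source reliably updating its timestamp column on every change; deleted rows are not detected and would need separate handling.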

📊 API Data Extraction

# Pull data from REST API
create pipeline from api https://api.shop.com/orders \
  to postgres analytics.orders

# Add authentication
set source auth: bearer token xxx
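
As a rough illustration of what `set source auth: bearer token xxx` might translate to, the sketch below builds an HTTP request that carries the token in an `Authorization` header and flattens a JSON response into rows. The response shape (a JSON array of objects with `id` and `total`) and both function names are assumptions; the real API's format is not given in this listing.

```python
import json
import urllib.request
from typing import List, Tuple

def build_request(url: str, token: str) -> urllib.request.Request:
    """Build a GET request that sends the token as a bearer credential."""
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )

def parse_orders(payload: bytes) -> List[Tuple[int, float]]:
    """Flatten a JSON array of order objects into (id, total) rows.

    The field names are hypothetical; adjust them to the real API.
    """
    return [(order["id"], order["total"]) for order in json.loads(payload)]

request = build_request("https://api.shop.com/orders", "xxx")
# A real run would call urllib.request.urlopen(request).read() here.
# A canned body is used instead so the sketch runs offline.
sample_body = b'[{"id": 1, "total": 19.99}, {"id": 2, "total": 5.0}]'
rows = parse_orders(sample_body)
print(rows)
```

In practice the token would come from an environment variable or a secrets store rather than from the command text.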

🧹 Data Cleaning

# Clean and transform data
create pipeline from csv raw_data.csv to postgres clean_data

add transform: \
  remove duplicates on email \
  fill nulls in age with 0 \
  validate email format
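
The three cleaning steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the tool's code: duplicates are detected case-insensitively and the first occurrence wins, both `None` and the empty string count as null, and the email check is a deliberately loose pattern. The function `clean` and the sample rows are made up for the example.

```python
import re

# Loose format check: something@something.something, no spaces or extra "@".
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def clean(rows):
    """Return (cleaned, rejected) after validating, de-duplicating, and filling nulls."""
    seen = set()
    cleaned, rejected = [], []
    for row in rows:
        email = (row.get("email") or "").strip().lower()
        if not EMAIL_RE.fullmatch(email):
            rejected.append(row)  # validate email format
            continue
        if email in seen:
            continue  # remove duplicates on email, keeping the first occurrence
        seen.add(email)
        age = row.get("age")
        # fill nulls in age with 0
        cleaned.append({**row, "email": email, "age": 0 if age in (None, "") else age})
    return cleaned, rejected

raw = [
    {"email": "ann@example.com", "age": 31},
    {"email": "ANN@example.com", "age": 40},   # duplicate of the first row
    {"email": "bob@example.com", "age": None},  # missing age
    {"email": "not-an-email", "age": 22},       # fails validation
]
cleaned, rejected = clean(raw)
```

Returning rejected rows separately, instead of dropping them, makes it possible to report how many records failed validation.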

📈 Analytics Preparation

# Aggregate for dashboards
create pipeline from postgres transactions \
  to postgres daily_summary

add transform: \
  group by date, product \
  aggregate sum(revenue), count(*) \
  where date >= yesterday
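
The transform above amounts to a filtered group-by. A minimal sketch, assuming each transaction has `date`, `product`, and `revenue` fields (the listing names only `date`, `product`, and `revenue`; the dictionary layout and the function `daily_summary` are hypothetical):

```python
from collections import defaultdict
from datetime import date, timedelta

def daily_summary(transactions, since):
    """Group by (date, product); return (date, product, sum(revenue), count) rows."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [revenue sum, row count]
    for tx in transactions:
        if tx["date"] < since:
            continue  # where date >= since
        bucket = totals[(tx["date"], tx["product"])]
        bucket[0] += tx["revenue"]
        bucket[1] += 1
    return [(d, p, revenue, count) for (d, p), (revenue, count) in sorted(totals.items())]

# A fixed "today" keeps the example deterministic; a real run would use date.today().
today = date(2024, 3, 10)
yesterday = today - timedelta(days=1)
transactions = [
    {"date": date(2024, 3, 8), "product": "widget", "revenue": 5.0},
    {"date": date(2024, 3, 9), "product": "widget", "revenue": 10.0},
    {"date": date(2024, 3, 9), "product": "widget", "revenue": 2.5},
    {"date": date(2024, 3, 9), "product": "gadget", "revenue": 4.0},
    {"date": date(2024, 3, 10), "product": "widget", "revenue": 1.0},
]
summary = daily_summary(transactions, since=yesterday)
```

Against a real PostgreSQL source the same logic would normally be pushed down as a `GROUP BY` query so that only the aggregated rows are transferred.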

All Commands

| Command | Purpose |
|---------|---------|
| create pipeline from <source> to <destination> | Define new pipeline |
| add transform | Add transformation step |
| schedule | Set run schedule |
| run pipeline | Execute immediately |
| check pipeline status | View running pipelines |
| pause pipeline | Stop scheduled runs |
| view logs | See execution history |
| validate | Test without executing |

Supported Sources & Destinations

Databases: MySQL, PostgreSQL, MongoDB, Redis, SQLite

Cloud Storage: S3, GCS, Azure Blob

Data Warehouses: BigQuery, Snowflake, Redshift

Streaming: Kafka, Kinesis, Pub/Sub

Files: CSV, JSON, Parquet, Excel


Requirements

  • Node.js 18+ or Python 3.8+
  • Source/destination connectors (auto-installed)
  • Optional: Airflow, Dagster for orchestration

Version History

1 version in total

  • v1.0.1 (current)
    2026-05-07 19:35 Safe

Security Check

No security scan report available yet