概述

Data Pipelines

Pipelines fail on silent schema drift, partial writes, and unclear ownership. Design for at-least-once delivery, idempotent sinks, and observable stages.

When to Offer This Workflow

Trigger conditions:

Batch or streaming ingestion (Kafka, Fivetran, Airflow, Dagster, Spark, etc.)
Late data, backfills, or schema changes breaking jobs
SLA misses on freshness or row counts

Initial offer:

Use six stages: (1) requirements & SLAs, (2) source contracts, (3) transforms & idempotency, (4) orchestration & dependencies, (5) quality & monitoring, (6) lineage & operations). Confirm batch vs stream and cloud stack.

Stage 1: Requirements & SLAs

Goal: Freshness (latency), completeness expectations, cost ceiling, failure tolerance (quarantine vs stop-the-line).

Exit condition: SLA table: pipeline → metric → threshold.

Stage 2: Source Contracts

Goal: Schema versioning; CDC vs snapshot pulls; API rate limits.

Practices

Raw landing zone immutable; curated layers downstream

Stage 3: Transforms & Idempotency

Goal: Deterministic transforms; upsert keys; partition strategy for rewinds.

Practices

Watermark progress for incremental loads

Stage 4: Orchestration & Dependencies

Goal: Clear DAG; retry policy; backfill without double counting; SLA miss alerts.

Stage 5: Quality & Monitoring

Goal: Data quality checks (null spikes, row bounds, referential checks); metrics on lag, duration, error rate.

Stage 6: Lineage & Operations

Goal: Column-level lineage where valuable; on-call runbook; ownership per pipeline.

Final Review Checklist

[ ] SLAs and failure policy explicit
[ ] Source contracts and schema evolution path
[ ] Idempotent writes and checkpointing
[ ] Orchestration with retries and safe backfill
[ ] Data quality checks and alerts
[ ] Lineage and ownership documented

Tips for Effective Guidance

Separate compute from storage cost awareness for large shuffles.
Pair with etl-design for batch patterns and message-queues for streaming handoffs.

Handling Deviations

Single-script pipelines: still document inputs, outputs, and schedule.

版本历史

共 1 个版本

v1.0.0 当前

2026-05-03 08:21 安全安全

安全检测

腾讯云安全 (Keen)

安全，无风险

查看报告

腾讯云安全 (Sanbu)

安全，无风险

查看报告

Data Pipelines

概述

Data Pipelines

When to Offer This Workflow

Stage 1: Requirements & SLAs

Stage 2: Source Contracts

Practices

Stage 3: Transforms & Idempotency

Practices

Stage 4: Orchestration & Dependencies

Stage 5: Quality & Monitoring

Stage 6: Lineage & Operations

Final Review Checklist

Tips for Effective Guidance

Handling Deviations

版本历史

安全检测

腾讯云安全 (Keen)

腾讯云安全 (Sanbu)

🔗 相关推荐

Tavily 搜索

AdMapix

Visual