← 返回
内容创作 中文

ClickHouse

Query, optimize, and administer ClickHouse OLAP databases with schema design, performance tuning, and data ingestion patterns.
查询、优化与管理ClickHouse OLAP数据库,涵盖模式设计、性能调优及数据摄入模式。
ivangdavila
内容创作 clawhub v1.0.1 1 版本 99898.9 Key: 无需
★ 1
Stars
📥 968
下载
💾 41
安装
1
版本
#latest

概述

ClickHouse 🏠

Real-time analytics on billions of rows. Sub-second queries. No indexes needed.

Setup

On first use, read setup.md for connection configuration.

When to Use

User needs OLAP analytics, log analysis, time-series data, or real-time dashboards. Agent handles schema design, query optimization, data ingestion, and cluster administration.

Architecture

Memory lives in ~/clickhouse/. See memory-template.md for structure.

~/clickhouse/
├── memory.md        # Connection profiles + query patterns
├── schemas/         # Table definitions per database
└── queries/         # Saved analytical queries

Quick Reference

TopicFile
-------------
Setup & connectionsetup.md
Memory templatememory-template.md
Query patternsqueries.md
Performance tuningperformance.md
Data ingestioningestion.md

Core Rules

1. Always Specify Engine

Every table needs an explicit engine. Default to MergeTree family:

-- Time-series / logs
CREATE TABLE events (
    timestamp DateTime,
    event_type String,
    data String
) ENGINE = MergeTree()
ORDER BY (timestamp, event_type);

-- Aggregated metrics
CREATE TABLE daily_stats (
    date Date,
    metric String,
    value AggregateFunction(sum, UInt64)
) ENGINE = AggregatingMergeTree()
ORDER BY (date, metric);

2. ORDER BY is Your Index

ClickHouse has no traditional indexes. The ORDER BY clause determines data layout:

  • Put high-cardinality filter columns first
  • Put range columns (dates, timestamps) early
  • Match your most common WHERE patterns
-- Good: filters by user_id, then date range
ORDER BY (user_id, date, event_type)

-- Bad: date first when you filter by user_id
ORDER BY (date, user_id, event_type)

3. Use Appropriate Data Types

Use CaseTypeWhy
---------------------
TimestampsDateTime or DateTime64Native time functions
Low-cardinality stringsLowCardinality(String)10x compression
Enums with few valuesEnum8 or Enum16Smallest footprint
Nullable only if neededNullable(T)Adds overhead
IPsIPv4 or IPv64 bytes vs 16+

4. Batch Inserts

Never insert row-by-row. ClickHouse is optimized for batch writes:

# Good: batch insert
clickhouse-client --query="INSERT INTO events FORMAT JSONEachRow" < batch.json

# Bad: individual inserts in a loop
for row in data:
    INSERT INTO events VALUES (...)

Minimum batch: 1,000 rows. Optimal: 10,000-100,000 rows.

5. Prewarm Queries with FINAL

Queries on ReplacingMergeTree/CollapsingMergeTree need FINAL for accuracy:

-- May return duplicates/old versions
SELECT * FROM users WHERE id = 123;

-- Guaranteed latest version
SELECT * FROM users FINAL WHERE id = 123;

FINAL has performance cost. For dashboards, consider materialized views.

6. Materialized Views for Speed

Pre-aggregate expensive computations:

CREATE MATERIALIZED VIEW hourly_events
ENGINE = SummingMergeTree()
ORDER BY (hour, event_type)
AS SELECT
    toStartOfHour(timestamp) AS hour,
    event_type,
    count() AS events
FROM events
GROUP BY hour, event_type;

7. Check System Tables First

Before debugging, check system tables:

-- Running queries
SELECT * FROM system.processes;

-- Recent query performance
SELECT query, elapsed, read_rows, memory_usage
FROM system.query_log
WHERE type = 'QueryFinish'
ORDER BY event_time DESC
LIMIT 10;

-- Table sizes
SELECT database, table, formatReadableSize(total_bytes) as size
FROM system.tables
ORDER BY total_bytes DESC;

Common Traps

  • String instead of LowCardinality → 10x larger storage for status/type columns
  • Wrong ORDER BY → Full table scans instead of index lookups
  • Row-by-row inserts → Massive part fragmentation, slow writes
  • Missing TTL → Unbounded table growth, disk full
  • SELECT * → Reads all columns, kills columnar advantage
  • Nullable everywhere → Overhead + NULL handling complexity
  • Forgetting FINAL → Stale/duplicate data in merge tables

Performance Checklist

Before running expensive queries:

  1. Check EXPLAIN: EXPLAIN SELECT ... shows execution plan
  2. Sample first: SELECT ... FROM table SAMPLE 0.01 for 1% sample
  3. Limit columns: Only SELECT what you need
  4. Use PREWHERE: Filters before reading all columns
  5. Check parts: SELECT count() FROM system.parts WHERE table='X'
-- PREWHERE optimization
SELECT user_id, event_type, data
FROM events
PREWHERE date = today()
WHERE event_type = 'click';

Cluster Administration

Adding TTL for Data Retention

-- Delete old data
ALTER TABLE events
MODIFY TTL timestamp + INTERVAL 90 DAY;

-- Move to cold storage
ALTER TABLE events
MODIFY TTL timestamp + INTERVAL 30 DAY TO VOLUME 'cold';

Monitoring Disk Usage

SELECT
    database,
    table,
    formatReadableSize(sum(bytes_on_disk)) as disk_size,
    sum(rows) as total_rows,
    count() as parts
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY sum(bytes_on_disk) DESC;

External Endpoints

EndpointData SentPurpose
------------------------------
localhost:8123SQL queriesHTTP interface
localhost:9000SQL queriesNative TCP interface

No external services contacted. All queries run against user-specified ClickHouse instances.

Security & Privacy

Data saved locally (with user consent):

  • Connection profiles (host, port, database) in ~/clickhouse/memory.md
  • Query patterns and schema documentation
  • Authentication method preferences (password vs certificate)

Important: If you provide database passwords, they are stored in plain text in ~/clickhouse/. Consider using environment variables or connection profiles managed by clickhouse-client instead.

This skill does NOT:

  • Connect to any ClickHouse without explicit user configuration
  • Send data to external services
  • Automatically collect or store credentials without asking

Related Skills

Install with clawhub install if user confirms:

  • sql — SQL query patterns
  • analytics — data analysis workflows
  • data-analysis — structured data exploration

Feedback

  • If useful: clawhub star clickhouse
  • Stay updated: clawhub sync

版本历史

共 1 个版本

  • v1.0.1 当前
    2026-03-29 16:41 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

ai-intelligence

Self-Improving + Proactive Agent

ivangdavila
自我反思+自我批评+自我学习+自组织记忆。智能体评估自身工作、发现错误并持续改进。
★ 1,362 📥 318,768
productivity

Word / DOCX

ivangdavila
创建、检查和编辑 Microsoft Word 文档及 DOCX 文件,支持样式、编号、修订记录、表格、分节符及兼容性检查等功能。
★ 440 📥 147,877
content-creation

AdMapix

fly0pants
广告情报与应用数据分析助手,支持搜索广告素材、分析应用排名、下载量、收入及市场洞察,用于广告素材和竞品分析。
★ 295 📥 136,537