← 返回
未分类

Capacity Planner

Forecast infrastructure capacity needs using historical metrics, growth projections, and cost modeling. Identify bottlenecks before they cause outages and ri...
使用历史指标、增长预测和成本建模来预测基础设施容量需求。在故障发生前识别瓶颈和风险。
charlie-morrison charlie-morrison 来源
未分类 clawhub v1.0.1 1 版本 100000 Key: 无需
★ 0
Stars
📥 319
下载
💾 2
安装
1
版本
#latest

概述

Capacity Planner

Forecast when your infrastructure will hit limits. Analyze historical metrics (CPU, memory, disk, network, database connections), project growth curves, identify approaching bottlenecks, and recommend right-sizing — so you scale proactively instead of reactively.

Use when: "when will we run out of space", "capacity forecast", "right-size our instances", "are we over-provisioned", "plan for traffic growth", "infrastructure scaling plan", "when do we need to upgrade", or before budget planning.

Commands

1. forecast — Project Resource Exhaustion

Step 1: Collect Historical Metrics

# Prometheus — CPU utilization over last 30 days
curl -s "$PROMETHEUS_URL/api/v1/query_range" \
  --data-urlencode 'query=avg(rate(node_cpu_seconds_total{mode!="idle"}[5m])) by (instance)' \
  --data-urlencode "start=$(date -d '30 days ago' +%s)" \
  --data-urlencode "end=$(date +%s)" \
  --data-urlencode 'step=1h' | python3 -c "
import json, sys
data = json.load(sys.stdin)
for result in data['data']['result']:
    instance = result['metric']['instance']
    values = [float(v[1]) for v in result['values']]
    avg = sum(values) / len(values)
    peak = max(values)
    trend = (values[-1] - values[0]) / len(values)  # slope per hour
    print(f'{instance}: avg={avg:.1%} peak={peak:.1%} trend={trend:+.4%}/hr')
"

# Disk usage over time
df -h / /data /var 2>/dev/null
# Historical disk growth (if monitoring available)
curl -s "$PROMETHEUS_URL/api/v1/query_range" \
  --data-urlencode 'query=node_filesystem_avail_bytes{mountpoint="/"}' \
  --data-urlencode "start=$(date -d '30 days ago' +%s)" \
  --data-urlencode "end=$(date +%s)" \
  --data-urlencode 'step=1d'

# Memory usage
free -h
# Database connections
curl -s "$PROMETHEUS_URL/api/v1/query" \
  --data-urlencode 'query=pg_stat_activity_count / pg_settings_max_connections'

If Prometheus unavailable, use CloudWatch, Datadog, or system tools:

# Last 30 days of CloudWatch CPU
aws cloudwatch get-metric-statistics --namespace AWS/EC2 \
  --metric-name CPUUtilization --statistics Average Maximum \
  --dimensions Name=InstanceId,Value=i-0abc123 \
  --start-time $(date -d '30 days ago' -u +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) --period 86400

Step 2: Fit Growth Model

For each resource, determine growth pattern:

  • Linear: constant rate of increase (disk filling at 2GB/day)
  • Exponential: accelerating growth (user base doubling quarterly)
  • Seasonal: cyclical patterns (weekend dips, end-of-month spikes)
  • Flat: no growth (stable, well-bounded workload)

Calculate days until exhaustion:

days_remaining = (capacity - current_usage) / daily_growth_rate

For exponential: use doubling time to project.

Step 3: Generate Forecast Report

# Capacity Forecast Report — [date]

## Critical (exhaustion < 30 days)
| Resource | Current | Capacity | Growth/day | Exhaustion | Action |
|----------|---------|----------|------------|------------|--------|
| Disk (/) | 45 GB | 50 GB | 180 MB/day | ~28 days | Expand volume or add cleanup cron |
| DB connections | 85/100 | 100 | +2/week | ~5 weeks | Increase max_connections or add pgbouncer |

## Warning (exhaustion 30-90 days)
| Resource | Current | Capacity | Growth/day | Exhaustion |
|----------|---------|----------|------------|------------|
| Memory | 12/16 GB | 16 GB | 50 MB/day | ~82 days |

## Healthy (>90 days or no growth)
- CPU: avg 35%, peak 72%, flat trend — no action needed
- Network: avg 200 Mbps of 1 Gbps — no concern

## Over-Provisioned (wasting money)
| Resource | Used | Provisioned | Utilization | Savings |
|----------|------|-------------|-------------|---------|
| worker-pool-3 | 2 vCPU avg | 8 vCPU | 25% | Downsize to 4 vCPU, save ~$150/mo |
| Redis cluster | 512 MB | 8 GB | 6% | Downsize to 2 GB, save ~$80/mo |

2. rightsize — Recommend Instance Sizes

Given current utilization and growth projections:

  • Map workload to optimal instance family (compute, memory, storage-optimized)
  • Factor in reserved instance / savings plan pricing
  • Account for headroom (recommend 60-70% target utilization, not 95%)
  • Compare across cloud providers if multi-cloud

3. cost-model — Project Infrastructure Costs

Given the capacity forecast:

  • Calculate current monthly spend
  • Project spend at 3, 6, 12 months based on growth
  • Identify the biggest cost drivers
  • Suggest cost optimization levers (spot instances, reserved pricing, auto-scaling, compression, archival)

4. bottleneck — Identify Scaling Bottlenecks

Analyze the system for the component that will fail first under load:

  • Database (connections, IOPS, lock contention)
  • Application (CPU-bound, memory-bound, thread pool exhaustion)
  • Network (bandwidth, DNS resolution, TLS handshake overhead)
  • External dependencies (rate limits, API quotas, third-party SLAs)

Rank bottlenecks by "time to impact" and recommend mitigation order.

版本历史

共 1 个版本

  • v1.0.1 当前
    2026-05-08 00:36 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

it-ops-security

OpenClaw Backup

alex3alex
备份与恢复 OpenClaw 数据。适用于创建备份、设置自动备份计划、从备份恢复或管理备份轮转。处理 ~/.openclaw 目录归档并包含适当的排除规则。
★ 90 📥 31,046
it-ops-security

Free Ride - Unlimited free AI

shaivpidadi
管理OpenClaw的OpenRouter免费AI模型,自动按质量排名模型,配置速率限制备用方案,并更新opencla...
★ 471 📥 78,421
it-ops-security

MoltGuard - Security & Antivirus & Guardrails

thomaslwang
MoltGuard — OpenClaw 安全守卫,由 OpenGuardrails 提供。安装后可防止您和您的用户受到提示注入、数据泄露及恶意行为的侵害。
★ 116 📥 31,004