Cluster

Perform data clustering analysis using k-means and hierarchical algorithms. Use when you need to group, classify, or segment datasets.
Author: ckchzh
Category: Uncategorized | Source: clawhub | Version: v1.0.0 (#latest, 1 version) | Key: not required
Stars: 1 | Downloads: 609 | Installs: 1

Overview

Cluster — Data Clustering Analysis Tool

Cluster is a command-line data clustering analysis tool that supports k-means and hierarchical clustering algorithms. It reads numerical data from CSV/JSONL sources, performs clustering, evaluates cluster quality, and exports results.

Data is stored in ~/.cluster/data.jsonl as JSONL records. Each record represents a clustering run with its parameters, assignments, centroids, and evaluation metrics.

Prerequisites

  • Python 3.8+ with standard library (no external packages required for basic operations)
  • bash shell

Commands

run

Run a clustering algorithm on input data.

Environment Variables:

  • INPUT (required) — Path to input CSV/JSONL file with numerical data
  • K — Number of clusters (default: 3)
  • ALGORITHM — Algorithm to use: kmeans or hierarchical (default: kmeans)
  • MAX_ITER — Maximum iterations for k-means (default: 100)
  • SEED — Random seed for reproducibility

Example:

INPUT=/path/to/data.csv K=5 ALGORITHM=kmeans bash scripts/script.sh run
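
As a rough illustration of what a k-means run involves, here is a minimal standard-library sketch of Lloyd's algorithm. It is not taken from scripts/script.sh: the function name `kmeans`, its signature, and the choice to initialise centroids from randomly sampled input rows are all assumptions, picked to mirror the K, MAX_ITER, and SEED variables above.

```python
import math
import random


def kmeans(points, k=3, max_iter=100, seed=None):
    """Hypothetical sketch of Lloyd's k-means on a list of numeric tuples."""
    rng = random.Random(seed)
    # Assumption: start from k distinct input rows chosen at random.
    centroids = rng.sample(points, k)
    assignments = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignments


# Made-up sample data, for illustration only.
sample_points = [(1.0, 1.1), (0.9, 1.0), (5.0, 5.2), (5.1, 4.9)]
sample_centroids, sample_assignments = kmeans(sample_points, k=2, seed=42)
```

Passing the same seed twice gives the same result in this sketch, which is the point of a SEED setting: k-means is sensitive to its starting centroids.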

assign

Assign new data points to existing clusters from a previous run.

Environment Variables:

  • RUN_ID (required) — ID of the clustering run to use
  • INPUT (required) — Path to new data points (CSV/JSONL)

Example:

RUN_ID=abc123 INPUT=/path/to/new_data.csv bash scripts/script.sh assign
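
One plausible way to assign new points is to reuse the stored centroids and pick the nearest one for each point. The sketch below assumes Euclidean distance and a hypothetical helper name `assign_points`; the tool's actual rule, especially for hierarchical runs, is not documented here.

```python
import math


def assign_points(new_points, centroids):
    """Return the index of the closest centroid for each new point."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in new_points
    ]


# Made-up centroids standing in for those of a stored run.
stored_centroids = [(0.0, 0.0), (10.0, 10.0)]
new_labels = assign_points([(0.2, 0.1), (9.0, 9.5)], stored_centroids)
```

Note that this leaves the centroids untouched: new points are labelled, but the earlier run is not re-fitted.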

centroids

Display or export centroid coordinates for a clustering run.

Environment Variables:

  • RUN_ID (required) — ID of the clustering run
  • FORMAT — Output format: table, json, csv (default: table)
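
To make the three FORMAT values concrete, here is a small sketch of how a list of centroids might be rendered. The function name `format_centroids`, the column layout, and the four-decimal precision are assumptions; the real output of the centroids command may differ.

```python
import csv
import io
import json


def format_centroids(centroids, fmt="table"):
    """Render centroids as a table, JSON, or CSV string (hypothetical layout)."""
    if fmt == "json":
        return json.dumps([list(c) for c in centroids])
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        for cluster_id, coords in enumerate(centroids):
            writer.writerow([cluster_id, *coords])  # cluster id, then coordinates
        return buf.getvalue()
    if fmt == "table":
        return "\n".join(
            f"{cluster_id:>7}  " + "  ".join(f"{v:10.4f}" for v in coords)
            for cluster_id, coords in enumerate(centroids)
        )
    raise ValueError(f"unsupported format: {fmt}")
```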

evaluate

Evaluate clustering quality with silhouette score, inertia, and Davies-Bouldin index.

Environment Variables:

  • RUN_ID (required) — ID of the clustering run to evaluate
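
The three metrics have standard definitions, sketched below with the standard library only. Inertia is the sum of squared distances from each point to its assigned centroid (lower is tighter); the silhouette score averages (b - a) / max(a, b) over all points, where a is the mean distance to the point's own cluster and b the mean distance to the nearest other cluster (closer to 1 is better); the Davies-Bouldin index averages each cluster's worst scatter-to-separation ratio (lower is better). The function names are hypothetical, and whether the tool computes the metrics exactly this way is an assumption.

```python
import math


def inertia(points, centroids, labels):
    """Sum of squared Euclidean distances to the assigned centroid."""
    return sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters score 0 by convention."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # The point's distance to itself is 0, so divide by len(own) - 1.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for m, other in clusters.items()
            if m != l
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


def davies_bouldin(points, centroids, labels):
    """Mean over clusters of the worst (S_i + S_j) / d(c_i, c_j) ratio."""
    ids = sorted(set(labels))
    scatter = {}
    for i in ids:
        members = [p for p, l in zip(points, labels) if l == i]
        scatter[i] = sum(math.dist(p, centroids[i]) for p in members) / len(members)
    return sum(
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in ids
            if j != i
        )
        for i in ids
    ) / len(ids)
```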

visualize

Generate a text-based (ASCII) visualization of cluster assignments.

Environment Variables:

  • RUN_ID (required) — ID of the clustering run
  • DIMS — Dimensions to plot, comma-separated (default: first two)
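
A text plot of this kind can be built by scaling two chosen dimensions onto a character grid and writing each point's cluster ID into its cell. The sketch below is one way to do that; the name `ascii_scatter`, the grid size, and the use of a single digit per cluster are assumptions, not the tool's actual rendering.

```python
def ascii_scatter(points, labels, dims=(0, 1), width=40, height=12):
    """Plot two dimensions of the points on a width x height character grid."""
    xs = [p[dims[0]] for p in points]
    ys = [p[dims[1]] for p in points]
    # Fall back to a span of 1.0 when all values on an axis are identical.
    x_min, x_span = min(xs), (max(xs) - min(xs)) or 1.0
    y_min, y_span = min(ys), (max(ys) - min(ys)) or 1.0
    grid = [[" "] * width for _ in range(height)]
    for x, y, label in zip(xs, ys, labels):
        col = round((x - x_min) / x_span * (width - 1))
        # Row 0 is the top of the plot, so flip the y axis.
        row = (height - 1) - round((y - y_min) / y_span * (height - 1))
        grid[row][col] = str(label % 10)  # later points overwrite earlier ones
    return "\n".join("".join(row) for row in grid)


# Made-up points and labels, for illustration only.
print(ascii_scatter([(1.0, 1.1), (0.9, 1.0), (5.0, 5.2), (5.1, 4.9)], [0, 0, 1, 1]))
```

Passing `dims=(0, 2)` would plot the first and third columns instead, which is roughly what a comma-separated DIMS value expresses.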

export

Export clustering results to a file.

Environment Variables:

  • RUN_ID (required) — ID of the run to export
  • OUTPUT — Output file path (default: stdout)
  • FORMAT — Export format: json, csv, jsonl (default: json)

import

Import a previously exported clustering run.

Environment Variables:

  • INPUT (required) — Path to the file to import

config

View or update configuration settings.

Environment Variables:

  • KEY — Configuration key to set
  • VALUE — Configuration value

list

List all stored clustering runs with summary info.

Environment Variables:

  • LIMIT — Maximum runs to display (default: 20)
  • SORT — Sort field: date, k, score (default: date)
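
The LIMIT and SORT options amount to sorting the stored records and truncating the result. The sketch below assumes descending order (newest, largest k, or best score first) and assumes that "score" means the silhouette value in `metrics`; neither is stated above, and `list_runs` is a hypothetical name.

```python
def list_runs(runs, limit=20, sort="date"):
    """Sort run records by the chosen field, descending, and keep the first `limit`."""
    sort_keys = {
        # ISO 8601 timestamps in one consistent format sort correctly as strings.
        "date": lambda r: r.get("timestamp", ""),
        "k": lambda r: r.get("k", 0),
        # Assumption: "score" refers to the silhouette metric.
        "score": lambda r: r.get("metrics", {}).get("silhouette", float("-inf")),
    }
    if sort not in sort_keys:
        raise ValueError(f"unknown sort field: {sort}")
    return sorted(runs, key=sort_keys[sort], reverse=True)[:limit]
```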

stats

Show aggregate statistics across all clustering runs.

help

Display usage information and available commands.

version

Display the current version of the cluster tool.

Data Storage

All clustering runs are stored in ~/.cluster/data.jsonl. Each line is a JSON object with fields:

  • id — Unique run identifier
  • timestamp — ISO 8601 creation time
  • algorithm — Algorithm used
  • k — Number of clusters
  • centroids — List of centroid coordinates
  • assignments — Mapping of data point indices to cluster IDs
  • metrics — Evaluation metrics (silhouette, inertia, etc.)
  • input_file — Source data file path
  • num_points — Number of data points clustered
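
For orientation, the sketch below writes one record with these fields to a JSONL file and reads it back. All values in `example_run` are made up, the helper names `append_run` and `load_runs` are hypothetical, and representing `assignments` as an object keyed by string indices is an assumption (JSON object keys are always strings). It writes to a temporary directory rather than to ~/.cluster/data.jsonl.

```python
import json
import tempfile
from pathlib import Path


def append_run(path, record):
    """Append one run as a single JSON line, creating the directory if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def load_runs(path):
    """Read all runs from a JSONL file; a missing file yields an empty list."""
    path = Path(path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Illustrative record only; every value here is invented.
example_run = {
    "id": "abc123",
    "timestamp": "2026-01-01T00:00:00Z",
    "algorithm": "kmeans",
    "k": 2,
    "centroids": [[0.0, 0.5], [10.0, 0.5]],
    "assignments": {"0": 0, "1": 0, "2": 1, "3": 1},
    "metrics": {"silhouette": 0.8, "inertia": 1.0},
    "input_file": "/path/to/data.csv",
    "num_points": 4,
}

with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "data.jsonl"
    append_run(store, example_run)
    loaded = load_runs(store)
    missing = load_runs(Path(tmp) / "absent.jsonl")
```

Because each run is one line, new runs can be appended without rewriting the file, and a partially written last line does not corrupt earlier records.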

Configuration

Config is stored in ~/.cluster/config.json. Available keys:

  • default_k — Default number of clusters (default: 3)
  • default_algorithm — Default algorithm (default: kmeans)
  • max_iterations — Default max iterations (default: 100)
  • random_seed — Default random seed (default: 42)
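
A common way to handle such a file is to start from the documented defaults and overlay whatever the file contains. The sketch below does that; `load_config` and `resolve_k` are hypothetical names, and the precedence shown (the K environment variable overrides `default_k`) is an assumption about how the tool combines the two.

```python
import json
from pathlib import Path

# Defaults as listed in the Configuration section.
DEFAULTS = {
    "default_k": 3,
    "default_algorithm": "kmeans",
    "max_iterations": 100,
    "random_seed": 42,
}


def load_config(path):
    """Return the defaults overlaid with any keys found in the config file."""
    config = dict(DEFAULTS)
    path = Path(path)
    if path.exists():
        config.update(json.loads(path.read_text(encoding="utf-8")))
    return config


def resolve_k(env, config):
    """Assumed precedence: the K environment variable wins over default_k."""
    return int(env.get("K") or config["default_k"])
```

For example, `resolve_k(os.environ, load_config(Path.home() / ".cluster" / "config.json"))` would give the effective cluster count under these assumptions (with `import os` added).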

Powered by BytesAgain | bytesagain.com | hello@bytesagain.com

Version History

1 version in total

  • v1.0.0 (current)
    2026-05-02 01:27 | Security: safe

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
