
customer-segment-eng

Analyze uploaded bank customer data to segment and profile customers by assets, transactions, and behavior, outputting clusters, statistics, and visual charts.
Author: yukirang | Source: clawhub | Version: v1.0.0 (latest) | API key: not required

Overview

Customer Segmentation Skill

Financial customer segmentation analysis: Stratify customers based on assets, transaction behaviors, activity levels, and other dimensions, outputting actionable segmentation results and visualizations.

Workflow

Step 1 — Data Loading and Cleaning

Read user-uploaded CSV or Excel files, automatically identifying column names.

Priority fields to retain:

  • customer_id / 客户ID — Unique customer identifier
  • age / 年龄
  • gender / 性别
  • balance / 资产余额
  • txn_amount / 交易金额
  • txn_count / 交易次数
  • last_date / 最近交易日期
  • product_count / 持有产品数
  • branch / 网点

Missing value handling:

  • Numeric: Fill with median
  • Categorical: Fill with mode
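  • Columns with >30% missing: Drop the column and notify the user

import pandas as pd

# file_path: path to the uploaded file (CSV or Excel)
if file_path.lower().endswith(('.xlsx', '.xls')):
    df = pd.read_excel(file_path)
else:
    df = pd.read_csv(file_path)
df.columns = df.columns.str.strip().str.lower()  # normalize header whitespace and case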
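
The missing-value rules listed above could be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the `clean_customers` helper, the `COLUMN_ALIASES` map, and the `max_missing` parameter are hypothetical names introduced here, and the alias map simply mirrors the Chinese headers from the field list.

```python
import pandas as pd

# Hypothetical alias map: Chinese headers from the field list -> canonical names.
# Keys are lowercase because headers are lowercased before renaming.
COLUMN_ALIASES = {
    "客户id": "customer_id", "年龄": "age", "性别": "gender",
    "资产余额": "balance", "交易金额": "txn_amount", "交易次数": "txn_count",
    "最近交易日期": "last_date", "持有产品数": "product_count", "网点": "branch",
}


def clean_customers(df, max_missing=0.30):
    """Normalize headers, drop sparse columns, and impute what remains."""
    df = df.copy()
    df.columns = df.columns.str.strip().str.lower()
    df = df.rename(columns=COLUMN_ALIASES)

    # Drop columns whose share of missing values exceeds the threshold;
    # the dropped names are returned so the caller can notify the user.
    missing_share = df.isna().mean()
    dropped = missing_share[missing_share > max_missing].index.tolist()
    df = df.drop(columns=dropped)

    for col in df.columns:
        if not df[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].median())  # numeric: median
        else:
            mode = df[col].mode(dropna=True)
            if not mode.empty:
                df[col] = df[col].fillna(mode.iloc[0])  # categorical: mode
    return df, dropped
```

Returning the dropped column names alongside the cleaned frame keeps the "notify user" step separate from the cleaning logic.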

Step 2 — Feature Engineering

Build RFM + extended features:

Feature        Description
-------------  ----------------------------------------------------------------------
Recency        Days since last transaction (smaller = more active)
Frequency      Transaction frequency (number of transactions in the specified period)
Monetary       Transaction amount (total amount in the specified period)
Tenure         Customer duration (months)
Product_Depth  Number of products held
Age            Customer age

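Data standardization: Standardize all numeric features with StandardScaler (z-score) so that features on large scales, such as balance, do not dominate the distance calculations used by K-Means.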
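
As an illustration of the feature table and the standardization step, the sketch below derives the features from one-row-per-customer data. It assumes the canonical column names from Step 1; `build_features` and its `as_of` reference date are hypothetical, and Tenure is left out because the field list contains no account-opening date to compute it from.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def build_features(df, as_of):
    """Derive RFM-style features; `as_of` is the date recency is measured from."""
    out = pd.DataFrame(index=df["customer_id"])
    last = pd.to_datetime(df["last_date"])
    out["recency"] = (pd.Timestamp(as_of) - last).dt.days.to_numpy()
    out["frequency"] = df["txn_count"].to_numpy()
    out["monetary"] = df["txn_amount"].to_numpy()
    out["product_depth"] = df["product_count"].to_numpy()
    out["age"] = df["age"].to_numpy()
    return out


# Made-up sample rows, for illustration only.
customers = pd.DataFrame({
    "customer_id": ["C1", "C2", "C3"],
    "last_date": ["2024-06-01", "2024-01-01", "2024-06-30"],
    "txn_count": [12, 2, 30],
    "txn_amount": [5000.0, 300.0, 42000.0],
    "product_count": [2, 1, 4],
    "age": [34, 58, 41],
})
features = build_features(customers, as_of="2024-07-01")

# Z-score standardization: each column ends up with mean 0 and unit variance.
X_scaled = StandardScaler().fit_transform(features)
```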

Step 3 — Clustering Analysis

Use the K-Means algorithm and determine K automatically with the elbow method (the point where the SSE curve bends).

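import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# features: the numeric feature matrix built in Step 2
scaler = StandardScaler()
X_scaled = scaler.fit_transform(features)

# Elbow method: record the SSE (inertia) for each candidate K
sse = {}
for k in range(2, 10):
    km = KMeans(n_clusters=k, random_state=42, n_init=10)
    km.fit(X_scaled)
    sse[k] = km.inertia_

# SSE keeps falling as K grows, so taking the minimum would always return the largest K.
# Pick the K where the curve bends most (largest second difference) as a simple elbow heuristic.
ks = sorted(sse)
curvature = np.diff([sse[k] for k in ks], n=2)
optimal_k = ks[int(np.argmax(curvature)) + 1]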

Alternatively, fix K=5 when the business needs five value tiers (high / medium-high / medium / medium-low / low value customers).
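
Once K is chosen, the final model still has to be fitted and each customer assigned a cluster. The sketch below shows one way to do that and to compute the per-cluster feature means; `assign_clusters` is a hypothetical helper, and the demo data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def assign_clusters(features, k, seed=42):
    """Fit K-Means on standardized features; return labeled rows and cluster means."""
    X = StandardScaler().fit_transform(features)
    labels = KMeans(n_clusters=k, random_state=seed, n_init=10).fit_predict(X)
    result = features.copy()
    result["cluster"] = labels
    # Means are computed on the original (unscaled) values so they stay readable.
    cluster_means = result.groupby("cluster").mean()
    return result, cluster_means


# Synthetic demo: two clearly separated groups of 40 customers each.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "balance": np.concatenate([rng.normal(50, 5, 40), rng.normal(500, 20, 40)]),
    "txn_count": np.concatenate([rng.normal(3, 1, 40), rng.normal(25, 3, 40)]),
})
labeled, cluster_means = assign_clusters(demo, k=2)
```

Note that K-Means cluster numbers are arbitrary, so business labels should be attached after inspecting the means rather than by cluster index.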

Step 4 — Segment Profiling

Output core statistics for each cluster:

Cluster 0 (High-Value Customers): Avg. assets 850k, Avg. transaction frequency 28/month, Gender distribution 62% male
Cluster 1 (Potential Customers): Avg. assets 320k, noticeably younger age profile
...
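
Statistics of this kind could be computed with a grouped aggregation. The following is a sketch under assumptions: the frame already carries a `cluster` column, gender is encoded as "M"/"F", and `profile_clusters` is a hypothetical helper name. The sample rows are made up for illustration.

```python
import pandas as pd


def profile_clusters(df):
    """Summarize size, averages, and gender mix for each cluster."""
    grouped = df.groupby("cluster")
    summary = grouped.agg(
        customers=("customer_id", "count"),
        avg_balance=("balance", "mean"),
        avg_txn_count=("txn_count", "mean"),
        avg_age=("age", "mean"),
    )
    summary["share"] = summary["customers"] / len(df)
    # Assumes gender is encoded as "M"/"F"; adjust to the actual encoding.
    summary["male_share"] = grouped["gender"].apply(lambda s: (s == "M").mean())
    return summary


sample = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "cluster": [0, 0, 1, 1, 1],
    "balance": [800.0, 900.0, 300.0, 320.0, 340.0],
    "txn_count": [30, 26, 8, 10, 12],
    "age": [50, 46, 27, 29, 31],
    "gender": ["M", "F", "M", "M", "F"],
})
summary = profile_clusters(sample)
```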

Recommended label system (five categories):

  • 🌟 High-Value Customers (VIP)
  • ⬆️ Potential Customers
  • 🟢 Stable Customers
  • 🔄 Active Transaction Customers
  • ⚠️ Dormant/Churn Warning Customers

Step 5 — Visualization

Generate the following charts (saved as PNG):

  1. Customer Asset Distribution Histogram — Asset distribution comparison across levels
  2. Radar Chart — Feature comparison across segments
  3. Heatmap — Cluster feature mean matrix
  4. Scatter Plot — Customer distribution with assets × transaction frequency as coordinates
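
Example for charts 1 and 3 (histogram and heatmap), assuming df carries a 'cluster' column and cluster_means holds the per-cluster feature means:

import matplotlib
matplotlib.use('Agg')  # non-interactive backend for saving PNG files without a display
import matplotlib.pyplot as plt
import seaborn as sns

# Fonts with CJK glyphs, so Chinese labels in the data render correctly
plt.rcParams['font.sans-serif'] = ['WenQuanYi Micro Hei', 'SimHei']

clusters = sorted(df['cluster'].unique())
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Asset distribution: one histogram series per cluster
axes[0].hist([df.loc[df['cluster'] == c, 'balance'] for c in clusters],
             bins=30, label=[f'C{c}' for c in clusters])
axes[0].set_title('Customer Balance Distribution by Cluster')
axes[0].legend()

# Heatmap of per-cluster feature means (features as rows, clusters as columns)
sns.heatmap(cluster_means.T, annot=True, fmt='.1f', ax=axes[1])
axes[1].set_title('Cluster Feature Heatmap')

plt.tight_layout()
plt.savefig(output_path, dpi=150)  # e.g. segmentation_charts.png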
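
The radar chart from the list above is not covered by that snippet. One possible sketch uses a matplotlib polar axis; `radar_chart` is a hypothetical helper, the min-max scaling per feature is a design choice made here so that all axes share a 0 to 1 range, and the demo means are made-up numbers.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def radar_chart(cluster_means, path):
    """Draw one closed polygon per cluster over min-max scaled feature means."""
    span = (cluster_means.max() - cluster_means.min()).replace(0, 1)
    scaled = (cluster_means - cluster_means.min()) / span

    n_features = scaled.shape[1]
    angles = np.linspace(0, 2 * np.pi, n_features, endpoint=False).tolist()
    angles += angles[:1]  # repeat the first angle to close each polygon

    fig, ax = plt.subplots(subplot_kw={"projection": "polar"}, figsize=(6, 6))
    for cluster, row in scaled.iterrows():
        values = row.tolist()
        values += values[:1]
        ax.plot(angles, values, label=f"C{cluster}")
        ax.fill(angles, values, alpha=0.15)
    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(list(scaled.columns))
    ax.legend(loc="upper right")
    fig.savefig(path, dpi=150)
    plt.close(fig)
    return scaled


# Made-up cluster means, for illustration only.
demo_means = pd.DataFrame(
    {"balance": [850.0, 320.0, 90.0],
     "txn_count": [28.0, 10.0, 2.0],
     "age": [45.0, 29.0, 51.0]},
    index=[0, 1, 2],
)
```

Calling `radar_chart(demo_means, "radar.png")` would write the chart and return the scaled values that were plotted.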

Step 6 — Output Results

Output content:

  1. Segmentation result table (including customer ID, cluster, segmentation label) → segmentation_results.csv
  2. Cluster feature statistics → cluster_summary.csv
  3. Visualization charts → segmentation_charts.png
  4. Analysis summary (Markdown format) → segmentation_report.md
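
Writing the tabular and Markdown outputs could look like the sketch below. The file names come from the list above; `write_outputs`, the `label` column name, and the report layout are assumptions made for illustration.

```python
from pathlib import Path

import pandas as pd


def write_outputs(results, cluster_summary, out_dir):
    """Write the result table, cluster summary, and a short Markdown report."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Assumes `results` has customer_id, cluster, and a `label` column.
    results[["customer_id", "cluster", "label"]].to_csv(
        out / "segmentation_results.csv", index=False
    )
    cluster_summary.to_csv(out / "cluster_summary.csv")

    lines = ["# Customer Segmentation Report", "",
             f"Customers analyzed: {len(results)}", ""]
    for cluster, row in cluster_summary.iterrows():
        stats = ", ".join(f"{name}: {value:.1f}" for name, value in row.items())
        lines.append(f"- Cluster {cluster}: {stats}")
    report = "\n".join(lines) + "\n"
    (out / "segmentation_report.md").write_text(report, encoding="utf-8")
    return report
```

The chart file (segmentation_charts.png) is produced by the plotting step, so it is not written here.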

For detailed clustering and parameter documentation:

  • RFM model explanation: Refer to references/rfm-guide.md
  • Clustering parameter explanation: Refer to references/clustering-guide.md

Version history

1 version in total

  • v1.0.0 (current)
    2026-05-07 14:52, security status: safe

Security scan

  • Tencent Cloud Security (Keen): safe, no risk
  • Tencent Cloud Security (Sanbu): safe, no risk
