← 返回
未分类 中文

contribution-analysis

Calculate the relative contribution of different factors to a response variable using R² decomposition. Use when you need to quantify how much each factor ex...
使用R²分解计算各因素对响应变量的相对贡献。用于量化各因素的贡献程度时使用。
wu-uk wu-uk 来源
未分类 clawhub v0.1.0 1 版本 100000 Key: 无需
★ 0
Stars
📥 301
下载
💾 0
安装
1
版本
#latest

概述

Contribution Analysis Guide

Overview

Contribution analysis quantifies how much each factor contributes to explaining the variance of a response variable. This skill focuses on R² decomposition method.

Complete Workflow

When you have multiple correlated variables that belong to different categories:

import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from factor_analyzer import FactorAnalyzer

# Step 1: Combine ALL variables into one matrix
pca_vars = ['Var1', 'Var2', 'Var3', 'Var4', 'Var5', 'Var6', 'Var7', 'Var8']
X = df[pca_vars].values
y = df['ResponseVariable'].values

# Step 2: Standardize
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Step 3: Run ONE global PCA on all variables together
fa = FactorAnalyzer(n_factors=4, rotation='varimax')
fa.fit(X_scaled)
scores = fa.transform(X_scaled)

# Step 4: R² decomposition on factor scores
def calc_r2(X, y):
    model = LinearRegression()
    model.fit(X, y)
    y_pred = model.predict(X)
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - (ss_res / ss_tot)

full_r2 = calc_r2(scores, y)

# Step 5: Calculate contribution of each factor
contrib_0 = full_r2 - calc_r2(scores[:, [1, 2, 3]], y)
contrib_1 = full_r2 - calc_r2(scores[:, [0, 2, 3]], y)
contrib_2 = full_r2 - calc_r2(scores[:, [0, 1, 3]], y)
contrib_3 = full_r2 - calc_r2(scores[:, [0, 1, 2]], y)

R² Decomposition Method

The contribution of each factor is calculated by comparing the full model R² with the R² when that factor is removed:

Contribution_i = R²_full - R²_without_i

Output Format

contributions = {
    'Category1': contrib_0 * 100,
    'Category2': contrib_1 * 100,
    'Category3': contrib_2 * 100,
    'Category4': contrib_3 * 100
}

dominant = max(contributions, key=contributions.get)
dominant_pct = round(contributions[dominant])

with open('output.csv', 'w') as f:
    f.write('variable,contribution\n')
    f.write(f'{dominant},{dominant_pct}\n')

Common Issues

IssueCauseSolution
------------------------
Negative contributionSuppressor effectCheck for multicollinearity
Contributions don't sum to R²Normal behaviorR² decomposition is approximate
Very small contributionsFactor not importantMay be negligible driver

Best Practices

  • Run ONE global PCA on all variables together, not separate PCA per category
  • Use factor_analyzer with varimax rotation
  • Map factors to category names based on loadings interpretation
  • Report contribution as percentage
  • Identify the dominant (largest) factor

版本历史

共 1 个版本

  • v0.1.0 当前
    2026-05-07 21:38 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

pdf

wu-uk
全面PDF工具,支持文本/表格提取、新PDF创建、合并/拆分文档、表单处理。当Claude需要...
★ 0 📥 517

xlsx

wu-uk
全面的电子表格创建、编辑与分析,支持公式、格式化、数据分析和可视化。当Claude需要工作时...
★ 0 📥 527
office-efficiency

modora

wu-uk
使用此技能,可通过远程 MoDora HTTP 服务分析 PDF,凭据通过声明的环境变量管理,不会在服务器上存储。
★ 0 📥 511