
Volcano Plot Labeler

Automatically label the most significant genes in volcano plots using a repulsion algorithm.
ec-cyber258
Data Analysis clawhub v0.1.0 1 version 100000 Key: not required

Overview

Volcano Plot Labeler (ID: 148)

Automatically identify and label the Top 10 most significant genes in volcano plots using a repulsion algorithm to prevent label overlap.

Features

  • Smart Gene Selection: Automatically identifies the top 10 most significant genes based on p-value and fold change
  • Repulsion Algorithm: Uses force-directed positioning to prevent text label overlap
  • Customizable: Configurable thresholds, label styling, and positioning options
  • Multiple Output Formats: PNG, PDF, SVG support

Installation

pip install pandas matplotlib numpy scipy

Usage

Basic Usage

from volcano_plot_labeler import label_volcano_plot
import pandas as pd

# Load your data
df = pd.read_csv('differential_expression_results.csv')

# Generate labeled volcano plot
fig = label_volcano_plot(
    df,
    log2fc_col='log2FoldChange',
    pvalue_col='padj',
    gene_col='gene_name',
    top_n=10
)
fig.savefig('volcano_plot_labeled.png', dpi=300, bbox_inches='tight')

Advanced Usage

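import pandas as pd
from volcano_plot_labeler import label_volcano_plot

# Load the differential expression results (same file as in Basic Usage)
df = pd.read_csv('differential_expression_results.csv')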

fig = label_volcano_plot(
    df,
    log2fc_col='log2FoldChange',
    pvalue_col='padj',
    gene_col='gene_name',
    top_n=10,
    pvalue_threshold=0.05,
    log2fc_threshold=1.0,
    figsize=(12, 10),
    repulsion_iterations=100,
    repulsion_force=0.05,
    label_fontsize=10,
    label_color='black',
    arrow_color='gray',
    save_path='output.png'
)

Command Line Usage

python scripts/main.py \
    --input data/deseq2_results.csv \
    --output volcano_labeled.png \
    --log2fc-col log2FoldChange \
    --pvalue-col padj \
    --gene-col gene_name \
    --top-n 10

Input Format

Expected CSV/TSV columns:

  • log2FoldChange: Log2 fold change values
  • padj or pvalue: Adjusted p-values or raw p-values
  • gene_name: Gene identifiers
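
As a rough illustration of the expected input, the sketch below reads a small in-memory CSV and checks that the columns listed above are present. The helper name `load_results` and the example gene names are hypothetical and not part of the package; the column names are the defaults documented here.

```python
import io
import pandas as pd

REQUIRED = ("log2FoldChange", "gene_name")   # column names from the list above
PVALUE_CANDIDATES = ("padj", "pvalue")       # either one is accepted

def load_results(source, sep=","):
    """Hypothetical helper: read a CSV/TSV and check the expected columns exist."""
    df = pd.read_csv(source, sep=sep)
    missing = [c for c in REQUIRED if c not in df.columns]
    if missing:
        raise ValueError(f"missing required column(s): {missing}")
    # Prefer adjusted p-values when both columns are present (an assumption)
    pvalue_col = next((c for c in PVALUE_CANDIDATES if c in df.columns), None)
    if pvalue_col is None:
        raise ValueError("need either a 'padj' or a 'pvalue' column")
    return df, pvalue_col

# Made-up example rows, for illustration only
example = io.StringIO(
    "gene_name,log2FoldChange,padj\n"
    "GENE_A,2.5,0.0001\n"
    "GENE_B,-1.8,0.003\n"
    "GENE_C,0.2,0.74\n"
)
df, pvalue_col = load_results(example)
```

For a TSV file, the same helper would be called with `sep="\t"`.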

Algorithm

Significance Calculation

  1. Calculate -log10(pvalue) for all genes
  2. Rank genes by combined score: |log2FC| * -log10(pvalue)
  3. Select the top N genes with the highest combined score
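
The three steps above can be sketched with pandas and NumPy as follows. This is a minimal illustration, not the package's actual code: the function name `select_top_genes` is hypothetical, and clipping p-values to avoid an infinite score when p is 0 is an assumption the description does not mention.

```python
import numpy as np
import pandas as pd

def select_top_genes(df, log2fc_col="log2FoldChange", pvalue_col="padj", top_n=10):
    """Hypothetical helper: rank genes by |log2FC| * -log10(p) and keep the top N."""
    scored = df.dropna(subset=[log2fc_col, pvalue_col]).copy()
    # Assumption: clip p-values at the smallest positive float so p == 0 does not give inf
    p = scored[pvalue_col].clip(lower=np.finfo(float).tiny)
    scored["neg_log10_p"] = -np.log10(p)                                  # step 1
    scored["score"] = scored[log2fc_col].abs() * scored["neg_log10_p"]    # step 2
    return scored.nlargest(top_n, "score")                                # step 3

# Made-up values, for illustration only
demo = pd.DataFrame({
    "gene_name": ["A", "B", "C", "D"],
    "log2FoldChange": [3.0, -2.0, 0.1, 4.0],
    "padj": [1e-10, 1e-20, 1e-50, 0.5],
})
top = select_top_genes(demo, top_n=2)
```

Because the score multiplies effect size by significance, a gene with a tiny p-value but negligible fold change (gene C here) ranks below genes that are strong on both axes.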

Repulsion Algorithm

  1. Initial Placement: Place labels at gene coordinates
  2. Force Calculation:
    • Repulsive force between overlapping labels
    • Spring force pulling label toward its gene point
    • Boundary forces to keep labels within plot area
  3. Iterative Optimization: Update positions for up to N iterations, or until convergence
  4. Arrow Drawing: Draw connecting lines from labels to gene points
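
The force scheme above can be sketched in NumPy as below. This is a simplified illustration under stated assumptions, not the package's implementation: labels are treated as equal-sized boxes in data coordinates, the name `repel_labels` and the `box`, `bounds`, and `spring` arguments are hypothetical, and the loop stops as soon as no two boxes overlap. Arrow drawing (step 4) is left out.

```python
import numpy as np

def repel_labels(anchors, box=(1.0, 0.4), bounds=(-10.0, 10.0, 0.0, 50.0),
                 iterations=100, repulsion_force=0.05, spring=0.01):
    """Hypothetical sketch: push overlapping label boxes apart, pull them toward their points."""
    anchors = np.asarray(anchors, dtype=float)
    pos = anchors.copy()                      # 1. initial placement at the gene coordinates
    w, h = box
    xmin, xmax, ymin, ymax = bounds
    n = len(pos)
    for _ in range(iterations):               # 3. iterate up to the iteration limit
        shift = np.zeros_like(pos)
        overlapping = False
        for i in range(n):
            for j in range(i + 1, n):
                d = pos[i] - pos[j]
                if abs(d[0]) < w and abs(d[1]) < h:      # boxes overlap
                    norm = np.hypot(d[0], d[1])
                    # Coincident labels have no direction; fall back to vertical
                    direction = d / norm if norm > 1e-9 else np.array([0.0, 1.0])
                    shift[i] += repulsion_force * direction   # 2a. repulsive force
                    shift[j] -= repulsion_force * direction
                    overlapping = True
        if not overlapping:
            break                              # converged: nothing left to separate
        shift += spring * (anchors - pos)      # 2b. spring force toward the gene point
        pos += shift
        pos[:, 0] = np.clip(pos[:, 0], xmin, xmax)   # 2c. keep labels inside the plot area
        pos[:, 1] = np.clip(pos[:, 1], ymin, ymax)
    return pos

# Two nearly coincident points: their labels start out overlapping
labels = repel_labels([(0.0, 5.0), (0.1, 5.0)])
```

The pairwise loop is quadratic in the number of labels, which is acceptable for roughly ten labels but would need a spatial index for hundreds.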

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| df | DataFrame | - | Input data |
| log2fc_col | str | 'log2FoldChange' | Column name for log2 fold change |
| pvalue_col | str | 'padj' | Column name for p-value |
| gene_col | str | 'gene_name' | Column name for gene names |
| top_n | int | 10 | Number of top genes to label |
| pvalue_threshold | float | 0.05 | P-value cutoff for coloring |
| log2fc_threshold | float | 1.0 | Log2FC cutoff for coloring |
| repulsion_iterations | int | 100 | Iterations for repulsion algorithm |
| repulsion_force | float | 0.05 | Strength of repulsion force |
| label_fontsize | int | 10 | Font size for labels |
| figsize | tuple | (10, 10) | Figure size |

Output

  • Labeled volcano plot with:
    • Color-coded points (up/down/not significant)
    • Top 10 gene labels with leader lines
    • No overlapping text labels
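
One plausible way to derive the three point categories from `pvalue_threshold` and `log2fc_threshold` is sketched below. The function name `classify_points`, the category strings, and the choice of strict versus inclusive comparisons are assumptions; the package may draw the boundaries differently.

```python
import numpy as np
import pandas as pd

def classify_points(df, log2fc_col="log2FoldChange", pvalue_col="padj",
                    pvalue_threshold=0.05, log2fc_threshold=1.0):
    """Hypothetical helper: tag each gene as 'up', 'down', or 'not significant'."""
    significant = (df[pvalue_col] < pvalue_threshold).to_numpy()
    up = significant & (df[log2fc_col] >= log2fc_threshold).to_numpy()
    down = significant & (df[log2fc_col] <= -log2fc_threshold).to_numpy()
    # Rows matching neither condition (including NaN p-values) fall through to the default
    labels = np.select([up, down], ["up", "down"], default="not significant")
    return pd.Series(labels, index=df.index, name="regulation")

# Made-up values, for illustration only
demo = pd.DataFrame({
    "log2FoldChange": [2.0, -2.0, 2.0, 0.5],
    "padj": [0.01, 0.01, 0.5, 0.001],
})
regulation = classify_points(demo)
```

The resulting series can then be mapped to colors when drawing the scatter points.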

License

MIT

Risk Assessment

| Risk Indicator | Assessment | Level |
| --- | --- | --- |
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |

Security Checklist

  • [ ] No hardcoded credentials or API keys
  • [ ] No unauthorized file system access (../)
  • [ ] Output does not expose sensitive information
  • [ ] Prompt injection protections in place
  • [ ] Input file paths validated (no ../ traversal)
  • [ ] Output directory restricted to workspace
  • [ ] Script execution in sandboxed environment
  • [ ] Error messages sanitized (no stack traces exposed)
  • [ ] Dependencies audited

Prerequisites

# Python dependencies
pip install -r requirements.txt

Evaluation Criteria

Success Metrics

  • [ ] Successfully executes main functionality
  • [ ] Output meets quality standards
  • [ ] Handles edge cases gracefully
  • [ ] Performance is acceptable

Test Cases

  1. Basic Functionality: Standard input → Expected output
  2. Edge Case: Invalid input → Graceful error handling
  3. Performance: Large dataset → Acceptable processing time

Lifecycle Status

  • Current Stage: Draft
  • Next Review Date: 2026-03-06
  • Known Issues: None
  • Planned Improvements:
    • Performance optimization
    • Additional feature support

Version History

1 version in total

  • v0.1.0 (current)
    2026-03-30 06:07 Safe Safe

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
