FASTQC Report Interpreter

Use when analyzing FASTQC quality reports from sequencing data, identifying quality issues in NGS datasets, or troubleshooting sequencing problems. Interpret...
Source: aipoch-ai
Uncategorized | clawhub | v0.1.0 | 1 version | 100000 | Key: not required

Overview

FASTQC Report Interpreter

Analyze FASTQC quality control reports for Next-Generation Sequencing (NGS) data to assess data quality and identify issues.

Quick Start

from scripts.fastqc_interpreter import FASTQCInterpreter

interpreter = FASTQCInterpreter()

# Analyze report
analysis = interpreter.analyze("sample_fastqc.html")
print(f"Overall Quality: {analysis.quality_status}")
print(f"Issues Found: {analysis.issues}")

Core Capabilities

1. Quality Metrics Analysis

metrics = interpreter.parse_metrics("fastqc_data.txt")
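
The internals of `parse_metrics` are not shown here. As a rough sketch of what such parsing could involve, the snippet below reads per-module statuses from the text of a `fastqc_data.txt` file, assuming the usual FastQC layout in which each module opens with a `>>Module name<TAB>status` line and closes with `>>END_MODULE`. The function name `parse_module_statuses` and the sample text are hypothetical and not part of this skill.

```python
from typing import Dict


def parse_module_statuses(text: str) -> Dict[str, str]:
    """Map each module name to the status on its header line (hypothetical helper)."""
    statuses: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.rstrip()
        # Module headers look like ">>Adapter Content\tfail";
        # ">>END_MODULE" closes a module and carries no status.
        if line.startswith(">>") and line != ">>END_MODULE":
            name, _, status = line[2:].partition("\t")
            statuses[name.strip()] = status.strip()
    return statuses


# Tiny made-up excerpt, for illustration only.
sample = (
    "##FastQC\t0.11.9\n"
    ">>Basic Statistics\tpass\n"
    "#Measure\tValue\n"
    "Total Sequences\t1000\n"
    ">>END_MODULE\n"
    ">>Adapter Content\tfail\n"
    ">>END_MODULE\n"
)
print(parse_module_statuses(sample))
```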

Key Metrics:

| Metric | Good | Warning | Fail |
|---|---|---|---|
| Per base sequence quality | Q > 28 | Q 20-28 | Q < 20 |
| Per sequence quality scores | Peak at Q30 | Peak Q20-30 | Peak < Q20 |
| Per base N content | < 5% | 5-20% | > 20% |
| Sequence duplication | < 20% | 20-50% | > 50% |
| Adapter content | < 5% | 5-10% | > 10% |
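
The cut-offs in the table can be applied mechanically. The following is a minimal sketch, not the skill's own implementation: the function and constant names are hypothetical, and the treatment of values that sit exactly on a boundary (for example, exactly 5%) is an assumption, since the table does not specify it.

```python
# Cut-offs copied from the Key Metrics table, in percent:
# (upper bound of "Good", upper bound of "Warning").
PERCENT_CUTOFFS = {
    "Per base N content": (5.0, 20.0),
    "Sequence duplication": (20.0, 50.0),
    "Adapter content": (5.0, 10.0),
}


def classify_percent_metric(metric: str, percent: float) -> str:
    """Classify a percentage metric as PASS, WARNING, or FAIL."""
    good_below, fail_above = PERCENT_CUTOFFS[metric]
    if percent < good_below:
        return "PASS"
    if percent <= fail_above:  # boundary handling is an assumption
        return "WARNING"
    return "FAIL"


def classify_base_quality(phred: float) -> str:
    """Classify a per-base Phred score using the Q28 / Q20 cut-offs."""
    if phred > 28:
        return "PASS"
    if phred >= 20:
        return "WARNING"
    return "FAIL"


print(classify_percent_metric("Adapter content", 7.0))  # WARNING
print(classify_base_quality(31))  # PASS
```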

2. Issue Diagnosis

issues = interpreter.diagnose_issues(metrics)
for issue in issues:
    print(f"{issue.severity}: {issue.description}")
    print(f"Recommendation: {issue.recommendation}")

Common Issues:

Low Quality at Read Ends

  • Cause: Phasing effects, reagent depletion
  • Solution: Trim last 10-20 bases

Adapter Contamination

  • Cause: Incomplete adapter removal
  • Solution: Re-run cutadapt/Trimmomatic with stricter parameters

High Duplication

  • Cause: PCR over-amplification, low input
  • Solution: Use deduplication; consider library prep optimization

Per Base Sequence Content Bias

  • Cause: Adapter dimers, non-random priming
  • Solution: Check for adapter contamination; randomize primers
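
To make the "trim last 10-20 bases" remedy concrete, here is a small pure-Python sketch that removes a fixed number of bases from the 3' end of each read, trimming the quality string by the same amount. The record layout and the name `trim_3prime` are assumptions for illustration; in a real pipeline a dedicated trimmer such as cutadapt or Trimmomatic would normally do this step.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical in-memory record: (header, sequence, quality string).
FastqRecord = Tuple[str, str, str]


def trim_3prime(records: Iterable[FastqRecord], n_bases: int) -> Iterator[FastqRecord]:
    """Yield records with the last n_bases removed from sequence and quality."""
    for header, seq, qual in records:
        keep = max(len(seq) - n_bases, 0)  # never go below an empty read
        yield header, seq[:keep], qual[:keep]


# Made-up read whose last four bases have low quality ('#').
reads = [("@read1", "ACGTACGTACGTACGT", "IIIIIIIIIIII####")]
trimmed = list(trim_3prime(reads, 4))
print(trimmed)
```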

3. Batch Analysis

batch_results = interpreter.analyze_batch(
    fastqc_files=["sample1_fastqc.html", "sample2_fastqc.html", ...],
    output_summary="batch_summary.csv"
)
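
The layout of `batch_summary.csv` is not documented here. One plausible shape, shown purely as a sketch, is one row per sample and one column per module status. The helper name `write_batch_summary`, the column names, and the sample data below are all hypothetical.

```python
import csv
import io
from typing import Dict, TextIO


def write_batch_summary(results: Dict[str, Dict[str, str]], handle: TextIO) -> None:
    """Write one CSV row per sample with a column for every module seen."""
    modules = sorted({name for statuses in results.values() for name in statuses})
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["sample", *modules])
    for sample_name in sorted(results):
        statuses = results[sample_name]
        # "NA" marks a module that is missing from a given sample's report.
        writer.writerow([sample_name] + [statuses.get(m, "NA") for m in modules])


# Made-up statuses for two samples.
results = {
    "sample1": {"Adapter Content": "pass", "Per base sequence quality": "pass"},
    "sample2": {"Adapter Content": "fail", "Per base sequence quality": "warn"},
}
buffer = io.StringIO()
write_batch_summary(results, buffer)
print(buffer.getvalue())
```

Writing to a file-like object keeps the sketch testable; passing an open file instead of `io.StringIO` would produce the CSV on disk.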

4. Recommendation Generation

recommendations = interpreter.get_recommendations(
    analysis,
    application="rna_seq",  # or "dna_seq", "chip_seq"
    quality_threshold="high"
)

Application-Specific Thresholds:

  • RNA-seq: Acceptable duplication up to 40% (transcript abundance)
  • DNA-seq: Strict quality requirements (variant calling)
  • ChIP-seq: Moderate quality, focus on enrichment metrics
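
As an illustration of how an application setting might change a verdict, the sketch below relaxes the duplication cut-off for RNA-seq to the 40% stated above and falls back to the general Key Metrics cut-offs (20% and 50%) otherwise. Only the 40% figure comes from this document; reusing 50% as the RNA-seq fail cut-off, the boundary handling, and all names are assumptions.

```python
from typing import Dict, Tuple

# General cut-offs from the Key Metrics table: (good below, fail above), percent.
DEFAULT_DUPLICATION_CUTOFFS: Tuple[float, float] = (20.0, 50.0)

# Per-application overrides. Only the RNA-seq 40% figure is from the text;
# the 50% fail cut-off is carried over from the general table (assumption).
APPLICATION_DUPLICATION_CUTOFFS: Dict[str, Tuple[float, float]] = {
    "rna_seq": (40.0, 50.0),
}


def duplication_status(percent: float, application: str) -> str:
    """Classify a duplication percentage for a given application."""
    good_below, fail_above = APPLICATION_DUPLICATION_CUTOFFS.get(
        application, DEFAULT_DUPLICATION_CUTOFFS
    )
    if percent < good_below:
        return "PASS"
    if percent <= fail_above:
        return "WARNING"
    return "FAIL"


print(duplication_status(35.0, "rna_seq"))  # PASS
print(duplication_status(35.0, "dna_seq"))  # WARNING
```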

CLI Usage

# Analyze single report
python scripts/fastqc_interpreter.py --input sample_fastqc.html

# Batch analysis
python scripts/fastqc_interpreter.py --batch "*fastqc.html" --output report.pdf

# With custom thresholds
python scripts/fastqc_interpreter.py --input fastqc.html --application rna_seq

Output Interpretation

PASS (Green): Proceed with analysis

WARNING (Yellow): Review but likely acceptable

FAIL (Red): Requires action before downstream analysis
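
How the overall status is derived from individual modules is not specified. A common convention, assumed here rather than confirmed by this skill, is to report the most severe status among the modules. The names below are hypothetical; the next-step messages are the ones listed above.

```python
from typing import Iterable

SEVERITY = {"PASS": 0, "WARNING": 1, "FAIL": 2}

# Next steps as given in the Output Interpretation list.
NEXT_STEP = {
    "PASS": "Proceed with analysis",
    "WARNING": "Review but likely acceptable",
    "FAIL": "Requires action before downstream analysis",
}


def overall_status(module_statuses: Iterable[str]) -> str:
    """Return the most severe status (assumed 'worst module wins' rule)."""
    statuses = list(module_statuses)
    if not statuses:
        raise ValueError("no module statuses supplied")
    return max(statuses, key=lambda status: SEVERITY[status])


status = overall_status(["PASS", "WARNING", "PASS"])
print(f"{status}: {NEXT_STEP[status]}")
```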

Troubleshooting Guide

See references/troubleshooting.md for:

  • Platform-specific issues (Illumina, PacBio, Oxford Nanopore)
  • Library prep problem diagnosis
  • Downstream analysis impact assessment

Skill ID: 205 | Version: 1.0 | License: MIT

Version History

1 version in total

  • v0.1.0 (current)
    2026-05-02 14:30
