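Cancer Transcriptomics: Tumor Transcriptomics and Clinical Applications

Comprehensive Tumor RNA-seq Analysis

A comprehensive analysis pipeline for tumor bulk RNA-seq that integrates differential expression (DEG), pathway activity, immune infiltration, subtyping, prognosis, fusion genes, and candidate mechanism interpretation.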

Created: 2026/6/3

Difficulty: Advanced

Recommended scenario: Tumor transcriptomics

Estimated time: 3-5 days

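Metadata

Pipeline metadata

Review the use case, inputs and outputs, and tool dependencies first, then move on to the command details in the main text.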

Tools: DESeq2, STAR, GSVA, ssGSEA, xCell, ESTIMATE, STAR-Fusion, Arriba

Inputs: BAM, GTF, TPM

Outputs: heatmap, fusion candidates, pathway score, report

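Workflow DAG

Flowchart

The step nodes below give a quick view of how the analysis flows from raw data to the final report.

STEP 1: Set up the tumor project directory
STEP 2: Clinical information and expression matrix
STEP 3: Differential expression
STEP 4: Pathway activity
STEP 5: Immune infiltration
STEP 6: Prognosis analysis
STEP 7: Subtyping / clustering
STEP 8: Fusion genes
STEP 9: Integrated report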

Protocol

Pipeline documentation

Comprehensive Tumor RNA-seq Analysis

1. Project directory

mkdir -p tumor_rnaseq_project/{00_clinical,01_expression,02_deg,03_pathway,04_immune,05_survival,06_subtype,07_fusion,report}

2. Example data

00_clinical/clinical_info.csv:

sample_id,group,stage,OS_time,OS_status
Tumor_1,Tumor,III,520,1
Tumor_2,Tumor,II,900,0
Normal_1,Normal,NA,NA,NA
Normal_2,Normal,NA,NA,NA

01_expression/tpm_matrix.csv:

gene_symbol,Tumor_1,Tumor_2,Normal_1,Normal_2
MKI67,50,45,5,6
PDCD1,8,10,1,1.2
EPCAM,100,120,30,28
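
Before any downstream step, it may be worth confirming that the sample IDs in the clinical table match the columns of the expression matrix. The following is a minimal sketch in base R, with the two example tables above inlined as text so it runs on its own. The helper name `check_sample_ids` is hypothetical and not part of the original pipeline.

```r
# Toy copies of the two example tables above, inlined so the block is self-contained.
clinical <- read.csv(text = "sample_id,group,stage,OS_time,OS_status
Tumor_1,Tumor,III,520,1
Tumor_2,Tumor,II,900,0
Normal_1,Normal,NA,NA,NA
Normal_2,Normal,NA,NA,NA", row.names = 1)

tpm <- read.csv(text = "gene_symbol,Tumor_1,Tumor_2,Normal_1,Normal_2
MKI67,50,45,5,6
PDCD1,8,10,1,1.2
EPCAM,100,120,30,28", row.names = 1, check.names = FALSE)

# Hypothetical helper: report samples that appear in only one of the two tables.
check_sample_ids <- function(clinical, expr) {
  list(
    missing_in_expr = setdiff(rownames(clinical), colnames(expr)),
    missing_in_clinical = setdiff(colnames(expr), rownames(clinical))
  )
}

id_check <- check_sample_ids(clinical, tpm)
stopifnot(length(id_check$missing_in_expr) == 0,
          length(id_check$missing_in_clinical) == 0)

# Reorder the expression columns to follow the clinical table.
tpm <- tpm[, rownames(clinical), drop = FALSE]
```

Failing early here is cheaper than discovering a sample mismatch after DESeq2 or the survival step has already run.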

3. Overall flowchart

flowchart TD
    A[expression + clinical metadata] --> B[DEG]
    A --> C[GSVA/ssGSEA pathway score]
    A --> D[immune infiltration]
    A --> E[survival analysis]
    A --> F[molecular subtype clustering]
    A --> G[fusion detection]
    B --> H[integrated tumor mechanism]
    C --> H
    D --> H
    E --> H
    F --> H
    G --> H

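4. Differential expression

library(DESeq2)

# Raw (unnormalized) counts: genes in rows, samples in columns.
counts <- read.csv("01_expression/raw_counts.csv", row.names = 1, check.names = FALSE)
clinical <- read.csv("00_clinical/clinical_info.csv", row.names = 1)

# DESeq2 requires colData rows to be in the same order as the count columns.
stopifnot(all(colnames(counts) %in% rownames(clinical)))
clinical <- clinical[colnames(counts), , drop = FALSE]
# Use Normal as the reference level so fold changes read as Tumor vs Normal.
clinical$group <- factor(clinical$group, levels = c("Normal", "Tumor"))

dds <- DESeqDataSetFromMatrix(
  countData = round(as.matrix(counts)),
  colData = clinical,
  design = ~ group
)

# Keep genes with at least 10 reads in at least 3 samples.
dds <- dds[rowSums(counts(dds) >= 10) >= 3, ]
dds <- DESeq(dds)
res <- results(dds, contrast = c("group", "Tumor", "Normal"))
write.csv(as.data.frame(res), "02_deg/Tumor_vs_Normal_DESeq2.csv")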
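
The deliverables include a volcano plot, which needs each gene labeled by direction and significance. The sketch below shows one way to derive those labels from a DESeq2-style result table. The data frame is a made-up stand-in for `as.data.frame(res)`, and the cutoffs (`padj < 0.05`, `|log2FoldChange| >= 1`) are common conventions assumed here, not values prescribed by the pipeline.

```r
# Toy stand-in for as.data.frame(res); values are made up for illustration.
res_df <- data.frame(
  log2FoldChange = c(3.1, -2.4, 0.3, 1.8, NA),
  padj = c(0.001, 0.01, 0.8, 0.2, NA),
  row.names = c("GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E")
)

# Assumed cutoffs, not prescribed by the pipeline.
padj_cutoff <- 0.05
lfc_cutoff <- 1

# Genes with a missing padj (e.g. filtered by DESeq2) stay "NS".
res_df$direction <- "NS"
sig <- !is.na(res_df$padj) & res_df$padj < padj_cutoff
res_df$direction[which(sig & res_df$log2FoldChange >= lfc_cutoff)] <- "Up"
res_df$direction[which(sig & res_df$log2FoldChange <= -lfc_cutoff)] <- "Down"

# Volcano plot coordinates: effect size vs. -log10 adjusted p-value.
res_df$neg_log10_padj <- -log10(res_df$padj)
print(table(res_df$direction))
```

Plotting `log2FoldChange` against `neg_log10_padj`, colored by `direction`, then gives the volcano plot.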

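5. Pathway activity

library(GSVA)
library(msigdbr)

tpm <- read.csv("01_expression/tpm_matrix.csv", row.names = 1, check.names = FALSE)
expr_log <- log2(as.matrix(tpm) + 1)

# Build a named list of Hallmark gene sets (set name -> gene symbols).
# The native pipe |> has no "." placeholder, so split() is called on the data frame directly.
# Recent msigdbr releases name this argument `collection` instead of `category`.
hallmark_df <- msigdbr(species = "Homo sapiens", category = "H")
hallmark <- split(hallmark_df$gene_symbol, hallmark_df$gs_name)

# Classic GSVA interface. GSVA >= 1.50 uses parameter objects instead:
#   gsva_score <- gsva(gsvaParam(expr_log, hallmark, kcdf = "Gaussian"))
gsva_score <- gsva(expr_log, hallmark, method = "gsva", kcdf = "Gaussian")
write.csv(gsva_score, "03_pathway/hallmark_gsva_scores.csv")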
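
Once a pathway-by-sample score matrix exists, a natural next step is to ask which pathways differ between Tumor and Normal. The sketch below uses a simulated score matrix in place of `gsva_score` and a per-pathway Welch t-test with Benjamini-Hochberg correction. The t-test is a simple stand-in chosen for illustration; the pathway names and values are invented, and a moderated approach such as limma may be preferable on real data.

```r
set.seed(42)
# Toy pathway-by-sample matrix standing in for gsva_score; values are simulated.
samples <- c(paste0("Tumor_", 1:5), paste0("Normal_", 1:5))
scores <- matrix(rnorm(3 * 10, sd = 0.2), nrow = 3,
                 dimnames = list(c("PATHWAY_A", "PATHWAY_B", "PATHWAY_C"), samples))
scores["PATHWAY_A", 1:5] <- scores["PATHWAY_A", 1:5] + 1  # built-in tumor shift

group <- factor(rep(c("Tumor", "Normal"), each = 5), levels = c("Normal", "Tumor"))

# Per-pathway mean difference (Tumor - Normal) and Welch t-test p-value.
pathway_stats <- data.frame(
  pathway = rownames(scores),
  mean_diff = apply(scores, 1, function(x) {
    mean(x[group == "Tumor"]) - mean(x[group == "Normal"])
  }),
  p_value = apply(scores, 1, function(x) t.test(x ~ group)$p.value)
)

# Adjust for testing several pathways at once.
pathway_stats$padj <- p.adjust(pathway_stats$p_value, method = "BH")
print(pathway_stats)
```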

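6. Immune infiltration

library(immunedeconv)

# immunedeconv expects non-log TPM values with HGNC gene symbols as row names.
immune_xcell <- deconvolute(as.matrix(tpm), method = "xcell", arrays = FALSE)
estimate_score <- deconvolute(as.matrix(tpm), method = "estimate")

# Each result has a cell_type column followed by one column per sample.
write.csv(immune_xcell, "04_immune/xcell_scores.csv", row.names = FALSE)
write.csv(estimate_score, "04_immune/estimate_scores.csv", row.names = FALSE)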

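7. Prognosis analysis

library(survival)
library(survminer)

gene <- "MKI67"
# Restrict to tumor samples with follow-up; normal samples carry no OS data.
surv_df <- clinical[clinical$group == "Tumor" & !is.na(clinical$OS_time), ]
surv_df$expr <- as.numeric(tpm[gene, rownames(surv_df)])
# Median split into high and low expression groups.
surv_df$risk_group <- ifelse(surv_df$expr >= median(surv_df$expr, na.rm = TRUE), "High", "Low")

# OS_status: 1 = event (death), 0 = censored.
fit <- survfit(Surv(OS_time, OS_status) ~ risk_group, data = surv_df)

ggsurvplot(
  fit,
  data = surv_df,
  pval = TRUE,
  risk.table = TRUE
)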
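
The deliverables list Cox results alongside the survival curves, but the steps above only fit Kaplan-Meier curves. A minimal sketch of a univariate Cox model with `survival::coxph` might look like the following. The cohort is a made-up, tumor-only toy table, and treating expression as a continuous `log2(TPM + 1)` covariate is an assumption made here for illustration.

```r
library(survival)

# Toy tumor-only cohort; times, events and expression values are made up.
toy <- data.frame(
  OS_time = c(520, 900, 310, 1200, 150, 760, 430, 1010),
  OS_status = c(1, 0, 1, 0, 1, 1, 1, 0),
  expr = c(50, 45, 80, 20, 95, 40, 60, 25)
)

# Continuous predictor on a log2 scale: the hazard ratio is then
# per one-unit increase in log2(TPM + 1).
toy$log_expr <- log2(toy$expr + 1)
cox_fit <- coxph(Surv(OS_time, OS_status) ~ log_expr, data = toy)

# Collect hazard ratio, 95% confidence interval and Wald p-value in one row.
cox_summary <- summary(cox_fit)
cox_table <- data.frame(
  HR = cox_summary$conf.int[, "exp(coef)"],
  lower95 = cox_summary$conf.int[, "lower .95"],
  upper95 = cox_summary$conf.int[, "upper .95"],
  p_value = cox_summary$coefficients[, "Pr(>|z|)"]
)
print(cox_table)
```

A continuous covariate avoids the information loss of a median split. Clinical covariates such as stage could be added to the right-hand side of the formula for a multivariable model.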

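8. Subtyping / clustering

library(pheatmap)

# Rank genes by median absolute deviation and keep up to 1000 of them.
# head() avoids NA row names when fewer than 1000 genes are available.
top_var <- head(names(sort(apply(expr_log, 1, mad), decreasing = TRUE)), 1000)
mat <- expr_log[top_var, grepl("Tumor", colnames(expr_log)), drop = FALSE]
# Rows that are constant across tumor samples cannot be row-scaled; drop them.
mat <- mat[apply(mat, 1, sd) > 0, , drop = FALSE]

pheatmap(
  mat,
  scale = "row",
  show_rownames = FALSE,
  filename = "06_subtype/tumor_unsupervised_clustering.pdf"
)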
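
The heatmap shows the clustering visually, but downstream steps usually need an explicit subtype label per sample. The sketch below reproduces the column clustering with base R (`hclust` with Euclidean distance and complete linkage, which are the pheatmap defaults) and cuts the tree into `k` groups. The matrix is simulated, and `k = 2` is an assumed number of subtypes.

```r
set.seed(1)
# Toy log-expression matrix: 20 genes x 6 tumor samples with two built-in groups.
mat <- matrix(rnorm(20 * 6), nrow = 20,
              dimnames = list(paste0("gene", 1:20), paste0("Tumor_", 1:6)))
mat[1:10, 4:6] <- mat[1:10, 4:6] + 5    # genes higher in samples 4-6
mat[11:20, 4:6] <- mat[11:20, 4:6] - 5  # genes lower in samples 4-6

# Row-wise z-score, mirroring scale = "row" in pheatmap.
z <- t(scale(t(mat)))

# Cluster samples (columns) with Euclidean distance and complete linkage.
hc <- hclust(dist(t(z), method = "euclidean"), method = "complete")

k <- 2  # assumed number of subtypes
subtype <- cutree(hc, k = k)
subtype_table <- data.frame(sample_id = names(subtype),
                            subtype = paste0("C", subtype))
print(subtype_table)
```

The resulting table can be merged into the clinical metadata so that subtype becomes a covariate in the survival or immune analyses.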

9. Fusion gene detection

STAR-Fusion example:

STAR-Fusion \
  --genome_lib_dir ref/ctat_genome_lib \
  --left_fq tumor_R1.fq.gz \
  --right_fq tumor_R2.fq.gz \
  --CPU 16 \
  --output_dir 07_fusion/Tumor_1

Arriba example:

arriba \
  -x tumor.Aligned.sortedByCoord.out.bam \
  -g ref/genes.gtf \
  -a ref/genome.fa \
  -o 07_fusion/Tumor_1_fusions.tsv
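
Fusion callers typically report many low-support events, so a short filtering pass helps produce the "fusion candidates" deliverable. The sketch below filters a toy table that uses a subset of Arriba's output columns (`#gene1`, `gene2`, `split_reads1`, `split_reads2`, `discordant_mates`, `confidence`). The rows are invented, and keeping only medium and high confidence calls is an assumed policy, not one stated by the pipeline.

```r
# Toy table with a subset of Arriba's output columns; rows are made up.
fusions <- read.delim(text = "#gene1\tgene2\tsplit_reads1\tsplit_reads2\tdiscordant_mates\tconfidence
GENE_A\tGENE_B\t12\t9\t7\thigh
GENE_C\tGENE_D\t1\t0\t1\tlow
GENE_E\tGENE_F\t4\t3\t2\tmedium", check.names = FALSE, comment.char = "")

# Arriba prefixes the first header with '#'; strip it for easier access.
names(fusions)[names(fusions) == "#gene1"] <- "gene1"

# Total read support: split reads at both breakpoints plus discordant mates.
fusions$supporting_reads <- fusions$split_reads1 +
  fusions$split_reads2 + fusions$discordant_mates

# Assumed policy: keep medium/high confidence calls, strongest support first.
candidates <- fusions[fusions$confidence %in% c("medium", "high"), ]
candidates <- candidates[order(-candidates$supporting_reads), ]
print(candidates[, c("gene1", "gene2", "supporting_reads", "confidence")])
```

For a real run, the same code would read `07_fusion/Tumor_1_fusions.tsv` with `read.delim(..., check.names = FALSE, comment.char = "")` instead of the inline text.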

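10. Example of an integrated interpretation

In the Tumor group, MKI67 is upregulated, GSVA scores for cell cycle pathways are elevated, and the high-MKI67 expression group has a worse prognosis.
Immune infiltration analysis also shows an elevated macrophage score, suggesting that this tumor subtype may combine high proliferation with an immunosuppressive phenotype.
If fusion detection identifies a driver fusion, it needs to be interpreted jointly with the DEG and pathway results.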

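11. Deliverables

DEG table and volcano plot
Hallmark/KEGG pathway activity matrix
Immune infiltration score matrix
Survival curves and Cox results
Tumor sample subtyping heatmap
Fusion candidates
Integrated mechanism interpretation report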