Created: 2026/6/3 · Difficulty: Intermediate · Recommended scenario: Pathway activity · Estimated time: 1-3 days
Pipeline Detail
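Pathway Analysis: Mechanistic Interpretation and Multi-omics Regulation
GSVA/ssGSEA Pathway Activity Scoring
Converts a gene expression matrix into a sample-level pathway activity matrix. Supports GSVA, ssGSEA, PROGENy, Hallmark gene sets, and between-group pathway comparison.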
Metadata
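Pipeline metadata
Review the use case, the inputs and outputs, and the tool dependencies first, then move on to the command details in the main text.
Difficulty: Intermediate
Scenario: Pathway activity
Estimated Time: 1-3 days
Tools: GSVA, ssGSEA
Inputs: TPM, expression matrix
Outputs: heatmap, report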
Workflow DAG
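Workflow diagram
The step nodes show how the analysis flows from raw data to the final report.
STEP 1: Set up the pathway scoring project
STEP 2: Prepare the expression matrix
STEP 3: Select gene sets
STEP 4: GSVA/ssGSEA scoring
STEP 5: Differential pathways between groups
STEP 6: Pathway heatmap / PCA
STEP 7: Pathway activity report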
Protocol
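Protocol document
The main text keeps Markdown formatting, code language labels, and table styles, so the steps can be reproduced while learning.
GSVA/ssGSEA Pathway Activity Scoring
1. Project directory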
mkdir -p pathway_score_project/{00_input,01_genesets,02_scores,03_statistics,04_plots,report}
2. Example data
00_input/expression_tpm.csv:
gene_symbol,Ctrl_1,Ctrl_2,Treat_1,Treat_2
IL6,2.1,2.4,20.5,18.9
CXCL8,1.2,1.5,15.2,14.8
GAPDH,100,98,105,101
00_input/sample_info.csv:
sample_id,condition
Ctrl_1,Ctrl
Ctrl_2,Ctrl
Treat_1,Treat
Treat_2,Treat
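Before scoring, it can help to confirm that the two example files agree with each other. The sketch below rebuilds the example tables inline (so it runs without the CSV files) and checks that every expression column has a matching `sample_id`, that gene symbols are unique, and that the values look like TPM. The helper name `check_inputs` is hypothetical and not part of any package.

```r
# Toy copies of the two example tables shown above.
expr_raw <- data.frame(
  gene_symbol = c("IL6", "CXCL8", "GAPDH"),
  Ctrl_1  = c(2.1, 1.2, 100),
  Ctrl_2  = c(2.4, 1.5, 98),
  Treat_1 = c(20.5, 15.2, 105),
  Treat_2 = c(18.9, 14.8, 101)
)
sample_info <- data.frame(
  sample_id = c("Ctrl_1", "Ctrl_2", "Treat_1", "Treat_2"),
  condition = c("Ctrl", "Ctrl", "Treat", "Treat")
)

# Hypothetical helper: returns a character vector of problems (empty if none).
check_inputs <- function(expr_raw, sample_info) {
  problems <- character(0)
  sample_cols <- setdiff(colnames(expr_raw), "gene_symbol")
  values <- as.matrix(expr_raw[, sample_cols, drop = FALSE])

  # Every expression column needs a row in sample_info.
  missing_meta <- setdiff(sample_cols, sample_info$sample_id)
  if (length(missing_meta) > 0) {
    problems <- c(problems,
                  paste("no sample_info row for:", paste(missing_meta, collapse = ", ")))
  }
  # Duplicated symbols would make read.csv(..., row.names = 1) fail later.
  if (anyDuplicated(expr_raw$gene_symbol) > 0) {
    problems <- c(problems, "duplicated gene symbols")
  }
  # TPM values should be present and non-negative.
  if (anyNA(values)) problems <- c(problems, "missing values")
  if (any(values < 0, na.rm = TRUE)) problems <- c(problems, "negative values")
  problems
}

problems <- check_inputs(expr_raw, sample_info)
```

An empty `problems` vector means the example inputs are consistent. How to resolve duplicated symbols (for example by averaging or keeping the most highly expressed row) is left open here.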
3. Overall workflow
flowchart TD
A["gene x sample expression matrix"] --> B["log2(TPM+1) or VST"]
B --> C["MSigDB Hallmark / KEGG / Reactome gene sets"]
C --> D["GSVA / ssGSEA"]
D --> E["pathway x sample score matrix"]
E --> F["limma / Wilcoxon between-group comparison"]
F --> G["Pathway heatmap and boxplots"]
G --> H["Mechanistic interpretation of pathways"]
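4. GSVA scoring
library(GSVA)
library(msigdbr)
library(tidyverse)
# Genes in rows, samples in columns; gene symbols become the row names.
expr <- read.csv("00_input/expression_tpm.csv", row.names = 1, check.names = FALSE)
# log2(TPM + 1) gives roughly continuous values, which matches kcdf = "Gaussian".
expr_log <- log2(as.matrix(expr) + 1)
# Hallmark gene sets as a named list: gene set name -> gene symbols.
# The native pipe |> has no "." placeholder, so split() works on a stored data frame.
# Note: recent msigdbr releases rename the `category` argument to `collection`.
hallmark_df <- msigdbr(species = "Homo sapiens", category = "H")
hallmark <- split(hallmark_df$gene_symbol, hallmark_df$gs_name)
# GSVA >= 1.50 takes a parameter object. Older releases used
# gsva(expr_log, hallmark, method = "gsva", kcdf = "Gaussian").
gsva_scores <- gsva(
  gsvaParam(expr_log, hallmark, kcdf = "Gaussian"),
  verbose = FALSE
)
write.csv(gsva_scores, "02_scores/gsva_hallmark_scores.csv")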
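GSVA can only score a gene set from the genes that are actually present in the expression matrix, so a quick per-set overlap count is a useful check, especially with a small matrix like the three-gene example. A minimal sketch in base R follows. It uses two made-up toy gene sets (`TOY_INFLAMMATION`, `TOY_HOUSEKEEPING`) in place of the Hallmark list, and the `min_size` cutoff of 2 is an arbitrary illustration.

```r
# Gene symbols present in the expression matrix (taken from the example data).
expr_genes <- c("IL6", "CXCL8", "GAPDH")

# Toy gene sets standing in for the MSigDB Hallmark list (hypothetical names).
gene_sets <- list(
  TOY_INFLAMMATION = c("IL6", "CXCL8", "TNF", "IL1B"),
  TOY_HOUSEKEEPING = c("GAPDH", "ACTB")
)

# For each set: total size, genes found in the matrix, and the fraction covered.
overlap <- data.frame(
  gene_set  = names(gene_sets),
  set_size  = unname(lengths(gene_sets)),
  n_present = unname(vapply(gene_sets,
                            function(g) sum(unique(g) %in% expr_genes),
                            integer(1)))
)
overlap$fraction <- overlap$n_present / overlap$set_size

# Keep only sets with at least `min_size` genes in the matrix (arbitrary cutoff).
min_size <- 2
usable_sets <- gene_sets[overlap$n_present >= min_size]
```

In the real workflow, `expr_genes` would be `rownames(expr_log)` and `gene_sets` would be `hallmark`. Sets with very low coverage give scores that rest on only a few genes and are worth flagging in the report.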
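5. ssGSEA scoring
# ssGSEA has its own parameter object in GSVA >= 1.50.
# kcdf and abs.ranking are options of the GSVA method and do not affect ssGSEA,
# so they are dropped here.
ssgsea_scores <- gsva(
  ssgseaParam(expr_log, hallmark),
  verbose = FALSE
)
write.csv(ssgsea_scores, "02_scores/ssgsea_hallmark_scores.csv")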
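6. Differential pathways between groups
library(limma)
sample_info <- read.csv("00_input/sample_info.csv")
# Reorder sample_info to match the score matrix columns; stop if a sample is missing.
sample_info <- sample_info[match(colnames(gsva_scores), sample_info$sample_id), ]
stopifnot(!anyNA(sample_info$sample_id))
# Fix the factor levels so the design columns are named Ctrl and Treat.
sample_info$condition <- factor(sample_info$condition, levels = c("Ctrl", "Treat"))
design <- model.matrix(~ 0 + condition, data = sample_info)
colnames(design) <- levels(sample_info$condition)
fit <- lmFit(gsva_scores, design)
contrast <- makeContrasts(Treat_vs_Ctrl = Treat - Ctrl, levels = design)
fit2 <- contrasts.fit(fit, contrast)
fit2 <- eBayes(fit2)
# logFC here is the difference in mean GSVA score (Treat minus Ctrl).
pathway_res <- topTable(fit2, coef = "Treat_vs_Ctrl", number = Inf)
write.csv(pathway_res, "03_statistics/gsva_Treat_vs_Ctrl.csv")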
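The workflow diagram lists the Wilcoxon test as an alternative to limma for the between-group comparison. A minimal base-R sketch of that alternative follows, run on a simulated pathway x sample score matrix. The pathway names `PATHWAY_A` to `PATHWAY_C`, the group size of four, and the simulated values are all assumptions for illustration. With only two samples per group, as in the example data, an exact two-sided Wilcoxon test cannot give a p-value below 1/3, so this route needs larger groups to be informative.

```r
set.seed(1)
condition <- factor(rep(c("Ctrl", "Treat"), each = 4), levels = c("Ctrl", "Treat"))

# Simulated scores: PATHWAY_A is shifted upward in Treat, the others are noise.
scores <- rbind(
  PATHWAY_A = c(rnorm(4, mean = -0.4, sd = 0.05), rnorm(4, mean = 0.4, sd = 0.05)),
  PATHWAY_B = rnorm(8, sd = 0.2),
  PATHWAY_C = rnorm(8, sd = 0.2)
)
colnames(scores) <- paste0(condition, "_", rep(1:4, times = 2))

# One two-sided Wilcoxon rank-sum test per pathway (row).
wilcox_res <- data.frame(
  pathway = rownames(scores),
  diff_median = unname(apply(scores, 1, function(x) {
    median(x[condition == "Treat"]) - median(x[condition == "Ctrl"])
  })),
  p_value = unname(apply(scores, 1, function(x) {
    wilcox.test(x[condition == "Treat"], x[condition == "Ctrl"])$p.value
  }))
)

# Benjamini-Hochberg correction across pathways, then sort by raw p-value.
wilcox_res$adj_p <- p.adjust(wilcox_res$p_value, method = "BH")
wilcox_res <- wilcox_res[order(wilcox_res$p_value), ]
```

In the real workflow, `scores` would be `gsva_scores` and `condition` would come from `sample_info`. The Wilcoxon test makes fewer distributional assumptions than limma but has less power in small groups, which is why limma is the more common choice for designs with few replicates.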
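7. Visualization
library(pheatmap)
# topTable() returns the strongest pathways first; head() avoids NA row names
# when fewer than 30 pathways are available.
top_pathways <- head(rownames(pathway_res), 30)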
pheatmap(
gsva_scores[top_pathways, ],
scale = "row",
annotation_col = data.frame(condition = sample_info$condition, row.names = sample_info$sample_id),
filename = "04_plots/top_pathway_heatmap.pdf",
width = 8,
height = 10
)
Boxplot:
pathway <- "HALLMARK_INFLAMMATORY_RESPONSE"
plot_df <- data.frame(
score = gsva_scores[pathway, ],
sample_info
)
ggplot(plot_df, aes(condition, score, fill = condition)) +
geom_boxplot(width = 0.5, outlier.shape = NA) +
geom_jitter(width = 0.08, size = 2) +
theme_bw() +
labs(title = pathway, y = "GSVA score")
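The workflow steps and the deliverables both mention a sample-level pathway PCA, which the plotting code above does not cover. One way to compute it is `prcomp()` on the transposed score matrix, so that samples are the observations and pathways are the variables. The sketch below uses a simulated score matrix in place of `gsva_scores`; the matrix dimensions and values are assumptions for illustration.

```r
set.seed(42)
samples <- c("Ctrl_1", "Ctrl_2", "Treat_1", "Treat_2")
condition <- c("Ctrl", "Ctrl", "Treat", "Treat")

# Simulated pathway x sample scores: the first 5 of 20 pathways are higher in Treat.
scores <- matrix(rnorm(20 * 4, sd = 0.1), nrow = 20,
                 dimnames = list(paste0("PATHWAY_", 1:20), samples))
scores[1:5, condition == "Treat"] <- scores[1:5, condition == "Treat"] + 1

# prcomp() expects observations in rows, so transpose to sample x pathway.
pca <- prcomp(t(scores), center = TRUE, scale. = FALSE)

# Percentage of variance explained by each principal component.
var_explained <- 100 * pca$sdev^2 / sum(pca$sdev^2)

# Tidy table of sample coordinates, ready for plotting.
pca_df <- data.frame(
  sample_id = rownames(pca$x),
  condition = condition,
  PC1 = unname(pca$x[, 1]),
  PC2 = unname(pca$x[, 2])
)
```

`pca_df` can then be drawn with `ggplot(pca_df, aes(PC1, PC2, color = condition)) + geom_point()`, with `var_explained` supplying the axis labels. Scaling is left off here because the simulated scores share one scale; whether to scale real GSVA scores is a judgment call.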
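8. Example interpretation of results
In the Treat group, scores for inflammatory response, TNFA signaling via NF-kB, and interferon response are elevated, which suggests that the treatment activates immune and inflammatory pathways.
Unlike DEG enrichment, which summarizes a gene list for a whole comparison, GSVA assigns a score to every sample, so pathway activity can be compared at the single-sample level.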
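An interpretation like the one above rests on the sign and significance of each pathway's score difference. A small sketch of how a limma-style result table could be turned into direction labels follows. The toy table reuses the `logFC` and `adj.P.Val` column names that `topTable()` produces, but its numbers are invented for illustration, the `TOY_` pathway names are hypothetical, and the cutoffs (adjusted p < 0.05, |logFC| > 0.2) are assumptions rather than recommended values.

```r
# Toy result table with topTable()-style columns; all values are made up.
pathway_res <- data.frame(
  logFC     = c(0.62, 0.48, -0.35, 0.05),
  adj.P.Val = c(0.004, 0.012, 0.030, 0.800),
  row.names = c(
    "HALLMARK_INFLAMMATORY_RESPONSE",
    "HALLMARK_TNFA_SIGNALING_VIA_NFKB",
    "TOY_PATHWAY_DOWN",
    "TOY_PATHWAY_FLAT"
  )
)

# Illustrative cutoffs (assumptions, not recommendations).
padj_cutoff <- 0.05
lfc_cutoff  <- 0.2

# Label each pathway by direction of change in Treat relative to Ctrl.
significant <- pathway_res$adj.P.Val < padj_cutoff
pathway_res$direction <- ifelse(
  significant & pathway_res$logFC > lfc_cutoff, "up in Treat",
  ifelse(significant & pathway_res$logFC < -lfc_cutoff,
         "down in Treat", "not significant")
)

# Count of pathways per label, useful as a one-line summary in the report.
direction_counts <- table(pathway_res$direction)
```

Because GSVA scores are bounded and much smaller in magnitude than log2 expression fold changes, a `logFC` cutoff for pathway scores is usually chosen far below the values used for genes.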
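9. Deliverables
- GSVA score matrix
- ssGSEA score matrix
- Differential pathway table
- Pathway heatmap
- Boxplots of key pathways
- Sample-level pathway PCA
- Pathway mechanism interpretation report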