
Pipeline Detail

Pathway Analysis: Mechanistic Interpretation and Multi-omics Regulation

GSVA/ssGSEA Pathway Activity Scoring

Converts a gene expression matrix into a sample-level pathway activity matrix. Supports GSVA, ssGSEA, PROGENy, Hallmark gene sets, and between-group pathway comparison.

Created: 2026/6/3
Difficulty: Intermediate
Recommended scenario: pathway activity
Estimated time: 1-3 days


Tools: GSVA, ssGSEA

Inputs: TPM, expression matrix

Outputs: heatmap, report

Workflow DAG


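STEP 1: Set up the pathway-scoring project
STEP 2: Prepare the expression matrix
STEP 3: Select gene sets
STEP 4: GSVA/ssGSEA scoring
STEP 5: Differential pathways between groups
STEP 6: Pathway heatmap/PCA
STEP 7: Pathway activity report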

Protocol


1. Project directory

mkdir -p pathway_score_project/{00_input,01_genesets,02_scores,03_statistics,04_plots,report}

2. Example data

00_input/expression_tpm.csv

gene_symbol,Ctrl_1,Ctrl_2,Treat_1,Treat_2
IL6,2.1,2.4,20.5,18.9
CXCL8,1.2,1.5,15.2,14.8
GAPDH,100,98,105,101

00_input/sample_info.csv

sample_id,condition
Ctrl_1,Ctrl
Ctrl_2,Ctrl
Treat_1,Treat
Treat_2,Treat
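The statistics and plotting steps below match samples by ID, so it is worth checking the two example files against each other before scoring. The following base-R sketch uses a hypothetical helper, `align_samples()`, which is not part of any package. It reads the example tables inline with `read.csv(text = ...)` and deliberately shuffles the sample sheet. It then checks that both files name the same samples and that TPM values are non-negative, which `log2(TPM + 1)` assumes. Finally it returns the sample sheet in the column order of the expression matrix.

```r
# Hypothetical helper: verify that expression columns and sample_info rows
# describe the same samples, then return sample_info in column order
align_samples <- function(expr, sample_info) {
  in_expr_only <- setdiff(colnames(expr), sample_info$sample_id)
  in_info_only <- setdiff(sample_info$sample_id, colnames(expr))
  if (length(in_expr_only) > 0 || length(in_info_only) > 0) {
    stop("sample mismatch; only in expression: ", paste(in_expr_only, collapse = ", "),
         "; only in sample_info: ", paste(in_info_only, collapse = ", "))
  }
  # log2(TPM + 1) assumes non-negative input
  if (any(as.matrix(expr) < 0, na.rm = TRUE)) stop("TPM values must be non-negative")
  sample_info[match(colnames(expr), sample_info$sample_id), , drop = FALSE]
}

expr <- read.csv(text = "gene_symbol,Ctrl_1,Ctrl_2,Treat_1,Treat_2
IL6,2.1,2.4,20.5,18.9
CXCL8,1.2,1.5,15.2,14.8
GAPDH,100,98,105,101", row.names = 1, check.names = FALSE)

# Same sample sheet as above, shuffled on purpose to show the reordering
sample_info <- read.csv(text = "sample_id,condition
Treat_2,Treat
Ctrl_1,Ctrl
Treat_1,Treat
Ctrl_2,Ctrl")

aligned <- align_samples(expr, sample_info)
aligned$sample_id   # now in the column order of expr
```

A hard `stop()` is a deliberate choice here. If sample IDs are misaligned, every downstream group comparison will silently use the wrong labels.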

3. Overall workflow

flowchart TD
    A["gene x sample expression matrix"] --> B["log2(TPM+1) or VST"]
    B --> C["MSigDB Hallmark / KEGG / Reactome gene sets"]
    C --> D["GSVA / ssGSEA"]
    D --> E["pathway x sample score matrix"]
    E --> F["limma / Wilcoxon between-group comparison"]
    F --> G["pathway heatmaps and box plots"]
    G --> H["mechanistic interpretation of pathways"]
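The gene-set step in the flowchart depends on how many genes from each set are actually present in the expression matrix. Sets with few matched genes give unstable scores, and GSVA's parameter constructors expose `minSize`/`maxSize` filters for this reason. Below is a minimal base-R sketch of a coverage table. It assumes gene sets are a named list of gene symbols, the shape `split()` produces from msigdbr output. `geneset_coverage()`, its `min_size` argument, and the toy sets are hypothetical.

```r
# Hypothetical helper: how much of each gene set is present in the matrix
geneset_coverage <- function(expr_genes, gene_sets, min_size = 10) {
  sets <- lapply(gene_sets, unique)   # drop repeated symbols within a set
  n_total <- lengths(sets)
  n_found <- vapply(sets, function(gs) sum(gs %in% expr_genes), integer(1))
  data.frame(gene_set = names(sets), n_total = n_total, n_found = n_found,
             fraction = n_found / n_total,
             passes_min_size = n_found >= min_size,
             row.names = NULL)
}

# Gene symbols from the example expression table, plus two toy gene sets
expr_genes <- c("IL6", "CXCL8", "GAPDH")
toy_sets <- list(
  TOY_INFLAM = c("IL6", "CXCL8", "TNF", "IL1B", "IL6"),
  TOY_HOUSEKEEPING = c("GAPDH", "ACTB")
)
coverage_tbl <- geneset_coverage(expr_genes, toy_sets, min_size = 2)
coverage_tbl
```

With real Hallmark sets and a full transcriptome, a low `fraction` often points to an identifier mismatch, such as Ensembl IDs versus symbols or mouse versus human symbols. It is rarely genuine biology.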

4. GSVA scoring

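library(GSVA)      # parameter-object interface (gsvaParam/ssgseaParam), GSVA >= 1.50
library(msigdbr)
library(tidyverse)

# TPM matrix (genes x samples), log-transformed; +1 keeps zero TPM finite
expr <- read.csv("00_input/expression_tpm.csv", row.names = 1, check.names = FALSE)
expr_log <- log2(as.matrix(expr) + 1)

# Hallmark gene sets as a named list: set name -> unique gene symbols.
# The base pipe |> has no `.` placeholder, so split the table explicitly.
# (Newer msigdbr releases name this argument `collection` instead of `category`.)
hallmark_df <- msigdbr(species = "Homo sapiens", category = "H")
hallmark <- lapply(split(hallmark_df$gene_symbol, hallmark_df$gs_name), unique)

# Gaussian kernel for continuous log-scale data; kcdf = "Poisson" is meant for integer counts
gsva_param <- gsvaParam(expr_log, hallmark, kcdf = "Gaussian")
gsva_scores <- gsva(gsva_param, verbose = FALSE)

write.csv(gsva_scores, "02_scores/gsva_hallmark_scores.csv")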
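GSVA works in two stages. It first compares each gene's expression across samples, using a kernel estimate of that gene's CDF. It then ranks genes within each sample and runs a Kolmogorov-Smirnov-like random walk over the ranking. The base-R sketch below re-implements that idea in simplified form for illustration only and will not reproduce `gsva()` output exactly.

Several assumptions are not taken from this document:

- a Gaussian kernel bandwidth of sd/4
- symmetric rank weights |p/2 - r|
- tau = 1
- a "max difference" score

The sketch also skips GSVA's logit step, which does not change within-sample rankings. `gsva_sketch()` is hypothetical, and the toy matrix is synthetic, with 20 genes shifted up in the Treat samples.

```r
# Simplified GSVA-style score (illustration only, not GSVA's exact algorithm).
# Assumptions: Gaussian kernel with bandwidth sd/4, symmetric rank weights
# |p/2 - r|, tau = 1, "max difference" score; the logit step is skipped
# because it does not change the within-sample gene ranking.
gsva_sketch <- function(expr, gene_sets, tau = 1) {
  p <- nrow(expr)
  # Step 1: per gene, a kernel CDF says how high each sample sits relative
  # to the other samples (this is what makes GSVA cohort-dependent)
  z <- t(apply(expr, 1, function(x) {
    h <- sd(x) / 4
    if (h == 0) return(rep(0.5, length(x)))
    vapply(x, function(xi) mean(pnorm((xi - x) / h)), numeric(1))
  }))
  dimnames(z) <- dimnames(expr)
  # Step 2: weights for ranks 1..p; both ends of the ranking weigh most
  w <- abs(p / 2 - seq_len(p))^tau
  scores <- matrix(NA_real_, length(gene_sets), ncol(expr),
                   dimnames = list(names(gene_sets), colnames(expr)))
  for (j in seq_len(ncol(expr))) {
    genes <- rownames(z)[order(z[, j], decreasing = TRUE)]
    for (k in seq_along(gene_sets)) {
      hit <- genes %in% gene_sets[[k]]
      if (!any(hit) || all(hit)) next
      # Step 3: random walk, weighted steps for hits, uniform steps for misses
      walk <- cumsum(hit * w) / sum(w[hit]) - cumsum(!hit) / sum(!hit)
      # Step 4: largest positive plus largest negative deviation
      scores[k, j] <- max(0, walk) + min(0, walk)
    }
  }
  scores
}

set.seed(1)
n_genes <- 200
toy <- matrix(rnorm(n_genes * 4, mean = 5), nrow = n_genes,
              dimnames = list(paste0("G", seq_len(n_genes)),
                              c("Ctrl_1", "Ctrl_2", "Treat_1", "Treat_2")))
toy[1:20, 3:4] <- toy[1:20, 3:4] + 3   # synthetic genes shifted up in Treat
toy_sets <- list(TOY_UP = paste0("G", 1:20), TOY_NULL = paste0("G", 101:120))
round(gsva_sketch(toy, toy_sets), 2)
```

TOY_UP comes out positive in the Treat samples and negative in the Ctrl samples. That sign flip follows from Step 1: each score describes a sample relative to the rest of the cohort, so adding or removing samples can change GSVA scores.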

5. ssGSEA scoring

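# ssGSEA ranks genes within each sample, so no kernel CDF (kcdf) is involved.
# normalize = TRUE divides all scores by the range of the score matrix.
ssgsea_param <- ssgseaParam(expr_log, hallmark, normalize = TRUE)
ssgsea_scores <- gsva(ssgsea_param, verbose = FALSE)

write.csv(ssgsea_scores, "02_scores/ssgsea_hallmark_scores.csv")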
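ssGSEA scores each sample on its own. Genes are ranked by expression within the sample, and genes in the set are weighted by rank^alpha (alpha = 0.25 is the default in GSVA's `ssgseaParam`). The score is the sum of the difference between the weighted empirical CDF of in-set genes and the uniform ECDF of out-of-set genes. The base-R sketch below is a simplified illustration without the final range normalization. `ssgsea_sketch()` is hypothetical, and the 200-gene matrix is synthetic.

```r
# Simplified ssGSEA-style enrichment score (illustration, no final normalization)
ssgsea_sketch <- function(expr, gene_sets, alpha = 0.25) {
  scores <- matrix(NA_real_, length(gene_sets), ncol(expr),
                   dimnames = list(names(gene_sets), colnames(expr)))
  for (j in seq_len(ncol(expr))) {
    x <- expr[, j]
    ord <- order(x, decreasing = TRUE)   # highest-expressed gene first
    genes <- rownames(expr)[ord]
    w <- rank(x)[ord]^alpha              # rank weights; the top gene has rank p
    for (k in seq_along(gene_sets)) {
      hit <- genes %in% gene_sets[[k]]
      if (!any(hit) || all(hit)) next
      p_hit <- cumsum(hit * w) / sum(w[hit])   # weighted ECDF of in-set genes
      p_miss <- cumsum(!hit) / sum(!hit)       # ECDF of out-of-set genes
      scores[k, j] <- sum(p_hit - p_miss)
    }
  }
  scores
}

# Synthetic data: 20 genes shifted up in the Treat samples
set.seed(1)
n_genes <- 200
toy <- matrix(rnorm(n_genes * 4, mean = 5), nrow = n_genes,
              dimnames = list(paste0("G", seq_len(n_genes)),
                              c("Ctrl_1", "Ctrl_2", "Treat_1", "Treat_2")))
toy[1:20, 3:4] <- toy[1:20, 3:4] + 3
toy_sets <- list(TOY_UP = paste0("G", 1:20), TOY_NULL = paste0("G", 101:120))
round(ssgsea_sketch(toy, toy_sets), 1)

# The three-gene example expression table above, as log2(TPM + 1)
example <- log2(rbind(IL6   = c(2.1, 2.4, 20.5, 18.9),
                      CXCL8 = c(1.2, 1.5, 15.2, 14.8),
                      GAPDH = c(100, 98, 105, 101)) + 1)
colnames(example) <- c("Ctrl_1", "Ctrl_2", "Treat_1", "Treat_2")
example_scores <- ssgsea_sketch(example, list(INFLAM_TOY = c("IL6", "CXCL8")))
example_scores   # about -1.46 in every sample
```

In the three-gene table, every sample has the same within-sample order (GAPDH > IL6 > CXCL8). A within-sample rank method therefore gives identical scores, even though IL6 and CXCL8 rise about tenfold in Treat. Rank-based single-sample scores only become informative with a realistically sized transcriptome.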

6. Between-group pathway differences

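library(limma)

# Align sample annotation to the column order of the score matrix
sample_info <- read.csv("00_input/sample_info.csv")
sample_info <- sample_info[match(colnames(gsva_scores), sample_info$sample_id), ]
stopifnot(!anyNA(sample_info$sample_id))   # every scored sample must be annotated
sample_info$condition <- factor(sample_info$condition, levels = c("Ctrl", "Treat"))

# Cell-means design: one coefficient per group, no intercept
design <- model.matrix(~ 0 + condition, data = sample_info)
colnames(design) <- levels(sample_info$condition)

fit <- lmFit(gsva_scores, design)
contrast <- makeContrasts(Treat_vs_Ctrl = Treat - Ctrl, levels = design)
fit2 <- eBayes(contrasts.fit(fit, contrast))

pathway_res <- topTable(fit2, coef = "Treat_vs_Ctrl", number = Inf)
write.csv(pathway_res, "03_statistics/gsva_Treat_vs_Ctrl.csv")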
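The workflow diagram also lists Wilcoxon tests as an alternative to limma. Below is a base-R sketch that runs one rank-sum test per pathway and applies a Benjamini-Hochberg adjustment across pathways. The helper `pathway_wilcox()` and the toy scores are hypothetical. With only two samples per group, as in the example design, the exact two-sided p-value cannot fall below 1/3. A rank test is therefore only meaningful with more replicates.

```r
# Hypothetical helper: one Wilcoxon rank-sum test per pathway (row)
pathway_wilcox <- function(scores, condition, case = "Treat", control = "Ctrl") {
  condition <- as.character(condition)
  res <- t(apply(scores, 1, function(s) {
    x <- s[condition == case]
    y <- s[condition == control]
    c(delta_median = median(x) - median(y),
      p_value = wilcox.test(x, y, exact = TRUE)$p.value)
  }))
  res <- as.data.frame(res)
  res$p_adj <- p.adjust(res$p_value, method = "BH")   # Benjamini-Hochberg
  res[order(res$p_value), ]
}

toy_scores <- rbind(
  TOY_UP   = c(Ctrl_1 = -0.40, Ctrl_2 = -0.30, Treat_1 = 0.50, Treat_2 = 0.60),
  TOY_FLAT = c(Ctrl_1 =  0.10, Ctrl_2 = -0.10, Treat_1 = 0.00, Treat_2 = 0.05)
)
condition <- c("Ctrl", "Ctrl", "Treat", "Treat")
wilcox_res <- pathway_wilcox(toy_scores, condition)
wilcox_res   # TOY_UP: p = 1/3, the smallest value a 2 vs 2 exact test can give
```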

7. Visualization

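library(pheatmap)

# Top 30 pathways by limma ranking (head() avoids NA rows if fewer exist)
top_pathways <- head(rownames(pathway_res), 30)

# Column annotation keyed by sample ID, matching the columns of gsva_scores
annotation_col <- data.frame(condition = sample_info$condition,
                             row.names = sample_info$sample_id)

pheatmap(
  gsva_scores[top_pathways, ],
  scale = "row",   # each pathway shown as a z-score across samples
  annotation_col = annotation_col,
  filename = "04_plots/top_pathway_heatmap.pdf",
  width = 8,
  height = 10
)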
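With `scale = "row"`, pheatmap colors each pathway by its z-score across samples: the row mean is subtracted and the result is divided by the row standard deviation. Colors therefore show relative, not absolute, activity. The base-R sketch below reproduces that transform with a hypothetical `row_zscore()` and toy scores. PATHWAY_B barely changes, yet it ends up with a color range similar to the strongly changing PATHWAY_A.

```r
# Row z-score per pathway: subtract the row mean, divide by the row SD
# (the transform pheatmap applies with scale = "row")
row_zscore <- function(m) {
  (m - rowMeans(m)) / apply(m, 1, sd)
}

toy_scores <- rbind(
  PATHWAY_A = c(Ctrl_1 = -0.40, Ctrl_2 = -0.30, Treat_1 = 0.50, Treat_2 = 0.60),
  PATHWAY_B = c(Ctrl_1 =  0.10, Ctrl_2 =  0.12, Treat_1 = 0.20, Treat_2 = 0.18)
)
z <- row_zscore(toy_scores)
round(z, 2)   # both rows span a similar color range despite very different effect sizes
```

Effect sizes should therefore be read from the limma table or the raw score box plots, not from heatmap color intensity.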

Box plot for a single pathway:

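pathway <- "HALLMARK_INFLAMMATORY_RESPONSE"

# sample_info is already aligned to the columns of gsva_scores
plot_df <- data.frame(
  score = gsva_scores[pathway, ],
  sample_info
)

p <- ggplot(plot_df, aes(condition, score, fill = condition)) +
  geom_boxplot(width = 0.5, outlier.shape = NA) +
  geom_jitter(width = 0.08, height = 0, size = 2) +  # height = 0: no vertical jitter on scores
  theme_bw() +
  labs(title = pathway, y = "GSVA score")

ggsave("04_plots/inflammatory_response_boxplot.pdf", p, width = 4, height = 4)  # illustrative file name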
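The deliverables and STEP 6 include a sample-level pathway PCA, which the code above does not produce. Below is a minimal base-R sketch. It assumes a pathway x sample score matrix and a sample table with `sample_id` and `condition` columns. `pathway_pca()` and the toy scores are hypothetical. Samples are the observations, so the score matrix is transposed before `prcomp()`.

```r
# Hypothetical helper: PCA of samples in pathway-score space
pathway_pca <- function(scores, sample_info) {
  # Rows of t(scores) are samples, columns are pathways; center each pathway
  pca <- prcomp(t(scores), center = TRUE, scale. = FALSE)
  var_explained <- pca$sdev^2 / sum(pca$sdev^2)
  coords <- data.frame(sample_id = rownames(pca$x),
                       PC1 = pca$x[, 1], PC2 = pca$x[, 2])
  coords <- merge(coords, sample_info, by = "sample_id")
  list(coords = coords, var_explained = var_explained)
}

toy_scores <- rbind(
  PATHWAY_A = c(Ctrl_1 = -0.50, Ctrl_2 = -0.40, Treat_1 = 0.50, Treat_2 = 0.60),
  PATHWAY_B = c(Ctrl_1 = -0.30, Ctrl_2 = -0.35, Treat_1 = 0.40, Treat_2 = 0.30),
  PATHWAY_C = c(Ctrl_1 =  0.05, Ctrl_2 = -0.05, Treat_1 = 0.00, Treat_2 = 0.02)
)
toy_info <- data.frame(sample_id = c("Ctrl_1", "Ctrl_2", "Treat_1", "Treat_2"),
                       condition = c("Ctrl", "Ctrl", "Treat", "Treat"))

res <- pathway_pca(toy_scores, toy_info)
res$coords
round(100 * res$var_explained, 1)   # % variance per PC
```

`scale. = FALSE` keeps pathways on their native score scale, so near-constant pathways contribute little. Setting it to `TRUE` would give every pathway equal weight, including noisy ones.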

8. Example interpretation

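Scores for inflammatory response, TNF-α signaling via NF-κB, and interferon response are higher in the Treat group, suggesting that the treatment activates immune and inflammatory pathways.
Unlike enrichment analysis of a DEG list, which gives one result per comparison, GSVA assigns a score to every sample, so pathway activity can be compared sample by sample. GSVA scores remain relative to the other samples in the matrix, because the kernel CDF is estimated across samples; ssGSEA instead ranks genes within each sample independently.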

9. Deliverables

  • GSVA score matrix
  • ssGSEA score matrix
  • Differential pathway table
  • Pathway heatmap
  • Box plots of key pathways
  • Sample-level pathway PCA
  • Pathway mechanism interpretation report