{"title":"CUT&Tag Differential Binding Analysis on a T2T Genome","description":"A workflow for CUT&Tag data covering T2T genome alignment, CPM normalization, SEACR peak calling, DiffBind differential peak analysis, and GO enrichment.","omics_type":"CUT&Tag","category_key":"epigenomics","category_name":"Epigenetic Regulation Experiments","dag_json":{"nodes":[{"id":"qc","label":"Trim Galore/FastQC quality control"},{"id":"index","label":"Bowtie2 T2T index build"},{"id":"mapping","label":"Strict paired-end alignment to the T2T genome"},{"id":"spikein","label":"Spike-in diagnostics"},{"id":"coverage","label":"CPM BedGraph generation"},{"id":"seacr","label":"SEACR stringent peak calling"},{"id":"diffbind","label":"DiffBind differential peaks"},{"id":"annotation","label":"ChIPseeker/GO annotation"}],"edges":[{"source":"qc","target":"index"},{"source":"index","target":"mapping"},{"source":"mapping","target":"spikein"},{"source":"mapping","target":"coverage"},{"source":"coverage","target":"seacr"},{"source":"seacr","target":"diffbind"},{"source":"diffbind","target":"annotation"}]},"metadata_json":{"difficulty":"Beginner","tools":["Trim Galore","Bowtie2","samtools","deepTools","SEACR","DiffBind","DESeq2","ChIPseeker","clusterProfiler"],"inputs":["FASTQ","BAM"],"outputs":["report"],"estimated_time":"0.5-1 day","scenario":"Epigenetic regulation"},"content":"# CUT&Tag Differential Binding Analysis on a T2T Genome\n\n## When to use\nSuitable for CUT&Tag or CUT&RUN chromatin-binding data, especially projects that use a T2T-grade reference genome and need to compare the redistribution of protein binding between treated and control groups.\n\nThis workflow comes from a TOP1 CUT&Tag analysis of Zeocin vs Control. Reads are aligned with Bowtie2 to the Arabidopsis thaliana Col-PEK T2T genome, followed by spike-in diagnostics, CPM normalization, SEACR peak calling, DiffBind differential binding analysis, ChIPseeker annotation, and GO enrichment, in order to interpret the genome-wide redistribution of TOP1 induced by DNA damage.\n\n## Sample design\n| SampleID | Condition | Replicate | BAM | Peak |\n| --- | --- | --- | --- | --- |\n| C1 | Control | 1 | `C1_T2T.sorted.bam` | `C1_seacr.stringent.bed` |\n| C2 | Control | 2 | `C2_T2T.sorted.bam` | `C2_seacr.stringent.bed` |\n| C3 | Control | 3 | `C3_T2T.sorted.bam` | `C3_seacr.stringent.bed` |\n| Z1 | Zeocin | 1 | `Z1_T2T.sorted.bam` | `Z1_seacr.stringent.bed` |\n| Z2 | Zeocin | 2 | `Z2_T2T.sorted.bam` | `Z2_seacr.stringent.bed` |\n| Z3 | Zeocin | 3 | `Z3_T2T.sorted.bam` | `Z3_seacr.stringent.bed` |\n\n## 1. 
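Raw read QC and adapter trimming\n```bash\nWORKDIR=\"/public/home/yhpeng/cut_tag\"\nRAW_DIR=\"${WORKDIR}/01_raw_data\"\nOUT_DIR=\"${WORKDIR}/03_trimmed_data\"\nmkdir -p ${OUT_DIR}\n\n# Trim adapters and low-quality bases (Phred < 20), then run FastQC on the trimmed reads\nfor sample in C1 C2 C3 Z1 Z2 Z3\ndo\n  trim_galore --paired --quality 20 --phred33 --fastqc --cores 4 --gzip \\\n    -o ${OUT_DIR} \\\n    ${RAW_DIR}/${sample}_R1.fq.gz ${RAW_DIR}/${sample}_R2.fq.gz\ndone\n```\n\n## 2. Build the Bowtie2 index for the T2T genome\n```bash\nWORKDIR=\"/public/home/yhpeng/cut_tag\"\nREF_DIR=\"${WORKDIR}/ref_genenic\"\n\n# Build the index once; the output prefix is passed to bowtie2 -x in the next step\nbowtie2-build --threads 4 \\\n  ${REF_DIR}/Arabidopsis_thaliana_Col-PEK.genomic.fa \\\n  ${REF_DIR}/Col_PEK_T2T\n```\n\n## 3. Align to the T2T genome and filter multi-mapping reads\n```bash\nWORKDIR=\"/public/home/yhpeng/cut_tag\"\nTRIM_DIR=\"${WORKDIR}/03_trimmed_data\"\nALN_DIR=\"${WORKDIR}/04_alignment\"\nREF_INDEX=\"${WORKDIR}/ref_genenic/Col_PEK_T2T\"\nmkdir -p ${ALN_DIR}\n\nfor sample in C1 C2 C3 Z1 Z2 Z3\ndo\n  # Keep only concordant pairs with a fragment length of 10-700 bp\n  bowtie2 --end-to-end --very-sensitive --no-mixed --no-discordant --phred33 \\\n    -I 10 -X 700 -p 8 -x ${REF_INDEX} \\\n    -1 ${TRIM_DIR}/${sample}_R1_val_1.fq.gz \\\n    -2 ${TRIM_DIR}/${sample}_R2_val_2.fq.gz \\\n    -S ${ALN_DIR}/${sample}_T2T.sam\n\n  # Keep alignments with MAPQ >= 20 to drop multi-mapping reads, then coordinate-sort\n  samtools view -bS -q 20 -@ 8 ${ALN_DIR}/${sample}_T2T.sam |\n    samtools sort -@ 8 -o ${ALN_DIR}/${sample}_T2T.sorted.bam\n\n  samtools index ${ALN_DIR}/${sample}_T2T.sorted.bam\n  rm ${ALN_DIR}/${sample}_T2T.sam\ndone\n```\n\n## 4. Spike-in diagnostics and normalization strategy\n| Group | Samples | Spike-in rate |\n| --- | --- | --- |\n| Control | C1, C2, C3 | ~6.63% |\n| Zeocin | Z1, Z2, Z3 | ~1.45% |\n\nThe spike-in alignment rate in the Zeocin-treated group is roughly 4.6-fold lower than in the control group (6.63% vs 1.45%), while the number of reads aligned to the Arabidopsis genome rises markedly. This suggests that the amount of genuine target sequence increased substantially in the treated group, so the exogenous spike-in was physically diluted.\n\n**Key decision:** For a CUT&Tag experiment with such a drastic global change, do not apply the exogenous spike-in scale factor directly to scale up the treated-group signal, because doing so tends to produce genome-wide background false positives. This workflow instead relies on CPM or DESeq2's internal relative quantification and focuses on finding specifically redistributed targets.\n\n## 5. 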
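Generate CPM-normalized BedGraph files\n```bash\nWORKDIR=\"/public/home/yhpeng/cut_tag\"\nALN_DIR=\"${WORKDIR}/04_alignment\"\nPEAK_DIR=\"${WORKDIR}/05_peaks\"\nmkdir -p ${PEAK_DIR}\n\n# Write one CPM-normalized coverage track per sample in 10 bp bins\nfor sample in C1 C2 C3 Z1 Z2 Z3\ndo\n  bamCoverage -b ${ALN_DIR}/${sample}_T2T.sorted.bam \\\n    -o ${PEAK_DIR}/${sample}.bedgraph --outFileFormat bedgraph \\\n    --normalizeUsing CPM --binSize 10 -p 8\ndone\n```\n\n## 6. SEACR peak calling without an IgG control\nSEACR is well suited to the narrow-peak, sparse-signal profile of CUT&Tag. For experiments without an IgG control, a numeric threshold takes the place of the control track, and stringent mode yields a cleaner set of candidate peaks.\n\n```bash\nwget https://github.com/FredHutch/SEACR/raw/master/SEACR_1.3.sh\nwget https://github.com/FredHutch/SEACR/raw/master/SEACR_1.3.R\n\n# Run from the directory holding the BedGraph files; 0.01 keeps the top 1% of regions by signal, \"non\" skips SEACR normalization because the input is already CPM-normalized\nfor sample in C1 C2 C3 Z1 Z2 Z3; do\n  bash SEACR_1.3.sh ${sample}.bedgraph 0.01 non stringent ${sample}_seacr\ndone\n```\n\n## 7. DiffBind differential peak quantification\n```r\nlibrary(DiffBind)\n\n# sample_sheet.csv: one row per sample; DiffBind reads the BAM path from the bamReads column and the SEACR BED path from the Peaks column\ncut_tag <- dba(sampleSheet = \"sample_sheet.csv\")\n# Count reads in the consensus peak set\ncut_tag <- dba.count(cut_tag, bUseSummarizeOverlaps = TRUE)\n# Contrast the two conditions, then run the differential analysis\ncut_tag <- dba.contrast(cut_tag, categories = DBA_CONDITION, minMembers = 2)\ncut_tag <- dba.analyze(cut_tag)\n\n# Export peaks with FDR < 0.05 from the DESeq2 results\nres_deseq <- dba.report(cut_tag, method = DBA_DESEQ2, contrast = 1, th = 0.05)\nwrite.csv(as.data.frame(res_deseq), file = \"Zeocin_vs_Control_DiffPeaks.csv\")\n\ndba.plotPCA(cut_tag, DBA_CONDITION, label = DBA_ID)\ndba.plotVolcano(cut_tag)\n```\n\n## 8. 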
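T2T annotation, TAIR ID conversion, and GO enrichment\n```r\nlibrary(ChIPseeker)\nlibrary(GenomicFeatures)\nlibrary(GenomicRanges)\nlibrary(txdbmaker)\n\n# Build a TxDb from the Col-PEK T2T annotation so peaks are annotated in T2T coordinates\ntxdb <- makeTxDbFromGFF(\"Arabidopsis_thaliana_Col-PEK.genomic.gff\", format = \"gff3\")\ndiff_peaks <- read.csv(\"Zeocin_vs_Control_DiffPeaks.csv\")\n\n# Rebuild a GRanges object from the DiffBind report, keeping Fold and FDR as metadata\npeak_gr <- GRanges(\n  seqnames = diff_peaks$seqnames,\n  ranges = IRanges(start = diff_peaks$start, end = diff_peaks$end),\n  strand = diff_peaks$strand\n)\nmcols(peak_gr) <- diff_peaks[, c(\"Fold\", \"FDR\")]\n\n# Annotate each peak relative to the nearest gene; promoters are TSS +/- 3 kb\npeak_anno <- annotatePeak(peak_gr, tssRegion = c(-3000, 3000), TxDb = txdb)\nwrite.csv(as.data.frame(peak_anno), \"Zeocin_vs_Control_Annotated_DiffPeaks.csv\", row.names = FALSE)\n```\n\n```r\nlibrary(clusterProfiler)\nlibrary(org.At.tair.db)\n\n# Input: annotated differential peaks whose gene IDs have already been converted to TAIR IDs (tairId column)\ndiff_genes <- read.csv(\"Zeocin_vs_Control_TAIR_Ready.csv\")\n# Keep only well-formed nuclear TAIR locus IDs such as AT1G01010\ndiff_genes <- diff_genes[grep(\"^AT[1-5]G[0-9]{5}$\", diff_genes$tairId), ]\n\n# Split genes by the direction of the binding change\ngenes_up <- unique(diff_genes$tairId[diff_genes$Fold > 0])\ngenes_down <- unique(diff_genes$tairId[diff_genes$Fold < 0])\n\ngo_up <- enrichGO(gene = genes_up, OrgDb = org.At.tair.db, keyType = \"TAIR\", ont = \"BP\")\ngo_down <- enrichGO(gene = genes_down, OrgDb = org.At.tair.db, keyType = \"TAIR\", ont = \"BP\")\n```\n\n## Biological interpretation\nUnder normal conditions, TOP1 occupancy is concentrated at promoters and actively transcribed regions, forming sharp local peaks. After Zeocin-induced DNA damage, TOP1 is redistributed genome-wide: more reads map to the Arabidopsis genome, but isolated local peaks may become fewer, giving a broad, “plateau-like” binding pattern.\n\n## Main outputs\n- Trim Galore/FastQC quality control reports\n- BAM files aligned to the T2T genome, with indexes\n- Spike-in diagnostic statistics\n- CPM-normalized BedGraph files\n- SEACR peak BED files\n- DiffBind differential peak table\n- ChIPseeker annotation results\n- TAIR ID conversion table and GO enrichment plots\n","id":22,"created_at":"2026-06-03T17:48:59.764103Z"}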