Created: 2026/6/3 · Difficulty: Advanced · Recommended scenario: Transcriptome analysis · Estimated time: 3-5 days
Pipeline Detail
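RNA-Seq: Transcriptome Structure and Networks
RNA editing analysis: A-to-I editing
An A-to-I RNA editing analysis pipeline for neural, tumor, and immune RNA-seq data, covering DNA/RNA discrimination, REDItools/GIREMI, site filtering, SnpEff/ANNOVAR annotation, and functional interpretation.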
Metadata
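Pipeline metadata
Review the application scenario, inputs and outputs, and tool dependencies before moving on to the command details in the main text.
Difficulty
Advanced
Scenario
Transcriptome analysis
Estimated Time
3-5 days
Tools
STAR, REDItools, GIREMI, SnpEff, ANNOVAR
Inputs
FASTQ, BAM, VCF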
Outputs
report
Workflow DAG
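Flowchart
The step nodes give a quick view of how the analysis flows from raw data to the final report.
STEP 1: Set up the RNA editing project
STEP 2: RNA-seq alignment and preprocessing
STEP 3: DNA/SNP data filtering
STEP 4: REDItools calling
STEP 5: GIREMI candidate sites
STEP 6: A-to-I site filtering
STEP 7: SnpEff/ANNOVAR annotation
STEP 8: Editing level visualization
STEP 9: Functional interpretation report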
Protocol
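Pipeline documentation
The main text keeps its Markdown layout, code language labels, and table styling, so you can reproduce the steps as you learn.
RNA editing analysis: A-to-I editing
1. Project directory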
mkdir -p rna_editing_project/{00_metadata,01_alignment,02_variant_filter,03_reditools,04_giremi,05_annotation,06_plots,report}
2. Example data
00_metadata/sample_info.csv:
sample_id,group,rna_bam,dna_vcf
Brain_1,Case,01_alignment/Brain_1.rna.bam,02_variant_filter/Brain_1.dna.vcf.gz
Brain_2,Control,01_alignment/Brain_2.rna.bam,02_variant_filter/Brain_2.dna.vcf.gz
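Every later step takes its paths from this sheet, so it can be worth checking it before running anything. The sketch below is one possible check using only the Python standard library. The function name `validate_sample_sheet` and the rule that `sample_id` values must be unique are assumptions made for illustration, not part of the original pipeline.

```python
import csv
import io

REQUIRED_COLUMNS = ["sample_id", "group", "rna_bam", "dna_vcf"]


def validate_sample_sheet(handle):
    """Parse a sample sheet and raise ValueError on obvious problems."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    rows = list(reader)
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        empty = [col for col in REQUIRED_COLUMNS if not (row[col] or "").strip()]
        if empty:
            raise ValueError(f"line {line_no}: empty fields {empty}")

    ids = [row["sample_id"] for row in rows]
    if len(ids) != len(set(ids)):
        raise ValueError("sample_id values must be unique")
    return rows


# Inline copy of the example sheet so the sketch runs without any files.
EXAMPLE_SHEET = """\
sample_id,group,rna_bam,dna_vcf
Brain_1,Case,01_alignment/Brain_1.rna.bam,02_variant_filter/Brain_1.dna.vcf.gz
Brain_2,Control,01_alignment/Brain_2.rna.bam,02_variant_filter/Brain_2.dna.vcf.gz
"""

rows = validate_sample_sheet(io.StringIO(EXAMPLE_SHEET))
groups = sorted({row["group"] for row in rows})
print(len(rows), "samples in groups", groups)
```

In a real project the same function could be called on `open("00_metadata/sample_info.csv", newline="")` instead of the inline string.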
Example candidate editing sites:
chrom,pos,ref,alt,gene,editing_level,coverage
chr1,100100,A,G,GRIA2,0.42,86
chr2,200500,T,C,AZIN1,0.31,65
3. Overall workflow diagram
flowchart TD
A[RNA-seq FASTQ] --> B[STAR/HISAT2 alignment]
B --> C[sorted BAM + duplicates/QC]
C --> D[REDItools / GIREMI calling]
E[matched DNA VCF or SNP database] --> F[remove genomic SNPs]
D --> F
F --> G[keep A-to-G / T-to-C candidates]
G --> H[coverage/editing level filters]
H --> I[SnpEff / ANNOVAR annotation]
I --> J[group comparison and plots]
J --> K[editing report]
4. RNA-seq alignment
STAR --runThreadN 16 --genomeDir ref/star_index --readFilesIn sample_R1.fq.gz sample_R2.fq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --outFileNamePrefix 01_alignment/Brain_1_
samtools index 01_alignment/Brain_1_Aligned.sortedByCoord.out.bam
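5. REDItools calling
python REDItoolDnaRna.py -i 01_alignment/Brain_1_Aligned.sortedByCoord.out.bam -f ref/genome.fa -o 03_reditools/Brain_1 -t 12 -m 20,20 -q 25,25 -c 10,10
Common filter options (REDItoolDnaRna.py takes each threshold as a DNA,RNA pair, so the second value is the one applied to the RNA-seq BAM):
| Option | Meaning |
|---|---|
| -m | Minimum mapping quality |
| -q | Minimum base quality |
| -c | Minimum coverage |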
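The filtering step later in this document reads a table with the columns `chrom`, `pos`, `ref`, `alt`, `editing_level` and `coverage`, which is not the layout REDItools writes. The sketch below shows one way to bridge the two. It assumes the REDItools 1.x output layout (tab-delimited, with `Region`, `Position`, `Reference`, `Strand`, a coverage column, `MeanQ`, `BaseCount[A,C,G,T]`, `AllSubs`, `Frequency` as the first nine columns); check this against your own output before relying on it. The function name `normalize_reditools`, the toy rows, and the decision to drop sites with more than one substitution type are all illustrative choices.

```python
import csv
import io
import json

BASES = "ACGT"  # order of the counts in the BaseCount[A,C,G,T] column


def normalize_reditools(handle):
    """Yield one dict per site that carries exactly one substitution type."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header line
    for fields in reader:
        chrom, pos, ref = fields[0], int(fields[1]), fields[2].upper()
        counts = dict(zip(BASES, json.loads(fields[6])))  # "[50, 0, 36, 0]"
        subs = fields[7].split()  # e.g. ["AG"], ["AG", "AC"] or ["-"]
        if len(subs) != 1 or subs[0] == "-" or subs[0][0] != ref:
            continue
        alt = subs[0][1]
        coverage = sum(counts.values())
        if coverage == 0:
            continue
        yield {
            "chrom": chrom,
            "pos": pos,
            "ref": ref,
            "alt": alt,
            "editing_level": round(counts[alt] / coverage, 4),
            "coverage": coverage,
        }


# Toy rows in the assumed layout (values invented for illustration).
TOY_TABLE = (
    "Region\tPosition\tReference\tStrand\tCoverage-q25\tMeanQ\t"
    "BaseCount[A,C,G,T]\tAllSubs\tFrequency\n"
    "chr1\t100100\tA\t2\t86\t37.50\t[50, 0, 36, 0]\tAG\t0.42\n"
    "chr1\t100200\tC\t2\t40\t36.10\t[0, 40, 0, 0]\t-\t0.00\n"
    "chr2\t200500\tT\t2\t65\t38.00\t[0, 20, 0, 45]\tTC\t0.31\n"
)

sites = list(normalize_reditools(io.StringIO(TOY_TABLE)))
for site in sites:
    print(site)
```

Here the editing level is recomputed from the base counts as alt reads over all counted reads, matching the definition used in the results table of this document, rather than being copied from the `Frequency` column.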
6. GIREMI approach
GIREMI is commonly used when matched DNA is unavailable: it infers editing sites from RNA-seq data alone, using allelic linkage between variants.
GIREMI -f ref/genome.fa -l known_snp.vcf -o 04_giremi/Brain_1.giremi.txt 01_alignment/Brain_1_Aligned.sortedByCoord.out.bam
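7. A-to-I site filtering
Inosine is read as guanosine during sequencing, so A-to-I editing appears as A>G for transcripts on the plus strand and usually as T>C (relative to the reference genome) for transcripts on the minus strand.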
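The strand rule above can be written as a small predicate. This is a minimal sketch that assumes `ref` and `alt` are reported relative to the reference genome's plus strand and that the transcript strand, when known, comes from gene annotation or a stranded library. The function name `is_a_to_i` is hypothetical.

```python
def is_a_to_i(ref, alt, strand=None):
    """Return True if a reference-relative substitution is consistent with A-to-I.

    strand is '+', '-', or None when the transcript strand is unknown
    (for example, an unstranded library with no gene annotation).
    """
    change = (ref.upper(), alt.upper())
    if strand == "+":
        return change == ("A", "G")
    if strand == "-":
        return change == ("T", "C")
    # Strand unknown: accept either pattern, as the filter in this section does.
    return change in {("A", "G"), ("T", "C")}


print(is_a_to_i("A", "G"))        # strand unknown
print(is_a_to_i("T", "C", "+"))   # T>C on a plus-strand transcript
print(is_a_to_i("T", "C", "-"))   # T>C on a minus-strand transcript
```

When strand information is available, passing it in removes T>C calls on plus-strand genes and A>G calls on minus-strand genes, which cannot be explained by A-to-I editing of that transcript.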
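import pandas as pd

# Expects a tab-delimited table with the columns
# chrom, pos, ref, alt, editing_level, coverage.
df = pd.read_csv("03_reditools/Brain_1/reditools_table.tsv", sep="\t")

# Keep well-covered sites whose substitution is consistent with A-to-I editing.
filtered = df[
    (df["coverage"] >= 10)
    & (df["editing_level"] >= 0.05)
    & (
        ((df["ref"] == "A") & (df["alt"] == "G"))
        | ((df["ref"] == "T") & (df["alt"] == "C"))
    )
].copy()

# Anti-join against known genomic SNP positions: rows marked "left_only"
# have no match in the SNP list and are kept.
known_snp = pd.read_csv("02_variant_filter/known_snp_sites.tsv", sep="\t")
filtered = filtered.merge(
    known_snp[["chrom", "pos"]].drop_duplicates(),
    on=["chrom", "pos"],
    how="left",
    indicator=True,
)
filtered = filtered[filtered["_merge"] == "left_only"].drop(columns="_merge")

filtered.to_csv("03_reditools/Brain_1_AtoI_filtered.tsv", sep="\t", index=False)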
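The filter above reads `02_variant_filter/known_snp_sites.tsv`, but no step in this document writes that file. One way to produce it from a matched DNA VCF (the `dna_vcf` column of the sample sheet) is sketched below with the standard library only. The function name `vcf_to_sites`, the toy VCF records, and the choice to keep only single-base REF records are assumptions made for illustration. Chromosome names in the VCF must use the same convention (`chr1` versus `1`) as the editing table for the join to match.

```python
import csv
import gzip
import os
import tempfile


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def vcf_to_sites(vcf_path, out_path):
    """Write unique (chrom, pos) pairs of single-base REF records; return the count."""
    seen = set()
    with open_text(vcf_path) as vcf, open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerow(["chrom", "pos"])
        for line in vcf:
            if line.startswith("#"):
                continue  # meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            chrom, pos, ref = fields[0], fields[1], fields[3]
            if len(ref) != 1 or (chrom, pos) in seen:
                continue  # skip deletions/complex records and duplicates
            seen.add((chrom, pos))
            writer.writerow([chrom, pos])
    return len(seen)


# Toy VCF (invented records) so the sketch runs without project files.
TOY_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100300\t.\tA\tG\t50\tPASS\t.\n"
    "chr1\t100400\t.\tAT\tA\t50\tPASS\t.\n"
    "chr2\t200700\t.\tT\tC\t50\tPASS\t.\n"
)

with tempfile.TemporaryDirectory() as tmp:
    vcf_path = os.path.join(tmp, "toy.dna.vcf.gz")
    out_path = os.path.join(tmp, "known_snp_sites.tsv")
    with gzip.open(vcf_path, "wt") as fh:
        fh.write(TOY_VCF)
    n_sites = vcf_to_sites(vcf_path, out_path)
    with open(out_path) as fh:
        lines = fh.read().splitlines()

print(n_sites, lines)
```

The same function could be pointed at a population SNP database in VCF format when no matched DNA is available, which corresponds to the "matched DNA VCF or SNP database" branch of the workflow diagram.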
8. SnpEff/ANNOVAR annotation
java -jar snpEff.jar GRCh38.99 03_reditools/Brain_1_AtoI_filtered.vcf > 05_annotation/Brain_1_AtoI.snpeff.vcf
ANNOVAR:
table_annovar.pl Brain_1_AtoI.avinput humandb/ -buildver hg38 -out 05_annotation/Brain_1_AtoI -remove -protocol refGene,avsnp150,dbnsfp42a -operation g,f,f -nastring .
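The two commands above take `Brain_1_AtoI_filtered.vcf` and `Brain_1_AtoI.avinput`, while the filtering step only writes a TSV. A minimal sketch of the conversion follows, assuming the TSV has the columns of the example candidate table. The `EL` and `DP` INFO keys and both function names are made up for this example. The avinput lines use the five leading columns ANNOVAR expects (chromosome, start, end, reference allele, alternate allele).

```python
import csv
import io

VCF_HEADER = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=EL,Number=1,Type=Float,Description="RNA editing level">',
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="RNA read coverage">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]


def to_vcf_lines(sites):
    """Build a sites-only VCF (no genotype columns) as a list of lines."""
    lines = list(VCF_HEADER)
    for s in sites:
        info = f"EL={s['editing_level']};DP={s['coverage']}"
        lines.append(
            "\t".join([s["chrom"], s["pos"], ".", s["ref"], s["alt"], ".", ".", info])
        )
    return lines


def to_avinput_lines(sites):
    """Build ANNOVAR input lines; for a single-base change start equals end."""
    return [
        "\t".join([s["chrom"], s["pos"], s["pos"], s["ref"], s["alt"]])
        for s in sites
    ]


# Inline stand-in for 03_reditools/Brain_1_AtoI_filtered.tsv.
FILTERED_TSV = (
    "chrom\tpos\tref\talt\tgene\tediting_level\tcoverage\n"
    "chr1\t100100\tA\tG\tGRIA2\t0.42\t86\n"
    "chr2\t200500\tT\tC\tAZIN1\t0.31\t65\n"
)

sites = list(csv.DictReader(io.StringIO(FILTERED_TSV), delimiter="\t"))
vcf_lines = to_vcf_lines(sites)
avinput_lines = to_avinput_lines(sites)
print("\n".join(vcf_lines))
print("\n".join(avinput_lines))
```

Writing each list to disk with `"\n".join(lines) + "\n"` would give the two input files. Chromosome naming again has to match the annotation database in use.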
9. Editing level visualization
library(tidyverse)
edit <- read.delim("03_reditools/Brain_1_AtoI_filtered.tsv")
ggplot(edit, aes(editing_level)) +
  geom_histogram(bins = 40, fill = "#2c7fb8", color = "white") +
  theme_bw() +
  labs(x = "Editing level", y = "Site count")
Group comparison:
editing_matrix <- read.csv("editing_level_matrix.csv", row.names = 1)
sample_info <- read.csv("00_metadata/sample_info.csv")
site <- "chr1:100100"
plot_df <- data.frame(
  editing_level = as.numeric(editing_matrix[site, sample_info$sample_id]),
  group = sample_info$group
)
ggplot(plot_df, aes(group, editing_level, fill = group)) +
  geom_boxplot(width = 0.5) +
  geom_jitter(width = 0.08, size = 2) +
  theme_bw()
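The group comparison above reads `editing_level_matrix.csv`, with sites as rows (named like `chr1:100100`) and samples as columns, but the document does not show how that file is assembled. The sketch below is one possible approach in plain Python. The function names are hypothetical and the Brain_2 value is invented for illustration.

```python
import csv
import io


def build_editing_matrix(per_sample):
    """per_sample maps sample_id -> {(chrom, pos): editing_level}."""
    sample_ids = list(per_sample)
    sites = sorted({site for levels in per_sample.values() for site in levels})
    rows = []
    for chrom, pos in sites:
        row = [f"{chrom}:{pos}"]
        for sample_id in sample_ids:
            level = per_sample[sample_id].get((chrom, pos))
            # A missing site is written as NA, not 0: it may be unedited
            # or simply too poorly covered to pass the filters.
            row.append("NA" if level is None else level)
        rows.append(row)
    return ["site"] + sample_ids, rows


def matrix_to_csv(header, rows):
    """Serialize the matrix as CSV text with the site ID in the first column."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


per_sample = {
    "Brain_1": {("chr1", 100100): 0.42, ("chr2", 200500): 0.31},
    "Brain_2": {("chr1", 100100): 0.18},  # invented value
}
header, rows = build_editing_matrix(per_sample)
matrix_csv = matrix_to_csv(header, rows)
print(matrix_csv)
```

Keeping absent sites as `NA` lets R treat them as missing values; whether a site with adequate coverage and no edited reads should instead be recorded as 0 is a decision to make per project.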
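10. Guide to interpreting results
| Metric | Interpretation |
|---|---|
| coverage | Number of reads covering the site |
| editing level | alt reads / total reads |
| A>G / T>C | Typical signature of A-to-I editing |
| nonsynonymous | May change the protein sequence |
| Alu/repeat region | Regions where human A-to-I editing is common |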
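To make the first two rows of the table concrete, the sketch below computes an editing level from read counts and attaches a Wilson score interval, which illustrates why the same editing level is less certain at low coverage. The interval is an addition for illustration and not something the pipeline itself reports. The count of 36 edited reads out of 86 is an assumed value chosen so that the level rounds to the 0.42 shown for chr1:100100 in the example data.

```python
import math


def editing_level(alt_reads, total_reads):
    """Editing level as defined in the table: alt reads / total reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def wilson_interval(alt_reads, total_reads, z=1.96):
    """Approximate 95% Wilson score interval for the editing level."""
    p = editing_level(alt_reads, total_reads)
    n = total_reads
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


level = editing_level(36, 86)
low, high = wilson_interval(36, 86)
low10, high10 = wilson_interval(4, 10)  # similar level, much lower coverage

print(f"coverage 86: level={level:.3f}, interval=({low:.3f}, {high:.3f})")
print(f"coverage 10: level={4 / 10:.3f}, interval=({low10:.3f}, {high10:.3f})")
```

The wider interval at coverage 10 is the practical reason for applying a minimum-coverage filter before comparing editing levels between samples.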
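11. Deliverables
- RNA BAM QC
- Raw REDItools/GIREMI candidate tables
- A-to-I editing site table after SNP removal
- Editing level matrix
- SnpEff/ANNOVAR annotation tables
- Differentially edited sites between groups
- Editing level distribution plots and box plots
- IGV screenshots of key sites and the report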