
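RNA-Seq transcriptome structure and networks

RNA editing analysis: A-to-I editing

An A-to-I RNA editing analysis pipeline for neuroscience, cancer and immunology RNA-seq. It covers separating DNA variants from RNA-level changes, REDItools/GIREMI calling, site filtering, SnpEff/ANNOVAR annotation and functional interpretation.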

Created: 2026/6/3
Difficulty: Advanced
Recommended scenario: Transcriptome analysis
Estimated time: 3-5 days

Pipeline metadata

Check the use case, inputs and outputs, and tool dependencies first, then move on to the command details.


Tools

STAR, REDItools, GIREMI, SnpEff, ANNOVAR

Inputs

FASTQ, BAM, VCF

Outputs

report

Workflow DAG

The step nodes give a quick overview of how the analysis flows from raw data to the final report.

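STEP 1: Set up the RNA editing project
STEP 2: RNA-seq alignment and preprocessing
STEP 3: DNA/SNP data filtering
STEP 4: REDItools calling
STEP 5: GIREMI candidate sites
STEP 6: A-to-I site filtering
STEP 7: SnpEff/ANNOVAR annotation
STEP 8: Editing level visualization
STEP 9: Functional interpretation report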

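Protocol

The protocol below keeps its Markdown layout, code language tags and table styles, so it can be followed step by step while reproducing the analysis.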


1. Project directory

mkdir -p rna_editing_project/{00_metadata,01_alignment,02_variant_filter,03_reditools,04_giremi,05_annotation,06_plots,report}

2. Example data

00_metadata/sample_info.csv

sample_id,group,rna_bam,dna_vcf
Brain_1,Case,01_alignment/Brain_1.rna.bam,02_variant_filter/Brain_1.dna.vcf.gz
Brain_2,Control,01_alignment/Brain_2.rna.bam,02_variant_filter/Brain_2.dna.vcf.gz
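Before running anything, it can help to check that the sample sheet has the columns later steps rely on and no duplicated sample IDs. A minimal sketch using only the Python standard library; the function name `load_sample_sheet` and the optional file-existence check are illustrative assumptions, not part of the original pipeline.

```python
import csv
import io
from pathlib import Path

# Columns the rest of the pipeline expects in 00_metadata/sample_info.csv
REQUIRED_COLUMNS = {"sample_id", "group", "rna_bam", "dna_vcf"}


def load_sample_sheet(handle, check_files=False):
    """Read the sample sheet from an open text handle and run basic checks."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"sample sheet is missing columns: {sorted(missing)}")
    rows = list(reader)
    ids = [row["sample_id"] for row in rows]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate sample_id values in the sample sheet")
    if check_files:
        # Optional: confirm every listed BAM/VCF exists relative to the project root
        for row in rows:
            for key in ("rna_bam", "dna_vcf"):
                if not Path(row[key]).exists():
                    raise FileNotFoundError(f"{row['sample_id']}: {row[key]} not found")
    return rows


# The two example rows above, checked in memory without touching disk
example = io.StringIO(
    "sample_id,group,rna_bam,dna_vcf\n"
    "Brain_1,Case,01_alignment/Brain_1.rna.bam,02_variant_filter/Brain_1.dna.vcf.gz\n"
    "Brain_2,Control,01_alignment/Brain_2.rna.bam,02_variant_filter/Brain_2.dna.vcf.gz\n"
)
samples = load_sample_sheet(example)
groups = {row["sample_id"]: row["group"] for row in samples}
```

On the real project, open `00_metadata/sample_info.csv` with `newline=""` and pass `check_files=True` once the BAM and VCF files are in place.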

Example candidate editing sites:

chrom,pos,ref,alt,gene,editing_level,coverage
chr1,100100,A,G,GRIA2,0.42,86
chr2,200500,T,C,AZIN1,0.31,65

3. Overall workflow

flowchart TD
    A[RNA-seq FASTQ] --> B[STAR/HISAT2 alignment]
    B --> C[sorted BAM + duplicates/QC]
    C --> D[REDItools / GIREMI calling]
    E[matched DNA VCF or SNP database] --> F[remove genomic SNPs]
    D --> F
    F --> G[keep A-to-G / T-to-C candidates]
    G --> H[coverage/editing level filters]
    H --> I[SnpEff / ANNOVAR annotation]
    I --> J[group comparison and plots]
    J --> K[editing report]

4. RNA-seq alignment

STAR \
  --runThreadN 16 \
  --genomeDir ref/star_index \
  --readFilesIn sample_R1.fq.gz sample_R2.fq.gz \
  --readFilesCommand zcat \
  --outSAMtype BAM SortedByCoordinate \
  --outFileNamePrefix 01_alignment/Brain_1_

samtools index 01_alignment/Brain_1_Aligned.sortedByCoord.out.bam

5. REDItools calling

python REDItoolDnaRna.py \
  -i 01_alignment/Brain_1_Aligned.sortedByCoord.out.bam \
  -f ref/genome.fa \
  -o 03_reditools/Brain_1 \
  -t 12 \
  -m 20 \
  -q 25 \
  -c 10

Meaning of the common filter parameters:

| Parameter | Meaning |
|---|---|
| -m | Minimum mapping quality |
| -q | Minimum base quality |
| -c | Minimum coverage |
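The filtering script in section 7 reads a simplified table with `chrom`, `pos`, `ref`, `alt`, `coverage` and `editing_level` columns, which is not REDItools' native layout. A minimal conversion sketch, assuming REDItools 1.x-style output columns named `Region`, `Position`, `Reference` and `BaseCount[A,C,G,T]` (verify these against the header of your own output table). Taking coverage as the sum of the four base counts, and the alternative allele as the most frequent non-reference base, are simplifying choices made here.

```python
import csv

BASES = "ACGT"
SITE_COLUMNS = ["chrom", "pos", "ref", "alt", "coverage", "editing_level"]


def reditools_row_to_site(row):
    """Turn one REDItools output row into the simplified per-site record.

    Assumes REDItools 1.x-style columns 'Region', 'Position', 'Reference'
    and 'BaseCount[A,C,G,T]' (e.g. "[50, 0, 36, 0]").
    """
    counts = [int(x) for x in row["BaseCount[A,C,G,T]"].strip("[] ").split(",")]
    by_base = dict(zip(BASES, counts))
    ref = row["Reference"].upper()
    coverage = sum(counts)
    # Most frequent non-reference base; ties resolve in A, C, G, T order
    alt = max((b for b in BASES if b != ref), key=lambda b: by_base[b])
    alt_count = by_base[alt]
    return {
        "chrom": row["Region"],
        "pos": int(row["Position"]),
        "ref": ref,
        "alt": alt if alt_count > 0 else ".",
        "coverage": coverage,
        "editing_level": round(alt_count / coverage, 4) if coverage else 0.0,
    }


def convert_reditools_table(in_path, out_path):
    """Rewrite a tab-separated REDItools table with the simplified columns."""
    with open(in_path, newline="") as fin, open(out_path, "w", newline="") as fout:
        reader = csv.DictReader(fin, delimiter="\t")
        writer = csv.DictWriter(fout, fieldnames=SITE_COLUMNS, delimiter="\t")
        writer.writeheader()
        for row in reader:
            writer.writerow(reditools_row_to_site(row))


# Usage (the input path is a placeholder for your own REDItools output table):
# convert_reditools_table("path/to/reditools_output_table",
#                         "03_reditools/Brain_1/reditools_table.tsv")
```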

6. GIREMI approach

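GIREMI is typically used when matched DNA is unavailable: it infers editing sites from RNA-seq data alone, combining allelic linkage between nearby variants with other RNA-seq information.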

GIREMI \
  -f ref/genome.fa \
  -l known_snp.vcf \
  -o 04_giremi/Brain_1.giremi.txt \
  01_alignment/Brain_1_Aligned.sortedByCoord.out.bam
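The allelic-linkage idea can be illustrated with the mutual information between the alleles observed at two positions on the same reads. As we understand GIREMI's rationale, a heterozygous SNP stays in tight linkage with other heterozygous SNPs on the same haplotype (high mutual information), whereas an editing site is edited independently on each RNA molecule, so its alleles are only weakly associated with nearby SNP alleles (low mutual information). This sketch only computes that statistic on toy read pairs; it is not GIREMI's implementation, which also models other site features.

```python
import math
from collections import Counter


def allele_mutual_information(pairs):
    """Mutual information (in nats) between the alleles seen at two positions.

    pairs: one (allele_at_site_1, allele_at_site_2) tuple per read that
    covers both positions.
    """
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    first = Counter(a for a, _ in pairs)
    second = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((first[a] / n) * (second[b] / n)))
    return mi


# Toy reads: a heterozygous SNP (C/T) next to a second heterozygous SNP (A/G)
# on the same haplotypes, so the alleles always travel together
linked = [("C", "A")] * 10 + [("T", "G")] * 10
# Toy reads: the same SNP next to an editing site, where edited (G) and
# unedited (A) reads occur equally on both haplotypes
unlinked = [("C", "A"), ("C", "G"), ("T", "A"), ("T", "G")] * 5

print(round(allele_mutual_information(linked), 3))    # 0.693 (= ln 2)
print(round(allele_mutual_information(unlinked), 3))  # 0.0
```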

7. A-to-I site filtering

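Inosine is read as guanosine during sequencing, so A-to-I editing shows up as an A>G mismatch. Reported on the reference forward strand, it appears as A>G for plus-strand transcripts and as T>C for minus-strand transcripts.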
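When the transcript strand is known (from a stranded library or a gene annotation), the check can be made strand-aware instead of accepting both patterns everywhere. A minimal sketch; the function name `is_a_to_i` and how the strand is obtained are assumptions for illustration.

```python
# Reference-strand (ref, alt) pair expected for A-to-I editing on each transcript strand
A_TO_I_BY_STRAND = {"+": ("A", "G"), "-": ("T", "C")}


def is_a_to_i(ref, alt, strand=None):
    """True if a mismatch, reported on the reference forward strand, fits A-to-I editing.

    strand is the strand of the transcript covering the site ("+" or "-");
    None means unknown.
    """
    pair = (ref.upper(), alt.upper())
    if strand in A_TO_I_BY_STRAND:
        return pair == A_TO_I_BY_STRAND[strand]
    # Strand unknown (e.g. unstranded library): accept either pattern, as the
    # pandas filter below does; this is less specific than a strand-aware check
    return pair in A_TO_I_BY_STRAND.values()


# The two example candidate sites from section 2 (strands assumed for illustration)
print(is_a_to_i("A", "G", "+"))  # True
print(is_a_to_i("T", "C", "-"))  # True
print(is_a_to_i("T", "C", "+"))  # False
```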

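import pandas as pd

# Simplified per-site table: chrom, pos, ref, alt, coverage, editing_level
df = pd.read_csv("03_reditools/Brain_1/reditools_table.tsv", sep="\t")

# Keep well-covered sites that match the A-to-I signature
# (A>G on plus-strand transcripts, T>C on minus-strand transcripts)
filtered = df[
    (df["coverage"] >= 10)
    & (df["editing_level"] >= 0.05)
    & (
        ((df["ref"] == "A") & (df["alt"] == "G"))
        | ((df["ref"] == "T") & (df["alt"] == "C"))
    )
].copy()

# Remove sites at known genomic SNP positions (anti-join on chrom + pos).
# drop_duplicates() stops the merge from duplicating rows, and casting pos
# to int gives both tables the same key dtype.
known_snp = pd.read_csv("02_variant_filter/known_snp_sites.tsv", sep="\t")
snp_keys = known_snp[["chrom", "pos"]].drop_duplicates().astype({"pos": int})
filtered = filtered.astype({"pos": int}).merge(
    snp_keys,
    on=["chrom", "pos"],
    how="left",
    indicator=True,
)
filtered = filtered[filtered["_merge"] == "left_only"].drop(columns="_merge")

filtered.to_csv("03_reditools/Brain_1_AtoI_filtered.tsv", sep="\t", index=False)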
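The group comparison later in this protocol reads `editing_level_matrix.csv` (site IDs as row names, one column per sample), but none of the commands above create it. A minimal standard-library sketch, assuming every sample has been filtered into its own `*_AtoI_filtered.tsv` in the same way as Brain_1; site IDs follow the `chrom:pos` form used in the R code (e.g. `chr1:100100`). A site missing from a sample is written as `NA`, meaning "did not pass the filters in that sample", not "zero editing".

```python
import csv


def collect_levels(sample_rows):
    """Map each site id "chrom:pos" to {sample_id: editing_level}.

    sample_rows: {sample_id: iterable of dicts with chrom, pos, editing_level}
    """
    levels = {}
    for sample_id, rows in sample_rows.items():
        for row in rows:
            site = f"{row['chrom']}:{row['pos']}"
            levels.setdefault(site, {})[sample_id] = float(row["editing_level"])
    return levels


def matrix_rows(levels, samples):
    """Yield the header and one row per site; sites missing in a sample get "NA"."""
    yield ["site"] + list(samples)
    for site in sorted(levels):  # lexicographic order, not genomic order
        yield [site] + [levels[site].get(s, "NA") for s in samples]


def write_editing_matrix(sample_paths, out_path):
    """Read each sample's filtered TSV and write a site x sample CSV matrix."""
    sample_rows = {}
    for sample_id, path in sample_paths.items():
        with open(path, newline="") as fh:
            sample_rows[sample_id] = list(csv.DictReader(fh, delimiter="\t"))
    levels = collect_levels(sample_rows)
    with open(out_path, "w", newline="") as fh:
        csv.writer(fh).writerows(matrix_rows(levels, list(sample_paths)))


# Usage (Brain_2's file name assumes it was processed like Brain_1):
# write_editing_matrix(
#     {"Brain_1": "03_reditools/Brain_1_AtoI_filtered.tsv",
#      "Brain_2": "03_reditools/Brain_2_AtoI_filtered.tsv"},
#     "editing_level_matrix.csv",
# )
```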

8. SnpEff/ANNOVAR annotation

java -jar snpEff.jar \
  GRCh38.99 \
  03_reditools/Brain_1_AtoI_filtered.vcf \
  > 05_annotation/Brain_1_AtoI.snpeff.vcf
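SnpEff reads VCF, while the filtering step above writes a TSV. A minimal sketch that writes the filtered sites as a plain-text VCF; the `EL` INFO tag for the editing level is a name chosen here for illustration, and records keep the input order, so sort them first if a downstream tool requires it. Make sure the chromosome names match the SnpEff database you use.

```python
import csv

# Minimal VCF header; EL is a hypothetical INFO tag chosen here for the editing level
VCF_HEADER = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="RNA read coverage">',
    '##INFO=<ID=EL,Number=1,Type=Float,Description="Editing level (alt reads / total reads)">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]


def site_to_vcf_line(row):
    """Format one filtered site (chrom, pos, ref, alt, coverage, editing_level) as a VCF record."""
    info = f"DP={row['coverage']};EL={row['editing_level']}"
    fields = [row["chrom"], str(row["pos"]), ".", row["ref"], row["alt"], ".", "PASS", info]
    return "\t".join(fields)


def tsv_to_vcf(tsv_path, vcf_path):
    """Convert a filtered site table into an uncompressed VCF for SnpEff."""
    with open(tsv_path, newline="") as fin, open(vcf_path, "w") as fout:
        fout.write("\n".join(VCF_HEADER) + "\n")
        for row in csv.DictReader(fin, delimiter="\t"):
            fout.write(site_to_vcf_line(row) + "\n")


# Usage:
# tsv_to_vcf("03_reditools/Brain_1_AtoI_filtered.tsv",
#            "03_reditools/Brain_1_AtoI_filtered.vcf")
```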

ANNOVAR:

table_annovar.pl \
  Brain_1_AtoI.avinput \
  humandb/ \
  -buildver hg38 \
  -out 05_annotation/Brain_1_AtoI \
  -remove \
  -protocol refGene,avsnp150,dbnsfp42a \
  -operation g,f,f \
  -nastring .
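The ANNOVAR command expects `Brain_1_AtoI.avinput`, ANNOVAR's plain input format with chromosome, start, end, reference and alternative allele per line (1-based, with start equal to end for a single-base change). A minimal sketch that writes it from the filtered TSV; ANNOVAR's own `convert2annovar.pl` can also build this file from the VCF.

```python
import csv


def site_to_avinput(row):
    """One ANNOVAR input line: chrom, start, end, ref, alt (start == end for an SNV)."""
    pos = str(row["pos"])
    return "\t".join([row["chrom"], pos, pos, row["ref"], row["alt"]])


def tsv_to_avinput(tsv_path, avinput_path):
    """Write one avinput line per filtered site, preserving the input order."""
    with open(tsv_path, newline="") as fin, open(avinput_path, "w") as fout:
        for row in csv.DictReader(fin, delimiter="\t"):
            fout.write(site_to_avinput(row) + "\n")


# Usage:
# tsv_to_avinput("03_reditools/Brain_1_AtoI_filtered.tsv", "Brain_1_AtoI.avinput")
```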

9. Editing level visualization

library(tidyverse)

edit <- read.delim("03_reditools/Brain_1_AtoI_filtered.tsv")

ggplot(edit, aes(editing_level)) +
  geom_histogram(bins = 40, fill = "#2c7fb8", color = "white") +
  theme_bw() +
  labs(x = "Editing level", y = "Site count")

Group comparison:

editing_matrix <- read.csv("editing_level_matrix.csv", row.names = 1)
sample_info <- read.csv("00_metadata/sample_info.csv")

site <- "chr1:100100"
# unlist() turns the one-row data frame into a vector before as.numeric()
plot_df <- data.frame(
  editing_level = as.numeric(unlist(editing_matrix[site, sample_info$sample_id])),
  group = sample_info$group
)

# outlier.shape = NA avoids drawing outliers twice (boxplot + jittered points)
ggplot(plot_df, aes(group, editing_level, fill = group)) +
  geom_boxplot(width = 0.5, outlier.shape = NA) +
  geom_jitter(width = 0.08, size = 2) +
  theme_bw()
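The box plot looks at one site at a time. To shortlist candidate differential sites across the whole matrix, a simple first pass is to rank sites by the difference in mean editing level between groups. A minimal sketch; the function names and the 0.1 cutoff are illustrative assumptions, and with one sample per group (as in the example sheet) the result is purely descriptive. With biological replicates, a statistical test across samples is still needed before calling a site differentially edited.

```python
from statistics import mean


def group_deltas(levels, sample_groups, case="Case", control="Control"):
    """Per-site group means and case-minus-control difference in editing level.

    levels: {site: {sample_id: editing_level}}; samples without a value are skipped.
    sample_groups: {sample_id: group label}, e.g. built from sample_info.csv.
    """
    results = {}
    for site, by_sample in levels.items():
        case_vals = [v for s, v in by_sample.items() if sample_groups.get(s) == case]
        ctrl_vals = [v for s, v in by_sample.items() if sample_groups.get(s) == control]
        if not case_vals or not ctrl_vals:
            continue  # site not measured in both groups
        case_mean, ctrl_mean = mean(case_vals), mean(ctrl_vals)
        results[site] = {
            "case_mean": case_mean,
            "control_mean": ctrl_mean,
            "delta": case_mean - ctrl_mean,
        }
    return results


def top_sites(results, min_abs_delta=0.1):
    """Sites ranked by |delta|; min_abs_delta is an illustrative cutoff, not a standard."""
    hits = [(site, r) for site, r in results.items() if abs(r["delta"]) >= min_abs_delta]
    return sorted(hits, key=lambda item: abs(item[1]["delta"]), reverse=True)
```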

10. Interpreting the results

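| Metric | Interpretation |
|---|---|
| coverage | Number of reads covering the site |
| editing level | alt reads / total reads |
| A>G / T>C | Typical signature of A-to-I editing (plus-strand / minus-strand transcripts) |
| nonsynonymous | May change the protein sequence |
| Alu/repeat region | Where human A-to-I editing sites are most often found |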
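Because the editing level is a proportion of reads, its precision depends on coverage; this is why coverage thresholds (REDItools `-c`, `coverage >= 10`) matter. The sketch below is not part of the original pipeline: it attaches a Wilson score interval, one common choice for binomial proportions, to the alt/total ratio so that a well-covered site and a low-coverage site with a similar editing level can be compared.

```python
import math


def editing_level_with_ci(alt_reads, total_reads, z=1.96):
    """Editing level (alt / total reads) plus a Wilson score interval.

    z = 1.96 gives an approximate 95% interval.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    p = alt_reads / total_reads
    n = total_reads
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)


# Similar editing levels, very different certainty
for alt, total in [(36, 86), (4, 10)]:
    level, low, high = editing_level_with_ci(alt, total)
    print(f"{alt}/{total}: level={level:.2f}, 95% CI=[{low:.2f}, {high:.2f}]")
```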

11. Deliverables

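  • RNA BAM QC
  • Raw REDItools/GIREMI candidate tables
  • A-to-I editing site table after SNP removal
  • Editing level matrix
  • SnpEff/ANNOVAR annotation tables
  • Differentially edited sites between groups
  • Editing level distribution histograms and box plots
  • IGV screenshots of key sites and the final report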