R | Extracting all genes annotated to a GO term

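Problem

Sometimes we want to know which genes are associated with a particular GO annotation category, so we need a way to extract every gene annotated to that GO term.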

Solution

After some searching, I found that the following code does the job:

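library(tidyverse)     # provides the %>% pipe
library(org.Hs.eg.db)  # human annotation package
# GOID is a placeholder for a GO identifier string, e.g. 'GO:0006260'
GOgeneID <- get(GOID, org.Hs.egGO2ALLEGS) %>% mget(org.Hs.egSYMBOL) %>% unlist()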

Below, the biological process DNA replication (GO:0006260) is used as a worked example, with the human GO annotation.


library(tidyverse)
library(org.Hs.eg.db)
# GO ID --> gene entrez ID
DNA_geneID <- get('GO:0006260', org.Hs.egGO2ALLEGS) 
> head(DNA_geneID)
  TAS   IEA   TAS   IMP   TAS   ISS 
 "94" "466" "472" "545" "545" "546" 
> length(DNA_geneID)
[1] 421
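The names of the returned vector are evidence codes, and one gene can be annotated under several codes, so the same Entrez ID may appear more than once (note "545" in the head() output above). A minimal sketch of removing the repeats follows. It uses a toy vector copied from that head() output rather than the real annotation object, and the variable names are only illustrative.

```r
# Toy vector mirroring the head() output above: names are evidence codes,
# values are Entrez IDs. Illustrative data, not a live query result.
ids <- c(TAS = "94", IEA = "466", TAS = "472",
         IMP = "545", TAS = "545", ISS = "546")

# Keep the first occurrence of each Entrez ID; the name (evidence code)
# of that first occurrence is kept with it.
unique_ids <- ids[!duplicated(ids)]

print(unique_ids)
length(unique_ids)
```

The same indexing should work on DNA_geneID itself, in which case the length reported is the number of distinct genes rather than the number of annotations.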

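org.Hs.egGO2ALLEGS holds the mapping between GO IDs and Entrez IDs, covering genes annotated to the term itself or to any of its child terms. Each element of the result is also named with the evidence code that supports that gene's annotation. The codes include: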

IMP: inferred from mutant phenotype

IGI: inferred from genetic interaction

IPI: inferred from physical interaction

ISS: inferred from sequence similarity

IDA: inferred from direct assay

IEP: inferred from expression pattern

IEA: inferred from electronic annotation

TAS: traceable author statement

NAS: non-traceable author statement

ND: no biological data available

IC: inferred by curator

The full list of evidence codes is available at:
http://geneontology.org/docs/guide-go-evidence-codes/
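Since the evidence codes are stored as the names of the vector, they can be used to tabulate or filter the annotations. The sketch below drops the electronically inferred (IEA) entries. It again uses a toy vector copied from the head() output shown earlier, and whether IEA annotations should be excluded at all is a decision for the analysis at hand, not something this post prescribes.

```r
# Toy vector: names are evidence codes, values are Entrez IDs.
# Illustrative data copied from the head() output, not a live query result.
ids <- c(TAS = "94", IEA = "466", TAS = "472",
         IMP = "545", TAS = "545", ISS = "546")

# Count how many annotations carry each evidence code.
code_counts <- table(names(ids))
print(code_counts)

# Keep only annotations whose evidence code is not IEA.
non_iea <- ids[names(ids) != "IEA"]
print(non_iea)
```

Replacing ids with DNA_geneID should apply the same filter to the full result.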

Next, we can convert the Entrez IDs to gene symbols:

DNA_geneSYMBOL <- mget(DNA_geneID, org.Hs.egSYMBOL) %>% unlist() 
> head(DNA_geneSYMBOL)
      94      466      472      545      545      546 
"ACVRL1"   "ATF1"    "ATM"    "ATR"    "ATR"   "ATRX" 

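As an alternative to get() and mget(), the same lookup can probably be done in one call with AnnotationDbi's select() interface. This is a sketch rather than the method used above: it assumes org.Hs.eg.db is installed, and it uses the "GOALL" key type, which, like GO2ALLEGS, includes genes annotated to child terms. The function is namespace-qualified because dplyr, loaded by tidyverse, masks select().

```r
library(org.Hs.eg.db)  # attaches AnnotationDbi as a dependency

# Query GO term -> Entrez ID and symbol in a single call.
res <- AnnotationDbi::select(
  org.Hs.eg.db,
  keys    = "GO:0006260",
  keytype = "GOALL",                  # the term plus its child terms
  columns = c("ENTREZID", "SYMBOL")
)

# The result has one row per annotation, so a gene can repeat;
# keep each symbol once.
symbols <- unique(res$SYMBOL)
head(symbols)
```

The result is a data frame, which may be more convenient than a named vector when the IDs and symbols need to be kept side by side.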
That's all.

References
https://davetang.org/muse/2011/05/20/extract-gene-names-according-to-go-terms/
https://www.ebi.ac.uk/QuickGO/term/GO:0006260
http://geneontology.org/docs/guide-go-evidence-codes/
