Tag Archives: bioinformatics

How to map a SNP or CpG site to the proximal gene in R

There are probably many ways to do this; here are my two favorite methods. For a SNP, using biomaRt:

SNP <- c("rs123")
BiocManager::install('grimbough/biomaRt')
library(biomaRt)
library(plyr)
# Connect to the GRCh37 Ensembl archive: one mart for variants, one for genes
grch37.snp <- useMart(biomart="ENSEMBL_MART_SNP", host="grch37.ensembl.org", path="/biomart/martservice", dataset="hsapiens_snp")
grch37 <- useMart(biomart="ENSEMBL_MART_ENSEMBL", host="grch37.ensembl.org", path="/biomart/martservice", dataset="hsapiens_gene_ensembl")
# Look up the position of each SNP and the Ensembl gene it is annotated to
t1 <- getBM(attributes = c("refsnp_id", "ensembl_gene_stable_id", "chr_name", "chrom_start", "chrom_end"), filters = "snp_filter", values = SNP, mart = grch37.snp)
# Rename the key column so that it matches the attribute name in the gene mart
names(t1)[names(t1) == "ensembl_gene_stable_id"] <- "ensembl_gene_id"
# Fetch gene details for the gene IDs found above (taken from t1, not from a free variable)
t2 <- getBM(attributes = c("ensembl_gene_id", "external_gene_name", "start_position", "end_position", "description"), filters = "ensembl_gene_id", values = t1$ensembl_gene_id, mart = grch37)
# Left join: keep every SNP row and attach the first matching gene record
join(t1, t2, type = "left", by = "ensembl_gene_id", match = "first")
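The last step of the SNP lookup is a left join that keeps only the first matching gene record per key. As a minimal offline sketch of that behaviour, the toy data frames below stand in for the two getBM() results; every ID and gene name in them is made up for illustration, and base R match() is used here as an approximation of the join rather than a drop-in replacement for plyr.

```r
# Toy stand-ins for the two getBM() results; all values are hypothetical.
t1 <- data.frame(
  refsnp_id       = c("rsA", "rsA", "rsB"),
  ensembl_gene_id = c("G1", "G2", NA),
  stringsAsFactors = FALSE
)
t2 <- data.frame(
  ensembl_gene_id    = c("G1", "G1", "G2"),
  external_gene_name = c("geneOne", "geneOne_dup", "geneTwo"),
  stringsAsFactors = FALSE
)

# match() returns the index of the first matching row in t2 (NA if none),
# so every row of t1 is kept and gets at most one gene record.
idx <- match(t1$ensembl_gene_id, t2$ensembl_gene_id)
annotated <- cbind(t1, t2[idx, "external_gene_name", drop = FALSE])
rownames(annotated) <- NULL
print(annotated)
```

A SNP that sits in two genes (rsA here) stays on two rows, while a SNP without a gene annotation (rsB) is kept with a missing gene name instead of being dropped.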

and for a CpG site, using FDb.InfiniumMethylation.hg19:

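BiocManager::install('FDb.InfiniumMethylation.hg19')
library(FDb.InfiniumMethylation.hg19)
CpG <- c("cg00920043")
# Load the 450k probe annotation and select the probes of interest by ID
hm450 <- get450k()
probes <- hm450[CpG]
# Report the nearest transcription start site (TSS) for each probe
getNearestTSS(probes)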
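The idea behind a nearest-TSS lookup can be illustrated without any annotation package. The sketch below is a simplification under stated assumptions: the probe names, gene names and coordinates are hypothetical, everything is assumed to lie on one chromosome, and strand is ignored. It is not how getNearestTSS is implemented, only the distance rule it is named after.

```r
# Hypothetical probe and TSS coordinates on a single chromosome.
probe_pos <- c(cgX = 1500, cgY = 9000)
tss <- data.frame(
  gene = c("geneA", "geneB", "geneC"),
  pos  = c(1000, 5000, 9500),
  stringsAsFactors = FALSE
)

# For each probe, pick the TSS with the smallest absolute distance.
nearest <- sapply(probe_pos, function(p) which.min(abs(tss$pos - p)))
result <- data.frame(
  probe    = names(probe_pos),
  gene     = tss$gene[nearest],
  distance = unname(abs(tss$pos[nearest] - probe_pos)),
  stringsAsFactors = FALSE
)
rownames(result) <- NULL
print(result)
```

For real data the package function is the better choice, since it works on genomic ranges with chromosome information rather than on bare positions.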

Science is an emergent system too

From Edge / NY Times

We often try to understand problems by taking apart and studying their constituent parts. But emergent problems can’t be understood this way. Emergent systems are ones in which many different elements interact. The pattern of interaction then produces a new element that is greater than the sum of the parts, which then exercises a top-down influence on the constituent elements.

SNP batch annotation of GWAs

Genowatch (paper|website) does a good job of annotating large SNP sets that would otherwise take many hours to map to gene positions, biological functions and pathways.

GPS for biological pathways

After running a dual-core CPU for two weeks, I now have a list of all transcripts that are associated with the “ORMDL3” SNP gene cluster. Making sense of this list is a difficult task, even with dozens of dedicated websites.
To get an overview of what is available I would start …