Last updated: 2025-02-24
File | Version | Author | Date | Message |
Rmd | bba6082 | Dave Tang | 2025-02-24 | Checking out AnnotationHub |
The AnnotationHub package provides a client interface to resources stored at the AnnotationHub web service.
if (!require("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
packageVersion("AnnotationHub")
[1] '3.14.0'
The AnnotationHub package is straightforward to use. Load the package and create an AnnotationHub object:
library(AnnotationHub)
ah <- AnnotationHub()
At this point you have done everything needed to start retrieving annotations. For most operations, using the AnnotationHub object should feel a lot like working with a familiar list or data.frame. Printing the object shows a summary of the hub:
ah
AnnotationHub with 72098 records
# snapshotDate(): 2024-10-28
# $dataprovider: Ensembl, BroadInstitute, UCSC,
# $species: Homo sapiens, Mus musculus, Drosophila melanogaster, Rattus norv...
# $rdataclass: GRanges, TwoBitFile, BigWigFile, EnsDb, Rle, OrgDb, SQLiteFil...
# additional mcols(): taxonomyid, genome, description,
# coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
# rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH5012"]]'
AH5012 | Chromosome Band
AH5013 | STS Markers
AH5014 | FISH Clones
AH5015 | Recomb Rate
AH5016 | ENCODE Pilot
... ...
AH119504 | Ensembl 113 EnsDb for Xiphophorus maculatus
AH119505 | Ensembl 113 EnsDb for Xenopus tropicalis
AH119506 | Ensembl 113 EnsDb for Zonotrichia albicollis
AH119507 | Ensembl 113 EnsDb for Zalophus californianus
AH119508 | Ensembl 113 EnsDb for Zosterops lateralis melanops
The summary gives an idea of the different types of data in the hub: where the data comes from (dataprovider), which species are represented (species), and what kinds of R objects can be returned (rdataclass). To take a closer look at the available data providers, inspect dataprovider as if it were a column of a data.frame:
unique(ah$dataprovider)
[1] "UCSC"
[2] "Ensembl"
[3] "RefNet"
[4] "Inparanoid8"
[5] "NHLBI"
[6] "ChEA"
[7] "Pazar"
[8] "NIH Pathway Interaction Database"
[9] "Haemcode"
[10] "BroadInstitute"
[11] "PRIDE"
[12] "Gencode"
[13] "CRIBI"
[14] "Genoscope"
[16] "Stanford"
[17] "dbSNP"
[18] "BioMart"
[19] "GeneOntology"
[20] "KEGG"
[21] "URGI"
[22] "EMBL-EBI"
[23] "MicrosporidiaDB"
[24] "FungiDB"
[25] "TriTrypDB"
[26] "ToxoDB"
[27] "AmoebaDB"
[28] "PlasmoDB"
[29] "PiroplasmaDB"
[30] "CryptoDB"
[31] "TrichDB"
[32] "GiardiaDB"
[33] "The Gene Ontology Consortium"
[34] "ENCODE Project"
[35] "SchistoDB"
[36] "NCBI/UniProt"
[37] "GENCODE"
[38] ""
[39] "RMBase v2.0"
[40] "snoRNAdb"
[41] "tRNAdb"
[42] "NCBI"
[43] "DrugAge, DrugBank, Broad Institute"
[44] "DrugAge"
[45] "DrugBank"
[46] "Broad Institute"
[48] "STRING"
[49] "OMA"
[50] "OrthoDB"
[51] "PathBank"
[52] "EBI/EMBL"
[55] "WikiPathways"
[57] "pyGenomeTracks "
[58] "NA"
[59] "UoE"
[60] "TargetScan,miRTarBase,USCS,ENSEMBL"
[61] "TargetScan"
[62] "QuickGO"
[63] "CIS-BP"
[64] "CTCFBSDB 2.0"
[65] "HOCOMOCO v11"
[66] "JASPAR 2022"
[67] "Jolma 2013"
[68] "SwissRegulon"
[70] "MassBank"
[71] "excluderanges"
[72] "ENCODE"
[73] "GitHub"
[74] ""
[75] "Publication"
[76] "CHM13"
[77] "UCSChub"
[78] "Google DeepMind"
[79] "UWashington"
[80] "Bioconductor"
[81] "ENCODE cCREs"
[82] "The Human Phenotype Ontology"
[83] "MGI"
[84] ""
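Since dataprovider can be treated like a data.frame column, ordinary base R summaries apply to it. The following is a minimal sketch, assuming network access (or a populated local cache) and an `ah` object created as above; the names `provider_counts` and `class_counts` are made up for this example.

```r
library(AnnotationHub)

# Create the hub object (reuses the local cache when one exists)
ah <- AnnotationHub()

# Count records per provider; useNA = "ifany" keeps missing values visible
provider_counts <- sort(table(ah$dataprovider, useNA = "ifany"),
                        decreasing = TRUE)
head(provider_counts, 10)

# The same idea applies to the class of R object each record returns
class_counts <- sort(table(ah$rdataclass), decreasing = TRUE)
head(class_counts, 10)
```

Tabulating rather than listing unique values also shows how unevenly the records are spread across providers.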
In the same way, you can see which species are represented in the hub by looking at the contents of species:
head(unique(ah$species))
[1] "Homo sapiens" "Vicugna pacos" "Dasypus novemcinctus"
[4] "Otolemur garnettii" "Papio hamadryas" "Papio anubis"
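The species column can also be used to narrow the hub down to one organism, and the result still behaves like a list. This is a hedged sketch rather than the only way to do it: it assumes network access, uses `which()` so that records with a missing species are dropped instead of propagating NA into the index, and calls `mcols()` through the S4Vectors namespace in case that package is not attached. The names `all_species`, `drosophila`, `fly` and `meta` are illustrative.

```r
library(AnnotationHub)

ah <- AnnotationHub()

# All distinct species names recorded in the hub
all_species <- unique(ah$species)
length(all_species)

# Species names starting with "Drosophila"
drosophila <- grep("^Drosophila", all_species, value = TRUE)
head(drosophila)

# Keep only fruit fly records; which() drops NA comparisons
fly <- ah[which(ah$species == "Drosophila melanogaster")]

# List-like: number of records and their identifiers
length(fly)
head(names(fly))

# data.frame-like: metadata for the first three records
meta <- S4Vectors::mcols(fly[1:3])
meta[, c("title", "dataprovider", "rdataclass")]
```

Single-bracket subsetting returns a smaller hub object, so nothing is downloaded at this stage.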
One can get chain files for Drosophila melanogaster from UCSC with:
dm <- query(ah, c("ChainFile", "UCSC", "Drosophila melanogaster"))
dm
AnnotationHub with 45 records
# snapshotDate(): 2024-10-28
# $dataprovider: UCSC
# $species: Drosophila melanogaster
# $rdataclass: ChainFile
# additional mcols(): taxonomyid, genome, description,
# coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
# rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH15102"]]'
AH15102 | dm3ToAnoGam1.over.chain.gz
AH15103 | dm3ToApiMel3.over.chain.gz
AH15104 | dm3ToDm2.over.chain.gz
AH15105 | dm3ToDm6.over.chain.gz
AH15106 | dm3ToDp3.over.chain.gz
... ...
AH15142 | dm2ToDroVir3.over.chain.gz
AH15143 | dm2ToDroWil1.over.chain.gz
AH15144 | dm2ToDroYak1.over.chain.gz
AH15145 | dm2ToDroYak2.over.chain.gz
AH15146 | dm1ToDm2.over.chain.gz
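As the printed hint suggests, a record is retrieved with double brackets. The sketch below shows one way a retrieved chain file might be used, assuming the rtracklayer and GenomicRanges packages are installed and network access is available. The interval `gr_dm3` is a made-up example region, not taken from any dataset.

```r
library(AnnotationHub)
library(GenomicRanges)

ah <- AnnotationHub()
dm <- query(ah, c("ChainFile", "UCSC", "Drosophila melanogaster"))

# Single brackets keep the metadata only; nothing is downloaded yet
dm["AH15105"]

# Double brackets download (and cache) the resource, then load it into R
chain <- dm[["AH15105"]]  # dm3ToDm6.over.chain.gz
class(chain)

# Hypothetical dm3 interval, lifted over to dm6 coordinates
gr_dm3 <- GRanges("chr2L", IRanges(start = 1000000, end = 1000100))
gr_dm6 <- rtracklayer::liftOver(gr_dm3, chain)
gr_dm6
```

`liftOver()` returns a GRangesList with one element per input range, because a single interval can map to zero, one, or several target intervals.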
sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.5 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/; LAPACK version 3.10.0
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] AnnotationHub_3.14.0 BiocFileCache_2.14.0 dbplyr_2.5.0
[4] BiocGenerics_0.52.0 workflowr_1.7.1
loaded via a namespace (and not attached):
[1] KEGGREST_1.46.0 xfun_0.48 bslib_0.8.0
[4] processx_3.8.4 Biobase_2.66.0 callr_3.7.6
[7] vctrs_0.6.5 tools_4.4.1 ps_1.8.1
[10] generics_0.1.3 stats4_4.4.1 curl_5.2.3
[13] tibble_3.2.1 fansi_1.0.6 AnnotationDbi_1.68.0
[16] RSQLite_2.3.7 blob_1.2.4 pkgconfig_2.0.3
[19] S4Vectors_0.44.0 GenomeInfoDbData_1.2.13 lifecycle_1.0.4
[22] compiler_4.4.1 stringr_1.5.1 git2r_0.35.0
[25] Biostrings_2.74.1 getPass_0.2-4 GenomeInfoDb_1.42.3
[28] httpuv_1.6.15 htmltools_0.5.8.1 sass_0.4.9
[31] yaml_2.3.10 later_1.3.2 pillar_1.9.0
[34] crayon_1.5.3 jquerylib_0.1.4 whisker_0.4.1
[37] cachem_1.1.0 mime_0.12 tidyselect_1.2.1
[40] digest_0.6.37 stringi_1.8.4 purrr_1.0.2
[43] dplyr_1.1.4 BiocVersion_3.20.0 rprojroot_2.0.4
[46] fastmap_1.2.0 cli_3.6.3 magrittr_2.0.3
[49] utf8_1.2.4 withr_3.0.2 UCSC.utils_1.2.0
[52] filelock_1.0.3 promises_1.3.0 rappdirs_0.3.3
[55] bit64_4.5.2 XVector_0.46.0 rmarkdown_2.28
[58] httr_1.4.7 bit_4.5.0 png_0.1-8
[61] memoise_2.0.1 evaluate_1.0.1 knitr_1.48
[64] IRanges_2.40.1 rlang_1.1.4 Rcpp_1.0.13
[67] glue_1.8.0 DBI_1.2.3 BiocManager_1.30.25
[70] rstudioapi_0.17.1 jsonlite_1.8.9 R6_2.5.1
[73] zlibbioc_1.52.0 fs_1.6.4