Last updated: 2025-09-05
Knit directory: muse/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20200712) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version bfebc82. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/biomart_homologues.Rmd) and HTML (docs/biomart_homologues.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | bfebc82 | Dave Tang | 2025-09-05 | All human mouse homologues for Ensembl 113 |
html | 2d0dd8f | Dave Tang | 2025-09-04 | Build site. |
Rmd | 87b461f | Dave Tang | 2025-09-04 | Using biomaRt to get homologues |
To begin, install the {biomaRt} package.
if (!require("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("biomaRt")
Load the package and check its version.
suppressPackageStartupMessages(library(biomaRt))
packageVersion("biomaRt")
[1] '2.64.0'
List the available BioMart databases.
listMarts()
biomart version
1 ENSEMBL_MART_ENSEMBL Ensembl Genes 115
2 ENSEMBL_MART_MOUSE Mouse strains 115
3 ENSEMBL_MART_SNP Ensembl Variation 115
4 ENSEMBL_MART_FUNCGEN Ensembl Regulation 115
Connect to the selected BioMart database by using useMart().
ensembl <- useMart("ENSEMBL_MART_ENSEMBL")
avail_datasets <- listDatasets(ensembl)
head(avail_datasets)
dataset description
1 abrachyrhynchus_gene_ensembl Pink-footed goose genes (ASM259213v1)
2 acalliptera_gene_ensembl Eastern happy genes (fAstCal1.3)
3 acarolinensis_gene_ensembl Green anole genes (AnoCar2.0v2)
4 acchrysaetos_gene_ensembl Golden eagle genes (bAquChr1.2)
5 acitrinellus_gene_ensembl Midas cichlid genes (Midas_v5)
6 amelanoleuca_gene_ensembl Giant panda genes (ASM200744v2)
version
1 ASM259213v1
2 fAstCal1.3
3 AnoCar2.0v2
4 bAquChr1.2
5 Midas_v5
6 ASM200744v2
Look for human datasets by searching the description column.
idx <- grep('human', avail_datasets$description, ignore.case = TRUE)
avail_datasets[idx, ]
dataset description version
80 hsapiens_gene_ensembl Human genes (GRCh38.p14) GRCh38.p14
Connect to the selected BioMart database and human dataset.
ensembl <- useMart("ensembl", dataset=avail_datasets[idx, 'dataset'])
ensembl
Object of class 'Mart':
Using the ENSEMBL_MART_ENSEMBL BioMart database
Using the hsapiens_gene_ensembl dataset
Building a query requires three things: filters, attributes, and values.
Use listFilters() to show available filters.
avail_filters <- listFilters(ensembl)
head(avail_filters)
name description
1 chromosome_name Chromosome/scaffold name
2 start Start
3 end End
4 band_start Band Start
5 band_end Band End
6 marker_start Marker Start
Use listAttributes() to show available attributes.
avail_attributes <- listAttributes(ensembl)
head(avail_attributes)
name description page
1 ensembl_gene_id Gene stable ID feature_page
2 ensembl_gene_id_version Gene stable ID version feature_page
3 ensembl_transcript_id Transcript stable ID feature_page
4 ensembl_transcript_id_version Transcript stable ID version feature_page
5 ensembl_peptide_id Protein stable ID feature_page
6 ensembl_peptide_id_version Protein stable ID version feature_page
Look for mouse homologues.
grep('homolog', avail_attributes$name, ignore.case = TRUE, value = TRUE) |>
  grep('mmus', x = _, ignore.case = TRUE, value = TRUE) -> wanted_attr
wanted_attr <- c('ensembl_gene_id', wanted_attr)
wanted_attr
[1] "ensembl_gene_id"
[2] "mmusculus_homolog_ensembl_gene"
[3] "mmusculus_homolog_associated_gene_name"
[4] "mmusculus_homolog_ensembl_peptide"
[5] "mmusculus_homolog_chromosome"
[6] "mmusculus_homolog_chrom_start"
[7] "mmusculus_homolog_chrom_end"
[8] "mmusculus_homolog_canonical_transcript_protein"
[9] "mmusculus_homolog_subtype"
[10] "mmusculus_homolog_orthology_type"
[11] "mmusculus_homolog_perc_id"
[12] "mmusculus_homolog_perc_id_r1"
[13] "mmusculus_homolog_goc_score"
[14] "mmusculus_homolog_wga_coverage"
[15] "mmusculus_homolog_orthology_confidence"
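The two chained grep() calls above can also be written as a single logical filter. This is a minimal offline sketch, assuming a small made-up vector `attr_names` that stands in for `avail_attributes$name`; the names `attr_names`, `mouse_homolog_attr`, and `wanted_attr_demo` are illustrative and nothing is queried from Ensembl.

```r
# Hypothetical stand-in for avail_attributes$name; typed by hand, not queried.
attr_names <- c(
  "ensembl_gene_id",
  "external_gene_name",
  "mmusculus_homolog_ensembl_gene",
  "mmusculus_homolog_perc_id",
  "rnorvegicus_homolog_ensembl_gene"
)

# Keep names that mention both "homolog" and "mmus" (case-insensitive).
is_homolog <- grepl("homolog", attr_names, ignore.case = TRUE)
is_mouse <- grepl("mmus", attr_names, ignore.case = TRUE)
mouse_homolog_attr <- attr_names[is_homolog & is_mouse]

# Prepend the human gene ID so each row can be traced back to its query gene.
wanted_attr_demo <- c("ensembl_gene_id", mouse_homolog_attr)
wanted_attr_demo
```

Combining two logical vectors keeps the filter readable when more conditions are added later, at the cost of being slightly longer than the piped version.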
Query the mouse homologues of ENSG00000206172 (HBA1).
my_gene <- 'ENSG00000206172'
getBM(
  attributes = wanted_attr,
  filters = "ensembl_gene_id",
  values = my_gene,
  mart = ensembl
) -> my_res
t(my_res)
[,1]
ensembl_gene_id "ENSG00000206172"
mmusculus_homolog_ensembl_gene "ENSMUSG00000069919"
mmusculus_homolog_associated_gene_name "Hba-a1"
mmusculus_homolog_ensembl_peptide "ENSMUSP00000090897"
mmusculus_homolog_chromosome "11"
mmusculus_homolog_chrom_start "32233511"
mmusculus_homolog_chrom_end "32234465"
mmusculus_homolog_canonical_transcript_protein "ENSP00000322421"
mmusculus_homolog_subtype "Boreoeutheria"
mmusculus_homolog_orthology_type "ortholog_many2many"
mmusculus_homolog_perc_id "86.6197"
mmusculus_homolog_perc_id_r1 "86.6197"
mmusculus_homolog_goc_score "75"
mmusculus_homolog_wga_coverage "0"
mmusculus_homolog_orthology_confidence "1"
[,2]
ensembl_gene_id "ENSG00000206172"
mmusculus_homolog_ensembl_gene "ENSMUSG00000069917"
mmusculus_homolog_associated_gene_name "Hba-a2"
mmusculus_homolog_ensembl_peptide "ENSMUSP00000090895"
mmusculus_homolog_chromosome "11"
mmusculus_homolog_chrom_start "32246489"
mmusculus_homolog_chrom_end "32247298"
mmusculus_homolog_canonical_transcript_protein "ENSP00000322421"
mmusculus_homolog_subtype "Boreoeutheria"
mmusculus_homolog_orthology_type "ortholog_many2many"
mmusculus_homolog_perc_id "86.6197"
mmusculus_homolog_perc_id_r1 "86.6197"
mmusculus_homolog_goc_score "25"
mmusculus_homolog_wga_coverage "0"
mmusculus_homolog_orthology_confidence "0"
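The two HBA1 rows differ in GOC score and orthology confidence, so a downstream filter may be useful. The sketch below rebuilds a few columns of the result above by hand (the data frame `hba1_demo` is typed in from the printed output, not queried) and keeps only the high-confidence pair. Treating a confidence of 1 as the cut-off is an assumption made for illustration.

```r
# Hand-typed subset of the HBA1 result shown above; not a live query.
hba1_demo <- data.frame(
  ensembl_gene_id = c("ENSG00000206172", "ENSG00000206172"),
  mmusculus_homolog_associated_gene_name = c("Hba-a1", "Hba-a2"),
  mmusculus_homolog_orthology_type = c("ortholog_many2many", "ortholog_many2many"),
  mmusculus_homolog_goc_score = c(75, 25),
  mmusculus_homolog_orthology_confidence = c(1, 0)
)

# Assumed cut-off: keep rows flagged as high confidence (value 1).
high_conf <- subset(hba1_demo, mmusculus_homolog_orthology_confidence == 1)
high_conf$mmusculus_homolog_associated_gene_name
```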
I manually slogged through the Compara database and found that ENSG00000207721 (MIR186) should have a one-to-one ortholog with ENSMUSG00000065431 (Mir186).
my_gene <- 'ENSG00000207721'
getBM(
  attributes = wanted_attr,
  filters = "ensembl_gene_id",
  values = my_gene,
  mart = ensembl
) -> my_res
t(my_res)
[,1]
ensembl_gene_id "ENSG00000207721"
mmusculus_homolog_ensembl_gene "ENSMUSG00000065431"
mmusculus_homolog_associated_gene_name "Mir186"
mmusculus_homolog_ensembl_peptide "ENSMUST00000083497"
mmusculus_homolog_chromosome "3"
mmusculus_homolog_chrom_start "157249916"
mmusculus_homolog_chrom_end "157249986"
mmusculus_homolog_canonical_transcript_protein "ENST00000384988"
mmusculus_homolog_subtype "Eutheria"
mmusculus_homolog_orthology_type "ortholog_one2one"
mmusculus_homolog_perc_id "79.0698"
mmusculus_homolog_perc_id_r1 "95.7747"
mmusculus_homolog_goc_score NA
mmusculus_homolog_wga_coverage "100"
mmusculus_homolog_orthology_confidence "1"
List the available Ensembl releases.
listEnsemblArchives()
name date url version
1 Ensembl GRCh37 Feb 2014 https://grch37.ensembl.org GRCh37
2 Ensembl 115 Sep 2025 https://sep2025.archive.ensembl.org 115
3 Ensembl 114 May 2025 https://may2025.archive.ensembl.org 114
4 Ensembl 113 Oct 2024 https://oct2024.archive.ensembl.org 113
5 Ensembl 112 May 2024 https://may2024.archive.ensembl.org 112
6 Ensembl 111 Jan 2024 https://jan2024.archive.ensembl.org 111
7 Ensembl 110 Jul 2023 https://jul2023.archive.ensembl.org 110
8 Ensembl 109 Feb 2023 https://feb2023.archive.ensembl.org 109
9 Ensembl 108 Oct 2022 https://oct2022.archive.ensembl.org 108
10 Ensembl 107 Jul 2022 https://jul2022.archive.ensembl.org 107
11 Ensembl 106 Apr 2022 https://apr2022.archive.ensembl.org 106
12 Ensembl 105 Dec 2021 https://dec2021.archive.ensembl.org 105
13 Ensembl 104 May 2021 https://may2021.archive.ensembl.org 104
14 Ensembl 103 Feb 2021 https://feb2021.archive.ensembl.org 103
15 Ensembl 102 Nov 2020 https://nov2020.archive.ensembl.org 102
16 Ensembl 101 Aug 2020 https://aug2020.archive.ensembl.org 101
17 Ensembl 100 Apr 2020 https://apr2020.archive.ensembl.org 100
18 Ensembl 80 May 2015 https://may2015.archive.ensembl.org 80
19 Ensembl 77 Oct 2014 https://oct2014.archive.ensembl.org 77
20 Ensembl 75 Feb 2014 https://feb2014.archive.ensembl.org 75
21 Ensembl 54 May 2009 https://may2009.archive.ensembl.org 54
current_release
1
2 *
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
Use Ensembl 113.
ensembl <- useMart(
  "ENSEMBL_MART_ENSEMBL",
  dataset = avail_datasets[idx, 'dataset'],
  host = "https://oct2024.archive.ensembl.org"
)
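Instead of copying the archive URL by hand, the host can be looked up by release number. The sketch below works on a hand-typed subset of the listEnsemblArchives() output above (`archives_demo`, `wanted_release`, and `archive_host` are illustrative names). With a live connection the same subsetting should apply to the real table, assuming its columns are named as printed.

```r
# Hand-typed subset of the archive table printed above; not a live query.
archives_demo <- data.frame(
  name = c("Ensembl 115", "Ensembl 114", "Ensembl 113"),
  date = c("Sep 2025", "May 2025", "Oct 2024"),
  url = c(
    "https://sep2025.archive.ensembl.org",
    "https://may2025.archive.ensembl.org",
    "https://oct2024.archive.ensembl.org"
  ),
  version = c("115", "114", "113"),
  current_release = c("*", "", "")
)

# Look up the archive host for the release we want to pin.
wanted_release <- "113"
archive_host <- archives_demo$url[archives_demo$version == wanted_release]

# Fail early if the release is missing or listed more than once.
stopifnot(length(archive_host) == 1)
archive_host
```

The resulting string can then be passed as the `host` argument of useMart(), as in the call above.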
Check MIR186 again using Ensembl 113.
my_gene <- 'ENSG00000207721'
getBM(
  attributes = wanted_attr,
  filters = "ensembl_gene_id",
  values = my_gene,
  mart = ensembl
) -> my_res
t(my_res)
[,1]
ensembl_gene_id "ENSG00000207721"
mmusculus_homolog_ensembl_gene "ENSMUSG00000065431"
mmusculus_homolog_associated_gene_name "Mir186"
mmusculus_homolog_ensembl_peptide "ENSMUST00000083497"
mmusculus_homolog_chromosome "3"
mmusculus_homolog_chrom_start "157249916"
mmusculus_homolog_chrom_end "157249986"
mmusculus_homolog_canonical_transcript_protein "ENST00000384988"
mmusculus_homolog_subtype "Eutheria"
mmusculus_homolog_orthology_type "ortholog_one2one"
mmusculus_homolog_perc_id "79.0698"
mmusculus_homolog_perc_id_r1 "95.7747"
mmusculus_homolog_goc_score NA
mmusculus_homolog_wga_coverage "100"
mmusculus_homolog_orthology_confidence "1"
Get all human genes for Ensembl 113.
all_genes_113 <- getBM(
  attributes = "ensembl_gene_id",
  mart = ensembl
)
length(unique(all_genes_113$ensembl_gene_id))
[1] 86402
Get all homologues.
getBM(
  attributes = wanted_attr,
  filters = "ensembl_gene_id",
  values = all_genes_113$ensembl_gene_id,
  mart = ensembl
) -> human_mouse_homologues
dim(human_mouse_homologues)
[1] 92787 15
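The result has more rows (92787) than there are unique genes (86402), which is consistent with some genes having several mouse homologues, as HBA1 did earlier. The sketch below counts homologues per human gene on a made-up data frame, `homologues_demo`; the ID "ENSG_NO_HOMOLOGUE" is a placeholder, and representing a missing homologue as NA is an assumption about the returned table, so empty strings are handled as well.

```r
# Made-up example rows; "ENSG_NO_HOMOLOGUE" is a placeholder, not a real ID.
homologues_demo <- data.frame(
  ensembl_gene_id = c(
    "ENSG00000206172", "ENSG00000206172",
    "ENSG00000207721", "ENSG_NO_HOMOLOGUE"
  ),
  mmusculus_homolog_ensembl_gene = c(
    "ENSMUSG00000069919", "ENSMUSG00000069917",
    "ENSMUSG00000065431", NA
  )
)

# Treat both NA and "" as "no homologue" (assumption about the output format).
mouse_id <- homologues_demo$mmusculus_homolog_ensembl_gene
has_homologue <- !is.na(mouse_id) & nzchar(mouse_id)

# Number of mouse homologues per human gene (0 for genes without one).
n_per_gene <- tapply(has_homologue, homologues_demo$ensembl_gene_id, sum)

# Distribution of homologue counts across genes.
table(n_per_gene)
```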
Check out the results!
head(human_mouse_homologues)
ensembl_gene_id mmusculus_homolog_ensembl_gene
1 ENSG00000000003 ENSMUSG00000067377
2 ENSG00000000005 ENSMUSG00000031250
3 ENSG00000000419 ENSMUSG00000078919
4 ENSG00000000457 ENSMUSG00000026584
5 ENSG00000000460 ENSMUSG00000041406
6 ENSG00000000938 ENSMUSG00000028874
mmusculus_homolog_associated_gene_name mmusculus_homolog_ensembl_peptide
1 Tspan6 ENSMUSP00000084838
2 Tnmd ENSMUSP00000033602
3 Dpm1 ENSMUSP00000118776
4 Scyl3 ENSMUSP00000027876
5 Firrm ENSMUSP00000095101
6 Fgr ENSMUSP00000030693
mmusculus_homolog_chromosome mmusculus_homolog_chrom_start
1 X 132791817
2 X 132751729
3 2 168050968
4 1 163756669
5 1 163773562
6 4 132701406
mmusculus_homolog_chrom_end mmusculus_homolog_canonical_transcript_protein
1 132799178 ENSP00000362111
2 132766326 ENSP00000362122
3 168072511 ENSP00000360644
4 163782695 ENSP00000356745
5 163822365 ENSP00000352276
6 132729221 ENSP00000363117
mmusculus_homolog_subtype mmusculus_homolog_orthology_type
1 Euarchontoglires ortholog_one2one
2 Euarchontoglires ortholog_one2one
3 Eutheria ortholog_one2one
4 Euarchontoglires ortholog_one2one
5 Euarchontoglires ortholog_one2one
6 Euarchontoglires ortholog_one2one
mmusculus_homolog_perc_id mmusculus_homolog_perc_id_r1
1 93.0612 93.0612
2 96.2145 96.2145
3 91.1538 91.1538
4 82.8488 77.5510
5 72.0985 66.3430
6 83.9319 85.8801
mmusculus_homolog_goc_score mmusculus_homolog_wga_coverage
1 100 100.00
2 100 100.00
3 100 100.00
4 100 100.00
5 0 99.21
6 100 100.00
mmusculus_homolog_orthology_confidence
1 1
2 1
3 1
4 1
5 1
6 1
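For downstream work a simple human-to-mouse lookup is often enough. The sketch below keeps one-to-one orthologues and builds a named vector from them. The rows are typed in from the output above, plus one HBA1 row from the earlier query to show a many-to-many pair being dropped; `lookup_demo`, `one2one`, and `human_to_mouse` are illustrative names.

```r
# Hand-typed rows taken from the printed results; not a live query.
lookup_demo <- data.frame(
  ensembl_gene_id = c("ENSG00000000003", "ENSG00000000005", "ENSG00000206172"),
  mmusculus_homolog_associated_gene_name = c("Tspan6", "Tnmd", "Hba-a1"),
  mmusculus_homolog_orthology_type = c(
    "ortholog_one2one", "ortholog_one2one", "ortholog_many2many"
  )
)

# Keep only one-to-one orthologues so each human ID maps to a single mouse gene.
is_one2one <- lookup_demo$mmusculus_homolog_orthology_type == "ortholog_one2one"
one2one <- lookup_demo[is_one2one, ]

# Named vector: names are human gene IDs, values are mouse gene names.
human_to_mouse <- setNames(
  one2one$mmusculus_homolog_associated_gene_name,
  one2one$ensembl_gene_id
)
human_to_mouse[["ENSG00000000003"]]
```

Restricting to one-to-one pairs avoids ambiguous lookups, but it discards genes such as HBA1 whose homologues are many-to-many, so whether that trade-off is acceptable depends on the analysis.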
sessionInfo()
R version 4.5.0 (2025-04-11)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] biomaRt_2.64.0 workflowr_1.7.1
loaded via a namespace (and not attached):
[1] KEGGREST_1.48.1 xfun_0.52 bslib_0.9.0
[4] httr2_1.2.1 processx_3.8.6 Biobase_2.68.0
[7] callr_3.7.6 vctrs_0.6.5 tools_4.5.0
[10] ps_1.9.1 generics_0.1.4 curl_6.4.0
[13] stats4_4.5.0 tibble_3.2.1 AnnotationDbi_1.70.0
[16] RSQLite_2.4.2 blob_1.2.4 pkgconfig_2.0.3
[19] dbplyr_2.5.0 S4Vectors_0.46.0 lifecycle_1.0.4
[22] GenomeInfoDbData_1.2.14 compiler_4.5.0 stringr_1.5.1
[25] git2r_0.36.2 Biostrings_2.76.0 progress_1.2.3
[28] getPass_0.2-4 httpuv_1.6.16 GenomeInfoDb_1.44.1
[31] htmltools_0.5.8.1 sass_0.4.10 yaml_2.3.10
[34] later_1.4.2 pillar_1.10.2 crayon_1.5.3
[37] jquerylib_0.1.4 whisker_0.4.1 cachem_1.1.0
[40] tidyselect_1.2.1 digest_0.6.37 stringi_1.8.7
[43] purrr_1.0.4 dplyr_1.1.4 rprojroot_2.0.4
[46] fastmap_1.2.0 cli_3.6.5 magrittr_2.0.3
[49] withr_3.0.2 filelock_1.0.3 prettyunits_1.2.0
[52] UCSC.utils_1.4.0 promises_1.3.2 rappdirs_0.3.3
[55] bit64_4.6.0-1 rmarkdown_2.29 XVector_0.48.0
[58] httr_1.4.7 bit_4.6.0 png_0.1-8
[61] hms_1.1.3 memoise_2.0.1 evaluate_1.0.3
[64] knitr_1.50 IRanges_2.42.0 BiocFileCache_2.16.1
[67] rlang_1.1.6 Rcpp_1.0.14 glue_1.8.0
[70] DBI_1.2.3 xml2_1.3.8 BiocGenerics_0.54.0
[73] rstudioapi_0.17.1 jsonlite_2.0.0 R6_2.6.1
[76] fs_1.6.6