Last updated: 2020-04-26

Knit directory: Bgee/

This reproducible R Markdown analysis was created with workflowr (version 1.6.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20200417) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version c287d01. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory

Untracked files:
    Untracked:  Drosophila_melanogaster_Bgee_14_1/
    Untracked:  analysis/.here
    Untracked:  genes_Drosophila_melanogaster.tsv
    Untracked:  release.tsv
    Untracked:  species_Bgee_14_1.tsv

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/sparql.Rmd) and HTML (docs/sparql.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd c287d01 SFonsecaCosta 2020-04-26 Update
html d99876e SFonsecaCosta 2020-04-22 Build site.
Rmd 5ebe080 SFonsecaCosta 2020-04-22 add links sparql
html ae29961 SFonsecaCosta 2020-04-22 Build site.
Rmd 9907294 SFonsecaCosta 2020-04-22 clean text
html 8d821e2 SFonsecaCosta 2020-04-20 Build site.
Rmd 9073f83 SFonsecaCosta 2020-04-20 add analysis

This section introduces the Bgee SPARQL endpoint.

You can query the endpoint directly from R to retrieve information from the Bgee database.

Load the packages

library(SPARQL)
library(stringr)
library(data.table)

SPARQL endpoint

The latest version of the Bgee SPARQL endpoint is accessible through the URL below. SPARQL is a semantic query language for databases. For further details, see the SPARQL documentation at https://www.w3.org/TR/2013/REC-sparql11-query-20130321/ . The Bgee data accessible through this SPARQL endpoint are structured using the Gene expression (GenEx) semantic model and vocabulary, which is fully described at https://biosoda.github.io/genex/ .

sparqlEndPoint <- "https://bgee.org/sparql"

Retrieve species

Using the Bgee SPARQL endpoint you can retrieve the species present in the database. The query below returns every taxon of rank species, identified by its NCBI taxonomy IRI, together with its scientific name.

species_taxon <- "PREFIX up: <http://purl.uniprot.org/core/>
SELECT * {
    ?species a up:Taxon .
    ?species up:scientificName ?name .
    ?species up:rank up:Species .
}"

species_taxonTable <- unique(SPARQL(url=sparqlEndPoint, species_taxon)$results)

paste0("Number of the species present in Bgee database: ", nrow(species_taxonTable))
[1] "Number of the species present in Bgee database: 29"

For downstream analysis, we recommend cleaning the first column of the table (species) so that it holds only the numeric NCBI taxonomy ID instead of the full IRI.

species_taxonTable$species <- sub('<http://purl.uniprot.org/taxonomy/(\\d+).*', '\\1', species_taxonTable$species)
head(species_taxonTable)
  species                   name
1   10090           Mus musculus
2   10116      Rattus norvegicus
3   10141        Cavia porcellus
4   13616  Monodelphis domestica
5   28377    Anolis carolinensis
6    6239 Caenorhabditis elegans
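
The `sub()` call above keeps only the numeric part of each taxon IRI. As a minimal offline sketch of what it does, the snippet below applies the same pattern to two IRIs written out by hand. They follow the `http://purl.uniprot.org/taxonomy/<id>` form seen in the query results; the vector names `taxon_iris` and `taxon_ids` are illustrative and not part of the original analysis. Converting to integer is an optional extra step, assumed here to be convenient when the ID is later used as a key.

```r
# Hand-written IRIs in the form returned by the endpoint (illustrative input).
taxon_iris <- c("<http://purl.uniprot.org/taxonomy/10090>",
                "<http://purl.uniprot.org/taxonomy/9913>")

# Same pattern as above: capture the digits that follow ".../taxonomy/".
taxon_ids <- sub("<http://purl.uniprot.org/taxonomy/(\\d+).*", "\\1", taxon_iris)

# Optional: store the IDs as integers rather than character strings.
taxon_ids <- as.integer(taxon_ids)
print(taxon_ids)
```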

To show how to query particular data for species, genes or anatomical entities, this section uses information collected from the TopAnat analysis. This means we use Bos taurus as the target species.

Retrieve anatomical entities

Anatomical entities from a particular species and a developmental stage

Here, we will use the example of Bos taurus (cattle) with the developmental stage ‘prime adult stage’.

anatEnt_devStage <- "PREFIX up: <http://purl.uniprot.org/core/>
PREFIX genex: <http://purl.org/genex#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
SELECT DISTINCT ?anatName FROM <https://bgee.org/rdf_v14_1> {
    ?cond genex:hasAnatomicalEntity ?anatEntity .
    ?anatEntity rdfs:label ?anatName .
    ?cond genex:hasDevelopmentalStage ?stage .
    ?stage rdfs:label ?stageName .
    ?cond obo:RO_0002162 ?taxon .
    ?taxon up:commonName ?commonName .
    FILTER ( LCASE(?commonName) = LCASE('cattle')).
    FILTER ( CONTAINS(?stageName, 'prime adult stage'))
}"

anatEnt_devStageTable <- SPARQL(url=sparqlEndPoint, anatEnt_devStage)
print(paste0("Number of anatomical entities found: ", length(anatEnt_devStageTable$results)))
[1] "Number of anatomical entities found: 319"
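
The query above hard-codes the species common name and the stage label. One possible way to reuse it for other species or stages is to generate the query text from two arguments. The helper `build_anat_query()` below is a hypothetical sketch, not part of the SPARQL package or of Bgee. It also declares the standard RDF Schema `rdfs` prefix explicitly; the original query runs without that declaration, so the endpoint evidently accepts it undeclared, but other endpoints may not.

```r
# Hypothetical helper: build the anatomical-entity query for any species
# common name and developmental stage label.
build_anat_query <- function(common_name, stage_label) {
  # Refuse single quotes, which would break the quoted SPARQL literals.
  stopifnot(!grepl("'", c(common_name, stage_label), fixed = TRUE))
  sprintf("PREFIX up: <http://purl.uniprot.org/core/>
PREFIX genex: <http://purl.org/genex#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?anatName FROM <https://bgee.org/rdf_v14_1> {
    ?cond genex:hasAnatomicalEntity ?anatEntity .
    ?anatEntity rdfs:label ?anatName .
    ?cond genex:hasDevelopmentalStage ?stage .
    ?stage rdfs:label ?stageName .
    ?cond obo:RO_0002162 ?taxon .
    ?taxon up:commonName ?commonName .
    FILTER ( LCASE(?commonName) = LCASE('%s') ) .
    FILTER ( CONTAINS(?stageName, '%s') )
}", common_name, stage_label)
}

# Rebuild the cattle query used above and inspect the generated text.
cattle_query <- build_anat_query("cattle", "prime adult stage")
cat(cattle_query)
```

The resulting string can be passed to `SPARQL()` in the same way as `anatEnt_devStage`.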

Anatomical entities where a gene is expressed

Using one of the statistically significant genes from TopAnat, you can retrieve all anatomical entities in Bgee where that gene is expressed. To do so, specify the target species and the target gene in your query.

anatEnt_gene_species <- "PREFIX orth: <http://purl.org/net/orth#>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX genex: <http://purl.org/genex#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX lscr: <http://purl.org/lscr#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT DISTINCT ?anatEntity ?anatName  FROM <https://bgee.org/rdf_v14_1> {
values ?ensembl_gene { <http://rdf.ebi.ac.uk/resource/ensembl/ENSBTAG00000005333> }  
   
    ?seq a orth:Gene .
    ?seq lscr:xrefEnsemblGene  ?ensembl_gene.
    ?seq rdfs:label ?geneName .
    ?seq genex:isExpressedIn ?cond .
    ?cond genex:hasAnatomicalEntity ?anatEntity .
    ?anatEntity rdfs:label ?anatName .
    ?cond obo:RO_0002162 <http://purl.uniprot.org/taxonomy/9913> . 
}"

anatEnt_gene_speciesTable <- SPARQL(url=sparqlEndPoint, anatEnt_gene_species)
print(paste0("Number of anatomical entities: ", length(anatEnt_gene_speciesTable$results$anatEntity)))
[1] "Number of anatomical entities: 13"
print(unique(anatEnt_gene_speciesTable$results$anatEntity))
 [1] "<http://purl.obolibrary.org/obo/UBERON_0002048>"
 [2] "<http://purl.obolibrary.org/obo/UBERON_0000955>"
 [3] "<http://purl.obolibrary.org/obo/UBERON_0002000>"
 [4] "<http://purl.obolibrary.org/obo/UBERON_0000451>"
 [5] "<http://purl.obolibrary.org/obo/UBERON_0001295>"
 [6] "<http://purl.obolibrary.org/obo/UBERON_0001134>"
 [7] "<http://purl.obolibrary.org/obo/UBERON_0000082>"
 [8] "<http://purl.obolibrary.org/obo/UBERON_0001155>"
 [9] "<http://purl.obolibrary.org/obo/UBERON_0001401>"
[10] "<http://purl.obolibrary.org/obo/UBERON_0000948>"
[11] "<http://purl.obolibrary.org/obo/UBERON_0034908>"
[12] "<http://purl.obolibrary.org/obo/UBERON_0001111>"
[13] "<http://purl.obolibrary.org/obo/UBERON_0002106>"
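
The anatomical entities are returned as full IRIs wrapped in angle brackets. For display or for matching against other ontology tables it can help to shorten them to the compact `UBERON:0002048` form. The snippet below is a minimal sketch of that conversion on two IRIs copied from the output above; the variable names `anat_iris` and `anat_ids` are illustrative.

```r
# IRIs copied from the output above.
anat_iris <- c("<http://purl.obolibrary.org/obo/UBERON_0002048>",
               "<http://purl.obolibrary.org/obo/UBERON_0000955>")

# Drop the angle brackets and the OBO namespace, then turn "_" into ":".
anat_ids <- sub("^<http://purl.obolibrary.org/obo/([A-Za-z]+)_(\\d+)>$",
                "\\1:\\2", anat_iris)
print(anat_ids)
```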

Target genes

In this section we use the genes from our TopAnat analysis to query Bgee by gene description and by species.

Target the genes that have muscle in the term description.

Retrieve the genes whose description contains the term “muscle”, then check whether the gene “ENSBTAG00000014614” is among them. Note that this gene was statistically significant in the TopAnat analysis.

genes_muscles <- "PREFIX up: <http://purl.uniprot.org/core/>
PREFIX orth: <http://purl.org/net/orth#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?geneName ?geneId FROM <https://bgee.org/rdf_v14_1> {
    ?gene a orth:Gene .
    ?gene rdfs:label ?geneName .
    ?gene dcterms:identifier ?geneId .
    ?gene dcterms:description ?desc .
    FILTER CONTAINS ( ?desc, 'muscle' )
}"

genes_musclesTable <- SPARQL(url=sparqlEndPoint, genes_muscles)

## gene just in Bos taurus
genes_musclesTable$results[genes_musclesTable$results$geneId %like% "ENSBTAG", ]
    geneName             geneId
28      PYGM ENSBTAG00000001032
29      MUSK ENSBTAG00000002744
30     MYH7B ENSBTAG00000003512
31     ACTC1 ENSBTAG00000005714
32      MYH2 ENSBTAG00000007090
33      MYH8 ENSBTAG00000009702
34      MYH7 ENSBTAG00000009703
35    ANKRD1 ENSBTAG00000011734
36    CAPZA3 ENSBTAG00000013207
37       CKM ENSBTAG00000013921
38     MBNL3 ENSBTAG00000014088
39     PERM1 ENSBTAG00000014540
40      SMPX ENSBTAG00000015204
41     MYLPF ENSBTAG00000021218
42      MURC ENSBTAG00000021992
43      MYH4 ENSBTAG00000037794
44     ACTA1 ENSBTAG00000046332
452     PFKM ENSBTAG00000000286
453   ATP2A2 ENSBTAG00000001398
454     MRAS ENSBTAG00000001497
455      PKM ENSBTAG00000001601
456   ATP5A1 ENSBTAG00000002507
457    MYH14 ENSBTAG00000002580
458   CAPZA2 ENSBTAG00000004072
459    CAPZB ENSBTAG00000004554
460    MBNL1 ENSBTAG00000004564
461     ENO3 ENSBTAG00000005534
462    ACYP2 ENSBTAG00000006852
463      GEM ENSBTAG00000007596
464    PHKG1 ENSBTAG00000008195
465     CNN1 ENSBTAG00000011207
466    PAMR1 ENSBTAG00000012630
467   ANKRD2 ENSBTAG00000012720
468   CAPZA1 ENSBTAG00000014295
469     MYLK ENSBTAG00000014567
470    ACTA2 ENSBTAG00000014614
471   COX7A1 ENSBTAG00000014878
472     CFL2 ENSBTAG00000015053
473    ACTG2 ENSBTAG00000015441
474    PHKA1 ENSBTAG00000015848
475    FABP3 ENSBTAG00000016819
476    MBNL2 ENSBTAG00000018313
477    MYH10 ENSBTAG00000021151
478    LMOD1 ENSBTAG00000021576
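
The table above is inspected by eye to confirm that “ENSBTAG00000014614” is present. The check can also be made explicit in code. The sketch below works on a small hand-built data frame: two rows are copied from the table above and one row is a made-up non-cattle placeholder, so it runs without querying the endpoint. The names `genes`, `cattle_genes`, `target_gene` and `gene_detected` are illustrative. It uses base R `startsWith()` as an alternative to `%like%`, which anchors the match at the start of the ID.

```r
# Two rows copied from the table above, plus one made-up non-cattle row
# (the "ENSXXXG..." ID is a placeholder, not a real identifier).
genes <- data.frame(
  geneName = c("MYLK", "ACTA2", "PLACEHOLDER"),
  geneId   = c("ENSBTAG00000014567", "ENSBTAG00000014614", "ENSXXXG00000000001"),
  stringsAsFactors = FALSE
)

# Keep only IDs that start with the cattle Ensembl prefix.
cattle_genes <- genes[startsWith(genes$geneId, "ENSBTAG"), ]

# Was the TopAnat gene detected among the muscle-related cattle genes?
target_gene <- "ENSBTAG00000014614"
gene_detected <- target_gene %in% cattle_genes$geneId
print(gene_detected)
```

With the real data, `genes` would be replaced by `genes_musclesTable$results`.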

Target species where gene is present

Check whether the gene ACTA2 (gene ID “ENSBTAG00000014614” in Bos taurus) is also present in other species. Note that the query below matches on the gene name, not on the gene ID.

gene_present_species <- "PREFIX up: <http://purl.uniprot.org/core/>
PREFIX orth: <http://purl.org/net/orth#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
SELECT ?name FROM <https://bgee.org/rdf_v14_1> {
    ?gene a orth:Gene .
    ?gene rdfs:label ?geneName .
    ?gene orth:organism ?organism . #orth v2
    ?organism obo:RO_0002162 ?taxon . #label: in taxon .
    ?taxon up:scientificName ?name .
    FILTER ( UCASE(?geneName) = UCASE('ACTA2') )
}"

gene_present_speciesTable <- SPARQL(url=sparqlEndPoint, gene_present_species)
print(paste0("Number of species detected: ",length(gene_present_speciesTable$results)))
[1] "Number of species detected: 18"
t(gene_present_speciesTable$results)
        [,1]                      
name    "Danio rerio"             
name.1  "Homo sapiens"            
name.2  "Mus musculus"            
name.3  "Rattus norvegicus"       
name.4  "Sus scrofa"              
name.5  "Xenopus tropicalis"      
name.6  "Anolis carolinensis"     
name.7  "Bos taurus"              
name.8  "Canis lupus familiaris"  
name.9  "Cavia porcellus"         
name.10 "Equus caballus"          
name.11 "Erinaceus europaeus"     
name.12 "Felis catus"             
name.13 "Ornithorhynchus anatinus"
name.14 "Oryctolagus cuniculus"   
name.15 "Gorilla gorilla"         
name.16 "Macaca mulatta"          
name.17 "Monodelphis domestica"   
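
In the output above the species names arrive as one row with 18 columns (`name`, `name.1`, ...), which is why `length()` counts them and `t()` is used to print them vertically. A minimal sketch of flattening such a result into a plain character vector follows. The data frame `wide_result` is built by hand to imitate that shape and stands in for `gene_present_speciesTable$results`; it is an assumption for illustration, not output from the endpoint.

```r
# Hand-built stand-in for a one-row, many-column result (illustrative).
wide_result <- data.frame(name   = "Danio rerio",
                          name.1 = "Homo sapiens",
                          name.2 = "Bos taurus",
                          stringsAsFactors = FALSE)

# Flatten to a character vector; this also works for a one-column,
# many-row data frame, so the count does not depend on the shape.
species_names <- unique(unlist(wide_result, use.names = FALSE))

print(length(species_names))
print("Bos taurus" %in% species_names)
```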

sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.12.8 stringr_1.4.0     SPARQL_1.16       RCurl_1.98-1.1   
[5] XML_3.99-0.3      workflowr_1.6.1  

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6    knitr_1.28      whisker_0.4     magrittr_1.5   
 [5] R6_2.4.1        rlang_0.4.5     tools_3.6.0     xfun_0.13      
 [9] git2r_0.26.1    htmltools_0.4.0 yaml_2.2.1      digest_0.6.25  
[13] rprojroot_1.3-2 later_1.0.0     promises_1.1.0  fs_1.4.1       
[17] bitops_1.0-6    glue_1.4.0      evaluate_0.14   rmarkdown_2.1  
[21] stringi_1.4.6   compiler_3.6.0  backports_1.1.6 httpuv_1.5.2