Last updated: 2020-04-26
Knit directory: Bgee/
This reproducible R Markdown analysis was created with workflowr (version 1.6.1).
The command set.seed(20200417) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
The results in this page were generated with repository version c287d01. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Untracked files:
Untracked: Drosophila_melanogaster_Bgee_14_1/
Untracked: analysis/.here
Untracked: genes_Drosophila_melanogaster.tsv
Untracked: release.tsv
Untracked: species_Bgee_14_1.tsv
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/sparql.Rmd) and HTML (docs/sparql.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | c287d01 | SFonsecaCosta | 2020-04-26 | Update |
html | d99876e | SFonsecaCosta | 2020-04-22 | Build site. |
Rmd | 5ebe080 | SFonsecaCosta | 2020-04-22 | add links sparql |
html | ae29961 | SFonsecaCosta | 2020-04-22 | Build site. |
Rmd | 9907294 | SFonsecaCosta | 2020-04-22 | clean text |
html | 8d821e2 | SFonsecaCosta | 2020-04-20 | Build site. |
Rmd | 9073f83 | SFonsecaCosta | 2020-04-20 | add analysis |
This section introduces the Bgee SPARQL endpoint, which you can query from R to retrieve information from the database.
library(SPARQL)
library(stringr)
library(data.table)
The latest version of the Bgee SPARQL endpoint is accessible at the URL below. SPARQL is a semantic query language for databases. For further details, see the SPARQL documentation at https://www.w3.org/TR/2013/REC-sparql11-query-20130321/ . The Bgee data accessible through this SPARQL endpoint are structured using the Gene expression (GenEx) semantic model and vocabulary, which is fully described at https://biosoda.github.io/genex/ .
sparqlEndPoint <- "https://bgee.org/sparql"
Using the Bgee SPARQL endpoint, you can retrieve the species available in Bgee, each identified by its NCBI taxon.
species_taxon <- "PREFIX up: <http://purl.uniprot.org/core/>
SELECT * {
?species a up:Taxon .
?species up:scientificName ?name .
?species up:rank up:Species .
}"
species_taxonTable <- unique(SPARQL(url=sparqlEndPoint, species_taxon)$results)
paste0("Number of the species present in Bgee database: ", nrow(species_taxonTable))
[1] "Number of the species present in Bgee database: 29"
For downstream analysis, we recommend cleaning the first column of the table so that it contains only the taxon ID instead of the full URI.
species_taxonTable$species <- sub('<http://purl.uniprot.org/taxonomy/(\\d+).*', '\\1', species_taxonTable$species)
head(species_taxonTable)
species name
1 10090 Mus musculus
2 10116 Rattus norvegicus
3 10141 Cavia porcellus
4 13616 Monodelphis domestica
5 28377 Anolis carolinensis
6 6239 Caenorhabditis elegans
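As a quick check of what the substitution above does, here is a minimal sketch that applies the same regular expression to two hand-written taxon URIs. The vector `example_uris` is illustrative only and is not output from the endpoint.

```r
# Illustrative input: URIs in the form found in the 'species' column.
example_uris <- c("<http://purl.uniprot.org/taxonomy/10090>",
                  "<http://purl.uniprot.org/taxonomy/9913>")

# Same pattern as above: capture the digits after 'taxonomy/' and drop the rest.
taxon_ids <- sub('<http://purl.uniprot.org/taxonomy/(\\d+).*', '\\1', example_uris)
print(taxon_ids)
```

The result is a character vector, so convert it with `as.integer()` if numeric taxon IDs are needed later.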
To show how to query particular data about species, genes, or anatomical entities, this section uses information collected from the TopAnat analysis, which means that Bos taurus is the target species.
Here, we retrieve the anatomical entities for Bos taurus (cattle) at the developmental stage ‘prime adult stage’.
anatEnt_devStage <- "PREFIX up: <http://purl.uniprot.org/core/>
PREFIX genex: <http://purl.org/genex#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
SELECT DISTINCT ?anatName FROM <https://bgee.org/rdf_v14_1> {
?cond genex:hasAnatomicalEntity ?anatEntity .
?anatEntity rdfs:label ?anatName .
?cond genex:hasDevelopmentalStage ?stage .
?stage rdfs:label ?stageName .
?cond obo:RO_0002162 ?taxon .
?taxon up:commonName ?commonName .
FILTER ( LCASE(?commonName) = LCASE('cattle')).
FILTER ( CONTAINS(?stageName, 'prime adult stage'))
}"
anatEnt_devStageTable <- SPARQL(url=sparqlEndPoint, anatEnt_devStage)
print(paste0("Number of anatomical entities found: ", length(anatEnt_devStageTable$results)))
[1] "Number of anatomical entities found: 319"
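The query above hard-codes the species and the developmental stage. If you want to reuse it for other conditions, one possible approach is to build the query text from parameters. The sketch below assumes a hypothetical helper named `build_anat_query`, which is not part of Bgee or the SPARQL package, and only produces the query string without contacting the endpoint.

```r
# Hypothetical helper: builds the query text for a given species common name
# and developmental stage label. The values are pasted in as-is, so they must
# not contain single quotes.
build_anat_query <- function(common_name, stage_label) {
  stopifnot(!grepl("'", common_name, fixed = TRUE),
            !grepl("'", stage_label, fixed = TRUE))
  sprintf("PREFIX up: <http://purl.uniprot.org/core/>
PREFIX genex: <http://purl.org/genex#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
SELECT DISTINCT ?anatName FROM <https://bgee.org/rdf_v14_1> {
  ?cond genex:hasAnatomicalEntity ?anatEntity .
  ?anatEntity rdfs:label ?anatName .
  ?cond genex:hasDevelopmentalStage ?stage .
  ?stage rdfs:label ?stageName .
  ?cond obo:RO_0002162 ?taxon .
  ?taxon up:commonName ?commonName .
  FILTER ( LCASE(?commonName) = LCASE('%s')) .
  FILTER ( CONTAINS(?stageName, '%s'))
}", common_name, stage_label)
}

# Reproduce the query text used above for cattle.
cattle_query <- build_anat_query("cattle", "prime adult stage")
cat(cattle_query)
```

The resulting string can then be passed to `SPARQL(url=sparqlEndPoint, cattle_query)` in the same way as the hand-written query.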
Now, using one of the statistically significant genes from TopAnat, you can retrieve all anatomical entities in Bgee in which the gene is expressed. To do so, specify the target species and the target gene in your query.
anatEnt_gene_species <- "PREFIX orth: <http://purl.org/net/orth#>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX genex: <http://purl.org/genex#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX lscr: <http://purl.org/lscr#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT DISTINCT ?anatEntity ?anatName FROM <https://bgee.org/rdf_v14_1> {
values ?ensembl_gene { <http://rdf.ebi.ac.uk/resource/ensembl/ENSBTAG00000005333> }
?seq a orth:Gene .
?seq lscr:xrefEnsemblGene ?ensembl_gene.
?seq rdfs:label ?geneName .
?seq genex:isExpressedIn ?cond .
?cond genex:hasAnatomicalEntity ?anatEntity .
?anatEntity rdfs:label ?anatName .
?cond obo:RO_0002162 <http://purl.uniprot.org/taxonomy/9913> .
}"
anatEnt_gene_speciesTable <- SPARQL(url=sparqlEndPoint, anatEnt_gene_species)
print(paste0("Number of anatomical entities: ", length(anatEnt_gene_speciesTable$results$anatEntity)))
[1] "Number of anatomical entities: 13"
print(unique(anatEnt_gene_speciesTable$results$anatEntity))
[1] "<http://purl.obolibrary.org/obo/UBERON_0002048>"
[2] "<http://purl.obolibrary.org/obo/UBERON_0000955>"
[3] "<http://purl.obolibrary.org/obo/UBERON_0002000>"
[4] "<http://purl.obolibrary.org/obo/UBERON_0000451>"
[5] "<http://purl.obolibrary.org/obo/UBERON_0001295>"
[6] "<http://purl.obolibrary.org/obo/UBERON_0001134>"
[7] "<http://purl.obolibrary.org/obo/UBERON_0000082>"
[8] "<http://purl.obolibrary.org/obo/UBERON_0001155>"
[9] "<http://purl.obolibrary.org/obo/UBERON_0001401>"
[10] "<http://purl.obolibrary.org/obo/UBERON_0000948>"
[11] "<http://purl.obolibrary.org/obo/UBERON_0034908>"
[12] "<http://purl.obolibrary.org/obo/UBERON_0001111>"
[13] "<http://purl.obolibrary.org/obo/UBERON_0002106>"
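The anatomical entities come back as full URIs. If you prefer compact UBERON identifiers, the sketch below shows one way to strip the namespace. It uses two URIs copied from the output above; the variable names are illustrative.

```r
# Two URIs copied from the printed result above.
example_uris <- c("<http://purl.obolibrary.org/obo/UBERON_0002048>",
                  "<http://purl.obolibrary.org/obo/UBERON_0000955>")

# Remove the leading namespace and the closing angle bracket.
local_ids <- gsub("^<http://purl\\.obolibrary\\.org/obo/|>$", "", example_uris)

# Replace the first underscore with a colon to get the usual OBO short form.
uberon_ids <- sub("_", ":", local_ids, fixed = TRUE)
print(uberon_ids)
```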
In this section we will use genes from our TopAnat analysis to query Bgee by gene description and by species.
Retrieve the genes whose description contains the term ‘muscle’, then check whether the gene “ENSBTAG00000014614” is among them. Note that this gene was statistically significant in the TopAnat analysis.
genes_muscles <- "PREFIX up: <http://purl.uniprot.org/core/>
PREFIX orth: <http://purl.org/net/orth#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?geneName ?geneId FROM <https://bgee.org/rdf_v14_1> {
?gene a orth:Gene .
?gene rdfs:label ?geneName .
?gene dcterms:identifier ?geneId .
?gene dcterms:description ?desc .
FILTER CONTAINS ( ?desc, 'muscle' )
}"
genes_musclesTable <- SPARQL(url=sparqlEndPoint, genes_muscles)
## keep only the Bos taurus genes
genes_musclesTable$results[genes_musclesTable$results$geneId %like% "ENSBTAG", ]
geneName geneId
28 PYGM ENSBTAG00000001032
29 MUSK ENSBTAG00000002744
30 MYH7B ENSBTAG00000003512
31 ACTC1 ENSBTAG00000005714
32 MYH2 ENSBTAG00000007090
33 MYH8 ENSBTAG00000009702
34 MYH7 ENSBTAG00000009703
35 ANKRD1 ENSBTAG00000011734
36 CAPZA3 ENSBTAG00000013207
37 CKM ENSBTAG00000013921
38 MBNL3 ENSBTAG00000014088
39 PERM1 ENSBTAG00000014540
40 SMPX ENSBTAG00000015204
41 MYLPF ENSBTAG00000021218
42 MURC ENSBTAG00000021992
43 MYH4 ENSBTAG00000037794
44 ACTA1 ENSBTAG00000046332
452 PFKM ENSBTAG00000000286
453 ATP2A2 ENSBTAG00000001398
454 MRAS ENSBTAG00000001497
455 PKM ENSBTAG00000001601
456 ATP5A1 ENSBTAG00000002507
457 MYH14 ENSBTAG00000002580
458 CAPZA2 ENSBTAG00000004072
459 CAPZB ENSBTAG00000004554
460 MBNL1 ENSBTAG00000004564
461 ENO3 ENSBTAG00000005534
462 ACYP2 ENSBTAG00000006852
463 GEM ENSBTAG00000007596
464 PHKG1 ENSBTAG00000008195
465 CNN1 ENSBTAG00000011207
466 PAMR1 ENSBTAG00000012630
467 ANKRD2 ENSBTAG00000012720
468 CAPZA1 ENSBTAG00000014295
469 MYLK ENSBTAG00000014567
470 ACTA2 ENSBTAG00000014614
471 COX7A1 ENSBTAG00000014878
472 CFL2 ENSBTAG00000015053
473 ACTG2 ENSBTAG00000015441
474 PHKA1 ENSBTAG00000015848
475 FABP3 ENSBTAG00000016819
476 MBNL2 ENSBTAG00000018313
477 MYH10 ENSBTAG00000021151
478 LMOD1 ENSBTAG00000021576
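Rather than scanning the printed table by eye, the presence of the target gene can be tested directly. The following is a minimal sketch on a small hand-built data frame: three rows are copied from the output above, and the last row is a made-up placeholder added only to show the species filter at work. It uses base R `startsWith()` in place of `%like%`.

```r
# Small stand-in for genes_musclesTable$results.
muscle_genes <- data.frame(
  geneName = c("CKM", "MYLK", "ACTA2", "PLACEHOLDER"),
  geneId   = c("ENSBTAG00000013921", "ENSBTAG00000014567",
               "ENSBTAG00000014614", "NOTBOVINE00000000001"),  # last ID is made up
  stringsAsFactors = FALSE
)

# Keep only the Bos taurus genes (Ensembl IDs starting with 'ENSBTAG').
bos_taurus_genes <- muscle_genes[startsWith(muscle_genes$geneId, "ENSBTAG"), ]

# Check whether the target gene was detected and look up its name.
target_id <- "ENSBTAG00000014614"
target_found <- target_id %in% bos_taurus_genes$geneId
target_name <- bos_taurus_genes$geneName[bos_taurus_genes$geneId == target_id]
print(target_found)
print(target_name)
```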
Check whether the gene ACTA2 (gene ID “ENSBTAG00000014614” in Bos taurus) is also present in other species.
gene_present_species <- "PREFIX up: <http://purl.uniprot.org/core/>
PREFIX orth: <http://purl.org/net/orth#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
SELECT ?name FROM <https://bgee.org/rdf_v14_1> {
?gene a orth:Gene .
?gene rdfs:label ?geneName .
?gene orth:organism ?organism . #orth v2
?organism obo:RO_0002162 ?taxon . #label: in taxon .
?taxon up:scientificName ?name .
FILTER ( UCASE(?geneName) = UCASE('ACTA2') )
}"
gene_present_speciesTable <- SPARQL(url=sparqlEndPoint, gene_present_species)
print(paste0("Number of species detected: ",length(gene_present_speciesTable$results)))
[1] "Number of species detected: 18"
t(gene_present_speciesTable$results)
[,1]
name "Danio rerio"
name.1 "Homo sapiens"
name.2 "Mus musculus"
name.3 "Rattus norvegicus"
name.4 "Sus scrofa"
name.5 "Xenopus tropicalis"
name.6 "Anolis carolinensis"
name.7 "Bos taurus"
name.8 "Canis lupus familiaris"
name.9 "Cavia porcellus"
name.10 "Equus caballus"
name.11 "Erinaceus europaeus"
name.12 "Felis catus"
name.13 "Ornithorhynchus anatinus"
name.14 "Oryctolagus cuniculus"
name.15 "Gorilla gorilla"
name.16 "Macaca mulatta"
name.17 "Monodelphis domestica"
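The printed output suggests that the result is a one-row data frame with one column per species, which is why `length()` gives the species count and `t()` is used for display. Under that assumption, a plain character vector may be easier to work with. The sketch below uses a shortened, hand-built stand-in for the result, with the first three names copied from the output above.

```r
# Shortened stand-in for gene_present_speciesTable$results (one row, many columns).
example_results <- data.frame(name   = "Danio rerio",
                              name.1 = "Homo sapiens",
                              name.2 = "Mus musculus",
                              stringsAsFactors = FALSE)

# Flatten the single row into an unnamed character vector.
species_names <- unlist(example_results, use.names = FALSE)
print(length(species_names))

# Membership tests are then straightforward.
print("Homo sapiens" %in% species_names)
```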
sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.12.8 stringr_1.4.0 SPARQL_1.16 RCurl_1.98-1.1
[5] XML_3.99-0.3 workflowr_1.6.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.4.6 knitr_1.28 whisker_0.4 magrittr_1.5
[5] R6_2.4.1 rlang_0.4.5 tools_3.6.0 xfun_0.13
[9] git2r_0.26.1 htmltools_0.4.0 yaml_2.2.1 digest_0.6.25
[13] rprojroot_1.3-2 later_1.0.0 promises_1.1.0 fs_1.4.1
[17] bitops_1.0-6 glue_1.4.0 evaluate_0.14 rmarkdown_2.1
[21] stringi_1.4.6 compiler_3.6.0 backports_1.1.6 httpuv_1.5.2