Last updated: 2020-04-26
Checks: 7 0
Knit directory: Bgee/
This reproducible R Markdown analysis was created with workflowr (version 1.6.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20200417)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version c287d01. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Untracked files:
Untracked: Drosophila_melanogaster_Bgee_14_1/
Untracked: analysis/.here
Untracked: genes_Drosophila_melanogaster.tsv
Untracked: release.tsv
Untracked: species_Bgee_14_1.tsv
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/topanat.Rmd
) and HTML (docs/topanat.html
) files. If you’ve configured a remote Git repository (see ?wflow_git_remote
), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | c287d01 | SFonsecaCosta | 2020-04-26 | Update |
html | ae29961 | SFonsecaCosta | 2020-04-22 | Build site. |
Rmd | 9907294 | SFonsecaCosta | 2020-04-22 | clean text |
html | 8d9e88f | SFonsecaCosta | 2020-04-20 | Build site. |
Rmd | f18beb5 | SFonsecaCosta | 2020-04-20 | Minor things |
html | 8d821e2 | SFonsecaCosta | 2020-04-20 | Build site. |
Rmd | 9073f83 | SFonsecaCosta | 2020-04-20 | add analysis |
library(BgeeDB)
library(biomaRt)
library(ggplot2)
library(here)
The TopAnat allows you to make GO-like enrichment of anatomical terms, mapped to genes by expression patterns.
In this section we will use 2 different species, Drosophila melanogaster and Bos taurus to run the TopAnat analysis.
Initially we will use the output results from our previous analysis with Drosophila melanogaster to run TopAnat, after that we will run the TopAnat tool with data retrieved from biomart that match specific phenotype from a non-model organism.
In order to start the analysis in TopAnat you have to create a Bgee object, as demonstrated before in the download data section.
Note that in this example we do not specify any data type in the argument dataType, which means we will use all data types from Bgee.
targetDrosophila <- Bgee$new(species = "Drosophila_melanogaster")
NOTE: You did not specify any data type. The argument dataType will be set to c("rna_seq","affymetrix","est","in_situ") for the next steps.
Querying Bgee to get release information...
NOTE: the file describing Bgee species information for release 14_1 was found in the download directory /Users/sfonseca1/Bgee. Data will not be redownloaded.
API key built: 342617eeb6e728567eeaf07855efc4fa274e6ad21dde329ecce7bd3668c4efa9f23c364808e45b7b772594a592dfe6855ff75027e7983a848ff43a25416ebe6f
After that you should download the data for the reference species in order to allows you to perform the TopAnat analysis. The function loadTopAnatData() loads multiple files for the target species, as:
an organ relationships file
an organ names file
a gene to organs mapping file
drosophilaTopAnatData <- loadTopAnatData(targetDrosophila, confidence = "silver", stage = "UBERON:0000092")
Building URLs to retrieve organ relationships from Bgee.........
URL successfully built (https://r.bgee.org/bgee14_1/?page=r_package&action=get_anat_entity_relations&display_type=tsv&species_list=7227&attr_list=SOURCE_ID&attr_list=TARGET_ID&api_key=342617eeb6e728567eeaf07855efc4fa274e6ad21dde329ecce7bd3668c4efa9f23c364808e45b7b772594a592dfe6855ff75027e7983a848ff43a25416ebe6f&source=BgeeDB_R_package&source_version=2.12.1)
Submitting URL to Bgee webservice (can be long)
Got results from Bgee webservice. Files are written in "/Users/sfonseca1/Bgee/Drosophila_melanogaster_Bgee_14_1"
Building URLs to retrieve organ names from Bgee.................
URL successfully built (https://r.bgee.org/bgee14_1/?page=r_package&action=get_anat_entities&display_type=tsv&species_list=7227&attr_list=ID&attr_list=NAME&api_key=342617eeb6e728567eeaf07855efc4fa274e6ad21dde329ecce7bd3668c4efa9f23c364808e45b7b772594a592dfe6855ff75027e7983a848ff43a25416ebe6f&source=BgeeDB_R_package&source_version=2.12.1)
Submitting URL to Bgee webservice (can be long)
Got results from Bgee webservice. Files are written in "/Users/sfonseca1/Bgee/Drosophila_melanogaster_Bgee_14_1"
Building URLs to retrieve mapping of gene to organs from Bgee...
URL successfully built (https://r.bgee.org/bgee14_1/?page=r_package&action=get_expression_calls&display_type=tsv&species_list=7227&attr_list=GENE_ID&attr_list=ANAT_ENTITY_ID&api_key=342617eeb6e728567eeaf07855efc4fa274e6ad21dde329ecce7bd3668c4efa9f23c364808e45b7b772594a592dfe6855ff75027e7983a848ff43a25416ebe6f&source=BgeeDB_R_package&source_version=2.12.1&data_qual=SILVER&stage_id=UBERON:0000092)
Submitting URL to Bgee webservice (can be long)
Got results from Bgee webservice. Files are written in "/Users/sfonseca1/Bgee/Drosophila_melanogaster_Bgee_14_1"
Parsing the results.............................................
Adding BGEE:0 as unique root of all terms of the ontology.......
Done.
Note that the loadTopAnatData() function allows you to retrieve data from a specific developmental stage by using the stage
argument and allow you to specify the confidence of the quality of expression calls by using the argument confidence
. In this particular case we use all data and the developmental stage UBERON:0000092.
The next step of the analysis relies on topGO package, for this a list of genes in the foreground and in the background are necessary.
Note that in this case the foreground genes will be the genes detected with high variability between the two samples from the experiment SRP002072 analysed in the processing data section.
The initial step is to retrieve information (ensembl gene id) for the target species by using biomart.
ensemblDM <- useMart("ENSEMBL_MART_ENSEMBL", dataset="dmelanogaster_gene_ensembl", host="mar2016.archive.ensembl.org")
# background genes
universeDM <- getBM(attributes=c("ensembl_gene_id"),mart=ensemblDM)
Cache found
# foreground genes from the analysis
foregroundDM <- read.table(file = here("genes_Drosophila_melanogaster.tsv"), header=F, sep="\t")
foregroundDM <- unlist(foregroundDM, use.names = FALSE)
We will use this list of genes to create our geneList object.
geneListDM <- factor(as.integer(unique(universeDM$ensembl_gene_id) %in% foregroundDM))
names(geneListDM) <- unique(universeDM$ensembl_gene_id)
summary(geneListDM)
0 1
17533 26
Create the object with data loaded and the gene list with foreground genes.
myTopAnatObject_DM <- topAnat(drosophilaTopAnatData, geneListDM)
Checking topAnatData object.............
Checking gene list......................
WARNING: Some genes in your gene list have no expression data in Bgee, and will not be included in the analysis. 17478 genes in background will be kept.
Building most specific Ontology terms... ( 461 Ontology terms found. )
Building DAG topology................... ( 1048 Ontology terms and 1964 relations. )
Annotating nodes (Can be long).......... ( 17478 genes annotated to the Ontology terms. )
After creating the final object, the enrichment analysis is performed by using the statistical tests implemented in topGO package.
results_DM <- runTest(myTopAnatObject_DM, algorithm = 'classic', statistic = 'fisher')
-- Classic Algorithm --
the algorithm is scoring 181 nontrivial nodes
parameters:
test statistic: fisher
The results can be retrieved by specifying the desired cutoff using the cutoff
argument. You can also export the table by ordering the desire column specifying that in ordering
argument.
tableOver <- makeTable(drosophilaTopAnatData, myTopAnatObject_DM, results_DM)
Building the results table for the 406 significant terms at FDR threshold of 1...
Ordering results by pValue column in increasing order...
Done
## look
head(tableOver,4)
organId organName annotated significant
UBERON:0000992 UBERON:0000992 female gonad 12885 26
UBERON:0003134 UBERON:0003134 female reproductive organ 12885 26
UBERON:0000474 UBERON:0000474 female reproductive system 13078 26
UBERON:0003100 UBERON:0003100 female organism 13078 26
expected foldEnrichment pValue FDR
UBERON:0000992 19.17 1.356286 0.0003585320 0.04185368
UBERON:0003134 19.17 1.356286 0.0003585320 0.04185368
UBERON:0000474 19.45 1.336761 0.0005279216 0.04185368
UBERON:0003100 19.45 1.336761 0.0005279216 0.04185368
In this example we will use a non-model organism Bos taurus (cattle) to run the TopAnat analysis.
In order to start the analysis in TopAnat you have to create a new Bgee object, as demonstrated before.
Note that in this example we do not specify any data type in the argument dataType, as in the previous example, which means that we will use all data types from Bgee once again.
targetBT <- Bgee$new(species = "Bos_taurus")
NOTE: You did not specify any data type. The argument dataType will be set to c("rna_seq","affymetrix","est","in_situ") for the next steps.
Querying Bgee to get release information...
NOTE: the file describing Bgee species information for release 14_1 was found in the download directory /Users/sfonseca1/Bgee. Data will not be redownloaded.
API key built: 342617eeb6e728567eeaf07855efc4fa274e6ad21dde329ecce7bd3668c4efa9f23c364808e45b7b772594a592dfe6855ff75027e7983a848ff43a25416ebe6f
After that you should download the data for the reference species in order to allows you to perform the TopAnat analysis. Note that in this case we will not specify any developmental stage or confidence in the calls.
bostaurusTopAnatData <- loadTopAnatData(targetBT)
Building URLs to retrieve organ relationships from Bgee.........
URL successfully built (https://r.bgee.org/bgee14_1/?page=r_package&action=get_anat_entity_relations&display_type=tsv&species_list=9913&attr_list=SOURCE_ID&attr_list=TARGET_ID&api_key=342617eeb6e728567eeaf07855efc4fa274e6ad21dde329ecce7bd3668c4efa9f23c364808e45b7b772594a592dfe6855ff75027e7983a848ff43a25416ebe6f&source=BgeeDB_R_package&source_version=2.12.1)
Submitting URL to Bgee webservice (can be long)
Got results from Bgee webservice. Files are written in "/Users/sfonseca1/Bgee/Bos_taurus_Bgee_14_1"
Building URLs to retrieve organ names from Bgee.................
URL successfully built (https://r.bgee.org/bgee14_1/?page=r_package&action=get_anat_entities&display_type=tsv&species_list=9913&attr_list=ID&attr_list=NAME&api_key=342617eeb6e728567eeaf07855efc4fa274e6ad21dde329ecce7bd3668c4efa9f23c364808e45b7b772594a592dfe6855ff75027e7983a848ff43a25416ebe6f&source=BgeeDB_R_package&source_version=2.12.1)
Submitting URL to Bgee webservice (can be long)
Got results from Bgee webservice. Files are written in "/Users/sfonseca1/Bgee/Bos_taurus_Bgee_14_1"
Building URLs to retrieve mapping of gene to organs from Bgee...
URL successfully built (https://r.bgee.org/bgee14_1/?page=r_package&action=get_expression_calls&display_type=tsv&species_list=9913&attr_list=GENE_ID&attr_list=ANAT_ENTITY_ID&api_key=342617eeb6e728567eeaf07855efc4fa274e6ad21dde329ecce7bd3668c4efa9f23c364808e45b7b772594a592dfe6855ff75027e7983a848ff43a25416ebe6f&source=BgeeDB_R_package&source_version=2.12.1&data_qual=SILVER)
Submitting URL to Bgee webservice (can be long)
Got results from Bgee webservice. Files are written in "/Users/sfonseca1/Bgee/Bos_taurus_Bgee_14_1"
Parsing the results.............................................
Adding BGEE:0 as unique root of all terms of the ontology.......
Done.
In the next step we will perform the TopAnat analysis with topGO, for this a list of genes in the foreground and in the background are necessary.
In this example we will retrieve genes enriched for expression in muscle.
The initial step is to retrieve information for the target species by using biomart.
ensembl_BT <- useMart("ENSEMBL_MART_ENSEMBL", dataset="btaurus_gene_ensembl", host="mar2016.archive.ensembl.org")
# background genes
universe_BT <- getBM(attributes=c("ensembl_gene_id","phenotype_description"),mart=ensembl_BT)
Cache found
# select phenotypes related to muscle
phenotypes <- grep("musc", unique(universe_BT$phenotype_description), value=T)
# Foreground genes are those with an annotated phenotype related to muscle.
myGenes <- unique(universe_BT$ensembl_gene_id[universe_BT$phenotype_description %in% phenotypes])
# list vector with 0 and 1
geneList_BT <- factor(as.integer(unique(universe_BT$ensembl_gene_id) %in% myGenes))
names(geneList_BT) <- unique(universe_BT$ensembl_gene_id)
summary(geneList_BT)
0 1
24609 7
Note that you have just 7 genes in the foreground that match ‘musc’ keyword when you extract uniques with getBM()
function.
If we do a search in UniProtKB/Swiss-Prot entries with keyword ‘muscle’ we will found the following genes that can be part of our foreground:
phenotypes <- c("ENSBTAG00000008394", "ENSBTAG00000006030", "ENSBTAG00000015644", "ENSBTAG00000011424", "ENSBTAG00000010799", "ENSBTAG00000014614", "ENSBTAG00000011869", "ENSBTAG00000045757", "ENSBTAG00000026266", "ENSBTAG00000022158", "ENSBTAG00000006419", "ENSBTAG00000030425", "ENSBTAG00000009707", "ENSBTAG00000009749", "ENSBTAG00000033217", "ENSBTAG00000007196", "ENSBTAG00000021218", "ENSBTAG00000005333", "ENSBTAG00000006424", "ENSBTAG00000005353", "ENSBTAG00000021992", "ENSBTAG00000026972", "ENSBTAG00000005970", "ENSBTAG00000046332", "ENSBTAG00000018285", "ENSBTAG00000005714", "ENSBTAG00000016024", "ENSBTAG00000017992", "ENSBTAG00000014003", "ENSBTAG00000018204", "ENSBTAG00000019159")
We will use this list of genes to create our geneList object.
geneList_BT <- factor(as.integer(unique(universe_BT$ensembl_gene_id) %in% phenotypes))
names(geneList_BT) <- unique(universe_BT$ensembl_gene_id)
summary(geneList_BT)
0 1
24585 31
Create the object with data loaded and the gene list with foreground genes (phenotypes from the search in UniProtKB/Swiss-Prot).
myTopAnatObject_BT <- topAnat(bostaurusTopAnatData, geneList_BT)
Checking topAnatData object.............
Checking gene list......................
WARNING: Some genes in your gene list have no expression data in Bgee, and will not be included in the analysis. 22645 genes in background will be kept.
Building most specific Ontology terms... ( 19 Ontology terms found. )
Building DAG topology................... ( 172 Ontology terms and 260 relations. )
Annotating nodes (Can be long).......... ( 22645 genes annotated to the Ontology terms. )
After creating the final object, the enrichment analysis is performed by using the statistical tests implemented in topGO package. Note that in this example we will use weight in the argument algorithm
.
results_BT <- runTest(myTopAnatObject_BT, algorithm = 'weight', statistic = 'fisher')
-- Weight Algorithm --
The algorithm is scoring 172 nontrivial nodes
parameters:
test statistic: fisher : ratio
Level 19: 1 nodes to be scored.
Level 18: 1 nodes to be scored.
Level 17: 1 nodes to be scored.
Level 16: 2 nodes to be scored.
Level 15: 4 nodes to be scored.
Level 14: 7 nodes to be scored.
Level 13: 14 nodes to be scored.
Level 12: 16 nodes to be scored.
Level 11: 12 nodes to be scored.
Level 10: 17 nodes to be scored.
Level 9: 17 nodes to be scored.
Level 8: 22 nodes to be scored.
Level 7: 18 nodes to be scored.
Level 6: 12 nodes to be scored.
Level 5: 9 nodes to be scored.
Level 4: 10 nodes to be scored.
Level 3: 4 nodes to be scored.
Level 2: 4 nodes to be scored.
Level 1: 1 nodes to be scored.
The results can be retrieved by specifying the desired cutoff using the cutoff
argument. You can also export the table by ordering the desire column specifying that in ordering
argument.
tableOver <- makeTable(bostaurusTopAnatData, myTopAnatObject_BT, results_BT)
Building the results table for the 172 significant terms at FDR threshold of 1...
Ordering results by pValue column in increasing order...
Done
grph <- ggplot(tableOver, aes(x = reorder(organId, FDR), y = FDR))
grph <- grph + geom_bar(stat="identity", color='black',fill='gray')
grph <- grph + theme(axis.text.x=element_blank(),
axis.ticks.x=element_blank()) + xlab("Anatomic entities") + ylab("FDR") + ggtitle("Results - enrichment test")
grph
Version | Author | Date |
---|---|---|
8d821e2 | SFonsecaCosta | 2020-04-20 |
tableOver <- makeTable(bostaurusTopAnatData, myTopAnatObject_BT, results_BT, cutoff = 0.01, ordering = -8)
Building the results table for the 5 significant terms at FDR threshold of 0.01...
Ordering results by FDR column in decreasing order...
Done
targetUberon <- c("UBERON:0001134", "UBERON:0001111", "UBERON:0001401", "UBERON:0002000", "UBERON:0034908")
termStat(myTopAnatObject_BT, targetUberon)
Annotated Significant Expected
UBERON:0001134 16812 31 23.01
UBERON:0001111 14106 30 19.31
UBERON:0001401 11858 28 16.23
UBERON:0002000 14178 31 19.41
UBERON:0034908 14475 31 19.82
In order to use some target gene in the next section (Sparql endpoint), we should retrieve the ensembl IDs for one of the anatomical entities that are significant.
annotated <- genesInTerm(myTopAnatObject_BT, targetUberon)[["UBERON:0001134"]]
annotated[annotated %in% sigGenes(myTopAnatObject_BT)]
[1] "ENSBTAG00000005333" "ENSBTAG00000005353" "ENSBTAG00000005714"
[4] "ENSBTAG00000005970" "ENSBTAG00000006030" "ENSBTAG00000006419"
[7] "ENSBTAG00000006424" "ENSBTAG00000007196" "ENSBTAG00000008394"
[10] "ENSBTAG00000009707" "ENSBTAG00000009749" "ENSBTAG00000010799"
[13] "ENSBTAG00000011424" "ENSBTAG00000011869" "ENSBTAG00000014003"
[16] "ENSBTAG00000014614" "ENSBTAG00000015644" "ENSBTAG00000016024"
[19] "ENSBTAG00000017992" "ENSBTAG00000018204" "ENSBTAG00000018285"
[22] "ENSBTAG00000019159" "ENSBTAG00000021218" "ENSBTAG00000021992"
[25] "ENSBTAG00000022158" "ENSBTAG00000026266" "ENSBTAG00000026972"
[28] "ENSBTAG00000030425" "ENSBTAG00000033217" "ENSBTAG00000045757"
[31] "ENSBTAG00000046332"
sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] here_0.1 ggplot2_3.3.0 biomaRt_2.42.1
[4] BgeeDB_2.12.1 tidyr_1.0.2 topGO_2.38.1
[7] SparseM_1.78 GO.db_3.10.0 AnnotationDbi_1.48.0
[10] IRanges_2.20.2 S4Vectors_0.24.4 Biobase_2.46.0
[13] graph_1.64.0 BiocGenerics_0.32.0 workflowr_1.6.1
loaded via a namespace (and not attached):
[1] httr_1.4.1 bit64_0.9-7 assertthat_0.2.1
[4] askpass_1.1 BiocFileCache_1.10.2 blob_1.2.1
[7] yaml_2.2.1 progress_1.2.2 pillar_1.4.3
[10] RSQLite_2.2.0 backports_1.1.6 lattice_0.20-41
[13] glue_1.4.0 digest_0.6.25 promises_1.1.0
[16] colorspace_1.4-1 htmltools_0.4.0 httpuv_1.5.2
[19] XML_3.99-0.3 pkgconfig_2.0.3 purrr_0.3.3
[22] scales_1.1.0 whisker_0.4 later_1.0.0
[25] git2r_0.26.1 tibble_3.0.0 openssl_1.4.1
[28] farver_2.0.3 ellipsis_0.3.0 withr_2.1.2
[31] cli_2.0.2 magrittr_1.5 crayon_1.3.4
[34] memoise_1.1.0 evaluate_0.14 fs_1.4.1
[37] fansi_0.4.1 tools_3.6.0 data.table_1.12.8
[40] prettyunits_1.1.1 hms_0.5.3 lifecycle_0.2.0
[43] matrixStats_0.56.0 stringr_1.4.0 munsell_0.5.0
[46] compiler_3.6.0 rlang_0.4.5 grid_3.6.0
[49] RCurl_1.98-1.1 rappdirs_0.3.1 labeling_0.3
[52] bitops_1.0-6 rmarkdown_2.1 gtable_0.3.0
[55] DBI_1.1.0 curl_4.3 R6_2.4.1
[58] knitr_1.28 dplyr_0.8.5 bit_1.1-15.2
[61] rprojroot_1.3-2 stringi_1.4.6 Rcpp_1.0.4.6
[64] vctrs_0.2.4 dbplyr_1.4.2 tidyselect_1.0.0
[67] xfun_0.13