Chapter 9: Analyses Using Disease Ontologies

doi:10.1371/journal.pcbi.1002827

Figure 1.

An overview of the process to calculate enrichment of GO categories.

The steps usually followed are: (1) Get annotations for each gene in the reference set and in the set of interest. (2) Count the occurrences (n) of each GO term in the annotations of the genes comprising the set of interest. (3) Count the occurrences (m) of that same GO term in the annotations of the reference set. (4) Assess how “surprising” it is to find n, given m, M and N.
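
Step (4) can be sketched as a one-sided hypergeometric test. This is a minimal illustration, not necessarily the test used to produce the figure: the caption does not name a statistic, and the sketch assumes that N is the size of the set of interest and M the size of the reference set. The function name `hypergeom_enrichment_p` is hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n, N, m, M):
    """Probability of seeing at least n annotated genes by chance.

    Assumed meaning of the symbols (the caption does not define M and N):
      n: genes in the set of interest annotated with the term
      N: size of the set of interest
      m: genes in the reference set annotated with the term
      M: size of the reference set
    """
    if not (0 <= n <= min(N, m) and m <= M and N <= M):
        raise ValueError("counts are inconsistent with the set sizes")
    total = comb(M, N)  # ways to draw N genes from the reference set
    upper = min(N, m)   # largest possible number of annotated genes drawn
    tail = sum(comb(m, k) * comb(M - m, N - k) for k in range(n, upper + 1))
    return tail / total


# Toy numbers, chosen only for illustration.
p_value = hypergeom_enrichment_p(n=2, N=2, m=2, M=4)
```

A small p-value means that n is larger than expected if the set of interest were a random draw from the reference set.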

Figure 2.

Workflow schematic of enrichment analysis.

If the input set has only textual annotations, we first run the Annotator service to create ontology-term annotations. The annotation counts in the input set are then aggregated along the ontology hierarchy and compared with a background set to test for a statistically significant difference in the frequency of each ontology term. If a significant difference in the term frequency is found, that term is called “enriched” in the input set of entities. The results of the analysis are returned as a tag cloud, a graph, or XML output that users can process as required.
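
The aggregation step can be sketched as follows: each entity annotated with a term is also counted for every ancestor of that term, once per term. This is a minimal sketch under stated assumptions. The toy ontology, the entity names, and the functions `ancestors` and `aggregate_counts` are hypothetical, and the service's actual aggregation may differ.

```python
def ancestors(term, parents):
    """Return every ancestor of a term in a child -> parents mapping."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def aggregate_counts(annotations, parents):
    """Count, for each term, the entities annotated with it or a descendant."""
    counts = {}
    for terms in annotations.values():
        closure = set()
        for term in terms:
            closure.add(term)
            closure |= ancestors(term, parents)
        # Each entity contributes at most once to any given term.
        for term in closure:
            counts[term] = counts.get(term, 0) + 1
    return counts


# Hypothetical toy ontology (child -> parents) and annotations.
parents = {
    "melanoma": ["skin neoplasm"],
    "skin neoplasm": ["neoplasm"],
    "carcinoma": ["neoplasm"],
}
annotations = {"e1": ["melanoma"], "e2": ["carcinoma", "melanoma"]}
counts = aggregate_counts(annotations, parents)
```

Counting each entity once per term keeps a general term such as "neoplasm" from being inflated when one entity carries several of its descendants.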

Figure 3.

Tag cloud output: An example for the annotations of grants from FY1981 using SNOMEDCT.

Blue denotes low-frequency terms and red denotes high-frequency terms. Many concepts, such as “neoplasm of digestive tract”, occur at high frequencies in most years, possibly reflecting the constant focus on cancer research. An appropriate background term frequency distribution is necessary to determine whether a high frequency is significant.

Figure 4.

The figure shows a visualization generated using the GO TermFinder tool.

The GO graph layout shows the significantly enriched GO terms in the annotations of the analyzed gene set. The color of each node indicates its Bonferroni-corrected P-value (orange ≤ 1e-10; yellow 1e-10 to 1e-8; green 1e-8 to 1e-6; cyan 1e-6 to 1e-4; blue 1e-4 to 1e-2; tan > 0.01).
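
The color legend can be read as a lookup over corrected P-values. The sketch below applies a standard Bonferroni correction (multiply each P-value by the number of tests, capped at 1) and maps the result to the color bins in the caption. It is not GO TermFinder's own code. The function names are hypothetical, and treating each bin's upper bound as inclusive is an assumption, since the caption only specifies this for the orange bin.

```python
def bonferroni(p_values):
    """Multiply each P-value by the number of tests, capping at 1.0."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Upper bound of each bin, taken from the figure caption.
COLOR_BINS = [
    (1e-10, "orange"),
    (1e-8, "yellow"),
    (1e-6, "green"),
    (1e-4, "cyan"),
    (1e-2, "blue"),
]


def node_color(corrected_p):
    """Return the legend color for a Bonferroni-corrected P-value."""
    for upper_bound, color in COLOR_BINS:
        if corrected_p <= upper_bound:  # inclusive bound is an assumption
            return color
    return "tan"  # corrected P-value above 0.01


# Hypothetical raw P-values for three GO terms.
colors = [node_color(p) for p in bonferroni([1e-13, 2e-6, 0.4])]
```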

Figure 5.

Workflow for generating background annotation sets for enrichment analysis: We obtain a set of PubMed articles from manually curated GO annotations, which we process using the NCBO Annotator service.

Figure 6.

Disease terms significantly enriched in annotations of aging-related genes: The tag cloud shows those disease terms in the annotations of the 261 aging-related genes that are statistically enriched given our gene–disease background annotation dataset.

Terms that are significantly enriched appear larger. We used a binomial test to detect enriched disease terms in the aging-related gene set. Note that mis-annotated terms (such as Recruitment) and non-informative terms (such as Disease) are not deemed enriched by the statistical analysis.
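
A binomial enrichment test of this kind can be sketched as an upper-tail probability. The caption states only that a binomial test was used, so the one-sided form, the function name `binomial_upper_tail`, and the idea of estimating the background frequency as the fraction of background genes carrying the term are all assumptions. The numbers in the example are invented for illustration, apart from the gene set size of 261.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p).

    k: genes in the set of interest annotated with the disease term
    n: size of the set of interest
    p: assumed background frequency of the term
    """
    if not (0 <= k <= n and 0.0 <= p <= 1.0):
        raise ValueError("need 0 <= k <= n and 0 <= p <= 1")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical counts: 30 of 261 genes carry a term whose
# background frequency is assumed to be 5%.
p_value = binomial_upper_tail(k=30, n=261, p=0.05)
```

Under this reading, a term that is frequent in the gene set but equally frequent in the background, such as a non-informative term like Disease, yields a large P-value and is not called enriched.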
