Figure 1.
Distribution of the number of proteins annotated per article.
X-axis: number of annotating articles. Y-axis: number of annotated proteins. The distribution was found to be logarithmic with a significant () linear fit to the log-log plot. The data came from 76137 articles annotating 256033 proteins with GO experimental evidence codes, in the UniProt-GOA 12/2011 release.
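As a rough illustration of what a linear fit to a log-log plot involves, the sketch below runs an ordinary least-squares fit on log10-transformed values. The function name `loglog_fit` and the input numbers are hypothetical and synthetic; they are not the paper's data, and this is not necessarily the fitting procedure the authors used.

```python
import math


def loglog_fit(xs, ys):
    """Least-squares line through (log10 x, log10 y).

    Returns (slope, intercept, r_squared). All inputs must be positive.
    """
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    syy = sum((b - mean_y) ** 2 for b in ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Synthetic example only: y = 1000 * x**-2, which is a straight line in log-log space.
xs = [1, 2, 4, 5, 10]
ys = [1000, 250, 62.5, 40, 10]
slope, intercept, r_squared = loglog_fit(xs, ys)
```

A slope near a negative constant with a high coefficient of determination is what a straight line on the log-log plot corresponds to.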
Table 1.
Annotation Cohorts.
Figure 2.
Relative contribution of top-50 articles to the annotation of major model organisms.
The length of each bar represents the percentage of proteins annotated by the top-50 articles in a given organism by a given GO term. GO terms that are present in more than one species are highlighted.
Figure 3.
Redundancy in proteins described by the top-50 articles.
A circle represents the sum total of articles annotating each organism. Each colored arc is composed of all the proteins in a single article. A line is drawn between any two points on the circle if the proteins they represent have 100% sequence identity. A black line is drawn if they are annotated with different ontologies (for example, in one article the protein is annotated with the MFO, and in another article with the BPO); a red line is drawn if they are annotated in the same ontology. Example: S. pombe is described by two articles, one with few proteins (light arc at the bottom) and one with many (dark arc encompassing most of the circle). Many of the same proteins are annotated by both articles. See Table 2 for numbers.
Table 2.
Sequence Redundancy in Top-50 Annotating Articles.
Table 3.
Annotation Consistency in Top-50 Articles.
Figure 4.
Information provided by articles depending on the number of proteins the articles annotate.
Articles are grouped into four cohorts by the number of proteins each article annotates: exactly 1; more than 1 and up to 10; more than 10 and fewer than 100; 100 or more. Blue bars: Molecular Function ontology; green bars: Biological Process ontology; red bars: Cellular Component ontology. Information is gauged by A: information content and B: GO depth. See text for details.
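The cohort boundaries in this caption can be sketched as a small binning function. The function name `cohort` and the label strings are hypothetical placeholders chosen for illustration; the paper's own cohort labels are not reproduced here.

```python
def cohort(n_proteins):
    """Map the number of proteins an article annotates to a cohort label.

    The label strings are illustrative placeholders, not the paper's notation.
    """
    if n_proteins < 1:
        raise ValueError("an annotating article must annotate at least one protein")
    if n_proteins == 1:
        return "1"        # exactly one protein
    if n_proteins <= 10:
        return "2-10"     # more than 1, up to 10
    if n_proteins < 100:
        return "11-99"    # more than 10, fewer than 100
    return "100+"         # 100 or more


# Made-up per-article protein counts, grouped by cohort.
counts = [1, 3, 10, 11, 99, 100, 2500]
labels = [cohort(n) for n in counts]
```

The boundary cases are the ones worth checking: 10 falls in the second cohort, 11 and 99 in the third, and 100 in the fourth.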
Table 4.
Fraction of Proteins Exclusively Annotated by High Throughput Studies.
Table 5.
Annotation Consistency Example.