Beyond Captions: Linking Figures with Abstract Sentences in Biomedical Articles

doi:10.1371/journal.pone.0039618

Figure 1.

An example of a full-text biomedical article (PMID 12808147) with author-identified links between sentences in the abstract and figures and tables in the body of the article.

Abstract sentences are shown in different colors. Arrows denote the annotated associations and arrow colors correspond to sentence color. To save space, figure captions are truncated and Fig. 2, which is not linked with any sentence, is not shown. (Figures republished with permission from [32], Copyright (2003) National Academy of Sciences, U.S.A.).

Figure 2.

Recall-precision curves for three LMs and the baseline.

The (Fixed Size, Mixture) model is our CompleteLM. The filled circles denote locations of the points.

Table 1.

Performance measures of text-only models.

Figure 3.

Empirical cumulative distribution functions of sentence length for four collections of instances: all 5406 instances, all 947 linked instances, and the top 826 scoring instances from (Fixed Size, Mixture) and (Variable Size, Mixture) language models.
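An empirical CDF like those in Figure 3 gives, for each observed length x, the fraction of instances whose sentence length is at most x. The sketch below is a minimal illustration, not the authors' code. It assumes sentence lengths are already available as integers; the caption does not state the unit (tokens or characters), and the example lengths are hypothetical.

```python
from bisect import bisect_right


def ecdf(values):
    """Return (x, F(x)) pairs for each distinct value x, where F(x) is the
    fraction of values <= x. Ties are handled by evaluating F only at
    distinct observed values."""
    xs = sorted(values)
    n = len(xs)
    # bisect_right counts how many sorted values are <= x
    return [(x, bisect_right(xs, x) / n) for x in sorted(set(xs))]


# Hypothetical sentence lengths for one collection of instances
lengths = [12, 25, 25, 31, 18]
print(ecdf(lengths))  # [(12, 0.2), (18, 0.4), (25, 0.8), (31, 1.0)]
```

Computing one such curve per collection (all instances, linked instances, top-scoring instances) and plotting them as step functions gives a comparison of the kind the figure shows.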

Figure 4.

Results of permutation tests showing article effects on two performance measures: area under the ROC curve (left) and precision (right).

Blue and magenta points show the actual performance values for the CompleteLM model, calculated with the whole-corpus and per-article methods, respectively. The red line shows a normalized histogram of per-article performance over 1000 random permutations of the associations between articles and abstract sentence/figure instances.
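The permutation test in Figure 4 can be sketched as follows: shuffle which article each scored instance belongs to, recompute the per-article measure, and repeat to build a null distribution. This is a minimal sketch under stated assumptions, not the authors' procedure. The per-article measure here is a hypothetical one (precision among instances scoring at or above a fixed threshold, averaged over articles, skipping articles with no predicted links), and the function names are illustrative.

```python
import random
from collections import defaultdict


def per_article_precision(article_ids, scores, labels, threshold=0.5):
    """Hypothetical per-article measure: mean over articles of precision
    among instances with score >= threshold. Articles with no predicted
    links are skipped (an assumption)."""
    groups = defaultdict(list)
    for a, s, y in zip(article_ids, scores, labels):
        groups[a].append((s, y))
    precisions = []
    for items in groups.values():
        predicted = [y for s, y in items if s >= threshold]
        if predicted:
            precisions.append(sum(predicted) / len(predicted))
    return sum(precisions) / len(precisions) if precisions else 0.0


def permutation_null(article_ids, scores, labels, n_perm=1000, threshold=0.5, seed=0):
    """Per-article measure recomputed after randomly reassigning instances
    to articles. Shuffling keeps each article's instance count unchanged."""
    rng = random.Random(seed)
    shuffled = list(article_ids)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(per_article_precision(shuffled, scores, labels, threshold))
    return null


def empirical_p_value(observed, null):
    """One-sided p-value with the usual +1 correction."""
    return (1 + sum(v >= observed for v in null)) / (1 + len(null))
```

A whole-corpus measure computed at a fixed threshold ignores article identifiers entirely, so it does not change under this shuffle. Only the per-article value is compared against the null histogram.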

Table 2.

Percent information gain of non-text features.
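The caption does not define percent information gain. One common definition, assumed here, is the reduction in class-label entropy from knowing a feature, expressed as a percentage of the label entropy: 100 · (H(Y) − H(Y|X)) / H(Y). The sketch below computes it for a discrete feature. Continuous features such as sentence position would need to be binned first, and the binning choice is not specified by the source.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def percent_information_gain(feature, labels):
    """100 * (H(Y) - H(Y|X)) / H(Y) for a discrete feature X and label Y."""
    n = len(labels)
    h_y = entropy(labels)
    if h_y == 0:
        return 0.0  # labels carry no uncertainty to reduce
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    # Conditional entropy: label entropy within each feature value, weighted by frequency
    h_y_given_x = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return 100.0 * (h_y - h_y_given_x) / h_y


print(percent_information_gain([0, 0, 1, 1], [0, 0, 1, 1]))  # 100.0
```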

Figure 5.

Whole-corpus recall-precision curves.

The solid dots indicate the recall-precision point at which the number of predicted linked instances equals the total number of abstract sentences in the corpus.
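The solid-dot operating point corresponds to ranking all instances by score and predicting the top k as linked, with k set to the number of abstract sentences. A minimal sketch follows. The scores, labels and value of k are hypothetical, and ties in score are broken by input order (an assumption, since the caption does not say how ties are handled).

```python
def precision_recall_at_k(scores, labels, k):
    """Predict the k highest-scoring instances as linked and return
    (precision, recall). labels are 1 for linked instances, 0 otherwise."""
    # sorted() is stable, so equal scores keep their input order
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = sum(y for _, y in ranked[:k])
    total_pos = sum(labels)
    precision = tp / k if k else 0.0
    recall = tp / total_pos if total_pos else 0.0
    return precision, recall


# Hypothetical corpus: 5 instances, 3 truly linked, 3 abstract sentences (k = 3)
p, r = precision_recall_at_k([0.9, 0.8, 0.4, 0.3, 0.1], [1, 0, 1, 1, 0], k=3)
```

Here two of the three top-ranked instances are linked, so both precision and recall come out as 2/3. In general they differ, because k (the number of abstract sentences) need not equal the number of true links.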

Table 3.

Performance values for models that use combinations of text, positional and linkage features.

Table 4.

Results for 14 articles with human annotations provided by both authors and non-authors, and computational predictions provided by the CRF (SIS) model.

Table 5.

Survey questions and average response values.

Figure 6.

Example graph and linkage matrix representations for an article with four abstract sentences, three figures and four sentence/figure links.

Combinations of linkages whose edges cross in the graph representation (as marked in this example) are less common, because they break the observed tendency for linked instances to keep a consistent relative order.
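With abstract sentences and figures each drawn in document order on two parallel lines, links (i1, j1) and (i2, j2) cross exactly when the sentence order and the figure order disagree, that is, when (i1 − i2)(j1 − j2) < 0. Links that share a sentence or a figure give a product of zero and do not count as crossing. The sketch below reads links from a binary linkage matrix (rows are sentences, columns are figures) and lists crossing pairs. The matrix is hypothetical and is not the one in Figure 6.

```python
from itertools import combinations


def links_from_matrix(matrix):
    """Row i = abstract sentence i, column j = figure j; a nonzero entry marks a link."""
    return [(i, j) for i, row in enumerate(matrix) for j, v in enumerate(row) if v]


def crossing_pairs(links):
    """Pairs of links whose edges cross: sentence and figure orders disagree."""
    return [(a, b) for a, b in combinations(links, 2)
            if (a[0] - b[0]) * (a[1] - b[1]) < 0]


# Hypothetical article: 4 abstract sentences, 3 figures, 4 links
M = [[1, 0, 0],
     [0, 0, 1],
     [0, 1, 0],
     [0, 0, 1]]
links = links_from_matrix(M)   # [(0, 0), (1, 2), (2, 1), (3, 2)]
print(crossing_pairs(links))   # [((1, 2), (2, 1))]
```

Counting crossing pairs gives a simple measure of how far an article departs from the consistent-ordering tendency described above.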

Figure 7.

Example HMM (a) and CRF (b) state transition diagrams using the sentences-in-states construction.

(a) States and transitions for the base HMM for a corpus where the maximum number of abstract sentences in an article is 4. The states and transitions in solid blue are part of the derived HMM for a specific article in that corpus. (b) CRF states and transitions for an article where the maximum number of sentences per figure is 2.
