Nearest Template Prediction: A Single-Sample-Based Flexible Class Prediction with Confidence Assessment

doi:10.1371/journal.pone.0015543

Figure 1.

Methodology of Nearest Template Prediction (NTP).

Given a predetermined gene signature for subclasses A and B, comprising n_A genes over-expressed in subclass A (marker.A) and n_B genes over-expressed in subclass B (marker.B), template.A and template.B are defined as the representative expression patterns of the signature genes for each subclass (Templates of subclass A and B). From microarray data measuring N genes in sample S (Microarray data of sample S), the n_A+n_B signature genes are extracted (Signature in sample S: sample.signature), and their proximity to each template is evaluated by calculating a distance d. The label of the closer template is assigned as the prediction for sample S, and its significance is estimated as a nominal p-value from a null distribution for d generated by randomly resampling n_A+n_B genes from the N genes 1,000 times. Red and blue colors in the heatmaps indicate high and low gene expression, respectively.
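
The procedure in this caption can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation, and it rests on several assumptions not stated in the caption: d is taken to be cosine distance, the templates are coded as +1 for a subclass's own marker genes and -1 for the other subclass's markers, the sample's expression values are already centered per gene, and the p-value uses a +1 correction. The names `ntp_predict`, `idx_a`, and `idx_b` are hypothetical.

```python
import numpy as np

def cosine_distance(u, v):
    """Cosine distance, 1 - cos(u, v). Assumes neither vector is all zeros."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def ntp_predict(sample, idx_a, idx_b, n_resample=1000, seed=0):
    """Predict subclass A or B for one sample and estimate a nominal p-value.

    sample : 1-D array of N gene expression values (assumed centered per gene)
    idx_a  : indices of the n_A genes over-expressed in subclass A
    idx_b  : indices of the n_B genes over-expressed in subclass B
    """
    rng = np.random.default_rng(seed)
    idx_a, idx_b = np.asarray(idx_a), np.asarray(idx_b)
    n_sig = len(idx_a) + len(idx_b)

    # Assumed template coding: +1 for a subclass's own markers, -1 otherwise.
    template_a = np.concatenate([np.ones(len(idx_a)), -np.ones(len(idx_b))])
    template_b = -template_a

    # Extract the n_A + n_B signature genes and measure distance to each template.
    signature = sample[np.concatenate([idx_a, idx_b])]
    d_a = cosine_distance(signature, template_a)
    d_b = cosine_distance(signature, template_b)
    label, template, d_obs = ("A", template_a, d_a) if d_a < d_b else ("B", template_b, d_b)

    # Null distribution: distance to the predicted template for random gene sets
    # of the same size, drawn from all N genes of this one sample.
    null = np.empty(n_resample)
    for i in range(n_resample):
        random_idx = rng.choice(sample.size, size=n_sig, replace=False)
        null[i] = cosine_distance(sample[random_idx], template)

    # Nominal p-value with a +1 correction (an assumption of this sketch).
    p_value = (np.sum(null <= d_obs) + 1) / (n_resample + 1)
    return label, d_obs, p_value
```

Because the null distribution is built from the genes of the sample being classified, the p-value is specific to that sample and no other samples are needed to make or assess the call.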

Table 1.

Datasets.

Figure 2.

Example 1: Prediction of acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML).

A previously reported ALL-AML signature was evaluated on the test set (N = 35) employed in that study. Samples are ordered by their proximity to the ALL or AML template to show how clearly each sample expresses the signature genes. Note that a shorter distance to a template (i.e., a position at the far left or far right of the heatmap) does not necessarily indicate higher prediction confidence, because the confidence p-value is calculated from a null distribution of the distance generated separately for each sample. CART: classification and regression tree, WV: weighted voting, SVM: support vector machine, kNN: k-nearest neighbor, FDR: false discovery rate.

Table 2.

Summary of prediction error rates according to prediction method.

Table 3.

Prediction results according to prediction confidence.

Figure 3.

Example 2: Cross-platform prediction of estrogen receptor (ER) positivity in breast cancer.

The ER signature was defined and tested on a pair of independent training (N = 97) and test (N = 49) datasets generated on different microarray platforms (see Table 1 for details). Samples are ordered by their proximity to the ER (+) or ER (−) template.

Figure 4.

Example 3: Cross-species prediction of liver cirrhosis between human and rat.

A human liver cirrhosis signature, generated by comparing 13 cirrhotic and 10 normal livers, was tested on rat livers with or without cirrhosis induced by bile duct ligation.

Figure 5.

Example 4: Prediction of multiple tissue types.

A signature distinguishing 4 tissue types (breast, prostate, lung, and colon) was defined in the training set (N = 51) and evaluated on the test set (N = 52). The signature was defined as a concatenation of over-expressed genes in each tissue type in comparison to the rest.
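
A concatenated multi-class signature of this kind can be turned into one template per class, and a sample assigned to the nearest one. The sketch below is illustrative only and makes assumptions the caption does not state: each template is coded as 1 for that class's own marker genes and 0 for all others, and proximity is measured by cosine distance. The names `build_templates` and `nearest_template` are hypothetical.

```python
import numpy as np

def build_templates(marker_counts):
    """Build one template per class for a concatenated signature.

    marker_counts : number of over-expressed marker genes per class, in the
                    order in which the gene blocks are concatenated.
    Returns an array of shape (n_classes, total_signature_genes).
    """
    templates = np.zeros((len(marker_counts), sum(marker_counts)))
    start = 0
    for k, n_k in enumerate(marker_counts):
        templates[k, start:start + n_k] = 1.0  # assumed coding: 1 for own markers
        start += n_k
    return templates

def nearest_template(signature, templates):
    """Return the index of the closest template and all cosine distances."""
    norms = np.linalg.norm(templates, axis=1) * np.linalg.norm(signature)
    distances = 1.0 - (templates @ signature) / norms
    return int(np.argmin(distances)), distances

# Usage with made-up numbers: 4 classes with 3, 2, 4, and 3 marker genes.
templates = build_templates([3, 2, 4, 3])
signature = np.full(12, -0.5)
signature[5:9] = 2.0  # the third block of marker genes is over-expressed
predicted_class, distances = nearest_template(signature, templates)
```

The confidence of such a call could be assessed with the same per-sample resampling scheme described for Figure 1, applied to the distance to the predicted template.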

Figure 6.

Example 5: Prediction of breast cancer molecular subclass.

A signature distinguishing 5 breast cancer subclasses was defined in the training set (N = 295) and evaluated in 3 independent test sets: (a) test set 1 (“TransBig”, N = 198), (b) test set 2 (“Wang”, N = 286), and (c) test set 3 (“Weigelt”, N = 53). The signature was defined as a concatenation of over-expressed genes in each subclass in comparison to the rest.
