A loop-counting method for covariate-corrected low-rank biclustering of gene-expression and genome-wide association study data

Fig 3

Performance of loop-scores vs spectral-biclustering applied to the planted-bicluster problem.

For each instantiation of the planted-bicluster problem we choose an M, m, ε and l; we use these parameters to generate a random M × M matrix D with an embedded m × m rank-l submatrix B with spectral noise ε. For each instantiation, our algorithm produces a list of the row- and column-indices of D in the order in which they are eliminated; the rows and columns retained the longest are expected to be members of B. To assess the success of our algorithm we calculate the AUC A_R (i.e., the area under the receiver operating characteristic curve) associated with the row-indices of B with respect to the output list from our algorithm. The value A_R is equal to the probability that, given a randomly chosen row from B and a randomly chosen row from outside of B, our algorithm eliminates the latter before the former (i.e., the latter is lower on our list than the former). We calculate the AUC A_C for the columns similarly. Finally, we use A = (A_R + A_C)/2 as a metric of success; values of A near 1 mean that the rows and columns of B were filtered to the top by our algorithm, whereas values of A near 0.5 mean that our algorithm failed to detect B. In the top of Panel-A we show the trial-averaged AUC A for our loop-counting method as a function of and log_M(m). Results for l = 1 are shown on the left; results for l = 2 are shown on the right. Each subplot takes the form of a heatmap, with each pixel showing the value of A for a given value of and log_M(m) (averaged over at least 128 trials). The different subplots correspond to different values of M. Note that our loop-counting algorithm is generally successful when and . In the bottom of Panel-A we show the analogous AUC A for a simple implementation of the spectral method (see section of S2 Text). In Panel-B we show the difference in trial-averaged A between these two methods (see colorbar for scale). Note that when l ≥ 2 or the noise is small, our loop-score generally has a higher rate of success than the spectral method. On the other hand, there do exist parameters when l = 1 and where the spectral method has a higher rate of success. In each panel the thin grey line shows the detection-boundary for our loop-counting method (calculated using ).
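
The success metric described in this caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `elimination_auc` and `combined_auc` are hypothetical, and the sketch assumes that indices run from 0 to n − 1 and that the elimination order is a permutation of them (so there are no ties). It computes, for rows or columns, the fraction of (member, non-member) pairs in which the non-member was eliminated first, and then averages the row and column values to obtain A.

```python
import numpy as np


def elimination_auc(elimination_order, planted):
    """AUC of planted-set membership given an elimination order.

    elimination_order: permutation of 0..n-1, earliest-eliminated index first
                       (assumed format, not taken from the paper).
    planted: iterable of indices that belong to the planted bicluster B.
    Returns the probability that a randomly chosen non-member was eliminated
    before a randomly chosen member.
    """
    order = np.asarray(elimination_order)
    n = order.size

    # rank[i] = step at which index i was eliminated (larger = retained longer)
    rank = np.empty(n, dtype=np.int64)
    rank[order] = np.arange(n)

    is_member = np.zeros(n, dtype=bool)
    is_member[list(planted)] = True

    member_ranks = rank[is_member]
    other_ranks = rank[~is_member]

    # Count (member, non-member) pairs where the non-member went first.
    wins = (member_ranks[:, None] > other_ranks[None, :]).sum()
    return wins / (member_ranks.size * other_ranks.size)


def combined_auc(row_order, col_order, planted_rows, planted_cols):
    """Average of the row AUC and the column AUC."""
    a_r = elimination_auc(row_order, planted_rows)
    a_c = elimination_auc(col_order, planted_cols)
    return 0.5 * (a_r + a_c)
```

For example, with six rows of which rows 4 and 5 are planted, the order `[0, 1, 2, 3, 4, 5]` eliminates every non-member first and yields an AUC of 1, whereas the order `[4, 0, 1, 2, 3, 5]` wins only four of the eight pairs and yields 0.5. The pairwise comparison uses memory proportional to the number of (member, non-member) pairs; a rank-sum formulation would be preferable for very large M.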

doi: https://doi.org/10.1371/journal.pcbi.1006105.g003