A loop-counting method for covariate-corrected low-rank biclustering of gene-expression and genome-wide association study data

doi:10.1371/journal.pcbi.1006105

Fig 3

Performance of loop-scores vs spectral-biclustering applied to the planted-bicluster problem.

For each instantiation of the planted-bicluster problem we choose an M, m, ε and l; we use these parameters to generate a random M × M matrix D and an embedded m × m rank-l submatrix B with spectral noise ε. For each instantiation, our algorithm produces a list of the row- and column-indices of D in the order in which they are eliminated; the rows and columns retained the longest are expected to be members of B. To assess the success of our algorithm we calculate the AUC A_R (i.e., the area under the receiver operating characteristic curve) associated with the row-indices of B with respect to the output list from our algorithm. The value A_R is equal to the probability that, given a randomly chosen row from B and a randomly chosen row from outside of B, our algorithm eliminates the latter before the former (i.e., the latter is lower on our list than the former). We calculate the AUC A_C for the columns similarly. Finally, we use A = (A_R + A_C)/2 as a metric of success; values of A near 1 mean that the rows and columns of B were filtered to the top by our algorithm, whereas values of A near 0.5 mean that our algorithm failed to detect B. In the top of Panel-A we show the trial-averaged AUC A for our loop-counting method as a function of and log_M(m). Results for l = 1 are shown on the left; results for l = 2 are shown on the right. Each subplot takes the form of a heatmap, with each pixel showing the value of A for a given value of and log_M(m) (averaged over at least 128 trials). The different subplots correspond to different values of M. Note that our loop-counting algorithm is generally successful when and . In the bottom of Panel-A we show the analogous AUC A for a simple implementation of the spectral method (see section of S2 Text). In Panel-B we show the difference in trial-averaged A between these two methods (see colorbar for scale). Note that when l ≥ 2 or the noise is small, our loop-score generally has a higher rate of success than the spectral method. On the other hand, there do exist parameters when l = 1 and where the spectral method has a higher rate of success. In each panel the thin grey line shows the detection-boundary for our loop-counting method (calculated using ).
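The success metric described in this caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `auc_from_elimination_order` and `combined_auc` are hypothetical, and it assumes the elimination list is a permutation of the indices 0..M-1 with the earliest-eliminated index first. Under those assumptions it counts the fraction of (member, non-member) pairs in which the non-member is eliminated first, which is the probability the caption equates with A_R, and then averages the row and column values to obtain A.

```python
import numpy as np


def auc_from_elimination_order(elimination_order, members):
    """Fraction of (member, non-member) pairs where the non-member is eliminated first.

    elimination_order: permutation of 0..M-1, earliest-eliminated index first (assumed).
    members: indices belonging to the planted bicluster B.
    """
    order = np.asarray(elimination_order)
    # position[i] is the step at which index i was eliminated (larger = retained longer).
    position = np.empty(order.size, dtype=int)
    position[order] = np.arange(order.size)

    is_member = np.zeros(order.size, dtype=bool)
    is_member[list(members)] = True

    pos_in = position[is_member]
    pos_out = position[~is_member]
    # Compare every member against every non-member; a permutation has no ties.
    wins = (pos_in[:, None] > pos_out[None, :]).sum()
    return wins / (pos_in.size * pos_out.size)


def combined_auc(row_order, col_order, row_members, col_members):
    """A = (A_R + A_C) / 2, as defined in the caption."""
    a_r = auc_from_elimination_order(row_order, row_members)
    a_c = auc_from_elimination_order(col_order, col_members)
    return 0.5 * (a_r + a_c)
```

For example, with six rows of which rows 4 and 5 belong to B, the elimination order `[0, 1, 2, 3, 4, 5]` retains both members longest and gives a value of 1, while `[4, 5, 0, 1, 2, 3]` eliminates them first and gives 0. The pairwise comparison uses O(m(M − m)) memory, which is acceptable for a sketch but would be replaced by a rank-based formula for large M.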

doi: https://doi.org/10.1371/journal.pcbi.1006105.g003