A loop-counting method for covariate-corrected low-rank biclustering of gene-expression and genome-wide association study data

doi:10.1371/journal.pcbi.1006105

Fig 2

Illustration of the algorithm operating on a case-matrix alone (i.e., D only).

In Panel-A we show a large M × N binarized matrix D (black and white pixels correspond to values of ±1, respectively). In the upper left corner of D we have inserted a large rank-1 bicluster B (shaded in pink). Our algorithm considers all 2 × 2 submatrices (i.e., ‘loops’) within D. Several such loops are highlighted via the blue rectangles (the corners of each rectangle pick out a 2 × 2 submatrix). Generally speaking, loops are equally likely to be rank-1 or rank-2. Some loops, such as the loop shown in red, are entirely contained within B. These loops are more likely to be rank-1 than rank-2. In Panel-B we show some examples of rank-2 and rank-1 loops. Given a loop with row-indices j, j′ and column-indices k, k′, the rank of the loop is determined by the sign of D_{jk} D_{jk′} D_{j′k′} D_{j′k}: the loop is rank-1 when this product is +1 and rank-2 when it is −1. Our algorithm accumulates a ‘loop-score’ for each row j and each column k. In its simplest form, the loop-score for a particular row j is given by Σ_{j′,k,k′} D_{jk} D_{jk′} D_{j′k′} D_{j′k}. Analogously, the loop-score for a column k is given by Σ_{j,j′,k′} D_{jk} D_{jk′} D_{j′k′} D_{j′k}. In Panel-C we show the distribution of loop-scores we might expect from the rows or columns within D. The blue curve corresponds to the distribution of scores expected from the rows/columns of D that are not in B, whereas the red curve corresponds to the distribution of scores expected from the rows/columns of B. In Panel-D we show the distribution of loop-scores we might expect by pooling all rows or columns of D. The rows or columns that correspond to the lowest scores are not likely to be part of B.
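The loop-scores described in the caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the simplest form of the score, in which the sum runs over all index combinations, including degenerate ones with j′ = j or k′ = k. Those degenerate terms add the same constant to every row (or column) score, so they do not change the ranking. The function names `loop_is_rank1`, `loop_scores` and `plant_bicluster`, and the matrix sizes used, are hypothetical choices made for this illustration.

```python
import numpy as np


def loop_is_rank1(D, j, jp, k, kp):
    """Return True if the 2 x 2 loop on rows (j, jp) and columns (k, kp) is rank-1.

    For entries in {-1, +1}, the loop is rank-1 exactly when the product
    of its four corners is +1.
    """
    return D[j, k] * D[j, kp] * D[jp, kp] * D[jp, k] > 0


def loop_scores(D):
    """Row and column loop-scores in their simplest (all-terms) form.

    row_scores[j] = sum over j', k, k' of D[j,k] D[j,k'] D[j',k'] D[j',k]
                  = sum over j' of G[j,j']**2, with G = D @ D.T.
    Column scores follow the same pattern with H = D.T @ D.
    Assumption: degenerate loops (j' == j or k' == k) are included.
    """
    D = np.asarray(D, dtype=np.int64)
    G = D @ D.T  # G[j, j'] = sum_k D[j, k] * D[j', k]
    H = D.T @ D  # H[k, k'] = sum_j D[j, k] * D[j, k']
    row_scores = np.sum(G * G, axis=1)
    col_scores = np.sum(H * H, axis=1)
    return row_scores, col_scores


def plant_bicluster(M, N, m, n, seed=0):
    """Random +/-1 matrix with a rank-1 m x n block in the upper left corner."""
    rng = np.random.default_rng(seed)
    D = rng.choice([-1, 1], size=(M, N))
    u = rng.choice([-1, 1], size=m)
    v = rng.choice([-1, 1], size=n)
    D[:m, :n] = np.outer(u, v)  # outer product of sign vectors is rank-1
    return D


# Illustrative usage: rows and columns inside the planted block should score higher.
D = plant_bicluster(M=200, N=300, m=40, n=60)
row_scores, col_scores = loop_scores(D)
print("mean row score inside B: ", row_scores[:40].mean())
print("mean row score outside B:", row_scores[40:].mean())
```

Writing the row score as a sum of squared entries of D Dᵀ avoids enumerating loops explicitly: the cost is that of one matrix product rather than a sum over all quadruples of indices.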

doi: https://doi.org/10.1371/journal.pcbi.1006105.g002