A loop-counting method for covariate-corrected low-rank biclustering of gene-expression and genome-wide association study data
Fig 2
Illustration of the algorithm operating on a case-matrix alone (i.e., D only).
In Panel-A we show a large M × N binarized matrix D (black and white pixels correspond to values of ±1, respectively). In the upper left corner of D we have inserted a large rank-1 bicluster B (shaded in pink). Our algorithm considers all 2 × 2 submatrices (i.e., ‘loops’) within D. Several such loops are highlighted via the blue rectangles (the corners of each rectangle pick out a 2 × 2 submatrix). Generally speaking, loops are equally likely to be rank-1 or rank-2. Some loops, such as the loop shown in red, are entirely contained within B. These loops are more likely to be rank-1 than rank-2. In Panel-B we show some examples of rank-2 and rank-1 loops. Given a loop with row-indices j, j′ and column-indices k, k′, the rank of the loop is determined by the sign of the product $D_{jk} D_{jk'} D_{j'k'} D_{j'k}$: the loop is rank-1 when this product is +1 and rank-2 when it is −1. Our algorithm accumulates a ‘loop-score’ for each row j and each column k. In its simplest form, the loop-score for a particular row j is given by
$\sum_{j', k, k'} D_{jk} D_{jk'} D_{j'k'} D_{j'k}$. Analogously, the loop-score for a column k is given by
$\sum_{j, j', k'} D_{jk} D_{jk'} D_{j'k'} D_{j'k}$. In Panel-C we show the distribution of loop-scores we might expect from the rows or columns within D. The blue curve corresponds to the distribution of scores expected from the rows/columns of D that are not in B, whereas the red curve corresponds to the distribution of scores expected from the rows/columns of B. In Panel-D we show the distribution of loop-scores we might expect by pooling all rows or columns of D. The rows or columns with the lowest scores are unlikely to be part of B.
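
The loop-scores described in this caption can be sketched numerically. The example below is a minimal illustration, not the authors' implementation. It assumes the simplest form of the score, in which the score for row j sums the four-entry product of every loop that includes row j (over all j′, k, k′), and likewise for columns. Under that assumption the row-scores equal the row sums of the elementwise square of D Dᵀ, and the column-scores equal the row sums of the elementwise square of Dᵀ D, so no explicit enumeration of loops is needed. The function names `loop_scores` and `plant_bicluster`, and the matrix sizes, are hypothetical choices made for this sketch.

```python
import numpy as np


def loop_scores(D):
    """Simplest-form loop-scores for a matrix D with entries in {-1, +1}.

    Assumed definition (illustrative): the score of row j is the sum over
    j', k, k' of D[j,k] * D[j,k'] * D[j',k'] * D[j',k]. This equals
    sum_{j'} (D @ D.T)[j, j']**2, and analogously for columns.
    """
    D = np.asarray(D, dtype=np.int64)
    row_gram = D @ D.T          # row_gram[j, j'] = sum_k D[j,k] * D[j',k]
    col_gram = D.T @ D          # col_gram[k, k'] = sum_j D[j,k] * D[j,k']
    row_scores = (row_gram ** 2).sum(axis=1)
    col_scores = (col_gram ** 2).sum(axis=1)
    return row_scores, col_scores


def plant_bicluster(M, N, m, n, seed=0):
    """Random +/-1 matrix with a rank-1 m x n block in the upper-left corner."""
    rng = np.random.default_rng(seed)
    D = rng.choice([-1, 1], size=(M, N))
    u = rng.choice([-1, 1], size=m)
    v = rng.choice([-1, 1], size=n)
    D[:m, :n] = np.outer(u, v)  # every loop inside this block is rank-1
    return D


# Hypothetical sizes, chosen only to make the separation easy to see.
D = plant_bicluster(M=200, N=300, m=40, n=60, seed=0)
row_scores, col_scores = loop_scores(D)

print("mean row-score inside B: ", row_scores[:40].mean())
print("mean row-score outside B:", row_scores[40:].mean())
print("mean col-score inside B: ", col_scores[:60].mean())
print("mean col-score outside B:", col_scores[60:].mean())
```

Rows and columns of the planted block take part in many more rank-1 loops than the rest, so their scores sit in the upper tail of the pooled distribution, which mirrors the red and blue curves of Panel-C. Loops with j′ = j or k′ = k are included in this sketch; they add the same constant to every row-score (and to every column-score), so they do not change the ranking.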