Fig 1.

Partitioning the gene-gene correlation matrix.

Genes are sorted and binned according to increasing expression level, and the correlation matrix is partitioned into 10x10 non-overlapping submatrices of equal size. The diagonal submatrices are indicated by gray shading.
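The partitioning described in this caption can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `expression_bins` and `submatrix` are hypothetical, the correlation matrix is assumed to be a plain list of lists, and genes left over when the gene count is not divisible by the number of bins are simply dropped here (the caption does not say how they are handled).

```python
def expression_bins(mean_expr, n_bins=10):
    """Return n_bins lists of gene indices, ordered by increasing mean expression."""
    # Sort gene indices by their mean expression level.
    order = sorted(range(len(mean_expr)), key=lambda g: mean_expr[g])
    # Equal-size bins; leftover genes are dropped (an assumption of this sketch).
    size = len(order) // n_bins
    return [order[b * size:(b + 1) * size] for b in range(n_bins)]


def submatrix(corr, bins, i, j):
    """Extract the block of corr with rows from bin i and columns from bin j."""
    return [[corr[r][c] for c in bins[j]] for r in bins[i]]
```

With 10 bins, `submatrix(corr, bins, i, i)` gives the diagonal blocks (shaded gray in the figure) and `submatrix(corr, bins, i, j)` with `i != j` gives the off-diagonal blocks.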


Fig 2.

The mean-correlation relationship between gene expression level and the distribution of observed gene-gene correlations.

The distribution of Pearson correlations of gene pairs using 350 RNA-seq samples from adipose subcutaneous tissue, with 4 PCs removed. (a) Densities of the Pearson correlation between gene pairs stratified by overall expression (10 bins ranging from low to high expression). The average expression level for each expression bin is given by the values to the right of the densities. (b) A 2D boxplot where each box represents the IQR of the Pearson correlations between all genes (termed background IQR) in a submatrix of the correlation matrix corresponding to two bins of expression. (c) The relationship between the IQRs of the Pearson correlations between all genes in a submatrix (y-axis) and the minimum of the average expression levels of the two bins associated with the submatrix (x-axis).
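The background IQR used in panels (b) and (c) is the interquartile range of the correlations inside one submatrix. A minimal sketch, assuming the submatrix is a list of lists: the function name `background_iqr` is hypothetical, and restricting a diagonal submatrix to the entries above its diagonal (to skip self-correlations and duplicated pairs) is an assumption of this sketch rather than something the caption states.

```python
from statistics import quantiles


def background_iqr(block, diagonal=False):
    """Interquartile range of the correlations in one submatrix."""
    if diagonal:
        # Assumption: for a diagonal (symmetric) block, use each gene pair once
        # and skip the self-correlations on the diagonal.
        values = [block[r][c]
                  for r in range(len(block))
                  for c in range(r + 1, len(block[r]))]
    else:
        values = [v for row in block for v in row]
    # Quartiles by linear interpolation between the observed values.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1
```

Plotting this value against the smaller of the two bins' average expression levels would give one point of panel (c).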


Fig 3.

The mean-correlation relationship leads to expression bias.

Same data as Fig 2. (a) Like Fig 2a, but supplemented with the densities (scaled differently from the background densities) of the top 0.1% of the correlations in each expression bin, representing possible signal. (b) We calculate the average expression level of the two genes involved in a gene-gene correlation as a measure of the expression level of the pair, and show it for either all expressed genes (black) or all gene pairs in the top 0.1% of the correlations (gray). (c) The expression level of pairs of genes for either all genes (black) or all gene pairs with a known protein-protein interaction (PPI) (pink). (d) The expression level of pairs of genes for either all genes (black) or all gene pairs within a regulatory pathway (green).
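The pair-level expression measure in panel (b) can be sketched as follows. The function names `top_pairs` and `pair_expression` are hypothetical, and ranking by absolute correlation is an assumption borrowed from the Fig 8 caption; the Fig 3 caption itself only says "top 0.1% of the correlations".

```python
from itertools import combinations


def top_pairs(corr, fraction=0.001):
    """Return the gene pairs whose correlation is in the top `fraction`."""
    # Each unordered gene pair once, ranked by absolute correlation (assumption).
    pairs = sorted(combinations(range(len(corr)), 2),
                   key=lambda p: abs(corr[p[0]][p[1]]),
                   reverse=True)
    k = max(1, int(len(pairs) * fraction))  # keep at least one pair
    return pairs[:k]


def pair_expression(mean_expr, pairs):
    """Expression level of a pair: the average of its two genes' levels."""
    return [(mean_expr[a] + mean_expr[b]) / 2 for a, b in pairs]
```

Comparing the distribution of `pair_expression` over all pairs with the distribution over `top_pairs` is one way to see the expression bias that the figure reports.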


Fig 4.

Spatial quantile normalization explained.

(a) A submatrix $X_{i,j}$ of the correlation matrix, and its enclosing submatrix $Y_{i,j}$. (b) Two directly adjacent, non-overlapping submatrices $X_{i,j}$, $X_{i+1,j}$ and their enclosing, overlapping submatrices $Y_{i,j}$, $Y_{i+1,j}$. The enclosing submatrices $Y_{i,j}$, $Y_{i+1,j}$ are used to form the empirical distribution functions $\hat{F}_{i,j}$, $\hat{F}_{i+1,j}$, which are then applied to the non-overlapping submatrices as $\hat{F}_{i,j}(X_{i,j})$, $\hat{F}_{i+1,j}(X_{i+1,j})$.
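The per-block step in panel (b) can be illustrated with a simplified sketch. It is not the authors' implementation: `ecdf`, `quantile_of`, and `normalize_block` are hypothetical names, `x_block` and `y_block` stand for an inner submatrix and its enclosing submatrix, and mapping the resulting probabilities onto a caller-supplied `reference` sample is an assumption, since the caption describes only the empirical distribution function step.

```python
import math
from bisect import bisect_right


def ecdf(values):
    """Empirical distribution function of a sample."""
    s = sorted(values)
    n = len(s)
    return lambda x: bisect_right(s, x) / n  # fraction of values <= x


def quantile_of(reference):
    """Inverse empirical distribution function of a reference sample."""
    s = sorted(reference)
    n = len(s)

    def q(p):
        # Smallest reference value whose ECDF is >= p; the small tolerance
        # guards against floating-point error in p * n.
        idx = math.ceil(p * n - 1e-9) - 1
        return s[min(n - 1, max(0, idx))]

    return q


def normalize_block(x_block, y_block, reference):
    """Apply the ECDF of the enclosing block to the inner block, then map the
    probabilities onto the reference sample (the target is an assumption)."""
    f_hat = ecdf([v for row in y_block for v in row])
    q_ref = quantile_of(reference)
    return [[q_ref(f_hat(v)) for v in row] for row in x_block]
```

Because neighbouring enclosing blocks overlap, adjacent inner blocks are transformed by similar distribution functions, which is presumably what keeps the normalized matrix smooth across block boundaries.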


Fig 5.

Spatial quantile normalization removes the mean-correlation relationship.

Same data as Fig 2, but after applying spatial quantile normalization (SpQN). (a) Like Fig 3a, i.e. densities of the Pearson correlation between all genes within each of 10 expression bins (background) as well as the top 0.1% correlations (possible signal). (b) Like Fig 2b, i.e. IQRs of Pearson correlations between genes in each of 10 different expression levels. (c) Like Fig 2c, i.e. the relationship between IQR of gene-gene correlation distribution and the lowest of the two expression bins associated with the submatrix. (d) Like Fig 3b, i.e. the expression level of pairs of genes in different subsets (all genes (black), genes above the 0.1% threshold with (orange) and without SpQN (gray)).


Fig 6.

IQR of gene-gene correlation distributions in each bin for 9 tissues.

RNA-seq data from [15] from 9 tissues with 4 PCs removed. A point in this figure corresponds to one submatrix in a given gene-gene correlation matrix for each tissue, before and after SpQN. (a) Background IQR for unadjusted (left smear) and SpQN-adjusted (right smear) gene-gene correlation distributions for all expression bins across 9 GTEx tissues. Color indicates expression level. (b) The relationship between sample size for a tissue and background IQR for correlation distributions before and after SpQN adjustment.


Fig 7.

The impact of SpQN on transcription factor co-expression.

Data are from adipose subcutaneous tissue from GTEx. The percent increase in the number of edges (y-axis) identified after thresholding (x-axis) the correlation matrix. (a) Edges involving transcription factors. (b) Edges between genes with protein-protein interactions, where one of the involved genes is a transcription factor. Additional tissues are depicted in Figs E and F in S1 Text.
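The quantity on the y-axis can be sketched as an edge count before and after adjustment. The names `count_edges` and `percent_increase` are hypothetical, and thresholding on the absolute correlation value is an assumption of this sketch; the caption does not state how the threshold is expressed.

```python
def count_edges(corr, threshold, nodes=None):
    """Count gene pairs with |correlation| >= threshold.

    If `nodes` is given (e.g. a set of transcription factor indices), only
    edges touching at least one of those genes are counted.
    """
    n = len(corr)
    return sum(
        1
        for a in range(n)
        for b in range(a + 1, n)
        if abs(corr[a][b]) >= threshold
        and (nodes is None or a in nodes or b in nodes)
    )


def percent_increase(before, after):
    """Percent change from `before` to `after`; None if `before` is zero."""
    if before == 0:
        return None
    return 100.0 * (after - before) / before
```

Calling `count_edges` on the unadjusted and the adjusted correlation matrix with the same threshold and the same set of transcription factors, then passing both counts to `percent_increase`, gives one point of the curve.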


Fig 8.

The relationship between the signal threshold (in percentage) and the expression level (before and after SpQN adjustment).

We define the co-expression signal threshold (x-axis) as the top percentage of absolute correlation values (ranging between 0 and 3%). For a given signal threshold, we calculate the average expression level (y-axis), before (blue) and after SpQN (pink). The average gene expression level is shown by the dotted black line. Data is from the Adipose subcutaneous tissue with 4 PCs removed, see Fig G in S1 Text for all 9 tissues.


Fig 9.

The expression bias in graphical lasso network inference.

The expression levels of networks inferred by graphical lasso. Different values of the tuning parameter (ρ) result in different network sizes (x-axis), with higher values of the tuning parameter leading to smaller networks. The average gene expression is shown by the dotted black line. Data is from Adipose subcutaneous with 4 PCs removed, see Fig I in S1 Text for additional tissues.


Fig 10.

Mean-correlation relationship in scRNA-seq data.

scRNA-seq data from [16] containing 60 cells, with either 4 PCs removed (a-b) or 16 PCs removed (c-d). The latter value is the result of using SVA to estimate the number of PCs as suggested by [11]. (a) Like Fig 3a, i.e. densities of the Pearson correlation between all genes within each of 10 expression bins (background, blue) as well as the top 0.1% correlations (possible signal, red). (b) Like Fig 5d, i.e. the expression level of pairs of genes in different subsets (all genes (black), genes above the 0.1% threshold with (orange) and without SpQN (gray)). (c) Like (a) but for data with 16 PCs removed. (d) Like (b) but for data with 16 PCs removed.


Fig 11.

The impact of removing principal components.

Data from “heart left ventricle”. (a) Background and signal distribution without removing principal components. (b) Average bias (median of the 10 background distributions) as a function of PCs removed. (c) Average variance (average variance of the 10 background distributions) as a function of PCs removed. (d) Average expression after removing 4 PCs. (e) Average expression after removing a number of PCs estimated using SVA.


Fig 12.

Mean-correlation in a differential setting.

Data in (a,b): 100 samples were randomly selected from each of 3 GTEx tissues (adipose subcutaneous, adrenal gland and artery tibial) for a total of 300 samples. We removed 4 principal components from the resulting correlation matrix. Data in (c,d): bulk RNA-seq of a time course experiment on Drosophila embryonic development with 30 samples. We removed 5 principal components from the resulting correlation matrix. (a) Densities of the Pearson correlation between gene pairs stratified by overall expression, for the GTEx data. (b) The relationship between the IQRs of the Pearson correlations between all genes in a submatrix (y-axis) and the minimum of the average expression levels of the two bins associated with the submatrix (x-axis), for the GTEx data. (c) Like (a), but for the Drosophila data. (d) Like (b), but for the Drosophila data.
