Figure 1.
General additive model for sources of gene expression variability.
The matrix of measured gene expression levels, covering a set of genes across a set of individuals, is modelled as the sum of additive component contributions and observation noise. The components capture the signal due to the primary effect of the genetic state, known factors and hidden factors. Some examples of possible underlying sources of variation are given above the model boxes. The groupings represent standard genetic association models in common use.
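The additive decomposition described in Figure 1 can be illustrated with a small simulation. This is a minimal sketch rather than the model used in the paper: the function name `simulate_expression`, the dimensions, the 0/1/2 genotype coding and the Gaussian weights are all assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply an (n x k) list-of-lists by a (k x m) list-of-lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def simulate_expression(n_individuals=20, n_genes=5, n_snps=3,
                        n_known=2, n_hidden=2, noise_sd=0.1, seed=0):
    """Simulate expression as genetic + known + hidden + noise (all hypothetical)."""
    rng = random.Random(seed)

    def gaussian(rows, cols, sd=1.0):
        return [[rng.gauss(0.0, sd) for _ in range(cols)] for _ in range(rows)]

    # Genetic state: one genotype per individual and SNP (assumed 0/1/2 coding).
    snps = [[rng.choice([0, 1, 2]) for _ in range(n_snps)]
            for _ in range(n_individuals)]
    known = gaussian(n_individuals, n_known)    # e.g. measured covariates
    hidden = gaussian(n_individuals, n_hidden)  # unmeasured sources of variation

    # Each component contributes through its own (random) weight matrix.
    genetic_part = matmul(snps, gaussian(n_snps, n_genes))
    known_part = matmul(known, gaussian(n_known, n_genes))
    hidden_part = matmul(hidden, gaussian(n_hidden, n_genes))
    noise = gaussian(n_individuals, n_genes, noise_sd)

    # The observed matrix is the element-wise sum of all contributions.
    expression = [[genetic_part[i][j] + known_part[i][j]
                   + hidden_part[i][j] + noise[i][j]
                   for j in range(n_genes)]
                  for i in range(n_individuals)]
    return expression, genetic_part, known_part, hidden_part, noise
```

Dropping the hidden part from the sum corresponds to the simplest grouping of components, in which only genetic and known effects are modelled.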
Figure 2.
Bayesian network and outline of the inference schedule for VBQTL.
(a) The Bayesian network for the model of gene expression variation used in VBQTL (see Methods). The full model combines genetic (green), known factor (blue) and hidden factor (red) models to explain the observed gene expression levels. The solid rectangles indicate that the contained variables are duplicated for each gene probe, SNP or factor, respectively. A similar rectangle for individuals is omitted from this representation. The dashed rectangle marks a variable that switches the contained part of the graph on or off, representing the presence or absence of an association. Nodes with thick outlines are observed. (b)–(e) Update cycle of the known factors model introduced in the Inference section. The red outline highlights the parts of the model that change in a step, and the thick blue arrows illustrate the flow of information. Details of these updates are discussed in the text.
Figure 3.
Sensitivity of recovering simulated hidden factor effects and eQTLs for Bayesian and non-Bayesian methods.
(a) Mean-squared error in estimating only the hidden factor contribution. Methods that do not explicitly retain the genetic factors explain them away as hidden global factors, resulting in a high error comparable to not accounting for hidden factors at all (Standard). (b) Mean-squared error in estimating the combined contribution from hidden and genetic factors. (c) Sensitivity of recovering immediate SNP associations. (d) Sensitivity of recovering downstream associations. Seven hidden factors and three transcription factor effects were simulated. For eQTL sensitivity, standard eQTL finding on the simulated data (Standard) and on the same data without the hidden effects (Ideal) are included for comparison. PCAsig and SVA identified a constant number of hidden components (marked with a diamond shape), so only a single result (dashed line) is given.
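The two quantities plotted in Figure 3 can be computed as in the following minimal sketch. The function names and the representation of associations as (SNP, gene) pairs are assumptions for illustration, and the paper's exact evaluation procedure may differ.

```python
def mean_squared_error(true_matrix, estimated_matrix):
    """Average squared difference between two equally shaped matrices."""
    n_entries = sum(len(row) for row in true_matrix)
    total = sum((t - e) ** 2
                for true_row, est_row in zip(true_matrix, estimated_matrix)
                for t, e in zip(true_row, est_row))
    return total / n_entries


def sensitivity(true_associations, called_associations):
    """Fraction of simulated (true) associations that were recovered."""
    true_set = set(true_associations)
    if not true_set:
        raise ValueError("at least one true association is required")
    return len(true_set & set(called_associations)) / len(true_set)
```

For example, if two associations were simulated and one of them appears among the calls, `sensitivity` returns 0.5 regardless of how many additional false calls were made.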
Figure 4.
Number of probes with an eQTL found as a function of maximum number of hidden factors for three previously published datasets.
Methods based on significance testing (PCAsig, SVA) identified the same number of factors for a wide range of cutoff values, so only a single count is given (dashed lines), together with the number of factors found (diamond shape). Other methods were applied with four different settings of the maximum number of hidden factors.
Figure 5.
Fraction of tested genes with a cis association in individual chromosomes and overall false discovery rate for the HapMap CEU population, at a fixed false positive rate (FPR) threshold.
Figure 6.
Validation of VBeQTLs by comparison to standard eQTLs.
(a,b,d,e) Venn diagrams depicting overlap of probes with a standard eQTL or VBeQTL in the CEU population and probes with an eQTL in other populations. (c,f) Standard and VBeQTL location and strength relative to the transcription start site.