Model Selection Emphasises the Importance of Non-Chromosomal Information in Genetic Studies

  • Reda Rawi,

    Affiliation Computational Science and Engineering Center, Qatar Computing Research Institute, Doha, Qatar

  • Mohamed El Anbari,

    Affiliations Computational Science and Engineering Center, Qatar Computing Research Institute, Doha, Qatar, Division of Biomedical Informatics, Sidra Medical and Research Center, Doha, Qatar

  • Halima Bensmail

    Affiliation Computational Science and Engineering Center, Qatar Computing Research Institute, Doha, Qatar


Abstract

Ever since the case of the missing heritability was highlighted some years ago, scientists have been investigating various possible explanations for the issue. However, none of these explanations include non-chromosomal genetic information. Here we describe explicitly how chromosomal and non-chromosomal modifiers collectively influence the heritability of a trait, in this case, the growth rate of yeast. Our results show that the non-chromosomal contribution can be large, adding another dimension to the estimation of heritability. We also discovered, combining the strength of LASSO with model selection, that the interaction of chromosomal and non-chromosomal information is essential in describing phenotypes.


Introduction

Genome-wide association studies (GWAS) have contributed to the identification of many human loci associated with a wide range of complex traits, such as height and intelligence, and diseases such as obesity, type 2 diabetes and age-related macular degeneration. However, GWAS do not explain the whole story of the observed heritability of these traits [1, 2]. Explanations for this missing heritability include, amongst others, variants with effects too small to be identified with statistical significance [3], variant interactions that cannot be detected with current estimates [4], rare variants not identified by GWAS [3], and epigenetic effects [5–7].

Interestingly, non-chromosomal genetic information has not yet been taken into account when estimating heritability, although there is evidence for effects on the phenotype arising from cytoplasmic elements in many different organisms. For instance, Cadwell et al. [8] showed in their work on a mouse model that the interaction between a specific virus infection and a mutation in the Crohn’s disease susceptibility gene Atg16L1 induces intestinal pathologies.

Recently, our collaborators from MIT designed an experiment using yeast in which both sources of information, chromosomal and non-chromosomal, were controlled in order to observe the phenotype of a single chromosomal polymorphism in the presence of different cytoplasmic elements [9]. They showed that the source of the mitochondrial genome, and the presence or absence of a dsRNA virus, both affect the phenotype of chromosomal variants [9].

Unfortunately, their statistical analysis had some limitations. Firstly, they split the data into training and test sub-samples (1/10 of the data held out for testing) in order to conduct ten-fold cross validation. This is problematic, given that the sample sizes (see supplementary S1 Table) for the different gene deletion experiments range from 9 to 20, so that each held-out fold contains only one or two observations. Secondly, they computed the coefficient of determination (R2), here used as a metric for recovered heritability, for three different models that consider (i) only the chromosomal mutation, (ii) the effects of the chromosomal mutation and non-chromosomal information, and (iii) both the effect of the chromosomal mutation and non-chromosomal information as well as their interaction. They inferred, exclusively from the gain in R2, that non-chromosomal information and interaction effects substantially contribute to the heritability. However, it is well known that R2 values may increase with an increasing number of explanatory variables and, hence, cannot be used on their own to meaningfully compare models with different numbers of variables.

In this study, we applied more sophisticated statistical means and models, such as the adjusted coefficient of determination (Ra2), a different cross validation strategy and a combination of Least Absolute Shrinkage and Selection Operator (LASSO) [10] with the Bayesian information criterion (BIC) [11, 12], as well as the recently introduced LASSO for hierarchical interactions [13], to detect the effects of non-chromosomal modifiers on the heritability of a trait. Our results confirm the importance of non-chromosomal information and its interaction with chromosomal mutations, when using both LASSO and BIC as well as LASSO for hierarchical interactions.


Results

Previous work studied two inherited non-chromosomal modifiers: first, the presence or absence of the endogenous dsRNA yeast “killer” virus, which is transmitted by mitosis and meiosis [14, 15], and second, the mitochondrial diversity. Strains were constructed either with ([kil-k]) or without ([kil-0]) the virus. Because the virus may be lost spontaneously, causing a transition from [kil-k] to [kil-0], its presence was constantly controlled by either a Petri plate assay or by detection of dsRNA on a gel. Furthermore, experiments were designed using either S288c ([rho+]S288c) or Sigma mitochondria ([rho+]Sigma). The mitochondria differ strongly in their genomes, with about 2–3 SNPs per kilobase and ten times more insertions and deletions compared to the chromosomal genome. The studied model chromosomal modifiers were gene deletions in the yeast strains S288c and Sigma [16]. Previous studies already detected several mutations with lethal or slow growth phenotypes in one strain, but not in the other [16]. A detailed description of the experiments can be found in Edwards et al. [9]. A summary of the colony size measurements is provided in S1 Table (see supplementary).

Non-chromosomal information explains increased heritability

We analysed the effects of non-chromosomal modifiers on growth phenotypes of chromosomal variants using raw data from Edwards et al. [9], with $c_{ij}$ as the colony size of controlled genotype $i$ in replicate $j$. We applied a variance-stabilising Box-Cox transform with an exponent of 0.25 to the $c_{ij}$ values to obtain normalised values $y_{ij}$, as this parameter choice gave the least-correlated means and variances in the analysed sets (according to Edwards et al. [9]). Next, we split the data (for each genotype $i$ separately) into equally sized training and test sub-samples and applied the following linear models: (i) $Y = \beta_0 + \beta_1 X_1 + \epsilon$ (simple), (ii) $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon$ (additive), and (iii) $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \epsilon$ (interaction), with $Y = (y_1, \ldots, y_n)^t$ as the response vector, $\epsilon = (\epsilon_1, \ldots, \epsilon_n)^t \sim N(0, \sigma^2 I_n)$ as the noise vector, $X_j$ as the $j$th predictor for $j = 1, \ldots, p$, and $\beta = (\beta_1, \ldots, \beta_p)^t$ as the vector of parameters of interest to be estimated. Each $\beta_j$, $j = 1, \ldots, p$, represents the association between the variable $X_j$ and the response $Y$.
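To illustrate the preprocessing and the three model specifications, the following Python sketch applies a Box-Cox transform with exponent 0.25 and builds the predictor values for each model. The standard Box-Cox form $(c^{\lambda} - 1)/\lambda$, the 0/1 coding of X1 (gene deletion status) and X2 (non-chromosomal state), and the function names are assumptions made for this example; the original analysis was carried out in R and may differ in detail.

```python
import math


def box_cox(c, lam=0.25):
    """Box-Cox power transform of a positive value c with exponent lam."""
    if c <= 0:
        raise ValueError("Box-Cox requires positive values")
    if lam == 0:
        return math.log(c)  # limiting case of the transform
    return (c ** lam - 1.0) / lam


def design_row(x1, x2, model):
    """Predictor values (without intercept) for one observation.

    x1: chromosomal predictor, x2: non-chromosomal predictor
    (hypothetical 0/1 coding chosen for this illustration).
    """
    if model == "simple":
        return [x1]
    if model == "additive":
        return [x1, x2]
    if model == "interaction":
        return [x1, x2, x1 * x2]
    raise ValueError("unknown model: " + model)


# Hypothetical colony sizes, transformed before model fitting.
colony_sizes = [16.0, 81.0, 256.0]
normalised = [box_cox(c) for c in colony_sizes]
```

With this coding, the interaction column is non-zero only for observations in which both the deletion and the non-chromosomal element are present.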

The simple model considered only the chromosomal mutation, whereas the additive and interaction models considered both chromosomal and non-chromosomal effects. The interaction model additionally includes the interaction between chromosomal and non-chromosomal effects. We then calculated Ra2 values (similar to Edwards et al. [9]) for the three different models. We applied Ra2, a modification of R2, because it corrects for the inflation of R2 that occurs as predictors are added, which allows models of different size to be compared.

Fig. 1 shows the fractions of phenotypic variance (y-axis) for ten single gene deletions (x-axis) under the three different models. In order to ensure the stability of the Ra2 values, we repeated the procedure of splitting the data into training and test sub-samples and calculating Ra2 1000 times. We then plotted the average Ra2 as bar heights. The error bars show the standard deviation across the 1000 sampled test sets. Bars representing the simple model are shown in red, the additive model in orange and the interaction model in yellow. Aside from the control MCM22 and the gene deletion PHO88(non-killer) experiment, all experiments showed a noticeable increase in model accuracy when non-chromosomal and interaction effects were included.
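The repeated-split procedure can be sketched as follows. This is a minimal pure-Python illustration, not the authors' R script: it assumes that Ra2 is evaluated on the held-out half, solves the least-squares problem through the normal equations, and skips degenerate splits (for example, a training half in which a predictor is constant). All function names are hypothetical.

```python
import random
import statistics


def ols_fit(X, y):
    """Least-squares coefficients [b0, b1, ...] via the normal equations.

    Returns None if the design is singular (e.g. a constant predictor).
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented normal equations (A^T A | A^T y).
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-10:
            return None
        M[col], M[piv] = M[piv], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


def predict(beta, X):
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in X]


def adjusted_r2(y, y_hat, q):
    """Adjusted R^2 for a model with q predictors."""
    n = len(y)
    mean_y = sum(y) / n
    tss = sum((v - mean_y) ** 2 for v in y)
    rss = sum((v - p) ** 2 for v, p in zip(y, y_hat))
    return 1.0 - (rss / (n - q - 1)) / (tss / (n - 1))


def repeated_split_ra2(X, y, repeats=1000, seed=0):
    """Mean and standard deviation of test-set Ra2 over random half splits."""
    rng = random.Random(seed)
    n, q = len(y), len(X[0])
    scores = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        train, test = idx[: n // 2], idx[n // 2:]
        beta = ols_fit([X[i] for i in train], [y[i] for i in train])
        y_test = [y[i] for i in test]
        if beta is None or len(set(y_test)) < 2 or len(test) <= q + 1:
            continue  # skip degenerate splits
        y_hat = predict(beta, [X[i] for i in test])
        scores.append(adjusted_r2(y_test, y_hat, q))
    return statistics.mean(scores), statistics.stdev(scores), len(scores)
```

Returning the number of usable splits alongside the mean and standard deviation makes it visible how often a split had to be discarded, which matters for sample sizes as small as those analysed here.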

Fig 1. Non-chromosomal information enhances the fraction of phenotypic variance explained.

Three linear models with different complexity are applied to measure the fraction of phenotypic variance. The first model (simple) includes only the gene deletion status (red), the second model (additive) considers the gene deletion status and non-chromosomal elements (orange), and finally the third model (interaction) includes both chromosomal and non-chromosomal elements as well as their interaction (yellow). The fraction of phenotypic variance is thereby approximated by the average coefficient of determination (Ra2) of 1000 randomly sampled sub-sets. Aside from the control MCM22 and the gene deletion PHO88(non-killer) experiment, model accuracy increases considerably when non-chromosomal information is included and much more when the interaction is taken into account.

In addition to Ra2, we computed the mean squared error (MSE) for each of the three linear models (see Materials and Methods) to compare their performances. As for Ra2, we repeated the procedure 1000 times and considered the model with the lowest MSE as the best for the given test sub-samples. In Table 1 we show, for each gene deletion experiment, the frequency of the chosen linear models according to their MSEs. Again we observed that, aside from the control MCM22 and the gene deletion experiment PHO88(non-killer), the interaction model is chosen in most cases, which emphasises the importance of the non-chromosomal information and its interaction with chromosomal mutations.

Table 1. Frequency of selected linear models according to their MSE within 1000 modelling repeats.

Although these results already indicate the importance of non-chromosomal information, we additionally applied LASSO alongside BIC to verify our findings.


We analysed the effects of the three predictors (X1, X2 and X1X2) using LASSO alongside BIC. We take advantage of the fact that model selection criteria extract from a set of candidate models (in our case the simple, additive and interaction models) those that best describe a given dataset. One advantage of LASSO over simple linear models is that the regression and model selection can be applied in a single procedure.

As in the previous approach, we performed 1000 LASSO regressions with BIC model selection for each gene deletion experiment. We then studied the complexity of the BIC-selected models that best describe the given data. In Table 2 we summarise the different model sizes selected during 1000 modelling repeats. If we disregard the gene deletions MCM22 and PHO88(non-killer), most experiments require two or three predictors to explain the data. An exception is the PHO88 gene deletion experiment, where, in 618 out of 1000 modelling repeats, one predictor is sufficient. The predictor representing the interaction between chromosomal and non-chromosomal modifiers was preferentially chosen by BIC.

Furthermore, we analysed the frequency of predictors representing chromosomal (X1) and non-chromosomal effects (X2) as well as their interaction (X1X2) within the 1000 procedures (see Table 3). Excluding the cases MCM22 and PHO88(non-killer), the predictors X1 and X1X2 are selected in most cases. Regarding the gene deletion PHO88 experiment, the interaction term (X1X2) is chosen more often than the other predictors.

Table 3. Frequency of BIC selected predictors representing chromosomal (X1) and non-chromosomal effects (X2) as well as their interaction (X1X2) within 1000 modelling repeats.

As with the previous method, these results highlighted the importance of the interaction between chromosomal and non-chromosomal modifiers. However, we discovered that LASSO selected the main or interaction effects arbitrarily (not only for gene deletion experiment PHO88), whereas it is generally good statistical practice to include interaction effects only if the main effects are also in the model. Therefore, we applied the recently introduced LASSO for hierarchical interactions [13].

LASSO for hierarchical interactions

In order to use LASSO for hierarchical interactions, we used the R software package hierNet [17, 18] that fits interaction models with the restriction that the interaction between two variables is only included if both variables are included as main effects (strong hierarchy). The analysis procedure consisted of the following steps. First, we fitted several LASSO models with different values of the regularisation parameter λ (function hierNet.path) to the data. Second, we applied a cross validation (function and chose the model with the largest value of λ such that the error is within 1 standard error of the minimum. We repeated this procedure 1000 times and analysed the complexity of the resulting models as before.
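The selection rule described above, choosing the largest λ whose cross-validated error lies within one standard error of the minimum, can be sketched independently of any particular package. The function name and the example numbers below are hypothetical and are not taken from the study.

```python
def one_se_rule(lambdas, cv_mean, cv_se):
    """Largest lambda whose mean CV error is within one SE of the minimum.

    lambdas: candidate regularisation values
    cv_mean: mean cross-validated error for each lambda
    cv_se:   standard error of that mean for each lambda
    """
    best = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[best] + cv_se[best]
    eligible = [lam for lam, err in zip(lambdas, cv_mean) if err <= threshold]
    return max(eligible)


# Hypothetical cross-validation summary for four candidate values.
lambdas = [0.01, 0.1, 1.0, 10.0]
cv_mean = [1.00, 0.95, 1.02, 1.60]
cv_se = [0.10, 0.08, 0.09, 0.20]
chosen = one_se_rule(lambdas, cv_mean, cv_se)
```

Preferring the largest eligible λ favours the sparsest model that is statistically indistinguishable from the best one, which is a cautious choice for small samples.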

Interestingly, we found that in most cases, aside from the experiments MCM22 and PHO88(non-killer), three predictors (X1, X2 and X1X2) are required to best describe the data (see S2 Table). An exception is the gene deletion experiment PHO88, where only around half of the 1000 modelling repeats required three predictors. This recently developed LASSO approach confirmed our earlier findings that non-chromosomal information and its interaction with chromosomal mutations are important.


Discussion

Our analyses revealed that the phenotype of a chromosomal mutation may be affected by non-chromosomal elements such as mitochondria and viral state. We also showed that the introduction of non-chromosomal information and its interaction with chromosomal elements considerably enhanced the fraction of explained phenotypic variance of a trait, which is ensured by conserving the chromosomal contribution and the environment, whilst changing the non-chromosomal effects. Previous studies [19–23] that crossed strains carrying a dsRNA virus with virus-free strains such as S288c [24] may have also been affected by non-chromosomal elements or their interaction with chromosomal ones, although we cannot prove it without repeating the experiments and analyses.

However, it is well known that the coefficient of determination, here used as metric for recovered heritability, is prone to an increase when adding more variables to the statistical model. Hence, we could not exclude that the gain in Ra2 arose only from gain in explained heritability.

For this reason, we applied the model selection criterion BIC to investigate the importance of both chromosomal and non-chromosomal information, as well as their interaction, in describing colony size data. BIC highlighted that not only chromosomal mutations, but also the interactions between chromosomal and non-chromosomal elements, such as mitochondria and dsRNA virus, are important. The mitochondrial background plays a crucial role in the PHO88 gene deletion case, where cells grow faster with a Sigma mitochondrial background ([kil-k] [rho+]Sigma) than cells with S288c mitochondria ([kil-k] [rho+]S288c). With BIC model selection, the interaction term (X1X2) is sufficient to describe the data for this gene deletion experiment in two out of three cases.

Furthermore, we examined LASSO with hierarchical interactions. This approach, unlike ordinary LASSO, prevents the inclusion of interaction effects unless the main effects are also included. We found for most cases that not only non-chromosomal information, but also its interaction with chromosomal effects, is essential to best describe the colony size data. For the PHO88 gene deletion case, the interaction effect is chosen as important in about half the modelling repeats.

In summary, all applied statistical methods point to the fact that non-chromosomal modifiers, and the interaction effects of chromosomal and non-chromosomal elements, account for a substantial fraction of phenotypic variance of growth rates in yeast.

Materials and Methods


The raw data measurements and the analysis script (R code) can be found in the supplementary section.

Regularisation models

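We consider the standard multiple linear regression model with $n$ observations and $p$ explanatory variables (predictors)

$$Y = \beta_0 + \sum_{j=1}^{p} \beta_j X_j + \epsilon \tag{1}$$

where $Y = (y_1, \ldots, y_n)^t$ is the response vector and $\epsilon = (\epsilon_1, \ldots, \epsilon_n)^t \sim N(0, \sigma^2 I_n)$ is the noise vector; for $j = 1, \ldots, p$, $X_j$ represents the $j$th predictor and $\beta = (\beta_1, \ldots, \beta_p)^t$ is the vector of parameters of interest to be estimated; each $\beta_j$, $j = 1, \ldots, p$, represents the association between the variable $X_j$ and the response $Y$. Given estimates $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_p$, we can make predictions using the formula

$$\hat{y} = \hat{\beta}_0 + \sum_{j=1}^{p} \hat{\beta}_j x_j \tag{2}$$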

Coefficient of determination (R2)

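Define $\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2$ as the total sum of squares and the residual sum of squares (RSS) as $\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$. The coefficient of determination $R^2$, or the percentage of variance explained, is defined as

$$R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}} \tag{3}$$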

Adjusted coefficient of determination (Ra2)

Since RSS always decreases as more variables are added to the model, $R^2$ always increases as more variables are added. For a least squares model with $q$ variables, the $R_a^2$ statistic is calculated as

$$R_a^2 = 1 - \frac{\mathrm{RSS}/(n - q - 1)}{\mathrm{TSS}/(n - 1)} \tag{4}$$

Maximising $R_a^2$ is equivalent to minimising $\mathrm{RSS}/(n - q - 1)$. While RSS always decreases as the number of variables increases, $\mathrm{RSS}/(n - q - 1)$ may increase or decrease, due to the presence of $q$ in the denominator. Hence, the $R_a^2$ statistic can be used for selecting among a set of models that contain different numbers of variables.
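A small worked example may help to show why the adjusted statistic can disagree with the unadjusted one. The numbers are hypothetical and chosen only for illustration: take $n = 10$ observations with $\mathrm{TSS} = 20$, a model with $q = 1$ predictor and $\mathrm{RSS} = 8$, and a larger model with $q = 3$ predictors and $\mathrm{RSS} = 7.6$.

```latex
\begin{align*}
q = 1:\quad & R^2 = 1 - \frac{8}{20} = 0.60,
  & R_a^2 &= 1 - \frac{8/(10 - 1 - 1)}{20/(10 - 1)} = 1 - 0.45 = 0.55 \\
q = 3:\quad & R^2 = 1 - \frac{7.6}{20} = 0.62,
  & R_a^2 &= 1 - \frac{7.6/(10 - 3 - 1)}{20/(10 - 1)} = 1 - 0.57 = 0.43
\end{align*}
```

Here the unadjusted value rises from 0.60 to 0.62 when two predictors are added, whereas the adjusted value falls from 0.55 to 0.43, because the small reduction in RSS does not compensate for the loss of two degrees of freedom.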


Mean squared error

If we have enough observations, we can divide our data set into two parts: a training set of size $n_{\text{train}}$, on which the model is fitted, and a test set of size $n_{\text{test}}$ for evaluation of the performance. A commonly used measure of prediction performance is the mean squared error (MSE) on the test set, defined as

$$\mathrm{MSE} = \frac{1}{n_{\text{test}}} \sum_{i=1}^{n_{\text{test}}} \left( y_{i,\text{test}} - \hat{y}_{i,\text{test}} \right)^2 \tag{5}$$

where $y_{i,\text{test}}$ and $\hat{y}_{i,\text{test}}$ are respectively the real and predicted values of the response $Y$ in the test data.
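As a minimal illustration of how the test-set MSE can be used to pick among candidate models, the sketch below computes the MSE for each set of predictions and returns the model with the lowest value. The function names and the example predictions are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error between observed and predicted responses."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def best_model(y_test, predictions):
    """Name of the model whose test-set predictions give the lowest MSE.

    predictions: dict mapping a model name to its list of predictions.
    """
    return min(predictions, key=lambda name: mse(y_test, predictions[name]))


# Hypothetical test responses and predictions from three fitted models.
y_test = [1.0, 2.0, 3.0]
predictions = {
    "simple": [1.0, 1.0, 1.0],
    "additive": [1.0, 2.0, 2.0],
    "interaction": [1.0, 2.0, 3.5],
}
winner = best_model(y_test, predictions)
```

Repeating this comparison over many random splits and counting how often each model wins yields a frequency table of the kind reported for the three linear models.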


LASSO

The LASSO coefficients, $\hat{\beta}_{\lambda}^{L}$, minimise the quantity

$$\sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \tag{6}$$

The LASSO technique penalises the regression coefficients using an $\ell_1$ norm. It shrinks the coefficients towards zero. In addition, the $\ell_1$ penalty has the effect of forcing some of the coefficients to be exactly equal to zero when the tuning parameter $\lambda$ is sufficiently large. Hence, the LASSO estimates the coefficients and performs variable selection in a single procedure. The choice of the tuning parameter $\lambda$ is critical and can be performed using cross validation.
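The shrinkage and selection behaviour can be illustrated with a small coordinate-descent solver. This is a sketch under stated assumptions, not the implementation used in the study: it assumes the objective is the residual sum of squares plus λ times the sum of absolute coefficients with an unpenalised intercept, and the function names and toy data are hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z towards zero by t; return exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, sweeps=5000, tol=1e-12):
    """Coordinate descent for RSS + lam * sum(|beta_j|), intercept unpenalised."""
    n, p = len(y), len(X[0])
    beta0, beta = sum(y) / n, [0.0] * p
    for _ in range(sweeps):
        change = 0.0
        for j in range(p):
            # Correlation of predictor j with the residual that excludes it.
            rho = sum(
                X[i][j] * (y[i] - beta0
                           - sum(beta[k] * X[i][k] for k in range(p) if k != j))
                for i in range(n))
            z = sum(X[i][j] ** 2 for i in range(n))
            new = soft_threshold(rho, lam / 2.0) / z if z > 0 else 0.0
            change = max(change, abs(new - beta[j]))
            beta[j] = new
        new0 = sum(y[i] - sum(b * x for b, x in zip(beta, X[i]))
                   for i in range(n)) / n
        change = max(change, abs(new0 - beta0))
        beta0 = new0
        if change < tol:
            break
    return beta0, beta


# Toy 2x2 design with two replicates per cell: columns X1, X2, X1*X2.
X = [[a, b, a * b] for a in (0.0, 1.0) for b in (0.0, 1.0) for _ in range(2)]
y = [1.0 + 2.0 * row[0] for row in X]  # response depends on X1 only

intercept_big, coefs_big = lasso_cd(X, y, lam=1e6)  # heavy penalty
intercept_ols, coefs_ols = lasso_cd(X, y, lam=0.0)  # no penalty
```

With a very large λ every coefficient is set exactly to zero and only the intercept remains, while λ = 0 reproduces the least-squares fit; intermediate values trace out the path along which predictors enter or leave the model.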


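Bayesian information criterion (BIC)

For the least squares model with $q$ predictors, the BIC is, up to irrelevant constants, given by

$$\mathrm{BIC} = \frac{1}{n} \left( \mathrm{RSS} + \log(n)\, q\, \hat{\sigma}^2 \right) \tag{7}$$

where $\hat{\sigma}^2$ is an estimate of the variance of $\epsilon$. We select the model that has the lowest BIC value.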
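A minimal sketch of BIC-based selection among candidate models is given below. It assumes the common least-squares form BIC = (RSS + log(n) · q · σ̂²) / n, which is one of several equivalent conventions; the function names and the candidate values are hypothetical.

```python
import math


def bic(rss, n, q, sigma2):
    """BIC for a least-squares model with q predictors (assumed form)."""
    return (rss + math.log(n) * q * sigma2) / n


def select_by_bic(candidates, n, sigma2):
    """Name of the candidate with the lowest BIC.

    candidates: dict mapping a model name to a (rss, q) tuple.
    """
    return min(candidates,
               key=lambda name: bic(candidates[name][0], n,
                                    candidates[name][1], sigma2))


# Hypothetical residual sums of squares for the three linear models.
candidates = {
    "simple": (9.0, 1),
    "additive": (8.8, 2),
    "interaction": (4.0, 3),
}
chosen = select_by_bic(candidates, n=20, sigma2=0.3)
```

In this example the additive model reduces RSS only slightly and is penalised for its extra predictor, whereas the interaction model reduces RSS enough to outweigh the penalty for three predictors.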

LASSO for hierarchical interactions

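Bien et al. [13] proposed an approach for fitting interaction models under a hierarchy restriction. They consider the two-way interaction model

$$Y = \beta_0 + \sum_{j=1}^{p} \beta_j X_j + \frac{1}{2} \sum_{j \neq k} \Theta_{jk} X_j X_k + \epsilon \tag{8}$$

The additive part contains the main effects, while the quadratic part contains the interaction terms. The goal is to estimate $\beta \in \mathbb{R}^p$ and $\Theta \in \mathbb{R}^{p \times p}$, where $\Theta = \Theta^t$ and $\theta_{jj} = 0$ for $j = 1, \ldots, p$. This is done using an all-pairs Lasso criterion, which has the following form

$$\min_{\beta_0, \beta, \Theta} \; \frac{1}{2} \sum_{i=1}^{n} \Big( y_i - \beta_0 - x_i^t \beta - \frac{1}{2} x_i^t \Theta x_i \Big)^2 + \lambda \|\beta\|_1 + \frac{\lambda}{2} \|\Theta\|_1 \tag{9}$$

where $\|\beta\|_1 = \sum_{j=1}^{p} |\beta_j|$, $\|\Theta\|_1 = \sum_{j \neq k} |\theta_{jk}|$, $x_i$ is the observed value of $X_i$, $\beta_0$ is the intercept, and $\lambda$ is a positive tuning parameter that can be estimated using cross validation. The method produces sparse interaction models that honour the hierarchy restriction that an interaction is only included in a model if one or both variables are marginally important.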
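For the setting analysed here, with one chromosomal predictor $X_1$ and one non-chromosomal predictor $X_2$ ($p = 2$), the two-way interaction model reduces to the interaction model used in the Results. The short derivation below assumes the model form of Bien et al. [13], $Y = \beta_0 + \sum_j \beta_j X_j + \tfrac{1}{2} \sum_{j \neq k} \Theta_{jk} X_j X_k + \epsilon$, with $\Theta$ symmetric and zero on the diagonal.

```latex
\begin{align*}
\frac{1}{2} \sum_{j \neq k} \Theta_{jk} X_j X_k
  &= \frac{1}{2} \left( \Theta_{12} X_1 X_2 + \Theta_{21} X_2 X_1 \right)
   = \Theta_{12} X_1 X_2
  && (\Theta_{12} = \Theta_{21}) \\
Y &= \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \Theta_{12} X_1 X_2 + \epsilon
\end{align*}
```

This is the interaction model (iii) with $\beta_3 = \Theta_{12}$. Under strong hierarchy, $\Theta_{12}$ may be non-zero only if both $\beta_1$ and $\beta_2$ are non-zero, so the admissible models are the empty model, the single main-effect models, the additive model and the full interaction model.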

Supporting Information

S1 Table. Summary of colony measurements.



S2 Table. Complexity of best statistical models chosen by LASSO for hierarchical interactions.



S3 Table. Frequency of predictors (X1, X2 and X1X2) within 1000 modelling repeats using LASSO for hierarchical interactions.



S1 Script. R analysis script.



S1 Data. Raw data measurements.




Acknowledgments

The yeast experimental design descriptions and the spore growth data were kindly provided by our colleagues David Gifford and Gerald Fink from the Whitehead and Broad Institutes in Cambridge, USA.

Furthermore, we thank Mohammed Dehbi from QBRI and Christopher Leonard from QScience, Qatar Foundation, for improving the quality of the manuscript.

Author Contributions

Conceived and designed the experiments: HB. Performed the experiments: RR. Analyzed the data: RR HB. Contributed reagents/materials/analysis tools: RR MEA HB. Wrote the paper: RR HB. Provided additional advice for statistical analysis: MEA.


References

  1. Eichler EE, Flint J, Gibson G, Kong A, Leal SM, et al. (2010) Missing heritability and strategies for finding the underlying causes of complex disease. Nature Reviews Genetics 11: 446–450. doi: 10.1038/nrg2809. pmid:20479774
  2. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, et al. (2009) Finding the missing heritability of complex diseases. Nature 461: 747–753. doi: 10.1038/nature08494. pmid:19812666
  3. Bloom JS, Ehrenreich IM, Loo WT, Lite TLV, Kruglyak L (2013) Finding the sources of missing heritability in a yeast cross. Nature 494: 234–237. doi: 10.1038/nature11867. pmid:23376951
  4. Zuk O, Hechter E, Sunyaev SR, Lander ES (2012) The mystery of missing heritability: Genetic interactions create phantom heritability. Proceedings of the National Academy of Sciences of the United States of America 109: 1193–1198. doi: 10.1073/pnas.1119675109. pmid:22223662
  5. Slatkin M (2009) Epigenetic inheritance and the missing heritability problem. Genetics 182: 845–850. doi: 10.1534/genetics.109.102798. pmid:19416939
  6. Rassoulzadegan M, Grandjean V, Gounon P, Vincent S, Gillot I, et al. (2006) RNA-mediated non-mendelian inheritance of an epigenetic change in the mouse. Nature 441: 469–474. doi: 10.1038/nature04674. pmid:16724059
  7. Nadeau JH (2009) Transgenerational genetic effects on phenotypic variation and disease risk. Human Molecular Genetics 18: R202–10. doi: 10.1093/hmg/ddp366. pmid:19808797
  8. Cadwell K, Patel KK, Maloney NS, Liu TC, Ng ACY, et al. (2010) Virus-plus-susceptibility gene interaction determines Crohn’s disease gene Atg16L1 phenotypes in intestine. Cell 141: 1135–1145. doi: 10.1016/j.cell.2010.05.009. pmid:20602997
  9. Edwards MD, Symbor-Nagrabska A, Dollard L, Gifford DK, Fink GR (2014) Interactions between chromosomal and nonchromosomal elements reveal missing heritability. Proceedings of the National Academy of Sciences of the United States of America 111: 7719–7722. doi: 10.1073/pnas.1407126111. pmid:24825890
  10. Tibshirani R (1996) Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society (Series B) 58: 267–288.
  11. Schwarz G (1978) Estimating the Dimension of a Model. The Annals of Statistics 6: 461–464. doi: 10.1214/aos/1176344136.
  12. Bogdan M, Ghosh JK, Doerge RW (2004) Modifying the Schwarz Bayesian information criterion to locate multiple interacting quantitative trait loci. Genetics 167: 989–999. doi: 10.1534/genetics.103.021683. pmid:15238547
  13. Bien J, Taylor J, Tibshirani R (2013) A lasso for hierarchical interactions. The Annals of Statistics 41: 1111–1141. doi: 10.1214/13-AOS1096.
  14. Magliani W, Conti S, Gerloni M, Bertolotti D, Polonelli L (1997) Yeast killer systems. Clinical Microbiology Reviews 10: 369–400. pmid:9227858
  15. Schmitt MJ, Breinig F (2006) Yeast viral killer toxins: lethality and self-protection. Nature Reviews Microbiology 4: 212–221. doi: 10.1038/nrmicro1347. pmid:16489348
  16. Dowell RD, Ryan O, Jansen A, Cheung D, Agarwala S, et al. (2010) Genotype to phenotype: a complex problem. Science (New York, NY) 328: 469. doi: 10.1126/science.1189015.
  17. R Core Team (2014) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL
  18. Bien J, Tibshirani R (2014) hierNet: A Lasso for Hierarchical Interactions. URL R package version 1.6.
  19. Ben-Ari G, Zenvirth D, Sherman A, David L, Klutstein M, et al. (2006) Four linked genes participate in controlling sporulation efficiency in budding yeast. PLoS Genetics 2: e195. doi: 10.1371/journal.pgen.0020195. pmid:17112318
  20. Sinha H, David L, Pascon RC, Clauder-Münster S, Krishnakumar S, et al. (2008) Sequential elimination of major-effect contributors identifies additional quantitative trait loci conditioning high-temperature growth in yeast. Genetics 180: 1661–1670. doi: 10.1534/genetics.108.092932. pmid:18780730
  21. Steinmetz LM, Sinha H, Richards DR, Spiegelman JI, Oefner PJ, et al. (2002) Dissecting the architecture of a quantitative trait locus in yeast. Nature 416: 326–330. doi: 10.1038/416326a. pmid:11907579
  22. Deutschbauer AM, Davis RW (2005) Quantitative trait loci mapped to single-nucleotide resolution in yeast. Nature Genetics 37: 1333–1340. doi: 10.1038/ng1674. pmid:16273108
  23. Kim HS, Fay JC (2009) A combined-cross analysis reveals genes with drug-specific and background-dependent effects on drug sensitivity in Saccharomyces cerevisiae. Genetics 183: 1141–1151. doi: 10.1534/genetics.109.108068. pmid:19720856
  24. Fink GR, Styles CA (1972) Curing of a killer factor in Saccharomyces cerevisiae. Proceedings of the National Academy of Sciences of the United States of America 69: 2846–2849. doi: 10.1073/pnas.69.10.2846. pmid:4562744