Model Selection Emphasises the Importance of Non-Chromosomal Information in Genetic Studies

Ever since the case of the missing heritability was highlighted some years ago, scientists have been investigating various possible explanations for the issue. However, none of these explanations include non-chromosomal genetic information. Here we describe explicitly how chromosomal and non-chromosomal modifiers collectively influence the heritability of a trait, in this case, the growth rate of yeast. Our results show that the non-chromosomal contribution can be large, adding another dimension to the estimation of heritability. We also discovered, combining the strength of LASSO with model selection, that the interaction of chromosomal and non-chromosomal information is essential in describing phenotypes.


Introduction
Genome-wide association studies (GWAS) have contributed to the identification of many human loci associated with a wide range of complex traits, such as height, intelligence or diseases such as obesity, type 2 diabetes and age-related macular degeneration.However, GWAS do not explain the whole story of the observed heritability of these traits [1,2].Explanations for this missing heritability include-amongst others-variants with effects too small to be identified with statistical significance [3], variant interactions that cannot be detected with current estimates [4], rare variants not identified by GWAS [3], and epigenetic effects [5][6][7].
Interestingly, non-chromosomal genetic information has not yet been taken into account when estimating heritability, although there is evidence for effects on the phenotype arising from cytoplasmic elements in many different organisms.For instance, Cadwell et al. [8] showed in their work on a mouse model that the interaction between a specific virus infection and a mutation in Crohn's disease susceptibility gene Atg16L1 induces intestinal pathologies.
Recently, our collaborators from MIT have designed an experiment using yeast where both sources of information, chromosomal and non-chromosomal were controlled in order to observe the phenotype of a single chromosomal polymorphism in the presence of different cytoplasmic elements [9].They showed that the source of the mitochondrial genome, and the presence or absence of a dsRNA virus, both affect the phenotype of chromosomal variants [9].
Unfortunately, their statistical analysis had some limitations.Firstly, they split the data into training and test sub-samples (1/10 of the data held-out for testing) in order to conduct tenfold cross validation.This appears to be the wrong approach, given that the sample sizes (see supplementary S1 Table ) for the different gene deletion experiments range from 9 to 20.Secondly, they computed the coefficient of determination (R 2 ), here used as metric for recovered heritability, for three different models that consider (i) only the chromosomal mutation, (ii) the effects of the chromosomal mutation and non-chromosomal information, and (iii) both the effect of the chromosomal mutation and non-chromosomal information as well as their interaction.They inferred-exclusively from the gain in R 2 -that non-chromosomal information and interaction effects substantially contribute to the heritability.However, it is well known that R 2 values may increase with increasing number of explanatory variables, and, hence, can not be exclusively applied to meaningfully compare models with different number of variables.
In this study, we applied more sophisticated statistical means and models, such as the adjusted coefficient of determination (R 2 a ), a different cross validation strategy and a combination of Least Absolute Shrinkage and Selection Operator (LASSO) [10] with the Bayesian information criterion (BIC) [11,12], as well as the recently introduced LASSO for hierarchical interactions [13], to detect the effects of non-chromosomal modifiers on the heritability of a trait.Our results confirm the importance of non-chromosomal information and its interaction with chromosomal mutations, when using both LASSO and BIC as well as LASSO for hierarchical interactions.

Results
Previous work studied two inherited non-chromosomal modifiers.First, the presence and absence of the endogenous dsRNA yeast "killer" virus that is transmitted by mitosis and meiosis [14,15], and second, the mitochondrial diversity.Strains were either constructed with ([kil-k]) or without ([kil-0]) the virus, that may spontaneously be lost and transition from [kil-k] to [kil-0].Hence, its presence was constantly controlled by either a petri plate assay, or by detection of dsRNA on a gel.Furthermore, experiments were designed using either S288c ([rho+] S288c ) or Sigma mitochondria ([rho+] Sigma ).The mitochondria differ strongly in their genomes, with about 2-3 SNPs per kilobase and ten times more insertions and deletions compared to the chromosomal genome.The studied model chromosomal modifiers were gene deletions in the yeast strains S288c and Sigma [16].Previous studies already detected several mutations with lethal or slow growth phenotypes in one strain, but not in the other [16].A detailed description of the experiments can be found in Edwards et al. [9].A summary of the colony size measurements is provided in S1 Table (see supplementary).

Non-chromosomal information explains increased heritability
We analysed the effects of non-chromosomal modifiers on growth phenotypes of chromosomal variants using raw data from Edwards et al. [9], with c ij as the colony size of controlled genotype i in replicate j.We applied a variance-stabilising Box-Cox transform on the c ij values with exponent of 0.25 to obtain normalised values y ij , as this parameter choice gave the least-correlated means and variances in the analysed sets (according to Edwards et al. [9]).Next, we split the data (for each genotype i separately) into equally sized training and test sub-samples and applied the following linear models, (i) (additive), and (iii) , with Y = (y 1 , . .., y n ) t as the response vector, = ( 1 , . .., n ) t * N(0, σ 2 I n ) as the noise vector, X j as the jth predictor for j = 1, . .., p, and β = (β 1 , . .., β p ) t as the vector of parameters of interest to be estimated.Each β j , j = 1, . .., p represents the association between the variable X j and the response Y.
The simple model considered only the chromosomal mutation, whereas the additive and interaction model considered both chromosomal and non-chromosomal effects.The interaction model includes the interaction between chromosomal and non-chromosomal effects.We then calculated R 2 a values (similar to Edwards et al. [9]), for the three different models.We applied R 2 a , a modification of R 2 , because it is capable of handling the inflation of R 2 , when comparing different models.
In Fig. 1 we illustrated the fractions of phenotypic variances (y-axis) for ten single gene deletions (x-axis) with the three different models.In order to ensure the stability of the R 2 a values we repeated the procedure of splitting the data into training and test sub-samples and calculating the R 2 a 1000 times.We then plotted the average R 2 a as bar heights.The error bars show the standard deviation across the 1000 sampled test sets.Bars representing the simple model are illustrated in red, additive in orange and the interaction model is shown in yellow.Aside from the control MCM22 and the gene deletion PHO88(non-killer) experiment, it is clear that all experiments had a noticeably increase in model accuracy when including non-chromosomal and interaction effects.
Additional to the R 2 a , we computed, for each of the three linear models, the mean squared error (MSE) (see Material and Methods) to compare their performances.We repeated the procedure, as for the R 2 a , 1000 times and considered the model with the lowest MSE as the best for the given test sub-samples.In Table 1 we show, for each gene deletion experiment, the frequency of the chosen linear models according to their MSEs.Again we observed that, aside from the control MCM22 and the gene deletion experiment PHO88(non-killer), the interaction model is chosen in most cases, which emphasises the importance of the non-chromosomal information and its interaction with chromosomal mutations.
Despite these results indicating the importance of non-chromosomal information, we applied LASSO along BIC to verify our findings.

LASSO and BIC
We analysed the effects of the three predictors (X 1 , X 2 and X 1 X 2 ) using LASSO alongside BIC.We take advantage of the fact that model selection criteria extracts from a set of candidate models (in our case the simple, additive and interaction models) those that best describe a given dataset.One advantage of LASSO over simple linear models is that the regression and model selection can be applied in a single procedure.
As in the previous approach, we performed 1000 LASSO regressions with BIC model selection for each gene deletion experiment.We then studied the complexity of the BIC-selected models that best describe the given data.In Table 2 we summarise the different model sizes selected during 1000 modelling repeats.If we disregard the gene deletions MCM22 and PHO88 (non-killer), most experiments require two or three predictors to explain the data.An exception is the PHO88 gene deletion experiment, where, in 618 out of 1000 modelling repeats, one predictor is sufficient.The predictor representing the interaction between chromosomal and nonchromosomal modifiers was preferentially chosen by BIC.
Furthermore, we analysed the frequency of predictors representing chromosomal (X 1 ) and non-chromosomal effects (X 2 ) as well as their interaction (X 1 X 2 ) within the 1000 procedures (see Table 3).Excluding the cases MCM22 and PHO88(non-killer), the predictors X 1 and X 1 X 2 are selected in most cases.Regarding the gene deletion PHO88 experiment, the interaction term (X 1 X 2 ) is chosen more often as the other predictors.
As for the previous method, these results highlighted the importance of the interaction between chromosomal and non-chromosomal modifiers.However, we discovered that LASSO selected the main or interaction effects arbitrarily (not only for gene deletion experiment PHO88), while it is a general good statistical practice to include interaction effects only if the main effects are also in the model.Therefore, we applied the recently introduced LASSO for hierarchical interaction [13].

LASSO for hierarchical interactions
In order to use LASSO for hierarchical interactions, we used the R software package hierNet [17,18] that fits interaction models with the restriction that the interaction between two variables is only included if both variables are included as main effects (strong hierarchy).The analysis procedure consisted of the following steps.First, we fitted several LASSO models with different values of the regularisation parameter λ (function hierNet.path) to the data.Second, Aside from the MCM22 and PHO88(non-killer) gene deletion experiments, the predictors representing the chromosomal (X 1 ) as well as the interaction effect (X 1 X 2 ) are selected in almost all modelling processes.An exception is the PHO88 deletion experiment where only the interaction predictor (X 1 X 2 ) is most frequently selected. doi:10.1371/journal.pone.0117014.t003 we applied a cross validation (function hierNet.cv)and chose the model with the largest value of λ such that the error is within 1 standard error of the minimum.We repeated this procedure 1000 times and analysed the complexity of the resulting models as before.Interestingly, we identified that in most cases, aside from the experiments MCM22 and PHO88(non-killer), three predictors (X 1 , X 2 and X 1 X 2 ) are required to best describe the data (see S2 Table ).An exception is the gene deletion experiment PHO88, where only around half of the 1000 modelling repeats required three predictors.This recently developed LASSO approach confirmed our earlier findings that non-chromosomal information and its interaction with chromosomal mutation is important.

Discussion
Our analyses revealed that the phenotype of a chromosomal mutation may be affected by nonchromosomal elements such as mitochondria and viral state.We also showed that the introduction of non-chromosomal information and its interaction with chromosomal elements considerably enhanced the fraction of explained phenotypic variance of a trait, which is ensured by conserving the chromosomal contribution and the environment, whilst changing the nonchromosomal effects.Previous studies [19][20][21][22][23], that crossed strains carrying a dsRNA virus with virus-free strains as S288c [24], may have also been affected by non-chromosomal elements or their interaction with chromosomal ones, although we cannot prove it without repeating the experiments and analyses.
However, it is well known that the coefficient of determination, here used as metric for recovered heritability, is prone to an increase when adding more variables to the statistical model.Hence, we could not exclude that the gain in R 2 a arose only from gain in explained heritability.
Due to this fact, we applied model selection criteria BIC to investigate the importance of both chromosomal and non-chromosomal information, as well as their interaction in describing colony size data.BIC highlighted that not only chromosomal mutations, but also the interactions between chromosomal and non-chromosomal elements, such as mitochondria and dsRNA virus are important.The mitochondrial background plays a crucial role in the PHO88 gene deletion case, where cells grow faster with a Sigma mitochondria background ([kil-k] [rho+] Sigma ) compared to cells with a S288c mitochondria ([kil-k] [rho+] S288c ).With BIC model selection, the interaction term (X 1 X 2 ) is sufficient to describe the data for this gene deletion experiment in two out of three cases.
Furthermore, we examined LASSO with hierarchical interactions.This approach, unlike ordinary LASSO, prevents the inclusion of interaction effects unless the main effects are also included.We identified for most cases, that not only non-chromosomal, but also its interaction with chromosomal effects, is essential to best describe the colony size data.For the PHO88 gene deletion case, we identified that the interaction effect is chosen as important in about half the modelling repeats.
In summary, all applied statistical methods point to the fact that non-chromosomal modifiers, and the interaction effects of chromosomal and non-chromosomal elements, account for a substantial fraction of phenotypic variance of growth rates in yeast.

Data
The raw data measurements and the analysis script (R code) can be found in the supplementary section.

Regularisation models
We consider the standard multiple linear regression model with n observations and p explanatory variables (predictors) where Y = (y 1 , . .., y n ) t is the response vector, = ( 1 , . .., n ) t * N(0, σ 2 I n ) is the noise vector; for j = 1, . .., p, X j represents the jth predictor and β = (β 1 , . .., β p ) t is the vector of parameters of interest to be estimated; each β j , j = 1, . .., p represents the association between the variable X j and the response Y.Given estimates b1 ; . ..; bp , we can make predictions using the formula Coefficient of determination (R 2 ) Define TSS ¼ P n i¼1 ðy i À yÞ 2 as the total sum of squares and the residual sum of squares (RSS) The coefficient of determination R 2 or the percentage of variance explained is defined as Adjusted coefficient of determination (R 2 a ) Since RSS always decreases as more variables are added to the model, R 2 always increases as more variables are added.For a least squares model with q variables, the R 2 a statistic is calculated as Maximising R 2 a is equivalent to minimising RSS/(n − q − 1).While RSS always decreases as the number of variables increases, RSS/(n − q − 1) may increase or decrease, due to the presence of q in the denominator.Hence, the R 2 a statistic can be used for selecting among a set of models that contains different number of variables.

MSE
If we have enough observations, we can divide our data set into two parts: a training set of size n train , on which the model is fitted, and a test set of size n test for evaluation of the performance.A measure of prediction performance commonly used is the mean squared error (MSE) on the test set, and it is defined as where y i, test and ŷi;test are respectively the real and predicted values of the response Y in the test data.

LASSO
The LASSO coefficients, bL l , minimises the quantity The LASSO technique penalises the regression coefficients using an l 1 norm.It shrinks the coefficients towards zero.In addition, the l 1 penalty has the effect of forcing some of the coefficient to be exactly equal to zero when the tuning parameter λ is sufficiently large.Hence, the LASSO estimates the coefficients and performs variable selection in a single procedure.The choice of the tuning parameter λ is critical and can be performed using cross validation.

BIC
For the least squares model with q predictors, the BIC is, up to irrelevant constants, given by where ŝ2 is an estimate of the variance of .We select the model that has the lowest BIC value.

LASSO for hierarchical interactions
Bien el al. [13] proposed an interesting approach.They consider the two-way interaction model The additive part is called the main effect, while the quadratic part is called the interaction terms.The goal is to estimate β 2 R p and Y 2 R p × p , where Y = Y t and θ jj = 0 for j = 1, . .., p.This is done using an all-pairs Lasso criterion, which has the following form Minimize b 0 2R;b2R p ;Y2R pÂp 1 2 where kbk 1 ¼ P p j¼1 j b j j, kYk 1 = ∑ j 6 ¼ k jθ jk j, x i is the observed value of X i , β 0 is the intercept, and λ is a positive tuning parameter that can be estimated using cross validation.The method produces sparse interaction models that honour the hierarchy restriction that an interaction is only included in a model if one or both variables are marginally important.

Fig 1 .
Fig 1. Non-chromosomal information enhances the fraction of phenotypic variance explained.Three linear models with different complexity are applied to measure the fraction of phenotypic variance.The first model (simple) includes only the gene deletion status (red), the second model (additive) considers the gene deletion status and non-chromosomal elements (orange), and finally the third model (interaction) includes both chromosomal and nonchromosomal elements as well as their interaction (yellow).The fraction of phenotypic variance is thereby approximated by the average coefficient of determination (R 2 a ) of 1000 randomly sampled sub-sets.Aside from the control MCM22 and the gene deletion PHO88(non-killer) experiment, model accuracy increases considerably when non-chromosomal information is included and much more when the interaction is taken into account.doi:10.1371/journal.pone.0117014.g001

Table 2 .
Complexity of BIC selected statistical models.Model selection using BIC was applied in each of the 1000 modelling repeats for all gene deletion experiments.Except for the control MCM22 and the PHO88(non-killer) experiment most cases require two or three predictors in order to describe the given data best.However, in the PHO88 deletion case only one predictor is necessary in 618 of 1000 modelling cases. doi:10.1371/journal.pone.0117014.t002

Table 3 .
Frequency of BIC selected predictors representing chromosomal (X 1 ) and nonchromosomal effects (X 2 ) as well as their interaction (X 1 X 2 ) within 1000 modelling repeats.