
Addressing multicollinearity in general linear model: A novel approach for ridge parameter with performance comparison

  • Muhammad Luqman,

    Roles Data curation, Methodology, Validation, Writing – original draft

    Affiliation College of Statistical Sciences, University of the Punjab, Lahore, Pakistan

  • Sajjad Haider Bhatti ,

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Software, Supervision, Writing – review & editing

    sajjad.stat@pu.edu.pk

    Affiliation College of Statistical Sciences, University of the Punjab, Lahore, Pakistan

  • Demet Aydin,

    Roles Formal analysis, Methodology, Software, Writing – review & editing

    Affiliation Data Science and Analytics, Sinop University, Sinop, Türkiye

  • Mohsin Jamil

    Roles Formal analysis, Methodology, Resources, Visualization

    Affiliation College of Statistical Sciences, University of the Punjab, Lahore, Pakistan

Abstract

The problem of ill-conditioned data, or multicollinearity, is common in regression modelling. It results in imprecise parameter estimation, which prevents gauging the true impact of the explanatory variables on the response. Moreover, under strong multicollinearity, the standard errors of the parameter estimates are inflated, leading to wider confidence intervals and hence an increased risk of type-II error. Different approaches have been proposed in the literature to handle this problem; primarily, such techniques penalize the coefficient estimates in one way or another. Ridge regression is one of the most widely applied of these techniques. In ridge regression, a penalty term is added to the objective function of the general linear model. That penalty introduces a small amount of bias in the parameter estimates with the objective of decreasing the mean square error. In the current article, some new choices for the ridge constant are proposed. The performance of the proposed ridge choices is compared with existing ones through Monte Carlo simulations under different scenarios, using mean square error as the measure of performance. The simulation results indicate that the proposed ridge estimator performs better than existing ridge constants in most cases, accounting for the severity of multicollinearity, the number of explanatory variables, the sample size and the error variance structure. The simulation results were further corroborated by comparing the performance of the proposed ridge penalties in two real-life applications.

1. Introduction

In multiple regression, it is difficult to interpret the coefficient estimates if the explanatory variables are highly correlated with each other. Moreover, strong multicollinearity leads to inflated standard errors of the parameter estimates, which in turn result in wider confidence intervals and an increased risk of type-II error [1–3].

Consider the multiple linear regression model in matrix form,

(1) $Y = X\beta + \varepsilon$

where Y is the vector of the response variable, X is the matrix containing the explanatory variables and a constant column of ones, $\beta$ is the vector of parameters with an intercept and slope coefficients, and $\varepsilon$ is the vector of random errors.

In regression analysis, a fundamental assumption is the absence of strong linear dependence among the explanatory variables. Termed no-multicollinearity, this assumption means that no pair or group of independent variables has a strong linear relationship [2]. Under the common assumptions [3,4] of the general linear model, including the absence of strong linear dependence (multicollinearity) among explanatory variables, the parameter vector can be estimated through the ordinary least squares approach as,

(2) $\hat{\beta}_{OLS} = (X'X)^{-1}X'Y$

OR, equivalently, as the minimizer of the residual sum of squares,

(3) $\hat{\beta}_{OLS} = \arg\min_{\beta}\,(Y - X\beta)'(Y - X\beta)$

However, in many practical situations the assumption of no-multicollinearity is not valid and the explanatory variables have strong correlations with each other. In such situations, it is challenging to accurately estimate the model parameters using the least squares method given in Equation (2).

To deal with the problem of strong multicollinearity or ill-conditioned data, Hoerl and Kennard [5] pioneered the idea of ridge regression (RR) as an alternative way to estimate the parameters of a regression model. To reduce the adverse effects of multicollinearity, RR penalizes large coefficients through a regularization constant, which improves the stability of the estimates. Simply put, a small amount of bias is introduced that results in a lower variance of the parameter estimates. The RR estimates can be obtained as,

(4) $\hat{\beta}_{RR} = (X'X + kI)^{-1}X'Y$

OR

(5) $\hat{\beta}_{RR} = (X'X + kI)^{-1}X'X\,\hat{\beta}_{OLS}$

where $k > 0$ is the ridge parameter and I is the identity matrix of order p, the number of regressors. It is common practice in penalized regression approaches to remove the constant term from the X matrix and then estimate it after estimating the slope coefficients. With $k = 0$, RR reduces to common OLS estimation. It is the ridge constant k which controls the amount of bias in the coefficient estimates, and it is generally chosen with the objective of achieving a lower mean square error (MSE) of the elements of the estimated parameter vector. The performance of any RR estimator mainly depends on the choice of the ridge penalty. Since the emergence of the idea, many approaches have been proposed in the literature for choosing the ridge parameter [1,5–19]. Each of these choices offers a different approach to determining the optimal value of the ridge parameter k and has been shown to work better in different scenarios encountered in practical applications.
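As a rough numerical illustration of the closed-form RR estimate and its shrinkage effect, the sketch below contrasts OLS with a ridge fit on synthetic, deliberately collinear data; the variable names and the value k = 0.1 are arbitrary choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # near-duplicate regressor -> ill-conditioned X'X
X = np.column_stack([x1, x2])
y = X @ np.array([1.0, 1.0]) + rng.normal(scale=0.5, size=n)

# OLS: solve X'X b = X'y (the k = 0 case)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge: solve (X'X + kI) b = X'y for a small positive k
k = 0.1
beta_ridge = np.linalg.solve(X.T @ X + k * np.eye(2), X.T @ y)
```

Because the ridge system adds k to every eigenvalue of X'X, the ridge solution always has a smaller Euclidean norm than the OLS solution, which is the stabilizing effect described above.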

Later, this idea led to other classes of biased estimation methods such as the Liu, lasso and elastic-net approaches [9,16–18,20,21]. These estimation strategies penalize the coefficients in ways different from RR.

Although regularization techniques have been extensively explored in the literature, we believe there is ample space for more refined, data-driven penalization strategies that incorporate the severity of the problem, i.e., multicollinearity, especially in applied settings where the usual assumptions may not always be valid. Motivated by the objective of proposing ridge estimators tied to the underlying level of multicollinearity in the data at hand, in the current article we propose two new penalty terms for ridge regression and compare their performance with some existing historical and recent choices of the ridge parameter.

The rest of the article is structured as follows: Section 2 provides a brief overview of the existing ridge penalties used for comparison and of the new choices of ridge penalty proposed in this study; it also explains the simulation scheme. Section 3 presents the results from Monte Carlo simulations obtained under different scenarios. Section 4 provides the results obtained by comparing the existing and proposed approaches on two real-life datasets. Finally, Section 5 concludes the article.

2. Methodology

Consider the general linear model given in Equation (1) as,

(6)(7)

The general linear model given in Equation (1) can be represented in canonical form as,

(8) $y = Z\alpha + \varepsilon$

where $Z = X^{*}E$, $X^{*}$ is the matrix of standardized explanatory variables, E is an orthogonal matrix such that $E'E = I_p$ and $Z'Z = E'X^{*\prime}X^{*}E = \Lambda$, where $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_p)$ is the diagonal matrix consisting of the eigenvalues of $X^{*\prime}X^{*}$, $\alpha = E'\beta$, and y is the response variable centered to its mean. Then the OLS and RR estimators of $\alpha$ in canonical form are given as;

(9) $\hat{\alpha} = \Lambda^{-1}Z'y$ (10) $\hat{\alpha}_{RR} = (\Lambda + kI_p)^{-1}Z'y$

where k is the ridge parameter. Once the canonical estimates $\hat{\alpha}_{RR}$ are computed, the standardized coefficients and the parameter estimates in the original units can be obtained as:

(11) $\hat{\beta}_s = E\hat{\alpha}_{RR}$ (12) $\hat{\beta}_j = \hat{\beta}_{s,j}/s_{x_j},\ j = 1,\dots,p$

and

(13) $\hat{\beta}_0 = \bar{y} - \sum_{j=1}^{p}\hat{\beta}_j\bar{x}_j$

The mean square errors of the $\hat{\alpha}$ and $\hat{\alpha}_{RR}$ estimators can be computed respectively as,

(14) $MSE(\hat{\alpha}) = \sigma^2\sum_{i=1}^{p}\frac{1}{\lambda_i}$ (15) $MSE(\hat{\alpha}_{RR}) = \sigma^2\sum_{i=1}^{p}\frac{\lambda_i}{(\lambda_i + k)^2} + k^2\sum_{i=1}^{p}\frac{\alpha_i^2}{(\lambda_i + k)^2}$

where

(16) $\hat{\sigma}^2 = \frac{(y - Z\hat{\alpha})'(y - Z\hat{\alpha})}{n - p}$
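The canonical-form quantities are straightforward to verify numerically. The sketch below uses synthetic data; the degrees-of-freedom choice for the residual variance is one common convention and not necessarily the paper's exact one:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
X[:, 2] = X[:, 0] + 0.05 * rng.normal(size=n)   # induce strong collinearity
y = X @ np.ones(p) + rng.normal(size=n)

# Standardize X and center y, as the canonical form assumes
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
yc = y - y.mean()

lam, E = np.linalg.eigh(Xs.T @ Xs)   # eigenvalues (Lambda) and orthogonal matrix E
Z = Xs @ E                           # canonical regressors: Z'Z = diag(lam)

alpha_ols = (Z.T @ yc) / lam                 # OLS estimate of alpha
k = 0.5
alpha_ridge = (Z.T @ yc) / (lam + k)         # ridge estimate of alpha
beta_std = E @ alpha_ridge                   # back-transform to standardized coefficients

resid = yc - Z @ alpha_ols
s2 = resid @ resid / (n - p - 1)             # residual variance (one df lost to centering)
mse_alpha_ols = s2 * np.sum(1.0 / lam)       # plug-in total variance of the OLS alpha-hat
```

Since Z'Z is diagonal, both estimators reduce to elementwise divisions, which is what makes the canonical form convenient for defining ridge parameters.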

2.1 Existing ridge penalties

As already discussed, ridge regression was pioneered by Hoerl and Kennard [5] as an alternative to ordinary least squares (OLS) for dealing with ill-conditioned data. After their pioneering work, numerous ridge parameters focused on dealing with multicollinear data have been suggested in the literature. The ridge choices used for comparison in the current study are briefly described in the following sub-sections.

2.1.1 Hoerl and Kennard (1970).

In their initial work, Hoerl and Kennard [5] proposed the following ridge constant to be used in ridge regression,

(17) $\hat{k}_{HK} = \frac{\hat{\sigma}^2}{\hat{\alpha}_{\max}^2}$

Later on, several researchers proposed estimators that incorporated various forms of the Hoerl and Kennard [5] estimator.

2.1.2 Hoerl, Kennard and Baldwin (1975).

In another work, Hoerl et al. [22] recommended a different ridge parameter, based on the mean of the squared estimates from the canonical form instead of just their maximum. Their proposed ridge penalty is given as,

(18) $\hat{k}_{HKB} = \frac{p\hat{\sigma}^2}{\sum_{i=1}^{p}\hat{\alpha}_i^2}$

2.1.3 Lawless and Wang (1976).

Lawless and Wang [12] suggested an estimator obtained by replacing the denominator of the Hoerl et al. [22] estimator with a weighted sum of the squared canonical estimates, taking the corresponding eigenvalues as weights. Their proposed estimator is given as,

(19) $\hat{k}_{LW} = \frac{p\hat{\sigma}^2}{\sum_{i=1}^{p}\lambda_i\hat{\alpha}_i^2}$

2.1.4 Hocking, Speed and Lynn (1976).

An improved version of the estimator suggested by Lawless and Wang [12] was proposed by Hocking et al. [23] as,

(20) $\hat{k}_{HSL} = \hat{\sigma}^2\,\frac{\sum_{i=1}^{p}(\lambda_i\hat{\alpha}_i)^2}{\left(\sum_{i=1}^{p}\lambda_i\hat{\alpha}_i^2\right)^2}$

2.1.5 Kibria (2003).

Kibria [24] proposed three estimators based on the arithmetic mean, geometric mean and median of the terms $\hat{\sigma}^2/\hat{\alpha}_i^2$ (instead of taking the maximum, as in Hoerl and Kennard [5]). The three proposed ridge estimators are defined as,

(21) $\hat{k}_{AM} = \frac{1}{p}\sum_{i=1}^{p}\frac{\hat{\sigma}^2}{\hat{\alpha}_i^2}$ (22) $\hat{k}_{GM} = \frac{\hat{\sigma}^2}{\left(\prod_{i=1}^{p}\hat{\alpha}_i^2\right)^{1/p}}$ (23) $\hat{k}_{MED} = \mathrm{median}\left(\frac{\hat{\sigma}^2}{\hat{\alpha}_i^2}\right)$
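For concreteness, the following sketch evaluates the textbook forms of several of the ridge constants above (Hoerl-Kennard, Hoerl-Kennard-Baldwin, Lawless-Wang and Kibria's three means) on hypothetical canonical estimates; the numbers are illustrative and the exact notation may differ from the paper's:

```python
import numpy as np

# Hypothetical canonical estimates, eigenvalues and residual variance
alpha_hat = np.array([0.8, -0.3, 0.1])
lam = np.array([250.0, 40.0, 0.5])   # ill-conditioned: large eigenvalue spread
sigma2 = 0.6
p = alpha_hat.size

k_HK = sigma2 / np.max(alpha_hat**2)            # Hoerl-Kennard: sigma^2 over the largest alpha^2
k_HKB = p * sigma2 / np.sum(alpha_hat**2)       # Hoerl-Kennard-Baldwin
k_LW = p * sigma2 / np.sum(lam * alpha_hat**2)  # Lawless-Wang: eigenvalue-weighted denominator

ratios = sigma2 / alpha_hat**2                  # building block for Kibria (2003)
k_AM = ratios.mean()                            # arithmetic mean
k_GM = np.exp(np.log(ratios).mean())            # geometric mean
k_MED = float(np.median(ratios))                # median
```

The arithmetic-geometric mean inequality guarantees k_GM <= k_AM, and the eigenvalue weighting makes k_LW much smaller here than k_HKB, since the dominant eigenvalue inflates the denominator.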

2.1.6 Khalaf and Shukur (2005).

Khalaf and Shukur [25] proposed a ridge estimator based on the estimate of the error variance, the maximum eigenvalue and the regression coefficients in canonical form, as follows:

(24) $\hat{k}_{KS} = \frac{\lambda_{\max}\hat{\sigma}^2}{(n - p)\hat{\sigma}^2 + \lambda_{\max}\hat{\alpha}_{\max}^2}$

2.1.7 Khalaf, Mansson and Shukur (2013).

Khalaf et al. [15] proposed modifications of different ridge estimators obtained by multiplying them by certain weights. In the present study, we have taken one ridge estimator from Khalaf et al. [15] for comparison, which is given as,

(25)

2.1.8 Karaibrahimoğlu, Asar and Genç (2016).

Karaibrahimoğlu et al. [26] suggested a new choice for ridge parameter by adjusting the form proposed by Dorugade [27]. Their suggested ridge parameter is given as,

(26)

2.1.9 Shabbir, Chand and Iqbal (2024).

Recently, Shabbir et al. [28] proposed a ridge parameter which was found to be superior to several other ridge parameters in simulation studies. The mentioned estimator is defined as,

(27)

2.2 Proposed ridge estimators

The core motivation behind our proposed ridge parameters is that they are data-driven: they are functions of the condition number, which is considered a meaningful measure of the severity of multicollinearity. Thus, the amount of shrinkage applied by our proposed ridge constants is related to the underlying level of multicollinearity in the data at hand. While many choices of ridge parameter exist in the empirical literature, most of them amount to a globally fixed penalty term; ridge estimators that take the level of multicollinearity among the explanatory variables into account remain relatively scarce. Furthermore, the proposed ridge estimators also take the sample size and the number of explanatory variables into account, which makes our proposal more meaningful. We therefore believe that our approach is a more problem-sensitive form of regularization, providing new insights into adapting the regularization strength to different data environments.

In the current study, we refined the ridge estimator introduced by Karaibrahimoğlu et al. [26] by exponentiating it with the ratio of the two extreme eigenvalues raised to certain powers, and multiplying it by the sample size n. Specifically, the two proposed ridge estimators take the following forms,

(28)(29)

with all quantities as defined in the previous sections.

2.3 Simulation scheme

The performance of the proposed estimators, along with some existing ridge estimators, is assessed through Monte Carlo simulations under different scenarios. We compared the performance of the different ridge estimators for different levels of multicollinearity, numbers of explanatory variables, error variances and sample sizes.

The simulation scheme is as follows:

  1. Generate the explanatory variables from a multivariate normal distribution with zero mean vector and a covariance matrix that ensures the required level of correlation. For this we used the mvtnorm [29] add-on package in R [30].
    Repeat steps 2 to 4 R times, where R is the number of Monte Carlo replications.
  2. Generate the random error term from a normal distribution with mean 0 and the chosen variance.
  3. Compute the values of the response variable as: (30) $y_i = \beta_0 + \sum_{j=1}^{p}\beta_j x_{ij} + \varepsilon_i$
    For this, the intercept is fixed while the slope coefficients are chosen to satisfy a common normalization constraint.
  4. Estimate the parameters using the OLS and RR estimators given in Equations (9) and (10).
  5. Finally, the estimated mean square error is computed for each choice of ridge parameter as follows:
(31) $EMSE(\hat{\alpha}_k) = \frac{1}{R}\sum_{r=1}^{R}(\hat{\alpha}_{k,r} - \alpha)'(\hat{\alpha}_{k,r} - \alpha)$

The estimated mean square error (EMSE) is a commonly used performance indicator in studies focusing on the comparison of estimation strategies [1,6,26,31–34].
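The simulation scheme above can be condensed into a short script. The sketch below is written in Python rather than the R used in the paper; the settings n = 50, p = 4, rho = 0.95 and the HKB-type ridge choice are illustrative, not the paper's exact grid:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, rho, sigma, R = 50, 4, 0.95, 1.0, 500

# Step 1: equicorrelated covariance matrix for the regressors
Sigma = rho * np.ones((p, p)) + (1 - rho) * np.eye(p)
L = np.linalg.cholesky(Sigma)
beta = np.ones(p) / np.sqrt(p)        # slopes normalized so their squares sum to 1

sse_ols = sse_ridge = 0.0
for _ in range(R):
    X = rng.normal(size=(n, p)) @ L.T                    # step 1: collinear regressors
    y = X @ beta + rng.normal(scale=sigma, size=n)       # steps 2-3: errors and response
    XtX, Xty = X.T @ X, X.T @ y
    b_ols = np.linalg.solve(XtX, Xty)                    # step 4: OLS
    s2 = np.sum((y - X @ b_ols) ** 2) / (n - p)
    k = p * s2 / np.sum(b_ols**2)                        # an HKB-type ridge constant
    b_ridge = np.linalg.solve(XtX + k * np.eye(p), Xty)  # step 4: ridge
    sse_ols += np.sum((b_ols - beta) ** 2)               # step 5: squared estimation error
    sse_ridge += np.sum((b_ridge - beta) ** 2)

emse_ols, emse_ridge = sse_ols / R, sse_ridge / R        # averaged over replications
```

With strong collinearity (rho = 0.95), the ridge EMSE should come out well below the OLS EMSE, which is the qualitative pattern the simulation tables report.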

3. Results and discussion

The results from the Monte Carlo simulations for various combinations of the level of multicollinearity, the number of explanatory variables and the error variance, for different sample sizes, are presented in this section.

Table 1 compares the different ridge estimators in terms of EMSE with varying levels of multicollinearity and different sample sizes. The results indicate that our proposed ridge penalty performs better, as it has a lower EMSE than the other ridge penalties. The proposed ridge estimator has the lowest EMSE for all sample sizes and all levels of multicollinearity.

Similarly, the results in Table 2 also show the superiority of the proposed estimator, as it has a lower EMSE in all cases irrespective of the level of multicollinearity and the sample size. The results in Table 3 likewise indicate that the proposed ridge estimator consistently performs better than the existing ridge choices, as it has a substantially lower EMSE in all of the cases considered.

From the results given in Tables 4–6, it is evident that the proposed ridge estimator performs better than all competing ridge estimators, except in some cases with sample size 30. In fact, as the level of multicollinearity becomes more severe, the superiority of the proposed estimator becomes more pronounced. In the exceptional cases, the estimator based on the median proposed by Kibria [24] has lower EMSE values.

It is worth mentioning that in the cases where the median-based estimator of Kibria [24] performs best among all ridge estimators, our proposed estimator is the second-best choice. Similarly, in all those cases where the newly proposed ridge estimator is found to be better than its competitors in terms of EMSE, Kibria's median-based estimator emerges as the second-best choice.

From all these results, we can infer some general findings as well. For instance, all ridge estimators perform better than OLS estimation of the general linear model in the presence of multicollinearity. Moreover, the EMSE decreases with increasing sample size. The EMSE gets larger for stronger levels of multicollinearity, reflecting its adverse effect. The results given in Tables 1–6 are graphically represented in Figs 1–6.

Fig 1. Comparison of EMSE of ridge estimators for for different values of .

https://doi.org/10.1371/journal.pone.0335072.g001

Fig 2. Comparison of EMSE of ridge estimators for for different values of .

https://doi.org/10.1371/journal.pone.0335072.g002

Fig 3. Comparison of EMSE of ridge estimators for for different values of .

https://doi.org/10.1371/journal.pone.0335072.g003

Fig 4. Comparison of EMSE of ridge estimators for for different values of .

https://doi.org/10.1371/journal.pone.0335072.g004

Fig 5. Comparison of EMSE of ridge estimators for for different values of .

https://doi.org/10.1371/journal.pone.0335072.g005

Fig 6. Comparison of EMSE of ridge estimators for for different values of .

https://doi.org/10.1371/journal.pone.0335072.g006

4. Real data applications

In addition to the simulation study, the performance of the proposed and existing ridge estimators has also been evaluated on two real-life multicollinear datasets. Owing to the absence of repeated sampling in real-life applications, the mean square error is estimated using the empirical formulas given in Equations (14) and (15). We have also compared the amount of shrinkage achieved by the different ridge parameters relative to OLS estimation. The comparison of the amount of overall shrinkage among the different ridge estimators is assessed through absolute shrinkage (AS) and relative shrinkage (RS), which are computed as:

(32)(33)

The performance of different ridge estimators is also evaluated in terms of the coverage percentage of prediction intervals. The coverage percentage of prediction intervals (CPPI) is computed as:

(34)

where

(35)(36)

where the estimates of the response and of the error variance are based on a particular ridge parameter, the level of significance is common to all intervals, while n and p have their usual definitions. The description of the data and the results obtained by employing the different ridge estimators are given in the following sub-sections.

4.1 Bodyfat data

The first real-life application is based on the body fat data [35]. The data consist of features that can be used to build a predictive model for body fat percentage derived from Siri's equation [36]. The same data have been used in several other studies [37–40] and are available in add-on packages such as gRbase [41] and UsingR [42] in R [30]. We have taken the data for individuals aged 32–40 years. The reason for selecting this age group is that it exhibits the most severe multicollinearity among the explanatory variables, as measured by the condition number of their correlation matrix. The dataset consists of 50 observations with a response Y (body fat percentage) and 13 explanatory variables: X1 (chest circumference in cm), X2 (abdomen circumference in cm), X3 (hip circumference in cm), X4 (thigh circumference in cm), X5 (knee circumference in cm), X6 (ankle circumference in cm), X7 (biceps circumference in cm), X8 (forearm circumference in cm), X9 (neck circumference in cm), X10 (wrist circumference in cm), X11 (weight in lbs), X12 (height in inches) and X13 (density determined from underwater weighing). The correlation matrix (Table 7), condition number (18.33), VIFs (15.03, 28.84, 30.41, 15.81, 8.65, 1.74, 7.31, 2.67, 7.63, 5.25, 96.07, 4.72, 4.56) and the Farrar and Glauber [43] statistic (933.26) clearly indicate that strong multicollinearity is present in the data.
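The multicollinearity diagnostics quoted above (VIFs, condition number) can be reproduced generically as in the sketch below, on synthetic data rather than the body fat measurements; the square-root eigenvalue-ratio form of the condition number is one common definition:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)              # unrelated regressor
X = np.column_stack([x1, x2, x3])

Rmat = np.corrcoef(X, rowvar=False)              # correlation matrix of the regressors
vif = np.diag(np.linalg.inv(Rmat))               # VIF_j = j-th diagonal entry of R^{-1}
eigs = np.linalg.eigvalsh(Rmat)
cond_number = np.sqrt(eigs.max() / eigs.min())   # condition number of the correlation matrix
```

Here the first two VIFs come out large while the third stays near 1, and the condition number lies far above the usual rule-of-thumb thresholds, matching the kind of evidence reported for both datasets.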

The results obtained by applying all competing ridge penalties are given in Table 8. They show the better performance of the proposed ridge parameter, which has a lower EMSE value (0.010) than all competing choices of ridge parameter. The results also reveal that the ridge estimator given by Khalaf et al. [15] is the second-best choice in terms of EMSE. The proposed ridge estimator achieves the maximum amount of shrinkage, as indicated by the absolute and relative shrinkage measures. The coverage percentage of prediction intervals is similar for all estimators, ranging from 94% to 96%.

Table 8. Estimated MSE and parameter estimates for body fat data.

https://doi.org/10.1371/journal.pone.0335072.t008

4.2 Livestock data

The second real-data comparison is performed using the livestock data [32] from the Economic Survey of Pakistan, a yearly report covering all sectors of the economy including agriculture, industry and livestock. The livestock data consist of 18 observations and five explanatory variables for modelling the response variable Y (production of hair, measured in tons). The explanatory variables (taken in millions) are X1 (buffalos), X2 (cattle), X3 (goats), X4 (sheep) and X5 (poultry). Strong multicollinearity exists in the data, as indicated by the correlation matrix among the explanatory variables (Table 9), the VIF values (17605.65, 9262.50, 4150.09, 3934.14, 721.66), the condition number (359.97) and the value of the Farrar-Glauber statistic (356.26). The same data have been used by Dar et al. [1] for comparison among different ridge penalties.

The parameter estimates (in standardized form) and the EMSE values obtained by applying all the ridge constants being compared are given in Table 10. From these results, the better performance of the proposed ridge estimator is evident, as it has the minimum EMSE (0.0019) among all existing ridge choices. Moreover, the other proposed ridge estimator is the second-best choice, with the least EMSE after it. Both proposed ridge parameters also achieved a high amount of shrinkage in terms of absolute and relative shrinkage for the livestock data. The comparison based on the coverage percentage of prediction intervals likewise indicates the superiority of the proposed ridge estimator, as it shows 100% coverage.

Table 10. Estimated MSE and parameter estimates for livestock data.

https://doi.org/10.1371/journal.pone.0335072.t010

5. Conclusions

To tackle the problem of multicollinearity in the general linear model, the present study suggests two new ridge penalties for use in ridge regression. The proposed ridge estimators are compared with some historical as well as recent ridge penalties. Based on EMSE as the performance metric, the Monte Carlo simulations indicated the superiority of the proposed strategy for different sample sizes under various scenarios (accounting for the level of multicollinearity, the number of explanatory variables and the error variance) in most cases. The proposed choices for the ridge constant become increasingly superior to their competitors as the severity of multicollinearity among the regressors increases. The simulation results were further supported by two real-life applications, where the proposed estimators also showed better performance. The results from the real-data applications indicate that the proposed ridge estimators perform much better than existing choices in cases of moderate as well as strong multicollinearity. Based on our results, we recommend the use of the proposed ridge estimators to tackle the problem of multicollinearity or ill-conditioned data through the ridge regression approach. These findings can help practitioners handle the adverse effects of strong multicollinearity more effectively.

References

  1. Dar IS, Chand S, Shabbir M, Kibria BMG. Condition-index based new ridge regression estimator for linear regression model with multicollinearity. Kuwait J Sci. 2023;50(2):91–6.
  2. Lavery MR, Acharya P, Sivo SA, Xu L. Number of predictors and multicollinearity: what are their effects on error and bias in regression? Commun Stat - Simul Comput. 2017;48(1):27–38.
  3. Gujarati DN, Porter DC. Basic econometrics. McGraw-Hill Irwin; 2009.
  4. Montgomery DC, Peck EA, Vining GG. Introduction to linear regression analysis. 5th ed. John Wiley & Sons; 2013.
  5. Hoerl AE, Kennard RW. Ridge regression: biased estimation for nonorthogonal problems. Technometrics. 1970;12(1):55–67.
  6. Kibria BMG, Lukman AF. A new ridge-type estimator for the linear regression model: simulations and applications. Scientifica (Cairo). 2020;2020:9758378. pmid:32399315
  7. Alkhamisi M, Khalaf G, Shukur G. Some modifications for choosing ridge parameters. Commun Stat - Theory Methods. 2006;35(11):2005–20.
  8. De Mol C, De Vito E, Rosasco L. Elastic-net regularization in learning theory. J Complex. 2009;25(2):201–30.
  9. Zou H, Hastie T. Regularization and variable selection via the elastic net. J R Stat Soc B. 2005;67(2):301–20.
  10. Dawoud I, Abonazel MR. Robust Dawoud–Kibria estimator for handling multicollinearity and outliers in the linear regression model. J Stat Comput Simul. 2021;91:3678–92.
  11. Dawoud I, Kibria BMG. A new biased estimator to combat the multicollinearity of the Gaussian linear regression model. Stats. 2020;3(4):526–41.
  12. Lawless J, Wang P. A simulation study of ridge and other regression estimators. Commun Stat - Theory Methods. 1976;5(4):307–23.
  13. Hoerl AE, Kannard RW, Baldwin KF. Ridge regression: some simulations. Commun Stat. 1975;4(2):105–23.
  14. Muniz G, Kibria BMG. On some ridge regression estimators: an empirical comparisons. Commun Stat - Simul Comput. 2009;38(3):621–30.
  15. Khalaf G, Månsson K, Shukur G. Modified ridge regression estimators. Commun Stat - Theory Methods. 2013;42(8):1476–87.
  16. Chand S, Ahmad S, Batool M. Solution path efficiency and oracle variable selection by Lasso-type methods. Chemom Intell Lab Syst. 2018;183:140–6.
  17. Zou H, Zhang HH. On the adaptive elastic-net with a diverging number of parameters. Ann Stat. 2009;37(4):1733–51. pmid:20445770
  18. Kejian L. A new class of biased estimate in linear regression. Commun Stat - Theory Methods. 1993;22(2):393–402.
  19. Muniz G, Kibria BMG, Mansson K, Shukur G. On developing ridge regression parameters: a graphical investigation. Stat Oper Res Trans. 2012;36:115–38.
  20. Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc B. 1996;58(1):267–88.
  21. Dawoud I, Abonazel MR, Awwad FA. Modified Liu estimator to address the multicollinearity problem in regression models: a new biased estimation class. Sci Afr. 2022;17:e01372.
  22. Hoerl AE, Kannard RW, Baldwin KF. Ridge regression: some simulations. Commun Stat - Theory Methods. 1975;4:105–23.
  23. Hocking RR, Speed FM, Lynn MJ. A class of biased estimators in linear regression. Technometrics. 1976;18(4):425–37.
  24. Kibria BMG. Performance of some new ridge regression estimators. Commun Stat - Simul Comput. 2003;32(2):419–35.
  25. Khalaf G, Shukur G. Choosing ridge parameter for regression problems. Commun Stat - Theory Methods. 2005;34(5):1177–82.
  26. Karaibrahimoğlu A, Asar Y, Genç A. Some new modifications of Kibria's and Dorugade's methods: an application to Turkish GDP data. J Assoc Arab Univ Basic Appl Sci. 2016;20(1):89–99.
  27. Dorugade AV. New ridge parameters for ridge regression. J Assoc Arab Univ Basic Appl Sci. 2014;15(1):94–9.
  28. Shabbir M, Chand S, Iqbal F. Bagging-based ridge estimators for a linear regression model with non-normal and heteroscedastic errors. Commun Stat - Simul Comput. 2024;53:5442–52.
  29. Genz A, Bretz F, Miwa T, Mi X, Leisch F, Scheipl F. mvtnorm: multivariate normal and t distributions. R package; 2021.
  30. R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2022.
  31. Shaheen N, Shah I, Almohaimeed A, Ali S, Alqifari HN. Some modified ridge estimators for handling the multicollinearity problem. Mathematics. 2023;11(11):2522.
  32. Yasin S, Kamal S, Suhail M. Performance of some new ridge parameters in two-parameter ridge regression model. Iran J Sci Technol Trans A Sci. 2021;45:327–41.
  33. Suhail M, Chand S, Aslam M. New quantile based ridge M-estimator for linear regression models with multicollinearity and outliers. Commun Stat - Simul Comput. 2021;52(4):1417–34.
  34. Suhail M, Chand S, Kibria B. Quantile based estimation of biasing parameters in ridge regression model. Commun Stat - Simul Comput. 2020;49:2732–44.
  35. Johnson RW. Fitting percentage of body fat to simple body measurements. J Stat Educ. 1996;4(1).
  36. Siri WE. The gross composition of the body. Adv Biol Med Phys. 1956;4:239–80. pmid:13354513
  37. Hussain SA, Cavus N, Sekeroglu B. Hybrid machine learning model for body fat percentage prediction based on support vector regression and emotional artificial neural networks. Appl Sci. 2021;11:9797.
  38. Dar IS, Chand S, Shabbir M. An improved ridge-type estimator leveraging weighted least squares and Horn's scaling for heteroscedastic regression. Commun Stat - Theory Methods. 2025:1–20.
  39. Chiong R, Fan Z, Hu Z, Chiong F. Using an improved relative error support vector machine for body fat prediction. Comput Methods Programs Biomed. 2021;198:105749. pmid:33080491
  40. Wang X, Chang S. Enhancing body fat prediction with WGAN-GP data augmentation and XGBoost algorithm. Sci Prog. 2025;108(3):368504251366850. pmid:40770941
  41. Dethlefsen C, Højsgaard S. A common platform for graphical models in R: the gRbase package. J Stat Softw. 2005;14:1–12.
  42. Verzani J. UsingR: data sets, etc. for the text "Using R for Introductory Statistics". R package; 2015.
  43. Farrar DE, Glauber RR. Multicollinearity in regression analysis: the problem revisited. Rev Econ Stat. 1967;49:92–107.