Reader Comments
A methodological flaw and two errors
Posted by LeNormand on 10 Sep 2012 at 12:14 GMT
Your paper has a serious flaw that results in two significant sources of error.
While linear regression is a fine way to show a linear correlation, it rests on a few basic assumptions, and if those assumptions happen not to be met, the whole analysis is invalidated (which is unfortunately what happens here).
Those assumptions are:
• Normality of the errors.
• Independence of the error distribution from the prediction.
• Linearity of the relationships.
While your paper doesn’t give a lot of data to judge the third, the graph you attached (Figure 1) is enough to reject your linear regression on the basis of the first two:
• The errors are very obviously not normal, as they show much less variation on the “right” than on the “left”, with a worst left-hand deviation of -0.8 sigma. (The likelihood of having 67 points and none below -1 sigma is less than 0.001%.)
• The errors are not independent of the prediction: between -10% and +10% of your indicator (belief… - belief …) the standard deviation is approximately 0.3 sigma (that is, 0.3 times the deviation of the whole sample, and I would judge about one fifth of the variation of the subset with an indicator of more than +10%). This issue is called heteroscedasticity, and it is a classical reason for invalidating the result of any sort of regression, linear or not, with assumed Gaussian errors or not.
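The "less than 0.001%" figure in the first point can be checked with a short calculation. This is only a sketch: it assumes the 67 residuals are independent draws from a standard normal distribution (in units of sigma), and the function names are made up for the illustration.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_none_below(threshold_sigma: float, n_points: int) -> float:
    """P(no residual falls below the threshold) for n independent N(0, 1) residuals."""
    return (1.0 - normal_cdf(threshold_sigma)) ** n_points


# 67 points and a -1 sigma threshold are the values quoted in the comment.
p = prob_none_below(-1.0, 67)
print(f"probability = {p:.2e} ({p * 100:.5f}%)")
```

Under those assumptions the probability comes out just under 1e-5, that is, just under 0.001%, which matches the order of magnitude quoted above.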
Each of those two errors is enough, in both quality and quantity, to completely invalidate the results: they are large enough to be spotted by eye on a single graph covering a very small subset of the data in your analysis (and the part of the linear regression that involves the Z-spread may well be just as flawed).
From a qualitative point of view, you seem to be comparing apples and pears, namely two subsets: one comprising mostly those with a score of less than 10%, and one comprising most of the rest (and presumably a few of the smaller scores).
The first sample has a very low variation in your predicted Z-score while the other one has a huge one.
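The kind of check being described, comparing the residual spread of the two subsets, can be sketched as follows. The data here are synthetic and are not the paper's: the slope, the noise levels (0.3 and 1.5) and the sample size are arbitrary assumptions chosen only to mimic the pattern described, with the split placed at +10.

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is reproducible

# Synthetic data only (NOT the paper's data): the noise is five times larger
# when the indicator x exceeds +10.
xs = [random.uniform(-10.0, 40.0) for _ in range(400)]
ys = [0.02 * x + random.gauss(0.0, 0.3 if x <= 10.0 else 1.5) for x in xs]

# Ordinary least squares fit of y = a + b * x.
mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
b = sxy / sxx
a = mean_y - b * mean_x
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Compare the residual standard deviation in the two subsets.
low = [r for x, r in zip(xs, residuals) if x <= 10.0]
high = [r for x, r in zip(xs, residuals) if x > 10.0]
sd_low, sd_high = statistics.stdev(low), statistics.stdev(high)
ratio = sd_high / sd_low
print(f"sd (x <= 10): {sd_low:.2f}, sd (x > 10): {sd_high:.2f}, ratio: {ratio:.1f}")
```

A ratio far from 1 is the signature of heteroscedasticity. A formal test (Breusch-Pagan or White, for instance) would be the next step on real data.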
It might well be that this constitutes a result (admittedly a very small one): that you can discriminate between #### types of countries (it is your job to work out what ### stands for, not mine) by a criterion such as the one you have shown here.
It might even well be that said ### is a good predictor of your indicator and that the result you think you have found (although the statistical analysis is invalid and you are jumping very fast from correlation to causality) is a cause of it.
Regards,
(hope the tone of this message is more academic)
RE: A methodological flaw and two errors
azimshariff replied to LeNormand on 10 Sep 2012 at 18:41 GMT
The following is a response to a previous comment by the above commenter that had been removed by PLOS ONE staff. We will update with a revised comment in due time.
Non-normality of residuals can, indeed, affect the outcome of a regression analysis. In particular, when the assumption of normally distributed residuals is not met, the estimated standard errors may be too small, leading to a higher-than-nominal rate of Type-I error (that is, a p-value less than .05 may occur in more than 5% of samples taken from the null hypothesis population, where no effect exists and residuals are non-normally distributed).
To examine whether non-normality affected our results, we re-ran the analyses using “robust” standard errors. (Specifically, we used the “MLR” estimator in Mplus, which uses the Satorra-Bentler formula to correct standard errors based on the degree of non-normality in the data; Satorra & Bentler, 1994). The resulting pattern of significant findings was exactly the same as those we reported in the original paper. Thus, we can safely conclude that non-normality, which can cause errors of interpretation in regression, did not do so here.
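To illustrate the general idea of a robust standard error, here is a minimal sketch. It is not the Mplus "MLR" estimator described above: it swaps in the simpler White (HC0) heteroscedasticity-consistent estimator for a one-predictor regression, and it runs on synthetic data whose noise pattern is an arbitrary assumption.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the illustration is reproducible

# Synthetic data only: the noise standard deviation grows with |x|, so the
# constant-variance assumption behind the classical standard error fails.
n = 500
xs = [random.uniform(-1.0, 1.0) for _ in range(n)]
ys = [0.5 * x + random.gauss(0.0, 0.2 + 2.0 * abs(x)) for x in xs]

# Ordinary least squares fit of y = a + b * x.
mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
sxx = sum((x - mean_x) ** 2 for x in xs)
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
a = mean_y - b * mean_x
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Classical standard error of the slope: assumes one common error variance.
s2 = sum(e * e for e in residuals) / (n - 2)
classical_se = math.sqrt(s2 / sxx)

# White (HC0) standard error of the slope: weights each squared residual by
# its own leverage term instead of pooling the error variance.
meat = sum(((x - mean_x) ** 2) * e * e for x, e in zip(xs, residuals))
robust_se = math.sqrt(meat / (sxx * sxx))

print(f"slope = {b:.3f}, classical SE = {classical_se:.3f}, robust SE = {robust_se:.3f}")
```

In this synthetic setup the classical standard error understates the uncertainty of the slope, which is the mechanism by which Type-I error rates become inflated. Whether a conclusion survives the switch to a robust standard error has to be checked on the actual data.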
Contrary to LeNormand’s comment “In a word, your work is completely invalid (and the results are most likely an artifact)” (which is actually 16 words, not one), the reported results are not artifacts of non-normality.
-Azim Shariff and Mijke Rhemtulla
Reference:
Satorra, A., & Bentler, P. M. (1994). Corrections to test statistics and standard errors in covariance structure analysis. In A. von Eye & C. C. Clogg (Eds.), Latent variables analysis: Applications for developmental research (pp. 399–419). Thousand Oaks, CA: Sage.
RE: RE: A methodological flaw and two errors
LeNormand replied to azimshariff on 11 Sep 2012 at 08:35 GMT
So that you can inform us in your answer, have you checked, after using MLR, that the residuals follow a chi-squared law?
It is no better to assume a chi-squared law than a normal law for the purpose of regression if the residuals don’t follow said law (and at first look they don’t quite look chi-squared either, at least when regressing on the assumption of normality).
I guess it doesn’t make a difference to the heteroscedasticity?
Regards