Reader Comments
Referee comments: Referee 1
Posted by PLOS_ONE_Group on 15 Feb 2008 at 18:09 GMT
Referee 1's review:
Major comments:
Overall, this paper reads as several different papers patched together into one that is not very cohesive, which makes it somewhat confusing to read.
One main concern is that the link between co-adaptation, epistasis and population divergence is not clearly explained. The authors first state that two-SNP frequencies for pairs of unlinked markers under selection will deviate from the frequencies expected under independence (Introduction, 2nd paragraph): "The basic rational is that if a specific combination of alleles at different genes outperforms the alternative combination(s), than [sic] not all two-locus genotypes will be observed in their expected frequencies." In the 1st paragraph of the Introduction, the authors call this deviation from expectation epistasis. I assume the authors mean that this holds within a single population.
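The within-population deviation described in the 2nd paragraph of the Introduction can be quantified as the two-locus disequilibrium coefficient D and its normalized form r². The following Python sketch assumes fully inbred lines, so that each line contributes one two-locus haplotype coded 0/1 at each SNP; the function name `two_locus_ld` is hypothetical and not taken from the manuscript. For a 2×2 haplotype table, n·r² equals the Pearson χ² statistic, so it can be compared with a χ² distribution on 1 degree of freedom.

```python
import math

def two_locus_ld(snp_a, snp_b):
    """Return (D, r2, p) for two biallelic SNPs coded 0/1 across inbred lines.

    Assumes every line is homozygous, so one line = one two-locus haplotype.
    p is the asymptotic chi-square (1 df) p-value for n * r2 under independence.
    """
    n = len(snp_a)
    if n == 0 or n != len(snp_b):
        raise ValueError("genotype vectors must be non-empty and equal length")
    p_a = sum(snp_a) / n                                   # allele-1 frequency, SNP A
    p_b = sum(snp_b) / n                                   # allele-1 frequency, SNP B
    p_ab = sum(a * b for a, b in zip(snp_a, snp_b)) / n    # observed 1-1 haplotype
    d = p_ab - p_a * p_b                                   # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return d, float("nan"), float("nan")               # monomorphic SNP: r2 undefined
    r2 = d * d / denom
    stat = n * r2                                          # equals Pearson chi-square here
    p_value = math.erfc(math.sqrt(stat / 2))               # chi-square(1) upper tail
    return d, r2, p_value
```

For example, two perfectly matching SNPs typed in eight lines give D = 0.25 and r² = 1. In a genome-wide scan, the resulting p-values would still need a multiple-testing correction.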
Later, in the 4th paragraph of the Introduction, the authors state that when trans-haplotypes (pairs of unlinked SNPs) do not occur at the frequencies expected under independence, excess genetic differentiation is expected between populations, and that this excess differentiation is epistasis. They do not clearly explain how this definition of epistasis relates to the one in the 2nd paragraph: "If selection favors one of these configurations [of trans-haplotypes] above the others, than [sic] excess genetic differentiation between three major human populations at the level of trans-haplotype frequencies is indicative of epistatic interactions between genes". They also do not relate either definition of epistasis to the phenotype of interest in humans, which is childhood anxiety/depression. It is not at all clear why selection for or against this phenotype should cause differentiation across the three main human populations studied, or whether we should assume that selection acts on this particular phenotype at all.
In the RIL LD Results section, the authors state "the real data consistently exhibits an excess of non-independent combinations" but do not report any formal statistical test to support this claim. In the same paragraph, they use vague wording ("Several percent of all pairs"), which should be replaced with the actual percentage.
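One way to put a formal test behind "consistently exhibits an excess of non-independent combinations" is a permutation test: shuffle each SNP's genotypes independently across lines, which preserves allele frequencies but destroys any association between SNPs, and ask how often the shuffled data yield as many strong pairs as the real data. This is a sketch under stated assumptions, not the authors' procedure; the function names, the r² cutoff and the choice of which pairs to test (for example, only SNPs on different chromosomes, so that physical linkage is excluded) are all hypothetical inputs.

```python
import random

def r2(snp_a, snp_b):
    """Squared correlation between two 0/1 SNP vectors (0.0 if monomorphic)."""
    n = len(snp_a)
    p_a = sum(snp_a) / n
    p_b = sum(snp_b) / n
    p_ab = sum(a * b for a, b in zip(snp_a, snp_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else (p_ab - p_a * p_b) ** 2 / denom

def count_strong_pairs(genotypes, pairs, cutoff):
    """Number of SNP index pairs (i, j) whose r2 exceeds the cutoff."""
    return sum(1 for i, j in pairs if r2(genotypes[i], genotypes[j]) > cutoff)

def permutation_excess_test(genotypes, pairs, cutoff, n_perm=1000, seed=0):
    """One-sided permutation p-value for an excess of high-r2 pairs.

    genotypes: list of 0/1 vectors, one per SNP, all over the same lines.
    pairs: (i, j) index pairs to test, e.g. SNPs on different chromosomes.
    Each permutation shuffles every SNP independently across lines.
    """
    rng = random.Random(seed)
    observed = count_strong_pairs(genotypes, pairs, cutoff)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        shuffled = [rng.sample(g, len(g)) for g in genotypes]
        if count_strong_pairs(shuffled, pairs, cutoff) >= observed:
            at_least_as_extreme += 1
    # add-one correction so the p-value is never exactly zero
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)
```

The add-one correction keeps the p-value from being reported as exactly zero when no permutation matches the observed count.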
In the methods/results section for the RIL phenotype analysis, the authors should report the number of mice in each interaction category for the best-fitting model. They do not explain why they based the Bonferroni correction on the number of phenotypes rather than on the number of phenotypes multiplied by the number of SNP interaction models tested. They also appear to use the p-values of the interaction terms in the model as the test for significant interaction, when a likelihood ratio test between nested models (one with main effects and interaction terms, one with main effects only) is more appropriate. I could not find a statement of whether the reported p-values of 0.04 and 0.06 in the final model are adjusted for multiple testing.
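The nested-model comparison suggested above can be sketched as follows for a quantitative phenotype and two biallelic SNPs in homozygous lines. The sketch assumes normally distributed residuals, under which the likelihood ratio statistic reduces to n·ln(RSS_reduced / RSS_full) and is compared with a χ² distribution on 1 degree of freedom (one interaction parameter). It uses NumPy's least-squares solver; the function names are hypothetical.

```python
import math
import numpy as np

def rss(design, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def interaction_lrt(g1, g2, y):
    """Likelihood ratio test of a SNP x SNP interaction on a quantitative trait.

    g1, g2: 0/1 genotype codes (inbred lines); y: phenotype values.
    Reduced model: y ~ 1 + g1 + g2; full model adds the g1*g2 term.
    Assumes normal errors, so LR = n * ln(RSS_reduced / RSS_full) ~ chi-square(1).
    """
    g1, g2, y = (np.asarray(v, dtype=float) for v in (g1, g2, y))
    ones = np.ones_like(y)
    reduced = np.column_stack([ones, g1, g2])
    full = np.column_stack([ones, g1, g2, g1 * g2])
    n = len(y)
    # clamp tiny negative values caused by floating-point rounding
    stat = max(n * math.log(rss(reduced, y) / rss(full, y)), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square(1) upper tail
    return stat, p_value
```

With only a handful of mice per genotype cell, an F test built from the same two residual sums of squares is the usual small-sample alternative to the χ² approximation; either way, the resulting p-value still needs the multiple-testing correction discussed above.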
The second main concern is that in the RIL LD analysis, the authors claim that "the most likely explanation of the data is that the specific combinations of SNPs that appear to be in strong LD in fact identify functionally interacting genes that have conveyed a selectively [sic] advantage during the process of inbreeding", but in the next section they report that of the 707 SNP pairs in significant LD after Bonferroni correction, only 12 contained "gene related SNPs" (e.g., coding, splice-site, 5' UTR or 3' UTR SNPs). If the claim that these 707 pairs identify functionally interacting genes is correct, why are only 12 of them putatively functional? Is this number close to what would be expected given the numbers of gene-related and non-gene-related SNPs tested in the RILs? I would expect a much higher proportion of the 707 pairs to be gene related under the authors' definition. That only 12 of the 707 are gene related does not seem to support the "most likely explanation" quoted above, and therefore does not support the further analysis of phenotypes.
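Whether 12 gene-related pairs out of 707 is fewer than expected can be checked, approximately, with a binomial tail probability. If a fraction f of the tested SNPs were gene related and significant pairs were drawn at random from all tested pairs, the null fraction q of gene-related pairs would be roughly f² if both SNPs must be gene related, or 1 − (1 − f)² if one suffices. The sketch below takes q as an input because f is not reported here; the function names are hypothetical, the values of q in the usage note are purely illustrative, and the binomial model ignores the dependence that LD induces among pairs.

```python
import math

def binom_pmf(k, n, q):
    """Binomial probability P(X = k), computed in log space to avoid overflow."""
    log_p = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
             + k * math.log(q) + (n - k) * math.log1p(-q))
    return math.exp(log_p)

def gene_related_enrichment(n_sig_pairs, n_gene_related, q):
    """Compare an observed count of gene-related pairs with a binomial null.

    n_sig_pairs: pairs passing the LD threshold (707 in the manuscript).
    n_gene_related: how many of those are gene related (12 in the manuscript).
    q: fraction of all tested pairs that are gene related (must be supplied;
       it is not reported in the review). Requires 0 < q < 1.
    Returns the expected count and both one-sided tail probabilities.
    """
    pmf = [binom_pmf(k, n_sig_pairs, q) for k in range(n_sig_pairs + 1)]
    expected = n_sig_pairs * q
    p_lower = sum(pmf[: n_gene_related + 1])  # P(X <= observed): depletion
    p_upper = sum(pmf[n_gene_related:])       # P(X >= observed): enrichment
    return expected, p_lower, p_upper
```

For instance, with a hypothetical q = 0.1 the expected count would be 70.7 and `p_lower` for 12 observed pairs would be vanishingly small, whereas with a hypothetical q = 0.01 the expected count would be 7.07 and 12 would not indicate depletion.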
The selection of GPR156 for follow-up in humans (because it had "the highest number of reasonably strong interactions (r2 > 0.35)") seems arbitrary, and the cutoff used for "reasonably strong interactions" is not explained anywhere in the text.
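The dependence of the GPR156 choice on the r² cutoff can be illustrated with a small sketch that ranks genes by the number of partners above a threshold. The function names, gene names and r² values below are invented for illustration and are not data from the manuscript.

```python
def partners_above(r2_by_gene, cutoff):
    """Count, for each gene, the SNP partners whose r2 exceeds the cutoff."""
    return {gene: sum(r2 > cutoff for r2 in values)
            for gene, values in r2_by_gene.items()}

def top_gene(r2_by_gene, cutoff):
    """Gene with the most partners above the cutoff (first one wins ties)."""
    counts = partners_above(r2_by_gene, cutoff)
    return max(counts, key=counts.get)

# Invented example values, not data from the manuscript.
example = {
    "GENE_A": [0.36, 0.37, 0.38],
    "GENE_B": [0.34, 0.34, 0.34, 0.34, 0.90],
}
print(top_gene(example, 0.35))  # GENE_A (3 partners vs. 1)
print(top_gene(example, 0.30))  # GENE_B (5 partners vs. 3)
```

With these made-up values, GENE_A ranks first at a cutoff of 0.35 but GENE_B ranks first at 0.30, which is why a cutoff used for selection needs a stated justification or a sensitivity analysis.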
In the Results section for the Dutch twin study, the statistical test is called a population-based association test (a term that generally indicates a test on independent individuals, such as a case-control analysis or a test of phenotype severity among independent affected individuals), but the Methods section suggests that the authors actually performed a family-based test (QTDT), which should be explained in more detail in the Results. Furthermore, they state that they tested for population stratification (to which TDTs are generally robust) and that one SNP was "uninformative" for the population stratification analysis; because "uninformative" usually describes a marker that cannot be used in a TDT association analysis, this terminology is confusing. The authors also claim that "standard software for family based for [sic] genetic association (such as QTDT) do not readily allow for testing the significance of gene-gene interactions", but this is incorrect. Heather Cordell has several papers (and David Clayton has implemented software for Stata) describing the use of a conditional logistic regression model, which can be fitted in standard statistical software, to explicitly test for gene-gene interactions in family-based data. The authors may want to try this method in the future. Again, they do not appear to have controlled for multiple testing here, although they seem to do so in other analyses.
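The conditional logistic regression approach mentioned above can be set up as a case/pseudo-control design built from parent-offspring trios: for each affected child, the other genotypes the child could have inherited from its parents serve as matched pseudo-controls. As we understand this framework, two unlinked biallelic loci give 4 × 4 = 16 equally likely transmitted genotype combinations, so each case is matched with 15 pseudo-controls, and a conditional logistic regression with terms for each locus and their product then tests the interaction. The sketch below only builds the matched set; the function names are hypothetical, the regression step would be done with an existing conditional logistic regression routine (for example `clogit` in R's survival package), and the approach as sketched applies to an affected/unaffected outcome rather than to the quantitative scores of the twin study.

```python
from itertools import product

def transmitted_genotypes(mother, father):
    """All four equally likely offspring genotypes at one locus.

    mother, father: tuples of two alleles coded 0/1.
    Returns genotypes as the count of allele 1 (0, 1 or 2), one per
    (maternal allele, paternal allele) transmission.
    """
    return [m + f for m, f in product(mother, father)]

def case_pseudocontrols(mothers, fathers, case):
    """Two-locus case/pseudo-control matched set for one trio.

    mothers, fathers: per-locus parental genotypes, e.g. [(0, 1), (1, 1)].
    case: the affected child's observed genotype at each locus, e.g. (1, 2).
    Assumes the loci are unlinked, so the 4 x 4 transmission combinations are
    equally likely; one combination matching the case is dropped and the
    remaining 15 serve as pseudo-controls.
    """
    per_locus = [transmitted_genotypes(m, f) for m, f in zip(mothers, fathers)]
    combos = [tuple(c) for c in product(*per_locus)]
    combos.remove(tuple(case))  # raises ValueError if Mendelian-inconsistent
    return combos
```

Removing only one instance of the case genotype keeps the full set of 16 equally likely transmissions, so pseudo-controls identical to the case (which occur when a parent is homozygous or both parents are heterozygous) are retained.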
Finally, the authors should address the use of HapMap data and SNP ascertainment bias in the context of population genetics studies (Clark AG, Hubisz MJ, Bustamante CD, Williamson SH, Nielsen R. Ascertainment bias in studies of human genome-wide polymorphism. Genome Res 2005; 15: 1496-502).
Minor comments:
There are various typographical errors in the text; for example, "than" is used in place of "then" several times, and some words are misspelled. The authors should carefully edit the text to fix these errors.
**********
N.B. These are the comments made by the referee when reviewing an earlier version of this paper. Prior to publication the manuscript has been revised in light of these comments and to address other editorial requirements.