How Criterion Scores Predict the Overall Impact Score and Funding Outcomes for National Institutes of Health Peer-Reviewed Applications

  • Matthew K. Eblen,

    matteblen@gmail.com

    Current address: Office of Public Health Scientific Services, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America

    Affiliation Office of Extramural Research, National Institutes of Health, Bethesda, Maryland, United States of America

  • Robin M. Wagner,

    Current address: Office of Public Health Scientific Services, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America

    Affiliation Office of Extramural Research, National Institutes of Health, Bethesda, Maryland, United States of America

  • Deepshikha RoyChowdhury,

    Affiliation Office of Extramural Research, National Institutes of Health, Bethesda, Maryland, United States of America

  • Katherine C. Patel,

    Affiliation Office of Extramural Research, National Institutes of Health, Bethesda, Maryland, United States of America

  • Katrina Pearson

    Affiliation Office of Extramural Research, National Institutes of Health, Bethesda, Maryland, United States of America

Abstract

Understanding the factors associated with successful funding outcomes of research project grant (R01) applications is critical for the biomedical research community. R01 applications are evaluated through the National Institutes of Health (NIH) peer review system, in which peer reviewers are asked to evaluate and score five research criteria when assessing an application’s scientific and technical merit. This study examined the relationship of the five research criterion scores to the Overall Impact score and the likelihood of being funded for over 123,700 competing R01 applications for fiscal years 2010 through 2013. The relationships of other application and applicant characteristics, including demographics, to scoring and funding outcomes were studied as well. The analyses showed that the Approach and, to a lesser extent, the Significance criterion scores were the main predictors of an R01 application’s Overall Impact score and its likelihood of being funded. Applicants might consider these findings when submitting future R01 applications to NIH.

Introduction

The National Institutes of Health (NIH) is the world's leading biomedical and behavioral research organization and spends about three-quarters of its nearly $30.1 billion budget on extramural grants that support research at universities, medical schools and research institutions [1]. Peer review is the cornerstone of the NIH’s extramural research program: applications for research funding are vetted through the peer review process [2]. Over the years, the NIH has made periodic efforts to improve its peer review system to ensure fairness and efficiency in evaluating grant applications. The most recent effort began in June 2007 [3], and the resulting enhancements were implemented in phases beginning in 2009 [4]. The key modifications included changes to the grant application review criteria, quantitative scoring of five distinct review criteria (criterion scores), implementation of a new 1–9 point scoring system for both the review criteria and the application as a whole (the “Overall Impact” score), and the clustering, for peer review, of applications from new and early stage investigators (ESIs) applying for R01s, NIH’s major research grant activity code (see Career Stage of Investigators definition in Table 1). As part of this enhancement, the NIH also committed itself to continuous monitoring and evaluation of the peer review system.

Table 1. Summary Statistics for R01-Equivalent Applications, FY 2010–2013.

https://doi.org/10.1371/journal.pone.0155060.t001

NIH peer review is a two-stage process. In the first stage, research grant applications are evaluated for scientific and technical merit by a Scientific Review Group (SRG), also known as a study section, composed primarily of non-federal scientists with expertise in relevant scientific disciplines and current research areas. Reviewers from the SRG consider five criteria when assessing an application’s scientific and technical merit. The criteria for research grants are Significance, Investigator(s), Innovation, Approach, and Environment. Additional criteria, such as whether an application involves human or animal subjects, or is a renewal, revision or resubmission, are considered when applicable (see http://grants.nih.gov/grants/peer_review_process.htm for a full description of the criteria). The more meritorious applications are discussed in full at SRG meetings, where each reviewer assigns a final Overall Impact score. The final Overall Impact score of each discussed application is the mean of all eligible reviewers’ Impact scores, multiplied by 10; thus, final Overall Impact scores range from 10 (high impact) through 90 (low impact). Applications that are not discussed (ND) do not receive a final numerical Overall Impact score.
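
To make the calculation concrete, if an application’s n assigned reviewers give individual Impact scores s_1, …, s_n, then

    \text{Overall Impact} = 10 \times \frac{1}{n}\sum_{i=1}^{n} s_i

so, in a purely hypothetical example, an application scored 2, 3 and 4 by its three reviewers would receive a final Overall Impact score of 10 × (2 + 3 + 4)/3 = 30.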

The second stage of peer review is performed by the Advisory Council/Board of each NIH Institute and Center (IC). This review assesses the relevance of the application’s proposed research to the IC’s programs and priorities and results in recommendations for funding. Based on these recommendations, input from NIH program staff, and the mission and goals of their respective ICs, the IC directors make the final funding decisions.

The introduction of quantitative scores for the five research review criteria, beginning in fiscal year (FY) 2010, enabled the examination of the relationship of these criteria to first level peer review outcomes, i.e., the Overall Impact score, and to the likelihood of being funded.

Previous studies of the scientific research peer review process at NIH and other funding agencies have evaluated how the characteristics of peer reviewers, the peer review process, grant applicants and their institutions, and research topics are associated with peer review outcomes [5–15]. Lindner et al. examined how the variation in Overall Impact scores was explained by the criterion scores and concluded that all the criteria were important contributors to the Overall Impact score [15]. What distinguishes this work from earlier studies is that multivariate techniques were used to estimate the magnitude of the relationship between each individual criterion score and the Overall Impact score. Furthermore, the analysis was broadened to include the relationship between the criterion scores and funding outcomes. This study also measured the degree to which additional factors, including the application’s administrative characteristics, the demographics of the applicant, and characteristics of the applicant’s institution, were associated with peer review and funding outcomes after adjusting for application-specific ratings of scientific and technical merit, as embodied in the criterion scores.

Methods

Data from 123,707 competing R01-equivalent applications (R01s and R37s) submitted to NIH and peer reviewed in fiscal years (FY) 2010 through 2013 were included in the current analysis. These data were extracted from the Information for Management, Planning, Analysis, and Coordination II (IMPAC II) system, the database of record for information collected from NIH extramural grant applications, awards and applicants during the receipt, review and award management process. For each application, data were obtained on whether the application was funded, its final Overall Impact score, and its five research criterion scores, which were delinked from the reviewers providing the scores. The research criterion scores were calculated for each criterion by averaging all individual criterion scores available for a particular application. In addition, data were extracted on other characteristics related to the application (such as whether it was a new or renewal application), the applicant (such as applicant demographics and personal NIH funding history) and the applicant’s institution (such as the institution’s funding history with NIH). All demographic data were self-reported, on a voluntary basis, by the applicants. Data on the SRG in which the application was reviewed were also obtained. See Table 1 for a full list of variables evaluated for each application. Descriptive summary statistics, as well as correlations between the five criterion scores and the Overall Impact score, were produced.
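
As an illustration of the score aggregation described above, the following sketch computes per-application mean criterion scores from a long-format extract; the file name and column names are hypothetical, since the actual layout of the IMPAC II extract is not described here.

    import pandas as pd

    # Hypothetical long-format extract: one row per (application, reviewer, criterion)
    # score, already delinked from reviewer identities.
    scores = pd.read_csv("criterion_scores.csv")  # columns: appl_id, criterion, score

    # Average all individual scores available for each application and criterion,
    # then pivot to one row per application with one column per criterion.
    criterion_means = (
        scores.groupby(["appl_id", "criterion"])["score"]
        .mean()
        .unstack("criterion")
        .reset_index()
    )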

Models

Two general models were developed: 1) the Impact model, a linear regression model with the Overall Impact score serving as the dependent variable; and 2) the Funding model, a logistic regression model with the likelihood of being funded serving as the dependent variable. The five research criteria were used as the main predictors in both models, controlling for other application and applicant characteristics delineated in Table 1. Both models controlled for the FY of the application to account for changes in the distribution of Overall Impact scores or funding patterns over time. Hierarchical random effects models, with applications clustered by SRG, were employed to account for possible differences in scoring behavior and funding outcomes between peer review groups. In addition to controlling for the potential clustering of scores by SRG, the use of random effects, by way of intraclass correlations, allowed for the decomposition of the total variation in the models into two categories: within-SRG variation and between-SRG variation [16–18].
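
The analyses reported here were run in Stata 13 (noted under the software description below); the following Python/statsmodels sketch shows one way the analogous random-intercept Impact model could be specified, assuming a per-application data set with hypothetical column names (overall_impact, approach, significance, innovation, investigator, environment, fiscal_year, srg).

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("applications.csv")  # hypothetical per-application analysis file

    # Impact model: linear mixed model with a random intercept for each SRG,
    # criterion scores (including the Approach x Significance interaction) as
    # predictors, and fiscal year as a control; discussed applications only.
    discussed = df.dropna(subset=["overall_impact"])
    impact_model = smf.mixedlm(
        "overall_impact ~ approach * significance + innovation"
        " + investigator + environment + C(fiscal_year)",
        data=discussed,
        groups="srg",
    )
    impact_fit = impact_model.fit()
    print(impact_fit.summary())

    # The Funding model is the logistic analog with a random intercept for SRG; the
    # paper fit it in Stata (e.g., melogit). A mixed-effects logistic routine in
    # Python (e.g., statsmodels' BinomialBayesMixedGLM) could serve as a rough
    # substitute, though it is not what the authors used.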

Three sub-models were developed in a step-wise fashion to assess the marginal contribution of each set of characteristics in both general models. Sub-model A focused on the five research criterion scores, including any significant interactions between them. Sub-model B added the other control variables to sub-model A. Sub-model C was identical to sub-model B, but removed the criterion scores. Sub-model C served to illustrate how the various application and applicant characteristics appeared to be associated with the Impact score and relative odds of funding when the quality of the application, as measured by the criterion scores, was not taken into account.

Because ND applications are not assigned Overall Impact scores, only the 71,651 applications that were discussed in SRG meetings and assigned Overall Impact scores from FY 2010 to FY 2013 were used to fit the Impact model. ND applications were not removed from the Funding model because their funding outcomes were known and data on the five research criterion scores were still available. However, applications precluded from being considered for funding were removed, i.e., those with unresolved human subject or animal concerns and resubmissions whose previous version had been funded. Removing these applications left 111,533 R01-equivalent applications for the Funding model.

Data analyses were performed using Stata 13 (StataCorp). Model estimates and their 95% confidence intervals (CIs) were computed. The Funding model results were expressed as odds ratios. For ease of interpretation, the signs of the criterion score coefficients were inverted in the Funding model, so that odds ratios greater than unity represent the increase in the odds of funding associated with a one-unit decrease (improvement) in the given criterion score. Results were considered statistically significant if they had a P-value of less than 0.05, using 2-sided testing.
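
A minimal numeric sketch of the inversion just described (the coefficient value below is illustrative, not taken from the fitted model): because lower criterion scores indicate better reviews, a raw logit coefficient estimated per one-point increase in a score can be re-expressed per one-point improvement by flipping its sign before exponentiating.

    import numpy as np

    beta_per_point_worse = -1.82                           # illustrative logit coefficient only
    or_per_point_worse = np.exp(beta_per_point_worse)     # < 1: worse score, lower odds of funding
    or_per_point_better = np.exp(-beta_per_point_worse)   # > 1: the reported orientation
    # Note: or_per_point_better == 1 / or_per_point_worse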

The NIH Office of Human Subjects Research Protections was consulted and determined this work to be classified as a program evaluation that did not require human subjects research review by an Institutional Review Board.

Results

Fig 1 shows the distribution of the Overall Impact score and criterion scores in the form of boxplots. The criterion scores for Approach had the greatest variability and the highest (worst) scores, with an interquartile range (IQR) of 2.0 and a median of 4.3. The criterion scores for Significance and Innovation both had IQRs of 1.2 and medians of 3.0. Investigator(s) and Environment criterion scores were clustered in the low score ranges, with median scores of 2.0 and IQRs of 1.0, indicating that most applications received excellent marks for Investigator(s) and Environment. Table 2 provides the correlations between the criterion scores for each of the five research criteria and the Overall Impact score. All criteria had moderate to high correlations with one another, ranging from 0.55 between Significance and Environment to 0.75 between Investigator(s) and Environment. Environment had the lowest and Approach the highest correlation with the Overall Impact score (0.44 and 0.84, respectively).

Fig 1. Box Plot Distributions of Criterion and Overall Impact Scores for R01 Applications, FY 2010–2013.

Fig 1 shows the box plot distributions of the five research criterion scores (scale: 1–9) and the Overall Impact score (scale: 10–90). Box plot whiskers extend to the most extreme data point which is no more than 1.5 times the interquartile range from the box. Each criterion score N = 123,707 applications; Overall Impact score N = 71,651 applications.

https://doi.org/10.1371/journal.pone.0155060.g001

Table 2. Pearson Correlation Matrix of the 5 Research Criteria and Overall Impact Scores.

https://doi.org/10.1371/journal.pone.0155060.t002

Table 1 shows that average Overall Impact scores and funding rates varied widely with application characteristics. For example, new (type 1) applications had an average Overall Impact score of 37.1 and a funding rate of 14.2%, while renewal (type 2) applications fared better, with an average Overall Impact score of 30.9 and a funding rate of 30.1%. Initial submissions (A0s) had an average Overall Impact score of 38.1 and a funding rate of 11.2%, whereas resubmissions (A1s) had a more favorable average Overall Impact score and funding rate (31.7 and 30.6%, respectively). Applications from Early Stage Investigators (ESIs) had an average Overall Impact score of 38.4 and a 17.6% funding rate, whereas applications from experienced investigators had a better average Overall Impact score and funding rate (33.9 and 18.8%, respectively). Applications submitted by white principal investigators (PIs) had an average Overall Impact score of 34.8 and a funding rate of 19.0%; in contrast, applications submitted by black PIs had poorer outcomes (average Overall Impact score: 38.1; funding rate: 11.8%). Applications from male PIs had an average Overall Impact score of 35.3 and a funding rate of 17.9%, whereas those from female PIs had correspondingly worse figures (36.2 and 16.4%, respectively).

Fig 2 shows boxplot distributions of the Overall Impact score by IC, with IC names masked. Median scores varied considerably by IC, from 33 to 50.5, and IQRs ranged from 15 to 22. Fig 3 shows the percentage of reviewed applications funded by each IC, which ranged widely, from 7.1% to 28.9%. The rank orders of the Overall Impact scores and funding rates by IC, shown in Figs 2 and 3, respectively, do not match as might be expected: ICs with better (lower) ranges of Overall Impact scores did not necessarily have higher funding rates. This is due, in part, to differences between ICs in the number of applications received and in available grant funding, and it demonstrates the importance of controlling for IC, particularly in the Funding model.

Fig 2. Box Plot Distributions of Overall Impact Scores for R01 Applications by IC, FY 2010–2013.

Fig 2 shows the box plot distributions of the Overall Impact score (scale: 10–90) by IC. Box plot whiskers extend to the most extreme data point which is no more than 1.5 times the interquartile range from the box. IC names have been masked. N = 71,651 applications (discussed applications only).

https://doi.org/10.1371/journal.pone.0155060.g002

Fig 3. Distributions of Funding Rate for R01 Applications by IC, FY 2010–2013.

Fig 3 shows the distribution of the percentage of reviewed applications funded by each IC. IC names have been masked and labeled to agree with Fig 2, i.e., the IC labeled “1” in Fig 3 is the same IC labeled “1” in Fig 2. N = 123,707.

https://doi.org/10.1371/journal.pone.0155060.g003

S1 and S2 Tables are similar to Table 1, except that they show summary statistics for discussed and ND applications, respectively. Comparing the two tables, ND applications had worse (higher) mean scores on all five research criteria than discussed applications. Furthermore, the Approach criterion had the worst mean scores among both discussed and ND applications. Among discussed applications, the Approach criterion was also the most variable, with a higher standard deviation than the other criterion scores, underscoring its importance in predicting the Overall Impact score among discussed applications. In contrast to discussed applications, which had an overall 29.8% funding rate over the study period, ND applications had almost no chance of being funded (only one ND application was funded in FY 2010–2013).

The Impact model and Funding model results are shown in Tables 3 and 4, separated by sub-model. In sub-model A, with independent variables limited to the criterion scores, all were highly significant in the Impact model, with the coefficients in rank order for Approach, Significance, Innovation, Investigator(s) and Environment estimated at 7.6 (95% CI, 7.5–7.7), 3.4 (3.3–3.5), 1.4 (1.3–1.5), 1.0 (0.9–1.0) and -0.2 (-0.3 to -0.1), respectively. That is, a one point improvement in the Approach score was associated with a 7.6 point improvement in the Overall Impact score, controlling for the other criterion scores. The Funding model results for sub-model A had coefficients in the same rank order, with odds ratio estimates of 6.2 (5.9–6.5), 2.1 (2.0–2.2), 1.5 (1.4–1.6), 1.0 (1.0–1.1) and 0.9 (0.8–0.9), respectively; e.g., for every one point improvement in the Approach score, the odds of funding increased by a factor of 6.2. There was a highly significant interaction between Approach and Significance in both the Impact and Funding models: applications with good scores on both criteria had better outcomes than would be predicted by the two criteria’s independent effects. Sub-model A explained 74.8% of the variation in Overall Impact scores, similar to the 77.7% reported by Lindner et al. [15]. Sub-model A also correctly predicted the funding outcomes of 66.0% of funded applications and 94.7% of unfunded applications, for an overall correct prediction rate of 89.3%. The intraclass correlation coefficient, which measures the amount of variation accounted for by SRGs, was 4.2% in the Impact model and 17.8% in the Funding model; i.e., an application’s criterion scores were much better indicators of its review and funding outcomes than the SRG in which it was reviewed.
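
For reference, the intraclass correlation coefficient reported above is the usual variance-partition quantity for a random-intercept model; this formula is the standard definition rather than a detail given in the text. For the logistic Funding model, the residual variance on the latent scale is conventionally taken as π²/3:

    \mathrm{ICC} = \frac{\sigma^2_{\mathrm{SRG}}}{\sigma^2_{\mathrm{SRG}} + \sigma^2_{\mathrm{residual}}}, \qquad \sigma^2_{\mathrm{residual}} = \frac{\pi^2}{3} \approx 3.29 \;\text{(logistic latent scale)}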

Table 3. Impact Score Modela Results for R01-Equivalent Applications, FY 2010–2013.

https://doi.org/10.1371/journal.pone.0155060.t003

Table 4. Funding Modela Results for R01-Equivalent Applications, FY 2010–2013.

https://doi.org/10.1371/journal.pone.0155060.t004

In sub-model B, which adds the full set of application and applicant controls to sub-model A, the coefficients of the criterion scores were largely unchanged. For the Funding model, the only major departure from sub-model A was that the Investigator(s) odds ratio increased to 1.4 (1.3–1.5), showing that applications with better Investigator(s) criterion scores had better odds of funding once the other application and applicant characteristics were taken into account. Many of the application control factors had statistically significant relationships to the Overall Impact score and odds of funding. Of note, renewal applications were predicted to have Overall Impact scores 0.7 (-0.8 to -0.6) points lower (better) than otherwise identical new applications, and their odds of funding were predicted to be 1.4 (1.3–1.5) times greater. First resubmission applications (A1s) were predicted to have Overall Impact scores 1.3 (-1.5 to -1.2) points lower and odds of funding 2.2 (2.1–2.3) times greater than otherwise identical initial submissions (A0s). Applications submitted by ESIs were predicted to have Overall Impact scores 1.2 (-1.5 to -0.8) points lower and odds of funding 2.6 (2.2–3.1) times greater than otherwise identical applications from experienced investigators. Applications submitted by black PIs had Overall Impact scores 0.6 (0.1–1.1) points higher (worse) than applications submitted by white PIs with the same measured characteristics, though there was no statistically significant difference in odds of funding. Applications submitted by female PIs had slightly better Overall Impact scores (0.2 [-0.3 to -0.1] points lower) than those submitted by male PIs, but the odds of funding were not statistically different, all else equal. See Tables 3 and 4 for the full set of control variables. Sub-model B improved the model fit and predictive accuracy of sub-model A by a very small amount, approximately one percentage point in each case.

Differences among subgroups in the application and applicant control variables increased substantially in sub-model C, which omits the criterion scores from the full model, sub-model B. Renewal applications were predicted to have Overall Impact scores 3.5 (-3.7 to -3.3) points lower and odds of funding 2.2 (2.1–2.3) times greater than new ones. First resubmission applications were predicted to have Overall Impact scores 5.6 (-5.8 to -5.4) points lower and odds of funding 3.7 (3.6–3.8) times greater than initial submissions. In contrast to sub-model B, applications submitted by ESIs were predicted to have Overall Impact scores 1.3 (0.7–1.9) points higher (worse) than applications from experienced investigators, and their funding advantage was reduced to an odds ratio of 1.5 (1.4–1.7). Therefore, the ESI advantage in Overall Impact scores and funding odds was observed only after controlling for the criterion scores. Applications submitted by black PIs and female PIs appeared less likely to be funded, with odds ratios falling to 0.7 (0.6–0.8) and 0.9 (0.9–0.9), respectively, and becoming statistically significant in the absence of the criterion scores. The amount of variation explained by sub-model C was low (R2 = 16.9%), and the overall correct prediction rate fell to 80.7% (only 9.6% for funded applications and 97.7% for unfunded applications).

Discussion

The Impact and Funding model results demonstrate that the criterion scores are the best predictors of an application’s Overall Impact score and its likelihood of receiving funding. The model fit statistics support this observation. The R2, or variation explained, and the correct prediction rate improved by only about one percentage point when going from models that included only the criterion scores to those that also included all the other application and applicant control factors. Furthermore, when the criterion scores were removed from the full model, the variation explained and correct prediction rate fell off markedly, while the coefficients of the control variables increased in magnitude and many became statistically significant. Among the criterion scores, there was a clear hierarchy in terms of each criterion’s relationship with the Overall Impact score and funding odds. In both the Impact model (which contained only discussed applications) and the Funding model (which contained both discussed and non-discussed applications), the Approach score had the strongest association, with more than double the effect of the next largest predictor, the Significance score. The predictive effect of the Environment score was very small and went in a counterintuitive direction, with better Environment scores associated with worse Overall Impact scores and funding odds, all else equal. This finding suggests that some applications with poor Overall Impact scores can nonetheless carry strong Environment scores, even after controlling for the other criterion scores. Furthermore, in another set of models (not shown here) in which whether an application was discussed served as the dependent variable, the criterion score coefficients followed the same rank order, with Approach being by far the largest predictor of whether an application was discussed.

The criterion scores were moderately to strongly correlated with one another because highly meritorious applications tended to score well on all five criteria, and less meritorious applications tended to score poorly across the board. As in Lindner et al. [15], these relatively high correlations raised concerns about multicollinearity (MC). MC does not cause bias when estimating coefficients in a correctly specified model, but it can increase the variability of the estimates [19]. This problem was mitigated by the large number of applications in the model [20], which decreased the variance inflation factor (VIF) of each research criterion. The VIF measures how much the variance of an estimated regression coefficient is increased because of collinearity with the other independent variables. The literature on MC typically points to VIF values greater than 4 as potential signs of multicollinearity problems, though this is only a rule of thumb [21]. No VIF for the criterion scores exceeded 2.2 in any of the models.
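
As an illustrative sketch of the kind of check described above, variance inflation factors for the five criterion scores can be computed with statsmodels; the data file and column names are hypothetical.

    import pandas as pd
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    df = pd.read_csv("applications.csv")  # hypothetical per-application analysis file
    criteria = ["approach", "significance", "innovation", "investigator", "environment"]

    # Design matrix: the five criterion scores plus a constant term.
    X = df[criteria].copy()
    X.insert(0, "const", 1.0)

    # One VIF per criterion (skip the constant in column 0).
    vifs = pd.Series(
        [variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])],
        index=criteria,
    )
    print(vifs)  # the paper reports no criterion-score VIF above 2.2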

The summary statistics revealed relatively large differences in Overall Impact scores and funding outcomes between applications with different characteristics, such as the difference in funding rates between new and renewal applications. Sub-model C, which controlled for the different application characteristics simultaneously, still exhibited these large differences. However, the multivariate models that took the application’s criterion scores into account explained many of the apparent differences in outcomes among different types of applications. One notable exception was that ESI applications (and, to a lesser extent, other applications submitted by New Investigators) retained a small advantage in the Impact model and a large advantage in the Funding model. This finding reflects NIH policy, which strives to support new investigators on new R01-equivalent awards at success rates comparable to those of established investigators submitting new applications.

Consistent with the findings of Ginther et al. [11], the present study found large differences in NIH R01 funding rates by race when the criterion scores were not taken into account. Criterion scores were introduced in FY 2010 and thus were not available for the applications evaluated by Ginther et al. Differences in outcomes by gender were also observed in the summary data of the present study. These demographic differences diminished or disappeared once the criterion scores were included in the full models. However, bias cannot be ruled out, particularly in the first stage of peer review, where small but statistically significant differences remained in the Impact model. To ensure fairness, NIH is undertaking an extensive review of potential bias in the peer review system (see http://acd.od.nih.gov/prsub.htm). In contrast to the Impact model, the Funding model showed almost no differences in funding outcomes by demographics once all the measured characteristics of the application were taken into account.

Conclusion

The research criterion scores, specifically the Approach and, to a lesser extent, the Significance score, are the most important predictors of an R01 application’s Overall Impact score and its likelihood of being funded. Other factors, such as whether the application comes from a New Investigator, are also associated with outcomes, particularly funding. But the model results show that the quality of the application, as measured by the criterion scores, is the best predictor of an application’s eventual success. Applicants might consider these findings when submitting future R01 applications to NIH.

Supporting Information

S1 Fig. Box Plot Distributions of Criterion Scores for Discussed R01 Applications, FY 2010–2013.

S1 Fig shows the box plot distributions of the five research criterion scores (scale: 1–9) for discussed applications. Box plot whiskers extend to the most extreme data point which is no more than 1.5 times the interquartile range from the box. N = 71,651 applications.

https://doi.org/10.1371/journal.pone.0155060.s001

(TIFF)

S2 Fig. Box Plot Distributions of Criterion Scores for Non-Discussed R01 Applications, FY 2010–2013.

S2 Fig shows the box plot distributions of the five research criterion scores (scale: 1–9) for non-discussed applications. Box plot whiskers extend to the most extreme data point which is no more than 1.5 times the interquartile range from the box. N = 52,056 applications.

https://doi.org/10.1371/journal.pone.0155060.s002

(TIFF)

S1 File. Impact Model Public Use Data Set.

S1 File contains data on the main variables discussed at length in this paper for the 123,707 R01 applications that NIH received between FY 2010 and FY 2013.

https://doi.org/10.1371/journal.pone.0155060.s003

(XLSX)

S1 Table. Summary Statistics for Discussed R01-Equivalent Applications, FY 2010–2013.

https://doi.org/10.1371/journal.pone.0155060.s004

(DOCX)

S2 Table. Summary Statistics for Non-Discussed R01-Equivalent Applications, FY 2010–2013.

https://doi.org/10.1371/journal.pone.0155060.s005

(DOCX)

Acknowledgments

The authors would like to thank NIH staff, Sally J. Rockey, PhD; Della M. Hann, PhD; Luci Roberts, PhD; Nicole J. Garbarini, PhD; Sally A. Amero, PhD; James Onken, PhD; and Richard Ikeda, PhD for their thoughtful reviews and comments on the manuscript.

Author Contributions

Conceived and designed the experiments: RW ME. Analyzed the data: ME RW DR KCP KP. Wrote the paper: ME RW DR KCP KP.

References

  1. NIH Budget History, NIH Extramural & Intramural Funding: FY 2014 Enacted. 2014 [cited 2015]. Available from: http://report.nih.gov/NIHDatabook/Charts/Default.aspx?showm=Y&chartId=283&catId=1.
  2. NIH Peer Review: Grants and Cooperative Agreements. 2013. Available from: http://grants.nih.gov/grants/PeerReview22713webv2.pdf.
  3. Enhancing Peer Review at NIH. 2011. Available from: http://enhancing-peer-review.nih.gov/index.html.
  4. Enhancing Peer Review: The NIH Announces Updated Implementation Timeline. NIH Guide. 2008 [cited 2014 November 20]. Available from: http://grants.nih.gov/grants/guide/notice-files/NOT-OD-09-023.html.
  5. Graves N, Barnett AG, Clarke P. Funding grant proposals for scientific research: retrospective analysis of scores by members of grant review panel. British Medical Journal. 2011;343.
  6. Viner N, Powell P, Green R. Institutionalized biases in the award of research grants: a preliminary analysis revisiting the principle of accumulative advantage. Research Policy. 2004;33:443–54.
  7. Jayasinghe UW, Marsh HW, Bond N. Peer Review in the Funding of Research in Higher Education: The Australian Experience. Educational Evaluation and Policy Analysis. 2001;23:343–64.
  8. Jayasinghe UW, Marsh HW, Bond N. A multilevel cross-classified modelling approach to peer review of grant proposals: the effects of assessor and researcher attributes on assessor ratings. Journal of the Royal Statistical Society: Series A (Statistics in Society). 2003;166(3):22. Epub 9/4/2003.
  9. Marsh HW, Jayasinghe UW, Bond NW. Improving the Peer-Review Process for Grant Applications: Reliability, Validity, Bias, and Generalizability. American Psychologist. 2008;63(3):160–8. pmid:18377106.
  10. Marsh HW, Bornmann L, Mutz R, Daniel H-D, O'Mara A. Gender Effects in the Peer Reviews of Grant Proposals: A Comprehensive Meta-Analysis Comparing Traditional and Multilevel Approaches. Review of Educational Research. 2009;79(3):1290–326.
  11. Ginther DK, Schaffer WT, Schnell J, Masimore B, Liu F, Haak LL, et al. Race, ethnicity, and NIH research awards. Science. 2011;333(6045):1015–9. pmid:21852498; PMCID: PMC3412416.
  12. Martin MR, Kopstein A, Janice JM. An Analysis of Preliminary and Post-Discussion Priority Scores for Grant Applications Peer Reviewed by the Center for Scientific Review at the NIH. PLoS ONE. 2010;5(11).
  13. Kotchen TA, Lindquist T, Malik K, Ehrenfeld E. NIH peer review of grant applications for clinical research. JAMA. 2004;291(7):836–43. pmid:14970062.
  14. Kotchen TA, Lindquist T, Miller Sostek A, Hoffmann R, Malik K, Stanfield B. Outcomes of National Institutes of Health peer review of clinical grant applications. J Investig Med. 2006;54(1):13–9. pmid:16409886.
  15. Lindner MD, Vancea A, Chen M-C, Chacko G. NIH Peer Review: Scored Review Criteria and Overall Impact. Am J Eval. 2015:1–12. Epub April 29, 2015.
  16. Garson GD. Hierarchical Linear Modeling: Guide and Applications. SAGE Publications, Inc.; 2012. 392 p.
  17. Albright JJ. Estimating Multilevel Models using SPSS, Stata, and SAS. 2007.
  18. Rabe-Hesketh S, Skrondal A. Multilevel and Longitudinal Modeling Using Stata. 2nd ed. Stata Press; 2008. 562 p.
  19. Wooldridge J. Introductory Econometrics. 3rd ed. Cengage Learning; 2006. 912 p.
  20. Goldberger AS. A Course in Econometrics. Cambridge, Massachusetts: Harvard University Press; 1991. 432 p.
  21. O'Brien RM. A Caution Regarding Rules of Thumb for Variance Inflation Factors. Quality & Quantity: International Journal of Methodology. 2007;41(5):17.