Peer review of health research funding proposals: A systematic map and systematic review of innovations for effectiveness and efficiency

Objective: To investigate methods and processes for timely, efficient and good quality peer review of research funding proposals in health.
Methods: A two-stage evidence synthesis: (1) a systematic map to describe the key characteristics of the evidence base, followed by (2) a systematic review of the studies stakeholders prioritised as relevant from the map on the effectiveness and efficiency of peer review ‘innovations’. Standard processes included literature searching, duplicate inclusion criteria screening, study keyword coding, data extraction, critical appraisal and study synthesis.
Results: A total of 83 studies from 15 countries were included in the systematic map. The evidence base is diverse, investigating many aspects of the systems for, and processes of, peer review. The systematic review included eight studies from Australia, Canada, and the USA, evaluating a broad range of peer review innovations. These studies showed that simplifying the process by shortening proposal forms, using smaller reviewer panels, or expediting processes can speed up the review process and reduce costs, but this might come at the expense of peer review quality, a key aspect that has not been assessed. Virtual peer review using videoconferencing or teleconferencing appears promising for reducing costs by avoiding the need for reviewers to travel, but again any consequences for quality have not been adequately assessed.
Conclusions: There is increasing international research activity into the peer review of health research funding. The studies reviewed had methodological limitations and variable generalisability to research funders. Given these limitations it is not currently possible to recommend immediate implementation of these innovations. However, many appear promising based on existing evidence, and could be adapted as necessary by funders and evaluated. Where feasible, experimental evaluation, including randomised controlled trials, should be conducted, evaluating impact on effectiveness, efficiency and quality.


Introduction
Peer review is a key element of quality assurance in academic research. [1] It is used to reassure research funders that research proposals are of the highest scientific merit and that funded research is appropriate to policy and practice needs. Peer review is also employed at later stages of the research lifecycle to improve the scientific credibility of research outputs, such as articles in academic journals. There is a need to ensure that peer review is effective and efficient, to support the production of high quality research across the sciences. [2] However, there are challenges. Many research funders are facing increasing budgetary pressure and need to ensure that peer review, alongside other aspects of research management, is efficient in time and costs. [3] Peer review has also been subject to criticisms calling into question its validity and usefulness as a process for identifying the 'best' scientific research. [4,5] For example, peer review can be time consuming and therefore expensive, and funders often make substantial efforts to identify and recruit appropriate reviewers and obtain sufficient feedback from them in a timely manner. [3] Researchers typically spend several weeks or months preparing a proposal [6] and each year hundreds of years' worth of total reviewers' time are used by individual research councils, [7,8] which equates to tens of millions of pounds in salary costs. [6] The value of this investment is diminished if peer review is unable to identify good quality proposals that ultimately will have a high impact on policy, practice and science.
Despite the effort involved, it has been argued that peer review leads to inconsistent funding decisions which may be no better than chance decisions in selecting the best proposals. [9] In some cases, however, good correlations have been reported between peer review scores and the estimated scientific impact of the funded proposals. [10] In addition to concerns about the effort involved, peer review has been criticised as being biased, which may reflect a disproportionate influence of individual reviewers' preferences [11] or conflicts of interest. [2] Common concerns are that peer review can be associated with gender bias, or institutional bias, may penalise inexperienced research applicants, and that traditional peer review systems used by major funding agencies tend to be conservative, rejecting innovative or 'high-risk' research proposals. [12] Criticism has also been made of the 'black box' nature of peer review, and attempts have been made to better understand the social and cultural processes by which multi-disciplinary academic funding panels discuss applications, define academic excellence and make funding decisions. [13] Nonetheless, peer review remains a significant aspect of research commissioning, and some funding agencies have attempted to address the criticisms. For example, the US National Institutes of Health and UK Research Councils (among others) have studied their peer review practices to identify opportunities for improvement. Funders are increasingly exploring improvements to peer review processes and methods, or alternatives to peer review itself. [4,14] These include using open rather than blinded review, use of digital technology to discuss proposals rather than face-to-face meetings, testing new proposal scoring methods, and introducing shorter proposal forms and expedited review processes.
Given the costs of peer review and its centrality in ensuring the quality of research, there is a need to map alternative approaches to peer review and assess their impact in addressing some of the criticisms made. There have been few previous systematic reviews in this area. A Cochrane systematic review [15] assessed the impact of a variety of peer review processes on the quality of funded research, identified from the health literature. The review included 10 studies, conducted in a range of countries. Overall, the authors concluded that the quality of the evidence base was limited and that there is a strong need for experimental studies to examine the impact of different peer review processes on the quality of funded research. Given that the literature searches were carried out in 2002 this review is now very out-of-date. This underlines the need for an up-to-date comprehensive review of the evidence.
The question this project set out to investigate was: What is the research evidence on methods and processes for timely, efficient and good quality peer review of research funding proposals in health? The purpose was to make recommendations to research funders about useful methods that could potentially be adopted, and to identify where further research into peer review of health research proposals is needed. This project was one of a number of complementary research projects conducted within a UK health research funder, the National Institute for Health Research (NIHR), to investigate potential improvements to the process of the peer review of funding applications.

Methods
A two-stage evidence synthesis was conducted comprising: (1) systematic mapping of the key characteristics of the evidence base, followed by (2) a systematic review of a sub-set of studies on a particular area of relevance prioritised from the map by stakeholders. This is a flexible and pragmatic approach to evidence synthesis that has been successfully applied in a number of published systematic reviews of complex health and education interventions as a means of characterising the evidence base to facilitate a policy-relevant, stakeholder-informed synthesis. [16-20] Stakeholder involvement in systematic reviewing, including the setting of the scope and the research questions, has become increasingly important in evidence-informed health in recent years. [21] The intended methods were described in a research protocol which was circulated amongst NIHR stakeholders for comment before being finalised (S1 Protocol). This was not pre-published in the PROSPERO systematic review repository as it did not include a health outcome, so was ineligible.

Systematic map
Literature searching. A comprehensive search for relevant literature was undertaken by an experienced health information specialist. A draft search strategy was created, piloted, and revised before implementation (S1 Appendix). The following electronic bibliographic databases were searched using the same strategy adapted for each database as necessary (the host platforms used are indicated in brackets): MEDLINE (Ovid); MEDLINE In-Process & Other Non-Indexed Citations (Ovid); Embase (Ovid); The Cochrane Library (comprising the Cochrane Database of Systematic Reviews; Cochrane Central Register of Controlled Trials (CENTRAL); and Database of Abstracts of Reviews of Effects); PsycINFO (EBSCO); Social Sciences Citation Index (Web of Science); and Delphis (a University of Southampton Library database). Database searches were conducted during May-June 2016. We also searched the internet sites of international health research funders and health charities (S1 Protocol) during June-July 2016. Reference lists of a random sample of 25% of articles included in the systematic map, and of all studies included in the systematic review, were searched to check that relevant studies had not been missed. All references identified from electronic databases were imported into an EndNote reference management library for storage, removal of duplicates, retrieval of the full text versions, and eligibility screening.
Systematic map eligibility criteria. To be included in the map the references needed to report a research study, of any design, investigating any aspect of the peer review of health research funding application process. Systematic reviews were also permitted but commentary, opinion and editorial articles were excluded. For this project health research was defined broadly to include research into health and social care, public health, and health promotion. References reporting investigations into the peer review of research outputs were not eligible unless they also reported an investigation into the peer review of funding applications. Study inclusion was limited to articles published in the English language. Before being fully implemented, the inclusion criteria were piloted by two reviewers independently on a sample of titles and abstracts which were published in 2015-2016 and retrieved by the literature search.
Each title and abstract was screened independently by two reviewers (JS, GF or KP) with extensive experience of systematic reviewing. If agreement between reviewers could not be reached a third reviewer was consulted. The full text versions of references deemed potentially relevant on checking their titles and abstracts were retrieved for further screening. All full text articles were screened by one reviewer and checked by a second. A third reviewer was consulted in cases of disagreement.
Systematic map coding. A draft set of keywords was devised and agreed by the research team (JS, GF, KP, JW) to describe the key characteristics of the studies relevant to this project. Terms were created for aspects such as: the scope of the studies; the study population (e.g. researchers, health professionals); the study design (e.g. experimental, observational); the study context (e.g. country; type of research funder); and study measures, including outcome and process measures. The keywords did not, however, characterise the results of studies as this was the purpose of the subsequent systematic review.
The draft keyword list was pilot-tested on a subset of 13 studies from the map, [6,10,22-32] to ensure validity and consistency of application between reviewers. The draft list was also circulated for general comment amongst relevant stakeholders from a working group on peer review as part of the NIHR's strategic priority project 'Push the Pace 2' (which aims to establish a proportionate peer review system for research proposals). The final version of the keyword list is provided in a Microsoft Excel worksheet (S1 Database). All included full-text articles reporting an individual study were grouped and read together and the keywords which were applicable to the study were coded in the worksheet by one reviewer. A random sample of 20% of the studies (n = 16/83) was checked by a second reviewer to ensure reliability and comprehensiveness. The level of reliability between reviewers was considered sufficient, since fewer than 2% of the checked data cell entries in the map worksheet required amendments, which were relatively minor.
Upon completion of the keywording the applied coding was analysed within the database to generate frequencies and cross-tabulations of keywords, permitting an overview of the characteristics of the evidence. The research team met to discuss the results and to identify potential sub-sets of studies grouped by sets of keywords reflecting a particular issue or theme ('scenario') for potential inclusion in the systematic review.
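As an illustration only, the frequency and cross-tabulation step could be sketched in Python as follows. The map itself was coded and analysed in a Microsoft Excel worksheet, so this is not the method used in the project; the `studies` records, the field names and the helper functions `frequencies` and `crosstab` are all hypothetical.

```python
from collections import Counter

# Hypothetical coded studies: one record per study, one keyword per category.
# These records are invented for illustration and are not data from the map.
studies = [
    {"design": "observational", "country": "USA"},
    {"design": "observational", "country": "UK"},
    {"design": "experimental", "country": "USA"},
    {"design": "survey", "country": "Canada"},
]


def frequencies(records, field):
    """Count how many studies were coded with each keyword in one category."""
    return Counter(record[field] for record in records)


def crosstab(records, row_field, col_field):
    """Count studies for each (row keyword, column keyword) combination."""
    return Counter((record[row_field], record[col_field]) for record in records)


design_counts = frequencies(studies, "design")
design_by_country = crosstab(studies, "design", "country")

for design, count in design_counts.most_common():
    share = 100 * count / len(studies)
    print(f"{design}: {count} studies ({share:.0f}%)")
```

Cells of the cross-tabulation that hold several studies point to possible sub-sets ('scenarios') of the kind the research team discussed.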
Stakeholder topic prioritisation. Based on the peer review issues reported in the systematic map (e.g. bias, quality assurance, efficiency), the study context (e.g. country, type of research funder), and the study outcomes and process measures (e.g. funding decisions made, impact of the funded research), the research team identified three contrasting evidence scenarios for potential systematic review. The scenarios were devised to be relevant to stakeholders involved in research commissioning and management.
The three scenarios were tabulated and emailed to the NIHR Working Group on peer review prior to a face-to-face meeting to discuss the scenarios. The meeting was attended by three of the current authors and 11 members of the working group, who represented all of the different NIHR research commissioning centres. Each scenario was described and discussed in turn and stakeholders were given the opportunity to ask the research team for more information about the scenario and pertinent evidence from the map.
Following the meeting a summary of the discussion was circulated to the NIHR working group members not present at the meeting to seek any additional comments. There was no disagreement from any of these other group members on the prioritised scenario. Further detail on the stakeholder topic prioritisation process is reported in S2 Appendix.

Systematic review
Following the stakeholder consultation exercise the prioritised scenario question for the systematic review was: "Which innovations can improve the efficiency and/or effectiveness of the peer review of health research proposals?" A set of inclusion criteria for the systematic review was drafted to reflect this research question. The final criteria were: 1) Primary outcome evaluation studies or systematic reviews on the peer-review of research funding proposals in health published after 2005 (N.B. Systematic reviews were to be included as a source of references only); 2) Any peer review system structure innovation, with the exception of ranking or scoring of grant proposals (these were not considered relevant by the stakeholders); 3) At least one outcome measure relating to the efficiency of peer review (e.g. time required by peer reviewers; administrative costs of peer review; level of agreement between reviewers) and/or the effectiveness of peer review (e.g. ability of peer review to inform funding decisions; quality of the peer review process; scientific quality of the funded research and its impact on policy, practice and science).
The inclusion criteria were applied to the full text articles of studies already located in the systematic map. One reviewer applied the criteria and a second checked their decision, with any disagreements resolved through discussion. Studies meeting the criteria underwent data extraction and critical appraisal using a template devised for this study.
Due to the diverse range of potentially eligible studies, a number of critical appraisal instruments were considered for use. Any randomised controlled trials (RCTs) identified were to be appraised using the Cochrane risk of bias tool. [33] A modification to these criteria for non-randomised studies by the Cochrane Effective Practice and Organisation of Care (EPOC) group was also planned. However, this was not subsequently applied to any of the included studies due to the nature of their designs (see 'Results' below). Few existing instruments were considered appropriate for critically appraising the included studies and therefore we undertook a narrative appraisal of the quality of each study, commenting on key aspects of data collection and analysis and threats to internal validity. Data extraction and critical appraisal were performed by one reviewer (JS or GF) and checked by a second, with any disagreements resolved through discussion.
Given the heterogeneous nature of the included studies (the studies differed considerably in their designs and characteristics) it was not considered appropriate to conduct meta-analysis. A narrative synthesis was therefore conducted.

Systematic map results
A total of 1824 titles and abstracts was screened, and 198 of these were further screened as full text articles (Fig 1). The rate of agreement between the two reviewers at full-text screening was 90%, with 10% of the decisions requiring further discussion or referral to a third reviewer to reach a final decision. A total of 83 studies (described in 89 publications) met the inclusion criteria for the systematic map. [3, 6, 9, 10, 12, 15, 23-30, 32, 34-104] (S1 Database).
Most studies (72%) were published from 2005 onwards (49% from 2010 onwards). Fifteen countries were represented, with 49% of studies having been conducted in the USA. Other locations included Europe (23%, most frequently in Germany and the UK [each 6%]); Canada (11%), and Australia (9%). Of the study types, 61% were observational; 31% were based on surveys, interviews or focus groups; and 7% were experimental (of which 3 studies [4%] were randomised). In the majority of studies (73%) the setting was a national research council (e.g. the US National Institutes of Health; NIH). A smaller proportion of studies were based in charities or local funders. In around one third of the studies the peer reviewers were academics and/or health professionals, and in 10% they were lay people. In the majority of studies, however, the professional status of the peer reviewers was not reported. In some studies the peer reviewers were external to the funder and its funding decision panel, whilst in other cases the reviewers were also involved in making funding decisions. In many studies the extent of the reviewer's role (e.g. funding panel member) was not clearly defined.
A variety of peer review issues have been studied. We categorised these as relating to the process and structure of a peer review system, such as: scoring/ranking methods (12%); configurations of reviewers (e.g. the number needed or expertise required) (12%); or methods for identifying peer reviewers (7%); and peer reviewer processes, such as: bias in peer review (20%); predictive ability of peer review to identify research projects that will ultimately be successful (22%); consistency in reviewing scoring/judgements between reviewers (18%); and stakeholder opinions on the peer review process (30%).

Systematic review results
Eight studies met all the inclusion criteria for the systematic review and are summarised in Table 1. These evaluated a broad range of innovations which can be categorised as: shortening of grant proposals (alongside other peer review simplifications); [6,23,29] videoconferencing or teleconferencing approaches; [47, 60, 100] a Delphi consensus approach; [27] a video training module for peer reviewers; [95] and involvement of patients and other care-giving stakeholders to improve peer review. [57] Table 2 provides our critical appraisal of each study and Table 3 describes features of the studies which relate to their generalisability. S1 Table provides tabulated details of the study results, ordered by outcome and process measure. A structured narrative description of the methods and results of each study follows.

Shortening of grant proposals and simplified approaches. Short proposal with simplified scoring & accelerated peer review (Barnett et al [23]) Overview: A streamlined funding protocol for a new health services research stimulus grant awards programme, the Australian Centre for Health Services Innovation (AusHSI). The protocol comprised a short proposal form and accelerated peer review process. The aim was to reduce the content and time required by applicants and reviewers in order to provide rapid and transparent funding decisions.
Innovation method: In the protocol applicants are given four weeks to submit electronically a 1,200 word limit form describing the research question, methods, budget and expected impact on health services. Two members of the multi-disciplinary funding committee shortlist proposals and provide written feedback to unsuccessful applicants. Shortlisted applicants attend interviews within 10 days where they make a brief 10 minute presentation to the committee. The proposals are then ranked against a set of criteria and funding is allocated in order of rank until the pre-defined budget limit is met. Successful applicants are notified within two weeks. There is particular emphasis on providing feedback with unsuccessful applicants receiving written feedback and suggested improvements for resubmission. Method for assessing the innovation: The protocol was evaluated as part of a prospective quality improvement evaluation, with internal monitoring data collected at four cross-sectional time points (funding round 1 and 2 in 2012, and round 1 and 2 in 2013). Brief data are also reported on applicants' views and experiences of the proposal and peer review system.
Principal results and conclusions: The average time applicants spent preparing their proposals (described as a primary outcome) was seven days over the four funding rounds. The committee members spent on average 36 minutes (range 15-105 minutes) reviewing each proposal prior to the committee meeting, where the same reviewers spent 10 minutes discussing each proposal. The mean time from proposal submission to decision notification over the four rounds was seven weeks. Successful research teams were notified within two weeks of interview, which was a maximum of eight weeks after proposal submission. Selected quotations suggest applicants' views of the protocol were positive. Although for some applicants the 1,200 word limit was challenging, the reduction in unnecessary paperwork was appreciated. The feedback given to applicants was also appreciated and they found it enabled them to create better research proposals. In their discussion the authors suggest that, over time, the comprehensive feedback given to applicants who were not successful led to receipt of fewer proposals but of better quality. They conclude that this has improved efficiency for both applicants and reviewers. Key strengths and limitations: The innovation was used in a 'live' review round to allocate funding. Overall, limited details are given on the study methods and there is little detailed quantitative or qualitative analysis. The protocol evaluated here was for a relatively small-scale funding programme, with a funding award of $80,000 (AUD) for a maximum 12 month project. The findings may not necessarily be applicable to larger funding awards of longer duration.
Shorter proposal & smaller peer reviewer panel ± face-to-face meeting (Herbert et al [6]) Overview: A prospective evaluation of shortened research proposals and simplified peer review processes for the Project Grant scheme of the National Health and Medical Research Council (NHMRC) of Australia. The aim was to identify the agreement between the programme's official process and two new simplified processes, and the peer review cost savings for the simplified processes.
Innovation method: A simplified process where panel members reviewed a nine-page research plan and a two-page track record for each chief investigator. There were two types of simplified panels. One comprised seven members who reviewed proposals during a one and a half day face-to-face meeting (15 minutes discussion of each proposal). The other was a two person 'journal panel' (similar to peer review in an academic journal) who independently reviewed and scored proposals (without the two-page investigator track record). A simplified scoring process was used for both panels (definitely fund, possibly fund, or definitely do not fund). The topics of the proposals were classified as basic science or public health. Method for assessing the innovation: The project was described as a prospective parallel study. The authors compared the outcomes from the two simplified peer review panels in parallel with the existing official NHMRC programme. The study included a sample of 72 research proposals that had been submitted to the official programme and were undergoing assessment in parallel to the research study. The simplified process was initiated by the authors, whilst the official process was independent of the research study (though it was used for purposes of comparison). The official programme comprised 43 panels each with 12 members who meet for a week, and who discuss an average of 91 proposals each of around 100 pages long. Proposals are ranked using a weighted calculation using three criteria-based integer scores (from one to seven).
Principal results and conclusions: The time spent reviewing proposals was similar between the two simplified panels (3.6 to 3.9 hours per proposal on average) (NB. no comparison was made with the official process for this measure). There was near satisfactory agreement in funding decisions between simplified processes and the official processes (72%-74%). The authors estimate that the two simplified panels could result in cost-savings equivalent to A$2.1 to A$4.9 million (AUD) per year compared to the official process (based on costs for the year 2013, equating to a reduction in costs of between 34% and 78%), achieved through reductions in reviewers' time (and therefore salary costs). The journal panel achieved the highest savings, as no meeting expenses were incurred.
Key strengths and limitations: A strength of this study was that the innovation was evaluated in the context of a 'live' funding round of a national funder. In terms of limitations there were differences between the official programme and the two simplified processes in terms of how proposals were scored and therefore how funding decisions were made. This may potentially confound the comparison in funding agreement between the processes. The sample of proposals analysed may not be wholly generalisable as they were provided to the study by contacts of the authors, rather than being sampled on a representative basis.

Peer review panel (11 members) with short proposal vs standard 2-reviewer critique (Mayo et al [29]) Overview: A comparison of two methods of peer review on the probability of funding a research proposal: a panel of reviewers who ranked proposals; and a two peer reviewer method. This was a research project funding competition at a major Canadian university medical centre aimed at stimulating pilot clinical research from new investigators and teams. The intention was that they would later submit a full proposal to an external funding agency.
Innovation method: A committee of 11 experienced researchers and peer reviewers read 32 proposals (divided into two streams: new teams and new investigators) and ranked them, without using any explicit criteria (the 'RANKING' method). At the start of the committee meeting (before discussion of any results) it was decided that the top two ranked projects in each stream would be funded. For projects ranked three to eight the committee reviewed the ratings from an alternative two-reviewer method (the CLassic Structured Scientific In-depth two reviewer critique 'CLASSIC' method) and discussed the projects. Consensus was reached for the next three in each stream to be recommended for funding (thus a total of 10 proposals would be funded).
Method for assessing the innovation: The study was a prospective evaluation of two parallel models of peer reviewing. Under the CLASSIC method each proposal was assessed and scored by two assigned peer reviewers using a five point rating scale. The study measured agreement in proposal scoring rank and in the funding decision between the two methods, and the number of reviewers needed to arrive at a consistent ranking.
Principal results and conclusions: There was variability in the mean ranks assigned to each proposal between the two methods. The kappa value for agreement in funding decision (based on rank) was 0.36 (95% confidence interval 0.02 to 0.70), indicating poor agreement between the two methods. Of the 10 funded projects, the frequency of simulated reviewer pairings drawn from the RANKING committee in which the project failed to meet the funding cut-off ranged from 9% to 75%. Also, projects that were recommended for funding had a 9% to 60% probability of failing to meet the funding cut-off had only two reviewers been assigned (i.e. based on the CLASSIC method). It was estimated that at least 10 reviewers would be needed for optimal agreement in funding of proposals. The authors call into question the appropriateness of using the two peer reviewer assessment of research proposals.
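For readers unfamiliar with the kappa statistic reported above, the following Python sketch shows how Cohen's kappa corrects raw agreement between two sets of funding decisions for the agreement expected by chance. It is a generic textbook formulation, not the study authors' analysis; the function name `cohen_kappa` and the example decisions are hypothetical and are not data from the study.

```python
def cohen_kappa(decisions_a, decisions_b):
    """Cohen's kappa for two equal-length lists of categorical decisions."""
    if len(decisions_a) != len(decisions_b) or not decisions_a:
        raise ValueError("Both decision lists must be non-empty and equal in length")
    n = len(decisions_a)
    # Observed agreement: share of proposals given the same decision by both methods.
    observed = sum(a == b for a, b in zip(decisions_a, decisions_b)) / n
    # Chance agreement: product of the marginal proportions, summed over categories.
    categories = set(decisions_a) | set(decisions_b)
    expected = sum(
        (decisions_a.count(c) / n) * (decisions_b.count(c) / n) for c in categories
    )
    if expected == 1:
        raise ValueError("Kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Invented example: funding decisions for four proposals under two review methods.
method_one = ["fund", "fund", "reject", "reject"]
method_two = ["fund", "reject", "reject", "reject"]
kappa = cohen_kappa(method_one, method_two)
print(f"kappa = {kappa:.2f}")
```

In this invented example the two methods agree on three of four proposals (75%), but half of that agreement would be expected by chance, which is why kappa is lower than the raw agreement rate.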
Key strengths and limitations: The innovation was used in a 'live' review round to allocate funding. The study simulated the percentage of possible reviewer pairings (drawn from the 11 member committee) in which a proposal failed to meet the funding cutoff. This was done to mimic the standard practice of (approximate) random allocation of pairs of reviewers to proposals. However, in actuality these proposals were not prospectively distributed amongst pairs of reviewers for review and ranking. Furthermore, ranking criteria differed between groups, confounding comparisons, and the sample of proposals was small.
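The pairing simulation described above can be illustrated with a minimal Python sketch. It enumerates every possible pair of reviewers from a committee, ranks proposals by the pair's mean rank, and records how often each proposal falls outside the funded set. An 11 member committee yields 55 such pairs. The function name `failure_rate_by_pair`, the example ranks and the tie-breaking rule are assumptions made for illustration; the study's own procedure may have differed.

```python
from itertools import combinations


def failure_rate_by_pair(ranks, n_funded):
    """Share of reviewer pairs under which each proposal misses the funding cut-off.

    ranks: dict mapping reviewer -> dict mapping proposal -> rank (1 = best).
    n_funded: number of top-ranked proposals that would be funded.
    """
    reviewers = sorted(ranks)
    proposals = sorted(next(iter(ranks.values())))
    pairs = list(combinations(reviewers, 2))
    failures = {p: 0 for p in proposals}
    for first, second in pairs:
        mean_rank = {p: (ranks[first][p] + ranks[second][p]) / 2 for p in proposals}
        # Assumption: ties in mean rank are broken alphabetically by proposal name.
        ordered = sorted(proposals, key=lambda p: (mean_rank[p], p))
        funded = set(ordered[:n_funded])
        for p in proposals:
            if p not in funded:
                failures[p] += 1
    return {p: failures[p] / len(pairs) for p in proposals}


# Invented ranks from three reviewers for three proposals (not study data).
example_ranks = {
    "R1": {"A": 1, "B": 2, "C": 3},
    "R2": {"A": 1, "B": 3, "C": 2},
    "R3": {"A": 3, "B": 1, "C": 2},
}
rates = failure_rate_by_pair(example_ranks, n_funded=1)
for proposal, rate in rates.items():
    print(f"{proposal}: misses the cut-off in {rate:.0%} of pairings")
```

As the limitation above notes, a simulation of this kind only re-uses ranks already given by committee members; it does not show how independently assigned pairs of reviewers would have scored the proposals.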
Videoconferencing or teleconferencing approaches. Teleconference-based peer review meetings (Gallo et al; Carpenter et al [47,60]) Overview: Retrospective comparison of two scientific peer review processes used by the American Institute of Biological Sciences (AIBS) for an anonymous federal funding programme. Specifically, effects on the peer review process and outcomes were compared for face-to-face meetings (held up to 2010) and teleconference meetings (introduced in 2011) [60]. Part of the study focused on examining the effects of discussion on peer review outcomes. [47] Innovation method: Peer reviewers met by teleconference and presented the strengths and weaknesses for each grant proposal using specific review criteria. Each proposal was then discussed by a panel, comprising 7-12 subject matter experts plus one or more 'consumer' reviewers, guided by an AIBS chairperson to ensure consistency and fairness. Reviewers then submitted their final scores using an online system. The process was repeated for each proposal, and an overall summary paragraph was prepared by assigned reviewers for each proposal, showing the panel's evaluation and recommendations.
Method for assessing the innovation: Case-control type study comparing two years of teleconference peer review meetings (2011-2012) against two years of face-to-face meetings (2009)(2010). Face-to-face meetings appear to have had similar structure to teleconferences except that reviewers had to travel to the meeting (usually in a hotel) to participate. Outcomes included: the average time spent discussing each proposal; reviewer agreement estimated using the intra-class correlation coefficient (ICC); the effect on the funding decision of prepost meeting score changes after discussion (indicated by the proportion of proposals that crossed a theoretical funding threshold); and reviewers' views on the panel discussions (surveyed at the end of each meeting using a numerical Likert-type scale).
Principal results and conclusions: Average review time per proposal was slightly shorter for teleconferences (20.0 minutes) than face-to-face meetings (23.9 minutes) (ANOVA: F 3,61 = 14.54; p<0.001). Reviewer agreement ranged from ICC = 0.84 to 0.87 across all years, with no clear difference between meeting settings. Slightly more (12.7%) proposals assessed in teleconferences than in face-to-face meetings (10.0%) crossed the funding threshold either way after discussion. After peer review discussion, 19.8% of proposals scored in teleconferences and 15.4% in face-to-face meetings fell within the fundable score range. The authors' conclusion that most of the outcomes were unaffected by the review setting appears reasonable, although it is unclear how important the reduced discussion time in teleconferences is and unclear whether the reviewers reported any limitations to the process.
Key strengths and limitations: The innovation and comparator were used in 'live' review rounds of a national funder to allocate funding, with both approaches replicated in two years. Sample size was relatively large (circa 1600 proposals in total; range 291 to 669 per meeting). The retrospective case-control design is a limitation, but reviewer demographic characteristics appear to have been similar across the groups and years. Uncertainties are that the identity of the 'consumer' reviewers is unclear, and that only a limited set of reviewers' views is reported, making it unclear how representative they are.

WebEx-based virtual peer review meetings (Vo et al [100])
Overview: Evaluation of the first six unplanned virtual review sessions conducted during the US 2012 hurricane season at the Agency for Healthcare Research and Quality (AHRQ), to assess their effects on review outcomes and to compare them with five face-to-face peer-review sessions.
Innovation method: Virtual online meetings of peer reviewers using WebEx software, which had: audio; high-definition video; real-time content sharing; and the capability to feed up to seven simultaneous webcam videos. A 30-minute basic training session on use of WebEx software was provided. Four Study Section meetings and two Special Emphasis Panel meetings were conducted. In total, 110 reviewers participated, ranging from 7 to 24 per section or panel. Of 194 total grant proposals reviewed, 128 were discussed, ranging from six to 34 proposals per session. Low-scoring proposals were not discussed so as to give reviewers ample time to concentrate on those with higher scores.
Method for assessing the innovation: Retrospective case-control type study which compared the six unplanned virtual grant proposal review sessions held in October 2012 against five face-to-face review sessions held in June 2012. The time taken for peer review and the cost of peer review were recorded. Views of reviewers on the advantages and disadvantages of the WebEx software and review process were obtained using a 10-item questionnaire.
Principal results and conclusions: The mean time spent discussing each proposal was 20 minutes for virtual review sessions and 26 minutes for face-to-face sessions, and the average meeting lengths were 430 minutes and 587 minutes respectively. This gave costs per reviewer per day of US$324 and US$1314 respectively (a reduction in costs of 76%). The authors concluded that the virtual review process is a replicable and low cost method of review, but this is subject to the proviso that there are numerous uncertainties around the methods (Table 2). Furthermore, reviewers' responses to questionnaires indicated that 26% experienced technical difficulties and 33% would not use virtual review again.
Key strengths and limitations: The innovation and comparator were used in 'live' review rounds of a national funder to allocate funding, with five or six replicate sessions analysed. However, no information about the face-to-face sessions is provided so it is unclear whether these reflected usual AHRQ practice and whether they had comparable proposals, reviewers, and overall processes to the virtual review sessions. There is also uncertainty around several aspects of the virtual peer review process which were not reported, and whether all costs had been accounted for, which limits generalisability.
Other approaches.

Modified Delphi process for selecting 'innovator' grants (Holliday and Robotin [27])
Overview: 'Modified Delphi' process, conducted online by the Cancer Council of New South Wales (CCNSW, Australia) for selecting 'innovator' grants, based on proposals limited to six pages. The approach was developed because most potential cancer expert peer reviewers were listed as investigators, or had conflicts to declare. This made it inappropriate to use traditional peer review in which local experts are invited as peer reviewers. The grants aimed to support innovative research unlikely to be considered by traditional funding bodies.
Innovation method: The process was applied to the 10 best proposals received and involved five non-conflicted experts who held pancreatic cancer research grants in another country (the US). Three Delphi rounds were held over a 16-day period in March 2009 to score: (1) scientific merit (clarity, measurability of the endpoint, scientific quality, originality, adequacy of the study design to achieve the stated goal, whether the potential impact would warrant funding); (2) innovativeness; and (3) level of risk. At the end of each round scores were converted to ranks and the two lowest-ranking proposals at each round were excluded. The four remaining proposals were funded.
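The round-by-round elimination described above can be sketched as follows. This is a hypothetical illustration only: the study summary does not report how reviewer scores were combined before ranking, so the sketch assumes a simple mean of the reviewers' scores (higher = better), and the function name, proposal identifiers and scores are invented for the example.

```python
def delphi_elimination(round_scores, drop_per_round=2):
    """Run successive rounds, dropping the lowest-ranked proposals each round.

    round_scores: one dict per round, mapping proposal id -> list of reviewer
    scores (higher = better). Scores for already-excluded proposals are ignored.
    Returns the sorted ids of the proposals remaining after the final round.
    """
    if drop_per_round <= 0:
        raise ValueError("drop_per_round must be positive")
    remaining = set(round_scores[0])
    for scores in round_scores:
        # Assumption: combine reviewers' scores by taking the mean
        mean = {p: sum(scores[p]) / len(scores[p]) for p in remaining}
        # Convert scores to ranks: best mean first, ties broken by proposal id
        ranked = sorted(remaining, key=lambda p: (-mean[p], p))
        remaining = set(ranked[:-drop_per_round])
    return sorted(remaining)


# Hypothetical scores: 10 proposals, 5 reviewers, same scores in all 3 rounds
scores = {f"P{i:02d}": [i] * 5 for i in range(1, 11)}
funded = delphi_elimination([scores, scores, scores])
```

With 10 proposals, three rounds and two exclusions per round, four proposals remain, matching the number funded in the study.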
Method for assessing the innovation: Single-group prospective study in which reviewer agreement was assessed at the end of each round. Reviewers were provided with a table of deidentified scores and an overall ranking of proposals and were asked to advise whether they wished to proceed to the next round, or raise any objections. On completion of the Delphi process feedback was sought from the reviewers on the process, its usefulness, and possible alternatives or modifications (methods for obtaining feedback are not explicitly reported).
Principal results and conclusions: The authors' conclusion was that "the modified Delphi process was an efficient, transparent and equitable method of reviewing novel grant proposals in a specialised field of research, where no local expertise was available" (p. 225). Reviewer feedback indicated that additional discussion would be helpful, suggesting that the innovation may benefit from further modification.
Key strengths and limitations: The innovation was used in a 'live' review round of a national funder to allocate funding. The process was relatively simple and quick, although it was only tested in one small group of five reviewers, and assessed only 10 proposals. As such, the generalisability is likely to be limited to very small-scale grant programmes or programmes where a subset of the 'best' proposals has already been identified for further prioritisation. Further research would be needed to confirm the findings and clarify whether the method could accommodate a larger number of reviewers and proposals. Several aspects of the methodology are unclear, particularly relating to the assessment of reviewer feedback.
Inclusion of patient-centred stakeholders in peer review meetings (Fleurence et al [57])
Overview: The study explored contributions of scientist, patient, and stakeholder reviewers (e.g. nurses, physicians, other caregivers, patient advocates) to the merit-review process of the Patient-Centred Outcomes Research Institute (PCORI) in its inaugural funding round. The rationale was that using scientists alone might bias against novelty, and could lead to selection of proposals similar to the scientists' interests.
Innovation method: The two-phase inaugural PCORI merit-review process. In phase one (no discussion), proposals (n = 480) were reviewed by three scientific reviewers who submitted their reviews online. Reviewers received webinar training in PCORI's review process and criteria. Proposals with average scores in the top third (n = 152) moved to phase two. Proposals in phase two were first given "pre-discussion" scores by two scientists (who did not participate in phase one), one patient and one stakeholder. These four lead reviewers had access to phase one critiques and scores. Patient and stakeholder reviewers based their overall score on three of eight PCORI merit criteria (innovation and potential for improvement; patient centeredness; patient and stakeholder engagement). Proposals in the top two-thirds based on the four lead reviewers' scores (n = 98) were then given a final "post-discussion" score by each member of a 21-person panel (including revised scores from the lead reviewers) during a face-to-face meeting. Lead reviewer scores were available to all reviewers during the discussion. The 25 proposals with the best average post-discussion scores were funded. In total 59 scientists, 21 patients and 31 stakeholders participated in phase two.
Method for assessing the innovation: Single-group study. Agreement between scientist scores and patient and stakeholder scores was assessed before and after the in-person panel discussions in phase two. The effect on the funding decision of using the two-phase (scientist, patient and stakeholder) review process rather than a one-phase (scientist-only) process was assessed by comparing proposal rankings after each phase. Web-based surveys and focus groups were used to elicit reviewers' views.

Principal results and conclusions:
Of the 25 proposals with the best scores after phase two, together with the experimental setting which did not involve assessment of 'real' proposals or making any funding decisions.

Discussion
Our study is the most detailed systematic description of the characteristics of research into the peer review of funding proposals in the health sciences to date. The systematic map has revealed a burgeoning area of investigation, with just under half the studies in the map having been published since 2010. The topics investigated were diverse and the studies were mainly observational in design, typically comprising longitudinal or cross-sectional studies, or retrospective analyses of data collected during funding proposal calls. Experimental studies were very rare, which may demonstrate a preference to study peer review within the context of real world funding programmes, for example on grounds of feasibility, potentially at the expense of internal validity.
Our systematic review included a broad range of innovations and assessed their impact on various measures of effectiveness and efficiency. The majority of the outcomes measured represent ways to make peer review (as well as the research funding process in general) more efficient. The studies showed that innovations could reduce the time spent on peer review and the costs incurred, in varying magnitudes. For example, in one retrospective, case-control-type study, use of teleconferences compared to face-to-face meetings led to a slight reduction in discussion times of around four minutes per proposal, though the overall importance of this reduction was not quantified in terms of changes in costs, or perceived significance. [47,60] In another retrospective, case-control-type study, use of internet-based video conferences compared to face-to-face meetings resulted in shorter discussion times per proposal (by around six minutes on average) and shorter average meeting lengths (by around 2.5 hours). [100] This was associated with an estimated cost saving of around $1000 (US dollars) per reviewer per day (a 76% reduction), which could be considered an important efficiency improvement. The peer review time per proposal was similar between two variants of an innovation that included shorter proposal forms and smaller peer review panels (3.6 to 3.9 hours), assessed in a prospective parallel group study. [6] The authors of this study estimated that use of these simplified panels could result in cost savings of between $2.1 million and $4.9 million (Australian dollars) per year compared to the standard process of a larger panel and a longer proposal form (equating to a reduction in costs of between 34% and 78%). Again, this could represent substantial savings to funders, particularly those that operate at a large scale.
A prospective uncontrolled study [23] which evaluated a simplified process (comprising short proposal forms with accelerated peer review) reported relatively short peer review times per proposal (an average of 36 minutes) and an average time from proposal submission to funding outcome notification of between six and eight weeks. This suggests that accelerated peer review can enable timely funding decisions in certain contexts. The study also provided comprehensive feedback to applicants (both successful and unsuccessful) on how their proposals could be improved, and the authors noted that over time they received fewer proposals but those submitted were of better quality. However, the trade-off between the costs to funders (in terms of time and resources required to provide detailed feedback to applicants) and the potential benefits to funders and applicants (in terms of production and submission of fewer, better quality proposals) was not fully quantified by this study. Provision of detailed feedback to applicants has the potential to improve the efficiency of the research funding system as a whole, and is an area for future research to investigate.
A number of the studies included in the systematic review measured inter-reviewer agreement, in terms of scores and in funding decisions, with varied findings. For example, good reviewer agreement was found in the study which compared peer review by teleconference discussions with face-to-face meetings, with ICCs ranging between 0.84 and 0.87. [47,60] The authors suggested that this, and the absence of other differences in review outcomes between the two approaches, supports the case for moving to teleconferences. In contrast, a study which compared ranking of proposals by a committee of 11 reviewers against ranking of proposals by two peer reviewers found poor reviewer agreement in ranking scores (and therefore decisions to fund) as measured by a kappa score of 0.36. [29] Lack of good agreement might not necessarily be a limitation of peer review if this is offset by other efficiency benefits such as time and cost reductions. However, none of the studies included in our systematic review measured all of these outcomes, so possible trade-offs among different aspects of efficiency cannot be ascertained currently.
There were mixed findings across the studies indicating perceived benefits but also drawbacks of the innovations. For example, in the study in which patients and care-giving stakeholders peer reviewed funding proposals alongside scientific reviewers, scientists appreciated the perspectives offered by patients and stakeholders and there was recognition of a collegial and respectful process. [57] However, there was concern from scientists about the level of technical expertise of some non-scientist reviewers. The study comparing internet-based video conferences to face-to-face meetings [100] reported both positive and negative views expressed by peer reviewers. Perceived advantages included less travel, decreased costs, and faster reviews. However, some technical problems were experienced, and there was concern that video-conferences might impair interaction among reviewers and result in less thorough reviews. It is important that any implementation of these peer review innovations takes into account the limitations, and future evaluations should thoroughly evaluate process issues to facilitate optimal planning and execution of peer review activity.
Our findings can be contextualised with those of a non-systematic literature review by Guthrie et al. [105] published in 2017, which included 105 empirical articles on the effectiveness and burden of peer review for grant funding. That review had a broader focus than our systematic review, covering issues such as bias and fairness, reliability, timeliness of peer review, and the burden of peer review on the research system as a whole. It also included studies of peer review in disciplines other than health sciences. The review included many of the studies included in our systematic review, but described them in less detail. Notably, Guthrie et al.'s review used a different conceptualisation of effectiveness and efficiency from ours: 'effectiveness' is a multi-dimensional concept that incorporates factors such as whether peer review selects the 'best' research, and whether it is reliable, fair, accountable, timely and has the confidence of key stakeholders. The 'burden' of peer review on the research system is a concept that incorporates the time, resources and costs expended in the production and review of grant applications. 'Efficiency' is the trade-off between effectiveness and burden. Thus, an efficient peer review system is one that has one or more markers of effectiveness whilst being low in system burden. Guthrie et al. [105] found there was a lack of evidence about the overall efficiency of peer review of grant applications. In terms of markers of effectiveness they found evidence to indicate a bias against innovative research, and evidence that peer review is a poor predictor of future research performance. They found some evidence to suggest a high burden on applicants, though much of the research evidence in their review has focused on reducing burden on funders and reviewers.
Applying Guthrie et al.'s conceptualisation to our systematic review results, there is evidence to show a reduction in burden for funders (which we refer to as efficiency in our review). However, evidence for the effectiveness of peer review in our systematic review is limited to whether innovations which aim to reduce peer review burden can lead to the same research applications being funded as would have been funded under existing (more burdensome) peer review systems. The studies in our systematic review did not assess other markers of effectiveness such as the predictive ability to identify the best research. Thus, we cannot conclude that there is strong evidence to support improving the 'efficiency' (as defined by Guthrie et al. [105]) of peer review of grant applications, but we can conclude there is evidence (albeit with methodological limitations) on burden reduction.
Our research used systematic methods to identify, collate, appraise and analyse the evidence, employing standard approaches in evidence synthesis. [106, 107] Extensive internet searching was conducted to identify material not formally published in academic journals. Quality assurance procedures, such as independent screening and data checking, were used where possible to minimise bias and error. However, there were some potential limitations of this study. We could not check the reference lists of all studies included in the map to identify any additional relevant studies, though we did check the reference lists of all studies included in the systematic review. Not all of the keywords applied to studies included in the map were checked by a second reviewer. However, as mentioned above, following checking of a random sample of studies the level of reliability between reviewers was considered sufficient as few amendments were necessary. We restricted inclusion to studies published in the English language. It is unknown whether there is a significant pool of relevant evidence published in other languages. The scope of our evidence synthesis is limited to studies of peer review of research proposals in health; we did not investigate studies of peer review of research proposals in other disciplines. Whilst it is possible that findings from studies in non-health disciplines could also have relevance to health research, a substantial effort would be required to synthesise the evidence across multiple disciplines. Our findings suggest, however, that even within health research the studies had limited generalisability.
A strength of this evidence synthesis was the close consultation with stakeholders throughout the project, and in particular their role in setting the focus for the systematic review. [21] It should be reiterated that the scope of the systematic review was to focus on peer review innovations evaluated for effectiveness and efficiency. Only a small proportion (around 10%) of the evidence from the map met the inclusion criteria for the review, meaning that there remains a larger pool of evidence that could be included in future systematic reviews focusing on other aspects of peer review. Also of note, our systematic review included studies of innovations, which we defined as being new activity distinct from existing practice (or in addition to existing practice). Some of the literature evaluated only what appeared to be existing peer review practice, and useful information could be gleaned from these studies in further reviews.

Conclusions
This project has found that there is increasing international research activity into the peer review of health research funding. Overall, it appears that simplifying peer review by shortening proposals, using smaller panels of reviewers and accelerating the process could reduce the time needed for review, speed up the general process, and reduce costs. However, this might come at the expense of peer review quality, a key aspect that has not been fully assessed. Virtual peer review using videoconferencing or teleconferencing appears promising for reducing costs by avoiding the need for reviewers to travel, but again any consequences for the quality of the peer review itself have not been adequately assessed. All of the eight studies included in the systematic review were relatively weak methodologically or had variable generalisability, which limits how much emphasis should be placed on their results.
Given the methodological limitations of the evidence included in this systematic review it is not possible to recommend direct implementation of these innovations currently. However, many of them appear promising based on current evidence and could be adapted as necessary by funders and subjected to evaluation. Future evaluations should be conducted to a sufficient standard, to ensure high internal and external validity. In particular, we have identified a number of measures of generalisability of studies which we recommend that evaluators incorporate into the design and reporting of their work (Table 3). Where feasible, experimental evaluations, including RCTs, should be conducted, and these should include economic evaluation to assess the costs of peer review innovations, as this is lacking in the currently available evidence.