Assessing the potential of a Bayesian ranking as an alternative to consensus meetings for decision making in research funding: A case study of Marie Skłodowska-Curie actions

doi:10.1371/journal.pone.0317772

Fig 1.

In the top layer of the figure, the MSCA funding allocation process is represented for ten hypothetical proposals: the individual evaluation reports prepared by three experts for each proposal are followed by a consensus report prepared during the consensus meeting (A1).

The consensus total scores are then used to compile the final ranking (A2). The final ranking, established per scientific panel and taking the available budget into account, determines which proposals are included in each of the three funding ranking groups: the main list, the reserve list and the rejected proposals (A3). Similarly, the bottom layer represents the Bayesian ranking (BR) process for the same hypothetical proposals: the individual total scores from the experts’ individual evaluation reports are fed into a Bayesian hierarchical model of the evaluation process. Various outputs can be extracted from this model, two of which are predictions of the consensus reports for each of the ten proposals (B1) and the expectations and distributions of the final ranks (B2). Combining the BR output with the available budget gives the BR recommendations, which allocate each proposal to the accepted, lottery or rejected group (B3).

Table 1.

Number of proposals evaluated by the selected panels (Mathematics, Social Science/Humanities and Life Sciences) for the two selected calls (2015 and 2019), together with the percentage of successful and unsuccessful proposals, and proposals on the reserve list.

The number of experts per panel and the average number of proposals evaluated by each expert, with the minimum and maximum, are also reported. The median budget requested by the proposals in each panel for the two calls is shown together with the available budget that can be distributed. Soc. Sci./Hum.: Social Sciences and Humanities, Av: Average, N: Number.

Fig 2.

Distributions of the scores after consensus meetings for the 3 different panels for the calls in 2015 and 2019.

For each panel, the first plot represents the distribution of the consensus report, while the other three plots show the distribution of the consensus scores for the different criteria. The total number of proposals evaluated in each call is also shown. STE: scientific excellence, IPT: impact, IPL: implementation.

Fig 3.

Social Sciences and Humanities panel of the 2015 Call: Bayesian Ranking and recommendations.

The expected ranks are represented with their 50% credible intervals for the three panels. The provisional funding line (dashed blue line) is defined by allocating the available budget to the best ranked proposals until there is not enough funding for the next proposal. Those proposals with their 50% credible interval crossing the provisional funding line are recommended to be in the lottery group.
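
The allocation rule in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `funding_groups`, its parameters, and the choice to place the funding line halfway between the last fundable rank position and the next one are all hypothetical. Lower ranks are taken to be better.

```python
from typing import List, Tuple


def funding_groups(
    expected_rank: List[float],
    ci_low: List[float],
    ci_high: List[float],
    budget_requested: List[float],
    available_budget: float,
) -> Tuple[float, List[str]]:
    """Assign each proposal to 'accepted', 'lottery' or 'rejected'.

    Assumption (not from the source): the provisional funding line sits
    halfway between the last rank position that can be funded and the next.
    """
    # Order proposals from best (lowest expected rank) to worst.
    order = sorted(range(len(expected_rank)), key=lambda i: expected_rank[i])

    # Fund proposals in rank order until the next one no longer fits.
    remaining = available_budget
    n_funded = 0
    for i in order:
        if budget_requested[i] > remaining:
            break
        remaining -= budget_requested[i]
        n_funded += 1

    funding_line = n_funded + 0.5

    groups = []
    for low, high in zip(ci_low, ci_high):
        if low < funding_line < high:
            # The credible interval crosses the provisional funding line.
            groups.append("lottery")
        elif high <= funding_line:
            groups.append("accepted")
        else:
            groups.append("rejected")
    return funding_line, groups
```

With four hypothetical proposals each requesting 100 units and 250 units available, two proposals can be funded, so the line falls at 2.5 and any proposal whose 50% interval straddles 2.5 lands in the lottery group.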

Table 2.

Bayesian Ranking recommendations: number of proposals recommended to be part of the accepted or rejected proposals, or of the lottery group, for the selected panels (Life Sciences, Mathematics and Social Science/Humanities) and the two selected calls (2015 and 2019). N: Number.

Fig 4.

Scatter plots comparing the consensus report and the prediction from the BHM for each panel.

Two outlier proposals in the Life Sciences panels, highlighted in dark red, are discussed in more detail in the text.

Table 3.

Individual evaluation reports and consensus report given to two specific outlier proposals, highlighted in Fig 4. STE: scientific excellence, IPT: impact, IPL: implementation.

Fig 5.

95% credible intervals of the predictions of the consensus report from the individual evaluation reports.

Note that even though the y-axes only start at 40, the scores could have been on a scale from 0 to 100. This further shows the skewness of the evaluation scores.

Table 4.

The share of proposals with consensus report within, above or below the 95% credible intervals of what would have been expected given the individual evaluation reports using the Bayesian hierarchical model.
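
As a rough illustration of how such shares could be tabulated, the sketch below compares each consensus score with the bounds of its 95% credible interval. The function name `interval_shares` and the input layout (three parallel lists) are assumptions made for this example; the source does not describe the authors' code.

```python
from collections import Counter
from typing import Dict, List


def interval_shares(
    consensus: List[float], lower: List[float], upper: List[float]
) -> Dict[str, float]:
    """Share of proposals whose consensus score is within, above or below
    the credible interval predicted from the individual reports."""
    labels = []
    for score, lo, hi in zip(consensus, lower, upper):
        if score < lo:
            labels.append("below")
        elif score > hi:
            labels.append("above")
        else:
            labels.append("within")
    counts = Counter(labels)
    n = len(labels)
    # Counter returns 0 for labels that never occurred.
    return {key: counts[key] / n for key in ("within", "above", "below")}
```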

Fig 6.

The final and official ranking compared to the rank expected using the Bayesian hierarchical model with its 95% credible intervals for all three panels in 2015.

Note that even though the y-axes only start at 40, the scores could have been on a scale from 0 to 100. This further shows the skewness of the evaluation scores.

Table 5.

The share of proposals with official final ranking better or worse than what would have been expected given the individual evaluation reports used for the Bayesian ranking.

Fig 7.

Percentage of agreement between Bayesian Ranking and the official ranking for different group sizes.

For example, do the BR and official ranking agree on the 10% best ranked proposals? This is done for each panel in both call years.
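
One possible way to compute such an agreement percentage is sketched below. The exact definition used in the paper is not given in this caption, so this is an assumption: agreement is taken as the overlap between the top group of each ranking, divided by the group size. The function name `top_share_agreement` and its parameters are hypothetical, and lower ranks are taken to be better.

```python
from typing import List


def top_share_agreement(
    br_rank: List[float], official_rank: List[float], share: float
) -> float:
    """Percentage of proposals in the top `share` of the Bayesian ranking
    that are also in the top `share` of the official ranking."""
    n = len(br_rank)
    # Group size: at least one proposal, rounded to the nearest integer.
    k = max(1, round(n * share))

    # Indices of the k best-ranked proposals under each ranking.
    top_br = set(sorted(range(n), key=lambda i: br_rank[i])[:k])
    top_official = set(sorted(range(n), key=lambda i: official_rank[i])[:k])

    return 100.0 * len(top_br & top_official) / k
```

Calling the function over a grid of shares (for example 0.1, 0.2, and so on) for each panel and call year would give one curve per panel.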

Table 6.

Agreement for the panels in 2015, with the counts in the different funding ranking groups.

Table 7.

Agreement for the panels in 2019, with the counts in the different funding ranking groups.
