Fig 1.
In the top layer of the figure, the MSCA funding allocation process is represented for ten hypothetical proposals: the individual evaluation reports prepared by three experts for each proposal are followed by a consensus report prepared during the consensus meeting (A1).
The consensus total scores are then used to compile the final ranking (A2). The final ranking, established per scientific panel and taking the available budget into account, determines which proposals are included in each of the three funding ranking groups: the main list, the reserve list and the rejected proposals (A3). Similarly, the bottom layer represents the Bayesian ranking (BR) process for the same hypothetical proposals: the individual total scores from the experts’ individual evaluation reports are used in a Bayesian hierarchical model of the evaluation process. From this model, various outputs can be extracted, two of which are predictions of the consensus reports for each of the ten proposals (B1) and the expectations and distributions of the final rankings (B2). Combining the BR output with the available budget gives the BR recommendations, which allocate each proposal to the accepted, lottery or rejected group (B3).
Table 1.
Number of proposals evaluated by the selected panels (Mathematics, Social Science/Humanities and Life Sciences) for the two selected calls (2015 and 2019), together with the percentage of successful and unsuccessful proposals, and proposals on the reserve list.
The number of experts per panel and the average number of proposals evaluated by each expert, with the minimum and maximum, are also given. The median budget requested by the proposals in each panel for the two calls is shown together with the available budget that can be distributed. Soc. Sci./Hum.: Social Sciences and Humanities, Av: Average, N: Number.
Fig 2.
Distributions of the scores after consensus meetings for the 3 different panels for the calls in 2015 and 2019.
For each panel, the first plot represents the distribution of the consensus report, while the other three plots show the distribution of the consensus scores for the different criteria. The total number of proposals evaluated in each call is also shown. STE: scientific excellence, IPT: impact, IPL: implementation.
Fig 3.
Social Sciences and Humanities panel of the 2015 Call: Bayesian Ranking and recommendations.
The expected ranks are represented with their 50% credible intervals for the three panels. The provisional funding line (dashed blue line) is defined by allocating the available budget to the best-ranked proposals until there is not enough funding left for the next proposal. Proposals whose 50% credible interval crosses the provisional funding line are recommended for the lottery group.
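The rule described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Proposal` class, the function names, the example values, and in particular the choice to place the funding line halfway between the last fundable rank and the next one (`n_funded + 0.5`) are assumptions made for the sketch. Proposals whose interval lies entirely above or below the line are assigned to the accepted or rejected group, following the three groups named in Fig 1.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    name: str
    expected_rank: float      # posterior expected rank, 1 = best
    ci_low: float             # lower bound of the 50% credible interval of the rank
    ci_high: float            # upper bound of the 50% credible interval of the rank
    requested_budget: float


def provisional_funding_line(proposals, available_budget):
    """Walk down the expected ranking and fund proposals until the next one
    no longer fits in the budget. The line is placed (by assumption) halfway
    between the last fundable position and the next one."""
    ordered = sorted(proposals, key=lambda p: p.expected_rank)
    spent = 0.0
    n_funded = 0
    for p in ordered:
        if spent + p.requested_budget > available_budget:
            break
        spent += p.requested_budget
        n_funded += 1
    return n_funded + 0.5


def recommend(proposals, available_budget):
    """Assign each proposal to 'accepted', 'lottery' or 'rejected'."""
    line = provisional_funding_line(proposals, available_budget)
    groups = {}
    for p in proposals:
        if p.ci_low <= line <= p.ci_high:
            groups[p.name] = "lottery"    # interval crosses the funding line
        elif p.ci_high < line:
            groups[p.name] = "accepted"   # interval entirely better than the line
        else:
            groups[p.name] = "rejected"   # interval entirely worse than the line
    return groups


# Hypothetical values, for illustration only.
example = [
    Proposal("A", 1.2, 1.0, 2.0, 100.0),
    Proposal("B", 2.1, 1.5, 3.0, 100.0),
    Proposal("C", 3.0, 2.0, 4.0, 100.0),
    Proposal("D", 4.2, 3.8, 5.0, 100.0),
]
print(recommend(example, available_budget=250.0))
```

With a budget of 250 and requests of 100 each, only two proposals fit, so the line in this sketch falls between ranks 2 and 3; any proposal whose interval spans that position is sent to the lottery group.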
Table 2.
Bayesian Ranking recommendations: number of proposals recommended to be part of the accepted or rejected proposals, or of the lottery group, for the selected panels (Life Sciences, Mathematics and Social Science/Humanities) and the two selected calls (2015 and 2019). N: Number.
Fig 4.
Scatter plots comparing the consensus report and the prediction from the Bayesian hierarchical model (BHM) for each panel.
Two outlier proposals in the Life Sciences panels, highlighted in dark red, are discussed in more detail in the text.
Table 3.
Individual evaluation reports and consensus report given to two specific outlier proposals, highlighted in Fig 4. STE: scientific excellence, IPT: impact, IPL: implementation.
Fig 5.
95% credible intervals of the predictions of the consensus report from the individual evaluation reports.
Note that even though the y-axes only start at 40, the scores could have been on a scale from 0 to 100. This further shows the skewness of the evaluation scores.
Table 4.
The share of proposals with consensus report within, above or below the 95% credible intervals of what would have been expected given the individual evaluation reports using the Bayesian hierarchical model.
Fig 6.
The final and official ranking compared to the rank expected using the Bayesian hierarchical model with its 95% credible intervals for all three panels in 2015.
Note that even though the y-axes only start at 40, the scores could have been on a scale from 0 to 100. This further shows the skewness of the evaluation scores.
Table 5.
The share of proposals with official final ranking better or worse than what would have been expected given the individual evaluation reports used for the Bayesian ranking.
Fig 7.
Percentage of agreement between Bayesian Ranking and the official ranking for different group sizes.
For example, do the BR and official ranking agree on the 10% best ranked proposals? This is done for each panel in both call years.
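One way to compute such an agreement percentage is sketched below. The caption does not give the exact definition, so this sketch assumes that agreement for a group size is the overlap between the top share of proposals under each ranking, expressed as a percentage of the group size. The function name, the dictionary inputs and the example ranks are hypothetical.

```python
def top_share_agreement(official_rank, bayesian_rank, share):
    """Percentage of proposals that both rankings place in the top `share`.

    official_rank, bayesian_rank: dicts mapping proposal id -> rank (1 = best).
    share: fraction of proposals in the group, e.g. 0.1 for the best 10%.
    """
    n = len(official_rank)
    k = max(1, round(share * n))  # group size, at least one proposal
    top_official = set(sorted(official_rank, key=official_rank.get)[:k])
    top_bayesian = set(sorted(bayesian_rank, key=bayesian_rank.get)[:k])
    return 100.0 * len(top_official & top_bayesian) / k


# Hypothetical ranks for ten proposals; the Bayesian ranking swaps p2 and p3.
official = {f"p{i}": i for i in range(1, 11)}
bayesian = dict(official, p2=3, p3=2)

for share in (0.2, 0.3):
    print(share, top_share_agreement(official, bayesian, share))
```

Repeating the calculation over a grid of group sizes, for each panel and call year, would give one curve per panel of the kind the caption describes.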
Table 6.
Agreement for the panels in 2015, with the counts in the different funding ranking groups.
Table 7.
Agreement for the panels in 2019, with the counts in the different funding ranking groups.