Assessing scientists for hiring, promotion, and tenure

  • David Moher,

    dmoher@ohri.ca

    Affiliations Centre for Journalology, Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Canada, Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, California, United States of America

    ORCID http://orcid.org/0000-0003-2434-4206

  • Florian Naudet,

    Affiliations Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, California, United States of America, INSERM CIC-P 1414, Clinical Investigation Center, CHU Rennes, Rennes 1 University, Rennes, France

  • Ioana A. Cristea,

    Affiliations Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, California, United States of America, Department of Clinical Psychology and Psychotherapy, Babeş-Bolyai University, Cluj-Napoca, Romania

  • Frank Miedema,

    Affiliation Executive Board, UMC Utrecht, Utrecht University, Utrecht, the Netherlands

  • John P. A. Ioannidis,

    Affiliations Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, California, United States of America, Department of Medicine, Stanford University, Stanford, California, United States of America, Department of Health Research and Policy, Stanford University, Stanford, California, United States of America, Department of Biomedical Data Science, Stanford University, Stanford, California, United States of America, Department of Statistics, Stanford University, Stanford, California, United States of America

  • Steven N. Goodman

    Affiliations Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, California, United States of America, Department of Medicine, Stanford University, Stanford, California, United States of America, Department of Health Research and Policy, Stanford University, Stanford, California, United States of America

Abstract

Assessment of researchers is necessary for decisions of hiring, promotion, and tenure. A growing number of scientific leaders believe the current system of faculty incentives and rewards is misaligned with the needs of society and disconnected from the evidence about the causes of the reproducibility crisis and suboptimal quality of the scientific publication record. To address this issue, particularly for the clinical and life sciences, we convened an expert panel workshop in Washington, DC, in January 2017, in which 22 academic leaders, funders, and scientists participated. As background for the meeting, we completed a selective literature review of 22 key documents critiquing the current incentive system. From each document, we extracted how the authors perceived the problems of assessing science and scientists, the unintended consequences of maintaining the status quo for assessing scientists, and details of their proposed solutions. The resulting table was used as a seed for participant discussion. The discussion resulted in six principles for assessing scientists and associated research and policy implications. We hope the content of this paper will serve as a basis for establishing best practices and redesigning the current approaches to assessing scientists by the many players involved in that process.

Introduction

Assessing researchers is a focal point of decisions about their hiring, promotion, and tenure. Building, writing, presenting, evaluating, prioritising, and selecting curriculum vitae (CVs) is a prolific and often time-consuming industry for grant applicants, faculty candidates, and assessment committees. Institutions need to make decisions in a constrained environment (e.g., limited time and budgets). Many assessment efforts assess primarily what is easily determined, such as the number and amount of funded grants and the number and citations of published papers. Even for readily measurable aspects, though, the criteria used for assessment and decisions vary across settings and institutions and are not necessarily applied consistently, even within the same institution. Moreover, several institutions use metrics that are well known to be problematic [1]. For example, there is a large literature on the problems with and alternatives to the journal impact factor (JIF) for appraising citation impact. Many institutions still use it to assess faculty through the quality of the literature they publish in, or even to determine monetary rewards [2].

That faculty hiring and advancement at top institutions requires papers published in journals with the highest JIF (e.g., Nature, Science, Cell) is more than just a myth circulating among postdoctoral students [3–6]. Emphasis on JIF does not make sense when only 10%–20% of the papers published in a journal are responsible for 80%–90% of a journal’s impact factor [7,8]. More importantly, other aspects of research impact and quality, for which automated indices are not available, are ignored. For example, faculty practices that make a university and its research more open and available through data sharing or education could feed into researcher assessments [9,10].
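The skew described above can be made concrete with a small calculation. The sketch below is illustrative only: the citation counts are invented, the function name `top_share` is hypothetical, and the plain mean is used as a rough stand-in for a JIF-like average rather than the official JIF calculation.

```python
from statistics import median

def top_share(citations, top_fraction=0.2):
    """Share of all citations received by the most-cited `top_fraction` of papers."""
    ranked = sorted(citations, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0

# Hypothetical citation counts for ten papers in one journal (not real data).
citations = [120, 40, 6, 5, 4, 3, 3, 2, 1, 1]

mean_citations = sum(citations) / len(citations)   # JIF-like average
median_citations = median(citations)               # what a typical paper receives
share = top_share(citations, top_fraction=0.2)

print(f"mean = {mean_citations}, median = {median_citations}")
print(f"top 20% of papers hold {share:.0%} of citations")
```

In this invented example the average is driven almost entirely by two papers, which is why a journal-level average says little about any individual article or author.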

Few assessments of scientists focus on the use of good or bad research practices, nor do currently used measures tell us much about what researchers contribute to society, the ultimate goal of most applied research. In applied and life sciences, the reproducibility of findings by others or the productivity of a research finding is rarely systematically evaluated, in spite of documented problems with the published scientific record [11] and reproducibility across all scientific domains [12,13]. This is compounded by incomplete reporting and suboptimal transparency [14]. Too much research goes unpublished or is unavailable to interested parties [15].

Using more appropriate incentives and rewards may help improve clinical and life sciences and their impact at all levels, including their societal value. We set out to ascertain what has been proposed to improve the evaluation of ‘life and clinical’ research scientists, how a broad spectrum of stakeholders view the strengths and weaknesses of these proposals, and what new ways of assessing scientists should be considered.

Methods

To help address this goal, we convened a 1-day expert panel workshop in Washington, DC, in January 2017. Pre-existing commentaries and proposals to assess scientists were identified with snowball techniques [16] (i.e., an iterative process of selecting articles, often started with a small number of articles that meet inclusion criteria; see below) to ascertain what other groups are writing, proposing, or implementing in terms of how to assess scientists. We also searched the Meta-Research Innovation Center at Stanford (METRICS) research digest and reached out to content experts. Formal searching proved difficult (e.g., exp*Reward/(7408); reward*.ti,kw (9708), incentiv*.ti,kw. (5078)) and resulted in a very large number of records with low sensitivity and recall. We did not set out to conduct a formal systematic review of every article on the topic.

Broad criteria were used for article inclusion (the article focus had to be either bibliometrics, research evaluation, and/or management of scientists and it had to be reported in English). Two of us selected the potential papers and at least three of us reviewed and discussed each selection for its inclusion. From each included article we extracted the following information: authors, name of article/report and its geographic location, the authors’ stated perspective of the problem assessing research and scientists, the authors’ description of the unintended consequences of maintaining the current assessment scheme, the article’s proposed solutions, and our interpretation of the potential limitations of the proposal. The resulting table (early version of Table 1) along with a few specific publications was shared with the participants in advance of the meeting in the hopes that it would stimulate thinking on the topic and be a reference source for discussions.

Table 1. A list of sources examining the problems, potential unanticipated consequences, proposed solutions, and potential limitations when assessing science and scientists.

https://doi.org/10.1371/journal.pbio.2004089.t001

We invited 23 academic leaders: deans of medicine (e.g., Oxford), public and foundation funders (e.g., National Institutes of Health [NIH]), health policy organisations (e.g., Science Policy, European Commission; Belgium), and individual scientists from several countries. They were selected based on contributions to the literature on the topic and representation of important interests and constituencies. Twenty-two were able to participate (see S1 Table for a complete list of participants and their affiliations). Prior to the meeting, all participants were sent the results of a selected review of the literature distilled into a table (see Table 1) and several selected readings.

Table 1 served as the basis for an initial discussion about the problems of assessing scientists. This was followed by discussions of possible solutions to the problems, new approaches for promotion and tenure committees, and implementation strategies. All discussions were recorded, transcribed, and read by five coauthors. From this, six general principles were derived. This summary was then shared with all meeting participants for additional input.

Results

We included a list of 21 documents [11,17–36] in Table 1. There has been a burgeoning interest in assessing scientists in the last 5 years (publication year range: 2012–January 2017). An almost equal number of documents originated from the US and Europe (one also jointly from Canada). We divided the documents into four categories: large group efforts (e.g., the Leiden Manifesto [20]); smaller group or individual efforts (e.g., Ioannidis and Khoury’s Productive, high-Quality, Reproducible, Shareable, and Translatable [PQRST] proposal [29]); journal activities (e.g., Nature [32]); and newer quantitative metric proposals (e.g., journal citation distributions [34]).

We interpreted all of the documents to describe the problems of assessing science and scientists in a similar manner. There is a misalignment between the current problems in research and the way scientists are assessed. The right questions are not being asked; the research is not appropriately planned and conducted; when the research is completed, results remain unavailable, unpublished, or get selectively reported; and reproducibility is lacking, as is evidence about how scientists are incentivised and rewarded. We interpreted that several of the documents pointed to a disconnect between the production of research and the needs of society (i.e., productivity may lack translational impact and societal added value). We paraphrased the views expressed across several of the documents: ‘we should be able to improve research if we reward scientists specifically for adopting behaviours that are known to improve research’. Finally, many of the documents described the JIF as an inadequate measure for assessing the impact of scientists [19,20,29]. The JIF is commonly used by academic institutions [4,5,37] to assess scientists, although there are efforts to encourage them not to do so. Not modifying the current assessment system will likely result in the continued bandwagon behaviour that has not always resulted in positive societal behaviour [25,38]. Acting now to consider modifying the current assessment system might be seen as a progressive move by current scientific leaders to improve how subsequent generations of scientists are evaluated.

Large group proposals for assessing scientists

Nine large group efforts were included [11,17–24] (see Table 1), representing different stakeholder groups. The Leiden Manifesto [20] and the Declaration on Research Assessment (DORA) [19] were both developed at academic society meetings and are international in their focus, whereas the Metric Tide [21] was commissioned by the UK government (operating independently) and is more focused on the UK academic marketplace.

The Leiden Manifesto authors felt that research evaluation should positively support the development of science and its interactions with society. It proposes 10 best practices, for example, that expert qualitative assessment should take precedence, supported by the quantitative evaluation of a researcher using multiple indices. Several universities have recently pledged to adopt these practices [39].

The San Francisco DORA, which is also gaining momentum [40–42], was first developed by editors and publishers and focuses almost exclusively on the misuse of the JIF. DORA describes 17 specific practices to diminish JIF dependence by four stakeholder groups: scientists, funders, research institutions, and publishers. DORA recommends focusing on the content of researchers’ output, citation of the primary literature, and using a variety of metrics to show the impact. Within evidence-based medicine, systematic reviews are considered stronger evidence than individual studies. DORA’s clarification about citations and systematic reviews might facilitate its endorsement and implementation within faculties of medicine.

The National Academy of Sciences proposed that scientists should be assessed for the impact of their work rather than the quantity of it [22]. Research impact is also raised as an important new assessment criterion in the UK’s Research Excellence Framework (REF) [24], an assessment protocol in which UK higher education institutions and their faculty are asked to rate themselves on three domains (outputs, impact, and environment) across 36 disciplines. These assessments are linked to approximately a billion Great Britain Pounds (GBPs) of annual funding to universities, of which 20% (to be increased to 25% in the next round) is based on the impact of the faculty member’s research. The inclusion of assessing impact (e.g., through case studies) has fostered considerable discussion across the UK research community [43]. Some argue it is too expensive, that pitting universities against each other can diminish cooperation and collegiality, that it does not promote innovation, and that it is redundant [44]. The Metric Tide, which has been influential in the UK as it relates to that country’s REF, made 20 recommendations related to how scientists should be assessed [21]. It recommends that universities should be transparent about how faculty assessment is performed and the role that bibliometrics plays in such assessment, a common theme across many of the efforts (e.g., [19,20]).

Individual or small group proposals for assessing scientists

Six smaller group or individual proposals were included [25–30] (see Table 1). For example, Mazumdar and colleagues discussed the importance of rewarding biostatisticians for their contributions to team science [28]. They proposed a framework that separately assesses biostatisticians for their unique scientific contributions, such as the design of a grant application and/or research protocol, and teaching and service contributions, including mentoring in the grant application process.

Ioannidis and Khoury have proposed scientist assessments revolving around ‘PQRST’: Productivity, Quality, Reproducibility, Sharing, and Translation of research [29]. Benedictus and Miedema describe approaches currently being used at the Utrecht Medical Centre, the Netherlands [25]. For the Utrecht assessment system, a set of indicators was defined and introduced to evaluate research programs and teams that mix the classical past-performance bibliometric measures with process indicators. These indicators include evaluation of leadership, culture, teamwork, good behaviours and citizenship, and interactions with societal stakeholders in defining research questions, the experimental approaches, and the evaluation of the ongoing research and its results. The latter includes semi-hybrid review procedures, including peers and stakeholders from outside academia. The new assessment for individual scientists is complemented by a semi-qualitative assessment in a similar vein for our multidisciplinary research programs. Taking a science policy perspective, these institutional policies are currently being evaluated with the Centre for Science and Technology Studies (CSTS) in Leiden by investigating how evaluation practices reshape the practice of academic knowledge production.

Not surprisingly, many of these proposals and individual efforts overlap. For example, the Leiden Group’s fourth recommendation, ‘Keep data collection and analytical processes open, transparent and simple’, is similar to DORA’s 11th specific (of 17) recommendation, ‘Be open and transparent by providing data and methods used to calculate all metrics’. Some groups use assessment tools as part of their solutions. The Academic Careers Understood through Measurement and Norms (ACUMEN) group produced a weblike document system [17], whereas the Mazumdar group created checklists [28].

Groups targeted different groups of stakeholders. The Nuffield Council on Bioethics targeted funders, research institutions, publishers and editors, scientists, and learned societies and professional bodies [23], as did the Reduce research Waste And Reward Diligence (REWARD) Alliance, who added regulators as one of their target groups [11].

Most of the proposals were aspirational in that they were silent on details of how exactly faculty assessment committees could implement their proposals, what barriers to implementation existed, and how to overcome them. Integrating behavioural scientists and implementation scientists into implementation discussions would be helpful. Ironically, most of the proposals do not discuss whether or how to evaluate the impact of adopting them.

Journal proposals for assessing scientists

Journals may not appear to be the most obvious group to weigh in on reducing the reliance on bibliometrics to assess scientists. Traditionally, they have been focused on (even obsessed with) promoting their JIFs. Yet, some journals are beginning to acknowledge the JIF’s possible limitations. For example, the PLOS journals do not promote their JIFs [45]. BioMed Central and Springer Open have signed DORA, stating, ‘In signing DORA we support improvements in the ways in which the output of scientific research is evaluated and have pledged to “greatly reduce emphasis on the journal Impact Factor as a promotional tool by presenting the metric in the context of a variety of journal-based metrics”’ [46].

We included two journal proposals [31,32]. Nature has proposed, and in some cases implemented, a broadening of how they use bibliometrics to promote their journals [32]. We believe a reasonable short-term objective is for journals to provide more potentially credible ways for scientists to use journal information in their assessment portfolio. eLife has proposed a menu of alternative metrics that go beyond the JIF, such as social and print media impact metrics, that can be used to complement article and scientist assessments [31].

Journals can also be an instrument for promoting best publication practices among scientists, such as data sharing, and academic institutions can focus on rewarding scientists for employing those practices rather than quantity of publications alone. Reporting biases, including publication bias and selective outcome reporting, are prevalent [14]. A few journals, particularly in the psychological sciences, have started using a digital badge system that promotes data sharing, although there has been some criticism of them [47]. The journal Psychological Science has evaluated whether digital badges result in more data sharing [48]. Such badges may potentially be used for assessing scientists based on whether they have adhered to these good publication practices. As of mid-2017, 52 mostly psychology journals have agreed to review and accept a paper at the protocol stage if a ‘registered report’ has been recorded in a dedicated registry [49]. If such initiatives are successful, assessors could reward scientists for registering their protocols, as with clinical trials and systematic reviews they can currently monitor timely registration in one of several dedicated registries.

Proposals to improve quantitative metrics

There are many efforts to improve quantitative metrics in the vast and rapidly expanding field of scientometrics. We included four proposals [33–36]. An influential group including editors (Nature, eLife, Science, EMBO, PLOS, The Royal Society), a scientometrician, and an advocate for open science proposed that journals use the journal citation distribution instead of the JIF [36]. This allows readers to examine the skewness and variability in citations of published articles. Similarly, the relative citation ratio (RCR) or the Source Normalized Impact per Paper (SNIP) have been proposed to adjust the citation metric by content field, addressing one of the JIF’s deficiencies [19]. Several proposals have recognised the need for field-specific criteria [20,21]. However, the value of different normalisations needs further study [50] and there has been criticism of the proposed RCR [51]. There are currently dozens of citation indicators that can be used alone or in combination [52]. Each has its strengths and weaknesses, including the possibility for ‘gaming’ (i.e., manipulation by the investigator).

Besides citation metrics, there is also increasing interest in developing standardised indicators for other features of impact or research practices, for example, alternative metrics, as discussed above. For example, Twitter, Facebook, or lay press discussions might indicate influence on or accessibility to patients. However, alternative metrics can also be gamed and social media popularity may not correlate with scientific, clinical, and/or societal benefit. An ‘S-index’ has been proposed to assess data sharing [36], albeit with limitations (see Table 1).

Principles for assessing scientists

Six general principles emerged from the discussions, each with research and policy implications (see Table 2). Several have been proposed previously [53].

Table 2. Key principles, participant dialogue, and research and policy implications when assessing scientists.

https://doi.org/10.1371/journal.pbio.2004089.t002

The first principle is that contributing to societal needs is an important goal of scholarship. Focusing on research that addresses the societal need and impact of research requires a broader, outward view of scientific investigation. The principle is based on academic institutions in society, how they view scholarship in the 21st century, the relevance of patients and the public, and social action [10]. If promotion and tenure committees do not reward these behaviours, or penalise practices that diminish the social benefit of research, maximal fulfillment of this goal is unlikely [25].

The second principle is that assessing scientists should be based on evidence and indicators that can incentivise best publication practices. Several new ‘responsible indicators for assessing scientists’ (RIAS’s) were proposed and discussed. These include assessing registration (including registered reports); sharing results of research; reproducible research reporting; contributions to peer review; alternative metrics (e.g., uptake of research by social media and print media) assessed by several providers, such as Altmetric.com; and sharing of datasets and software assessed through Impact Story [54]. Such indicators should be measured objectively and accurately, as publication and citation tools do currently. Some assessment items, such as reference letters from colleagues and stakeholders affected by the research, cannot be converted into objective measurements, but one may still formally investigate their value [55].

As with any new measures, RIAS characteristics need to be studied in terms of ease of collection, their frequencies and distributions in different fields and institutions, the kind of systems needed to implement them, and their usefulness in both evaluation and modifying researcher behaviours and the extent to which each may be gamed. Different institutions could and should experiment with different sets of RIAS’s to assess their feasibility and utility. Ultimately, if there were enough consensus around a core set, institutional research funding could be tied to their collection, such as underlies successful implementation of Athena Scientific Women's Academic Network (SWAN) for advancing gender equity, which has been highly successful in the UK [56].

One barrier to implementation of any RIAS scheme is whether it would affect current university rankings (e.g., Times Higher Education World University Rankings). Productivity, measured in terms of publication output, is an important input into such rankings. Participants felt that any RIAS dashboard could be included in or as an alternative to university ranking schemes. However, these ranking systems are themselves problematic; the Leiden CSTS has recently proposed 10 principles regarding the responsible use of such ranking systems [57].

The third principle is that all research should be published completely and transparently, regardless of the results. Academic institutions could implement policies in the promotion process to review complete reporting of all research, and/or penalise noncompleted or nonpublished research, particularly clinical trials, which must be registered. For nonclinical research, participants discussed the need to reward other types of openness, such as sharing of datasets, materials, software and methods used, and explicit acknowledgment of their exploratory nature, when appropriate [58]. Finally, finding fair ways to reward team endeavors is critical, given the growing collaborative nature of research, which bibliometrics cannot properly assess. For example, some promotion and tenure committees largely disregard work for which the faculty candidate is not the first or senior author [4]. Conversely, citation metrics that do not correct for multiple coauthorship can produce inappropriately high scores for authors who merely appear in long author mastheads.
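To illustrate the coauthorship point, the sketch below compares full citation counting with fractional counting, in which each paper's citations are divided equally among its authors. Fractional counting is only one possible correction and is used here as an assumption for illustration; it is not a method proposed by the workshop. The paper records, author names, and the function `full_and_fractional_counts` are all hypothetical.

```python
def full_and_fractional_counts(papers, author):
    """Return (full, fractional) citation totals for `author`.

    Full counting credits every citation to every coauthor.
    Fractional counting splits each paper's citations equally among its authors.
    """
    full = 0
    fractional = 0.0
    for paper in papers:
        if author in paper["authors"]:
            full += paper["citations"]
            fractional += paper["citations"] / len(paper["authors"])
    return full, fractional

# Hypothetical records: one two-author paper and one 50-author consortium paper.
papers = [
    {"authors": ["X", "Y"], "citations": 10},
    {"authors": ["Y"] + [f"coauthor_{i}" for i in range(49)], "citations": 500},
]

full_y, frac_y = full_and_fractional_counts(papers, "Y")
full_x, frac_x = full_and_fractional_counts(papers, "X")
print(f"Y: full = {full_y}, fractional = {frac_y}")
print(f"X: full = {full_x}, fractional = {frac_x}")
```

Under full counting, author Y appears far more cited than X because of a single position in a long masthead; under fractional counting the gap narrows sharply. Equal splitting is itself crude, since it ignores who actually did what, which is the complementary problem raised above for first and senior authorship.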

The fourth principle relates to openness—facilitating dissemination and use of research data and results by others. Researchers can share their data, procedures, and code in various ways, such as in open access repositories. Some journals are supporting this process by endorsing and implementing the transparency and openness promotion (TOP) guidelines [59]. Groups that rank universities can also support this principle by sharing the underlying data used to make their assessments.

The fifth principle requires investing in research to provide the necessary evidence to guide the development of new assessment criteria and to evaluate the merits of existing ones. Funders are the ones to make such investments and some, particularly in Europe (e.g., Netherlands Organisation for Scientific Research), have already started.

The final principle involves rewarding researchers for intellectual risk-taking that might not be reflected in early successes or publications. The need for a young researcher to obtain their own funding early often results in a conservatism that is inimical to groundbreaking work at a time when they might be the most creative. Changing assessments to evaluate and reward such risk-taking might encourage truly creative research. It is also possible to conduct some forms of research with limited funding [60].

Implementation

A challenge in introducing any of these principles, or other new ideas, is how best to operationalise them. The TrialsTracker tool [61] enables institutions from around the world (with more than 30 trials) to monitor their trial reporting. Although the tool has limitations [62], it has a low barrier to implementation and provides a useful and easy starting point for audit and feedback. Promotion and tenure committees could receive such data as part of annual faculty assessment. They could also ask scientists to modify their CVs to incorporate information about registration by indicating the name of the registry and the registration number, whether they have participated in a journal’s registered reports program, and a citation of the completed and published study. For each new initiative, it is important to generate evidence, ideally from experimental studies, on whether it leads to better outcomes.

Participants also discussed that efforts to reward good, rigorously conducted evaluation should not come at the expense of stifling creative ‘blue-sky’ research primarily aimed at understanding biologic processes.

Moving forward

Current systems reward scientific innovation, but if we want to improve research reproducibility, we need to find ways to reward scientists who focus on it [63–65]. A scientist who detects analytical errors in published science and works with the authors to help correct the error needs to have such work recognised. This benefits the original scientists, participants in the original research, the journal publishing the original research, the field, and society. The authors of the original report could include documentation, perhaps in the form of an impact letter, attesting to the value of the reproducibility efforts, which could be included in the evaluation portfolio.

High-quality practice guidelines are evidence based, typically using systematic reviews as one of their foundational building blocks. We likely need to develop similar evidence-based approaches when assessing scientists. Despite some criticism, the UK’s REF is a step in this direction [66,67]. The metrics marketplace is large and confusing. Institutions can choose metrics that have an evidence base and are endorsed by reputable organizations: the NIH sponsored development of the RCR, and the SNIP was developed by Leiden University’s CSTS. Regardless of which approach is adopted, evidence on the accuracy, validity, and impact of indicators is necessary.

If best practices for appraising scientists can be identified, achieving widespread adoption will be a major challenge [68]. Ultimately, this may depend on institutional values, which might be elicited from the institution’s faculty. Junior faculty may put a high value on open access publications [69]. If open access were to become part of RIAS [70] and included in faculty assessments, the institution would need to support open access fees. Committed support from leadership and senior faculty would be needed to implement the policy points discussed in Table 2. Finally, implementation for some of the six principles should be easier if stakeholders worked collaboratively, so as not to work at cross-purposes.

Institutional promotion and tenure committee guidelines are not easily available to outside researchers, although there is an effort underway to compile them [71]. If institutions made them available, this information could be used as a baseline to gauge changes in criteria and also to disseminate institutional innovations. We also call for institutions to examine their own awards and promotion practices to understand how their high-level criteria are being operationalised and to see the effect of criteria such as counting the number of first and last author publications. Funders can also make widely available what criteria they use from grant applications to assess scientists.

Whether implemented at the local or national level, changes in assessment criteria should be fully documented and made openly available. Institutions making changes to their promotion and tenure criteria and faculty assessment should implement an evaluation component as part of the process. Evaluations using experimental approaches are likely to provide the most internally valid results and may offer greater generalizability. Stepped wedge designs [72] or interrupted time series [73] might be appropriate for assessing the effects of individual or multiple department promotion and tenure committees’ uptake of new assessment criteria for scientists, together with audit and feedback. These data can inform the development of new systems [74].
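As a rough illustration of what an interrupted time series evaluation estimates, the sketch below fits separate straight lines to the periods before and after a hypothetical change in assessment criteria and reports the change in level and in slope at the point of interruption. Everything here is an assumption for illustration: the yearly figures are invented, the function names are hypothetical, and a real analysis would also need to handle autocorrelation, uncertainty estimates, and ideally a comparison group.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def interruption_effect(pre, post):
    """Return (level_change, slope_change) at the first post-change time point."""
    pre_t = list(range(len(pre)))
    post_t = list(range(len(pre), len(pre) + len(post)))
    pre_slope, pre_intercept = fit_line(pre_t, pre)
    post_slope, post_intercept = fit_line(post_t, post)
    t0 = len(pre)  # first time point after the change
    expected = pre_slope * t0 + pre_intercept    # pre-change trend projected forward
    observed = post_slope * t0 + post_intercept  # fitted post-change level
    return observed - expected, post_slope - pre_slope

# Hypothetical yearly percentage of faculty publications with shared data.
pre_change = [10, 11, 12, 13]    # four years before new criteria
post_change = [20, 22, 24, 26]   # four years after new criteria

level_change, slope_change = interruption_effect(pre_change, post_change)
print(f"level change = {level_change:.1f} points, slope change = {slope_change:.1f} points/year")
```

The design choice worth noting is the projection of the pre-change trend: the comparison is against what would have been expected had the earlier trend continued, not against the simple pre-change average.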

A few funders have set aside specific funding streams to fund research on research. The Dutch Medical Research Council has established a funding stream called ‘Responsible Research Practices’, which recently awarded eight grants. A central aspect of the approach taken by Mark Taylor, the new Head of Impact for the UK National Institute for Health Research (NIHR), is to ‘… give patients more tools to help shape the research future, their future itself’ [75]. Widening the spectrum of funders making such investments would send a powerful message about their values to both researchers and institutions.

We did not complete a systematic review (including other fields, such as engineering), the results of which might add to what we have reported here. The principles are not comprehensive. They reflect discussions between participants. This field of research and researcher assessments is currently fragmented with an uneven evidence base that has an enormous volume of publications on some topics (e.g., scientometrics) and little evidence on others. As the field grows, we hope it generates stronger data to help inform decision-making. The research implications in Table 2 can be a starting point for investing more heavily in providing that evidence. However, when a new assessment measure is developed and evaluated, it may fall prey to Goodhart’s law [76] (i.e., a measure ceases to be valid once it becomes an optimisation target); the unintended effects of individuals or institutions trying to optimise these measures will require close attention.

How we evaluate scientists reflects what we value most—and don't—in the scientific enterprise and powerfully influences scientists’ behaviour. Widening the scope of activities worthy of academic recognition and reward will likely be a slow and iterative process. The principles here could serve as a road map for change. While the collective efforts of funders, journals, and regulators will be critical, individual institutions will ultimately have to be the crucibles of innovation, serving as models for others. Institutions that monitor what they do and the changes that result would be powerful influencers of the shape of our collective scientific future.

Supporting information

S1 Table. Name, portfolio, and affiliation of workshop organisers and participants.

https://doi.org/10.1371/journal.pbio.2004089.s001

(DOCX)

Acknowledgments

We would like to thank all the participants who contributed to the success of the workshop.

We thank the following people for commenting on an earlier version of the paper: Drs. R. (Rinze) Benedictus, Policy Advisor, UMC Utrecht & Science in Transition, the Netherlands; Stephen Curry, PhD, Imperial College, London, UK; Ulrich Dirnagl, MD, PhD, Professor, Center for Stroke Research and Departments of Neurology and Experimental Neurology, Charité – Universitätsmedizin, Berlin, Germany; Trish Groves, MRCPsych, Director of Academic Outreach and Advocacy, BMJ Publishing Group; Chonnettia Jones, PhD, Director of Insight and Analysis, Wellcome Trust, UK; Michael S. Lauer, MD, Deputy Director for Extramural Research, NIH; Marcia McNutt, PhD, President, National Academy of Sciences, Washington; Malcolm MacLeod, MD, Professor of Neurology and Translational Neuroscience, University of Edinburgh, Scotland; Sally Morton, PhD, Dean of Science, Virginia Tech, Blacksburg, VA; Daniel Sarewitz, PhD, Co-Director, Consortium for Science, Policy & Outcomes, School for the Future of Innovation in Society, Washington, DC; René Von Schomberg, PhD, Team Leader–Science Policy, European Commission, Belgium; James R. Wilsdon, PhD, Director of Impact & Engagement, Faculty of Social Sciences, University of Sheffield, UK; Deborah Zarin, MD, Director, ClinicalTrials.gov, US.

References

  1. Hammarfelt B. Recognition and reward in the academy: valuing publication oeuvres in biomedicine, economics and history. Aslib J Inform Manag 2017; 69(5):607–23.
  2. Quan W, Chen B, Shu F. Publish Or Impoverish: An Investigation Of The Monetary Reward System Of Science In China (1999–2016). [Internet]. Available from: https://arxiv.org/ftp/arxiv/papers/1707/1707.01162.pdf. Last accessed: 22Feb2018.
  3. Harley D, Acord SK, Earl-Novell S, Lawrence S, King CJ. (2010). Assessing the Future Landscape of Scholarly Communication: An Exploration of Faculty Values and Needs in Seven Disciplines. [Internet] UC Berkeley: Center for Studies in Higher Education. Available from: https://cshe.berkeley.edu/publications/assessing-future-landscape-scholarly-communication-exploration-faculty-values-and-needs. Last accessed: 22Feb2018.
  4. Walker RL, Sykes L, Hemmelgarn BR, Quan H. Authors' opinions on publication in relation to annual performance assessment. BMC Med Educ 2010 Mar 9;10:21. pmid:20214826
  5. Tijdink JK, Schipper K, Bouter LM, Maclaine Pont P, de Jonge J, Smulders YM. How do scientists perceive the current publication culture? A qualitative focus group interview study among Dutch biomedical researchers. BMJ Open 2016;6:e008681. pmid:26888726
  6. Sturmer S, Oeberst A, Trotschel R, Decker O. Early-career researchers’ perceptions of the prevalence of questionable research practices, potential causes, and open science. Soc Psychol 2017; 48(6): 365–371.
  7. Garfield E. The history and meaning of the journal impact factor. JAMA 2006; 295(1):90–3. pmid:16391221
  8. Brembs B, Button K, Munafò M. Deep impact: unintended consequences of journal rank. Front Hum Neurosci 2013;7:article291.
  9. Rouleau G. Open Science at an institutional level: an interview with Guy Rouleau. Genome Biol 2017 Jan 20;18(1):14. pmid:28109193
  10. McKiernan EC. Imagining the ‘open’ university: sharing to improve research and education. PLoS Biol 2017; 15(10):e1002614. pmid:29065148
  11. Kleinert S, Horton R. How should medical science change? Lancet 2014; 383:197–8. pmid:24411649
  12. Begley CG, Ellis LM. Drug development: raise standards for preclinical cancer research. Nature 2012;483(7391):531–3. pmid:22460880
  13. Ioannidis JP. Acknowledging and overcoming nonreproducibility in basic and preclinical research. JAMA 2017;317(10):1019–20. pmid:28192565
  14. Glasziou P, Altman DG, Bossuyt P, Boutron I, Clarke M, Julious S, et al. Reducing waste from incomplete or unusable reports of biomedical research. Lancet 2014;383(9913):267–76. pmid:24411647
  15. Chan A-W, Song F, Vickers A, Jefferson T, Dickersin K, Gøtzsche PC, et al. Increasing value and reducing waste: addressing inaccessible research. Lancet 2014; 383(9913):257–66. pmid:24411650
  16. Heckathorn DD. Snowball versus respondent-driven sampling. Sociol Methodol 2011; 41(1): 355–366. pmid:22228916
  17. Final report summary—ACUMEN (Academic careers understood through measurement and norms). [Internet] Community Research and Development Information Service. European Commission. Available from: http://cordis.europa.eu/result/rcn/157423_en.pdf. Last accessed: 22Feb2018.
  18. Amsterdam call for action on open science. [Internet] The Netherlands EU Presidency 2016. Available from: https://f-origin.hypotheses.org/wp-content/blogs.dir/1244/files/2016/06/amsterdam-call-for-action-on-open-science.pdf. Last Accessed: 22Feb2018.
  19. American Society for Cell Biology. DORA. Declaration on Research Assessment. [Internet] Available from: http://www.ascb.org/dora/. Last accessed: 22Feb2018.
  20. Hicks D, Wouters P, Waltman L, de Rijcke S, Rafols I. Bibliometrics: The Leiden Manifesto for research metrics. Nature 2015;520(7548):429–31. pmid:25903611
  21. Wilsdon J. The metric tide: Report of the independent review of the role of metrics in research assessment and management. [Internet] University of Sussex. 2015. Available from: blogs.lse.ac.uk/impactofsocialsciences/files/2015/07/2015_metrictide.pdf. Last accessed: 22Feb2018.
  22. Alberts B, Cicerone RJ, Fienberg SE, Kamb A, McNutt M, Nerem RM, et al. Scientific Integrity. Self-correction in science at work. Science 2015;348(6242):1420–2. pmid:26113701
  23. The culture of scientific research in the UK. Nuffield Council on Bioethics. [Internet] Available from: http://nuffieldbioethics.org/wp-content/uploads/Nuffield_research_culture_full_report_web.pdf. Last Accessed: 22Feb2018.
  24. Panel criteria and working methods. [Internet] [REF 2014/REF 01.2012.] Available from: https://www.imperial.ac.uk/media/imperial-college/research-and-innovation/public/Main-panel-criteria.pdf; http://www.ref.ac.uk/2014/media/ref/content/pub/REF%20Brief%20Guide%202014.pdf. Last Accessed: 22Feb2018.
  25. Benedictus R, Miedema F. Fewer numbers, better science. Nature 2016;538(7626):453–5. pmid:27786219
  26. Edwards MA, Roy S. Academic research in the 21st century: Maintaining scientific integrity in a climate of perverse incentives and hypercompetition. Environ Eng Sci 2017;34:51. pmid:28115824
  27. Ioannidis JPA. How to make more published research true. PLoS Med 2014; 11(10): e1001747. pmid:25334033
  28. Mazumdar M, Messinger S, Finkelstein DM, Goldberg JD, Lindsell CJ, Morton SC, et al. Evaluating academic scientists collaborating in team-based research: A proposed framework. Acad Med 2015;90(10):1302–8. pmid:25993282
  29. Ioannidis JP, Khoury MJ. Assessing value in biomedical research: the PQRST of appraisal and reward. JAMA 2014;312:483–4. pmid:24911291
  30. Nosek BA, Spies JR, Motyl M. Scientific Utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspect Psychol Sci 2012;7(6):615–31. pmid:26168121
  31. Schekman R, Patterson M. Reforming research assessment. eLife 2013;2:e00855. pmid:23700504
  32. Time to remodel the journal impact factor. Nature 2016;535(7613):466. pmid:27466089
  33. Hutchins BI, Yuan X, Anderson JM, Santangelo GM. Relative Citation Ratio (RCR): A new metric that uses citation rates to measure influence at the article level. PLoS Biol 2016; 14(9): e1002541. pmid:27599104
  34. Larivière V, Kiermer V, MacCallum CJ, McNutt M, Patterson M, Pulverer B, et al. A simple proposal for the publication of journal citation distributions. bioRxiv. Available from: https://www.biorxiv.org/content/biorxiv/early/2016/09/11/062109.full.pdf. Last accessed: 22Feb2018.
  35. Cantor M, Gero S. The missing metric: quantifying contributions of reviewers. R Soc Open Sci 2015; 2 (2): 140540. pmid:26064609
  36. Olfson M, Wall MM, Blanco C. Incentivizing data sharing and collaboration in medical research-the S-Index. JAMA Psychiatry 2017;74(1):5–6. pmid:27784040
  37. Moher D, Goodman SN, Ioannidis JP. Academic criteria for appointment, promotion and rewards in medical research: where’s the evidence? Eur J Clin Invest 2016; 46(5):383–5. pmid:26924551
  38. Brookshire B. Blame bad science incentives for bad science. [Internet] https://www.sciencenews.org/blog/scicurious/blame-bad-incentives-bad-science. Last accessed: 22Feb2018.
  39. Johnson B. The road to the responsible research metrics forum. Higher education funding council for England. [Internet] Available from: http://blog.hefce.ac.uk/2017/03/24/the-road-to-the-responsible-research-metrics-forum/. Last Accessed 22Feb2018.
  40. Imperial College London signs DORA. [Internet] Available from: http://www3.imperial.ac.uk/newsandeventspggrp/imperialcollege/newssummary/news_8-2-2017-12-28-7. Last accessed: 22Feb2018.
  41. Gadd E. When are journal metrics useful? A balanced call for the contextualized and transparent use of all publication metrics. [Internet] LSE Impact Blog. Available from: http://blogs.lse.ac.uk/impactofsocialsciences/2015/11/05/when-are-journal-metrics-useful-dora-leiden-manifesto/ Last accessed: 22Feb2018.
  42. Birkbeck signs San Francisco Declaration on Research Assessment. [Internet] Available from: http://tagteam.harvard.edu/hub_feeds/3649/feed_items/2224509 Last accessed: 22Feb2018.
  43. Terämä E, Smallman M, Lock SJ, Johnson C, Austwick MZ. Beyond Academia–Interrogating Research Impact in the Research Excellence Framework. PLoS One 2016;11(12):e0168533. pmid:27997599
  44. Sayer D. Five reasons why the REF is not fit for purpose. [Internet] The Guardian 2014 15 Dec. Available from: https://www.theguardian.com/higher-education-network/2014/dec/15/research-excellence-framework-five-reasons-not-fit-for-purpose Last accessed: 22Feb2018.
  45. Public Library of Science. PLOS and DORA. [Internet]. Available at: https://www.plos.org/dora. Last Accessed: 15-Feb-2018.
  46. Burley, R. BioMed Central and SpringerOpen sign the San Francisco Declaration on Research Assessment. [Internet]. Available at: http://blogs.biomedcentral.com/bmcblog/2017/04/26/biomed-central-and-springeropen-sign-the-san-francisco-declaration-on-research-assessment/. Last Accessed: 14-Feb-2018.
  47. Bastian H. Bias in Open Science Advocacy: The Case of Article Badges for Data Sharing. PLOS Blogs. Posted August 29, 2017. [Internet] Available from: http://bit.ly/2jq4eR6. Last accessed: 22Feb2018.
  48. Kidwell MC, Lazarević LB, Baranski E, Hardwicke TE. Badges to acknowledge open practices: A simple, low-cost, effective method for increasing transparency. PLoS Biol 2016; 14(5): e1002456. pmid:27171007
  49. Nosek BA, Lakens D. Registered reports: a method to increase the credibility of published results. Soc Psychol 2014; 45: 137–41.
  50. Ioannidis JPA, Boyack K, Wouters PF. Citation Metrics: A primer on how (not) to normalize. PLoS Biol 2016; 14(9): e1002542. pmid:27599158
  51. Janssens ACJW, Goodman M, Powell KR, Gwinn M. A critical evaluation of the algorithm behind the Relative Citation Ratio (RCR). PLoS Biol 2017; 15(10): e2002536. pmid:28968388
  52. Ioannidis JP, Klavans R, Boyack KW. Multiple citation indicators and their composite across scientific disciplines. PLoS Biol 2016; 14(7): e1002501. pmid:27367269
  53. Boyer EL. Scholarship reconsidered: priorities of the professoriate. Lawrenceville, NJ: Princeton University Press, 1990. 152p.
  54. Lapinski S, Piwowar H, Priem J. Riding the crest of the altmetrics wave: How librarians can help prepare faculty for the next generation of research impact metrics. Coll Res Libraries News 2013; 74(6): 292–300.
  55. Zare RN. Assessing academic researchers. Angew Chem Int Ed Engl 2012;51(30):7338–9. pmid:22513978
  56. Ovseiko PV, Chapple A, Edmunds LD, Ziebland S. Advancing gender equality through the Athena SWAN Charter for Women in Science: an exploratory study of women's and men's perceptions. Health Res Policy Syst 2017;15(1):12. pmid:28222735
  57. CWTS Leiden Ranking. Responsible use. [Internet] Available from http://www.leidenranking.com/information/responsibleuse. Last Accessed 22Feb2018.
  58. Pasterkamp G, Hoefer I, Prakken B. Lost in citation valley. Nat Biotechnol 2016;34(10):1016–1018. pmid:27727210
  59. Nosek BA, Alter G, Banks GC, Borsboom D, Bowman SD, Breckler SJ, et al. Promoting an open research culture. Science 2015; 348(6242):1422–1425. pmid:26113702
  60. Ioannidis JPA. Defending biomedical science in an era of threatened funding. JAMA 2017;317(24):2483–2484. pmid:28459974
  61. Powell-Smith A, Goldacre B. The TrialsTracker: automated ongoing monitoring of failure to share clinical trial results by all major companies and research institutions. F1000Res 2016;5:2629. pmid:28105310
  62. Coens C, Bogaerts J, Collette L. Comment on the “TrialsTracker: Automated ongoing monitoring of failure to share clinical trial results by all major companies and research institutions.” [version 1; referees: 1 approved, 1 approved with reservations] F1000 Res 2017;6:71.
  63. Flier J. Faculty promotion must assess reproducibility. Nature 2017;549(7671):133. pmid:28905925
  64. Mogil JS, MacLeod MR. No publication without confirmation. Nature 2017;542(7642):409–411. pmid:28230138
  65. Topol EJ. Money back guarantees for non-reproducible results? BMJ 2016;353:i2770. pmid:27221803
  66. Terämä E, Smallman M, Lock SJ, Johnson C, Austwick MZ. Beyond Academia–Interrogating Research Impact in the Research Excellence Framework. PLoS ONE 2016; 11(12): e0168533. pmid:27997599
  67. Manville C, Jones MM, Frearson M, Castle-Clarke S, Henham M, Gunashekar S, et al. Preparing impact submissions for REF 2014: An evaluation: Findings and observations. [Internet] Santa Monica, CA: RAND Corporation, 2015. https://www.rand.org/pubs/research_reports/RR727.html. Last Accessed: 22Feb2017.
  68. Gaind N. Few UK universities have adopted rules against impact-factor abuse. [Internet] Nature 2018 Accessed from: https://www.nature.com/articles/d41586-018-01874-w. Last accessed 14Feb2018.
  69. Piwowar H, Priem J, Larivière V, Alperin JP, Matthias L, Norlander B, et al. The state of OA: a large scale analysis of the prevalence and impact of open access articles. [Internet] PeerJ Preprints. Available from: https://doi.org/10.7287/peerj.preprints.3119v1 Last accessed: 22Feb2018.
  70. Odell J, Coates H, Palmer K. Rewarding open access scholarship in promotion and tenure: driving institutional change. C&RL News 2016;77:7.
  71. Assessing current practices in the review, promotion and tenure process. [Internet] Available from: https://publishing.sfu.ca/7297-review-promotion-tenure-project/. Last Accessed: 22Feb2018.
  72. Hemming K, Haines TP, Chilton PJ, Girling AJ, Lilford RJ. The stepped wedge cluster randomised trial: rationale, design, analysis, and reporting. BMJ 2015;350:h391. pmid:25662947
  73. Kontopantelis E, Doran T, Springate DA, Buchan I, Reeves D. Regression based quasi-experimental approach when randomisation is not an option: interrupted time series analysis. BMJ 2015;350:h2750. pmid:26058820
  74. Ivers N, Jamtvedt G, Flottorp S, Young JM, Odgaard-Jensen J, French SD, et al. Audit and feedback: effects on professional practice and healthcare outcomes. Cochrane Database Syst Rev 2012;(6):CD000259. pmid:22696318
  75. Taylor M. What impact does research have? [Internet] BMJ Opinion. Available from: http://blogs.bmj.com/bmj/2017/05/10/mark-taylor-what-impact-does-our-research-have/. Last Accessed 22Feb2018.
  76. Biagioli M. Watch out for cheats in citation game. Nature 2016;535(7611):201. pmid:27411599
  77. Piwowar HA, Day RS, Fridsma DB. Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2007;2(3): e308. pmid:17375194