Which Species Are We Researching and Why? A Case Study of the Ecology of British Breeding Birds

Our ecological knowledge base is extensive, but the motivations for research are many and varied, leading to unequal species representation and coverage. As this evidence is used to support a wide range of conservation, management and policy actions, it is important that gaps and biases are identified and understood. In this paper we detail a method for quantifying research effort and impact at the individual species level, and go on to investigate the factors that best explain between-species differences in outputs. We do this using British breeding birds as a case study, producing a ranked list of species based on two scientific publication metrics: total number of papers (a measure of research quantity) and h-index (a measure of the number of highly cited papers on a topic, and hence an indication of research quality). Widespread, populous species which are native, resident and in receipt of biodiversity action plans had significantly higher publication metrics. Guild was also significant, with birds of prey the most studied group and pigeons and doves the least studied. The model outputs for both metrics were very similar, suggesting that, at least in this example, research quantity and quality were highly correlated. The results highlight three key gaps in the evidence base, with fewer citations and publications relating to migrant breeders, introduced species and species which have experienced contractions in distribution. We suggest that the use of publication metrics in this way provides a novel approach to understanding the scale and drivers of both research quantity and impact at a species level, and could be widely applied, both taxonomically and geographically.


Introduction
The knowledge base for wildlife ecology is extensive; however, because research motivations are many and varied, coverage within it is unequal, not only across species and subject areas but also with respect to paper "quality" or "impact" (e.g. citation record) [1][2][3][4][5][6]. As many ecological studies are used to guide species conservation, management and policy, it is important that knowledge gaps are identified and understood. In this paper we describe a novel method for quantifying research effort and impact at the individual species level, and also present an investigation of the factors that best explain between-species differences in outputs. While a body of literature on research effort between and within species does exist [1][2][3][4][5][6], this is one of the first papers to also include an estimate of scientific "impact". We illustrate this method using British breeding birds as a case study. British breeding birds provide an excellent test case, as they are a very well-studied group with a varied range of research motivations, including ease of study (many species are common and easy to study both in the wild and in captivity), a largely positive public perception (the Royal Society for the Protection of Birds is the largest environmental membership organisation in Europe [7]), the interests of individual researchers (many researchers or research groups have studied individual species over many decades, e.g. Great Tits Parus major [8]), and changes in conservation status.

Materials and Methods
Two pre-existing publication metrics were selected for use: total number of papers per species (a measure of research volume) and species h-index (an indication of volume plus "quality"). Developed by Hirsch [9] as a means of measuring the impact and sustainability of the scientific output of individual researchers [10], the h-index differs from other publication metrics in that it highlights papers which are regarded by fellow scientists as worthy of citation. This paper uses the h-index in a novel way, to assess the volume and impact of papers about British breeding bird species, substituting "individual species" for "individual researcher". A total of 225 species were included in the analysis: all species classed as breeding by the British Ornithologists' Union (resident breeder, introduced breeder, migrant breeder, has bred, may have bred [11]), plus two additional species (Eagle Owl Bubo bubo and Monk Parakeet Myiopsitta monachus). While not officially classed as breeding, these species are widely regarded as doing so, and are known to conflict with human interests [12]. A search was made for the scientific name of each species using Thomson Reuters Web of Science (WoS). Scientific names were used for these searches because many common bird names carry alternative meanings in English, which complicates their use as search terms. Results were then refined by the predefined Web of Science Research Domain ("Science & Technology"), Research Areas ("Zoology", "Environmental Sciences & Ecology" and "Biodiversity Conservation") and Countries/Territories ("UK", "England", "Scotland" and "Wales"). Ireland (Northern or Southern) was not included, as preliminary work found insufficient discrimination between the areas for them to be included accurately. The remaining records, including all publication information (e.g. title, abstract, authors, source, publication date), were then imported into Excel.
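Web of Science applies these refinements through its own interface. Purely as an illustration, the same filter logic could be re-applied to exported records as sketched below, assuming that several values selected within one facet combine as OR and that different facets combine as AND. The record structure, field names and the `passes_refinement` helper are hypothetical, not the WoS export format.

```python
# Hypothetical sketch: re-applying the Web of Science refinements to exported
# records. The record layout (dicts with "research_areas" and "countries" keys)
# is an assumption for illustration, not the actual WoS export format.

RESEARCH_AREAS = {"Zoology", "Environmental Sciences & Ecology", "Biodiversity Conservation"}
COUNTRIES = {"UK", "England", "Scotland", "Wales"}  # Ireland deliberately left out


def passes_refinement(record: dict) -> bool:
    """True if the record matches at least one listed research area AND at least one listed country."""
    areas = set(record.get("research_areas", []))
    countries = set(record.get("countries", []))
    return bool(areas & RESEARCH_AREAS) and bool(countries & COUNTRIES)


# Hypothetical example records (titles and field values are made up).
records = [
    {"title": "Paper A", "research_areas": ["Zoology"], "countries": ["England"]},
    {"title": "Paper B", "research_areas": ["Zoology"], "countries": ["Ireland"]},
    {"title": "Paper C", "research_areas": ["Physiology"], "countries": ["Scotland"]},
]
kept = [r["title"] for r in records if passes_refinement(r)]
print(kept)  # ['Paper A']
```

Records that survive this step would still need the manual relevance check described next.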
Each paper was checked for relevance using the criteria in Table 1. In the majority of cases (around 80%), information already contained in the database (title, abstract, source) was sufficient to make these judgements; where this information was inadequate, the full paper was sourced.
Table 1. Paper qualifying criteria.
To qualify, a paper must: 1) feature the target species in the title or abstract; 2) be carried out, at least in part, in Great Britain (studies carried out solely in Northern or Southern Ireland were excluded).
It is possible that, by following the criteria laid out in Table 1, some relevant papers were excluded from the analysis (i.e. papers which did not specify the scientific name of the target species in the abstract or title). To determine the extent of this issue we carried out a subsetting exercise, in which 10% of previously excluded papers (those which were extracted from WoS but did not meet the Table 1 criteria) were rechecked for relevance. Only between 0.7 and 1.4% of relevant papers were found to have been excluded as a result of the Table 1 criteria.
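The exact calculation behind the 0.7 to 1.4% figure is not specified. A minimal sketch of one plausible approach follows, assuming a reproducible 10% random sample of excluded papers and simple extrapolation of the relevant papers found in it. Both function names and all numbers are hypothetical.

```python
import random

# Hypothetical sketch of the re-checking exercise: draw a reproducible 10% sample
# of excluded papers, then extrapolate the number of relevant papers found in it.
# Function names and the extrapolation rule are assumptions, not the authors' code.


def draw_recheck_sample(excluded_ids, fraction=0.10, seed=1):
    """Return a reproducible random sample containing `fraction` of the excluded IDs."""
    ids = list(excluded_ids)
    k = round(fraction * len(ids))
    return random.Random(seed).sample(ids, k)


def estimated_missed_percent(relevant_in_sample, fraction, n_relevant_included):
    """Scale the relevant papers found in the sample up to all excluded papers and
    express the estimate as a percentage of all relevant papers (included + missed)."""
    est_missed = relevant_in_sample / fraction
    return 100 * est_missed / (n_relevant_included + est_missed)


# Made-up numbers for illustration only.
sample = draw_recheck_sample(range(1000))
print(len(sample))                                         # 100
print(round(estimated_missed_percent(5, 0.10, 5000), 2))   # 0.99
```

Fixing the random seed means the same subset can be re-drawn if the re-check needs to be audited.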
The default timespan was used for all searches (1864 to 2014); however, all papers which met the relevance criteria were published between 1972 and 2014. The remaining papers were then sorted by decreasing number of citations ("Times Cited") and the h-index was calculated as the largest number h such that h publications have at least h citations [9]. For example, if a species had three associated publications, cited 10, seven and two times, it would have an h-index of two: two papers attracted two or more citations, but there were not three papers with three or more citations. The total number of papers per species was calculated as the number of relevant papers identified by the search.
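The h-index definition above translates directly into a few lines of code. The sketch below sorts citation counts in decreasing order and finds the largest rank h at which the paper in position h still has at least h citations. The species names and citation lists are hypothetical.

```python
def h_index(citations):
    """Largest h such that h papers each have at least h citations (Hirsch's definition)."""
    ranked = sorted(citations, reverse=True)  # sort by decreasing "Times Cited"
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break  # counts only decrease from here, so no larger h is possible
    return h


# Worked example from the text: three papers cited 10, 7 and 2 times.
example = [10, 7, 2]
print(len(example), h_index(example))  # 3 2

# Hypothetical per-species citation lists: (total papers, h-index) for each.
species_citations = {"Species A": [10, 7, 2], "Species B": [0, 0], "Species C": []}
for name, cites in species_citations.items():
    print(name, len(cites), h_index(cites))
```

Note that a species can have papers but an h-index of zero if none of them has been cited, which is why the two metrics can have different numbers of zero scores.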
A range of covariates deemed likely to influence species publication metrics were collated from a variety of sources and entered into the database along with the publication metrics (see S1 Table for the full list and sources). Simple statistical comparisons (Kruskal-Wallis tests, as the data were not normally distributed) were initially made between publication metrics and a range of key covariates. Native/introduced status [13], Biodiversity Action Plan status (BAP; [14]), IUCN Red List status [15] and breeding status [11] were included as two-level factors (native vs. introduced; BAP vs. non-BAP; Red List status "Not assessed" or "Least concern" vs. "Near threatened"; resident vs. migrant). Twenty- and 40-year distribution trends were included as three-level factors (increase, stable, decline; [16]), while guild was included as an 11-level factor: ducks and geese (22 species), herons/bitterns/egrets (10 species), gamebirds (10 species), corvids (6 species), small passerines (80 species), birds of prey (21 species), seabirds (23 species), doves and pigeons (6 species), grebes/divers/rails (14 species), waders (24 species) and other (9 species) [13].
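In practice a library routine such as R's `kruskal.test` or SciPy's `scipy.stats.kruskal` would normally be used. To show what the test computes, the standard-library sketch below ranks the pooled observations (averaging ranks across ties), forms the H statistic with the usual tie correction and, for the two- and three-level factors used here, converts H to a p-value with the closed-form chi-square upper tail for 1 or 2 degrees of freedom. The function names are hypothetical.

```python
import math


def kruskal_wallis(*groups):
    """Kruskal-Wallis H statistic with the standard correction for ties.

    Returns (H, degrees of freedom); H is compared against a chi-square
    distribution with k - 1 degrees of freedom for k groups.
    """
    pooled = sorted((value, g) for g, group in enumerate(groups) for value in group)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_sum = 0  # sum of (t^3 - t) over blocks of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += mid_rank
        t = j - i + 1
        tie_sum += t**3 - t
        i = j + 1
    h = 12 / (n * (n + 1)) * sum(r**2 / len(grp) for r, grp in zip(rank_sums, groups)) - 3 * (n + 1)
    correction = 1 - tie_sum / (n**3 - n)
    if correction == 0:
        raise ValueError("all observations are tied")
    return h / correction, len(groups) - 1


def chi2_upper_tail(x, df):
    """Upper-tail chi-square probability; closed forms exist for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise NotImplementedError("use a statistics library for df > 2")


h, df = kruskal_wallis([1, 2, 3], [4, 5, 6])  # H = 27/7 (about 3.857), df = 1
print(round(chi2_upper_tail(2.57, 1), 2))      # 0.11
```

As a consistency check, χ² = 2.57 with one degree of freedom gives p of about 0.11, the value reported below for BAP status and total number of papers. The 11-level guild factor (10 degrees of freedom) needs a general chi-square distribution function from a statistics library.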
Publication metrics for each species were subsequently entered as the response variable into general linear models (GLMs) in R (version 3.0.0), along with a suite of predictors chosen from a combination of a priori predictions and data exploration. Population size [13] (or distribution [16]) was included in the model; the two were correlated (Pearson's coefficient = 0.508, p<0.001), although population size provided a slightly better fit to the data than distribution (both log10(x+1) transformed). Factors describing whether a species was native or introduced, the subject of a BAP, its Red List status, its breeding status, its 20-year distribution trend and its guild were also included. Weight was also a good predictor in exploratory models [16,17]; however, it was confounded by guild, which provided a better fit to the data.
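Assuming a Gaussian error structure, a general linear model of this kind is fitted by least squares (in R, `lm()` or `glm()` with its default `gaussian` family), with each factor expanded into treatment-coded 0/1 indicator columns relative to a reference level. The standard-library sketch below illustrates that idea on made-up numbers with one log10(x+1) predictor and one two-level factor. R itself uses a QR decomposition, which is numerically more robust; the normal equations are used here only for transparency. All names and data are hypothetical.

```python
import math

# Minimal sketch of a Gaussian linear model fitted by ordinary least squares.
# Data, variable names and the helper functions are hypothetical.


def log10p1(x):
    """log10(x + 1), the transform applied to population size and distribution."""
    return math.log10(x + 1)


def treatment_dummies(values, reference):
    """0/1 indicator columns for every level except `reference` (treatment contrasts)."""
    others = sorted(set(values) - {reference})
    return [[1.0 if v == level else 0.0 for level in others] for v in values]


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination with partial pivoting."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical data built as papers = 1 + 2*log10(pop + 1) - 3*(introduced).
pops = [9, 99, 999, 9, 99, 999]
status = ["native"] * 3 + ["introduced"] * 3
papers = [3, 5, 7, 0, 2, 4]

dummies = treatment_dummies(status, reference="native")
X = [[1.0, log10p1(p)] + d for p, d in zip(pops, dummies)]
coef = ols(X, papers)
print([round(b, 6) for b in coef])  # approximately [1.0, 2.0, -3.0]
```

The fitted coefficients recover the intercept, the slope on log10 population and the offset for introduced species used to build the toy data. An 11-level factor such as guild would simply contribute 10 indicator columns.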

Results
Of the 11,559 papers extracted from Web of Science, 6716 relevant science references were identified based on 5816 publications (some references referred to more than one species; mean 2.8 species). The publication years of these papers ranged from 1972 to 2014. Together these 5016 references attracted 28,091 citations and had an overall h-index of 111. The full list of species and their publication metrics is shown in Table 2. Total number of papers for the 225 species ranged from 0 (n = 4) to 213 (n = 1) and h-indices from 0 (n = 20) to 52 (n = 1) (Fig 1).
The species with the ten highest numbers of papers were Great tit (209), Red grouse (…
The univariate comparisons showed total number of papers and h-index to be significantly lower for introduced than native species (Kruskal-Wallis χ² = 6.51, p<0.05 and χ² = 7.58, p<0.01 respectively). BAP status was also a significant predictor of h-index, with species with BAPs typically having higher h-indices (χ² = 5.06, p<0.05; Fig 2B). This was not the case for total number of papers (χ² = 2.57, p = 0.11; Fig 2A). Neither publication metric was significantly influenced by Red List status (total papers: χ² = 0.92, p = 0.34; h-index: χ² = 0.82, p = 0.37). Breeding status (resident vs. migrant) had a significant effect on both metrics (total no. papers: χ² = 15.33, p<0.001; h-index: χ² = 12.49, p<0.001). The effect of guild was significant for h-index and close to significance for total number of papers (χ² = 19.29, p<0.05; χ² = 17.03, p = 0.07). There was no significant difference between the numbers of papers or h-indices for stable/increasing species and declining species (20-year trend: no. papers χ² = 0.14, p = 0.93; h-index χ² = 2.01, p = 0.37; 40-year trend: no. papers χ² = 5.17, p = 0.08; h-index χ² = 3.98; Fig 3). Species which have undergone very severe range contractions over the last 20 years (>40% contraction [16]), on the other hand, have significantly lower publication metrics than those which have undergone minor to moderate declines, remained stable or increased (no. papers: χ² = 14.35, p<0.001; h-index: χ² = 13.67, p<0.001; Fig 4). Full outputs are given in S2 Table.
The results of the best-fit multivariate models found species with higher h-indices and total numbers of papers to have significantly larger populations and distributions (total number of papers: population t value = 10.98, p<0.001; distribution t value = 9.277, p<0.001; h-index: population t value = 11.48, p<0.001; distribution t value = 9.30, p<0.001).
Guild was also a significant predictor of both number of papers and h-index (p<0.001 for both models). The ranking of groups, from most to least studied, for total number of papers was: birds of prey > gamebirds > herons/bitterns/egrets > waders > seabirds > ducks & geese > other > grebes/divers/rails > corvids > small passerines > pigeons & doves; and for h-index: birds of prey > gamebirds > waders > seabirds > herons/bitterns/egrets > other > ducks & geese > small passerines > grebes/divers/rails > corvids > pigeons & doves. Native status was also a significant predictor, with native species having higher publication metrics than introduced species (no. papers: t value = -3.09, p<0.005; h-index: t value = -2.59, p<0.05). Between 1972 and 2014, new papers on established introduced species appeared in the literature at 44% of the rate associated with native species (S1 Fig). BAP status was also significant, with BAP species typically having higher metrics (no. papers: t value = 3.14, p<0.005; h-index: t value = 2.57, p<0.05), as was breeding status, with resident species having higher metrics (…). As population size and distribution were interchangeable in the model, the outputs described in this section are those from the "population" model only. Full outputs of the "population" and "distribution" models can be found in S3 Table.

Discussion
While a small body of evidence exists relating research effort to species traits [1][2][3][4][5][6], this study, alongside a similar analysis of British mammals [18], is the first attempt to systematically assess the contribution of different species to the evidence base in terms not only of research effort (total number of papers) but also of "impact" (h-index). Given how important this evidence is for informing management, policy and conservation actions, this assessment is overdue.
Scientific interest in a species is a composite measure, reflecting the ease with which a species can be studied, its commercial value, the availability of research funding, species conservation status, the damage or risks associated with its presence, the personal contributions or interests of individual scientists and the public interest in the results [1,5,18]. This range of motivations is evident in our ranked list of species. Great and Blue Tits, for example (numbers 1 and 6 on the total number of papers list, and 1 and 3 on the h-index list), rank highly because they are easy to study, populous and widely distributed, making them excellent "model" species for the long-term testing of ecological principles. The placing of Skylark (numbers 9 and 7 respectively), in contrast, is likely a consequence of its conservation status, the species having undergone steep population declines in recent decades. In addition, a large proportion of the work on highly ranked species was undertaken by individual scientists, research teams or university groups over long time periods: research for 60% of all papers on Great Tits, for example, was carried out at Wytham Woods, Oxford (85% of the papers which make up the h-index), while 50% of all papers on Oystercatchers include the authors Goss-Custard and/or Stillman (76% of h-index papers). While it is unsurprising that long-term studies produce larger numbers of papers than short-term studies, they also produce a greater volume of highly cited papers, emphasising the value of long-term data sets.
While the model developed in this study cannot separate the individual drivers of research, it can identify the key factors explaining relative research effort and impact at the species level. Populous and widely distributed species which are native rather than introduced, resident rather than migrant breeders, and in receipt of biodiversity action plans (national-level conservation action) had significantly higher numbers of papers and h-indices. Global conservation status (IUCN Red List status [15]) was not significant.
Guild was also a significant factor, with birds of prey in receipt of most study and pigeons and doves least. Considering the wide range of research motivations discussed above, the model fit was good, with only three significant outliers: Rock Pipit (Anthus petrosus) and Lesser Redpoll (Carduelis cabaret) had lower publication metrics than would be expected from their traits, and Ruddy Turnstone (Arenaria interpres) had higher metrics. These findings are largely in agreement with those of previous studies of both birds and mammals, with two global studies and one UK-based study [1,2,18] reporting that research effort is greater for species with larger distribution ranges. Taxonomic status (e.g. order, family, guild) has also been shown to significantly influence research effort [2,3], as has native status, with introduced mammals typically under-represented in the literature [18]. Our findings on residency status differ from those of Ducatez & Lefebvre [2], who reported that migratory bird species at the global scale had been in receipt of significantly greater research effort than resident species. This difference may be an artefact of the relatively low number of migrant species in Great Britain.
The h-index, used here for the first time to assess the contribution of individual bird species, is regarded by many in academic spheres as a more appropriate publication metric than alternative measures (such as total number of papers), as it gives an indication of the importance or impact of papers, which other metrics do not [9]. It is also not influenced by the size of the "tail" of less cited papers on a species [9,18], and it eliminates issues associated with search engines in which the probability of identification is linked to citation number [19].
However, its use does have some drawbacks. The h-index necessarily invokes a "time" component: papers which have been published for longer are likely to have accumulated more citations and thus higher h-indices. This may lead to biases in the metric results, particularly for species experiencing recent population changes or introductions. It is worth noting, however, that the numbers of papers published over time for different species groups within our database (raptors, farmland birds, seabirds, waders and all species) were highly correlated with one another (Pearson's r = 0.77 to 0.93), which indicates that while biases will inevitably exist, these are broadly similar across species groups (S4 Table; S2 Fig). The h-index is also more time-consuming to calculate than simple metrics such as total number of papers. In this example the two metrics were highly correlated; however, this pattern may not hold true when applied to other datasets, or to species in other regions.
Care must therefore be taken to select the correct metric to address the questions at hand. While quality (evidenced through h-index) is important in the academic arena, it is perhaps less critical in terms of policy and conservation action; papers which are used to inform policy are not necessarily those which are cited highly within academia. For this reason, total number of papers is perhaps a better indication of general interest in a species.
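The between-group correlations of publication rates over time are described only as Pearson coefficients (S4 Table). Assuming they were computed on annual paper counts between 1972 and 2014, the calculation might be sketched as follows. The publication years and the helper names are hypothetical.

```python
import math
from collections import Counter


def yearly_counts(pub_years, first=1972, last=2014):
    """Papers per publication year from `first` to `last`, with zeros for empty years."""
    counts = Counter(pub_years)
    return [counts[year] for year in range(first, last + 1)]


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up publication years for two hypothetical species groups.
group_a = [1975, 1990, 1990, 2005, 2005, 2005, 2014]
group_b = [1980, 1990, 2005, 2005, 2014, 2014]
r = pearson_r(yearly_counts(group_a), yearly_counts(group_b))
print(round(r, 2))
```

Including zero-count years keeps the two series aligned year by year; a group with a constant series would make the coefficient undefined.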
The outputs of the models have identified three key gaps in the bird evidence base, in terms of both paper "quality" (h-index) and "quantity" (total number of papers).

1) Migrant breeders
Migrant breeders in Great Britain have been in receipt of significantly less research than residents. This is an important omission, considering that many migrant breeders have undergone severe population declines in recent decades (e.g. Turtle Dove Streptopelia turtur [20]; European Nightjar Caprimulgus europaeus [21]). This is underlined by recently published distribution change data [16], which show that a higher percentage of migrant breeders than resident breeders have undergone range contractions over both the last 20 and 40 years (… [13,16]).

2) Introduced species
Introduced species had disproportionately lower h-indices and numbers of papers associated with them than native species, even when allowing for differences in distribution and population. This is a concern given the negative effect of many introduced species on ecosystems [22], and given that introduced species often conflict with both native species and human interests (e.g. Canada Goose Branta canadensis [23], Ring-necked Parakeet Psittacula krameri [24], Ruddy Duck Oxyura jamaicensis [25]). It is also a notable result given that a number of introduced species are commercially important (e.g. Red-legged Partridge Alectoris rufa and Pheasant Phasianus colchicus) and have been present for a long time. For some introduced species, their low ranking in the list may be a result of their relatively recent introduction into the country: sufficient time simply has not elapsed for a large body of work to have amassed. Alternatively, some species may be viewed as existing at populations too low to pose much risk on the larger scale (e.g. Eagle Owl). Other species, such as Little Owl (Athene noctua), are long-term breeders yet do not appear problematic, so there is little interest in their study. However, with increases in the distribution of several species known to cause conflicts in other countries (e.g. Egyptian Goose Alopochen aegyptiaca in South Africa [26]: 163% increase in the last 20 years; Monk Parakeet in Spain [27]: 50% increase in the last 20 years [16]), these knowledge gaps may limit effective policy or management.

3) Conservation status: BAP status vs. distribution trends
Species with biodiversity action plans tended to have higher numbers of papers and h-indices than those without plans. This is encouraging, as it indicates that conservation status and research effort are linked (although whether conservation status is driving research or vice versa cannot be determined by this analysis). However, BAP is only one aspect of conservation status (national action); Red List status (international action) did not have any significant impact on either publication metric. Moreover, species which have undergone distribution contractions over the last 20 or 40 years do not have significantly higher publication metrics. Indeed, looking at overall patterns, it is species with stable distributions which have been in receipt of most research over the last 20 years (Fig 3). Furthermore, species which have undergone very severe range contractions over the last 20 years (>40% [16]) have significantly lower publication metrics than those which have undergone minor to moderate declines, remained stable or increased (Fig 4), although this is clearly not the case for all declining species (for example Skylark).
These findings may reflect the difficulty of studying scarce or declining species; however, they do raise questions about the relative strength of the evidence base that underpins conservation actions for such species. Because this study does not take account of time, and research motivations have undoubtedly changed in recent decades, this finding may not accurately reflect current British research priorities.
Therefore, while it is encouraging that BAP species appear to be in receipt of more, and better-quality, research than non-BAP species, care should be taken to ensure that other declining species which do not benefit from a BAP also receive sufficient research effort. It is likely that funding is harder to obtain for species not badged in this way.

Conclusions and Wider Applications
In summary, this paper has produced ranked lists of species based on their publication metrics. While a species' position on these lists results from a variety of factors, the relatively simple model we have constructed provides a very good fit to the data. Although the method has been applied to birds in this instance, it could and, we believe, should be repeated for other taxa or species groups.
The results of this work have raised a number of questions which warrant further study: 1) How have research motivations changed over time? Is there evidence of a shift away from theory-led research towards conservation- or policy-led research in recent years? 2) How have changes in conservation status and/or policy affected publication rates? 3) Are these changes cause or effect? Is policy being led by research, or research by policy?
In short, we believe this to be an exciting and useful approach to understanding human-introduced biases in the quality and quantity of scientific literature at a species level, something which will help provide a solid foundation for both conservation and evidence-based policymaking in the future.
S3 Table. Output from best fit models for both publication metrics (total number of papers and h-index). A) Total number of papers and population; B) Total number of papers and distribution; C) h-index and population; D) h-index and distribution. (DOCX)
S4 Table. Pearson correlation coefficients for numbers of papers across time for a range of species groups/all species. (DOCX)