Citation: Tamayo-Quintero J, Martínez-de la Puente J, Matta NE, Pacheco MA, Rivera-Gutierrez HF (2025) Imprudent use of MalAvi names biases the estimation of parasite diversity of avian haemosporidians. PLoS Pathog 21(2): e1012911. https://doi.org/10.1371/journal.ppat.1012911
Editor: Laura J. Knoll, University of Wisconsin Medical School, UNITED STATES OF AMERICA
Published: February 5, 2025
Copyright: © 2025 Tamayo-Quintero et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: JMdlP was financed by the MICROVEC-PID2020-118205 GB-I00 grant funded by MCIN/AEI/10.13039/501100011033. Additional support derived from the CNS2022-135993 grant of the Ministerio de Ciencia e Innovación (MCIN/AEI/10.13039/501100011033) with funding from European Union NextGenerationEU to JMdlP. JTQ received a Beca de movilidad Resolución Rectoral 144/2024, financed by the Universidad Internacional de Andalucía. The funders had no role in study design, data collection and analysis, publication decision, or manuscript preparation.
Competing interests: The authors have declared that no competing interests exist.
Overview
Understanding patterns of biotic diversity at different geographical scales is one of the major challenges of biogeography [1] and macroecology [2], especially for parasites, one of the most diverse groups on Earth [3]. The order Haemosporida includes wildlife parasites that are among the most studied in ecology and evolution [4,5]. In birds, haemosporidians include malaria parasites of the genus Plasmodium and the related genera Haemoproteus and Leucocytozoon [6]. These parasites have been extensively studied in bird populations, mainly in Europe and North America, where authors have identified a high genetic diversity of parasites infecting wild bird species [7]. These parasite genera are an excellent study model for understanding the ecology and evolution of parasite–host interactions [7,8] and the factors that shape specialist and generalist infection strategies.
A quick review of the avian haemosporidian lineages worldwide in the MalAvi database
The development of molecular tools drastically changed the knowledge of the diversity of parasites in wild birds. Since Bensch and colleagues [9], advances in molecular techniques have allowed the identification of 5,131 unique lineages of avian haemosporidians in more than 2,200 bird species worldwide (accession date: 07 June 2024). Authors have extensively defined and named unique lineages using a 479 base pair (bp) fragment of the cytochrome b gene (cyt-b), which is used as a barcode in these bird parasites [10]. These lineages range from generalist parasites such as Plasmodium relictum SGS1, which infects more bird species than any other Plasmodium lineage [11], to lineages identified in a single host species. The goal of this genetic characterization is to use a region that is sufficiently informative and easy to study, allowing unambiguous identification of parasites for direct comparisons of parasite diversity between host species and geographic regions [12]. Based on the current information, the selected genetic region has been used as a proxy for parasite species [5], since lineages appear to be reproductively isolated entities [10,13]. However, there is still insufficient information to determine whether each of these haplotypes corresponds to a different species, which highlights the need to further evaluate the diversity of this group not only molecularly but also morphologically [5].
The MalAvi database is the most comprehensive and widely used repository for lineages of avian haemosporidians worldwide [10]. This database provides detailed information on the known parasite lineages, hosts, and geographical distribution, among other important details [10,14]. However, it includes numerous sequences shorter than the standard length (<479 bp), which may be a consequence of prioritizing as much data as possible to represent the diversity of these groups of avian parasites around the world. Although the shorter sequences are labeled as “partial” in the database, including these lineages in macroecological studies complicates the standardization of unique lineage names associated with particular sequences, because synonymies can occur between lineages. Synonymies, defined here as differently named lineages, published as partial or occasionally full sequences of the cyt-b barcode fragment, that are genetically indistinguishable from one another, may increase errors in name assignment. Recognizing this drawback is important when evaluating the data sets of studies that include this information. Using only the names of parasite lineages infecting birds, instead of the corresponding sequences, may affect estimates of the parasite lineage diversity circulating in an area and of the host range of lineages.
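The definition of a synonymy given above can be illustrated with a minimal sketch. It assumes that each lineage is available as an ungapped nucleotide string and that a partial sequence counts as a synonym of another record when it occurs in it as an exact substring; this is a simplification for illustration, not the procedure used in this study. The function name and the toy lineage names below are hypothetical.

```python
def find_synonymies(lineages):
    """Return {name: [other names]} for lineages that are indistinguishable.

    Assumption (not from the article): two records are treated as synonyms
    when the shorter sequence occurs as an exact substring of the longer one.
    """
    # Normalize case and drop alignment gaps so plain substring search works.
    cleaned = {name: seq.upper().replace("-", "") for name, seq in lineages.items()}
    synonyms = {name: [] for name in cleaned}
    names = sorted(cleaned)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            seq_a, seq_b = cleaned[a], cleaned[b]
            short, long_ = (seq_a, seq_b) if len(seq_a) <= len(seq_b) else (seq_b, seq_a)
            if short and short in long_:
                synonyms[a].append(b)
                synonyms[b].append(a)
    # Keep only lineages that have at least one synonym.
    return {name: hits for name, hits in synonyms.items() if hits}


# Toy data with hypothetical names; real barcodes are up to 479 bp long.
toy = {
    "FULL01": "ACGTACGTTGCA",
    "PART01": "GTACGT",        # fragment of FULL01
    "PART02": "GTACGT",        # same fragment under a second name
    "OTHER01": "ACGTACCTTGCA", # differs from FULL01 inside the fragment
}
result = find_synonymies(toy)
```

In this toy case the two partial records are flagged as synonyms of each other and of FULL01, while OTHER01 is not flagged. This mirrors how short fragments can collapse onto several names even when the longer records differ.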
The importance of considering sequences in studies of parasite infection patterns
The diversity of avian haemosporidians has been widely documented worldwide [15,16]. Data from MalAvi has been frequently used to explore diversity and specificity patterns, providing valuable results [16,17]. These studies typically refer to each cyt-b haplotype as a unique parasite lineage, following the standard naming convention for these parasites [10,18]. However, given the potential bias introduced by short sequence synonymies, we aim to evaluate the name assignment criteria in the MalAvi database and provide potential guidelines for those using this platform.
We analyzed the occurrence of lineage synonymies in the complete MalAvi database (Fig A in S1 Supporting Information). The database included 5,131 lineages (accession date: 07 June 2024): 1,559 lineages of Plasmodium, 2,030 lineages of Haemoproteus, and 1,542 lineages of Leucocytozoon. The data set ranges from partial sequences (n = 482), with a minimum length of 145 bp, to sequences covering the whole barcoding region of at least 479 bp (n = 4,649). In total, 486 lineages have 1 to 21 synonymies, which are more frequent in Haemoproteus (47.9%) and Plasmodium (30.7%) than in Leucocytozoon (21.4%). The occurrence of synonymies is negatively correlated with sequence length (Estimate = −0.06, z-value = 1,593.09, p < 0.001). A similar trend was found for each of the 3 parasite genera included in the database (all p < 0.05), suggesting that the relationship between the number of synonymies and sequence length is consistent among them (Fig 1). The most notable cases, which have a high number of synonymies, include sequences corresponding to the Haemoproteus lineages VIRFLA02 (228 bp), VIOLI15 (228 bp), VIRFLA03 (222 bp), PSADEC01 (305 bp), and MELGEO01 (269 bp), and the Plasmodium lineages RBQ16 (210 bp) and TABI08 (285 bp) (see further details of the identical sequences in Table A in S1 Supporting Information). By contrast, synonymies rarely occur among longer sequences, with only a few lineages of more than 440 bp showing 3 to 5 synonymies (Fig 1).
Different colors correspond to the 3 genera analyzed.
In some cases, these synonymies occur because the sequences included in MalAvi, corresponding to a fragment of the barcode region, represent only part of the complete sequence deposited in other databases such as GenBank. In this respect, differences between apparently identical sequences in MalAvi may emerge when the complete sequences deposited in GenBank are considered (Fig 2). Synonymies are generated even though the GenBank sequence has sufficient information outside the barcode region.
The 446 bp sequence reported in MalAvi and the 506 bp fragment from GenBank (EF153660) are included and compared with the 5 synonyms SPIPAS07, PHEMEL01, MELLIN02, DENCOR06, and CNEORN01. Differences between some of these lineages (shown in green) emerged when GenBank sequences were considered and compared with the fragments included in MalAvi.
Interestingly, the occurrence of synonymies in sequences included in MalAvi differs between geographical regions. The number of synonymies recorded per subcontinent differed significantly (χ² = 51.96, p < 0.001) and was highest in South America (n = 132), especially for Haemoproteus and Plasmodium lineages (Fig 3). This suggests that, as new lineages are discovered in the most diverse region on Earth, special attention is required to minimize the occurrence of synonymies that may affect estimates of lineage richness and specificity in this area. Standardizing the methods used by laboratories in different regions may significantly improve the quality of the data available in MalAvi, which would benefit researchers globally.
Different colors correspond to the 3 genera analyzed. Overall, 66 different lineages corresponding to Haemoproteus (23), Leucocytozoon (14), and Plasmodium (29) are found in more than one subcontinent. In these cases, lineages are included in each subcontinent.
Concluding remarks
The utility of the MalAvi database of avian haemosporidians is beyond doubt: it is broadly used in the ecological and parasitological literature and greatly facilitates the work of researchers on this topic. However, based on the results reported here, we strongly recommend that MalAvi users perform comprehensive data cleaning and lineage homologation before any sequence analysis. If possible, authors should avoid using partial sequences of the parasite barcoding region and use the complete MalAvi region as the minimum differentiation unit. This is especially the case in areas such as South America, where synonymies are more frequently found. Before submitting data to MalAvi, researchers should ensure that sequences cover at least the 479 bp fragment. Lineages should be named only on the basis of BLAST hits to complete 479 bp records, where available, following the standard nomenclature procedures [10]. Partial sequences, which may introduce synonymies, should only be used when complete records are unavailable.
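As a rough illustration of the data-cleaning step recommended above, the following sketch separates records that reach the 479 bp barcode length from partial ones. It assumes that sequences have already been trimmed to the MalAvi barcode region, so that length can stand in for coverage, and that only unambiguous bases (A, C, G, T) should count toward the length; both are assumptions made for this example, and the function and lineage names are hypothetical.

```python
BARCODE_LENGTH = 479  # bp, the barcode fragment length stated in the text


def split_by_coverage(lineages, min_length=BARCODE_LENGTH):
    """Split {name: sequence} into (complete, partial) dictionaries.

    Assumption: sequences are already trimmed to the barcode region, and
    gaps or ambiguous symbols (e.g. '-', 'N') do not count toward length.
    """
    complete, partial = {}, {}
    for name, seq in lineages.items():
        informative = sum(1 for base in seq.upper() if base in "ACGT")
        target = complete if informative >= min_length else partial
        target[name] = seq
    return complete, partial


# Toy records with hypothetical names.
records = {
    "LONG01": "A" * 479,
    "AMBIG01": "A" * 478 + "N",  # 479 symbols, but only 478 informative bases
    "SHORT01": "ACGT" * 50,      # 200 bp
}
complete, partial = split_by_coverage(records)
```

Partial records need not be discarded outright; keeping them in a separate set makes it explicit when an analysis falls back on them because no complete record is available.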
To avoid biases in analyses of the richness, diversity, and specificity of haemosporidians infecting birds that are based on lineages or lineage names already published in MalAvi, we recommend running a BLAST search against sequences deposited in both the MalAvi and GenBank repositories and including the hits in an alignment to confirm that lineage names correspond to unique parasite sequences. Giving different names to sequences that correspond to the same genetic lineage may affect several estimates, including the diversity of parasites circulating in a particular area and the host ranges of the parasites. To reduce the occurrence of these potential biases, approaches similar to those used by Gil-Vargas and Sedano [14] can be used. In that study, the authors identified haplotypes by clustering sequences with ≥99.3% identity. In addition, Outlaw and Ricklefs [19], in a revision of species limits in avian haemosporidians, recommend a standardized procedure to “tag” these sequences based on percentage sequence similarity. This approach may allow the inclusion of additional sequences from other databases, such as GenBank. However, it implies not using the barcode region of the parasite, which makes it harder to compare the results with those of earlier studies that define patterns of diversity, specificity, and distribution of avian haemosporidians. In this respect, different studies have shown that a higher hidden diversity may exist among described parasite lineages when other regions are considered [20], or no variation when mitochondrial genes are used [21]. Nevertheless, since very limited information about the genetic diversity of parasites is available for regions outside the barcoding region considered in MalAvi, most research currently focuses on this data set.
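The idea of grouping sequences by percentage identity can be sketched as follows. This is not the pipeline of Gil-Vargas and Sedano [14], whose implementation details are not given here; it is a minimal single-linkage clustering over pre-aligned, equal-length sequences, where identity is computed only over positions at which both sequences carry an unambiguous base. The function names, the toy data, and the choice of single linkage are assumptions made for illustration.

```python
def identity(a, b):
    """Fraction of matching bases over positions where both are A/C/G/T."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            matches += x == y
    return matches / compared if compared else 0.0


def cluster_by_identity(seqs, threshold=0.993):
    """Single-linkage clusters of {name: aligned sequence} at >= threshold."""
    names = sorted(seqs)
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if identity(seqs[a], seqs[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


# Toy 1,000 bp alignment with hypothetical names.
toy = {
    "base": "A" * 1000,
    "mut5": "C" * 5 + "A" * 995,    # 99.5% identical to base
    "mut10": "A" * 990 + "C" * 10,  # 99.0% identical to base
}
clusters = cluster_by_identity(toy)
```

For a 479 bp barcode, a 99.3% threshold tolerates up to 3 mismatches (476/479 ≈ 99.4%), whereas 4 mismatches (475/479 ≈ 99.2%) place two sequences in different clusters. Single linkage can chain distant sequences through intermediates, so other linkage rules may be preferable in practice.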
Furthermore, it is crucial to verify the length of the original sequence in the GenBank database [22]. GenBank allows verification of the lineage name and sequence length, which matters because different primers are used to amplify the partial cyt-b gene and the resulting fragment may not fall within the barcode region established in the MalAvi platform. Combining both platforms for BLAST comparison of parasite sequences minimizes biases in lineage identification and use, and allows longer sequences to be used for phylogenetic and specificity analyses. Additionally, we recommend that those publishing new sequences always try to submit complete barcode sequences to the public databases. To obtain the complete barcode sequence, both strands of the DNA (forward and reverse) should be sequenced, and the sequence quality should be checked to avoid losing information at the ends. The identity of the lineage for this genetic region should then be confirmed before any formal analysis. Submitting only forward or only reverse sequences is not good practice. Researchers should ensure that their lineage has not been previously reported before assigning a new name to it, contacting the curator of the MalAvi database if necessary. With a collaborative approach, curation of this database may become easier and help the scientific community develop further analyses of bird–vector–haemosporidian parasite interactions.
Supporting information
S1 Supporting Information.
Fig A. Workflow of our study, from data collection to analysis. Data access June 2024. Table A. Lineages with synonymies in the open data platform MalAvi.
https://doi.org/10.1371/journal.ppat.1012911.s001
(DOCX)
Acknowledgments
To all the members of the laboratory of Ecología y Evolución de Vertebrados (EcoEV) of the Universidad de Antioquia.
References
- 1. Kattan GH, Tello SA, Giraldo M, Cadena CD. Neotropical bird evolution and 100 years of the enduring ideas of Frank M. Chapman. Biol J Linn Soc. 2016;117:407–413.
- 2. Velasco JA, Pinto-Ledezma JN. Mapping species diversification metrics in macroecology: Prospects and challenges. Front Ecol Evol. 2022;10:951271.
- 3. Morand S, Krasnov BR, Littlewood DTJ. Parasite diversity and diversification. Cambridge University Press; 2015.
- 4. Pacheco MA, Matta NE, Valkiūnas G, Parker PG, Mello B, Stanley CE, et al. Mode and Rate of Evolution of Haemosporidian Mitochondrial Genomes: Timing the Radiation of Avian Parasites. Mol Biol Evol. 2018;35:383–403. pmid:29126122
- 5. Pacheco MA, Escalante AA. Origin and diversity of malaria parasites and other Haemosporida. Trends Parasitol. 2023;39:501–516. pmid:37202254
- 6. Sehgal RNM. Manifold habitat effects on the prevalence and diversity of avian blood parasites. Int J Parasitol Parasites Wildl. 2015;4:421–430. pmid:26835250
- 7. Rivero A, Gandon S. Evolutionary Ecology of Avian Malaria: Past to Present. Trends Parasitol. 2018;34:712–726. pmid:29937414
- 8. González AD. Avian haemosporidians from Neotropical highlands: Evidence from morphological and molecular data. Parasitol Int. 2015. pmid:25638289
- 9. Bensch S, Stjernman M, Hasselquist D, Östman Ö, Hansson B, Westerdahl H, et al. Host specificity in avian blood parasites: a study of Plasmodium and Haemoproteus mitochondrial DNA amplified from birds. Proc R Soc Lond B. 2000;267:1583–1589. pmid:11007335
- 10. Bensch S, Hellgren O, Pérez-Tris J. MalAvi: a public database of malaria parasites and related haemosporidians in avian hosts based on mitochondrial cytochrome b lineages. Mol Ecol Resour. 2009;9:1353–1358. pmid:21564906
- 11. Martínez-de La Puente J, Santiago-Alarcon D, Palinauskas V, Bensch S. Plasmodium relictum. Trends Parasitol. 2021;37:355–356. pmid:32660871
- 12. Santiago-Alarcon D, Marzal A, editors. Avian Malaria and Related Parasites in the Tropics: Ecology, Evolution and Systematics. Cham: Springer International Publishing; 2020.
- 13. Valkiūnas G. Avian malaria parasites and other haemosporidia. Boca Raton: CRC Press; 2005.
- 14. Gil-Vargas DL, Sedano-Cruz RE. Genetic variation of avian malaria in the tropical Andes: a relationship with the spatial distribution of hosts. Malar J. 2019;18:129. pmid:30971233
- 15. Clark NJ, Clegg SM, Lima MR. A review of global diversity in avian haemosporidians (Plasmodium and Haemoproteus: Haemosporida): new insights from molecular data. Int J Parasitol. 2014;44:329–338. pmid:24556563
- 16. Darío Hernandes Córdoba O, Torres-Romero EJ, Villalobos F, Chapa-Vargas L, Santiago-Alarcon D. Energy input, habitat heterogeneity and host specificity drive avian haemosporidian diversity at continental scales. Proc R Soc B. 2024;291:20232705. pmid:38444334
- 17. Ellis VA, Bensch S. Host specificity of avian haemosporidian parasites is unrelated among sister lineages but shows phylogenetic signal across larger clades. Int J Parasitol. 2018;48:897–902. pmid:30076910
- 18. Fecchio A, Bell JA, Pinheiro RBP, Cueto VR, Gorosito CA, Lutz HL, et al. Avian host composition, local speciation and dispersal drive the regional assembly of avian malaria parasites in South American birds. Mol Ecol. 2019;28:2681–2693. pmid:30959568
- 19. Outlaw DC, Ricklefs RE. Species limits in avian malaria parasites (Haemosporida): how to move forward in the molecular era. Parasitology. 2014;141:1223–1232. pmid:24813385
- 20. Harl J, Himmel T, Ilgūnas M, Valkiūnas G, Weissenböck H. The 18S rRNA genes of Haemoproteus (Haemosporida, Apicomplexa) parasites from European songbirds with remarks on improved parasite diagnostics. Malar J. 2023;22:232. pmid:37563610
- 21. Hellgren O, Kelbskopf V, Ellis VA, Ciloglu A, Duc M, Huang X, et al. Low MSP-1 haplotype diversity in the West Palearctic population of the avian malaria parasite Plasmodium relictum. Malar J. 2021;20:265. pmid:34118950
- 22. National Library of Medicine (US). National Center for Biotechnology Information (NCBI). In: National Center for Biotechnology Information [Internet]. 1988. https://www.ncbi.nlm.nih.gov/.