This is an uncorrected proof.
Figures
Abstract
Geometric morphometrics based on two-dimensional landmarks is a powerful tool for distinguishing morphologically similar or cryptic taxa, an important asset in the fight against medically and veterinary important arthropods. While it is commonly assumed that increasing the number of landmarks should improve discriminatory power by capturing more shape information, our findings challenge this assumption. In terms of shape discrimination (thus excluding size variation), we demonstrate that small subsets of landmarks can equal or even outperform full sets of landmarks. Fifteen examples of comparisons between closely related species were considered. These examples are drawn from published data covering six insect families: Culicidae, Glossinidae, Muscidae, Psychodidae, Reduviidae and Tabanidae. To assess the relevance of smaller subsets of landmarks, we compared the accuracy scores of unsupervised classification using full sets of landmarks (10–22 points) with those obtained using smaller subsets. To eliminate the potential influence of chance on reclassification scores, we validated our results by accounting for correct reclassifications due to chance alone. The strategy for selecting relevant landmark subsets employed two different approaches. The first relied on each landmark’s contribution to the total distance between shapes, thus establishing a hierarchy among them. The second, more comprehensive approach compared the reclassification scores of large random samples of landmarks, from the smallest subsets (3 landmarks) to the full set. From a public health perspective, the value of our approach lies in simplifying the tasks required for entomological surveillance: it could accelerate morphometric identification for large surveillance datasets, improve standardization among users, and reduce noise introduced by problematic landmarks. These gains are particularly relevant for distinguishing medically important but morphologically similar taxa, or when molecular tools are unavailable or too resource-intensive. The statistical procedures have been integrated into the XYOM online software, providing accessible tools for efficient landmark selection and improved morphometric analysis.
Author summary
Landmark-based geometric morphometrics describes shape in direct relation to the number of landmarks used. It is commonly assumed that increasing the number of landmarks allows for more information about shape, and when discriminating between groups or taxa, this strategy is expected to improve classification accuracy. Our results challenge this assumption. We demonstrate that subsets of landmarks, as small as three or four, can equal or even outperform the species classification obtained by the full set of landmarks, and we propose two methods for identifying them. We discuss the possible causes of these counterintuitive results and the perspectives they could open up for morphometric studies.
Citation: Dujardin J-P, Sriwichai P, Samung Y, Ruangsittichai J, Sumruayphol S, Dujardin S (2026) Landmark-based morphometrics with fewer landmarks: Some examples for medical entomology. PLoS Negl Trop Dis 20(6): e0014386. https://doi.org/10.1371/journal.pntd.0014386
Editor: Clarence Mang'era, Egerton University, KENYA
Received: July 18, 2025; Accepted: May 15, 2026; Published: June 2, 2026
Copyright: © 2026 Dujardin et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: https://github.com/jepigit/influential-landmarks/tree/main/Suppl_data.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Species determination based on morphology is generally satisfactory but can become tricky for closely related or cryptic species, requiring a high level of entomological knowledge and experience. Due to some limitations of the morphological approach in taxonomy, such as the fragility of diagnostic characters, or the absence of an expert in a rare group of arthropods, complementary or alternative techniques have been developed, genetic, molecular, physical, or morphometric ones. Our study concerns morphometry, and in particular its most recent development: landmark-based, geometric morphometry [1].
This revolutionary approach captures both the size of an organ and its shape, and allows us to visualize changes in shape between individuals, even those not visible to the naked eye. Today, modern morphometrics may be seen as a rapid and cost-effective complementary tool, or sometimes an alternative means, to identify cryptic species of arthropods that often require molecular machinery to be distinguished [2].
To improve the ability of shape variables to correctly distinguish organisms, it seems logical to capture as much detail as possible. In the case of an insect wing or any other relevant body part, this often means digitizing more landmarks.
A larger number of landmarks, each one represented by two variables (the x,y coordinates) may result in too many shape variables to describe few individuals. As an acceptable solution to possible multidimensionality problems (as for instance an overfitting effect), a subset of few first principal components (PCs) of the shape variables has commonly been used. The principal component analysis (PCA) based on the covariance matrix is the basic multivariate analysis applied in geometric morphometrics. It generates new variables (the principal components, called here “final shape variables”) ordered according to their respective contribution (their “eigenvalue”) to the total variation. It allows for an important reduction of dimensionality by selecting the few first PCs representing the main trends of differentiation. [3,4].
A Far as we know, only one study considered the idea to use less landmarks, only the important ones. Using Monte Carlo simulations of forms designed with 5 landmarks, [5] discussed criteria based on the eigenvalues of their correlation matrix, suggesting that some landmarks are probably redundant, or bring little additional information.
Accurate species identification is essential during the monitoring phase of vector control programs. In addition to the requirement for accuracy, the identification method must be inexpensive and readily available. The morphometric approach offers these advantages without requiring extensive entomological expertise. However, its use as a routine technique could be hampered by the difficulty of data collection. Digitizing many landmarks can be a lengthy and tedious process, deterring many entomologists working with large samples. This drawback has likely hindered the adoption of geometric morphometry in many laboratories. Therefore, any idea that reduces the workload without compromising discriminatory power is welcome. Using published data, we tried to answer the following question: to reclassify taxa using wing landmarks geometry, is it equally, or more effective to select only some most informative landmarks rather than use all available ones?
Materials and methods
Culicidae
Aedes.
The mosquitoes belonging to the genus Aedes Meigen, 1818 are vectors of viruses responsible for severe diseases, among which yellow fever, dengue, Chikungunya and Zika. Aedes aegypti (Linnaeus, 1762), Ae. albopictus (Skuse, 1894) and Ae. scutellaris (Walker, 1858) examined here were collected in Thailand. All of these groups were satisfactorily distinguished at 16 wing landmarks by [6], except for the female pair Ae. scutellaris and Ae. albopictus. Using the full set of the 16 landmarks used by [6] and a number of subsets going from 3 to 15 landmarks, we performed here again a separate study of males and females, comparing Ae. albopictus with Ae. scutellaris, Ae. aegypti with Ae. scutellaris, and Ae. albopictus with Ae. aegypti.
Anopheles.
The Anopheles Meigen, 1818 mosquitoes are the vectors of paludism, one of the world’s deadliest parasitic diseases. The coordinates of female Anopheles minimus Theobald, 1901 and An. harrisoni Harbach & Manguin, 2007 were those used by [7]. We used the data comparing 67 female An. minimus and 22 An. harrisoni. They are based on the same number of landmarks (16) as for the Aedes species (see previous paragraph), and collected in the same order. We then explored the reclassification power of less landmarks.
Two group sizes were also compared between Anopheles sawadwongporni Rattanarithikul & Green, 1987 and An. pseudowillmori (Theobald, 1910). In this comparison the raw coordinates corresponded to 17 landmarks as in [8]. Here, different operators, T and J (one of us), used a different sampling of the mosquitoes: 41,36 and 26,28, respectively. These two sampling were extracted from the same data [8].
Glossinidae
The tsetse flies are responsible for the transmission of trypanosomiasis to the human (sleeping disease) and to the livestock (nagana). We used two subspecies of Glossina palpalis from Ivory Coast (Côte d’Ivoire): Glossina palpalis gambiensis (Gpg) and Glossina palpalis palpalis (Gpp) [9]. In the original publication ([9]), the 11 landmarks were annotated by two operators: B(erte) and TA(Ta) and no comparison was perfomed between them. The samples digitized by B were composed of 40 Gpg and 18 Gpp. The sample digitized by TA were the 40 same Gpg and 24 different Gpp coming from another geographic location of southern Ivory Coast (Côte d’Ivoire). Thus, using the 40 Gpg sample we could compare the two operators (B and TA) using either the complete set of landmarks or a number of subsets of them.
Muscidae
The species belonging to the Stomoxys Geoffroy, 1972 genus are hematophagous flies of veterinary and medical importance, acting as vectors of many pathogens. They are also a major irritant pest of both livestock and wildlife, and a nuisance for humans [10]. To explore the discriminatory power of few landmarks, we used the wing coordinates of a maximum of 10 landmarks as digitized by [11] to compare male Stomoxys pullus Austen, 1909 and Stomoxys uruma Shinonaga et Kano, 1966.
Reduviidae
The North American Triatoma protracta (Uhler) (1894) is a potential vector of Chagas disease in North America. It has been described as a complex of various subspecies [12]. We included two of them in our study: T. p. woodi and T. p. protracta. We used the data corresponding to 13 landmarks as published by [13].
Tabanidae
Tabanidae are hematophagous flies of veterinary and medical importance, biological and mechanical vectors of many parasites [14]. From the publication of [15] we used the coordinates at 22 landmarks and at the many subsamples of them to compare Tabanus megalops Walker, 1854 and Tabanus striatus Fabricius, 1787.
Psychodidae
Sergentomyia hodgsoni (Sinton, 1933) and Phlebotomus stantoni (Newstead 1914), as different genera, are not to be considered as morphologically close species, and they were previously shown to be perfectly separated by 12 landmarks of the wings [16]. They are used in the present study to examine the fate of a perfect taxonomic signal when the number of landmarks is progressively reduced.
Qualifying landmarks
The data used on our material were the raw coordinates obtained by the authors of the publications listed in Table 1. Our reclassification of the species, however, used a different algorithm than the linear discriminant analysis (LDA), i.e., the K-means algorithm (see “Classification method” in this section, as well as “Classification tool” in the “Discussion” section). We validated our reclassification scores by adjusting for any correct assignment that would be obtained by chance [17].
Using two species from each family, we compared the reclassification accuracy obtained with the total number of landmarks and the ones obtained with successively lower numbers of landmarks. Thus, we built different subsets of landmarks, decreasing from the total set of landmarks up to a minimum subset of 3 landmarks.
For each subset of landmarks, we performed the Procrustes superposition (GPA). It refers to the superposition of at least two configurations of homologous coordinates, thus corresponding to at least two different individuals. This is done by optimally aligning their respective coordinate systems, producing “aligned” configurations of landmarks. Optimality can be achieved parametrically, using the least squares (LS) method, or nonparametrically, using the resistant fit (R) method. In this process, each configuration is assigned the same unit of size by dividing its x,y coordinates by the centroid size. Detailed explanation is available in [18] or [19].
We used an estimate of the Procrustes distance, i.e., the distance separating aligned configurations, to re-classify the compared groups.
To find relevant subsets of landmarks, two main methods were applied. The first one, that we called the “random search method”, was based on the maximum performance in a given subset of landmarks among a large sample of random combinations of landmarks (see next section). The second one, that we call the “hierarchical method”, was based on the contribution of each landmark to the global shape distance between two configurations after aligning them using the complete set of landmarks.
The random search method (“Ra”).
This method “Ra” was based on the maximum performance in a given subset of landmarks among a sample of random combinations of landmarks. It explored at least two times a random sample (N = 90) from the many possible combinations of landmarks configurations, going from 3 landmarks to the total used by each cited reference ([6–9,11,13,15,16]).
The number of different combinations of landmarks that could be assembled to form a given subset of k landmarks was computed as a classical combination (not arrangements), as:
where: n is the complete number of landmarks, k is the subset of landmarks.
Since the shape variables are unlikely to completely remove the possible influence of size variation (allometry), for each pairwise comparison we computed an estimate of this possible effect as the coefficient of determination of size variation on shape variation (the residual allometry). We then could conmpute the correlation between the allometric residue of shape for each Ra subset and the accuracy score it allowed. The idea was to detect a possible interference of size on the level of reclassification accuracies, trying to answer the question: “does the level of residual allometry determine the level of accuracy?”. We also computed the correlation between the reclassification accuracy scores of the subsets and the corresponding Procrustes distances, expecting a likely high positive correlation.
The reclassification scores obtained by the successive subset of k landmarks allowed us to distinguish successful and unsuccessful configurations. We considered a subset as a successful one when its reclassification accuracy was equivalent or higher than the one obtained with the complete set of landmarks.
The hierarchical method.
This simple method performed a previous hierarchy of landmarks after alignment of forms (and projection onto a tangent plane), based on the Euclidean distances between pairs of homologous landmarks. To select influential landmarks for each subsets, we followed the order of them sorted according to their respective contribution to the global distance between shapes. To account for potential bias introduced by the least-square (LS) optimization alignment, we also performed the alignment using the resistant-fit (RF) method [20]. In addition to be computationally simple, the hierarchy of landmarks could be guessed visually from the graphical superposition of configurations (Fig 1).
Here, three influential landmarks may be visually identified: the landmarks 3, 8 and 15.
Relationship between the “random search” and the “hierarchical” methods.
The two approaches used entirely different methods to select influential landmarks. They were not presented here as a two-step strategy for identifying relevant subsets, but rather to illustrate the unexpected limitations of an intuitive approach such as the “hierarchical” method The latter provides a single output: a list of landmarks sorted according to their contribution to the overall distance between shapes. From this list, the user then selects a subset and excludes the less relevant landmarks. However, selecting these relevant landmarks does not offer the highest level of discrimination, as demonstrated by the “random search” method. The “random search” method does indeed find more propositions belonging to influential subsets and frequently provides higher-performing subsets. It may or not include some (or even all) of the influential landmarks suggested by the hierarchical method
Classification method
The reclassification of the samples was based on shape variables, not on size. Shape variables were computed according to the generalized Procrustes analysis (GPA) [18], and their residual allometry was computed for subsequent correlation analyses.
To explore the ability of each subset of landmarks to recognize the correct species, we used the Kmeans approach with K the number of compared species or groups [21]. The centroid initialization method was based on naive sharding (https://www.kdnuggets.com/2017/03/naive-sharding-centroid-initialization-method.html). We used the Euclidean distance between principal components of shape, assumed to be highly correlated to the non-Euclidean Procrustes distance itself when differences between shapes are small [22,23]. The percentage of individual matches between the groups computed by the Kmeans method and the pre-established ones represents here the reclassification score.
Software
All calculations, including correlation, allometry, and distance tests after Procrustes superposition, as well as reclassification tests based on the k-means algorithm, were performed using specific scripts derived from the XYOM (version III) functions. They are proposed in the miscellaneous section of the online XYOM software (https://xyom.io). The resistant-fit alignment procedure was performed using the R “opSup” function from [23].
Results
We performed reclassification studies on six families of insects: Culicidae, Glossinidae, Muscidae, Psychodidae, Reduvidae and Tabanidae. In our Tables, the naming of the landmarks are the order numbers as found in the published papers (see Table 1).
Table 2 shows the total number of subsets randomly generated (column “n”) for each pair of taxa, ranging from 505 (in Stomoxys) to 1642 (in Tabanus). From this total, only a fraction provided similar or better reclassification accuracy scores than the one obtained with the complete set of landmarks: this proportion is shown in the same Table (column “successful”). This table 2 also shows some statistics related to the Ra method. For the many reclassification analyses (from 505 to 1642) performed at each species comparison, the linear correlation was computed between the accuracy and either the Procrustes distance or the allometric residue of shape variables. As expected, the correlation between accuracy and distance between shapes was always positive, although not always highly positive. For Glossina, Stomoxys and Psychodidae, the accuracy scores clearly increased with the allometric residue, while for Triatoma, Tabanus and some Culicidae, higher accuracy was obtained with less allometric effect. For each comparison between two taxa, the random search method allowed to find the most performant subset of landmarks for an optimal reclassification (see Table 3).
Interspecific comparisons
In addition to the Table 2 giving some statistics about the Ra method, the main information provided by our study may be summarized by considering the subsets of only three and four influential landmarks (Tables 4 and 5).
Tables 4 and 5 show the influential 3- and 4-landmark subsets, their accuracy for reclassifying the sample of two taxa, and this accuracy when using all landmarks.
Using these tables, the following main observations can be derived:
- (1) The selection of LS and RF influential landmarks did not differ much. Their respective performances in reclassifying taxa suggested that, to this purpose, the least squares method was, on average, slightly better than the resistant-fit one.
- (2) The most efficient Ra subsets always contained at least one landmark of the LS or RF subsets.
- (3) When considering the three landmarks subsets (Table 4), the accuracy scores of the LS and RF ones could be far from the ones obtained by the Ra scores. This difference was consistently reduced for the 4 landmarks subsets (Table 5).
- (4) On average, the accuracy produced by selecting LS and RF subsets of 4 landmarks (86% and 84%, respectively) was much more satisfactory than for three landmarks (73% and 69%, respectively), without however reaching the average levels of the Ra subsets of 3 (90%) or of 4 landmarks (92%).
Note that there were two interspecific comparisons showing a reclassification score compatible with chance alone: the female Ae. albopictus compared with the female Ae. scutellaris (55%), as already observed in the original publication [6], and another one, between female Anopheles minimus andAn. harrisoni (56%). For the Aedes comparison, the best subsets could increase slightly the score, not better however than 70%, which remained compatible with chance alone. For the Anopheles comparison, smaller subsets significantly improved the reclassification score, up to 85% (see Tables 3–5).
Interuser comparisons
Our material allowed three comparisons of specimen belonging to the same species but digitized by different operators: Anopheles pseudowillmori, Anopheles sawadwongporni, and Glossina palpalis palpalis (see Table 6). This type of comparison could be likened to a comparison between operators: the more the operators differ, the higher the expected discrimination scores.
Based on all landmarks, the comparison of the Anopheles conspecific samples annotated by different users (T and J) produced reclassification scores in line with chance expectations (52%, 54%), weakly improved by more discriminant subsets of influential landmarks (69% and 71%).
The conspecific comparison of 18 and 24 Glossina p. palpalis could not be reduced to a simple inter-user comparison because these samples came from different geographical areas: they therefore included possible differences due to local geographical effects on shape. On the contrary, the comparison of G. p. gambiensis between two operators was indeed a simple inter-operator comparison, because the specimens (and their images) were exactly the same. Despite this, the reclassification score (with all landmarks) was surprisingly high (91%). Since both operators scanned exactly the same images, influential landmarks indicated inter-operator differences, and only that. The problematic landmarks were identified as LM1, LM2, and LM11 (by both methods). Their removal lowered the expected reclassification score to values compatible with two groups composed of the same specimens (56%), a score that no subset of landmarks could significantly improve. Furthermore, these significant differences between users probably explain why each operator generated different influential landmarks when comparing the subspecies G. p. gambiensis and G. p. palpalis. Indeed, after removing the problematic landmarks LM1, LM2 and LM11, the landmarks contributing to the divergence of the subspecies became the same for both operators (see table 6).
Discussion
Our results were not intended to compare different insect species or groups, but rather to select the most efficient landmark configuration to discriminate between them. The number of landmarks required for optimal discrimination between taxa should not be confused with the minimum number of landmarks required to characterize shape and size variation [24]. Thus, to select the most efficient subset of landmarks, we used two different strategies, both requiring at least one prior Procrustes superposition. The efficiency of both methods was estimated by the subsequent reclassification accuracy scores.
The first strategy (Ra) was to perform a random search for smaller coordinate system configurations that would be as effective, if not more effective, than the initial configurations in reclassifying taxa. This approach required performing a very large number of Procrustes superpositions and subsequent reclassifications (Table 2). It is laborious, but close to an exhaustive search, and able to suggest more than one solution.
The second method (LS, RF) relies on the hierarchy of contributions of landmarks to the overall shape-based distance between two configurations. It is less efficient than the first but it is fast and relies on intuitive logic.
Selecting influential landmarks
- (1) Random search method (Ra)
In this approach, the relevant composition of the landmark subsets was selected after performing the Kmeans reclassification process, not before. Due to obvious technical limitations, we did not use the full set of possibilities but a sample of them, in fact a maximum of 90 combinations per subset, where sometimes tens of thousands or more were possible. Due to this random sampling step, the identity of influential landmarks could vary slightly from one run to another.
- (2) Hierarchical method (LS and RF).
This method was entirely based on the Procrustes alignment method, before the reclassification process. We compared the least squares (LS) alignment method and the resistant fitting (RF) one. The LS method (commonly known as GPA) has the disadvantage of attributing the shape change to all landmarks rather than to the truly influential landmarks. Two different configurations on the same landmark would produce a superposition suggesting differences on other landmarks as well (the “Pinocchio effect”). The resistant fitting (RF) optimization procedure was developed to mitigate this disadvantage. In terms of reclassification accuracy, the LS-subsets of 4 landmarks was, on average, much more satisfactory than for 3 landmarks (86% versus 73%, respectively). Unexpectedly, the LS-subsets showed slightly better reclassification scores than the RF-subsets.
In terms of influential landmarks identity, our data did not reveal any significant difference between the two strategies, the random search strategy and the hierarchical one. In six comparisons, the same three most influential landmarks were found; in the remaining ten comparisons, two of the three most influential landmarks were found in both methods (Table 4).
However, the accuracy scores obtained by the landmarks subsets as suggested by the hierarchical approach was generally lower than the ones suggested by the random search (Ra). This might be explained by the fact that the most contributing landmarks set by the hierarchical method are initially computed for all landmarks, once for all. Their influential effect might be different when combined with another set of landmarks, as in smaller configurations.
Classification tool
To reclassify two taxa, we used an unsupervised, non-hierarchical agglomerative clustering analysis, the Kmeans method, instead of the frequently used linear discriminant analysis (LDA), which is a supervised method.
The Kmeans algorithm compares individuals and mobile centroids to suggest possible natural groupings, while the LDA compares pre-established groups. As a consequence, the LDA, although powerful, is subject to some statistical limitations [25]. Moreover, from a morphometrician point of view, the Mahalanobis distance (the metric of LDA) has the inconvenient to “distort” the Procrustes geometry [26].
The Kmeans method used the Euclidean distances between principal components of shape variables, thus after LS alignment and orthogonal projection onto a tangent plane (GPA). These Euclidean distances are assumed to be highly correlated to the non-Euclidean Procrustes distance itself when differences between shapes are small [22,23]. The use of the naive-sharding Kmeans initialization method was adopted to tentatively reduce the inconvenient of random selection of K initial centroids, which is liable to induce instability of the final output. Thus, because of the Kmeans algorithm, the re-classification accuracy was not depending only on the global distance between two shapes, but on the individual Euclidean distances to the mobile centroids of other clusters of individuals. This algorithm explains why the correlation coefficient between accuracy scores and Procrustes distances between species was not as high as might be expected (Table 2).
Possible causes of equal or improved performance with fewer landmarks
The beneficial effect of using some smaller configuration of landmarks is reminiscent of the well-known phenomenon called “the curse of multidimensionality” [27]. With more parameters, models can become overly complex, fitting the noise rather than the real signal.
In our study, however, the number of variables was not particularly high, remaining well below the sample sizes, and yet our results suggested that reducing the dimensionality of the data could be beneficial.
Various reasons related to the morphometric method itself could also explain the higher discrimination of some reduced subsets of landmarks.
- (i) The first one that comes to mind would simply be the nature of the distance used to perform the classifications: (an estimate of) the Procrustes distance. Because it is not divided by the number of landmarks, it does not depend on their number, but on how those landmarks influence the overall shape. Adding or removing landmarks only affects the distance if this operation contributes to global alignment differences.
- (ii) Another possible cause is that our results are due to size effects, i.e., an increasing importance of size difference when reducing the number of landmarks. The Procrustes superposition removes the isometric change of size, not the allometric one. Does the allometric residue of shape increase with fewer landmarks? The answer is sometimes yes, sometimes no (Table 2).
There are also some other reasons, not directly related to the Procrustes superposition, why a reduced set of landmarks could perform better than the totality when comparing species.
- (i) As highlighted by [5], the positional variation of some landmarks could be determined by the position of other landmarks via common developmental or biomechanical causes, and not directly related to evolutionary differences between species.
- (ii) In addition to the effect of an evolutionary divergence on some landmarks positions, different environments could influence shape. Such influence has been apparent after a two-factor ANOVA on shape in an ecomorphometric study, where the influence of different environments could be eliminated by reducing the number landmarks, producing results more in accordance with species richness [28].
- (iii) Finally, more artifactual circumstances could be incriminated also. One is related to the different qualities of the collected landmarks. Bookstein suggested a hierarchical graduation I, II, III [19]. Landmarks of lower quality (II,III) could introduce noise into the alignment process: removing them could leave the remaining set with sharper discriminating power. Moreover, landmarks of type I themselves are not equally shaped: some are points at the crossing of two or three narrow veins, other are similar to a small areas more than to just a point (optical microscopy). The latter may become problematic: because some landmarks are more difficult to correctly annotate, poorly experimented operators could introduce some bias at some landmarks.
Limitations of our study
Our work is limited to the use of landmarks and does not concern semilandmarks (more frequent in three-dimensional morphometry), nor of course pseudolandmarks (the approach based on contours).
It empirically demonstrates the potential usefulness of more restricted landmark configurations for distinguishing related species. The results are based on the taxa, structures, and image dataset analyzed here, and the most informative landmark subsets may differ in other biological systems or applications [29]. Indeed, our observation does not constitute mathematical proof, and we therefore cannot claim that it is valid for all other organisms. The benefit of a reduced number of landmarks should be validated in more detail on independent datasets and larger taxonomic groups before any large-scale operational implementation.
Although reducing the number of landmarks may improve efficiency, the quality of the digitization can still be influenced by specimen preservation, image resolution, and operator-related placement errors [30,31]. Furthermore, reducing the number of reference points does not, in itself, solve the problem of collaboration between different operators [32], therefore it does not eliminate the need to work with data from a single human operator.
Finally, our proposal has not been field-tested to account for potential seasonal or geographic variations. Consequently, our results should be interpreted as demonstrating the potential of this optimization approach rather than defining a universally applicable strategy for reducing reference points.
Perspectives
Between established but morphologically very close taxa, subtle differences in wing vein architecture are not random, and our data suggest that they may be represented by a very small set of landmarks, perhaps localized to specific areas of the wings (see for instance [33] for a very different approach).
However, influential landmarks are more clearly interpreted when the possible sources of variation between two samples are well understood. Among these sources of variation, there may be simple artifacts, as illustrated by the situation here where two operators annotated exactly the same specimens. It is not recommended to use samples annotated by more than one operator, but it may be necessary to compare two operators. Our method could help to remove problematic (influential) landmarks (see Table 6). The same strategy could be used also to reduce intra-user error, improving the data for symmetry analyses.
The detection of a few influential landmarks could open new, original experimentation ideas about the way some factors affect shape. Thus, under an experimental protocol, the identification of influential landmarks could help to identify the area of the wings which is affected by a given factor, as for instance the temperature, the density, the geographic origin, etc.
If the comparison involves distinct taxa digitized by the same operator, it is acceptable to assume that the few influential landmarks are the most affected by evolutionary divergence. One might be tempted to limit the digitization effort to these landmarks alone, which would considerably reduce the workload on very large samples. However, the use of influential landmarks is only fully justifiable if there is a single or predominant source of variation between samples. In cases where there are multiple possible sources of variation other than species (sex, insecticide pressure, seasons, geography, etc.), caution is advised before generalizing the use of influential landmarks detected in a single comparison.
Epidemiological importance
Accurate identification of insect vectors allows for mapping their geographic distribution and, above all, evaluating the effectiveness of control measures on target species [13]. However, in vector control, precise species identification is not the only criterion to consider. Proof of this lies in the fact that the high precision of genetic identification techniques has not led to the abandonment of the classic morphological approach, based on dichotomous keys. The relevance of the traditional morphological approach remains due not only to its precision, but also to its low cost and immediate accessibility. These are valuable assets, particularly for entomological monitoring after a treatment campaign. Indeed, reinfestations by the target vector occur unpredictably, both in time and space; a reliable, rapid, and inexpensive identification technique is therefore essential. The limitations of the dichotomous key approach can arise with cryptic species or when the specimens examined are damaged. In these situations, geometric morphometry has proven to be a very useful complement [2]. The geometric approach is also inexpensive, readily available, and, moreover, does not require advanced entomological expertise. Despite this, geometric morphometry remains underutilized in entomological laboratories. Its complexity may explain this. Manually digitizing numerous landmarks for each individual is tedious, and no reliable automated digitization technique has generated sufficient interest for widespread adoption. Any method that reduces the workload is welcome, and our study explored the possibility of achieving this by decreasing the number of landmarks without compromising discrimination capabilities. The combined use of dichotomous keys and a less cumbersome geometric method could thus significantly improve entomological monitoring programs.
Conclusion
Systematically increasing the number of landmarks to improve shape discrimination risks mixing noise and signal. Our study demonstrates that a few landmarks, sometimes only three, can achieve classification accuracy similar to, or even better than, that obtained with full set of landmarks. Given their importance, we call them “influential landmarks”. Their identification should help better justify the choice and number of landmarks used in morphometric studies; it could also suggest relevant anatomical areas of morphological divergence between species.
Acknowledgments
We thank the authors of [7], [8], [11], [15] and [9] who kindly allowed us to use their raw coordinates data to perform our analyses.
References
- 1. James Rohlf F, Marcus LF. A revolution morphometrics. Trends Ecol Evol. 1993;8(4):129–32. pmid:21236128
- 2.
Dujardin JP. Chapter 16 - Modern morphometrics of arthropods: a phenotypic approach to species recognition and population structure. In: Tibayrenc M, editor. Genetics and evolution of infectious diseases. 3rd ed. Elsevier; 2024. p. 385–425.
- 3. Krzanowski WJ. Selection of variables to preserve multivariate data structure, using principal components. Appl Stat. 1987;36(1):22.
- 4. Krzanowski WJ. Cross-validation in principal component analysis. Biometrics. 1987;43(3):575.
- 5.
Rusakov DA. Dimension reduction and selection of landmarks. A Monte Carlo Experiment. In: Marcus LF, Corti M, Loy A, Naylor GJP, Slice D, editors. Advances in Morphometrics Proceedings of the 1993 NATO-ASI on Morphometrics. New York: Plenum Publ NATO ASI, ser A, Life Sciences; 1996. p. 531–398.
- 6. Sumruayphol S, Apiwathnasorn C, Ruangsittichai J, Sriwichai P, Attrapadung S, Samung Y, et al. DNA barcoding and wing morphometrics to distinguish three Aedes vectors in Thailand. Acta Trop. 2016;159:1–10. pmid:26987285
- 7. Chatpiyaphat K, Sumruayphol S, Dujardin J-P, Samung Y, Phayakkaphon A, Cui L, et al. Geometric morphometrics to distinguish the cryptic species Anopheles minimus and An. harrisoni in malaria hot spot villages, western Thailand. Med Vet Entomol. 2021;35(3):293–301. pmid:33205850
- 8. Chaiphongpachara T, Sriwichai P, Samung Y, Ruangsittichai J, Morales Vargas RE, Cui L, et al. Geometric morphometrics approach towards discrimination of three member species of Maculatus group in Thailand. Acta Trop. 2019;192:66–74. pmid:30710534
- 9. Kaba D, Berté D, Ta BTD, Tellería J, Solano P, Dujardin J-P. The wing venation patterns to identify single tsetse flies. Infect Genet Evol. 2017;47:132–9. pmid:27765637
- 10. Baldacchino F, Muenworn V, Desquesnes M, Desoli F, Charoenviriyaphap T, Duvallet G. Transmission of pathogens by Stomoxys flies (Diptera, Muscidae): a review. Parasite. 2013;20:26. pmid:23985165
- 11. Changbunjong T, Sumruayphol S, Weluwanarak T, Ruangsittichai J, Dujardin J-P. Landmark and outline-based geometric morphometrics analysis of three Stomoxys flies (Diptera: Muscidae). Folia Parasitol. 2016;63:2016.037. pmid:27827335
- 12. Usinger RL. The triatominae of North and Central America and the West Indies and their public health significance. Public Health Bull. 1944;288:81.
- 13. Dujardin J-P, Beard CB, Ryckman R. The relevance of wing geometry in entomological surveillance of Triatominae, vectors of Chagas disease. Infect Genet Evol. 2007;7(2):161–7. pmid:16949351
- 14. Baldacchino F, Desquesnes M, Mihok S, Foil LD, Duvallet G, Jittapalapong S. Tabanids: neglected subjects of research, but important vectors of disease agents! Infect Genet Evol. 2014;28:596–615.
- 15. Changbunjong T, Prakaikowit N, Maneephan P, Kaewwiset T, Weluwanarak T, Chaiphongpachara T, et al. Landmark data to distinguish and identify morphologically close Tabanus spp. (Diptera: Tabanidae). Insects. 2021;12(11):974. pmid:34821775
- 16. Sumruayphol S, Chittsamart B, Polseela R, Sriwichai P, Samung Y, Apiwathnasorn C, et al. Wing geometry of Phlebotomus stantoni and Sergentomyia hodgsoni from different geographical locations in Thailand. C R Biol. 2017;340(1):37–46. pmid:27890576
- 17.
Klecka WR. Discriminant analysis. Sage University Paper series on Quantitative Applications in the Social Sciences, No 07-019. Beverly Hills (CA): Sage; 1980.
- 18.
Rohlf FJ. Rotational fit (Procrustes) methods. In: Rohlf F, Bookstein F, editors. Proceedings of the Michigan Morphometrics Workshop. Ann Arbor (MI): University of Michigan Museums, Ann Arbor. Special Publication Number 2. The University of Michigan Museum of Zoology; 1990. p. 380, 227–36.
- 19.
Bookstein FL. Introduction to methods for landmark data. In: Rohlf FJ, Bookstein FL, editors. Proceedings, Michigan Morphometrics Workshop, 1988. Ann Arbor (MI): The University of Michigan Museum of Zoology, Special Publication No 2; 1990. p. 216–25.
- 20. Siegel AF, Benson RH. A robust comparison of biological shapes. Biometrics. 1982;38(2):341–50. pmid:6810969
- 21.
Pérez-Ortega J, Almanza-Ortega NN, Vega-Villalobos A, Pazos-Rangel R, Zavala-Díaz C, Martínez-Rebollar A. The K-means algorithm evolution. In: Sud K, Erdogmus P, Kadry S, editors. Introduction to data science and machine learning. Rijeka: IntechOpen; 2019.
- 22.
Rohlf FJ. Morphometric spaces, shape components and the effects of linear transformations. In: Marcus LF, Corti M, Loy A, Naylor GJP, Slice D, editors. Advances in Morphometrics. Proceedings of the 1993 NATO-ASI on Morphometrics. New York: Plenum Publ. NATO ASI, ser. A, Life Sciences; 1996. p. 117–29.
- 23.
Claude J. Morphometrics with R. Springer Science Business Media, LLC; 2008. 330 p.
- 24. Watanabe A. How many landmarks are enough to characterize shape and size variation? PLoS One. 2018;13(6):e0198341. pmid:29864151
- 25.
Pimentel RA. An introduction to ordination, principal components analysis and discriminant analysis. In: Foottit RG, Sorensen JT, editors. Ordination in the study of morphology, evolution and systematics of insects: applications and quantitative genetic rationales. New York: Elsevier; 1992. p. 11–28.
- 26. Mitteroecker P, Bookstein F. Linear discrimination, ordination, and the visualization of selection gradients in modern morphometrics. Evol Biol. 2011;38(1):100–14.
- 27.
Bellman R. Adaptive control processes: a guided tour. Princeton Legacy Library. Princeton University Press; 1961. Available from: https://books.google.fr/books?id=POAmAAAAMAAJ
- 28.
Dujardin JP, Thongsripong P, Henry AB. The mosquito fauna: from metric disparity to species diversity. In: Wahl C, editor. Morphometrics, InTechOpen. 2012. https://doi.org/10.5772/36142
- 29. Lorenz C, Almeida F, Almeida-Lopes F, Louise C, Pereira SN, Petersen V. Geometric morphometrics in mosquitoes: what has been measured? Infect Genet Evol. 2017;54:205–15.
- 30. Dujardin J-PA, Kaba D, Henry AB. The exchangeability of shape. BMC Res Notes. 2010;3:266. pmid:20964872
- 31. Wilke ABB, Christe RD, Multini LC, Vidal PO, Wilk-da Silva R, de Carvalho GC. Morphometric wing characters as a tool for mosquito identification. PLOS One. 2016;11.
- 32. Evin A, Bonhomme V, Claude J. Optimizing digitalization effort in morphometrics. Biol Methods Protoc. 2020;5(1):bpaa023. pmid:33324759
- 33. Nolte K, Sauer FG, Baumbach J, Kollmannsberger P, Lins C, Lühken R. Robust mosquito species identification from diverse body and wing images using deep learning. Parasit Vectors. 2024;17(1):372. pmid:39223629