Toward Objective, Morphology-Based Taxonomy: A Case Study on the Malagasy Nesomyrmex sikorai Species Group (Hymenoptera: Formicidae)

Madagascar is one of the world’s greatest biodiversity hotspots, meriting special attention from biodiversity scientists. It is an excellent testing ground for novel techniques in taxonomy that aim to increase classification objectivity and yield greater taxonomic resolving power. Here we reveal the diversity of a unique and largely unexplored fragment of the Malagasy ant fauna using an advanced combination of exploratory analyses on quantitative morphological data allowing for increased objectivity in taxonomic workflow. The diversity of the Nesomyrmex sikorai species-group was assessed via hypothesis-free nest-centroid-clustering combined with recursive partitioning to estimate the number of morphological clusters and determine the most probable boundaries between them. This combination of methods provides a highly automated and objective species delineation protocol based on continuous morphometric data. Delimitations of clusters recognized by these exploratory analyses were tested via confirmatory Linear Discriminant Analysis (LDA) and Multivariate Ratio Analysis (MRA). The final species hypotheses are corroborated by many qualitative characters, and the recognized species exhibit different spatial distributions and occupy different ecological regions. We describe and redescribe eight morphologically distinct species including six new species: Nesomyrmex excelsior sp. n., N. modestus sp. n., N. reticulatus sp. n., N. retusispinosus (Forel, 1892), N. rugosus sp. n., N. sikorai (Emery, 1896), N. striatus sp. n., and N. tamatavensis sp. n. An identification key for their worker castes using morphometric data is provided.


Introduction
The main objective of taxonomy is to document the diversity of flora and fauna. It provides fundamental information for endeavors dealing with biodiversity by addressing a crucial question: "how many species are there?" Unfortunately, the rapidly accelerating rate of biodiversity loss we face today [1,2] poses new challenges to systematic research.
In recent years a number of promising new algorithmic approaches has been developed and introduced in insect taxonomy for the purpose of recognizing complex patterns in continuous morphometric data [3][4][5][6][7]. These methods help to reveal morphological diversity based on multivariate morphometric analyses [8,9], but are prone to bias as the number of clusters is still user-defined.
In this paper the diversity of the Malagasy Nesomyrmex sikorai group is inferred via a highly automated protocol involving a fusion of two algorithms, Nest Centroid clustering [7] and Partitioning Algorithm based on Recursive Thresholding [10]. The Nesomyrmex sikorai species group is defined and diagnosed by Csősz & Fisher [11] as follows: anterodorsal spines on petiolar node absent; petiolar node long and narrow in dorsal view, sides are nearly parallel; propodeal spines moderately long, always present, mesopropodeal depression conspicuous, deep; postocular distance vs. petiole width = 1.415 [min = 1.198, max = 1.676].
Benefits of the combined application of Nest Centroid clustering (NC clustering) and Partitioning Algorithm based on Recursive Thresholding (PART) was tested and proved efficient in species delimitation in Nesomyrmex [11]. The NC clustering searches for discontinuity in morphometric data by sorting all similar cases into clusters in a two-step procedure. The first step reduces dimensionality of the data and computes linear discriminants (LD) for all samples as coordinates in a morphospace. Removal of collinearity between variables improves the performance of the machine learning model. The second step calculates pairwise distances between samples using LD scores as input, and displays the distance matrix in a dendrogram. The latter step is responsible for better visualization of the data. This technique has proved efficient at pattern recognition within large and complex datasets [12][13][14], but the number of clusters is still subjectively defined.
The PART method offers an excellent opportunity to determine the number of clusters based on statistic thresholds. It uncovers distinct subgroups in the groups identified via recursive application of the Gap statistic algorithm [15] using the LD scores generated by the NC clustering as input. A crucial feature of the algorithm is the introduction of tentative splits of clusters to isolate outliers that might otherwise halt the recursion prematurely [10].
Multivariate evaluation of morphological data has revealed that the N. sikorai species-group comprises eight well-outlined clusters in the Malagasy zoogeographical region, all representing species; of these, six are new to science. The eight species outlined are described or redefined here based on worker caste are as follows: Nesomyrmex excelsior sp. n., N. modestus sp. n., N. reticulatus sp. n., N. retusispinosus (Forel, 1892), N. rugosus sp. n., N. sikorai (Emery, 1896), N. striatus sp. n. and N. tamatavensis sp. n.. We provide a combined key using both qualitative and quantitative characters. Morphological patterns are linked to geographic map elevations of the sites where populations were collected and are also provided as predictor variables.
Our research has also revealed that the dimensionality reduction feature and graphical display of the NC clustering aligned with PART that automatically assigns samples into clusters allows rapid and objective decision-making in morphometry-based alpha taxonomy. Fusion of PART with NC clustering helps to readily infer species boundaries, thereby greatly reduces the need for subjective interpretation.

Material and Methods
Ant samples used in this study comply with the regulations for export and exchange of research samples outlined in the Convention of Biology Diversity and the Convention on International Trade in Endangered Species of Wild Fauna and Flora. For field work conducted in Madagascar, permits to research, collect and export ants were obtained from the Ministry of Environment and Forest as part of an ongoing collaboration between the California Academy of Sciences and the Ministry of Environment and Forest, Madagascar National Parks and Parc Botanique et Zoologique de Tsimbazaza. Authorization for export was provided by the Director of Natural Resources.
In the present study, 22 continuous morphometric traits were recorded in 227 worker individuals belonging to 180 nest samples collected in the Malagasy region and deposited in the following institutions, abbreviations after [16]: CASC (California Academy of Sciences, San Francisco, California, U.S.A.), MCZ (Museum of Comparative Zoology, Cambridge, Massachusetts, U.S.A.), MHNG (Muséum d'Histoire Naturelle, Geneva, Switzerland). The full list of material morphometrically examined in this revision is listed in S1 Table with unique specimen identifiers (e.g. CASENT0374465). Designation of type material with detailed label information is given in the type material investigated sections for each taxon.
All images and specimens used in this study are available online on AntWeb (http://www. antweb.org). Images are linked to their specimens via the unique specimen code affixed to each pin (CASENT0374465). Online specimen identifiers follow this format: http://www.antweb. org/specimen/CASENT0374465. Digital color montage images were created using a JVC KY-F75 digital camera and Syncroscopy Auto-Montage software (version 5.0), or a Leica DFC 425 camera in combination with the Leica Application Suite software (version 3.8). Distribution maps were generated in R [17] using package phytools [18].
The measurements were taken with a Leica MZ 12.5 stereomicroscope equipped with an ocular micrometer at a magnification of 100×. Measurements and indices are presented as arithmetic means with minimum and maximum values in parentheses. Body size dimensions are expressed in μm. Due to the abundance of worker individuals in contrast to the limited number of queen and male specimens available, the present revision is based on worker caste only. Worker-based revision is further facilitated by the fact that the name-bearing type specimens of the vast majority of existing ant taxa belong to the worker caste. All measurements were made by the first author. The following morphometric traits are defined in earlier works [9,11]. Morphological variation, stability and discriminatory power of these characteristics have been shown to be suitable for the purpose of separating morphospecies of the genera Temnothorax [9] and Nesomyrmex [11]. For the definition of morphometric characters, earlier protocols [11] were considered. The repeatability of morphometric traits was tested. All variables have been measured twice for 14 randomly chosen ant specimens [19], the average measure of intraclass correlation coefficient were calculated [20]. Results of the repeatability tests are given in parentheses [R=] for each morphometric trait after character definitions with up to three decimal places. Explanations and abbreviations for measured characters are as follows:

Nomenclature Acts
The electronic edition of this article conforms to the requirements of the amended International Code of Zoological Nomenclature, and hence the new names contained herein are available under that Code from the electronic edition of this article. This published work and the nomenclatural acts it contains have been registered in ZooBank, the online registration system for the ICZN. The ZooBank LSIDs (Life Science Identifiers) can be resolved and the associated information viewed through any standard web browser by appending the LSID to the prefix "http://zoobank.org/". The LSID for this publication is: lsid:zoobank.org:pub:0779FA9E-22C2-4C12-A3C6-6FCA61887E70. The electronic edition of this work was published in a journal with an ISSN, and has been archived and is available from the following digital repositories: PubMed Central, LOCKSS.

Protocol of character recording
CL: Maximum cephalic length in median line. The head must be carefully tilted to the position providing the true maximum. Excavations of hind vertex and/or clypeus reduce CL, [R = 0.998] (Fig 1A).
CW: Maximum width of the head including compound eyes, [R = 0.999] (Fig 1A). CWb: Maximum width of head capsule without the compound eyes. Measured just posterior to the eyes, [R = 0.998] (Fig 1A).
PoOC: Postocular distance. Use a cross-scaled ocular micrometer and adjust the head to the measuring position of CL. Caudal measuring point: median occipital margin; anterior measuring point: median head at the level of the posterior eye margin, [R = 0.997] (Fig 1A).
CS: Absolute cephalic size. The arithmetic mean of CL and CWb. EL: Maximum diameter of the compound eye, [R = 0.929]. FRS: Frontal carina distance. Distance of the frontal carinae immediately caudal of the posterior intersection points between frontal carinae and torular lamellae. If these dorsal lamellae do not laterally surpass the frontal carinae, the deepest point of scape corner pits may be taken as the reference line. These pits take up the inner corner of the scape base when the scape is directed fully caudally and produces a dark, triangular shadow in the lateral frontal lobes immediately posterior to the dorsal lamellae of the scape joint capsule, [R = 0.982] (Fig 1B).
MW: Mesosoma width. In workers MW is defined as the longest width of the pronotum in dorsal view excluding the pronotal spines, [R = 0.998] (Fig 1C).
PSTI: Apical distance of pronotal spines in dorsal view; if spine tips are rounded or thick take the centers of spine tips as reference points, [R = 0.994] (Fig 1C).
PPW: Postpetiole width. Maximum width of postpetiole in dorsal view, [R = 0.994] (Fig 1D). SPBA: Minimum propodeal spine distance. The smallest distance of the lateral margins of the propodeal spines at their base. This should be measured in antero-dorsal view, since the wider parts of the ventral propodeum do not interfere with the measurement in this position. If the lateral margins of propodeal spines diverge continuously from the tip to the base, a smallest distance at base is not defined. In this case, SPBA is measured at the level of the bottom of the interspinal meniscus, [R = 0.993] (Fig 1D).
SPTI: Apical propodeal spine distance. The distance of propodeal spine tips in dorsal view; if spine tips are rounded or truncated, the centers of spine tips are taken as reference points, [R = 0.994] (Fig 1D).
ML (Weber's length): Mesosoma length from caudalmost point of propodeal lobe to transition point between anterior pronotal slope and anterior pronotal shield (preferentially measured in lateral view; if the transition point is not well defined, use dorsal view and take the center of the dark-shaded borderline between pronotal slope and pronotal shield as anterior reference point), [R = 0.998] (Fig 1E).
MPST: Maximum distance from the center of the propodeal spiracle to the posteroventral corner of the ventrolateral margin of the metapleuron, [R = 0.989] (Fig 1F).
NOH: Maximum height of the petiolar node, measured in lateral view from the uppermost point of the petiolar node perpendicular to a reference line set from the petiolar spiracle to the  (Fig 1F).
NOL: Length of the petiolar node. Measured in lateral view from the center of the petiolar spiracle to the dorso-caudal corner of caudal cylinder. Do not erroneously take as the reference point the dorso-caudal corner of the helcium, which is sometimes visible, [R = 0.981] (Fig 1F).
PPL: Postpetiole length. The longest anatomical line that is perpendicular to the posterior margin of the postpetiole and is between the posterior postpetiolar margin and the anterior postpetiolar margin, [R = 0.975] (Fig 1F).
SPST: Propodeal spine length. Distance between the center of the propodeal spiracle and spine tip. The spiracle center refers to the midpoint defined by the outer cuticular ring but not to the center of the actual spiracle opening, which may be positioned eccentrically, [R = 0.994] ( Fig 1F).
PEH: Maximum petiole height. The longest distance measured from the ventral petiolar profile at node level (perpendicular to the chord length of the petiolar sternum) to the distalmost point of the dorsal profile of the petiolar node, [R = 0.989] (Fig 1G).
PPH: Maximum height of the postpetiole in lateral view measured perpendicularly to a line defined by the linear section of the segment border between postpetiolar tergite and sternite, [R = 0.991] (Fig 1G).
Taxonomic nomenclature, OTU concepts, and natural language (NL) phenotypes were compiled in mx (http://purl.org/NET/mx-database). Taxonomic history and descriptions of taxonomic treatments were rendered from this software. Hymenoptera-specific terminology of morphological statements used in descriptions, the identification key, and diagnoses are mapped to classes in phenotype-relevant ontologies (Hymenoptera Anatomy Ontology (HAO) [21] via a URI table (S2 Table); see [22,23] for more information about this approach. A character matrix for the taxa treated in this revisionary work, extracted from the MX database, is given in S1 Appendix.
In verbal descriptions of taxa based on external morphological traits, recent taxonomic papers [11], [12] were considered. Definitions of surface sculpturing are linked to Harris [24]. Body size is given in μm, and the means of morphometric ratios as well as minimum and maximum values are given in parentheses with up to three digits. Estimated inclination of pilosity and cuticular spines is given in degrees. Definitions of species-groups as well as descriptions of species are presented in alphabetic order.

Statistical framework-hypothesis formation and testing
Data preparation and cleaning. Nest-centroid clustering (NC-clustering), and linear discriminant analysis (LDA) do not require special data preparation (e.g. standardization), hence raw data were applied for each of the statistical analyses. Data, however, are standardized (i.e., centered and scaled) for the multivariate ratio analysis (MRA) to prevent variables that are larger from dominating the analysis [5]. Variables are tested via matrix scatterplots and Pearson product-moment correlation coefficients for error variance. The lack of positive within-class correlation between different traits may indicate low repeatability of a character, or may represent a morphological artifact [8]. Each trait shown to have a strong linear correlation to other traits, therefore, was involved in multivariate analyses. Raw data in μm is given in S3 Table. Generating prior species hypotheses by combined application of NC clustering and method PART. This method searches for discontinuities in continuous morphometric data and sorts all similar cases into the same cluster in a two-step procedure. The first step reduces dimensionality in data by a cumulative linear discriminant analysis (LDA) using nest samples (i.e., individuals collected from the same nest are assumed genetically closely related, often sisters) as groups [7]. The second step calculates pairwise distances between samples using LD scores as input. The distance matrix is displayed as a dendrogram. The NC-clustering was done via packages cluster [25] and MASS [26].
The ideal number of clusters was determined by Partitioning Algorithm based on Recursive Thresholding via the package clusterGenomics [27] using the function (part), which also assigns observations (i.e. specimens, or samples) into partitions. The method estimates the number of clusters in a data based on recursive application of the Gap statistic [28] and is able to discover both top-level clusters as well as sub-clusters nested within the main clusters. If more than one cluster is returned by the Gap statistic, it is re-optimized on each subset of cases corresponding to a cluster until a stopping threshold is reached or the subset under evaluation has less than 2 Ã minSize cases [10]. Method two clustering methods are used to determine optimal number of clusters "hclust" and "kmeans" with 500 bootstrap iterations. The results of PART is mapped on the dendrogram by colored bars via function 'mark.dendrogram' found in [29]. This protocol has been introduced by Csősz & Fisher [30]. The script written in R is available in S2 Appendix.
Arriving at final species hypothesis using confirmatory Linear Discriminant Analysis (LDA) and LDA ratio extractor. To provide increased reliability of species delimitation, hypotheses on clusters and classification of cases via exploratory processes were confirmed by LDA Leave-one-out cross-validation (LOOCV). Classification hypotheses were imposed for all samples congruently classified by partitioning methods while wild-card settings (i.e. no prior hypothesis imposed on its classification) were given to samples that were incongruently classified by the two methods. To extract the best ratios for the easiest species separation in the key and diagnoses we applied multivariate ratio analysis (MRA), a modern statistical method based on principal component analysis (PCA) and linear discriminant analysis (LDA) [5].

Results and Discussion
Both clustering algorithms 'hclust' and 'kmeans' using function 'part' returned 9 clusters. The pattern recognized by these partitioning algorithms can be fitted on the hierarchical structure seen on the dendrogram generated by NC-clustering (Fig 2).
The cross-validated (LOOCV) LDA confirmed separation of eight of the nine clusters recognized by the partitioning methods. Separation of the two slightly conflicting clusters within "excelsior" (shown in black and blue in Fig 2) returned by both clustering methods is not confirmed convincingly by the LOOCV-LDA, hence remain lumped in "excelsior." Four additional samples placed into "rugosus" by the PART method have also been reclassified as "excelsior" by cross-validation LDA; the new classification is also supported by other biological evidence such as geographic and elevational distributions of populations.
The eight-species hypothesis is confirmed by cross-validation LDA. The overall classification success is 99.12%, and each cluster but "excelsior" is separated with 100% classification success, while the latter attains 97.6% success. The phenetically-distinguishable clusters represent eight morphologically diagnosable OTUs that differ in many qualitative characters (e.g. shape of propodeal spines, surface sculpturing, color characteristics etc.), hence the eight clusters solution is accepted as the final species hypothesis. These species are also known to occupy different niches, and exhibit different spatial distributions. The geographic distribution of each morphospecies corresponds to the known major areas of endemism in Madagascar [31,32] and can be characterized by one of the simplified bioclimatic zones of Madagascar [33], (after Cornet [34]): eastern rainforest, central montane forest, western dry forest, and southwest desert spiny bush thicket (Fig 3).
These species are grouped into three species complexes by morphological similarity. The retusispinosus complex consists of two species: N. retusispinosus (Forel, 1892) and N. tamatavensis sp. n.; the excelsior complex includs two new species: Nesomyrmex excelsior sp. n. and N. rugosus sp. n.; the sikorai complex contains three species: N. modestus sp. n., N. sikorai (Emery, 1896), N. striatus sp. n.; while N. reticulatus sp. n. forms a complex of its own in the Malagasy zoogeographical region. Separation of species as well as complexes is also confirmed by Multivariate Ratio Analyses. Species complexes can be separated based on two morphometric ratios; MW/SPST yields a separation between the retusispinosus complex and others and CW/ML helps discriminate between the excelsior and sikorai complexes (Fig 4). Nesomyrmex retusispinosus and N. tamatavensis sp. n. can be easily separated making use of a single ratio PoOC/ PSTI ( Fig 5); CL/CW helps to separate N. modestus sp. n. from N. sikorai (Emery, 1896) and N. striatus sp. n. (Fig 6); while the two latter species form two non-overlapping clusters by using a combination of the two ratios shown in Fig 7. Two species placed in the excelsior complex, N. excelsior sp. n. and N. rugosus sp. n., form slightly overlapping clusters in MRA plot (Fig 8). The best ratio (PSTI/PPW) tells the two species apart with a 6.5% error rate. Morphometric data for species calculated on individuals are given in Table 1.
Separation of eight species of the nine morphologically-delineated clusters revealed by the exploratory process, i.e. the combination of NC clustering and method PART (Partitioning based on Recursive Thresholding), have been confirmed by the LDA.
Combining NC clustering and PART helps to accelerate biodiversity studies and provides an opportunity to test and compare levels of variation in quantified morphological traits. This combined approach relies on discontinuity in quantitative phenotypic variations and helps to recognize morphologically divergent lineages believed to represent OTUs reproductively isolated at a certain level. This is generally the philosophy behind morphological approaches.
Although the visual products of NC-clustering may be reminiscent of phylogenetic trees, it is important to emphasize that the results of our analysis on continuous morphometric data are not equivalent to a phylogenetic analysis. Over generations, the gene pool of a population can undergo a row of minor changes manifesting in diverging phenotypes, forming the theoretical basis of phenetic approaches. Phenetic distances, calculated from differences in after confirmation by cross-validated LDA. Different colors represent species. Nesomyrmex excelsior sp. n.: dark blue, N. modestus sp. n.: light green, N. reticulatus sp. n. (RTC') grey, N. retusispinosus (RTS'): orange, N. rugosus sp. n.: red, N. sikorai: khaki, N. striatus sp. n. (STR'): lilac, N. tamatavensis sp. n.: dark green. Prior species hypothesis was generated by applying the partitioning method PART using two clustering methods, hclust ('part-hclust') and kmeans ('part-kmeans'). Color code is the same as above, subclusters differently returned by two clustering methods within N. excelsior (dark blue in final species hypothesis) are split into light blue and black. Due to this discrepancy, the lack of other biological evidence, and weak support of LDA, light blue and black clusters are lumped into one group: N. excelsior. For further explanation, see the text.  Toward Objective Taxonomy on the Malagasy Nesomyrmex Sikorai Group quantitative morphometric traits, may often correlate with the age of lineages and the level of reproductive isolation. However, for many features important for phylogenetic inference, shared ancestry cannot be convincingly ascertained, hence genealogical patterns based on phenetic morphological approaches should be inferred only with extreme care.
Our knowledge about the morphological evolution of traits and the development of stable alternative phenotypes generated by an environmental factor is limited. However, we believe that novel, improved algorithmic approaches applied complementarily to genetic markers offer great opportunities to test hypothesis of molecular phylogeny. Steady increases in our understanding of the development and heritability of morphological traits and and their role in stabilizing selection [35] will foster further applications of quantitative morphological traits in biodiversity research.    Description of the species of Nesomyrmex sikorai species-group

Synopsis of
In this section, eight species of the N. sikorai species-group are described including biogeographic information. The basic statistics of body size ratios are given in Table 1  The list of 83 worker individuals belonging to 58 nest samples morphometrically investigated is given in S1 Table.   Distribution. This species is known to occur in smaller, isolated montane rainforests between elevation of 520 m and 1325 m (mean: 1007 m) in the western and central part of Madagascar (Fig 3). The list of 18 worker individuals belonging to 18 nest samples morphometrically investigated is given in S1 Table. Diagnosis. See in key. Etymology. The name (modestus = moderate) refers to the moderately coarse surface sculpturing of this species.

Nesomyrmex modestus
Distribution. This species is known to occur in rainforests between elevation of 520 m and 1325 m (mean: 514 m) in the southwestern part of Madagascar (Fig 3). The list of 7 worker individuals belonging to 7 nest samples morphometrically investigated is given in S1 Table. Diagnosis. See in key.  Etymology. The name refers to the fine, micro-reticulate body sculpturing.
Distribution. This species is known to occur in dry forests in lowlands between 50 m and 130 m (mean: 94 m) of the western and southern coasts of Madagascar (Fig 3). The list of 9 worker individuals belonging to 9 nest samples morphometrically investigated is given in S1 Table. Diagnosis. See in key. Distribution. This species is known to occur in rain forests and montane forests in lowlands between 918 m and 1080 m in central Madagascar (Fig 3). The list of 24 worker individuals belonging to 23 nest samples morphometrically investigated is given in S1 Table. Diagnosis. See in key.  Etymology. The name refers to the coarse, rugose body sculpturing. Distribution. This species is known to occur in rainforests between elevation of 200 m and 520 m (mean: 390 m) of the eastern and southern coasts of Madagascar (Fig 3).  The list of 36 worker individuals belonging to 28 nest samples morphometrically investigated is given in S1 Table. Diagnosis. See in key. Description of workers. Distribution. This species is known to occur in montane rainforests between elevations of 200 m and 520 m in central Madagascar (Fig 3).  The list of 9 worker individuals belonging to 9 nest samples morphometrically investigated is given in S1 Table. Diagnosis. See in key. Description of workers.  Etymology. The name refers to the coarse, striate/costate surface sculpturing on the head and mesosoma.

Nesomyrmex striatus
Distribution. This species is known to occur in montane rainforests between elevations of 900 m and 1125 m (mean: 1054 m) of the southeastern part of Madagascar (Fig 3).  Toward Objective Taxonomy on the Malagasy Nesomyrmex Sikorai Group Paratypes: 5 workers and a male with the same label data with the holotype with CASENT codes: CASENT0763559, BLF8390, (2w, CAS); CASENT0496293, BLF8390, (3w, CAS); CASENT0496294, BLF8390, (3w, CAS); CASENT0496295, BLF8390, (1q, 1m, CAS); The list of 41 worker individuals belonging to 32 nest samples morphometrically investigated is given in S1 Table. Diagnosis. See in key.  Etymology. Tamatave is the French name of Toamasina, the capital of the Atsinanana region, where this species is abundant.
Distribution. This species is known to occur in rainforests from sea level to 1100 m predominantly in the northeastern part of Madagascar (Fig 3). The only known locality of this species in the southwestern Toliara region (Forêt Classée d'Analavelona, NNW Mahaboboka) is in an isolated rainforest surrounded by dry forest; this rainforest is a relic of the formerly widespread rainforest now found primarily in the east.
Supporting Information S1 Appendix. Extracted character matrix for each taxon treated in this revisionary work generated by the MX database. (NEX) S2 Appendix. R script of NC clustering and method PART implementing cluster methods "hclust" and "kmeans". Mark dendrogram function mapping the results of partitioning algorithm PART on the dendrogram is also added. (TXT) S1 Table. List of morphometrically investigated samples. Unique CASENT number for pinned samples, locality, geographic coordinates (E, N) in decimal format, altitude (ALT) in meters a.s.l., collector's name, date and number of specimens investigated bearing the given CASENT number are provided. HT = Holotype, PT = paratype(s). All samples collected in Toliara administrative region, Madagascar, and deposited at the California Academy of Sciences (CAS). (XLSX) S2 Table. URI table for morphometric characters and Hymenoptera-specific terminology of morphological statements used in descriptions, identification key, and diagnoses are mapped to classes in phenotype-relevant ontologies. (DOCX) S3 Table. Morphometric data of 22 continuous morphometric traits of 227 individuals is given in μm. CASENT code (casent), final species hypothesis (species), geographic coordinates (long, lat) and the name format as samples appear on the dendrogram (dendro-name) are also provided in the table. HT = Holotype, PT = paratype(s). (CSV)