Evaluation of multilocus marker efficacy for delineating mangrove species of West Coast India

The plant DNA barcoding is a complex and requires more than one marker(s) as compared to animal barcoding. Mangroves are diverse estuarine ecosystem prevalent in the tropical and subtropical zone, but anthropogenic activity turned them into the vulnerable ecosystem. There is a need to build a molecular reference library of mangrove plant species based on molecular barcode marker along with morphological characteristics. In this study, we tested the core plant barcode (rbcL and matK) and four promising complementary barcodes (ITS2, psbK-psbI, rpoC1 and atpF-atpH) in 14 mangroves species belonging to 5 families from West Coast India. Data analysis was performed based on barcode gap analysis, intra- and inter-specific genetic distance, Automated Barcode Gap Discovery (ABGD), TaxonDNA (BM, BCM), Poisson Tree Processes (PTP) and General Mixed Yule-coalescent (GMYC). matK+ITS2 marker based on GMYC method resolved 57.14% of mangroves species and TaxonDNA, ABGD, and PTP discriminated 42.85% of mangrove species. With a single locus analysis, ITS2 exhibited the higher discriminatory power (87.82%) and combinations of matK + ITS2 provided the highest discrimination success (89.74%) rate except for Avicennia genus. Further, we explored 3 additional markers (psbK-psbI, rpoC1, and atpF-atpH) for Avicennia genera (A. alba, A. officinalis and A. marina) and atpF-atpH locus was able to discriminate three species of Avicennia genera. Our analysis underscored the efficacy of matK + ITS2 markers along with atpF-atpH as the best combination for mangrove identification in West Coast India regions.


Introduction
Plant DNA barcoding is more complex than animal DNA barcoding and it often requires more than one locus approach. The mitochondrial cytochrome oxidase I (COI) gene fragment is considered as the universal animal barcode. Plant mitochondrial COI was excluded from the barcoding, due to the low substitution rates [1][2][3]. Later, the Consortium for the Barcode of Life (CBOL) evaluated 7 leading candidate DNA regions (matK, rbcL, trnH-psbA spacer, atpF-atpH spacer, rpoB, rpoC1, and psbK-psbI spacer) [4]. The  two-locus combinations of rbcL and matK as the core plant barcode complemented with trnH-psbA intergenic spacer based on the parameters of recoverability, sequence quality, and levels of species discrimination, CBOL [4][5][6]. China Plant Barcode of Life recommended the internal transcribed spacer (ITS) as an additional candidate plant DNA barcode [7]. Comparative studies of seven markers psbA-trnH, matK, rbcL, rpoC1, ycf5, ITS2, and ITS from medicinal plant species were performed. Authors recommended that ITS2 is the best potential marker which discriminated 92.7% plants at the species level in more than 6600 plant samples [8]. The potential discriminating DNA barcode varies from one botanical family to other. The plastid marker matK can differentiate more than 90% of species in the Orchidaceae (Orchid family) but less than 49% in the Myristicaceae (nutmeg family) [9][10]. However, identification of 92 species from 32 genera using multilocus markers (coding regions (rpoB, rpoC1, rbcL, matK and 23S rDNA) and non-coding (trnH-psbA, atpF-atpH, and psbK-psbI) could achieve 69%-71% with several combinations [3]. More than two loci can improve the plant identification success rate; a recent example of the flora of Canada revealed 93% success in species identification with rbcL and matK, while the addition of the trnH-psbA intergenic spacer achieved discrimination up to 95% [11]. rbcL and matK loci showed poor discrimination in species-rich genera and complex taxa of Lysimachia, Ficus, Holcoglossum, and Curcuma [12][13][14][15]. The lowest discriminatory power was observed in closely related groups of Lysimachia with rbcL (26.5-38.1%), followed by matK (55.9-60.8%) and combinations of core barcodes (rbcL + matK) had discrimination of 47.1-60.8% [15]. Beside all these markers, several plastid regions such as ycf1, atpF-H, psbK-psbI, ropC1, rpoB, and trnL-trnF were frequently evaluated as plant barcode. However, the application of DNA barcoding has been hindered owing to the difficulty in distinguishing closely related species, especially in recently diverged taxa.
Mangroves are unique component of the coastal ecosystem of the world with a niche distribution in tropical and subtropical climates [16]. They are adapted to the local environment like fluctuated water level, salinity and anoxic condition through special features such as aerial breathing and extensive supporting roots, buttresses, salt-excreting leaves and viviparous propagules [17][18]. Plant mangrove species comprise 70 species belonging to about 20 families and 27 genera [19][20]. The West Coast of India is more or less steeply shelved, lack major deltas, river estuaries and dominated by sandy and rocky substratum. The West Coast also harbors one of the world's biodiversity hotspot of Western Ghats in India. It includes the states of Gujarat, Maharashtra, Goa, Karnataka, and Kerala, which harbors 37 species (25 genera under 16 families). The most dominant mangrove species found along the West Coast of India are Rhizophora mucronata, R. apiculata, Bruguiera gymnorrhiza, B. parviflora, Sonneratia alba, S. caseolaris, Cariops tagal, Heretiera littoralis, Xylocarpus granatum, X. molluscensis, Avicennia officinalis, A. marina, Excoecaria agallocha and Lumnitzera racemosa [21].

Ethics statement
The mangrove samples were collected from different parts of Goa, west coast region, with the permission from the Principal Chief Conservator of Forest, Goa Forest Department, Goa, India. Further, none of the species are endangered or protected species.

Mangrove plant sampling
In the present study, a total of 44 specimens of mangroves belonging to 14 species, 9 genera and 5 families were collected from Goa region, West Coast of India with geographical co-ordinates latitude of 15.5256˚N and longitude of 73.8753˚E. The selected genera of mangroves such as Rhizophora, Bruguiera, Avicennia, and Sonneratia each represented by at least two species and Aegiceras, Excoecaria, Ceriops, Kandelia, Acanthus genus were represented by single species. Mangrove species were identified based on morphological keys [22] and mounted on herbarium sheets, photographed and deposited at the Botanical Survey of India, Western Regional Centre, Pune, India as barcode vouchers [17]. The well-identified voucher specimens along with their taxonomic information, collection details, and GenBank accession numbers were described in Table 1. For each specimen, leaf tissue was collected in the field, labeled and stored in -80˚C for further analysis.

DNA extraction
Genomic DNA was isolated from mangrove species by modified cetyl-trimethyl ammonium bromide (CTAB) protocol [17]. Leaf tissue was homogenized in liquid nitrogen and CTAB  Table). PCR products were purified as per manufacturer's instruction (Chromous Biotech) and further sequencing reactions were carried out using the Big Dye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems) and analyzed on ABI 3500xL Genetic Analyzer (Applied Biosystems).

Data analysis
Sequence assembly and alignment were performed in Codon Code Aligner v.3.0.1 (Codon Code Corporation) and MEGA 6.0.6 respectively [23]. All sequences were submitted to Barcode of Life Data Systems (BOLD) database under the project code IMDB with their taxonomic and sampling details (doi:10.5883/DS-IMDBNG) [24]. Nucleotide diagnostic characters of mangrove species were analyzed based on the BOLD system. Further, matK and ITS2 sequences were concatenated using DNASP v5.10 tool and analyzed in MEGA 6 [25]. NJ trees were constructed using MEGA 6.0 and Kimura 2 parameter (K2P) genetic distance model with node support based on 1000 bootstrap replicates.

TaxonDNA
TaxonDNA v1.6.2 analysis for species identification with 'Best Match' and 'Best Closest Match' method was performed [17,26]. The threshold (T) was set at 95%. All the results above the threshold (T) were treated as 'incorrect'. Similarly, if all matches of the query sequence were below threshold (T), the barcode assignment was considered to be the 'correct' identification. If the matches of the query sequences were good and corresponded to a mixture of species, then it was treated as ambiguous identification.

Automated Barcode Gap Discovery (ABGD)
The ABGD, is a web server based distance method, which can partition the sequences into potential species based on the barcode gap whenever the divergence within the same species is smaller than organisms from different species [27][28][29]. The ABGD analysis was performed with two relative gap width (X = 1.0, 1.5) and three distance metrics (Jukes-Cantor, K2P, and p-distance) with default parameters.

General Mixed Yule-coalescent (GMYC)
The GMYC method requires a fully resolved ultrametric tree for analysis. This Bayesian tree was built using BEAST v1.8 [30][31]. Input file (XML) for BEAST was compiled in BEAUti v1.83 with an HKY+G molecular evolutionary model for the ITS2 dataset and GTR+G for concatenated dataset of matK+ITS2. These models were derived using PartitionFinder V1.1.1. Tree prior was set to Yule Process and the length of Markov chain Monte Carlo (MCMC) chain was 40,000,000 generation and sampling was performed at every 4000 step. However, all other settings were kept as default. Convergence of the BEAST runs to the posterior distribution. The adequacy of sampling (based on the Effective Sample Size (ESS) diagnostic) was assessed with Tracer v1.4. After removing the first 20% of the samples as burn-in, all other runs were combined to generate posterior probabilities of nodes from the sampled trees using TreeAnnotator v1.7.4. Estimation of the number of species included in the tree was analyzed using GMYC with single and multiple thresholds in R by the APE and SPLITS packages [27,[30][31][32][33][34][35][36].

Poisson Tree Process model (PTP)
The PTP model is a tree-based method that differentiates specimen into populations and species level using coalescence theory [27][28][29] The RaxML tree was constructed using CIPRES portal and input data was generated for bPTP analysis. The calculations were conducted on the bPTP web server (http://species.h-its.org), with the following parameters (500,000 MCMC generations, thinning 100 and burn-in 25%).

Sequence analysis
A total of 148 sequences (44 rbcL, 43 matK, 40 ITS2, 9 atpF-atpH, 6 psbK-psbI and 6 rpoC1) were acquired from 44 specimens of mangrove belonging to 14 species, 9 genera, and 5 families. The sequences (rbcL: 510bp, matK: 712bp, ITS2: 445bp, atpF-atpH: 511bp, psbK-psbI: 360bp and rpoC1: 451bp) with few insertions and deletions, without stop codon, along with the specimen collection details were submitted to the Barcode of Life Data Systems (BOLD) in form of a project 'IMDB' (dx.doi.org/10.5883/DS-IMDBNG). These sequences were submitted to the NCBI GenBank through BOLD systems and their accession numbers were obtained ( Table 1). The scatter plot represented the number of individuals in each species against their maximum intra-specific distances, as a test for sampling bias (Fig 1). Previous evaluation of DNA barcode using rbcL and matK demonstrated 47.72% and 72.09% efficiency in resolving mangrove taxa respectively. The matK sequence region showed better efficiency than the rbcL for resolution of mangrove taxa [17]. In the present study, matK along with ITS2 and few supplementary markers (atpF-atpH, psbK-psbI and rpoC1) were used for the species identification of the cryptic mangrove taxa. Sequence analysis was performed to estimate the average GC content of the corresponding locus. The average GC content observed was 63.11%, 42.7%, 35.18%, 31.22% and 44.6% for ITS2, matK+ITS2, atpF-atpH, psbK-psbI and rpoC1 locus respectively.

Taxonomic assignment
Altogether 40 DNA barcodes from ITS2 and matK+ITS2 were used for species delineation. The Neighbour-Joining (K2P) trees constructed with bootstrap support (1000) and bootstrap values of >60 exhibited substantial resolution among the OTUs corresponding to their genera except for A. marina and A. officinalis (Supporting information S1 Fig).

Species identification based on barcoding gap
The initial partition of ITS2, K2P with X = 1.0, prior maximal distance P = 0.021 produced consistent 12 operational taxonomic units (OTUs). S. alba was split into 3 groups, while members of Rhizophora and Avicennia were merged (Fig 3; Supporting information S2 Table). Whereas, recursive partitioning with P = 0.00167, produced inconsistently18 OTUs, of which A. alba, A. officinalis, and B. cylindrica showed split, while B. gymnorrhiza was clustered perfectly ( Fig 4A). In concatenated matK+ITS2, at X = 1.0 for all three metrics, OTUs ranged from 4-11 in the initial partition, but recursive partition tends to exhibit inconsistent OTUs (Fig 4B). When relative gap width was increased from X = 1.0 to X = 1.5, suddenly OTUs in ITS2 for initial partition was dropped to maximum 7, while recursive partition showed an increase in OTUs, up to 16 at P = 0.001. The initial partition for matK+ITS2, with X = 1, P = 0.0129 produced 11 OTUs. Avicennia and Bruguiera members were merged, while S. alba showed split. In recursive partition, with P = 0.001, A. alba, B. cylindrica, B. gymnorrhiza were resolved Table 4. Diagnostic characters of mangrove taxa.

matK + ITS2
Aegiceras corniculatum ( Identification of diagnostic nucleotides for each of the 14 mangrove taxa recovered from the BOLD system. Based on their utility for mangrove taxa delineating referred as diagnostic characters, diagnostic or partial character, partial characters and partial or uninformative characters. perfectly, while A. officinalis, A. marina along with R. apiculata and R. mucronata remained merged. The initial partition with an atpF-atpH barcode, JC and K2P metrics with (X = 1, 1.5) showed 3 OTUs (P = 0.0027) without any recursive partition except (X = 1.5, P = 0.00278, 1 OTU). With atpF-atpH, at X = 1.5 initial partition with P = 0.00278, 3 OTUs were produced in A. alba, A. officinalis, and A. marina. Similarly, psbK-psbI showed 4 OTUs (P = 0.001) in an initial partition for JC and K2P metrics at X = 1 and p-distance had only 2 OTUs with 1 OTU in the recursive partition. At X = 1.5, only JC and p-distance were able to partition data. JC the initial partition at P = 0.001 produced 4 OTUs, while at P = 0.0046, produced 2 OTUs. Metrics p-distance predicted 2 OTUs in an initial partition and 1 OTU in the recursive partition. Barcode locus rpoC1 at X = 1 with JC and K2P metrics showed initial partition of 2 OTUs and the recursive partition at P = 0.00278 predicted 1 OTU.
In addition to the above methods used for taxonomic evaluation, maximum likelihood (ML) based approach was added to get an additional perspective towards the species delineation through Poissons Tree Process (PTP). The ML analysis exhibited 10 OTUs with ITS2, where Avicennia, Bruguiera, Rhizophora, Ceriops, and Kandelia genera were merged while S. alba and S. caseolaris were split (Fig 4A and 4B). With matK+ITS2, 11 OTUs were formed by merging of Avicennia, Bruguiera and Rhizophora genera and S. alba was split. TaxonDNA is an alignment-based method based on sequence distance matrices. Percentage of correct/incorrect/ambiguous assignment of a taxon is compared using the molecular operating taxonomic unit (MOTU). The species-specific clustering was performed using match and mismatch criteria. T -Threshold; C-Correct; A-Ambiguous; Inc-Incorrect.

Discussion
There is no consensus regarding perfect plant DNA barcode, however few of plastid and nuclear coding (rbcL, matK, rpoB, and rpoC1) and non-coding (trnH-psbA, ITS2, psbK-psbI and atpF-atpH) marker fulfilled the required criteria [3,9,37]. The rbcL and matK are considered as core barcode, which can be further complemented with trnH-psbA and ITS2 as plant barcode suggested by China Plant BOL [4,7]. We employed these markers for molecular identification of mangrove plant species. In our earlier report, we have tested potential barcode candidates rbcL and matK individual as well as concatenated rbcL+matK, which demarcated most of the species such as A. ilicifolius, E. agallocha, A. corniculatum, K. candel, C. tagal, B. cylindrica and B. gymnorrhiza. An initial analysis was performed based on traditional barcode methods (Barcode gap analysis and NJ tree with the K2P method) [17]. Individual, as well as concatenated rbcL and matK barcode demarcated almost all mangroves species except for Rhizophora, Sonneratia and Avicennia genera [17]. The Plant CBOL group (2009) reported that only 72% species were resolved using combined rbcL and matK. A similar result was observed after combining rbcL and matK from closely related species of Curcuma [13]. Moreover, Avicennia genera with three species, of which A. alba, was resolved perfectly using matK but A. officinalis and A. marina lumped together and unable to resolve at the species level. Low resolution using DNA barcode regions has been documented in many other plants such as the genus Araucaria (32%), Solidago (17%) and Quercus (0%) [38]. A high percentage of bidirectional reads were critical for a successful plant barcoding system, given the low amount of variation that separates many plant species [3][4]. The risk of misassignment can be anticipated due to sequencing error or incomplete bidirectional reads. We observed the significant quality of PCR amplification and sequencing ranged from 95% to 100% in all tested markers. However, for ITS2 barcode, we performed many amplifications and sequencing attempt for S. alba, S. caseolaris, and A. ilicifolius mangroves taxa. Sequencing of S. alba and S. caseolaris resulted in a mixed and low-quality chromatogram with unidirectional success. The possible explanation for this kind of situation can be underscored by the presence of either ITS as multiple copies or pseudogene or/and fungal ITS contamination in plant [39]. Species identification success rate using rbcL+matK is higher, whereas rbcL sequence recovery ranged from 90-100% [4,38,40]. Hence, CBOL group recommends rbcL primers to possess universality for land plants. As reported by CBOL, the matK region showed sequencing success of 90% [4]. The matK marker provided 88% sequencing success, with the use of 10 primer pair combinations [3].
Very few reports are available on the DNA barcoding of the mangrove taxa [17,41]. Lower genetic distances were observed based on K2P among mangrove taxa particularly genera Rhizophora, Sonneratia, Avicennia, and Bruguiera based on rbcL, matK and ITS barcode [41]. Genetic distance ranged from 0.01 to 0.25 for rbcL gene, 0.01 to 0.89 for matK and 0.01-0.508 for ITS locus [41]. Similar results were observed in our studies, for rbcL and matK the genetic distance ranged from 0-0.68% and 0-1.32% respectively [17]. The discrimination power of proposed DNA barcode by CBOL Plant Working Group may vary in different plant group [12,[42][43]. Depending on the taxon, the use of additional markers may be needed for discrimination [4].
For single barcode ITS2, ABGD (K2P, X = 1), Taxon DNA (T = 3%) and GMYC produced consistent OTUs with corresponding results. Additionally, GMYC resolved R. apiculata, R. mucronata, and A. alba species. Overall highest taxon assignment was observed as 57.14% in GMYC and taxon resolution was up to 42.85% in ABGD, TaxonDNA, and PTP barcoding methods. However, the resolution of Chlorella-like species (microalgae) produced by GMYC, PTP, ABGD and character-based barcoding methods were variables based on several marker studies such as rbcL, ITS, and tufA [27]. Single ITS2 with PTP analysis was not able to resolve C. tagal and K. candel, which was further improved in the matK+ITS2 analysis. Analysis following the above methods, species delimitation through PTP and GMYC was utilized, due to their robustness in the absence of barcoding gap [44]. Even though they are based on different algorithms, both methods calculated the point of transition between species and population [27]. The GMYC method has a theoretically strong background and requires ultrametric gene tree that takes more time to analyse data. In contrast, the PTP is a recently developed method as an alternative to GMYC which requires non-ultrametric gene tree and consumes less time [44][45]. Both the methods revealed sort of identical results, however, the two analyses differed in resolution. In both the methods, five species (B. cylindrica, B. gymnorrhiza, A. officinalis and A. marina) in GMYC and seven species (B. cylindrica, B. gymnorrhiza, A. alba, A. officinalis, A. marina, R. apiculata and R. mucronata) in PTP were merged into single OTUs, potentially indicating low intraspecific diversity. It reflected that there are many overlooked/cryptic species present within the mangroves. When we performed ABGD with relative gap width X = 1.5 for K2P method, S. alba, and S. caseolaris species were demarcated, while rest of the mangrove species were split. At a relative gap width (X = 1) about seven species of the mangrove s were merged into single OTU and observed that the ABGD tends to lump species by increasing the number of merged OTUs [29]. Beside this, we also observed inconsistency of OTUs count during initial partition to recursive partition. Recursive partitioning recognizes more OTUs than initial ones, showing their superior capability to deal with variation in sample sizes of the species under study [29]. Moreover, TaxonDNA with a lower threshold value (0.3%) demarcated B. cylindrica and B. gymnorrhiza. The possible explanation for this might be due to lack of barcode gap resulting in merged OTUs, which can be optimized by analyzing more than 5 sequences per species, and we have used 3 sequences per species [28]. In TaxonDNA analysis, for rbcL threshold (T) was observed 0%, a similar result was recorded for rbcL in the Zingiberaceae family [46]. However, the threshold (T) for Indian Zingiberaceae family members was recorded as 0.20% for rbcL and 0% for rpoB and accD [43].
Avicennia is the most diverse mangrove genus, comprising eight species, out of which three are endemic to the Atlantic-East Pacific (AEP) region and five are endemic in the Indo-West Pacific region (IWP) [47]. A recent systematic revision of Avicennia based on morphological characters formed three groups: (1) A. marina; (2) A. officinalis and A. integra; and (3) A. rumphiana and A. alba [47]. In the current study, we have included A. marina, A. officinalis, and A. alba species, which were resolved with other barcode markers. Two plastid spacers such as psbK-psbI and atpF-atpH are recommended as potential plant DNA barcodes based on the flora of the Kruger National Park South Africa as a model system [48]. Similarly, we used three markers (atpF-atpH, psbK-psbI and rpoC1) for cryptic genera Avicennia and further evaluated with ABGD and TaxonDNA barcode methods. Both the methods consistently resolved all three Avicennia species using an atpF-atpH marker. Similarly, phylogenetic reconstruction of Avicennia genera based on trnT-trnD intergenic spacer region and the psbA gene revealed that A. marina is sister to the A. officinalis/A. integra and A. alba is genetically distinct [47].

Conclusions
In the present study, we tested core DNA barcode rbcL, matK, ITS2, atpF-atpH, psbK-psbI and rpoC1 to resolve mangroves species. Individual, as well as concatenated matK+ITS2 are helpful to demarcate mangroves at the species level. Single barcode matK is sufficient to resolve A. ilicifolius, A. corniculatum, E. agallocha, Ceriops tagal, K. candel, B. cylindrica and B. gymnorrhiza. ITS2 was able to discriminate R. apiculata and R. mucronata species based on GMYC method, while A. alba was resolved by concatenation of matK+ITS2. A cryptic genus Avicennia was delimitated based on the atpF-atpH single barcode. In the present work, the foundation work was done towards DNA barcoding of mangroves plant genera, such as Rhizophora, Avicennia, Acanthus, Kandelia, Ceriops, Bruguiera, Aegiceras and Excoecaria. Compiled mangroves barcoding result had some limitations, most of which are due to the low mangrove taxa sample coverage. Further, there is a need to explore additional mangroves taxa which will improve mangrove species identification for practical conservation.
Supporting information S1 Table. List of primers used in the current study.