Genomic approach for conservation and the sustainable management of endangered species of the Amazon

A broad panel of potentially amplifiable microsatellite loci and a multiplex system were developed for the Amazonian symbol fish species Arapaima gigas, which is currently in high danger of extinction due to the disorderly fishing exploitation. Several factors have contributed to the increase of this threat, among which we highlight the lack of genetic information about the structure and taxonomic status of the species, as well as the lack of accurate tools for evaluation of the effectivity of current management programs. Based on Arapaima gigas’ whole genome, available at the NCBI database (ID: 12404), a total of 95,098 unique perfect microsatellites were identified, including their proposed primers. From this panel, a multiplex system containing 12 tetranucleotide microsatellite markers was validated. These tools are valuable for research in as many areas as bioinformatics, ecology, genetics, evolution and comparative studies, since they are able to provide more accurate information for fishing management, conservation of wild populations and genetic management of aquaculture.


Introduction
The species Arapaima gigas (Schinz, 1822) belongs to the Arapaimidae family-order of the Osteoglossiformes [1], which composes one of the oldest groups of teleost fishes. It is the world's largest scale fish, and specimens may reach up to 200 kg of body mass and 3 m of length [2]. The species can be found in basins of South American countries, such as Brazil, Peru, Colombia, Ecuador, Bolivia and Guyana [3,4]. Since years ago, A. gigas has been relevant in aquaculture due to its fast growth, high fillet yield, mild-flavored white meat, and great market acceptance, both domestically and abroad [5,6]. a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 A. gigas is listed in Annex II of the International Convention on the Trade in Endangered Species (CITES) and in the International Union for Conservation of Nature's (IUCN) Red List of Endangered Species, within the category "data deficient", meaning that there is no proper information to make a direct or indirect assessment of its extinction risk based on its distribution and/or population status. Furthermore, in Brazil, arapaima were not included in the national list of endangered species due to lack of data [7].
Many reasons have contributed to the aggravation of the threat to A. gigas, including (i) the exploitation background of wild individuals, coupled with poor management strategies to comply with current fishing legislation; (ii) the current poorly refined genetic data on taxonomic structure and status; and (iii) the lack of management in certain areas, as well as the lack of accurate tools for evaluating the effectiveness on current management programs [8][9][10][11][12].
In Brazil, the predatory fishing of A. gigas-a practice which was intensified in the 1970s, impacted its population distribution in most parts of the Amazon basin. Consequently, strategies for management of the species were adopted by government agencies, such as the establishment of a minimum size for capture, the determination of reproductive season, and the annual fishing prohibition in the states of Amazonas, Pará, Amapá, Rondônia, Roraima, Tocantins and Acre [13][14][15]. However, the supervision of these strategies is deficient due to the lack of financial and human resources, as well as the lack of well-defined methods to identify the origins of the A. gigas's derived products, hindering the actions of the management system. For instance, recent data shows that 77% of A. gigas meat marketed in Santarém-PA comes from illegal fishing [16]. Furthermore, a substantial amount of A. gigas fresh or saltyconserved meat is exported illegally to Brazilian markets by middlemen, through the Guayaramerín border, in Bolivia [17].
The population structure of A. gigas is still undefined, and although genetic analyses revealed the existence of two distinct populations in Amazon and Araguaia-Tocantins basins, there are evidences showing the existence of multiple species of Arapaima [18,19]. On the other hand, the translocation of individuals is also an imminent threat [20], which can lead to loss of genetic diversity, decreased fitness and increased risk of extinction [21], since young individuals have often been translocated among different parts of the Amazon basin-and among this basin and others, mainly to attend aquaculture demand [22]. Therefore, the collection of molecular data is essential for monitoring and defining adequate strategies for the management actions.
It is a consensus that A. gigas populations are following a declining trend, where the main threat is overfishing [16,23], and the majority of communities do not practice sustainable management-with few exceptions, such as the Mamirauá Reserve, in Brazil. However, even in communities where sustainable management is done, there is no consistent, available data concerning the current conservation status of the species, much less concerning the population trends within these communities [11]. This reinforces the need of more precise management and the establishment of stakeholders to ensure the conservation and sustainable management of the species [24].
Due to the history of disorderly exploitation and the risk of extinction of the Amazonian's symbol fish Arapaima gigas, specialists emphasize the need of using molecular markers to support management strategies and the evaluation of the effectiveness of the A. gigas's conservation programs, highlighting the importance of using these markers for the identification and genetic tracing of marketed specimens [18].
Microsatellite markers, also known as short tandem repeats (STR) or simple sequence repeats (SSR), are DNA sequences consisting of tandemly repeating mononucleotide, dinucleotide, trinucleotide, tetranucleotide and pentanucleotide units, arranged throughout the genomes of most eukaryotic species [25]. They present features such as broad distribution in eukaryotic genomes, which can easily be detected by polymerase chain reaction (PCR), locusspecific nature, co-dominant inheritance, and high mutation rate, being highly polymorphic and hyper-variable [26][27][28]. All these features contribute to the advantages of using microsatellite markers in several research areas-as forensic and population genetics, conservation biology, and for genome mapping in evolutionary and biological scenarios, since these biomarkers allow two or more loci amplifications in a single multiplex PCR reaction. The multiplex PCR system consists of the simultaneous amplification of various loci, tagged by a distinct fluorescent label, and posterior analysis by capillary electrophoresis in an automated sequencing machine. It is considered a trustworthy technique since the use of capillary electrophoresis with fluorescently labeled primers provides high detection sensibility of the amplified DNA fragments [29]. Several methods are used in microsatellite markers design. However, the majority of these are longstanding and more laborious. With the advances in Next Generation Sequencing (NGS) technologies, increased data became available, allowing a faster, cost-effective, large-scale mining of molecular markers [30,31].
It is urgent to develop new, more accurate and numerous A. gigas molecular markers, not only to be applied as a tool for genetic studies, but also to improve the management system, refining the effectiveness of conservation programs, as well as to be used as a tool for the genetic tracing of wild and marketed individuals.
Thus, in order to contribute to the preservation and sustainable management, we developed a broad genomic panel of microsatellite loci potentially amplifiable. Based on this broad panel, we designed an unprecedented multiplex system containing 12 tetranucleotide microsatellite markers, which are more defined markers and are easily discriminated in polyacrylamide gel. Hence, the use of tetranucleotide markers is also a low-cost approach, which can be optimized in laboratories equipped with simpler infrastructure, being useful in populational genetics, conservation biology, and forensic studies of the A. gigas. We found a set of 95,098 single loci of perfect microsatellites with simple repeat sequences, from the dinucleotide to hexanucleotide classes, in the genome of the A. gigas (as observed in the Supplementary Table S1 via https://doi.org/10.6084/m9.figshare.8088533). The set contained information about the microsatellite motif, the repeat copy number, the initial and final positions, the scaffold and size, the forward and reverse primers, and the expected Polymerase Chain Reaction (PCR) products. The average size of the estimated PCR products was 207 bp. However, we could not design primers for 647 loci, since these sequences were rich in TA repeats, and because the TA motif were at the beginning or at end of the scaffold. Table 1 presents the abundance of the motifs categorized by class. Here, the dinucleotide class is the most frequent, representing more than 70% of the total microsatellites found in A. gigas's genome. In this class, the TG-repeat motif was the most frequent (15,8). Additionally, we characterized the minisatellites present in 387 loci (see Supplementary Table S2 via  The number of repeats in microsatellites present in the genome of A. gigas was quite variable among the 6 classes, presenting an inverse relationship with the size of the motif. The dinucleotide class, for example, presented the largest number of repeats, whereas the class of hexanucleotides presented fewer repeats (Fig 1).

Multiplex system
The broad panel generated in our study allowed the fast selection and validation of 12 tetranucleotide microsatellite loci on a multiplex system (Agig13519, Agig50571, Agig58115, Agig08356, Agig67103, Agig93614, Agig33291, Agig90836, Agig05001, Agig70664, Agig08912 and Agig06409). The multiplex system presented a high resolution, with no overlap between the microsatellite alleles and no artifact peaks (Fig 3). Moreover, we did not detect any genotyping errors attributed to stutter bands, large allele dropouts or null alleles, which are frequent in dinucleotide microsatellites. The panel was used to evaluate a wild population of A. gigas from Santarem (n = 30). The obtained data revealed a total of 73 alleles. The average rate of alleles per locus (NA) is 6.08. The observed (HO) and expected (HE) heterozygosity rates range from 0.867 (Agig 33291) to 0.100 (Agig 05001), with an average of 0.59 and 0.64, respectively (Table 2).
There was no significant deviation from Hardy-Weinberg equilibrium (HWE) after Bonferroni correction (p<0.0041). Loci pairs in linkage disequilibrium (LD) were not registered. The average inbreeding coefficient value (FIS) was 0.06.
Forensic parameters investigated for this multiplex system presented average polymorphic informative content (PIC) value of 0.53. The average power of discrimination (PD) was 0.77 and the power of exclusion (PE) was 0.33. The average combined power of discrimination (CPD) and the combined power of exclusion (CPE) for the 12 microsatellite markers investigated was 0.999 and 0.999, respectively ( Table 2).

Discussion
The rapid development of sequencing technologies allowed the obtention of complete genome sequences from an increasing number of species [33], which is an excellent source for identification of microsatellite markers already used in several species [34][35][36][37][38]. However, the use of

PLOS ONE
sequencing Technologies demands laboratories equipped with high infrastructure and skilled personal resources, which in many cases does not match the reality of several laboratories in the Amazon region.

PLOS ONE
In this study, we identified a microsatellites panel based on sequences of the complete genome of A. gigas, a low-cost approach which can be standardized for use in laboratories equipped with both in high or low-cost infrastructure, and can be applied in studies concerning the population genetics, conservation and forensic biology of the A. gigas, and the sustainable management of this species in aquaculture.
Microsatellite markers with number of repeats among 15-20 tend to be highly polymorphic [39], being the most indicated for population genetic studies. The panel of microsatellite markers developed for A. gigas included the dinucleotide, trinucleotide and tetranucleotide classes presenting motifs with repeats within this range (Fig 1), however the tetranucleotides markers produced weaker stutter bands and had no artificial multiband patterns [40].
Microsatellite markers are observed in almost all known eukaryotic and prokaryotic genomes, both in coding and noncoding regions [41]. In the A. gigas genome, microsatellite markers are more abundant in non-coding sequences, here called genomic regions (Fig 2). This corroborates data available in the literature towards the distribution of these markerswhich may be explained by the fact that, in promoter regions, the length of microsatellites may influence the transcription activity [42].
It is suggested that microsatellite polymorphisms are associated with the number of loci replications [43]. Microsatellites with higher number of repeats are more prone to mutation/ expansion than those with fewer replicates. The correlation between repeat length and microsatellite variability is comprehensible according to replication slippage model, which is a widely accepted mutation mechanism [44]. Another variable could be the microsatellite classification, and their results indicate that tetranucleotide microsatellites have the lowest polymorphism rate, comparing to dinucleotide microsatellites. Hence, we emphasize the importance considering the number of repeats and the class of the microsatellite while selecting markers for the accomplishment of scientific studies.
In the last years, several studies aiming to elucidate population genetics and conservation biology of the A. gigas have used dinucleotide microsatellites as molecular markers, as reported by Farias et al., 2003 [45], among other studies [19, [46][47][48][49][50]. However, our results reveal that the use of tetranucleotide microsatellite markers has higher cost-effectiveness and may be suitable for laboratories equipped with high-technology or low-cost labs. Nevertheless, it is important to consider the necessity of standardization of allelic designation and internal sizestandards, so the data interchange among different laboratories will become more effective and accurate [51] To validate our findings, we developed a multiplex system using the panel of 12 microsatellite markers to evaluate the conservation status of an A. gigas population. This tetranucleotide microsatellite markers of the multiplex system have the advantage of being highly polymorphic, more stable and presenting clearer bands than the dinucleotide markers described in literature.
It is important to emphasize that the history of fierce fishing exploitation of the A. gigas is a determinant factor for the findings of this study in the population of Santarem. The reduction

PLOS ONE
of natural populations and the decrease in average volume and length of the specimens landed in the Amazon due to overfishing soon lead to the fishing collapse in 1970 [52]. This suggest that these populations may have undergone a bottleneck effect, leading to a loss of genetic variability, however, more studies are necessary to probe this hypothesis. Among the investigated forensic parameters, the values of PIC were considered satisfying accordingly to the scale reported by Botstein et al., 1980 [53]. The CPD and CPE values of this panel allow the distinction of one individual in one billion, which is similar to other multiplex genotyping systems. For instance, the system reported by Hamoy et al., 2012 [54] was used in conservation studies and in the aquaculture support.
Population genetics data are undoubtedly the most important component of the baseline of any conservation and management plan [55]. The broad microsatellite panel developed for A. gigas opens perspectives for studies in as many fields as bioinformatics, ecology, genetics, evolution, and comparative studies among species [32]. And the multiplex system designed for A. gigas allows an accurate, faster, cost-effective, and affordable genotyping for low-income laboratories and conservation studies focusing on this species.
The set of 12 loci was able to measure the genetic variability of the investigated population, providing high statistic power data, sufficient for determining kinship patterns for population attribution tests. Therefore, this multiplex system can represent a valuable and powerful tool for small and large-scale studies in the areas of forensic/conservation biology and population genetics, furthermore, it can also be used as a tool for the management of wild and cultivated populations of A. gigas.

Bioinformatics analysis
Simple Sequence Repeat Identification Tool (SSRIT) [39] was used to find all simple sequence repeats (SSRs). We changed the default parameters related to motif-repeat in the source code of SSRIT, which was made in Perl, to specify motif length and the minimum number of repeats ( Table 3). The other attributes remained with their default values.
Microsatellites sequences counted from all samples generated by the SSRIT were processed by in-house scripts in Python (v. 3.7.2) to insert the flanking regions in the 5' and 3' portions of DNA sequence, and transform them in a "doc.fasta" file. After, the sequences which exhibited microsatellites were submitted to Primer 3 (v. 4.1.0), whose premise is design PCR primers from DNA sequences. In Primer 3, the default parameters of primer length, (|Primer Opt Size = 20|, |Primer Min Size = 18|, and |Primer Max Size = 23|) were modified to 24, 20, and 28, respectively. The parameters of primer product size range were modified for 50-1200 nucleotides.
The microsatellites were individually filtered by abundance and number of repetitions per class, and subsequently mapped in regions of intron (intergenic region) and regions of exon (exonic regions). The ones which are not present in any of the previous regions were classified as belonging to the genome. The development of all figures was made using Matplotlib (v. 3.0.1), which is a Python 2D plotting library. We collected 2 g of muscle tissue of 30 specimens of A. gigas from a fishing vessel landed at the city of Santarem, localized in the Low Amazon mesoregion, Brazil. The samples were preserved in 70% ethanol and posteriorly stored in -20˚C.

Validation of the multiplex microsatellite system
Total genomic DNA was extracted from the digested tissue in proteinase K solution/ Sodium Dodecyl Sulfate (SDS) and purified in Phenol/Chloroform, followed by precipitation in Isopropanol [56]. The DNA concentration was measured in the NanoDrop ND1000 spectrophotometer (Thermo Scientific).

Selection of microsatellites and polymorphism testing.
After assembling the broad microsatellite panel for A. gigas, a total of 12 microsatellite markers were selected to compose the multiplex system, according to the following criteria: tetranucleotide microsatellites, with at least 15 repeats and at maximum 20 repeats, outside of coding regions. The polymorphisms of the 12 selected loci were tested in 7% polyacrylamide gel electrophoresis, stained with silver nitrate.

Primer testing and genotyping.
The possibility of formation of secondary structures among the primers was tested in AutoDimer Software [57]. A PCR reaction consisting of the simultaneous amplification of 12 markers was standardized to a final volume of 9.5 μL,  Table 1. The reactions were optimized to amplify 5 ng of genomic DNA.
Amplification reactions were performed in a Veriti thermocycler (Applied Biosystems). The thermocycling conditions were: initial denaturation at 95˚C for 15 min, followed by 10 cycles at 94˚C for 30 s, 60˚C for 90 s, and 72˚C for 60 s; 20 cycles at 94˚C for 30 s, 58˚C for 90 s, and 72˚C for 60 s, and a final extension at 72˚C for 60 min, 10˚for min. Then, 1 μL of the amplification resulting solution was mixed with 8.5 μL of Hi-Di deionized formamide (Applied Biosystems), and 0.5 μL of GeneScan 500 LIZ (Applied Biosystems) as a molecular weight standard. The final product was analyzed using an ABI 3130 Genetic Analyzer (Applied Biosystems). The determination of fragment size and allele designation was done with the GeneMapper 3.7 software (Applied Biosystems).

Statistical analysis
The dataset was checked for genotyping errors and null alleles using Micro-Checker [58]. We analyzed the genetic variability using the allele number per locus (N A ), the observed (H O ) and expected (H E ) heterozygosity indexes, and the deviation from Hardy-Weinberg equilibrium (HWE), using Arlequin 3.5.1.257 [59], followed by Bonferroni's correction [60]. The same program was used to determine the proportion of locus pairs in linkage disequilibrium (LD).
The inbreeding coefficient (F IS ) was calculated in GENEPOP [61]. The polymorphism information content (PIC), the power of discrimination (PD), and the power of exclusion (PE) for all markers using the forensic statistic tool FORSTAT [62].