Population genetic structure of Guiana dolphin (Sotalia guianensis) from the southwestern Atlantic coast of Brazil

Sotalia guianensis is a small dolphin that is vulnerable to anthropogenic impacts. Along the Brazilian Atlantic coast, this species is threatened with extinction. A prioritized action plan for conservation strategies relies on increased knowledge of the population. The scarcity of studies about genetic diversity and assessments of population structure for this animal has precluded effective action in the region. Here, we assessed, for the first time, the genetic differentiation at 14 microsatellite loci in 90 S. guianensis specimens stranded on the southeastern Atlantic coast of the State of Espírito Santo, Brazil. We estimated population parameters and structure, measured the significance of global gametic disequilibrium and the intensity of non-random multiallelic interallelic associations, and constructed a provisional synteny map using Bos taurus, the closest terrestrial mammal with a reference genome available. All microsatellite loci were polymorphic, with at least three and a maximum of ten alleles each. Allele frequencies ranged from 0.01 to 0.97. Observed heterozygosity ranged from 0.061 to 0.701. The mean inbreeding coefficient was 0.103. Three loci were in Hardy-Weinberg disequilibrium even when missing genotypes were inferred. Although 77 of the 91 possible two-locus associations were in global gametic equilibrium, we unveiled 13 statistically significant, sign-based, non-random multiallelic interallelic associations in 10 two-locus combinations with either coupling (D' values ranging from 0.782 to 0.353) or repulsion (D' values -0.517 to -1.000) forces. Most of the interallelic associations did not involve the major alleles. Thus, for either physically or non-physically linked loci, measuring the intensity of non-random interallelic associations is important for defining the evolutionary forces at equilibrium. We uncovered a small degree of genetic differentiation (FST = 0.010; P-value = 0.463) with a hierarchical clustering into one segment containing members from the southern and northern coastal regions. The data thus support the scenario of little genetic structure in the population of S. guianensis in this geographic area.


Introduction
The Guiana dolphin, Sotalia guianensis, is a small dolphin of the Delphinidae family [1], distributed primarily along the tropical and subtropical Atlantic coast of South and Central America [1,2]. The northern and southern limit records are in La Mosquitia, Honduras, and Florianópolis, Brazil, respectively [3,4]. There are records from Central to South America including Nicaragua, Costa Rica, Panama, Venezuela [5], Colombia [6], Guyana [7], Suriname [8], French Guiana [9] and Trinidad and Tobago [10]. Despite being distributed in the coastal region, the Guiana dolphin is commonly found in more protected areas, such as estuarine and bay regions [11]. There is no evidence of significant discontinuity in its distribution, although in many regions individuals are rarely seen; they have not been observed in some areas, but they have never been specifically sought there [12].
The Brazilian Institute of the Environment and Renewable Natural Resources stated through the Brazilian Aquatic Mammals Action Plan that knowledge of the population genetic diversity of cetaceans is a priority for the development of management and protection strategies [12]. In 2012, the International Union for Conservation of Nature (IUCN) recommended, as a conservation priority, the assessment of genetic diversity of species [13]. Notably, the IUCN classified S. guianensis as a data deficient (DD) species, reflecting the scarcity of studies on anthropogenic impacts. In 2014, the Brazilian Ministry of the Environment included S. guianensis in the list of species threatened with extinction and categorized it as a vulnerable species, following the recommendation of the Chico Mendes Institute for the Conservation of Biodiversity, ICMBio, Brazil [14].
Typing of polymorphic nuclear DNA loci has been widely used in population genetics of many mammals. Multiallelic microsatellite loci are the most frequently genotyped markers in cetacean species. Population-genetic studies on the Guiana dolphin along the coast of Brazil have been limited to two reports. Using mitochondrial DNA haplotypes, one study [33] characterized six different management units: Pará, Ceará, Rio Grande do Norte, Bahia, Espírito Santo, and the southeast coast from Rio de Janeiro to Santa Catarina states. Using microsatellites, another study [34] found low genetic differentiation between populations from the states of São Paulo and Rio de Janeiro.
The aim of the present study was to assess the degree of genetic differentiation at 14 microsatellite loci in 90 specimens of Sotalia guianensis stranded on the southwestern Atlantic coast of the State of Espírito Santo, Brazil, a coastal region that had not previously been sampled. We uncovered a small degree of genetic differentiation and hierarchical clustering into one segment containing members from the southern and northern coastal regions. The data thus support the scenario of little genetic structure in the population in this geographic area.

Ethics statement
Specimen collection was carried out under authorizations from the Chico Mendes Institute for the Conservation of Biodiversity-ICMBio (URL: http://www.icmbio.gov.br/portal/) with licenses #20264/2 and #29363/4 to one of the authors (LAB) from the Institute Organization and Environmental Consciousness (ORCA) headquarters in the cities of Guarapari and Vila Velha, Espírito Santo, Brazil. The ORCA and the Universidade Federal do Espírito Santo institutional boards approved the study.

Specimens were collected along the coast from the extreme north, in the municipality of Conceição da Barra (18°35′34″S 39°43′55″W), to the extreme south, in the city of Presidente Kennedy (21°05′56″S 41°02′48″W). The collection localities for each specimen are provided in S1 Table. Fragments of muscle tissue were sampled at necropsy, frozen or preserved in 70% alcohol and stored at -20°C. Samples were transferred to the Genetics and Animal Conservation Laboratory of the Universidade Federal do Espírito Santo for extraction of total genomic DNA using the salting-out method [42]. DNA was quantified using a NanoDrop 2000c UV Spectrophotometer (Thermo Scientific, Wilmington, DE, USA).

Microsatellite genotyping
A set of 14 microsatellite loci was chosen based on population parameters previously reported in genetic studies in Sotalia spp. (five loci [43]), Tursiops spp. (six loci [20,44]), Inia spp. (two loci [45]) and Megaptera spp. (one locus [34]). Samples were genotyped for the SRY (chromosome Y) and ZFX/ZFY (chromosomes X and Y) genes to determine sex, using primer sequences reported in the literature [46] and further tested in this study. Alleles were amplified by Quantitative Fluorescence Polymerase Chain Reaction (QF-PCR) assays. S2 Table lists (i) the microsatellite repeat units reported in the literature; (ii) the estimated repeat unit number found in nucleotide databases using the In-Silico PCR [47] and the Primer-BLAST [48] online programs, available at the University of California, Santa Cruz (UCSC) and National Center for Biotechnology Information (NCBI) genome browsers, respectively; and (iii) the primer pair sequences and the QF-PCR assay conditions. DNA amplification was performed in a GeneAmp® PCR System 9700 thermocycler (Applied Biosystems, Foster City, CA, USA). Typically, a reaction mixture contained 20 ng of DNA, 0.16-2.4 μM of each primer, 2 mM MgCl2 and 0.5 U Taq polymerase in 12.5 μL. The amplification conditions were as follows: 95°C for 11 min; 28 cycles of 95°C for 1 min, 58-59°C for 1 min, and 72°C for 1 min; and 60°C for 60 min. Amplimers were analyzed by high-performance capillary electrophoresis in an ABI 310 Genotyper (Applied Biosystems) using the POP-4 polymer. Injection reactions typically consisted of 0.55 μL of amplimer(s), 9.0 μL of Hi-Di Formamide and 0.1 μL of GeneScan™ 500 LIZ® Size Standard molecular weight ladder, all reagents from Applied Biosystems. Allele profiles were analyzed using GeneMapper ID v3.2 software (Applied Biosystems).
We sequenced by the Sanger method at least one amplimer for each of the Sota-10, Sota-11, Sota-12 and Sota-13 loci, microsatellite loci that had not previously been tested in Sotalia spp., to determine the number of repeat units.

Microsatellite mutability estimates
Given that there is no information about the rates of mutation for any of the microsatellite loci genotyped in the present study, we used the scoring method applied for human microsatellite loci [49]. Briefly, we measured four estimates of mutability that correlate positively with mutation rate: allele span, the number of alleles per locus, expected heterozygosity (HE) and locus diversity (h_locus) [49,50]. For each locus, the four values were multiplied, and the products were ranked by their ratio to the highest product, so that the locus with the highest product has a score of 1.000. The locus diversity was calculated as

$$h_{locus} = \frac{n}{n-1}\left(1 - \sum_{i=1}^{k} x_i^{2}\right)$$

where n is the number of samples, k is the number of alleles, and x_i is the frequency of the i-th allele [51].
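The scoring procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the toy allele sizes are hypothetical, the allele span is assumed to be expressed in repeat units of a dinucleotide motif, and HE is assumed to be the uncorrected value 1 - Σx_i².

```python
from collections import Counter


def locus_stats(allele_sizes_bp, repeat_len_bp=2):
    """Return the four mutability proxies for one locus.

    allele_sizes_bp: one entry per scored gene copy (two per diploid genotype).
    repeat_len_bp: repeat unit length; 2 assumes a dinucleotide such as [CA].
    """
    n = len(allele_sizes_bp)
    counts = Counter(allele_sizes_bp)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return {
        # Assumption: span measured in repeat units, not base pairs.
        "allele_span": (max(allele_sizes_bp) - min(allele_sizes_bp)) / repeat_len_bp,
        "n_alleles": len(counts),
        # Assumption: uncorrected expected heterozygosity.
        "H_E": 1.0 - sum_sq,
        # Locus diversity with the n / (n - 1) small-sample correction.
        "h_locus": n / (n - 1) * (1.0 - sum_sq),
    }


def mutability_scores(loci):
    """Multiply the four proxies per locus and scale by the highest product."""
    products = {}
    for name, sizes in loci.items():
        s = locus_stats(sizes)
        products[name] = s["allele_span"] * s["n_alleles"] * s["H_E"] * s["h_locus"]
    top = max(products.values())
    return {name: (p / top if top > 0 else 0.0) for name, p in products.items()}


# Hypothetical toy data, not taken from the study.
toy = {"locus_A": [100, 100, 102, 104], "locus_B": [200, 200, 200, 202]}
scores = mutability_scores(toy)
```

In this toy case the more variable locus_A receives the reference score of 1 and locus_B a much lower relative score, mirroring how a ranking by ratio to the top product would be read.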

Chromosomal coordinate conversion and synteny map
To investigate whether the microsatellite loci are linked in syntenic blocks, we first used BLAT analysis [52] with homologous and heterologous primer sets (S2 Table) to retrieve the Tursiops truncatus sequences from both the bulk nucleotide and reference genome reads available from the Database Resources at NCBI [53]. The structures of the repeats were determined from the ortholog sequences using the online Tandem Repeats Finder program [54]. The loci were validated computationally using the In-Silico PCR tool of the online visualization interface of the UCSC Genome Browser [47]. The In-Silico PCR tool searches a sequence database with a pair of PCR primers, using an indexing strategy for fast performance. The tool also provides the contig or chromosomal coordinates of the amplimer. The contig data were migrated from the T. truncatus assembly (Baylor Ttru_1.4/turTru2 [55]) to the Bos taurus assembly (bosTau8 UMD 3.1.1 cow assembly [56]) using the Convert utility, which is accessed from the menu on the UCSC Genome Browser annotation tracks page. The Convert utility locates the position of a feature of interest in a different release of the same genome or a genome assembly of another species and provides the percent identity and the coverage in base pairs within the converted coordinates. To facilitate access to these provisional map conversions, we customized interactive sessions at the UCSC Genome Browser. The hyperlinks to the custom tracks are available in S5 Table.

Genetic differentiation and population structure analysis
We performed a descriptive statistical analysis for all the microsatellite loci genotyped. The number of alleles per locus (Na), minimum and maximum frequency, observed heterozygosity (HO), expected heterozygosity (HE), polymorphic information content (PIC), and power of discrimination (PD) were calculated using PowerStats v.12 [57]. No resampled individuals were identified by comparing genotypes using the CERVUS 3.0 software [58].
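For readers unfamiliar with the descriptive statistics listed above, the following sketch computes them for a single locus from textbook definitions. It is an illustration under stated assumptions, not a reimplementation of the software used in the study: the function name and the toy genotypes are hypothetical, HE is the uncorrected 1 - Σp², PIC is computed as HE minus the sum over allele pairs of 2p_i²p_j², and PD is taken as one minus the sum of squared genotype frequencies.

```python
from collections import Counter
from itertools import combinations


def descriptive_stats(genotypes):
    """Descriptive statistics for one locus.

    genotypes: list of (allele_a, allele_b) tuples, missing genotypes removed.
    """
    n = len(genotypes)
    allele_counts = Counter(allele for g in genotypes for allele in g)
    freqs = [c / (2 * n) for c in allele_counts.values()]
    sum_sq = sum(p * p for p in freqs)

    h_o = sum(1 for a, b in genotypes if a != b) / n
    h_e = 1.0 - sum_sq
    # PIC: subtract the term for uninformative heterozygote x heterozygote matings.
    pic = h_e - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    # PD: probability that two random individuals differ in genotype.
    geno_counts = Counter(tuple(sorted(g)) for g in genotypes)
    pd = 1.0 - sum((c / n) ** 2 for c in geno_counts.values())

    return {
        "Na": len(allele_counts),
        "min_freq": min(freqs),
        "max_freq": max(freqs),
        "HO": h_o,
        "HE": h_e,
        "PIC": pic,
        "PD": pd,
    }


# Hypothetical toy genotypes (allele sizes in bp), not taken from the study.
stats = descriptive_stats([(100, 102), (100, 100), (102, 102), (100, 102)])
```

Dedicated packages may apply small-sample corrections, so values from this sketch can differ slightly from published tables.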
Statistical significance of deviations from Hardy-Weinberg equilibrium (HWE) was calculated using Fisher's exact test with 30,000 shufflings (randomizations) and adjusted with the Holm-Sidak step-down method. Genotypes were tested for global gametic (linkage) disequilibrium using the Genetic Data Analysis (GDA) 1.0 software [59]. The intensity and significance of coupling and repulsion non-random multiallelic interallelic associations were determined using the Multiallelic Interallelic Disequilibrium Analysis Software v.1 (MIDAS) [60] according to the methodology described in [49]. The strength of sign-based overall disequilibrium for the two-locus combinations was determined using the formulas derived in Ref. [61]. Population structure analyses were performed using Wright's F-statistics [62] and the bootstrapping method, with 30,000 random replicates and a 95% confidence interval, assuming HWE, in the GDA software [59]. Private alleles were identified using GDA. The Bayesian clustering analysis was performed using the STRUCTURE 2.3.3 software package [63-65]. We applied the admixture model with correlated allele frequencies (omitting the collection locations of the specimens), setting the possible number (K) of clusters from 1 to 10, with a burn-in period of 100,000, 500,000 Markov Chain Monte Carlo (MCMC) generations, and 50 iterations. Nei's genetic distances were calculated using the GenAlEx 6.5 Excel add-in [66], and the distance matrix was used in the MEGA v7.0.14 program [67] to generate a dendrogram by the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) hierarchical clustering method.
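The Holm-Sidak step-down adjustment mentioned above for the HWE tests can be sketched as follows. This is a generic illustration of the procedure, assuming the usual formulation in which the i-th smallest of m P-values is adjusted to 1 - (1 - p)^(m - i + 1) and monotonicity is then enforced; the function name and the example P-values are hypothetical and not taken from the study.

```python
def holm_sidak(pvalues):
    """Return Holm-Sidak step-down adjusted P-values in the input order."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest P-value is corrected for m tests, the next for m - 1, etc.
        adj = 1.0 - (1.0 - pvalues[idx]) ** (m - rank)
        # Step-down rule: an adjusted value may not fall below an earlier one.
        running_max = max(running_max, adj)
        adjusted[idx] = min(1.0, running_max)
    return adjusted


# Hypothetical per-locus P-values.
adjusted = holm_sidak([0.01, 0.04, 0.03])
```

A locus would then be declared out of HWE when its adjusted P-value falls below the chosen significance level.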

Population parameters and genetic diversity of microsatellite loci
The collection localities along the coast of the State of Espírito Santo, Brazil, are mapped in Fig 1. We typed genomic DNA samples from 90 Sotalia guianensis specimens with 14 microsatellite loci. The majority (69/90; 76.7%) of the biological samples were from male specimens as assessed by genotyping with DNA markers for the ZFX/ZFY and SRY genes (S2 Table). The individual genotypes are in S1 Table. Population parameters and genetic diversity estimates for the microsatellite loci are summarized in Table 1. The overall mean success rate of amplification was 70.6% (889/1,260 PCR analyses; expected number = 90 individuals × 14 loci). All microsatellite loci were polymorphic, with at least three and a maximum of ten alleles observed per locus (mean number of alleles was 5.6). Allele frequencies ranged from 0.01 to 0.97. The most frequent alleles were Sota-11*186 (0.97) and Sota-02*208 (0.87) (S3 Table). If the specimens were to be assigned heuristically to either the southern (n = 62) or northern (n = 28) coastal regions according to the State midline, eight loci would exhibit at least one private allele, with frequencies < 0.07 (S4 Table). In this investigative scenario, 12 private alleles occurred in the specimens collected from the southern coastal region and just one from the northern area. Observed heterozygosity varied from 0.061 to 0.701, and the expected heterozygosity ranged from 0.06 to 0.81. Sota-03 exhibited the highest expected mutability level (score = 1.000), followed by Sota-12 (score = 0.908), while Sota-11 had the lowest level (score = 0.0015) (Table 2). Therefore, the least informative locus was Sota-11. The highest inbreeding coefficient F value was 0.266 for Sota-04, and the mean coefficient was 0.103. Three loci (Sota-01, Sota-03, and Sota-12) were in Hardy-Weinberg disequilibrium even when missing genotypes were inferred. We note that the Sota-10 through Sota-14 loci had not been genotyped previously in S. guianensis.
To determine the number of repeat units for those microsatellite loci, we sequenced at least one allele for each locus from Sota-10 through Sota-13. The Sota-10 216 bp allele corresponds to [CA] 24

Chromosomal mapping of microsatellite loci by analysis of synteny
At present, no draft of the nuclear genome sequence for Sotalia spp. is available for chromosomal mapping of the genetic markers used in this study. There is, however, a genome draft for the common bottlenose dolphin Tursiops truncatus (Baylor Ttru_1.4/turTru2) [55]. The diploid number of chromosomes in T. truncatus is 44,XX or 44,XY [68]. No chromosomal or genetic maps are available for that species. Non-random interallelic forces among physically linked loci may influence population parameters. Therefore, to infer the physical proximity of the microsatellite loci, we mapped them to the Bos taurus reference genome (bosTau8 UMD 3.1.1 cow assembly [56]), reasoning that related species are more likely to share syntenic blocks. We chose the cow assembly because it represents the reference genome available for a terrestrial mammal that is most closely related to the Delphinidae [69]. The strategy was intended, first, to determine the extent of sequence homology of the In-Silico PCR-retrieved amplimers from Tursiops truncatus and, second, to map by BLAT conversion the physical coordinates of the orthologous contigs in the Bos taurus reference genome. The orthologous contig identity ranged from 92.4% (Sota-05) to 42.2% (Sota-02) (S5 Table). Thirteen microsatellite loci were provisionally mapped in this way to the cow reference genome assembly. Five loci mapped to chromosome 5 and two others to chromosome 2. Sota-07 shares significant homology with unmapped contig sequences. The derived provisional synteny map for the microsatellite loci is shown in Fig 2.

Global gametic disequilibrium and intensity of non-random interallelic associations
Significant global gametic disequilibrium was limited to 14 out of the 91 possible two-locus combinations (Table 3). The number of two-locus combinations varied from 13 when the missing genotypes were disregarded to 9 when they were inferred. We note that the two-locus combinations involving the microsatellite loci that are syntenic on chromosome 5 (i.e., Sota-02, -05, -06, -08 and -10) were in global gametic equilibrium. On the other hand, Sota-04 and Sota-12, which are syntenic on chromosome 2, showed global gametic disequilibrium when missing data were inferred. Recombination events represent an important evolutionary process determining gametic equilibrium. Thus, we measured interallelic D' coefficients between all possible two-locus combinations to uncover coupling (D'(+)) or repulsion (D'(-)) non-random interallelic forces at disequilibrium. Twelve possible two-locus combinations exhibited at least one significant interallelic association (Table 4). Of those, ten were at apparent global equilibrium. The intensity and significance of the sign-based gametic disequilibrium and the allele pairs involved are shown in Table 4. In total, 15 statistically significant, non-random multiallelic interallelic associations were observed, 12 with coupling (D' values ranging from 0.782 to 0.353) and 3 with repulsion (D' values ranging from -0.517 to -1.000) forces. Except for one allele pair in the Sota-05/Sota-13 two-locus combination, the interallelic associations did not involve the major alleles from both loci. The only syntenic two-locus non-random interallelic association observed was between Sota-02*208 bp and Sota-05*232 bp on chromosome 5, and the allele pair included the most frequent Sota-02 allele.

Population genetic structure analysis
To evaluate the occurrence of possible patterns in the genetic composition of the 90 stranded Guiana dolphin specimens, we analyzed the genotypes in three ways, using a heuristic model based on collection locality to assign the specimens to either the southern or northern coastal regions (Fig 1). First, we calculated fixation index F-statistics to measure the degree of genetic differentiation (FST = 0.010; P-value = 0.463; 95% CI: -0.000 to 0.026, for 30,000 random replicates). Second, we employed Bayesian clustering analysis, which revealed that all the genotypes clustered in one segment with no significant separation between southern and northern designations. One segment was observed by setting the possible number (K) of clusters from 1 to 10 (posterior probabilities ranged from 1 to 0.1). Lastly, we estimated Nei's genetic distances at the 14 microsatellite loci, grouped the individual genotypes by similarity using hierarchical clustering, and displayed the similarity in a dendrogram (Fig 3). The analysis showed that the individuals partitioned into two hierarchical clusters. Nevertheless, both clusters comprised specimens with memberships in the southern and the northern coastal regions. There was no apparent biological aspect in the dataset that explained this hierarchical partition.
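The sign-based D' coefficients reported in this section can be illustrated with the standard normalized disequilibrium for one allele pair. The sketch below is not the MIDAS implementation: it assumes that the haplotype frequency of the allele pair is already known or has been estimated (with unphased genotypes this normally requires an estimation step that is not shown), and the function name and example frequencies are hypothetical.

```python
def d_prime(p_ab, p_a, p_b):
    """Normalized disequilibrium D' for allele A (locus 1) and allele B (locus 2).

    p_ab: frequency of gametes carrying both A and B (assumed known).
    p_a, p_b: frequencies of allele A and allele B.
    Positive values indicate coupling, negative values indicate repulsion.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        # Largest excess of AB gametes compatible with the allele frequencies.
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        # Largest deficit of AB gametes compatible with the allele frequencies.
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return 0.0 if d_max == 0 else d / d_max


# Hypothetical frequencies, not taken from the study.
coupling = d_prime(p_ab=0.40, p_a=0.5, p_b=0.5)   # AB gametes in excess
repulsion = d_prime(p_ab=0.0, p_a=0.2, p_b=0.3)   # A and B never co-occur
```

Under this definition, a value of -1.000 corresponds to an allele pair that is never observed on the same gamete, which is the strongest possible repulsion for the given allele frequencies.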

Discussion
We show that a sampling of 90 Guiana dolphins stranded in the Atlantic coastal area of the State of Espírito Santo, Brazil, composes a population with little genetic structure. The evidence is three-fold: a low degree of genetic differentiation, low inbreeding coefficients, and clustering into one segment containing members from the southern and northern coastal regions. Our study is the first to assess the genetic diversity of Sotalia guianensis at microsatellite loci in this coastal area. A previous survey with 58 S. guianensis samples from the coastal areas of the States of São Paulo and Rio de Janeiro, Brazil [34], also showed a low genetic differentiation coefficient (FST = 0.04) at ten microsatellites, five of which were also genotyped in our study.
Decreased locus diversity is often seen when using heterologous primer sequences (i.e., designed for one species and used in another) [70]. Here, we used nine heterologous primer sets, and only one (Sota-11) yielded low genetic diversity. Altogether, the population parameters at nine loci were consistent with the data reported in three other studies of S. guianensis that used the same primer sets [34,43,45]. We note that in our biological samples, the Sota-11 locus exhibited only three alleles with an allele span of 186-206 bp (equivalent to 10 [CA] repeat units). In contrast, in T. truncatus, the same locus exhibited eight alleles [20]. Our data indicate that the Sota-11 locus has the lowest estimated rate of mutability in S. guianensis.

Table 3. Two-locus combinations that exhibited significant global gametic disequilibrium.
Columns: Two-locus combination; P-value (a); GD; P-value (b); GD; Chromosome pair; Synteny.

Genetic studies in other dolphin species (Tursiops truncatus, Tursiops aduncus, Cephalorhynchus eutropia, and Stenella frontalis) have reported FST values ranging from 0.034 to 0.20 with varying sample sizes [31,39-41,71] and coverages through short and long geographic distances [37,72]. However, those values cannot be directly compared because they refer to species with diverse ecologies, social structures, and evolutionary histories.
The clusters created by STRUCTURE can be affected by variability in sample size [73]. We performed 588 analyses (mean number of subjects scored = 42 × 14 loci) for the southern coastal population subset and 301 analyses (mean number of subjects scored = 21.5 × 14 loci) for the northern population subset. We believe, for the following reasons, that the apparent lack of structure in our population study cannot be ascribed to the small number of either individuals or loci scored. First, for microsatellite-based population genetic studies, the typing of 25 to 30 individuals per population is enough to estimate allele frequencies accurately [74]. Second, the occurrence of private alleles increases as a function of the genetic … [75]. When we consider the heuristic designation of the specimens to either southern or northern possible population subsets, just one private allele was detected in the northern region population subset, compared with 12 possible private alleles in the southern subgroup. The observed highly skewed distribution does not support a potential history of fragmentation and isolation.

Table 4. Intensity and significance of sign-based gametic disequilibrium between two-locus combinations.
Columns: Two-locus combination; P-value (a); P-value (b); Allele pair; Samples; D'(+)

Other factors, however, may influence the structure of a cetacean population: the distribution of prey [76,77], social behavior [78], use of preferential habitats [79], and habitat discontinuities due to environmental characteristics [39,80]. Unfortunately, no reports on such variables are available for the coastal region covered in our study, which precluded a fully comprehensive analysis. We note a significant (chi-squared test, P-value = 4.20039E-07) 3-fold excess of male specimens in our samples. This imbalance may be due to anthropogenic actions, such as fishing activities. The majority of dolphins in fishing-net accidents are young and male [81], which increases the number of male animals found on beaches.
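The reported chi-squared P-value for the male excess can be checked with a short goodness-of-fit calculation. The sketch below rests on two assumptions that are not stated explicitly in the text: that all 90 specimens were sexed, giving 69 males and 21 females, and that the null hypothesis is a 1:1 sex ratio. With one degree of freedom the upper-tail probability reduces to a complementary error function, so only the standard library is needed.

```python
import math

# Assumed counts: 69 males out of 90 sexed specimens, hence 21 females.
males, females = 69, 21
total = males + females
expected = total / 2  # assumed 1:1 null hypothesis

# Pearson chi-squared statistic with one degree of freedom.
chi2 = (males - expected) ** 2 / expected + (females - expected) ** 2 / expected

# For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

male_to_female_ratio = males / females
```

Under these assumptions the statistic is 25.6 and the P-value is close to 4.2E-07, in agreement with the value quoted above, and the male-to-female ratio is slightly above 3.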
A second important aspect of our study addresses the prospective application of the syntenic map of the microsatellite loci for kinship analyses. It is evident from other biological systems [49,82] that measuring global gametic disequilibrium alone is insufficient to define the evolutionary forces at equilibrium for either physically or non-physically linked loci. We showed that eleven of the 91 possible two-locus combinations that were in apparent global equilibrium exhibited at least one significant, sign-based non-random multiallelic interallelic association. For the five loci that are syntenic on chromosome 5, only one significant non-random interallelic association was detected, possibly compromising their combined use for estimating the power of discrimination [83]. In contrast, the two syntenic loci on chromosome 2 did not exhibit significant interallelic associations, supporting the view that these two syntenic loci may segregate independently. We therefore recommend measuring the intensity and significance of coupling and repulsion non-random multiallelic interallelic associations for future parentage-based group composition and dispersal pattern studies of cetaceans.
Supporting information S1