Spatial population genetic structure of Caquetaia kraussii (Steindachner, 1878) evidenced by species-specific microsatellite loci in the middle and low basin of the Cauca River, Colombia

The adaptative responses and divergent evolution shown in the environments habited by the Cichlidae family allow to understand different biological properties, including fish genetic diversity and structure studies. In a zone that has been historically submitted to different anthropogenic pressures, this study assessed the genetic diversity and population structure of cichlid Caquetaia kraussii, a sedentary species with parental care that has a significant ecological role for its contribution to redistribution and maintenance of sedimentologic processes in its distribution area. This study developed de novo 16 highly polymorphic species-specific microsatellite loci that allowed the estimation of the genetic diversity and differentiation in 319 individuals from natural populations in the area influenced by the Ituango hydroelectric project in the Colombian Cauca River. Caquetaia kraussii exhibits high genetic diversity levels (Ho: 0.562–0.885; He: 0.583–0.884) in relation to the average neotropical cichlids and a three group-spatial structure: two natural groups upstream and downstream the Nechí River mouth, and one group of individuals with high relatedness degree, possibly independently formed by founder effect in the dam zone. The three genetic groups show recent bottlenecks, but only the two natural groups have effective population size that suggest their long-term permanence. The information generated is relevant not only for management programs and species conservation purposes, but also for broadening the available knowledge on the factors influencing neotropical cichlids population genetics.


Introduction
Cichlids are one of the most diverse families, having 1756 valid species distributed in 257 genera, and having a notable presence in Africa, and Central and South America [1].Their remarkable divergent evolution and fast adaptation to different environments make them an evolutionary model for researching their biological properties [2][3][4][5][6].Their adaptive radiation may be the result of the combination of intrinsic factors and the environment, namely, the interaction between morphological or behavioral innovations and ecological opportunities [6][7][8][9].Furthermore, plasticity in genetic, morphological, and reproductive features allow this species to survive in different environments, take advantage of ecological opportunities and, ultimately, to diversify [10][11][12][13].Another important factor is hybridization, which has provided the species with advantageous genetic variability to face the environmental changes, differentiate or remain in their surroundings [6,14,15].
Most cichlids have shown particular life histories such as parental care [16], assortative mating [17][18][19][20][21] and a tendency to sedentarism [22,23], features that influence the genetic diversity patterns of various sea and freshwater fish [24].Namely, fish species that reach maturity at an early age show higher levels of genetic variability than the species that reach maturity later [24].Besides, life expectancy is inversely correlated to genetic diversity as individuals with lower life expectancy often have higher genetic diversity [25,26].Moreover, individuals that migrate and reproduce with distant populations exhibit gene flow and, thus, high genetic diversity [24,25].
On the other hand, sedentary species and/or species having parental care with egg or larvae retention display little to no gene flow which may lead to genetic isolation among populations if the effective population size is not large enough [27].As for other species, habitat-preference population isolation in cichlids may be influenced by environmental heterozygosity or geographical barriers [28].Interestingly, some cichlids avoid reproducing with close relatives spreading to habitats occupied by unknown individuals, which allows preventing inbreeding and keeping the genetic diversity [29].
To contribute to the knowledge of neotropical cichlids population genetics, this study selected yellow 'mojarra' Caquetaia kraussii (Steindachner, 1878), one of the 104 cichlid species described for Colombia and one of the four species registered in the Magdalena-Cauca basin [37].This species, also naturally distributed in the basins of rivers Atrato, Sinu ´and Maracaibo, is subject to artisanal fishery for human consumption, performs important ecosystem services like redistribution and maintenance of sedimentologic processes [38], and preferentially habits swamps, ponds, and undisturbed waters in the lower areas of the rivers and streams in altitudes of up to 500 m s. n. m. [39].Regarding reproductive features, C. kraussii shows an average generational time of eight months [40], has partial spawning events through the year in the Atrato [41] and Sinu ´ [42] rivers, an equilibrium reproductive strategy associated with low relative fecundity, parental care, and sedentary tendency [42,43].Furthermore, C. kraussii is in the Ituango hydroelectric project (PHI, for its acronym in Spanish, from Bolombolo of Venecia, Antioquia to Pinillos, Bolı ´var) influence area.This project encompasses the last 500 km of the Cauca River, which is 1350 km long and has a 63,300 km 2 area [44].Some studies have noted that equilibrium strategy species like C. kraussii are threatened for their habitat fragmentation due to hydroelectric constructions [45]; nonetheless, the effects of these anthropogenic activities may vary since downstream there is a reduction in the lateral and longitudinal conductivity and the flow regime stabilizes while the conditions in the dam change from lotic to lentic, increasing the thermal stratification [46].Historically, this area has been exposed to the negative influence of other anthropogenic factors such as water contamination, livestock, fishing, and mining [47], that may explain the genetic bottlenecks in various species sampled prior to the hydroelectric construction [48][49][50].
The impact of hydroelectric plants on cichlids in neotropical environments is poorly documented and fish responses appear to depend on their life history [35], the duration of fragmentation [31], and potential impacts resulting from hybridization between invasive or introduced species [34].For example, Gymnogeophagus setequedas, an endangered cichlid that seems to prefer fast waters, was reported to have disappeared after the construction of the Itaipu reservoir [51].However, it exhibited gene flow in lotic environments of the Iguac ¸u River basin [35].For populations fragmented for approximately 17 years or longer, Geophagus aff.brasiliensis and other neotropical non-migratory fish species show significant genetic structure [31].Recent evidence of hybridization between two introduced species in a reservoir (Cichla ocellaris var.kelberi and Cichla piquiti) and the high genetic diversity found using microsatellite loci raises concerns about its indication of a possible increase in local adaptability that could enhance establishment success in novel areas [34].
Namely, this study aimed to provide a response to C. kraussii genetic diversity and structure related questions in the medium and lower sections of the Cauca River.Since C. kraussii is in strongly anthropogenic pressured habitats and is a sedentary species with parental care, the a priori expectation was for it to exhibit low genetic diversity and spatial structure in the PHI influence area.To contrast these hypotheses, microsatellite loci were developed as a molecular tool to assess this species genetic diversity and structure.This approach was used for avoiding heterologous loci-related genotyping errors [52][53][54][55], considering that to date microsatellite loci have been developed for phylogenetically distant neotropical cichlids like Amphilophus cichlasoma [56], Symphysodon discus [57], Astronotus crassipinis [58], Cichla piquiti [59], Cichla monoculus [60], Cichla temensis and Cichla orinocensis [61], Geophagus brasiliensis [62] Apistogramma agassizii [63], Apistogramma gephyra [64] and Pterophyllum scalarae [33].

Study area
This study assessed fin and muscle tissues preserved in ethanol 95%, obtained from 319 C. kraussii individuals captured in different sections of the Cauca River between 2020 and 2022 (Fig 1) by Universidad de Antioquia, Universidad de Co ´rdoba, and Universidad Nacional de Colombia Sede Medellı ´n, through scientific cooperation agreement CT-2019-000661, under environmental license # 0155 of January 30th, 2009, from Ministry of Environment, Housing and Territorial Development for the Ituango hydroelectric construction.These sections previously identified [65] include the medium (PHI: 100 samples) and lower (S4, S5, S6, S7 and S8: 21, 43, 80, 49 and 26 samples, respectively) sections of the Cauca River.The PHI section, which is 46 km long, corresponds to the dam zone, a lentic system that before the hydroelectric construction had rapids and strong streams.The remain sections (S4: 38 km, S5: 61 km, S6: 78 km, S7: 29 km, S8: 17 km) are downstream the dam and comprise lentic (swamps) and lotic (streams and rivers) systems in a floodplain influenced by the Nechı ´River mouth.
Prep Kit, and sequencing was performed through Illumina MiSeq, generating 300 base paired end reads.Then, raw read cleaning was carried out with CUTADAPT v2. 10 [67], sequences containing the microsatellites were selected with PAL_FINDER v0.02.03 [68], and PRIMER3 v2.0 [69] was used for the primer designs in the flanking sequences of the microsatellite.Subsequently, the correct evaluation of the primers was tested in silico with electronic PCR [70].

Population genetics analysis
Prior to assessing C. kraussii genetic diversity, ARLEQUIN v3.5.2.2 [72] was used for evaluating expected (He) and observed (Ho) heterozygosities, inbreeding coefficient (F IS ), and departures from Hardy-Weinberg equilibrium and linkage disequilibrium among loci, applying, in the latter, the sequential Bonferroni correction for the statistical significance in the multiple comparisons [73].BayeScan v2.1 [74] was used for identifying outlier loci presence.The number of alleles per locus (Na) and the allelic range (Ra) were calculated with GenAlex v6.51 [75] and the polymorphism information content (PIC) was determined using Cervus v3.0.7 [76].GENEPOP 4.7 [77,78] was used for finding the multilocus values (across-loci) and FSTAT v 2.9.4 [79] for the allelic richness (Ar).The correlation analysis between He, Ar, and estimators of genetic differentiation (described below) with the physical (km) or geographical distances of the sections of the Cauca River mouth in the Magdalena River was performed using Pearson correlation with the R-packages ecodist and GGPLOT2 [80,81].
The relatedness coefficient (rxy) estimation among all possible pair of individuals was analyzed using the R-related package [82].In absence of a pedigree, relatedness inferences for non-model species were based in simulated relatedness measures from empirical data [83].Initially, the best estimator between maximum likelihood (Dyadml in [84] and Trioml in [85]) and non-maximum likelihood (Lynchli in [86]; Lynchrd in [87]; Quellergt in [88]; Wang in [89]) estimators was determined from simulated relatedness values among 100 pairs.The best estimator for analyzing the data was the one having relatedness estimations with the highest correlation between the simulated data from the empirical allelic frequencies and the theorical values, in each of the four relatedness categories: unrelated (UR, rxy = 0.00), half sibs (HS, rxy = 0.25), full sibs (rxy = 0.50) and parent-offspring (PO, rxy = 0.50).Once the best estimator was selected for each species, pairwise relatedness was estimated.The same estimator was used for calculating individual inbreeding with COANCESTRY [90].
Genetic structure was determined through the analysis of molecular variance (AMOVA) and calculation of estimators F' ST [91,92] and Jost's D EST [92,93] using GenAlEx v6.51b2 [75].The Bonferroni correction [73] was applied for the statistical significance of the estimators.Furthermore, a discriminant analysis of principal components (DAPC) was performed using the R-package Adegenet [94], with 32 principal components (a-score: 0.422; explaining 60% of total variation) and six discriminant functions retained.Other approach included a Bayesian analysis in STRUCTURE 2.3.4 [95], with 1,000,000 Monte Carlo Markov chains with 100,000 regarded as the burn-in period, admixture model and correlated allele frequencies as a priori models.This analysis was repeated 20 times for each K, assuming 1 to 10 K.Then, Struc-tureSelector [96] was used for determining the best estimation of K value based on Puechmaille [97] estimators MEDMEANK, MEDMEDK, MAXMEANK and MAXMEDK, ΔK [98] and Ln Pr (XlK) [99], and the integrated software Clumpak [100] was utilized for graphically representing the results.Individuals were assigned to their respective genetic stocks in accordance with the co-ancestry estimators and they were submitted to population genetics analysis following the above-mentioned methodology.
Lastly, GENECLASS2 [109] was used for testing whether an individual resides in the sampled site or is immigrant from another section.Analysis was carried out with the Bayesian method [110], and simulations with the Monte Carlo resampling method [111] with 16 loci, 10,000 and 0.01 as threshold of type I error.Likewise, the BA3MSAT extension of the BAYE-SASS software [112,113] was employed for the recent estimation of gene flow between sections, utilizing 50,000,000 MCMC, with a burn-in of 5,000,000 and sampling intervals of 5,000.Delta values for migration rates (deltaM), allele frequencies (deltaA), and inbreeding coefficients (deltaF) were set at 0.49, 0.39, and 0.46, respectively.Convergence was assessed using the Tracer v1.7.2 program [114].
Furthermore, Bayescan posterior probability (PO) values for the loci detection under selection (S5 Table) evidenced that the only paired comparison showing one locus under selection (Ckra 21; reference value > 0.76) was S4-S5 v. S6-S7-S8.Moreover, another parameter found, Alpha, had a value of -1.234 which suggests the existence of balancing or purifying selection.The results presented below are based on 16 loci, as consistent findings were obtained when including or excluding this locus in subsequent analyses.

Population genetic differentiation
AMOVA (F ST = 0.935, P value = 0.001) and pairwise comparisons of the standardized statistics F' ST and D EST showed significant differences among sites (Table 3), evidencing that all sections are genetically different from each other, except for S7 and S8 in both tests.Both the DAPC (Fig 2 ) and the Structure population assignment Bayesian analysis (Fig 3) suggested that C. kraussii is formed by three or four genetic groups (ΔK = 2; MedMed = 3; MedMean, MaxMed, MaxMean = 4; Mean LnP(K) = 3).Ultimately, three stocks were determined: PHI, stock S4-S5 and stock S6-S7-S8.

Population genetic demography
Bottleneck tests summarized in Table 4 were significant in all sites (except S4) under the IAM; S6 and S7 and genetic group S6-S7-S8 under the TPM, and none of the evaluated groups was significant under the SMM.Additionally, M ratio was below 0.680 indicating recent bottlenecks in all sites and genetic groups.Moreover, effective population sizes were superior to 1,000 in S7 (1665) and stocks S4-S5 (2220), S6-S7-S8 (5349), exhibited low values in PHI (161), S4 (49) and S6 (723) and it was not possible to precisely estimate said sizes for S5 and S8 (Table 4).As for migratory events, eight individuals were detected as potentially immigrants in the sampled sites and were assigned to the most likely origin sections; a S4 individual in S5 (p value: 0.005); two S5 individuals in S4 (p value: 0.007); a S6 individual in S7 (p value < 0.0001); and a S7 individual in S6 (p value: 0.002).Results from BayesAss (S7 Table ) showed that the highest migration rates were observed within each section, ranging from 67.3% to 98.1% (m

Discussion
This study demonstrated the presence of spatial structure and high levels of genetic diversity of C. kraussii populations in the middle and lower sections of the Cauca River.Additionally, this study provides a group of 16 C. kraussii species-specific microsatellite loci with long repetition motifs (4-mer, 5-mer) and are highly polymorphic since its PIC values > 0.5 were above the range proposed by Botstein and company [115].Both the diversity levels and the PIC values of the loci here developed were found within the range of the loci designed for other neotropical cichlids which repetition motifs are mostly short (2-mer) and compound [33,[56][57][58][59][60][61][62][63][64].Additionally, loci here developed exhibited amplification consistency and high definition of the electropherograms that ease the allele assignation and, thus, their genotypification, for which they are considered informative and appropriate for the C. kraussii population genetics study.
Spatial structure due to events such as interruption of migratory pathways are not expected for sedentary fish species, as they typically exhibit a spatial genetic structure associated with their life history.In such cases, the potential threats associated with hydroelectric constructions may be related to factors that affect fish species downstream, such as fluctuations in nutrient transportation and shifts in oxygen concentrations [116], water quality and temperature, sediment accumulation, variations in water flow, alterations in the morphology of the main river channel and swamps [117][118][119][120].The impact of these factors on the population genetics of C. kraussii remains unknown due to the lack of previous information on these subjects.However, the significant gene flow observed in various migratory fish species coexisting in this area [48][49][50][121][122][123] suggests the absence of physical or chemical barriers limiting gene flow in these species.
Because this species is considered resident, a reduction in the genetic similarity among populations is expected as the geographical distance among them increases.In this study, the genetic differentiation estimators were spatially autocorrelated in distances 2-6 times longer than the dispersion range estimated for the species, providing a strong support to the isolation by distance explanation.
Moreover, the analysis for detecting immigrants indicated that PHI show low migration rate from downstream, corroborating the restriction of gene flow among these sections.Due to the rapids and high velocities of waterflow in the zone before the hydroelectric construction, results found in this study on the genetic structure and immigration, along with the low levels of genetic diversity and Ne < 1,000, suggest that C. kraussii individuals sampled in PHI may represent a founder effect from populations that do not originate from downstream the dam.Origin of the founder population remains to be explored in future studies.Since 91.6% of the population in this section is formed by individuals that show some relatedness degree, the latter explanation seems more likely than the alternative of a fast genetic differentiation in the short-term between PHI and the remaining populations downstream, caused by new environmental conditions and habitat preferences of C. kraussii, despite cichlids appear to respond rapidly to changing conditions.Under this scenario, C. kraussii is formed by two natural stocks, S4-S5 and S6-S7-S8, that are approximately 15 km apart and separated from each other by the Nechı ´River mouth.The presence of immigrants among sections was displayed within each stock, while low migration rate between stocks was found.The low dispersion potential of the species and its larvae because of parental care, the variation in its dispersion behavior (upstream in the S4-S5 stock and downstream in the S6-S7-S8 stock), and the confluence magnitude of the Nechı ´River and the Cauca River may explain the existence of two C. kraussii stocks, one upstream and other downstream of the Nechı ´River mouth.It has been indicated that the confluence position, in addition to the river branching degree, and the asymmetric migration levels downstream influence the genetic variation patterns in the riverside populations showing an increase of 20 times the global genetic diversity in the very branched rivers and of 7 times the genetic differentiation among local populations [124].
Differences in the genetic diversity between C. kraussii and the other neotropical cichlids may be caused by typical features of their life history [126], since species that reach maturity earlier or have lower life expectancy show higher levels of genetic diversity [24,27,126].Alternatively, discrepancies may be related to differences in the effective population size, estimator that has been associated with genetic diversity [26].However, both explanations are difficult to contrast due to the limited information available for neotropical cichlids reproductive features and effective population size; for this reason, it would be convenient to advance in complete studies on its reproduction, population size, and genetic diversity to further explore the factors influencing its population genetics.
Furthermore, genetic diversity measures, He and Ar were negatively related to the distance from the Cauca River mouth to the Magdalena River since genetic diversity decreases as distance to the mouth increases.This genetic diversity distribution pattern of populations is possibly due to habitats diversity downstream the Nechı ´River mouth, which are preferred by C. kraussii and the existing gene flow in this zone, allowing the allelic exchange and maintenance of the high genetic diversity.This idea aligns with genetic data simulation in different landscapes that show that the dendritic net and the riverbed connectivity interfere in the genetic variation distribution [124,127,128].Likewise, the slope of the riverbed may also influence in the genetic variation distribution of a species [129] as in low altitudes there is more diversity of habitats ideal for C. kraussii while in medium and lower altitudes the habitats are more limited.
Although the averages of heterozygosity in C. kraussii were high, small but significant deficits of Ho were observed in stocks S4-S5 and S6-S7-S8.Since this study used species-specific microsatellite loci, individuals were organized by stocks and the F IS values were not significant, results do not appear to be explained, respectively, by the presence of null alleles, Wahlund effect, or inbreeding.Another likely explanation would be assortative mating, a behavior observed in other neotropical cichlids like in genera Geophagus [19] and Cichla [21].Nonetheless, it is important to remark that there are no available data on this behavior in C. kraussii, hence, further exploration on the reproductive behavior of this species is necessary for determining its potential role on the Ho deficit.
Detection of recent bottlenecks in C. kraussii in the influence area of the Ituango project matches those found in the same area in Prochilodus magdalenae [48], Pseudoplatystoma magdaleniatum [49], Ageneiosus pardalis, Pimelodus grosskopfii and Sorubim cuspicaudus [50], which were attributed to anthropic pressures like fishing, mining, and water contamination.These pressures may be the cause of recent bottlenecks in C. kraussii, although in this case there is an additional factor related to the habitat alteration because of the PHI dam construction.Additionally, recent bottlenecks have also been detected in other cichlid species as in Geophagus brasiliensis [22] and Geophagus aff.Brasiliensis [31], where anthropogenic impacts were appointed as the causes.These results differ from those described for Gymnogeophagus setequedas which did not show recent bottlenecks, possibly for having a large and stable population or for stablishing contact among different lineages [35].
In contrast with the effective population size in PHI (< 1,000), natural environment populations and stocks S4-S5 and S6-S7-S8 exhibited effective population sizes above 1,000, showing a high long-term evolutionary potential [130].Furthermore, the effective population size in stocks followed a genetic diversity-associated pattern (Ho, He, Ar) concordant with the indirect relationship between the effective population size and the genetic variation observed in other studies [25,131] that are influenced by maturity age and life expectancy [26].Studies on effective population sizes in cichlids are relatively limited.Within the examined area, certain low-migration-range species (Astyanax caucanus, [132]) and medium-migration-range species (Prochilodus magdalenae, [121]) exhibit high effective population sizes.However, Pseudoplatytoma magdaleniatum, also classified as a medium-migration-range species, displayed effective population sizes below 1000 [49].

Conclusions
This study unveils a spatial structure within C. kraussii, comprising three genetic groups that exhibit high genetic diversity compared to neotropical cichlids.Both natural stocks, located upstream and downstream the Nechı ´River mouth, display high effective population size, indicating a high long-term evolutionary potential.Nevertheless, the dam group, probably originated by founder effect, is susceptible to potential harmful effects because of their low effective population size and high relatedness degree.Additionally, this study developed a group of 16 highly polymorphic species-specific microsatellite loci for C. kraussii that are proposed as a tool for the future genetic population monitoring of the species.The information obtained reveals the importance of providing the fishing and consumption of the species a differentiated management at local level and contributes to the knowledge of factors modulating the population genetics of neotropical cichlids.
S1 Fig) and reached its lower values in the confined environment PHI.This gradient remained still when diversity was compared among genetic groups (S6-S7-S8>S4-S5>PHI).The distribution of genetic diversity was negatively related with the distance to the Cauca River mouth (S1 Fig), for both He (R: -0.970, p: 0.001) and Ar (R: -0.980, p < 0.0001).

Table 2 . Caquetaia kraussii genetic diversity in the middle and lower sections of the Cauca River, Colombia.
N: number of individuals.Na: number of alleles per locus.Ar: allelic richness.Ho: observed heterozygosity.He: expected heterozygosity.P HWE : p-value for the departure from Hardy-Weinberg equilibrium.F IS : inbreeding coefficient.P FIS : p-value for the inbreeding.Values in bold denote statistical significance.https://doi.org/10.1371/journal.pone.0304799.t002

Table 3 . Caquetaia kraussii pairwise comparisons of the standardized statistics F' ST and D EST in lower and middle sections of the Cauca River, Colombia.
Below diagonal, estimator value found.Upper diagonal, significant statistic.Values in bold denote statistical significance. https://doi.org/10.1371/journal.pone.0304799.t003

Table 4 . Recent bottlenecks detection tests and effective population size in Caquetaia kraussii populations.
IAM: infinite alleles model.SMM: stepwise mutational model.TPM: two-phase model.Probabilities according to Wilcoxon signed rank test (excess heterozygosity), calculated through Bottleneck v1.M ratio values calculated by Arlequin v3.5.2.2.Ne: number of individuals of the effective population size, calculated by NeEstimator v2.1.Values in bold denote statistical significance.