Kin-Aggregations Explain Chaotic Genetic Patchiness, a Commonly Observed Genetic Pattern, in a Marine Fish

The phenomenon of chaotic genetic patchiness is a pattern commonly seen in marine organisms, particularly those with demersal adults and pelagic larvae. This pattern is usually associated with sweepstakes recruitment and variable reproductive success. Here we investigate the biological underpinnings of this pattern in a species of marine goby Coryphopterus personatus. We find that populations of this species show tell-tale signs of chaotic genetic patchiness including: small, but significant, differences in genetic structure over short distances; a non-equilibrium or “chaotic” pattern of differentiation among locations in space; and within locus, within population deviations from the expectations of Hardy-Weinberg equilibrium (HWE). We show that despite having a pelagic larval stage, and a wide distribution across Caribbean coral reefs, this species forms groups of highly related individuals at small spatial scales (<10 metres). These spatially clustered family groups cause the observed deviations from HWE and local population differentiation, a finding that is rarely demonstrated, but could be more common than previously thought.


Introduction
Marine organisms with dispersive pelagic larvae are expected to be characterized by little to no genetic differentiation among populations over large geographic areas due to high gene flow. However, many marine species exhibit slight, yet significant, levels of genetic heterogeneity over various local spatial scales [1,2] which may be temporally unstable. Typically in these situations, molecular markers deviate from the expectations of Hardy-Weinberg equilibrium (HWE) within some samples [3][4][5]. This complex pattern, termed chaotic genetic patchiness (CGP), has been attributed to non-equilibrium conditions caused by variation in reproductive success of breeding adults during larval recruitment [2,6]. By seeking to understand the biological mechanisms underlying these patterns we can deepen our understanding of the ecology and evolution of complex marine populations.
There are several mechanisms that could create CGP across populations of marine organisms [6] including: local habitat differences [1], temporal variability in ocean currents affecting dispersal [7], selection at settlement [8], differential reproductive success [9], genetic drift within isolated populations [2], and the formation of kin-aggregations [2]. In marine populations, kin-aggregations are often overlooked as a mechanism to explain CGP because the probability of sampling relatives is expected to be low for species with highly dispersive pelagic larvae [10]. However, recent studies show that larvae can remain in kin-aggregations in the plankton [7,11,12]. Kin-aggregations likely form during the larval stage through aggregations of locally-spawned eggs and larvae by ocean currents and patchiness of food resources [13]. Various life-history traits, including demersal spawning and shoaling, might also cause kinaggregation formation in fishes [13].
Here we investigate the population genetic structure of a shoaling marine goby, Coryphopterus personatus. Despite being wide-spread throughout the Caribbean little information exists about its life history, particularly information about larval duration, dispersal, and population structure. We seek to use observed patterns of genetic structure to test the kin-aggregation hypothesis and discuss the consequences of this mechanism for the ecology of this and similar species. We then propose competing alternative hypotheses for the mechanism leading to the formation of kin-aggregations.

Methods
Fish were collected during summer 2002 from nine sites in the Mesoamerican barrier reef system (34 to 85 per site, Fig 1). Within each site, multiple shoals of C. personatus were collected by divers from a 1-ha area on the fore-reef and pooled for storage in 95% ethanol. All animals were humanely euthanized in the field using a standard fish sedative, Tricaine mesylate (MS222). Live fish were placed in a seawater bath with MS222 added for 5 minutes or until breathing ceased. All collections were approved by and performed in accordance with the ethical guidelines of the University of Windsor and the Belize fisheries department, the Mexican Secretaría de Medio Ambiente y Recursos Naturales, and the Department of Natural Resources and Environment of Honduras. Genomic DNA was extracted from pectoral fin tissue of each individual following the silica-based 96-well plate protocol of Elphinstone et al. [14]. Ten microsatellite loci were chosen from Hepburn et al. [15] and Hogan et al. [16]. PCR amplification was then performed in 12.5 µL reactions comprised of: approximately 100 ng template DNA, 200 μM of forward dye-label primer and reverse primer, 200 µM of each dNTP, 0.1 U Taq polymerase (0.025 U for CPER 184, Invitrogen, Burlington, Canada), 1x PCR buffer (provided by the manufacturer) and locus specific concentrations of MgCl 2 and bovine serum albumin (Table 1). PCR conditions were 94˚C for 2 minutes, followed by locus specific numbers of cycles of 94˚C for 15 s, locus specific annealing temperatures and times (Table 1), with a final extension of 72˚C for 90 s. The size of amplicons was determined using a LiCor 4300 DNA Analyzer with GeneImagIR 4.05 software (Scanalytics, Inc).
One of the common hallmarks of chaotic genetic patchiness is many deviations from Hardy-Weinberg equilibrium combined with weak but significant genetic differentiation between samples. To test for deviations from HWE exact tests for goodness of fit to the expectations of HWE were performed in ADEGENET [17]. To test the possibility of deviations from HWE being due to elevated levels of inbreeding, Monte Carlo tests for homozygote excess, which is indicative of either high levels of inbreeding, or the presence of null alleles, were performed using the U-score implemented in HWXTEST [18]. Inbreeding coefficients were calculated for each site using GSTUDIO [19]. Both inbreeding and null alleles are supported by homozygote excess in a population. Since individuals homozygous for a null allele or heterozygous for two null alleles will present as missing data, there may be an association between the amount of missing data at a locus, in a population and deviation from HWE when null alleles are present. Therefore, we performed a linear regression between the absolute value of the difference between expected and observed heterozygosity and the proportion of individuals which failed to amplify at each locus by sample combination to determine if the observed patterns are consistent with null alleles. To test for genetic differentiation among samples the proportion of different alleles (P D ) and pairwise F ST between sites were calculated across all loci using ADEGENET [17]. A Mantel test was performed in ADE4 to test for isolation-by-distance among sites using P D and F ST [20]. Significance was assessed by permuting genotypes among samples 10,000 times, using sequential Bonferroni to correct for multiple testing.
To test the hypothesis that increased levels of relatedness within sites explains the pattern of chaotic genetic patchiness observed here pairwise relatedness (r) was calculated between all The axes of the plots are the first two discriminant functions used to delineate clusters with inertia ellipses representing 67% of the variance. The end of the lines connected to the centre of each inertia ellipse represent individuals plotted on each discriminant function and denotes cluster membership. In locations with only two clusters present there is only one discriminant function, as such density plots of proportion of individuals present at each value of the discriminant function were included to show cluster separation. Numbers in parentheses indicate how many individuals were collected from each location.
pairs of individuals using the triadic likelihood estimator, which has been shown to be the least biased relatedness estimator in many circumstances, using the R package RELATED [21,22]. The arithmetic mean of the pairwise relatedness (r) was calculated for each geographic sample in R v3.1.1 [23]. Significance of mean within sample relatedness was calculated using a one-tail permutation test with genotypes permuted among geographic samples or clusters 1,000 times.
Our initial sampling pooled multiple shoals of individuals; if sites were composed of multiple groups of related individuals (i.e., shoals) and if those groups were pooled during collection, then the mean pairwise relatedness at the site level would be artificially deflated. We performed a Discriminant Analysis of Principal Components (DAPC) to identify genetic clusters of individuals within sites as implemented in ADEGENET [17,24]. All possible clustering solutions were compared using the k-means clustering algorithm. The number of clusters within each site was found by evaluating all possible values for k and choosing the most likely based on minimum BIC. The mean pairwise relatedness (r) was then recalculated for each cluster within sites generated with the DAPC and significance determined as above.
As further evidence of elevated relatedness and to minimize error associated with continuous metrics of relatedness, COLONY was used to determine the number of full-and half-sib dyads within clusters using the pair-likelihood score algorithm allowing for inbreeding [25]. We assumed polygamy and polyandry and a 1% genotyping error rate. Sibship was considered when dyads were supported with 95% confidence. Evidence of high levels of sibling pairs (both full and half-sibs) within clusters was assessed using a two-tailed permutation test, permuting individuals among clusters 10,000 times. The probability of excluding a group of three unrelated individuals from a full sibship was calculated using code written by the authors [26]. This probability of exclusion is useful in determining the statistical power the set of microsatellite markers used in this study have in determining sibship relationships.
In order to determine if elevated relatedness observed in clusters is the result of multiple dyads of related individuals or a few large groups of relatives within clusters undirected networks were created for each cluster identified using DAPC with individual C. personatus represented as nodes and sibling relationships (full-and half-sibs) represented as edges [27]. The mean local transitivity, or clustering coefficient, was used to understand the degree of interconnectedness within each cluster. Higher transitivity values indicate more interconnected networks of sibship relationships. Transitivity is a metric used to understand networks and ranges from 0 to 1 with 0 representing no clustering of relationships and 1 representing maximally  [27,28]. Networks were analysed using the package IGRAPH [29].

Results
There were deviations from the expectations of HWE in 48% of site by locus comparisons (S1 Fig). These deviations from HWE were attributed to homozygote excess which was observed in 53% of comparisons (S2 Fig). The ubiquity of the homozygote excesses in all sites and in all but one locus does not support the hypothesis of null alleles but rather, indicates potentially high levels of inbreeding [30]. Additionally, we found no relationship between locus specific sample deviations from the expectations of HWE and the percent of individuals that failed to amplify at each locus (F (1,88) = 0.0018, p = 0.97, Fig 2). Additionally, all samples had significant estimates of F is for six or more loci ( Table 2). All these results taken together refute the hypothesis of null alleles driving the patterns seen here. Significant differences in P D and F ST were seen in 69% and 97% of the pairwise site comparisons respectively, even between closely spaced sites (ex. TA1 -TA2:~5 km) ( Table 3). Mantel tests for an isolation-by-distance pattern of differentiation were not significant, indicating no correlation between genetic and geographic distance (P D , r = 0.22, P = 0.19; F st , r = 0.31, P = 0.14).
Within sample mean pairwise relatedness was 34-60% higher than the random expectation in four sites (BC1, BC2, TA1 and TA3; p < 0.05, Fig 3A). Within-site clustering analyses found 2 to 5 highly related groups of individuals (8-36 individuals per cluster) within each of the nine geographic samples (Figs 1 and S3). Group relatedness measured within these clusters was found to be 73-314% higher than expected in 68% of all clusters (p <0.05, Fig 3B).
The probability of excluding an unrelated individual from a sibship group was 0.998 when all loci are used and 0.927 (95% CI 0.738 to 0.992) when using a random combination of five loci (the minimum number of loci amplified in any individual). The proportion of full-and half-sibs within clusters was calculated to be 22.4±13.8% (S.D.; 3.0±4.4% and 19.3±11.5% respectively) and was significantly greater than clusters of random individuals generated by permutation (p <0.05; Table 4; Fig 3B). The cluster networks were found to be composed of a few large groups of siblings and have a mean transitivity of 0.504±0.213 which was significantly greater than graphs of equal order and size generated through the Erdos-Renyi model in all but four clusters (p <0.05; Table 4).

Discussion
Clusters of highly related individuals were found within geographic samples of Coryphopterus personatus (Fig 3B). This explained wide-spread deviations from HWE, genetic differentiation over short distances-as little as 5 km-and the non-equilibrium spatial pattern of genetic differentiation. While we do not have appropriate samples to determine temporal stability of these patterns, all of these results are consistent with CGP. Some geographic samples as a whole showed substantially higher than expected levels of relatedness. However, we found that within each sample there were between 2 and 5 highly related clusters of individuals. These clusters had significantly higher proportions of full-and half-sibs than expected and showed high levels of interconnectedness with a few large groups of related individuals, indicating familial relationships among individuals in these groups. This is unexpected for a species with pelagic larvae where the prevailing paradigm suggests wide dispersal and mixing via physical oceanographic processes and homogenizing the gene pool over large distances.
We propose two mechanisms driving the formation of related groups in this species. First, larvae could remain together in the plankton and settle onto a reef together. There are many potential selective benefits to larvae remaining in a group throughout the pelagic larval phase including predator avoidance [31] and maintenance of position within a food patch [32]. The presence of kin-aggregations may simply reflect the fact that group formation occurs in the egg phase. Consistent with this mechanism, related individuals have been found within a single larval cohort in several other species [7,11]. Given this mechanism, we expect to find individuals of the same cohort within an aggregation to be related to each other, but not related to individuals from other cohorts within the same sample site.
Another mechanism that could explain kin-groups on a coral reef is a lack of larval dispersal. It is possible that larvae do not enter the pelagic environment, remaining on their natal reef. Other marine species that do not have a pelagic larval stage are characterized by low levels of gene flow, frequent population bottlenecks and strong phylogeographic breaks [33]. If C. personatus has lost their pelagic life stage, we expect a similar genetic pattern. Additionally, we would expect to see highly related individuals across cohorts within a single sample site. Natal recruitment increases the chance of settling in a suitable habitat [34] and, given that C. personatus needs to feed 2 days post-hatching [35], it is plausible that rather than entering the plankton where food is sparse and patchy [36] larvae could remain on the reef where food is more abundant [37]. However, given the lack of observed isolation by distance between geographic locations this mechanism seems less likely than the possibility of individuals staying together throughout the planktonic phase. The formation of kin-aggregations is a potentially important evolutionary process due to the likelihood of very little gene flow on extremely small spatial scales and increased rates of inbreeding, potentially leading to greater vulnerability to extirpation [38]. The benefits of kin-aggregations (protection from predation, kin-selection, etc. [39]) may outweigh the costs of inbreeding. The pattern revealed here demonstrates that benthic kin-aggregations are possible in marine species with pelagic larvae and may be more common than previously expected. The shaded region indicates the area within the 95% confidence intervals calculated using a permutation test with 1,000 iterations. Triangles in 3B indicate proportion of full and half-sibs within the cluster with shaded symbols indicating significantly elevated transitivity when compared to randomly generated networks equivalent to those observed in the clusters.