Genetic analysis of a phenotypic loss in the mechanosensory entrainment of a circalunar clock

Abstract

Genetic variants underlying traits that become either non-adaptive or selectively neutral are expected to have altered evolutionary trajectories. Uncovering genetic signatures associated with phenotypic loss presents the opportunity to discover the molecular basis for the phenotype in populations where it persists. Here we study circalunar clocks in populations of the marine midge Clunio marinus. The circalunar clock synchronizes development to the lunar phase, and it is set by moonlight and tidal cycles of mechanical agitation. Two out of ten studied populations have lost their sensitivity to mechanical agitation while preserving sensitivity to moonlight. Intriguingly, the F1 offspring of the two insensitive populations regained the sensitivity to mechanical entrainment, implying a genetically independent loss of the phenotype. By combining quantitative trait locus mapping and genome-wide screens, we explored the genetics of this phenotypic loss. QTL analysis suggested an oligogenic origin with one prevalent additive locus in one of the strains. In addition, it confirmed a distinct genetic architecture in the two insensitive populations. Genomic screens further uncovered several candidate genes underlying QTL regions. The strongest signal under the most prominent QTL contains a duplicated STAT1 gene, which has a well-established role in development, and CG022363, an ortholog of the Drosophila melanogaster CG32100 gene, which plays a role in gravitaxis. Our results support the notion that adaptive phenotypes have a complex genetic basis with mutations occurring at several loci. By dissecting the most prevalent signals, we started to reveal the molecular machinery responsible for the entrainment of the circalunar clock.

Author summary

Biological clocks allow organisms to synchronize their physiology and behaviour with their environment. Key to this mechanism are reliable environmental cues that play a role in setting the clock. Here we study the circalunar clock of Clunio marinus, a marine insect that inhabits the intertidal zone. The circalunar clock is set by moonlight as well as mechanical agitation caused by the tides. Interestingly, in two out of ten studied Clunio populations, mechanical agitation was not sufficient to set the circalunar clock. We found that this loss of sensitivity has a genetic basis. Furthermore, we use the insensitive populations as natural mutants. Finding the mutations that lead to the loss of sensitivity allows us to uncover the molecular mechanisms underlying the setting of the yet unexplored molecular clockwork. Additionally, by uncovering the genetic basis for the loss of sensitivity in two independent populations, we get insights into the processes underlying the evolutionary adaptation of biological clocks.

Introduction

Life on earth has adapted to anticipate predictable changes in its environment in order to survive; a case in point is the ubiquity of biological clocks. Due to the earth’s rotation around its axis, most living creatures are exposed to 24-hour cycles, which has resulted in the pervasiveness of circadian clocks [1,2]. Furthermore, marine organisms inhabiting intertidal zones are exposed to tidal cycles of 12.4 hours (or sometimes 24.8 hours), which are modulated across the 29.53-day lunar cycle. Thus, marine organisms have evolved circatidal and circalunar clocks. Due to their universal occurrence, circadian clocks have been intensely studied over the last century [3,4]. Much less is known about circatidal and circalunar clocks [5–10], although some argue that, as life evolved in the marine environment, circadian clocks may have evolved from evolutionarily older circatidal or circalunar clocks [11,12].

Biological clocks must be appropriately set to fulfill their role in synchronizing endogenous physiological processes, reproduction, and behavior to the exogenous environmental cycles. Environmental variables that reliably fluctuate with geophysical cycles serve as clock synchronizers, so-called zeitgebers. The most studied zeitgeber is the light-dark cycle that synchronizes the circadian clock [1]. Two other synchronizers of the circadian clock that were experimentally confirmed are temperature and vibration [13–16]. In contrast, many environmental variables fluctuate with the tides and the following have been shown to serve as strong zeitgebers of the tidal clocks: mechanical disturbance of the water [17–19], changes in hydrostatic pressure [18,20,21], temperature fluctuations [22,23], changes in salinity [24], and immersion and emersion [22].

Not surprisingly, moonlight was shown to be a unique cue for synchronizing lunar clocks [25–29]. Furthermore, several synchronizers that were first discovered as tidal cues were subsequently demonstrated to be strong zeitgebers for setting circalunar clocks: vibration that accompanies the rise and fall of the tides [30,31] and temperature fluctuations [32]. Depending on the stability and robustness of the cycles in the environment that the organism inhabits, different zeitgebers provide reliable cues for biological clocks in different organisms. Finally, while biological clocks are not crucial for the survival of all organisms, the harsher the environmental cycles, the stronger the selection pressure on the presence of reliable biological clocks. Studying organisms inhabiting these harsh environments promises to give insight into the nature of yet unexplored biological clocks. One such species whose survival critically depends on its ability to simultaneously synchronize to lunar and circadian cycles is the marine midge Clunio marinus.

Clunio spends most of its life in a larval stage submerged in the intertidal zone of the Atlantic Ocean. During full moon and new moon, adults emerge on the sea surface, mate, oviposit eggs and die within a few hours. Circadian and circalunar clocks allow them to precisely time reproduction to the lowest of the low tides. Individuals that do not emerge at the appropriate time miss the ecologically suitable low tide for reproduction and the opportunity to mate and are thus eliminated from the population. Therefore, strong selection pressure shapes various timing phenotypes in populations that encounter different tidal regimes along the Atlantic coast [33–37]. Moonlight, tidal turbulence and temperature have been shown to be zeitgebers setting the circalunar clock of Clunio marinus [27,31,32,38]. However, different Clunio populations are differentially sensitive to zeitgebers, most likely due to the unreliability of different zeitgebers in certain geographical locations [38]. Neumann discovered one population insensitive to moonlight and two that were insensitive to tidal turbulence [38]. Tidal turbulence was defined as low frequency, low amplitude vibration that coincides with the rising tide [38]. This stimulus shifts every day by 50 minutes, resulting in a semi-lunar 14.7-day entrainment pattern [38].
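The link between the daily 50-minute shift and the semi-lunar period can be checked with a short calculation. The sketch below is illustrative only: it assumes that the tidal period is half a lunar day and derives the lunar day from the 29.53-day lunar cycle quoted in the Introduction; the function name is hypothetical.

```python
# Illustrative arithmetic, assuming a semidiurnal tide (two tides per lunar day).
SYNODIC_MONTH_DAYS = 29.53  # lunar cycle length quoted in the Introduction


def tidal_timing(synodic_month_days=SYNODIC_MONTH_DAYS):
    """Return (tidal period in h, daily shift in min, recurrence in days)."""
    # The moon lags the sun by one full turn per synodic month,
    # so one lunar day is slightly longer than 24 h.
    lunar_day_h = 24.0 * synodic_month_days / (synodic_month_days - 1.0)
    tidal_period_h = lunar_day_h / 2.0
    # Each solar day, a given tide arrives this much later.
    daily_shift_h = lunar_day_h - 24.0
    # The stimulus returns to the same time of day once the accumulated
    # shift equals one full tidal period.
    recurrence_days = tidal_period_h / daily_shift_h
    return tidal_period_h, daily_shift_h * 60.0, recurrence_days


period_h, shift_min, recurrence = tidal_timing()
print(f"tidal period: {period_h:.2f} h")
print(f"daily shift: {shift_min:.1f} min")
print(f"recurrence at the same time of day: {recurrence:.2f} days")
```

Under these assumptions the recurrence interval equals half the lunar cycle, which agrees with the semi-lunar pattern described above.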

Evolutionary losses of function can have a creative role in evolution [39], and genetic and genomic analysis of the affected populations can identify the genes involved in corresponding molecular pathways [40]. Our goal was to establish if the loss of mechanosensory entrainment in the two populations was consistent with it having a common genetic basis, or whether it occurred independently in each population. We also sought to determine if genetic control of this phenotype is likely controlled by a single locus of major effect or whether multiple loci play discernible roles. Finally, we aimed to identify genes likely to be responsible for impacting the trait.

Results

Loss of sensitivity to mechanical entrainment is a genetically determined trait that evolved independently in two Clunio populations

The circalunar clock robustly regulates the emergence of Clunio adults over a lunar month. We study this phenomenon under laboratory conditions by counting the number of emerged adults per day over several lunar cycles, and then assess characteristics of the phenotype using circular statistics: phase, period, rhythmicity, etc. The sensitivity of different strains to the zeitgebers is therefore estimated indirectly via the strength of their emergence rhythms upon entrainment to moonlight or tidal turbulence. Strains are named based on an abbreviation for the sampling sites and the time point of emergence under moonlight entrainment: NM = “new moon”, FM = “full moon”, SL = “semi-lunar” = full moon and new moon. Here we tested the entrainment of Plou-2NM, Ros-2NM, Lou-2NM, Bria-1SL, and Por-1SL under tidal turbulence for the first time, while the entrainment to moonlight [35,41,42] and tidal turbulence [38] were previously reported for the other populations (Figs 1A and S1 and Tables 1, S1, and S2). Vigo-2NM is the most southern strain and it is sensitive to tidal turbulence. Going north, we come across Jean-2NM which is insensitive to tidal turbulence, followed by six closely related populations at the coast of Bretagne: Plou-2NM, Ros-2FM, Ros-2NM, Lou-2NM, Bria-1SL, Por-1SL; and finally, the two most northern populations: He-1SL in Germany and Ber-1SL in Norway (Fig 1A). Bretagne populations vary from very sensitive in the north (Por-1SL and Bria-1SL), and intermediate sensitivity in the south (Ros-2NM, Lou-2NM, and Plou-2NM) to completely insensitive (Ros-2FM) (Figs 1A and S1 and Table 1). This suggests that the frequency of the “insensitive alleles” may vary among the Bretagne populations, giving rise to varying degrees of sensitivity.
Furthermore, as Ros-2FM and Jean-2NM are arrhythmic under tidal turbulence but rhythmic under moonlight (Figs 1B, S1O, and S1S) [38], we can conclude that their lunar clocks are intact, but sensory inputs have evolved rendering them insensitive to one of the cues. To characterize the genetic basis for this phenotypic loss, we crossed turbulence-insensitive strains to a strain sensitive to both tidal turbulence and moonlight, Por-1SL (Figs 1B, S1F, and S1G), and analyzed the emergence of adults in F1 and F2 generations (Figs 1B, 1C, and S2). We calculated circular statistics for the emergence distributions in Figs 1 and S1, and used vector length of the summary circular statistics for estimating the strength of the entrainment (Table 1). We found that sensitivity to tidal turbulence is genetically determined and a dominant trait (Table 1).
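As an illustration of how a vector length summarizes rhythm strength, the following minimal sketch computes the mean resultant vector length from daily emergence counts. It is a textbook circular-statistics calculation, not necessarily the exact procedure used for Table 1; the function name and the example counts are hypothetical.

```python
import cmath
import math


def mean_vector_length(counts_per_day, period_days=None):
    """Mean resultant vector length (0 = arrhythmic, 1 = all on one day).

    counts_per_day: emerged adults for each day of one cycle.
    period_days: cycle length used to map days onto the circle;
                 defaults to the number of days supplied (an assumption).
    """
    period = period_days if period_days is not None else len(counts_per_day)
    total = sum(counts_per_day)
    if total == 0:
        raise ValueError("no emerged individuals")
    # Each individual contributes a unit vector at the angle of its emergence day.
    resultant = sum(
        n * cmath.exp(2j * math.pi * day / period)
        for day, n in enumerate(counts_per_day)
    )
    return abs(resultant) / total


# Hypothetical data: a flat distribution versus a single sharp peak.
flat = [3] * 30
peaked = [0] * 10 + [5, 40, 5] + [0] * 17
print(round(mean_vector_length(flat), 3))
print(round(mean_vector_length(peaked), 3))
```

For strains with two emergence peaks per lunar cycle, the counts would need to be folded onto the semi-lunar period first; otherwise the two opposing peaks cancel and the vector length approaches zero.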

Fig 1. Sensitivity to mechanical entrainment was lost twice independently in European Clunio populations.

(A) The origin of the ten Clunio populations used in this study is shown on the map (S1 Fig). Map data was obtained from https://www.naturalearthdata.com/downloads/50m-physical-vectors/. Heatmap depicts the sensitivity of each strain to mechanical entrainment as estimated by the circular statistics (Table 1). (B, C) Graphs show the fraction of emerged individuals per lunar day upon mechanosensory entrainment. The total number of emerged individuals is depicted in the left corner of each bar graph, and raw data is given in S1 and S2 Tables. (B) Graphs depict the emergence patterns of the parental populations. Ros-2FM and Jean-2NM populations are insensitive to tidal turbulence as shown by the arrhythmic emergence patterns, while the Por-1SL population is sensitive. Geographical locations and the years when strains were established are given in S1 Table. (C) Crossing sensitive and insensitive strains resulted in sensitive F1 progeny. Furthermore, when the two insensitive strains were crossed, the resulting F1 hybrids regained sensitivity to the entrainment (right). Individual F1 families that were raised together are depicted in different shades of blue. The total number of individuals per family is listed in S2 Table.

https://doi.org/10.1371/journal.pgen.1010763.g001

Table 1. Ros-2FM and Jean-2NM are insensitive to tidal turbulence.

https://doi.org/10.1371/journal.pgen.1010763.t001

To test if the same mutations are responsible for the loss of sensitivity in Jean-2NM and Ros-2FM we performed a complementation cross. Interestingly, the four F1 families raised separately all regained their sensitivity to mechanical entrainment (Fig 1C). This finding strongly suggests a different and recessive genetic basis for the loss of sensitivity in Jean-2NM and Ros-2FM.

Discovering genomic loci responsible for the phenotypic loss in the Ros-2FM population

Quantitative trait loci (QTL) mapping was conducted to locate the regions of the genome containing genetic variants responsible for the loss of sensitivity to tidal turbulence in the Ros-2FM population. The resolution of QTL mapping depends on the number and distribution of markers as well as on the number of recombination events, which in turn depends on the number of individuals in the crossing family. To maximize our chances of achieving narrow confidence intervals, we performed a large number of crosses and then selected two families for the analysis: F2 progeny of Ros-2FMxPor-1SL cross (RxP-F2.1) and a backcross progeny of Ros-2FMxPor-1SL F1 female to Ros-2FM male (RPxR-BC.1) (S2 Table). The number of informative markers was 137 in RxP-F2.1 and 123 in RPxR-BC.1. The total number of recombination events was 269 and 61, while the number of unique genomic positions of the recombination events was 51 and 36 in RxP-F2.1 and RPxR-BC.1 families respectively (Fig 2B and 2D).

Fig 2. QTL mapping in two Ros-2FMxPor-1SL mapping families reveals one shared additive QTL on the second chromosome.

Regions of the genome harboring genes responsible for the loss of mechanical entrainment in the Ros-2FM population were identified in a [Ros-2FM x Por-1SL] x Ros-2FM backcross (RPxR-BC.1), see panels A and B, and a [Por-1SL x Ros-2FM] F2 intercross (RxP-F2.1), see panels C and D. Bar graphs show the number of emerged individuals per day (A, C). The proportion of insensitive (orange) and sensitive (blue) individuals found on each day was calculated based on estimated probabilities (S3 and S4 Figs). The ratio of sensitive and insensitive individuals in each family is indicated in the top right corner. (B, D) QTL intervals are given for: composite interval mapping–dark blue, fitqtl: additive loci–black, fitqtl: epistatic loci–gray, EM-algorithm–light blue. The green marks the phenotypic panel with the highest convergence in EM analysis (i.e. the number of times the panel was found to be the best in 1000 runs) and the lowest error (i.e. the fraction of individuals in each panel for which the binary phenotype differs significantly from the starting probabilities; see Methods QTL mapping/EM-pipeline, S3 Table). Raw data is given in S3 Table.

https://doi.org/10.1371/journal.pgen.1010763.g002

If ~130 markers and ~40 unique recombination events were evenly distributed along the 80 Mb genome, we could achieve a mapping resolution of ~2–3 Mb. However, several non-recombining regions were found in both families, as well as one in which the marker order was inverted compared to the reference (Fig 2B and 2D). These regions are thought to be large polymorphic inversions [43], which limit mapping resolution on the first chromosome and in the right arm of the second chromosome [44].
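The resolution estimate is simple arithmetic: the genome length divided by the number of unique recombination positions gives the average spacing between breakpoints. The sketch below is a back-of-the-envelope check under the idealized assumption of evenly spaced breakpoints; the function name is hypothetical.

```python
def expected_resolution_mb(genome_size_mb, n_unique_breakpoints):
    """Average distance between evenly spaced recombination breakpoints (Mb)."""
    if n_unique_breakpoints <= 0:
        raise ValueError("need at least one recombination breakpoint")
    return genome_size_mb / n_unique_breakpoints


# 80 Mb genome; ~40 breakpoints as in the text, plus the per-family counts.
for label, n in [("approx. 40", 40), ("RxP-F2.1 (51)", 51), ("RPxR-BC.1 (36)", 36)]:
    print(f"{label}: {expected_resolution_mb(80, n):.1f} Mb")
```

Non-recombining regions such as the inversions mentioned above concentrate breakpoints elsewhere, so the realized resolution inside those regions is coarser than this average.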

In order to phenotype F2 and BC progenies, we must distinguish between “sensitive” individuals that emerged within the Por-1SL-like peak and “insensitive” individuals that can emerge on any lunar day. However, the emergence peak does not only contain sensitive individuals, but also some of the insensitive individuals. To overcome this issue, we tested different phenotyping strategies and mapping algorithms (S3–S10 Figs) (see Methods and S1 Methods for more details). We calculated the probability of finding sensitive and insensitive individuals on each lunar day (S3 and S4 Figs) and used it as a phenotypic score for the QTL analysis. In addition, we generated a reduced dataset by excluding the individuals with uncertain phenotypes and treated those with the probability of being “insensitive” higher than 0.7 as “insensitive” and lower than 0.3 as “sensitive” (S3 Fig). Finally, this approach allowed us to estimate the ratio of the two phenotypes in the F2 and BC generations: 69:31 in the RxP-F2.1 intercross (Fig 2C) and 50:50 in the RPxR-BC.1 backcross (Fig 2A). The difference in ratios is attributed to a higher portion of sensitive individuals (parental and F1 genotypes) in an F1xF1 intercross as compared to an F1xRos-2FM backcross. Similar ratios were found in Jean-2NMxPor-1SL intercross families (see below). Such segregation of parental phenotypes in F2 and BC progenies indicates that this trait is determined by a small number of loci.
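The construction of the reduced dataset can be sketched as a simple thresholding step. The cut-offs (0.7 and 0.3) are those stated above; the function names, the dictionary layout, and the example probabilities are hypothetical.

```python
def classify(prob_insensitive, upper=0.7, lower=0.3):
    """Return a binary phenotype label, or None for uncertain individuals."""
    if prob_insensitive > upper:
        return "insensitive"
    if prob_insensitive < lower:
        return "sensitive"
    return None  # uncertain phenotype: excluded from the reduced dataset


def reduced_dataset(prob_by_individual):
    """Keep only individuals whose phenotype can be called with confidence."""
    labels = {ind: classify(p) for ind, p in prob_by_individual.items()}
    return {ind: lab for ind, lab in labels.items() if lab is not None}


# Hypothetical probabilities of being insensitive.
probs = {"ind1": 0.92, "ind2": 0.55, "ind3": 0.12, "ind4": 0.70}
print(reduced_dataset(probs))
```

Individuals lying exactly on a threshold are treated as uncertain here, following the strict "higher than" and "lower than" wording.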

Furthermore, in order to screen for additive QTLs, we ran standard interval mapping with scanone (S6E–S6H Fig) and composite interval mapping (Figs 2B, 2D, and S6I–S6L). To investigate QTLs in epistasis we ran a two-dimensional scan with scantwo function (S7 Fig). QTLs identified with scanone and scantwo were then fed into the multiple-QTL-mapping pipeline implemented in the R/qtl package with the fitqtl function (Figs 2B, 2D, S6Q–S6T, and S7). Since various models can be significant with fitqtl, we also tested a Bayesian method implemented in R package qtlbim designed to find the best QTL model for fitqtl (S6Y-AB Fig).

The multiple QTL mapping pipeline revealed one additive QTL and two QTLs in epistasis in the RPxR-BC.1 family, and two additive QTLs in the RxP-F2.1 family (Figs 2B, 2D, and S6–S8). The additive QTL on the second chromosome was found in both crossing families. The QTL on the third chromosome interacts additively with the QTL on the second chromosome in the RxP-F2.1 reduced dataset (S7E and S7F Fig), while in the RPxR-BC.1 family it is in a negative additive-by-additive epistatic interaction [45] with the QTL on the first chromosome. The QTL on the first chromosome has a positive additive effect in the heterozygous AB background of the QTL on the third chromosome and vice-versa (Figs 2D and S7A–S7D). The QTLs in epistasis were found only in one of the families, potentially because the presence of the epistatic interaction depends on the genetic background. This can occur if the mutations underlying QTLs in epistasis are not fixed in the two populations. In other words, if a mutation underlying QTL1 only has an effect in the presence of another mutation underlying QTL2, and one of the two alleles is absent in the parent of that crossing family, the epistatic interaction would not be identified. Thus, to find the regions of the genome containing the loci most likely pervasive in the natural populations, we further focused only on the additive QTLs.

In order to further estimate the effect of the phenotyping uncertainty on additive QTLs, we generated the scanone-optimized expectation-maximization pipeline (see Methods for more details, Figs 2 and S3–S8). In a nutshell, all individuals are assigned binary phenotypes (0 or 1) depending on their starting phenotype probabilities. Then the algorithm changes the phenotypes of individuals in order to find the binary phenotype panel with the highest LOD score (Figs 2C, 2D, and S6M–S6P, for more details see Methods). The resulting binary phenotype panels are assessed for their credibility by how often the algorithm converges to a specific panel (% convergence) and by the fraction of animals whose binary phenotype differs by more than 0.1 from the starting probability. In order to first test this pipeline, we used sex as a known binary phenotype, transformed it into a probability phenotype with varying degrees of uncertainty, and tested how well the QTL intervals from the resulting phenotypic panels match the true sex locus (S9 and S10 Figs). The phenotypic panels with the highest degree of convergence and lowest error match extremely well with the true sex locus (S9 Fig), showing the validity of our approach.
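The two credibility measures can be illustrated with a small sketch. It does not reproduce the EM pipeline itself (which is described in the Methods); it only shows how the percentage of convergence and the error fraction could be computed from the final panels of repeated runs. The reading of "error" as an absolute difference greater than 0.1 between binary phenotype and starting probability is an assumption, and all names and example values are hypothetical.

```python
from collections import Counter


def panel_convergence(final_panels):
    """Percentage of runs that ended in each distinct binary panel."""
    counts = Counter(tuple(panel) for panel in final_panels)
    n_runs = len(final_panels)
    return {panel: 100.0 * c / n_runs for panel, c in counts.items()}


def panel_error_fraction(panel, start_probs, tolerance=0.1):
    """Fraction of individuals whose binary phenotype deviates from the
    starting probability by more than `tolerance` (assumed definition)."""
    if len(panel) != len(start_probs):
        raise ValueError("panel and probabilities must have equal length")
    n_off = sum(1 for b, p in zip(panel, start_probs) if abs(b - p) > tolerance)
    return n_off / len(panel)


# Hypothetical example: four runs over three individuals.
runs = [(1, 0, 1), (1, 0, 1), (1, 0, 0), (1, 0, 1)]
start = (0.95, 0.05, 0.5)
print(panel_convergence(runs))
print(panel_error_fraction((1, 0, 1), start))
```

A panel reached in most runs and with a low error fraction would, on this reading, be considered the most credible.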

In the RPxR-BC.1 family, the QTL interval of the EM binary panel with the lowest percentage of individuals with error>0.1 (20%) and the second-highest percentage of convergence (39.9%) perfectly overlaps with the QTL interval provided by CIM, and fitqtl mapping on probability phenotypes (Figs 2B, S8A, and S8B). In the RxP-F2.1 family, the best panel according to the same criteria is also on the second chromosome: the percentage of convergence is 31.5, percentage of individuals with error > 0.1 is 21% (Figs 2D, 2K, S8C, and S8D). The high level of convergence in the phenotype panels shows that the QTL landscape does not contain too many potential local optima. This suggests that the effect of the uncertainty on the QTL analysis is limited and does not have a major impact on the QTL location.

Other than the phenotyping uncertainty, a polygenic or oligogenic origin could also lead to a reduction in the QTL LOD scores. In the RPxR-BC.1 family, the full fitqtl model explains 28% of the phenotypic variance (S6–S8 Figs and S3 Table). In the reduced (binary) dataset that number increases to 40.79% (S6–S8 Figs and S3 Table). The QTL on the second chromosome alone explains 9.27% and the epistatic interaction between the first and the third chromosome explains 11.09%; while in the reduced dataset that becomes 2.82% and 11.68% respectively. In the RxP-F2.1 family, the fitqtl model explains 13.74% in the full and 21.8% in the reduced dataset. The percentage of variance explained by the QTL on the second chromosome is 5.9% and on the third chromosome is 8% (18.29% and 16.56% respectively in the reduced dataset). Previous QTL analysis on mapping the lunar phase, a phenotype of discrete nature, identified two QTLs: one explains 23% of the variation and the other 14% [42]. In both the present and the previous study, we find a small number of significant loci impacting a trait, with a comparable proportion of phenotypic variance explained. This potentially indicates that we did not lose significant mapping power due to the non-discrete nature of the phenotype in the current study. Furthermore, this implies that if the detected loci collectively account for up to 20–40% of the phenotypic variance, unidentified loci or loci of small effect size may still play a substantial role.

Finally, although the multiple QTL mapping is crucial for investigating the most likely number of QTLs, it tends to overestimate the QTL intervals. Thus, to investigate the genes underlying the additive QTLs we relied on composite interval mapping conducted in 10cM windows (dark blue in Figs 2B, 2D, and 3A), while being informed by the QTL intervals resulting from the best binary phenotypic panels found by the EM algorithm (light green in Figs 2B, 2D, and 3A).

Whole-genome sequencing reveals genetic variants associated with insensitivity to tidal turbulence in Ros-2FM

As discussed above, the resolution of the QTL mapping in our model system can theoretically go down to 2–3 Mb, which can still harbor several hundred genes. Therefore, in order to further identify specific genomic loci underlying QTL regions, we combined QTL mapping with genome-wide association analysis (Fig 3A and 3B). We performed whole genome sequencing of 20–24 field-caught males from nine Clunio populations differentially sensitive to tidal turbulence (Tables 1 and S1 and Figs 1A and S1) and called 746,887 SNPs and small indels. We used circular summary statistics (vector length) as the population-wide phenotypic score for sensitivity to tidal turbulence (Fig 3B and Table 1). We then applied the Bayesian tool BayPass for calculating the strength of association of each of the variants to the insensitivity score (S5 Table) while using a kinship matrix to correct for population structure (S11 Fig). Out of 746,887 variants, 357 were significantly associated with sensitivity to tidal turbulence (Fig 3B, top panel). These variants affect 178 genes, as determined by SNPeff (S6 Table and S11 and S12 Figs). Most of the variants are located in non-coding regions and only a handful have potentially disruptive impacts on the respective genes (S11C and S11D Fig).
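The significance call on the association statistics amounts to a threshold filter. The sketch below applies the cut-offs given in the Fig 3 legend (BFis ≥ 20, eBPis ≥ 2, XtXst ≥ 21.67), assuming that all three criteria must hold simultaneously. The record layout and the example values are hypothetical and do not reflect the BayPass output format.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    variant_id: str   # hypothetical identifier
    bf_is: float      # Bayes factor (BFis)
    ebp_is: float     # empirical Bayesian P-value (eBPis)
    xtx_st: float     # XtXst differentiation statistic


def is_significant(v, min_bf=20.0, min_ebp=2.0, min_xtx=21.67):
    """True if the variant passes all three thresholds (assumed to be combined with AND)."""
    return v.bf_is >= min_bf and v.ebp_is >= min_ebp and v.xtx_st >= min_xtx


# Hypothetical values for illustration only.
variants = [
    VariantStats("chr2:100", bf_is=35.2, ebp_is=3.1, xtx_st=25.0),
    VariantStats("chr2:200", bf_is=22.0, ebp_is=1.5, xtx_st=30.0),
    VariantStats("chr3:300", bf_is=5.0, ebp_is=0.2, xtx_st=10.0),
]
print([v.variant_id for v in variants if is_significant(v)])
```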

Fig 3. A combination of QTL mapping and genome-wide screens points to STAT-1 and gravitaxis gene CG022363 as likely contributing to the loss of sensitivity to mechanical entrainment in the Ros-2FM population.

(A) QTL intervals from the two mapping families are plotted along the three chromosomes (modified from Fig 2B and 2D). (B-top) The 746,887 variants called from Ros-2FM and eight differentially sensitive populations were screened for their association with the sensitivity to mechanical entrainment using BayPass. Sensitivity to mechanical entrainment was estimated from emergence patterns using circular statistics (Table 1). Median vector length was used as the phenotypic score. The Bayesian factor (BFis) depicting the strength of the association is plotted for all the variants along the three chromosomes (gray and black). 375 significantly different variants, as determined by BFis ≥ 20, eBPis ≥ 2, XtXst ≥ 21.67 are marked in red (see Methods section for details, raw data are published on Max Planck Repository Edmond https://doi.org/10.17617/3.HUYLPR). The numbers indicate loci within the QTL intervals. Zoom into locus 4 is shown in panel (C) and the others in S12 Fig. (B-middle and bottom plots) To expose potentially adaptive genetic variants under positive selection as a result of local adaptation, we contrasted the turbulence-insensitive Ros-2FM population with the sympatric Ros-2NM population, determined to be sensitive to this entrainment (S1 Fig). (B-middle) Genetic differentiation (FST) was plotted for the variants found in Ros-2FM and Ros-2NM populations. Red marks FST values above 0.5. (B-bottom) The cross-population nSL (number of segregating sites by length) statistic shows the decay of haplotype homozygosity surrounding adaptive alleles as a result of a selective sweep in Ros-2FM in contrast to Ros-2NM. The top 1% clusters of at least 10 variants with XP-nSL values above 2 in a 10kb window were called significant (highlighted in gray). In addition, the variants with extremely high XP-nSL values above 4 are depicted in red.
(C) The region of the genome under the prevalent additive QTL on the second chromosome, which also contains the most associated variants with the highest association scores, strong FST signal, and a significant signature of a partial sweep, harbors the STAT-1 and CG022363 genes (shown in red shaded area). The position of the associated variants is shown in red, the candidate gene in blue, and two neighboring genes in gray. For the depiction of all genes affected by associated mutations under the two QTLs see S12–S14 Figs. (D, E) The phylogenetic trees of the STAT and CG022363 gene families are shown for Caenorhabditis elegans, Drosophila melanogaster, Musca domestica, Anopheles gambiae, Mus musculus, Clunio marinus candidate gene (blue), and Clunio marinus ortholog of the candidate gene (black). (F) C. marinus has two STAT genes: CG012971 (STAT-1) and CG022905 (STAT-2). The alignment, conserved domains, and percentage of conservation between the two amino acid sequences are shown.

https://doi.org/10.1371/journal.pgen.1010763.g003

Crucially, the Ros-2NM population, which is sympatric to the insensitive Ros-2FM population, is sensitive to tidal turbulence (S1 Fig and Table 1). We can therefore ask if this phenotypic loss occurred as a result of a recent selective sweep due to local adaptation. To explore this, we first estimated genomic differentiation (FST) between the two populations and found that the most prominent loci identified in the BayPass screen also have high FST values (Fig 3B–middle panel). Furthermore, if we assume that the causal genetic variant underwent positive selection as a result of a selective sweep, we can expect that it would leave a characteristic pattern of long high-frequency haplotypes and low genetic diversity in its vicinity [46]. The measure for a selective sweep occurring as a result of a local adaptation is calculated as the decay of haplotype homozygosity between the two populations [46] and is implemented in the cross-population statistic XP-nSL (nSL: number of segregating sites by length) [47] in selscan 2.0 [48] (for details see Methods section). The top 1% of the 10kb regions containing a cluster of alleles with high XP-nSL values were considered to be candidate regions under selection in Ros-2FM (Fig 3B–bottom panel).
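To make the differentiation measure concrete, the sketch below computes a textbook two-population FST for a single biallelic variant from allele frequencies, as the relative reduction in heterozygosity within populations compared to the pooled population. This is an illustrative estimator assuming equal population weights; it is not necessarily the estimator used in this study (see Methods), and the function name is hypothetical.

```python
def fst_two_populations(p1, p2):
    """Wright-style FST for one biallelic variant.

    p1, p2: frequency of the same allele in population 1 and 2.
    Populations are weighted equally (an assumption of this sketch).
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)            # pooled heterozygosity
    if h_total == 0.0:
        return 0.0                                   # monomorphic site: no differentiation
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


print(fst_two_populations(0.5, 0.5))   # identical frequencies
print(fst_two_populations(0.9, 0.1))   # strongly differentiated
print(fst_two_populations(1.0, 0.0))   # fixed difference
```

With this estimator, a variant exceeds the 0.5 cut-off highlighted in Fig 3B only when the allele frequencies in the two populations differ strongly.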

Finally, we combined the results from QTL analysis, the genome-wide association screen, and the selection screen. We identified the loci underlying additive QTLs (Figs 3C and S12) and found the orthologues in model organisms of all the genes in the vicinity of the associated mutations (Figs 3D, 3E, S13, and S14). The genomic region underlying the shared additive QTL on the second chromosome (Fig 3A) contains one cluster of variants with the highest association scores in BayPass, as well as high FST and a selective sweep signature (Fig 3B). This cluster of variants affects three genes: signal transducer and activator of transcription 1 (STAT-1), CG022363, and Lon (Fig 3C). CG022363 is an ortholog of the Drosophila melanogaster CG32100 gene (Fig 3D), which plays a role in gravitaxis [49] but is otherwise poorly investigated. The STAT protein family is conserved in most vertebrates and invertebrates (Fig 3E). Clunio, unlike Drosophila melanogaster, has two paralogues: CG012971 (STAT-1) and CG022905 (STAT-2). STAT-1 is most likely the ancestral STAT protein: ortholog of Anopheles gambiae STAT2 and Mus musculus STAT5a, 5b, and 6; while STAT-2 is newly duplicated in Clunio (Fig 3E). The two Clunio STAT proteins are 83% identical in amino-acid sequence (Fig 3F). The most divergent protein domains in the two Clunio STAT proteins are the N-terminal domain, coiled-coil domain, and SH2 domain (Fig 3F). Lon is a highly conserved protease (S13F Fig) which is crucial for mitochondrial homeostasis [50].

As the QTL mapping explains at most 40% of the phenotypic variance, other loci of smaller effect must exist and are potentially picked up by the association analysis. We, therefore, explored all the genes identified by BayPass and SNPeff by conducting a gene ontology (GO) term enrichment analysis (S15 Fig). Out of 178 genes, 67 went into the GO analysis as they passed the criteria of having known orthologues, and 51 of those genes drove 78 significant GO terms (S15 Fig). Interestingly, gravitaxis was one of the highly significant GO terms. This result, together with the previous identification of the gravitaxis gene CG022363 under the prevalent QTL, prompted us to look more closely into the genes with known roles in gravitaxis (S7 Table). We found that out of 27 such genes in Drosophila, 6 are on our list of genes potentially associated with the loss of sensitivity to tidal turbulence (S7 Table).

Complex genetic basis for the loss of sensitivity to tidal turbulence in Jean-2NM population

As detailed above, complementation crosses between the two insensitive strains identified a separate origin for the insensitivity to tidal turbulence in Jean-2NM and Ros-2FM. To corroborate this finding, we further explored the genetic basis of insensitivity in the Jean-2NM population (Fig 4).

Fig 4. The oligogenic basis for the loss of sensitivity to tidal turbulence in the Jean-2NM population.

QTL mapping in two Jean-2NM x Por-1SL intercross families was performed to find genomic regions harboring genes responsible for the loss of mechanical entrainment in Jean-2NM (A-D). (A, C) The proportion of insensitive (orange) and sensitive (blue) individuals found on each day was calculated based on estimated probabilities (S16 Fig). The ratio of the two phenotypes is indicated in the top right corner. (B, D) QTL intervals are given for: composite interval mapping–dark blue, fitqtl: additive loci–black, EM-algorithm–light blue. The green marks the phenotypic panel with the highest convergence in EM analysis, and the lowest error (see Methods QTL mapping/EM-pipeline). Raw data and full statistics are given in S8 Table. (E) Association analysis was performed to find mutations associated with the loss of sensitivity to tidal turbulence in the Jean-2NM population. (E-top) We screened for variants associated with the loss of sensitivity to tidal turbulence using 769,379 variants called from 210 individuals belonging to 9 differentially sensitive populations (S1 and S19 Figs). Bayesian factor (BFis) is plotted for each variant along the three chromosomes. We found 173 significantly associated SNPs and indels (BFis > 20, eBPis > 2, XtXst > 20.02; see Methods section for details) marked in red. The list of SNP effects and genes affected by them is given in S9 Table. Raw data are published on Max Planck Repository Edmond (https://doi.org/10.17617/3.HUYLPR). (E-middle and bottom) To find loci under selection in Jean-2NM that could be responsible for the loss of sensitivity, we contrasted it with the closest turbulence-sensitive population Vigo-2NM. (E-middle) Plots show the results of genomic differentiation analysis (FST) between Vigo-2NM and Jean-2NM. Red marks FST values above 0.5. (E-bottom) Plots depict the results of the selective sweep analysis in Jean-2NM as compared to Vigo-2NM. The top 1% 10kb regions under selection are gray.

https://doi.org/10.1371/journal.pgen.1010763.g004

QTL mapping was conducted in two intercross families: in the JxP-F2.2.3 family we found one additive QTL on the first chromosome, while in the JxP-F2.1.6 family two additive QTLs on chromosomes 2 and 3 appeared (Figs 4A–4D, S16, and S17 and S8 Table). Interestingly, while the ratio of sensitive to insensitive individuals was consistently 73:27 in three independent intercross families including JxP-F2.1 (S18 Fig), the JxP-F2.2 family had a unique ratio of 62:38 (S18 Fig). This could indicate that the genetic basis for the insensitivity in JxP-F2.2 is unique and may explain why we found different QTLs in JxP-F2.1.6 and JxP-F2.2.3. In addition, this finding suggests that there is an oligogenic origin for the trait and that the alleles responsible for the loss of sensitivity are not fixed in the Jean-2NM population.

We then performed the same genetic screens in Jean-2NM as in Ros-2FM (Fig 4E). The association analysis identified 173 SNPs significantly associated with the phenotypic loss in Jean-2NM as compared to the eight sensitive populations (Fig 4E–top panel and S9 Table). As in the Ros-2FM association analysis, most associated SNPs are found in the non-coding regions of the genome (S19 Fig). To investigate potential selective sweeps in Jean-2NM, we tried contrasting it with the closest turbulence-sensitive population we had: Vigo-2NM (Figs 4E and S1). However, since the two populations are geographically quite far from each other, the FST values were very high overall (Fig 4E–middle panel). Thus, Vigo-2NM is not the most suitable reference population for discovering reliable selective sweeps in Jean-2NM (Fig 4E–bottom panel). Taken together, due to the complexity of the QTL mapping results in Jean-2NM, as well as the lack of prominent peaks in the association analysis, we were not able to identify candidate genes with enough precision. Nevertheless, the absence of a prominent QTL on chromosome 2 in Jean-2NM corroborates the finding that this phenotype was lost independently in Ros-2FM and Jean-2NM.

Discussion

Loss of sensitivity to mechanosensory entrainment: result of convergent evolution?

Loss-of-function alleles were once associated only with deleterious mutations, and the loss of genes only with the loss of redundant gene duplicates. It is now understood that the loss of alleles and genes can drive adaptive phenotypic diversity [39,40]. Furthermore, in contrast to early evolutionary theories, we now understand that convergent evolution is more of a rule than an exception. A few recent studies show that the loss of traits can happen as a result of convergent evolution, for example the repeated eye loss in Mexican cavefish [51] and the loss of flight in paleognathous birds [52].

Clunio colonized the European Atlantic coast from south to north following the last ice age about 10,000 to 20,000 years ago [36], which is rather short in evolutionary time. Given that insensitive populations are found in the south (Fig 1A), one could assume that insensitivity is the ancestral state. However, only two out of ten tested populations are insensitive to tidal turbulence, and the southernmost population, Vigo-2NM, is sensitive to this cue (Figs 1A and S1). Our data also show that insensitivity has at least two different recessive genetic bases. The obvious difference between Ros-2FM and Jean-2NM in the identified QTLs (Figs 2 and 4A–4D) and in the positions of the associated SNPs (Figs 3 and 4E) corroborates the results from the complementation cross (Fig 1C) and leads to the conclusion that this trait indeed evolved independently in the two locations. Furthermore, sensitivity presumably involves a complex signalling pathway that is more difficult to evolve several times from insensitivity than to "break" with a few mutations. Taken together, this hints that the ancestral Clunio population was likely sensitive, and the insensitivity in certain populations may be considered a loss of trait. The adaptive value of this loss remains speculative (see below). But if this phenotype has adaptive value, we may have uncovered an example of recent convergent evolution in the process of local adaptation to different timing habitats.

Evolution of differential sensitivity to circalunar synchronizers

In agreement with Neumann [38], the northern populations (Por-1SL, Bria-1SL, He-1SL, and Ber-1SL) are very sensitive to tidal cues while southern ones are less sensitive or entirely insensitive (Jean-2NM, Plou-2NM, Ros-2NM, Ros-2FM, and Lou-2NM). He argued that moonlight is an ill-suited zeitgeber in the north due to the low position of the moon on the horizon [38]. However, that does not explain why tidal turbulence would be unreliable in the south. There is no obvious advantage for Ros-2FM or Jean-2NM to lose the sensitivity to this cue since the tides are as strong and predictable in those locations as in any other tested location.

We also observed that populations most sensitive to the tidal turbulence have a semi-lunar period in adult emergence, i.e. they emerge twice a lunar month, while less sensitive populations have a lunar period (except Vigo-2NM). Tidal turbulence is a semi-lunar zeitgeber as it comes from the tides, and it could therefore be a more appropriate cue for the populations emerging twice a month in contrast to those that are emerging once a month for which moonlight, as a monthly zeitgeber, might be a more suitable cue.

Furthermore, we discovered that the two sympatric populations in Roscoff are differentially sensitive: Ros-2NM is sensitive to tidal turbulence, and Ros-2FM is insensitive to tidal turbulence (S1M and S1O Fig). Although we tested the two zeitgebers moonlight and tidal turbulence separately in the laboratory, they are perceived together in the wild. Furthermore, as the timing of the tides changes along the Atlantic coast, their phase-relationship varies in different habitats [37]. Thus, if the two zeitgebers set the phase differently, losing sensitivity to one of them can be an evolutionary strategy to set the phase according to the most informative zeitgeber. In line with that, we find the same QTL locus harboring STAT-1 and CG022363 as one of the QTLs responsible for the phase-difference between Ros-2FM and Ros-2NM [53].

Genes responsible for the loss of sensitivity to tidal turbulence in Ros-2FM

Tidal turbulence is a vibration perceived by the mechanosensory nervous system, and mechanosensory pathways are still largely unknown, even in model organisms. Most molecular players were identified in genetic screens on phenotypes associated with defects in mechanosensory systems in Drosophila melanogaster, Caenorhabditis elegans, and Mus musculus [54]. Genes found in these analyses are not only those directly involved in mechanosensation, such as ion channels, the tethering of the ion channels, the extracellular matrix and the cytoskeleton, but also genes acting indirectly through the development of the sensory organs or the function and development of the cells downstream in the neuronal circuits [54]. Many complex phenotypes are polygenic in origin, which makes simple gene–function relationships hard to infer. Additionally, mutations in regulatory regions, rather than mutations in coding regions, are found to shape most emerging phenotypes [52]. Similarly, the loss of complex phenotypes has been shown to be driven by divergence in cis-modulatory elements of developmental genes, as in the loss of limbs in snakes and the degeneration of eyes in subterranean mammals [55]. Interestingly, here we also found that the majority of potentially causal SNPs hit intergenic regions (S11C Fig and S6 Table), suggesting that they affect regulatory elements. Therefore, we investigated both the region of the genome with the highest association score (Fig 3C) and other potential candidate genes that showed up in the association analysis with BayPass (S12–S15 Figs and S6 and S7 Tables), irrespective of whether there was a signature of selection at these loci.

Gravitaxis: Potential role of chordotonal organs

We found CG022363 to fall into the region with the highest association score (Fig 3C). This gene is an ortholog of the Drosophila CG32100 gene, which has a role in gravitaxis, although the exact molecular function of this gene remains unknown [49]. Graviception is a function of the mechanosensory system and, as is the case with all mechanosensory functions, it is poorly understood on a molecular level. To this day, most of the molecular machinery was identified through genetic screens of behaviors associated with impaired gravitaxis [49]. As a result, 27 genes were associated with gravity-sensing in Drosophila. Some detect gravity directly, namely inactive, nanchung, painless, and pyrexia [56]. But the majority seem to have an indirect role, most likely in the development of the sensory organs: alan shepard, escargot, broad, cryptochrome, nemo and others [49,56]. Strikingly, out of those 27 genes, we found 6 that were associated with loss of sensitivity to tidal turbulence in Ros-2FM (S7 Table): shep, snaill1_CG000103, broad, cryptochrome1, nemo, and the above-mentioned CG022363. In line with that, gravitaxis was found as one of the top GO terms (S15 Fig). Three of the six belong to the 15 candidate genes under the QTL regions: shep, snaill1_CG000103, and CG022363 (S12 Fig). In Drosophila, shep is involved in neuronal development and remodeling of the sensory neurons [57,58] and escargot has a role in neurogenesis [59]. Therefore, it is likely that they are indirectly involved in gravitaxis in Drosophila by contributing to the development of the sensory organs responsible for detecting gravity. Drosophila larvae detect both vibration and gravity via chordotonal organs [60,61]. In addition, chordotonal organs are necessary for the mechanosensory entrainment of the circadian clock in Drosophila adults [14]. Taken together, it is possible that chordotonal organs are responsible for mechanosensory entrainment of the circalunar clock in Clunio as well. Mutations in genes responsible for the development of the chordotonal organs could lead to impaired gravity sensing as well as impaired detection of vibration, and thus impair mechanosensory entrainment of the circalunar clock in Ros-2FM.

STAT-1 locus

Together with CG022363, STAT1 falls into the region with the highest association score (Fig 3C). The Signal Transducer and Activator of Transcription (STAT) protein is part of the evolutionarily conserved JAK-STAT pathway that controls developmental decisions and participates in the immune response [62]. Archetypical members of each of the components were present at the time of the emergence of Bilateria: JAK, STAT, SHP, and the three SOCS proteins [63]. STAT proteins were duplicated many times throughout metazoan evolution, and while some pseudogenized, many evolved into novel genes through rapid sequence diversification and neofunctionalization [62]. Insect STATs form a single clade in phylogenetic analyses and constitute an ancient class of STATs together with mammalian STAT5 and 6 [62]. While most insect species, like Drosophila melanogaster and Apis mellifera, have a single STAT whose function remains conserved [62], in others, like Anopheles gambiae, STAT was duplicated and the new gene acquired diverse functions. The duplicated Anopheles STAT has a role in defense against bacteria [64], Plasmodium infection [65], and innate immunity [66]. In addition, the duplicated STAT acts as an upstream regulator of the evolutionarily conserved STAT protein [65].

In contrast, in vertebrates, all components of the JAK-STAT pathway duplicated several times and STAT proteins attained specialized functions in various cells. Interestingly, the expression of the TrpA1 mechanosensitive channel is regulated via the JAK-STAT pathway in nociceptive neurons in mice [67]. Similarly, STAT3 is necessary for the differentiation and regeneration of inner ear hair cells, the basic mechanosensory receptors for hearing and balance in zebrafish [68]. Finally, the JAK-STAT pathway is directly coupled to the mechano-gated channels in various non-neuronal cells, regulating gene expression downstream of the channel activation [69–74].

Clunio marinus has two STAT proteins: CG012971 (STAT-1), the ortholog of Anopheles gambiae STAT2 and mouse STAT5a, 5b and 6; and CG022905 (STAT-2) (Fig 3D). The two Clunio STATs are 83% identical in amino-acid sequence (Fig 3E), while the Anopheles STATs share only 47% overall sequence identity [62]. The two Clunio STATs differ the most in the N-terminal domain, which has a role in nuclear translocation and protein-protein interactions, and in the coiled domain, which is involved in nuclear export and regulation of tyrosine phosphorylation [63]. This indicates that the two STATs could be regulated differently or be part of different signalling pathways by interacting with different proteins, and could thus have obtained different roles.

Taken together, we can speculate that Clunio STAT-1 has a role in the perception of tidal turbulence, either by being involved in the development or differentiation of mechanosensory organs, or because mechanosensory receptors have appropriated the JAK-STAT pathway for the regulation of gene expression. Further functional analysis is necessary to test this hypothesis. If proven, this would be the first evidence of a STAT role in mechanosensation in invertebrates.

Modulation of the circadian clock is responsible for the lack of mechanosensory entrainment of the circalunar clock?

The mechanosensory cue comes from the tides, and as such it consists of a 6h 10min stimulus followed by a 6h 15min break. Its onset shifts every day by 50min. As a consequence, the tidal pattern of mechanical disturbance occurs at the same time of day every 15 days, i.e. after half a lunar cycle. Thus, lunar phase can be detected by setting a daily window of mechanoreceptor sensitivity [38], which will only overlap with the presence of mechanical cues during specific lunar phases. Such a daily sensitivity window would be governed by a circadian clock. Interestingly, we found several circadian clock genes in our genome-wide screen: cry1, per and clk (S6 Table). It is tempting to speculate that modulation in the circadian clock might have an effect on the lack of mechanosensory entrainment in the Ros-2FM population. In this scenario, loss of mechanosensory entrainment would not be due to a malfunction in the sensory pathways, but due to a misinterpretation of the cue in a circadian context.

Here we show for the first time a convergent loss of sensitivity to tidal turbulence in two Clunio populations. We found several loci to be responsible for this loss. A detailed analysis suggests that in one of the populations the JAK/STAT pathway and gravitaxis may play a prominent role in the detection of tidal turbulence. While in Baltic and Northern European populations complete lunar arrhythmicity seems to be a highly polygenic trait [75], the selective loss of sensitivity to a zeitgeber seems to have a less complex, oligogenic basis. If in the future tools for molecular manipulation of Clunio are developed, this setting is a good starting point to identify novel genes and pathways involved in mechanosensation.

Methods

Clunio cultures

C. marinus cultures were established from different locations (S1 Table) and maintained in the laboratory according to Neumann [27]. Around 1000 larvae were kept in 20x20x5 cm plastic boxes with sand from the natural habitat and 15‰ seawater. They were fed twice a week with diatoms (Phaeodactylum tricornutum, strain UTEX 646). Nettle powder was added twice a month with each water exchange. Clunio larvae were raised under a 16h light and 8h darkness regime at a temperature of 18°C. In experiments with moonlight entrainment, artificial moonlight was simulated with neutral white LED light of ~4000K (Hera 610 014 911 01) on 4 consecutive nights every 30 days. The 24-hour period when moonlight was first applied was marked as day 1. In the experiments with mechanosensory entrainment, cycles of vibration were used to simulate tidal turbulence in a setup established by Neumann [31]. Briefly, an electromotor generating vibration of 50 Hz, 30dBa above background noise, was attached to the shelves with Clunio cultures and controlled by a custom-made “tidal clock”. The clock kept the motor on for 6h 10min and off for 6h 15min, which gave a 12.4-hour tidal rhythm. The onset of vibration shifted every day by 50 min, which resulted in a semi-lunar cycle of approximately 15 days. The day when vibration started in the middle of the subjective night was arbitrarily marked as day 1. Phenotypes were recorded by collecting emerged adults from three culture boxes per strain every day for at least 60 days or two lunar cycles.
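
The timing of the vibration regime follows from simple arithmetic on the ON and OFF durations. The sketch below is only an illustration of that arithmetic and not part of the original setup; the function and variable names are assumptions, and only the two durations are taken from the text.

```python
from fractions import Fraction

# Durations taken from the text, in minutes.
ON_MIN = 6 * 60 + 10    # motor on: 6 h 10 min
OFF_MIN = 6 * 60 + 15   # motor off: 6 h 15 min
DAY_MIN = 24 * 60

def tidal_cycle_summary(on_min=ON_MIN, off_min=OFF_MIN):
    """Return (tidal period in hours, daily shift in minutes, recurrence in days)."""
    period_min = on_min + off_min              # one full ON/OFF cycle
    daily_shift = 2 * period_min - DAY_MIN     # two cycles overshoot 24 h by this much
    # The pattern falls on the same time of day again once the accumulated
    # shift equals one full tidal period.
    recurrence_days = Fraction(period_min, daily_shift)
    return period_min / 60, daily_shift, float(recurrence_days)

period_h, shift_min, recurrence = tidal_cycle_summary()
print(f"{period_h:.2f} h period, {shift_min} min shift per day, "
      f"pattern recurs every {recurrence:.1f} days")
```

Under these durations the onset moves by 50 min per day and the pattern returns to the same time of day after roughly half a lunar month.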

Crossing experiments

To explore the genetic basis of insensitivity to tidal turbulence, we crossed the insensitive Ros-2FM or Jean-2NM strains with the sensitive Por-1SL strain, as well as the two insensitive strains with each other (S2 Table). Detailed description can be found in the S1 Methods.

QTL mapping

QTL mapping was performed to identify genetic regions harboring genes in which natural variants underlying the loss of sensitivity to tidal turbulence are segregating. Two families from the Ros-2FM x Por-1SL cross were chosen for QTL mapping: (Ros-2FMxPor-1SL)x(Ros-2FMxPor-1SL)-F2.1, hereafter referred to as RxP-F2.1, and (Ros-2FMxPor-1SL)xRos-2FM-BC.1, hereafter referred to as RPxR-BC.1 (S2 Table). Similarly, two intercrosses of Jean-2NM x Por-1SL families were selected: (Jean-2NMxPor-1SL)x(Jean-2NMxPor-1SL)-F2.1.6 and (Jean-2NMxPor-1SL)x(Jean-2NMxPor-1SL)-F2.2.3, hereafter referred to as JxP-F2.1.6 and JxP-F2.2.3.

Phenotyping (S3 Fig)

Emergence data was collected for parental, F1 and F2, and BC generations, and lunar emergence days under turbulence entrainment were assigned as described above (S1 Script and Table 1 and S1 and S2 Figs). To resolve the problem of the overlapping “sensitive” and “insensitive” phenotypes emerging during the peak in the F2 and BC progeny, we designed a pipeline to calculate the probability of finding “sensitive” and “insensitive” individuals on each day. For more details see S1 Methods.
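
The S3 Fig legend describes how the probability scores are reduced to binary phenotypes: scores between 0.3 and 0.7 are dropped, scores above 0.7 are coded 1 and scores below 0.3 are coded 0. A minimal sketch of that step is given here for orientation only; the function names and the toy individuals are hypothetical, and the actual pipeline is described in S1 Methods.

```python
def binary_phenotype(p_insensitive, low=0.3, high=0.7):
    """Return 1 (insensitive), 0 (sensitive), or None for an uncertain score."""
    if p_insensitive > high:
        return 1
    if p_insensitive < low:
        return 0
    return None

def reduce_dataset(scores):
    """Keep only individuals with a confident binary phenotype.

    scores: dict mapping individual ID -> probability of being insensitive
    """
    reduced = {}
    for individual, p in scores.items():
        call = binary_phenotype(p)
        if call is not None:
            reduced[individual] = call
    return reduced

# Hypothetical probability scores for four individuals.
toy_scores = {"ind1": 0.95, "ind2": 0.10, "ind3": 0.50, "ind4": 0.70}
reduced = reduce_dataset(toy_scores)
print(reduced)
```

Individuals exactly at a threshold are treated as uncertain in this sketch, which is an assumption; the legend only specifies the open intervals.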

Genotyping

DNA was extracted from adults collected in crossing experiments with the salting-out method [76] and amplified using RepliG, and single-digest or double-digest RAD sequencing was performed [77–79]. A detailed protocol can be found in the S1 Methods. The script containing read processing, mapping, genotype calling and filtering for informative variants is given in S2 Script.

Read processing and mapping

For discriminating individuals, P1 and P2 adaptors contained unique barcode sequences (S10 Table). Raw reads were trimmed to remove adapters and low-quality bases with Trimmomatic v0.38 [80]. Trimmomatic parameters used for paired-end reads were ILLUMINACLIP:<PE_adapter_file>:2:30:10:2:true LEADING:20 TRAILING:20 MINLEN:50 and for single-end reads ILLUMINACLIP:<SE_adapter_file>:2:30:10 LEADING:20 TRAILING:20 MINLEN:50. For paired-end library RxP-F2.1, overlapping read pairs were assembled into single reads with PEAR v.0.9.10 [81] using default parameters. Paired (PEAR unassembled) and single reads (PEAR assembled and unpaired reads from R1 or R2 after adapter trimming) were mapped independently with NextGenMap v0.5.5 [82] to the CLUMA2.0 reference genome (available at https://doi.org/10.17617/3.42NMN2) with default parameters except for --min-identity 0.9 and --min-residues 0.9. Read groups were specified during mapping using --rg-id and --rg-sm. The independently mapped reads were then merged into a single file using samtools v1.9 [83] merge, with parameters -u -c -p. For single-end libraries, trimmed reads were directly mapped with NextGenMap with the previously mentioned parameters. Mapped reads were sorted and indexed with samtools sort and samtools index, respectively.

Variant calling

Single nucleotide polymorphisms (SNPs) and insertion-deletion (indel) genotypes were called using GATK v3.7-0-gcfedb67 [84]. Steps include initial genotype calling using GATK HaplotypeCaller with parameters --emitRefConfidence GVCF and -stand_call_conf 30, filtering of variants using GATK SelectVariants with ‘-select DP > 30.0’, recalibration of base qualities using GATK BaseRecalibrator with ‘-knownSites’, preparing recalibrated BAM files with GATK PrintReads using -BQSR and, finally, recalling of genotypes using GATK HaplotypeCaller with the previously mentioned parameters. Individual VCF files were combined into a single file using GATK GenotypeGVCFs.

Informative variants and genotype matrix

VCF files were filtered for minimum genotype quality (minGQ) 20, minor allele frequency (maf) 0.10, and maximum fraction of samples having missing genotypes (max-missing) 0.60. Genotypes were coded as ‘AA’, ‘AB’, or ‘BB’ based on the inferred inheritance pattern (S11 Table). To maximize the number of informative markers in a backcross, we included markers for which both parents were heterozygous, or for which the F1 parent was heterozygous and the Ros-2FM parent was homozygous (S11 Table). To infer from which parent the ‘A’ or ‘B’ allele comes at ambiguous loci, we chose genotypes based on the consistency of the genotypes along the chromosome (i.e. the assignment that had a smaller number of genotype switches across the individuals in the BC progeny) (S11 Table, consistency genotype assignment in S2 Script). The final genotype matrix was manually inspected for each mapping family before importing it into R/qtl with the read.cross function, within which the flag map.function = "kosambi" was used to generate the linkage map. We used the "kosambi" function as it does not assume that the markers are normally distributed. Marker order and genotype errors were further investigated in R/qtl following the best practices suggested by the authors [85]. One inversion was identified in the right arm of the second chromosome in the RxP-F2.1 family and the order of markers was inverted in that region. The final number of markers was 117 for the RPxR-BC.1 family and 137 for the RxP-F2.1 family. The final genotype matrices are given in S3 Table.
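
For readers unfamiliar with map functions: the Kosambi function converts the recombination fraction between adjacent markers into a map distance while allowing for crossover interference. The sketch below uses the textbook formula and is not R/qtl code; the function names are illustrative assumptions, and the Haldane function is shown only for comparison.

```python
import math

def kosambi_cm(r):
    """Map distance in centimorgans for recombination fraction r (0 <= r < 0.5)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))

def kosambi_r(d_cm):
    """Inverse: recombination fraction for a map distance in centimorgans."""
    return 0.5 * math.tanh(2 * d_cm / 100.0)

def haldane_cm(r):
    """Haldane distance (assumes no crossover interference), for comparison."""
    return -50.0 * math.log(1 - 2 * r)

for r in (0.01, 0.10, 0.25):
    print(f"r={r:.2f}  Kosambi={kosambi_cm(r):6.2f} cM  Haldane={haldane_cm(r):6.2f} cM")
```

For small recombination fractions both functions give nearly the same distance; they diverge as markers lie further apart.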

Unfortunately, samples from the parents and F1s of the two Jean-2NMxPor-1SL families had very few good genotypes. Thus, we designed an alternative approach for reconstructing the recombination matrix. For details see S1 Methods.

QTL mapping

Standard interval mapping and multiple QTL mapping were done with R/qtl package functions: scanone, scantwo, and fitqtl [85]. QTL intervals were estimated with bayesint function. Composite interval mapping was analyzed in Windows QTL Cartographer Version 2.5_011 (number of covariates 5, window 10 cM) [86]. In addition, to confirm the model found by fitqtl multiple QTL mapping, we used the Bayesian QTL mapping R package “qtlbim” [87]. Function qb.best was used to identify the best model, and qb.scanone to compare additive and epistatic QTLs found by R/qtl.

Expectation-maximization (EM) algorithm (S5 Fig)

To explore the effect of uncertainty in phenotyping on the QTL mapping results, we devised an EM algorithm to assign binary phenotypes to the entire dataset (S5 Fig). For details see S1 Methods. The script can be found in S3 Script.

Association analysis

The complementation cross indicated that the genetic basis for the loss of sensitivity is different in the two populations (Fig 1). Therefore, the two insensitive populations were analyzed separately. To identify variants associated with the loss of sensitivity to tidal turbulence in Ros-2FM, we performed a genome screen on 746,887 SNPs and small indels called in 210 males from nine populations differentially sensitive to tidal turbulence (S4 Table). Similarly, to find potentially causative mutations in Jean-2NM, we used a dataset of 769,379 SNPs and indels from 210 males from Jean-2NM and the same eight populations sensitive to tidal turbulence.

Genotyping

DNA from field-caught males stored in 100% ethanol was extracted using the salting-out method [76]. Genomic DNA was amplified with standard RepliG protocol (REPLI-g Mini Kit QIAgen 150025). Whole genomes of 20–24 adults from nine populations were individually sequenced on Illumina HiSeq3000 with paired-end 150-bp reads and average coverage of 20x (S4 Table).

Read processing

Reads from several sequencing runs were merged with the cat function. Adapters were trimmed using the Trimmomatic tool [80] with the following parameters: ILLUMINACLIP:<Adapter file>:2:30:10:8:true LEADING:20 TRAILING:20 MINLEN:75. Overlapping read pairs were assembled using PEAR with the following parameters: -n 75 -c 20 -k [81]. Reads were mapped using bwa mem version 0.7.15-r1140 [88] against the latest Cluma_2.0 reference genome (available at https://doi.org/10.17617/3.42NMN2). The independently mapped reads were then merged into a single file, filtered for -q 20, and sorted using samtools v1.9 [83].

Variant calling

SNPs and small indels were called using GATK v3.7-0-gcfedb67 [84]. All reads in the q20 sorted file were assigned to a single new read-group with the ‘AddOrReplaceReadGroups’ script with LB = whatever PL = illumina PU = whatever parameters. Genotype calling was then performed with HaplotypeCaller and parameters --emitRefConfidence GVCF -stand_call_conf 30, followed by recalibration of base qualities using GATK BaseRecalibrator with ‘-knownSites’, preparation of recalibrated BAM files with GATK PrintReads using -BQSR, and recalling of genotypes using GATK HaplotypeCaller with the previously mentioned parameters. Individual VCF files were combined into a single file using GATK GenotypeGVCFs.

BayPass genotype matrix

The genotype matrix for BayPass association analysis [89] was generated by filtering for minor allele frequencies larger than 0.05, the maximal number of missing values per variant was set to 20%, the maximal number of alleles was 2, and minimal read quality minQ was set to 20 with VCFtools (0.1.14) [90]. Allele count per population was calculated using the VcfR package [91]. Briefly, a previously filtered vcf table containing 24 individuals from 9 populations was separated into vcf files containing individuals from distinct populations. Individual vcf files were read with read.vcfR function and allele frequency per population per site was calculated using the gt.to.popsum function. Population allele frequencies were then combined into a genotype matrix.
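
To make the structure of such a matrix concrete, the following sketch counts reference and alternate alleles per population and writes one row per variant with a pair of counts for each population. It is a simplified stand-in for the vcfR-based procedure described above, not the code that was used; the function names, sample names and toy genotypes are hypothetical.

```python
def allele_counts(genotypes):
    """Count reference (0) and alternate (1) alleles in diploid genotype
    strings such as '0/1' or '1|1'; missing calls ('./.') are skipped."""
    ref = alt = 0
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele == "0":
                ref += 1
            elif allele == "1":
                alt += 1
    return ref, alt

def count_matrix(variants, populations):
    """Build one row per variant with (ref, alt) counts for each population.

    variants: list of dicts mapping sample name -> genotype string
    populations: dict mapping population name -> list of sample names
    """
    rows = []
    for calls in variants:
        row = []
        for pop in sorted(populations):
            samples = populations[pop]
            row.extend(allele_counts([calls.get(s, "./.") for s in samples]))
        rows.append(row)
    return rows

# Toy data: two hypothetical populations with two samples each.
pops = {"popA": ["a1", "a2"], "popB": ["b1", "b2"]}
toy = [
    {"a1": "0/0", "a2": "0/1", "b1": "1/1", "b2": "./."},
    {"a1": "1|1", "a2": "0/1", "b1": "0/0", "b2": "0/0"},
]
matrix = count_matrix(toy, pops)
for row in matrix:
    print(" ".join(str(count) for count in row))
```

Only biallelic variants are handled here, in line with the filter on the maximal number of alleles described above.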

Phenotyping

Sensitivity to turbulence was estimated for each population using summary circular statistics (see Methods section QTL mapping/Phenotyping). Vector length was used as a phenotypic score and was standardized using the -scalecov flag so that it had a mean of 0 and a standard deviation of 1 (S5 Table).
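
As an illustration of what such a score captures, the sketch below computes the mean resultant vector length of emergence days on a circular scale and then standardizes the scores across populations. This is a generic textbook calculation under stated assumptions (a 15-day cycle, sample standard deviation, invented emergence days), not the S1 Script or the internal scaling of BayPass.

```python
import math
import statistics

def vector_length(days, period=15.0):
    """Mean resultant vector length (0 = uniform, 1 = perfectly synchronized)
    for emergence days on a cycle of the given period."""
    angles = [2 * math.pi * (d % period) / period for d in days]
    c = sum(math.cos(a) for a in angles) / len(angles)
    s = sum(math.sin(a) for a in angles) / len(angles)
    return math.hypot(c, s)

def standardize(values):
    """Scale values to mean 0 and (sample) standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical emergence days for three populations (one entry per adult).
emergence = {
    "tight_peak": [3, 3, 4, 4, 4, 5, 18, 19, 19, 20],
    "broad_peak": [1, 3, 5, 6, 8, 16, 19, 21, 23, 24],
    "no_rhythm": [0, 3, 6, 9, 12, 15, 18, 21, 24, 27],
}
lengths = {name: vector_length(days) for name, days in emergence.items()}
scores = dict(zip(lengths, standardize(list(lengths.values()))))
for name in emergence:
    print(f"{name}: vector length {lengths[name]:.2f}, standardized {scores[name]:+.2f}")
```

A population that emerges in a narrow window each cycle yields a vector length close to 1, while evenly spread emergence yields a value close to 0.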

BayPass

BayPass was run with 3 random seeds (1, 1988, 11273), and the medians of BFis, eBPis, XtXst, and the -log10 p-value of XtX were calculated. To find the correct significance threshold for the XtX statistic, a pseudo-observed data set (POD) was generated by sampling 100,000 SNPs with the R function simulate.baypass; the 1% threshold of the XtXst POD values was 21.67 in the Ros-2FM dataset and 20.02 in the Jean-2NM dataset. To subset highly associated variants, we filtered for BFis ≥ 20, eBPis ≥ 2 and XtXst ≥ 21.67 in Ros-2FM (S11A Fig) and for BFis ≥ 20, eBPis ≥ 2 and XtXst ≥ 20.02 in Jean-2NM (S19C Fig). The association analysis in BayPass is corrected for population structure based on a kinship matrix Ω.
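
The filtering logic described here can be summarized in a few lines. The sketch below is an assumption-laden illustration and not the analysis code: it takes the per-variant median across runs, derives a cutoff from the top 1% of a set of simulated values with a simple empirical quantile, and keeps variants passing all three thresholds. All function names and toy numbers are hypothetical.

```python
import statistics

def upper_cutoff(values, top_fraction=0.01):
    """Smallest value among the top `top_fraction` of the sorted values
    (simple empirical quantile without interpolation)."""
    ordered = sorted(values)
    n_top = max(1, round(len(ordered) * top_fraction))
    return ordered[len(ordered) - n_top]

def median_over_seeds(runs):
    """Per-variant median across runs; `runs` is a list of equal-length lists."""
    return [statistics.median(vals) for vals in zip(*runs)]

def significant(bf, ebp, xtx, xtx_cutoff, bf_cutoff=20.0, ebp_cutoff=2.0):
    """Indices of variants that pass all three thresholds."""
    return [i for i, (b, e, x) in enumerate(zip(bf, ebp, xtx))
            if b >= bf_cutoff and e >= ebp_cutoff and x >= xtx_cutoff]

# Toy Bayes factors for four variants from three hypothetical runs.
bf_runs = [[25, 5, 30, 21], [27, 4, 28, 19], [26, 6, 35, 22]]
bf_median = median_over_seeds(bf_runs)
ebp = [3.0, 0.5, 1.5, 2.5]
xtx = [25.0, 10.0, 30.0, 22.0]
hits = significant(bf_median, ebp, xtx, xtx_cutoff=21.67)
print(bf_median, hits)
```

In practice the cutoff would be derived from the simulated XtXst values rather than typed in by hand; the value 21.67 is reused here only because it is the one reported for Ros-2FM.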

SNPeff

SNP effects were analyzed in CLUMA2.0_M, a version of the reference genome that contains manual curations to the reference sequence made during genome annotation (available at https://doi.org/10.17617/3.42NMN2). SNPs were transferred from CLUMA2.0 to CLUMA2.0_M using a Python3 script (S4 Script), which creates a map of positions from CLUMA2.0 to CLUMA2.0_M by accounting for insertions and deletions. As input, the script uses a GFF file with manual reference edits, exported from Web Apollo version 2.5.0 [92]. With the CLUMA2.0_M reference sequence, the location and putative effects of the SNPs and indels relative to CLUMA2.0_M gene models were annotated using SnpEff 4.5 (build 2020-04-15 22:26, non-default parameter ‘-ud 0’) [93]. The complete list with the number of variants with distinct effects is given in S6 and S9 Tables.
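
The idea behind such a position map can be sketched as follows. This is a simplified illustration and not the S4 Script: it assumes a list of non-overlapping edits given in original coordinates, omits GFF parsing entirely, and uses hypothetical names and toy edits.

```python
def lift_position(pos, edits):
    """Map a 1-based position on the original sequence to the edited sequence.

    edits: list of (start, kind, length) tuples in original coordinates, where
    kind is 'ins' (bases inserted after `start`) or 'del' (bases deleted
    from `start` onward). Returns None if the position itself was deleted.
    Edits are assumed not to overlap.
    """
    offset = 0
    for start, kind, length in edits:
        if kind == "ins":
            if start < pos:
                offset += length       # everything downstream shifts right
        elif kind == "del":
            if start <= pos < start + length:
                return None            # position no longer exists
            if start + length <= pos:
                offset -= length       # everything downstream shifts left
        else:
            raise ValueError(f"unknown edit kind: {kind}")
    return pos + offset

# Toy edits: 3 bases inserted after position 100, 5 bases deleted from 200.
toy_edits = [(100, "ins", 3), (200, "del", 5)]
for old in (50, 100, 101, 202, 205):
    print(old, "->", lift_position(old, toy_edits))
```

Variants falling inside a deleted stretch have no counterpart on the curated sequence, which is why the sketch returns None for them.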

Phylogenetic trees

The identity of the 15 candidate genes was explored by reciprocal blast between Clunio and Drosophila melanogaster protein sequences. The eggNOG 5.0 database was then used to identify orthologs in other model organisms: Anopheles gambiae, Mus musculus, Homo sapiens, and Caenorhabditis elegans [94]. The most distant protein sequence in eggNOG phylogenetic trees was taken as an outgroup sequence. Protein sequences were then aligned, and phylogenetic trees were created in QIAGEN CLC Main Workbench version 7.9.3. Bootstrap values in 1000 runs were reported (Figs 3D, 3E, S13 and S14).

Selective sweep analysis

To investigate if the associated loci evolved as a result of a recent selective sweep in the process of local adaptation, we calculated cross-population nSL (number of segregating sites by length) developed by [95]. XP-nSL is designed to detect selective sweeps due to local adaptation within a query population by comparing its integrated haplotype homozygosity (iHH) with one of a reference population. Here, positive scores suggest long haplotypes in population A with respect to population B and a potential sweep in A, whereas negative scores suggest long haplotypes in B with respect to A. nSL, in contrast to EHH, was developed to accommodate the lack of genetic maps in favor of physical maps [47]. We used selscan 2.0 as it was recently revised to work with unphased multi-locus genotypes [48,95]. Details can be found in the S1 Methods.

Genetic differentiation (fst)

To provide a bridge between the association analysis conducted on ten populations and the cross-population selective sweep analysis calculated between two populations (see Methods sections on association analysis and selective sweeps), we estimated genetic differentiation between the two contrasted populations: Ros-2FM compared to Ros-2NM and Jean-2NM compared to Vigo-2NM. The same vcf files containing GATK-called SNPs and indels used for the selective sweep analysis were used (see Methods section on selective sweeps). Genetic differentiation between the two populations (fst) was estimated using vcftools version 0.1.14 [90] with parameters --weir-fst-pop --fst-window-size 1 --fst-window-step 1.

GO term enrichment

To investigate if the genes identified by BayPass and SNPeff perform some of the known biological functions, we ran Gene Ontology (GO) term enrichment. We previously annotated 5,393 out of 15,193 C. marinus genes with GO terms [96]. In our current reference genome CLUMA2.0, 5,436 out of 13,751 genes were annotated with GO terms. In brief, GO terms were annotated using the longest protein sequence per gene with mapper-2.0.1 [97] from the eggNOG 5.0 database [94], using DIAMOND [98], BLASTP e-value <1e-10, and subject-query alignment coverage of >60%. Only GO terms with “non-electronic” GO evidence from best-hit orthologs restricted to automatically adjusted per-query taxonomic scope were used. To assess the enrichment of “Biological Process” GO terms, the weight01 Fisher’s exact test was implemented in topGO (version 2.42.0, R version 4.0.3) [99].
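
For orientation, the core of a GO enrichment test is a one-sided Fisher’s exact test, which can be written as a hypergeometric tail probability. The sketch below shows only that plain test; it does not implement the weight01 algorithm of topGO, which additionally accounts for the GO graph structure. The function name and the example counts other than the 5,436 annotated genes are hypothetical.

```python
from math import comb

def fisher_enrichment_p(k, n, K, N):
    """One-sided enrichment p-value P(X >= k) for a hypergeometric X.

    N: annotated genes in the universe
    K: universe genes carrying the GO term
    n: genes in the candidate list
    k: candidate genes carrying the GO term
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Hypothetical example: 6 of 200 candidate genes carry a term that
# annotates 27 of the 5,436 GO-annotated genes.
p_example = fisher_enrichment_p(k=6, n=200, K=27, N=5436)
print(f"p = {p_example:.3g}")
```

A small p-value indicates that the term occurs among the candidates more often than expected by chance, before any correction for testing many terms.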

Supporting information

S1 Fig. Clunio populations are differentially sensitive to moonlight or tidal turbulence.

(A) Origin of the Clunio strains. Strains are color-coded and their names are depicted in the body of the graph. (B-U) Graphs show the fraction of emerged individuals entrained under laboratory conditions by either artificial moonlight (four nights of light every 30 days) or tidal turbulence (vibration of ~50 Hz 30dB above background noise in 6h 10min ON– 6h 15min OFF intervals resulting in a 15-day pattern). The total number of individuals, exact names of geographical locations, and the year when strains were established are given in S1 Table. Strains differ in the period, phase of emergence, and sensitivity to the synchronizers. The emergence of strains considered insensitive to moonlight or tidal turbulence is marked in black. Map data was obtained from https://www.naturalearthdata.com/downloads/50m-physical-vectors/.

https://doi.org/10.1371/journal.pgen.1010763.s001

(PDF)

S2 Fig. Sensitivity to tidal turbulence is genetically determined and a dominant trait.

Crossing experiments were performed to assess the inheritance of sensitivity to tidal turbulence. Graphs show fractions of emerged adults per day in parental populations, F1 and F2 (F1xF1) generations. (A) Intercross between Por-1SL and Ros-2FM. (B) Intercross between Por-1SL and Jean-2NM. The total number of individuals per generation is listed in S2 Table. The color of the bars represents increasing levels of sensitivity to tidal turbulence from sensitive (light gray) to insensitive (black).

https://doi.org/10.1371/journal.pgen.1010763.s002

(PDF)

S3 Fig. Schematic overview of the QTL mapping strategy.

Left: Phenotyping strategy. Light and vibration were logged throughout the experiment and used to calculate the first day of the entrainment as the day when vibration starts in the middle of their subjective night. Emerged adults were collected and the number of emerged individuals per day was recorded. Circular summary statistics were used to test if there was a phase-shift in emergence rhythms between parental, F1, F2, and BC generations. If there was a phase shift, emergence days were corrected so that the only phenotype assessed is rhythmicity. To generalize the emergence distributions, kernel density estimates were calculated (bandwidth = 10) for each generation. The probability of finding sensitive and insensitive individuals on each experimental day in F2 or BC progenies was calculated according to the given equations. The probability of finding an insensitive individual was used as a phenotypic score. In addition, a reduced dataset was generated by removing individuals with uncertain phenotypes between 0.3 and 0.7. Remaining individuals with probability phenotypes > 0.7 or < 0.3 were given binary phenotypes 1 and 0 respectively. Right: QTL mapping strategy. Several mapping pipelines were tested to examine additive and epistatic QTLs. Full and reduced datasets of each crossing family were analyzed using interval mapping (scanone and scantwo), composite interval mapping (WinQTL cartographer), and multiple QTL mapping (fitqtl and qtlbim). See methods section QTL mapping.
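The step from probability phenotypes to the reduced binary dataset can be sketched as follows. The thresholds of 0.3 and 0.7 are taken from the description above; the function names and the dictionary layout (individual ID mapped to the probability of being insensitive) are assumptions made for illustration.

```python
from typing import Dict, Optional


def binarize_phenotype(prob_insensitive: float,
                       lower: float = 0.3,
                       upper: float = 0.7) -> Optional[int]:
    """Map a probability phenotype to a binary phenotype.

    Returns 1 for probabilities > upper, 0 for probabilities < lower,
    and None for uncertain individuals in between (these are removed).
    """
    if prob_insensitive > upper:
        return 1
    if prob_insensitive < lower:
        return 0
    return None


def reduced_dataset(probabilities: Dict[str, float]) -> Dict[str, int]:
    """Keep only individuals with a confident phenotype and binarize them."""
    reduced = {}
    for individual, prob in probabilities.items():
        phenotype = binarize_phenotype(prob)
        if phenotype is not None:
            reduced[individual] = phenotype
    return reduced


if __name__ == "__main__":
    # Hypothetical individuals and probabilities.
    example = {"ind01": 0.92, "ind02": 0.55, "ind03": 0.08}
    print(reduced_dataset(example))
```

Individuals with a probability of exactly 0.3 or 0.7 are treated as uncertain here, which follows the strict inequalities given in the text.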

https://doi.org/10.1371/journal.pgen.1010763.s003

(PDF)

S4 Fig. Calculating probability phenotypes for RxP-F2.1 and RPxR-BC.1 family.

(A-F) The fraction of emerged adults per generation is shown on a circular plot together with the mean and median vectors. (A) Por-1SL strain. (B) Ros-2FM strain. (C) RxP-F1.1 and RxP-F2.1 generation. (D) RxP-F1.2; three crossing families were raised together and gave rise to RPxR-BC.1. (E) RxP-F2.1 is an F1-24 x F1-24 intercross. (F) RPxR-BC.1 is a backcross of an F1.2 individual to Ros-2FM. (G-H) Kernel density estimates for parental, F1, and F2/BC generations for each of the two mapping families. The RxP-F2.1 crossing family shows a 2-day phase shift as compared to Por-1SL. The RPxR-BC.1 crossing family does not show a considerable phase shift. (I-J) Bar graphs show probabilities of finding sensitive Por-1SL-like (white) or insensitive Ros-2FM-like (black) individuals on each day in the two crossing families RxP-F2.1 (I) and RPxR-BC.1 (J).
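For readers unfamiliar with circular summary statistics, the mean vector of an emergence distribution can be sketched as below. This is a generic textbook calculation and not necessarily the exact procedure used for these plots; the median vector is not covered. Numbering days from 0 and the 15-day period in the usage example are assumptions.

```python
import math
from typing import Sequence, Tuple


def mean_vector(counts: Sequence[float], period: float) -> Tuple[float, float]:
    """Circular mean vector of emergence counts per day of a cycle.

    counts[d] is the number of adults that emerged on day d (0-based).
    Returns (length, mean_day): the length lies between 0 (emergence spread
    evenly over the cycle) and 1 (all emergence on one day); mean_day is the
    direction of the vector expressed in days.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one emerged individual is required")
    cos_sum = sum(n * math.cos(2 * math.pi * day / period)
                  for day, n in enumerate(counts))
    sin_sum = sum(n * math.sin(2 * math.pi * day / period)
                  for day, n in enumerate(counts))
    c, s = cos_sum / total, sin_sum / total
    length = math.hypot(c, s)
    mean_day = (math.atan2(s, c) % (2 * math.pi)) * period / (2 * math.pi)
    return length, mean_day


if __name__ == "__main__":
    # Hypothetical emergence counts over a 15-day cycle, peaking around day 3.
    counts = [0, 2, 10, 25, 12, 3, 0, 0, 0, 0, 0, 0, 0, 0, 1]
    length, mean_day = mean_vector(counts, period=15)
    print(f"vector length = {length:.2f}, mean day = {mean_day:.2f}")
```

A long vector indicates tightly synchronized emergence, whereas a short vector indicates emergence spread across the cycle.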

https://doi.org/10.1371/journal.pgen.1010763.s004

(PDF)

S5 Fig. An expectation-maximization (EM) algorithm.

An EM algorithm was designed to generate optimized binary phenotype panels for a crossing family given the calculated probability of finding insensitive individuals on each experimental day. For a detailed explanation see the S1 Methods/QTL mapping/Expectation-maximization (EM) algorithm paragraph.

https://doi.org/10.1371/journal.pgen.1010763.s005

(PDF)

S6 Fig. Ros-2FMxPor-1SL QTL mapping full analysis.

Complete QTL mapping results for the two crossing families (RxP-F2.1 and RPxR-BC.1) and two datasets each (F = full and R = reduced) are given. (A-D) Bar graphs show the number of emerged individuals per day. The predicted ratio of insensitive (black) and sensitive (white) individuals is plotted. (E-H) LOD scores of interval mapping analysis (scanone), which is designed to detect additive QTLs. Probability phenotypes were used for full datasets and binary phenotypes for reduced datasets. The significance threshold (dashed line) was estimated in 1000 permutations with a 5% cutoff. (I-L) CIM analysis with backward regression method, 5 control markers, and a window size of 10 cM. Threshold values are given in S3 Table. (M-P) LOD scores of scanone on EM-optimized binary phenotypes. Results are shown for panels obtained in at least 5% of the cases in 1000 runs (S3 Table). The significance threshold (dashed line) was estimated in 1000 permutations with a 5% cutoff. (Q-T) LOD scores of significant QTLs in the multiple QTL mapping pipeline (fitqtl). Black lines: additive QTLs, gray lines: QTLs in epistasis. The p-value of the F statistic is marked: * p-value < 0.05; ** p-value < 0.01. Fitqtl statistics are given in S3 Table. (U-AB) To find the best model for multiple QTL mapping (fitqtl) we also tried the qtlbim package. (U-X) We first ran an LPD (Log Posterior Density) scan that uses Bayesian model averaging to explore the most probable models. The most likely additive QTLs are shown in black and the most likely QTLs in epistasis are shown in gray. (Y-AB) The “best” function of the qtlbim package then selects the most probable model. The larger the font size, the larger the posterior probability of the pattern. The 2-D multidimensional scaling (MDS) projection is based on the square of the attenuation. If the loci agree exactly, there is no attenuation. The best model is marked in red. The numbers represent chromosomes.
(U) The best model for the RPxR-BC.1 full dataset contains 3 QTLs and one epistatic interaction on chromosomes 1, 2, 3, and 1:3. (V) The best model for the RPxR-BC.1 reduced dataset is 1, 2. (W) The best model for the RxP-F2.1 full dataset is 2, 3. (X) The best model for RxP-F2.1 reduced dataset is 1, 2.
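The permutation-based significance threshold mentioned for the scanone panels (1000 permutations, 5% cutoff) follows a general recipe that can be sketched as below. The sketch deliberately swaps the LOD score for a much simpler marker statistic, the absolute difference in mean phenotype between two genotype classes, so it illustrates the permutation logic only and is not a reimplementation of scanone. All function names and the toy data are hypothetical.

```python
import random
from typing import List, Sequence


def marker_statistic(genotypes: Sequence[int], phenotypes: Sequence[float]) -> float:
    """Absolute difference in mean phenotype between genotype classes 0 and 1."""
    g0 = [p for g, p in zip(genotypes, phenotypes) if g == 0]
    g1 = [p for g, p in zip(genotypes, phenotypes) if g == 1]
    if not g0 or not g1:
        return 0.0
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))


def genome_wide_max(markers: Sequence[Sequence[int]],
                    phenotypes: Sequence[float]) -> float:
    """Largest marker statistic across all markers."""
    return max(marker_statistic(m, phenotypes) for m in markers)


def permutation_threshold(markers: Sequence[Sequence[int]],
                          phenotypes: Sequence[float],
                          n_perm: int = 1000,
                          alpha: float = 0.05,
                          seed: int = 0) -> float:
    """Approximate (1 - alpha) quantile of the genome-wide maximum under the null.

    Phenotypes are shuffled among individuals, which breaks any genotype to
    phenotype association while keeping the marker structure intact.
    """
    rng = random.Random(seed)
    shuffled: List[float] = list(phenotypes)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        maxima.append(genome_wide_max(markers, shuffled))
    maxima.sort()
    index = min(n_perm - 1, int(round((1 - alpha) * n_perm)))
    return maxima[index]


if __name__ == "__main__":
    # Toy data: marker 0 separates the phenotypes perfectly, marker 1 does not.
    markers = [[0] * 10 + [1] * 10, [0, 1] * 10]
    phenotypes = [0.0] * 10 + [1.0] * 10
    observed = genome_wide_max(markers, phenotypes)
    threshold = permutation_threshold(markers, phenotypes)
    print(f"observed = {observed:.2f}, threshold = {threshold:.2f}")
```

Taking the maximum over all markers in each permutation is what makes the resulting threshold genome-wide rather than per-marker.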

https://doi.org/10.1371/journal.pgen.1010763.s006

(PDF)

S7 Fig. Ros-2FMxPor-1SL QTL mapping: epistatic interactions.

To scan for QTLs in epistasis, we used the scantwo function (rqtl package). Datasets are labeled (left gray; F = Full and R = Reduced) and correspond to S6 and S8 Figs. (A, C, E, G): Scantwo heatmaps for three chromosomes show LODf in the lower right corner that measures the improvement in the fit of the full two-locus model over the null model and indicates the evidence for at least one QTL with allowance for interaction. LODi heatmap is plotted in the upper left corner and measures the improvement in the fit of the full model over that of the additive model, and so indicates evidence for an interaction. Significant QTL epistatic interaction is marked by a red circle. (B, D, F): The epistatic effect for each of the significant interactions on the left is shown. The marker that shows the strongest interaction on each chromosome was selected and its location is marked in gray letters.

https://doi.org/10.1371/journal.pgen.1010763.s007

(PDF)

S8 Fig. Ros-2FMxPor-1SL QTL mapping: intervals and phenotype scores.

Plotted are QTL intervals and phenotype panels for the QTL analysis (S6 Fig). (A, C) QTL intervals: composite interval mapping–orange, scanone–green, fitqtl: additive–black, fitqtl: epistatic–gray, EM-algorithm–blue (see all LOD score profiles in S6 Fig and the exact coordinates of the markers, recombination events and QTL intervals in S3 Table). (B, D) Phenotype panels for the corresponding QTL analysis on the left. The probability of being sensitive (white) or insensitive (black) is shown for each individual. Yellow boxes indicate the individuals that were excluded in the “reduced” dataset because their probability phenotype was between 0.3 and 0.7 (see S1 Methods QTL mapping section and S3 Fig). Numbers on the right indicate for each EM panel in how many out of 1000 runs that panel was found, and the fraction of individuals in each panel that had an error > 0.10 from the original data (see S1 Methods QTL mapping/EM-pipeline, S3 Table). The panel with the highest convergence and lowest error is marked in green.

https://doi.org/10.1371/journal.pgen.1010763.s008

(PDF)

S9 Fig. Testing EM-scanone pipeline by using sex as a known binary phenotype.

We set out to test how well the EM-scanone pipeline identifies binary phenotypic panels from starting probabilities with varying degrees of certainty. We used sex as a known binary phenotype and replaced “0” and “1” to simulate different probability scenarios. A: The insensitivity probability phenotypes in 4 mapping families have varying degrees of certainty. To illustrate their distribution we used the local regression fitting “loess method” of the R package ggplot2. B-C: We used the sex phenotype and genotype matrix of the RxP-F2.1 mapping family, simulated different probability scenarios and ran the EM-scanone pipeline. B left: In models A, B and C, a portion of binary phenotypes was kept fixed: 75%, 50%, and 25% respectively. The remaining 25–75% of individuals were given a probability phenotype drawn from the linear distribution: individuals with phenotype 1 were given a score between 0.99 and 0.5, and those with phenotype 0, values between 0.01 and 0.5. In models D-G there were no certain phenotypes. Individuals with phenotype 1 were given scores between 0.99–0.5 (D), 0.87–0.5 (E), 0.75–0.5 (F); while all individuals that originally had a phenotype 0 gained a score between 0.5–0.01 (D), 0.5–0.125 (E), 0.5–0.25 (F), 0.5–0.375 (G). B right: To model a more realistic distribution, we tested the logistic function y = 1/(1+exp(k*(x-0.5))) using k values of 0.2, 0.3, 0.5, 1, 2 and 5. C: Plotted are the QTL intervals and peak positions of the phenotypic panels found in more than 5% of the 1000 runs. The percentage of convergence and the difference from the true sex phenotype are depicted on the right. In model E 58.8% of the runs failed because the pipeline could not identify a panel that gives a higher scanone LOD score than the starting one. This also indicates that this scenario has too much uncertainty for the EM-pipeline to produce credible results, and it serves as a good additional sanity check.
The scenarios closest to our insensitivity phenotypes (panel A) are the linear models B-D and logistic models LOG0.3 and LOG0.5.
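The logistic function used for the simulated probability scenarios can be evaluated as in the following sketch. The formula and the k values are taken from the caption above; what the input x represents is not specified there, so treating it as a value between 0 and 1 is an assumption made only for this illustration.

```python
import math

# k values listed in the caption.
K_VALUES = (0.2, 0.3, 0.5, 1, 2, 5)


def logistic_score(x: float, k: float) -> float:
    """Evaluate y = 1 / (1 + exp(k * (x - 0.5)))."""
    return 1.0 / (1.0 + math.exp(k * (x - 0.5)))


if __name__ == "__main__":
    # Assumption: x is sampled on a grid between 0 and 1.
    grid = [i / 4 for i in range(5)]
    for k in K_VALUES:
        scores = ", ".join(f"{logistic_score(x, k):.3f}" for x in grid)
        print(f"k = {k}: {scores}")
```

For any k the function passes through 0.5 at x = 0.5, and for positive k it decreases with x; larger k values give a steeper curve and therefore scores that lie further from 0.5.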

https://doi.org/10.1371/journal.pgen.1010763.s009

(PDF)

S10 Fig. Sex locus in Ros-2FMxPor-1SL and Jean-2NMxPor-1SL.

To complement S9 Fig and validate the resolved Jean-2NMxPor-1SL genotype matrix (see S1 Methods: QTL mapping: Informative variants and genotype matrix in Jean-2NM x Por-1SL), we identified the sex locus in the four mapping families: (A) RPxR-BC.1, (B) RxP-F2.1, (C) JxP-F2.1.6, (D) JxP-F2.2.3. The QTL intervals are listed in S10 Table.

https://doi.org/10.1371/journal.pgen.1010763.s010

(PDF)

S11 Fig. BayPass analysis for Ros-2FM dataset.

(A) The Bayesian factor (BFis) showing the strength of the association is plotted against the differentiation measure (XtXst) for all polymorphic variants analyzed by BayPass. 357 significantly associated SNPs and indels (BFis > 20, eBPis > 2, XtXst > 21.67) are depicted in red. (B) Kinship matrix Ω is given as a heatmap showing reconstructed relationships between nine tested populations. (C-D) The effect of 357 associated variants was analyzed by SNPeff. (C) The effects of the variants on the surrounding genes are depicted in a pie chart. (D) The estimated impact of 357 associated variants is represented in a pie chart.
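The three-way cutoff used to call significantly associated variants can be expressed as a simple filter. The thresholds are those given in panel (A); the record layout (one dictionary per variant with keys BFis, eBPis and XtXst) is a hypothetical convenience and does not correspond to the native BayPass output format.

```python
from typing import Dict, Iterable, List


def significant_variants(rows: Iterable[Dict[str, float]],
                         bf_min: float = 20.0,
                         ebp_min: float = 2.0,
                         xtx_min: float = 21.67) -> List[Dict[str, float]]:
    """Keep variants that exceed all three cutoffs (strict inequalities)."""
    return [
        row for row in rows
        if row["BFis"] > bf_min and row["eBPis"] > ebp_min and row["XtXst"] > xtx_min
    ]


if __name__ == "__main__":
    # Hypothetical variants for illustration.
    variants = [
        {"BFis": 31.2, "eBPis": 3.4, "XtXst": 25.0},   # passes all cutoffs
        {"BFis": 31.2, "eBPis": 1.1, "XtXst": 25.0},   # fails eBPis
        {"BFis": 12.0, "eBPis": 3.4, "XtXst": 25.0},   # fails BFis
    ]
    print(len(significant_variants(variants)))
```

The XtXst cutoff is exposed as a parameter because it differs between datasets: S19 Fig reports 20.02 for the Jean-2NM dataset.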

https://doi.org/10.1371/journal.pgen.1010763.s011

(PDF)

S12 Fig. SNPs associated with the loss of sensitivity to tidal turbulence in Ros-2FM.

QTL mapping and association mapping were performed to determine the most likely causative mutations underlying the phenotypic loss in Ros-2FM. Nine loci were identified (Fig 3). Although the STAT1 locus was the most likely causal one (Fig 3C), we investigated all other polymorphisms underlying the QTLs on the second and third chromosomes. Panels A-H show for each of these loci: genes affected by the associated SNPs (blue gene models), association score (BFis), genomic differentiation (Fst) between Ros-2FM and Ros-2NM, and selective-sweep analysis (normalized XP-nSL). Phylogenetic trees of all 15 candidate genes (blue gene models) can be seen in S13 and S14 Figs.

https://doi.org/10.1371/journal.pgen.1010763.s012

(PDF)

S13 Fig. Phylogenetic trees of all the potential candidate genes on the second chromosome.

Species are color-coded and represented by a pictogram next to the gene names: outgroup–gray, Caenorhabditis elegans–green, Drosophila melanogaster–orange, Anopheles gambiae–brown, Mus musculus–light blue, Homo sapiens–dark blue, Clunio marinus candidate gene–red, Clunio marinus other orthologs of the candidate gene–black. Bootstrap values are written above each node. The estimated distance is given below each tree. (A) Protein kinase regulatory subunits. (B) Heparan sulfate proteoglycan Perlecan / Terribly reduced optic lobe. (C) Alan shepard. (D) Signal transducer and transcription activator. (E) Unnamed gravitaxis gene. (F) Lon protease mitochondrial.

https://doi.org/10.1371/journal.pgen.1010763.s013

(PDF)

S14 Fig. Phylogenetic trees of all the potential candidate genes on the third chromosome.

Species are color-coded and represented by a pictogram next to the gene names: outgroup–gray, Caenorhabditis elegans–green, Drosophila melanogaster–orange, Anopheles gambiae–brown, Mus musculus–light blue, Homo sapiens–dark blue, Clunio marinus candidate gene–red, Clunio marinus other orthologs of the candidate gene–black. Bootstrap values are written above each node. The estimated distance is given below each tree. (A) Snail-like family of transcription factors. (B) O-acyltransferase family. (C) ATP-dependent RNA helicase me31b. (D) Membralin. (E) Globin family. (F) tRNA splicing endonuclease subunit 34 (Tsen34). (G) Unknown protein. (H) Chico, insulin receptor substrate.

https://doi.org/10.1371/journal.pgen.1010763.s014

(PDF)

S15 Fig. GO term enrichment of genes associated with the loss of sensitivity to tidal turbulence in Ros-2FM.

BayPass and SNPeff were used to identify 178 genes associated with the loss of sensitivity to tidal turbulence in Ros-2FM. (A) The 51 genes that drive 78 significant GO terms are depicted. Hierarchical clustering of genes and GO terms reveals major clusters of GO terms, which are color-coded and listed in panel (B): yellow = Reproduction, orange = Development & Morphogenesis, brown = Protein & organelle localization, pink = Nervous system, purple = Sensory system, light green = Signaling, dark blue = Circadian, light blue = Metabolic process, dark green = Response, gray = Behavior. (C) Several GO terms related to the sensory nervous system and potentially involved in mechanosensory entrainment are given in the table together with the corresponding genes. (D) The Venn diagram shows the number of genes that went into the GO term enrichment analysis.

https://doi.org/10.1371/journal.pgen.1010763.s015

(PDF)

S16 Fig. Calculating probability phenotypes for JxP-F2.1.6 and JxP-F2.2.3.

(A-F) The fraction of emerged adults and the mean (red) and median (blue) vectors are plotted. (A) Por-1SL strain. (B) Jean-2NM strain. (C-D) F1 progenies of the two mapping families. (E-F) F2 progenies of the two mapping families. (G-H) Kernel density estimates for parental, F1, and F2 generations for each of the two mapping families. Both F2 progenies show a phase shift as compared to the parental Por-1SL strain. (I-J) Bar graphs depict probabilities of finding sensitive Por-1SL-like (white) or insensitive Jean-2NM-like (black) individuals on each day in the two crossing families JxP-F2.1.6 (I) and JxP-F2.2.3 (J).

https://doi.org/10.1371/journal.pgen.1010763.s016

(PDF)

S17 Fig. Jean-2NMxPor-1SL QTL mapping analysis.

Complete QTL mapping results for the two crossing families: JxP-F2.1.6 and JxP-F2.2.3. (A, H) Bar graphs show the number of emerged individuals per day. The predicted ratio of insensitive (black) and sensitive (white) individuals is plotted. (B, I) LOD scores of interval mapping analysis (scanone), which is designed to detect additive QTLs. The significance threshold (dashed line) was estimated in 1000 permutations with a 5% cutoff. (C, J) CIM analysis with backward regression method, 5 control markers, and a window size of 10 cM. Threshold values are given in S8 Table. (D, K) LOD scores of scanone on EM-optimized binary phenotypes. Results are shown for panels obtained in at least 5% of the cases in 1000 runs (S8 Table). The significance threshold (dashed line) was estimated in 1000 permutations with a 5% cutoff. (E, L) LOD scores of significant QTLs in the multiple QTL mapping pipeline (fitqtl). Black lines: additive QTLs, gray lines: QTLs in epistasis. The p-value of the F statistic is marked: * p-value < 0.05; ** p-value < 0.01. Fitqtl statistics are given in S8 Table. (F, M) Confidence intervals for the full QTL analysis. QTL intervals: composite interval mapping–orange, scanone–green, fitqtl: additive–black, fitqtl: epistatic–gray, EM-algorithm–blue (exact coordinates of the markers in S8 Table). (G, N) Phenotype panels for the corresponding QTL analysis. The probability of being sensitive (white) or insensitive (black) is shown for each individual. Numbers on the right indicate for each EM panel in how many out of 1000 runs that panel was found, and the fraction of individuals in each panel that had an error > 0.10 from the original data (see methods QTL mapping/EM-pipeline, S8 Table). The panel with the highest convergence and lowest error is marked in green.

https://doi.org/10.1371/journal.pgen.1010763.s017

(PDF)

S18 Fig. Probability of finding sensitive and insensitive individuals in several Jean-2NMxPor-1SL intercross families.

(A-E left and middle) The fraction of emerged adults per generation is shown on a circular plot together with the mean and median vector. (A) Jean-2NM parental strain. (B) Por-1SL parental strain. (B-E left) Independent F1 progenies of the four intercross families JxP-F2.1–4. (B-E middle) Combined emergence of several F2 families of four intercross families JxP-F2.1–4. The number of individuals is given in S2 Table. (B-E right) Bar graphs show probabilities of finding sensitive Por-1SL-like (white) or insensitive Jean-2NM-like (black) individuals on each day in the four intercross families.

https://doi.org/10.1371/journal.pgen.1010763.s018

(PDF)

S19 Fig. BayPass analysis for Jean-2NM dataset.

Association analysis was performed to find mutations associated with the loss of sensitivity to tidal turbulence in the Jean-2NM population. (A) Median vector length was used as a proxy for sensitivity to this cue (S1 and S5 Tables; solid lines with arrows; values outside the circle). (B) Association analysis for median vector length with 769,379 SNPs and small indels. The Bayesian factor (BFis) is plotted for each variant along the three chromosomes. (C) We found 173 significantly associated SNPs and indels (BFis > 20, eBPis > 2, XtXst > 20.02; see Methods section for details), marked in red. A list of effects and genes affected by mutations is given in S9 Table. (D) Kinship matrix Ω is given as a heatmap showing reconstructed relationships between nine tested populations. (E-F) The effect of 173 associated variants was analyzed by SNPeff. (E) The effects of the variants on the surrounding genes are depicted in a pie chart. (F) The estimated impact of 173 associated variants is represented in a pie chart.

https://doi.org/10.1371/journal.pgen.1010763.s019

(PDF)

S1 Table. The origin of Clunio populations and their entrainment to tidal turbulence.

https://doi.org/10.1371/journal.pgen.1010763.s020

(XLSX)

S4 Table. Whole genome sequencing details and sampling sites.

https://doi.org/10.1371/journal.pgen.1010763.s023

(XLSX)

S6 Table. BayPass results—Insensitivity to tidal turbulence in Ros-2FM.

https://doi.org/10.1371/journal.pgen.1010763.s025

(XLSX)

S7 Table. Insensitivity to tidal turbulence in Ros-2FM–Gravitaxis.

https://doi.org/10.1371/journal.pgen.1010763.s026

(XLSX)

S9 Table. BayPass results—Insensitivity to tidal turbulence in Jean-2NM.

https://doi.org/10.1371/journal.pgen.1010763.s028

(XLSX)

S10 Table. RAD sequencing primers and adapters.

https://doi.org/10.1371/journal.pgen.1010763.s029

(XLSX)

S11 Table. Informative variants for QTL mapping.

https://doi.org/10.1371/journal.pgen.1010763.s030

(XLSX)

Acknowledgments

Kerstin Schäfer assisted with RAD sequencing library preparations and Jürgen Reunert provided animal care. We thank the members of the research group Biological Clocks for their feedback for the entire duration of the project. We also thank Diethard Tautz, Guy Reeves, and Miriam Liedvogel for their feedback in the process of writing the manuscript.

References

  1. 1. Pittendrigh SC. Circadian Rhythms and the Circadian Organization of Living Systems. Cold Spring Harb Symp Quant Biol. 1960;25: 159–184. pmid:13736116
  2. 2. Dunlap JC, Loros JJ. Making Time: Conservation of Biological Clocks from Fungi to Animals. Microbiol Spectr. 2017;5. pmid:28527179
  3. 3. Wager-Smith K, Kay SA. Circadian rhythm genetics: from flies to mice to humans. Nat Genet. 2000;26: 23–27. pmid:10973243
  4. 4. Takahashi JS. Transcriptional architecture of the mammalian circadian clock. Nat Rev Genet. 2017;18: 164–179. pmid:27990019
  5. 5. Kaiser TS, Neumann J. Circalunar clocks—Old experiments for a new era. BioEssays. 2021;43: 2100074. pmid:34050958
  6. 6. Goto SG, Takekata H. Circatidal rhythm and the veiled clockwork. Curr Opin Insect Sci. 2015;7: 92–97. pmid:32846692
  7. 7. Andreatta G, Tessmar-Raible K. The Still Dark Side of the Moon: Molecular Mechanisms of Lunar-Controlled Rhythms and Clocks. Journal of Molecular Biology. Academic Press; 2020. pp. 3525–3546. pmid:32198116
  8. 8. Raible F, Takekata H, Tessmar-Raible K. An overview of monthly rhythms and clocks. Frontiers in Neurology. Frontiers Research Foundation; 2017. pmid:28553258
  9. 9. Zhang L, Hastings MH, Green EW, Tauber E, Sladek M, Webster SG, et al. Dissociation of circadian and circatidal timekeeping in the marine crustacean eurydice pulchra. Current Biology. 2013;23: 1863–1873. pmid:24076244
  10. 10. Zantke J, Ishikawa-Fujiwara T, Arboleda E, Lohs C, Schipany K, Hallay N, et al. Circadian and Circalunar Clock Interactions in a Marine Annelid. Cell Rep. 2013;5: 99–113. pmid:24075994
  11. 11. Wilcockson D, Zhang L. Circatidal clocks. Current Biology. 2008;18: R753–R755. pmid:18786379
  12. 12. (Ernest) Naylor E. Chronobiology of marine organisms. Cambridge University Press; 2010.
  13. 13. López-Olmeda JF, Madrid JA, Sánchez-Vázquez FJ. Light and temperature cycles as zeitgebers of zebrafish (Danio rerio) circadian activity rhythms. Chronobiol Int. 2006;23: 537–550. pmid:16753940
  14. 14. Simoni A, Wolfgang W, Topping MP, Kavlie RG, Stanewsky R, Albert JT. A mechanosensory pathway to the drosophila circadian clock. Science (1979). 2014;343: 525–528. pmid:24482478
  15. 15. Caldart CS, Carpaneto A, Golombek DA. Synchronization of circadian locomotor activity behavior in Caernorhabditis elegans: Interactions between light and temperature. J Photochem Photobiol B. 2020;211. pmid:32919174
  16. 16. Liu Y, Merrow M, Loros JJ, Dunlap JC. How temperature changes reset a circadian oscillator. Science. 1998;281: 825–829. pmid:9694654
  17. 17. Enright JT. Entrainment of a tidal rhythm. Science (1979). 1965;147: 864–867. pmid:17793561
  18. 18. Jones DA, Naylor E. The swimming rhythm of the sand beach isopod Eurydice pulchra. J Exp Mar Biol Ecol. 1970;4: 188–199.
  19. 19. Hastings MH. The entraining effect of turbulence on the circa-tidal activity rhythm and its semi-lunar modulation in Eurydice pulchra. Journal of the Marine Biological Association of the United Kingdom. 1981;61: 151–160.
  20. 20. Gibson RN. Factors affecting the rhythmic activity of Blennius pholis L.(Teleostei). Anim Behav. 1971;19: 336–343. pmid:5150479
  21. 21. Northcott SJ. A comparison of circatidal rhythmicity and entrainment by hydrostatic pressure cycles in the rock goby, Gobius paganellus L. and the shanny, Lipophrys pholis (L.). Journal ofFisll Biology. 1991.
  22. 22. Williams BG, Naylor E. Synchronization of the locomotor tidal rhythm of Carcinus. Journal of Experimental Biology. 1969;51: 715–725.
  23. 23. Holmstrom WF, Morgan E. Laboratory entrainaient of the rhythmic swimming activity of Corophium volutator (Pallas) to cycles of temperature and periodic inundation. J mar biol Ass UK. 1983;63: 861–870.
  24. 24. Taylor AC, Naylor E. Entrainment of the locomotor rhythm of Carcinus by cycles of salinity change. journal of the Marine Biological Association of the United Kingdom. 1977;57: 273–277.
  25. 25. Hauenschild C. Lunar Periodicity. Cold Spring Harb Symp Quant Biol. 1960;25: 491–497. pmid:13712278
  26. 26. Bunning E, Müller D. Wie messen Organismen lunare Zyklen? Zeitschrift für Naturforschung B. 1961;16: 391–395.
  27. 27. Neumann D. Die Lunare und Tägliche Schlüpfperiodik Der Mücke Clunio Steuerung und Abstimmung auf Die Gezeitenperiodik. Zeitschrift fiir vergleichende Physiologie. 1966;53: 1–66.
  28. 28. Saigusa M. Entrainment of a semilunar rhythm by a simulated moonlight cycle in the terrestrial crab, Sesarma haematocheir. Oecologia 1980 46:1. 1980;46: 38–44. pmid:28310623
  29. 29. Franke H-D. On a clocklike mechanism timing lunar-rhythmic reproduction inTyposyllis prolifera (Polychaeta). Journal of Comparative Physiology A 1985 156:4. 1985;156: 553–561.
  30. 30. Reid DG, Naylor E. Free-running, endogenous semilunar rhythmicity in a marine isopod crustacean. Journal of the Marine Biological Association of the United Kingdom. 1985;65: 85–91.
  31. 31. Neumann D. Entrainment Of A Semilunar Rhythm By Simulated Tidal Cycles Of Mechanical Disturbance. J.exp.marBiolEcol. 1978;35: 73–85.
  32. 32. Neumann D, Heumbach F. TIme Cues for Semilunar Reproduction Rhythms in European Populations of Clunio marinus. II. The Influence of TIdal Temperature Cycles. Biological Bulletin. 1984;166: 509–524.
  33. 33. Neumann D. Genetic adaptation in emergence time of Clunio populations to different tidal conditions. 1967.
  34. 34. Kaiser TS. Local Adaptations of Circalunar and Circadian Clocks: The Case of Clunio marinus. Annual, Lunar, and Tidal Clocks. Springer Japan; 2014. pp. 121–141.
  35. 35. Kaiser TS, von Haeseler A, Tessmar-Raible K, Heckel DG. Timing strains of the marine insect Clunio marinus diverged and persist with gene flow. Mol Ecol. 2021;00: 1–17. pmid:33410230
  36. 36. Kaiser TS, Neumann D, Heckel DG, Berendonk TU. Strong genetic differentiation and postglacial origin of populations in the marine midge Clunio marinus (Chironomidae, Diptera). Mol Ecol. 2010;19: 2845–2857. pmid:20584134
  37. 37. Kaiser TS, Neumann D, Heckel DG. Timing the tides: genetic control of diurnal and lunar emergence times is correlated in the marine midge Clunio marinus. BMC Genet. 2011;12. pmid:21599938
  38. 38. Neumann D, Heimbach F. Time Cues for Semilunar Reproduction Rhythms in European Populations of Clunio Marinus. I. The Influence of Tidal Cycles of Mechanical Disturbance. 1978.
  39. 39. Albalat R, Cañestro C. Evolution by gene loss. Nat Rev Genet. 2016;17: 379–391. pmid:27087500
  40. 40. Monroe JG, McKay JK, Weigel D, Flood PJ. The population genomics of adaptive loss of function. Heredity 2021 126:3. 2021;126: 383–395. pmid:33574599
  41. 41. Kaiser TS, Poehn B, Szkiba D, Preussner M, Sedlazeck FJ, Zrim A, et al. The genomic basis of circadian and circalunar timing adaptations in a midge. Nature. 2016;540: 69–73. pmid:27871090
  42. 42. Kaiser TS, Heckel DG. Genetic architecture of local adaptation in lunar and diurnal emergence times of the marine midge clunio marinus (chironomidae, diptera). PLoS One. 2012;7. pmid:22384150
  43. 43. Michailova P. Comparative External Morphological and Karyological Characteristics of European Species of Genus Clunio Haliday, 1855 (Diptera, Chironomidae). Chironomidae. 1980; 9–15.
  44. 44. Briševac D, Peralta CM, Kaiser TS. An oligogenic architecture underlying ecological and reproductive divergence in sympatric populations. bioRxiv. 2022; 2022.08.30.505825.
  45. 45. Cockerham CC. An Extension of the Concept of Partitioning Hereditary Variance for Analysis of Covariances among Relatives When Epistasis Is Present. Genetics. 1954;39: 859–882. pmid:17247525
  46. 46. Szpiech ZA, Hernandez RD. Selective Sweeps. Encyclopedia of Evolutionary Biology. 2016; 23–32.
  47. 47. Ferrer-Admetlla A, Liang M, Korneliussen T, Nielsen R. On Detecting Incomplete Soft or Hard Selective Sweeps Using Haplotype Structure. Mol Biol Evol. 2014;31: 1275–1291. pmid:24554778
  48. 48. DeGiorgio M, Szpiech ZA. A spatially aware likelihood test to detect sweeps from haplotype distributions. bioRxiv. 2021; 2021.05.12.443825.
  49. 49. Armstrong JD, Texada MJ, Munjaal R, Baker DA, Beckingham KM. Gravitaxis in Drosophila melanogaster: A forward genetic screen. Genes Brain Behav. 2006;5: 222–239. pmid:16594976
  50. 50. Pinti M, Gibellini L, Nasi M, de Biasi S, Bortolotti CA, Iannone A, et al. Emerging role of Lon protease as a master regulator of mitochondrial functions. Biochimica et Biophysica Acta (BBA)—Bioenergetics. 2016;1857: 1300–1306. pmid:27033304
  51. 51. Sifuentes-Romero I, Ferrufino E, Thakur S, Laboissonniere LA, Solomon M, Smith CL, et al. Repeated evolution of eye loss in Mexican cavefish: Evidence of similar developmental mechanisms in independently evolved populations. J Exp Zool B Mol Dev Evol. 2020;334: 423–437. pmid:32614138
  52. 52. Sackton TB, Grayson P, Cloutier A, Hu Z, Liu JS, Wheeler NE, et al. Convergent regulatory evolution and loss of flight in paleognathous birds. Science (1979). 2019;364: 74–78. pmid:30948549
  53. 53. Briševac D, Peralta CM, Kaiser TS. An oligogenic architecture underlying ecological and reproductive divergence in sympatric populations. Elife. 2023;12. pmid:36852479
  54. 54. Ernstrom GG, Chalfie M. Genetics of sensory mechanotransduction. Annual Review of Genetics. 2002. pp. 411–453. pmid:12429699
  55. 55. Roscito JG, Sameith K, Parra G, Langer BE, Petzold A, Moebius C, et al. Phenotype loss is associated with widespread divergence of the gene regulatory landscape in evolution. Nature Communications 2018 9:1. 2018;9: 1–15. pmid:30413698
  56. Sun Y, Liu L, Ben-Shahar Y, Jacobs JS, Eberl DF, Welsh MJ. TRPA channels distinguish gravity sensing from hearing in Johnston’s organ. Proc Natl Acad Sci U S A. 2009;106: 13606–13611. pmid:19666538
  57. Chen D, Qu C, Hewes RS. Neuronal remodeling during metamorphosis is regulated by the alan shepard (shep) gene in Drosophila melanogaster. Genetics. 2014;197: 1267–1283. pmid:24931409
  58. Olesnicky EC, Antonacci S, Popitsch N, Lybecker MC, Titus MB, Valadez R, et al. Shep interacts with posttranscriptional regulators to control dendrite morphogenesis in sensory neurons. Dev Biol. 2018;444: 116–128. pmid:30352216
  59. Ashraf SI, Hu X, Roote J, Ip YT. The mesoderm determinant Snail collaborates with related zinc-finger proteins to control Drosophila neurogenesis. EMBO Journal. 1999;18: 6426–6438. pmid:10562554
  60. Kamikouchi A, Inagaki HK, Effertz T, Hendrich O, Fiala A, Göpfert MC, et al. The neural basis of Drosophila gravity-sensing and hearing. Nature. 2009;458: 165–171. pmid:19279630
  61. Ishikawa Y, Fujiwara M, Wong J, Ura A, Kamikouchi A. Stereotyped Combination of Hearing and Wind/Gravity-Sensing Neurons in the Johnston’s Organ of Drosophila. Front Physiol. 2020;10. pmid:31969834
  62. Wang Y, Levy DE. Comparative evolutionary genomics of the STAT family of transcription factors. JAKSTAT. 2012;1: 23–36. pmid:24058748
  63. Liongue C, Ward AC. Evolution of the JAK-STAT pathway. JAKSTAT. 2013;2: e22756. pmid:24058787
  64. Barillas-Mury C, Han YS, Seeley D, Kafatos FC. Anopheles gambiae Ag-STAT, a new insect member of the STAT family, is activated in response to bacterial infection. EMBO Journal. 1999;18: 959–967. pmid:10022838
  65. Gupta L, Molina-Cruz A, Kumar S, Rodrigues J, Dixit R, Zamora RE, et al. The STAT Pathway Mediates Late-Phase Immunity against Plasmodium in the Mosquito Anopheles gambiae. Cell Host Microbe. 2009;5: 498–507. pmid:19454353
  66. Souza-Neto JA, Sim S, Dimopoulos G. An evolutionary conserved function of the JAK-STAT pathway in anti-dengue defense. Proc Natl Acad Sci U S A. 2009;106: 17841–17846. pmid:19805194
  67. Malsch P, Andratsch M, Vogl C, Link AS, Alzheimer C, Brierley SM, et al. Deletion of interleukin-6 signal transducer gp130 in small sensory neurons attenuates mechanonociception and down-regulates TRPA1 expression. Journal of Neuroscience. 2014;34: 9845–9856. pmid:25057188
  68. Liang J, Wang D, Renaud G, Wolfsberg TG, Wilson AF, Burgess SM. The stat3/socs3a pathway is a key regulator of hair cell regeneration in zebrafish. Journal of Neuroscience. 2012;32: 10662–10673. pmid:22855815
  69. Lammerding J, Kamm RD, Lee RT. Mechanotransduction in cardiac myocytes. Annals of the New York Academy of Sciences. New York Academy of Sciences; 2004. pp. 53–70. pmid:15201149
  70. Millward-Sadler SJ, Khan NS, Bracher MG, Wright MO, Salter DM. Roles for the interleukin-4 receptor and associated JAK/STAT proteins in human articular chondrocyte mechanotransduction. Osteoarthritis Cartilage. 2006;14: 991–1001. pmid:16682236
  71. Shah SK, Fogle LN, Aroom KR, Gill BS, Moore-Olufemi SD, Jimenez F, et al. Hydrostatic intestinal edema induced signaling pathways: Potential role of mechanical forces. Surgery. 2010;147: 772–779. pmid:20097396
  72. de Andrés MC, Imagawa K, Hashimoto K, Gonzalez A, Goldring MB, Roach HI, et al. Suppressors of cytokine signalling (SOCS) are reduced in osteoarthritis. Biochem Biophys Res Commun. 2011;407: 54–59. pmid:21352802
  73. Busch-Dienstfertig M, González-Rodríguez S. IL-4, JAK-STAT signaling, and pain. JAKSTAT. 2013;2: e27638. pmid:24470980
  74. Kunnen SJ, Malas TB, Semeins CM, Bakker AD, Peters DJM. Comprehensive transcriptome analysis of fluid shear stress altered gene expression in renal epithelial cells. J Cell Physiol. 2018;233: 3615–3628. pmid:29044509
  75. Fuhrmann N, Prakash C, Kaiser TS. Polygenic adaptation from standing genetic variation allows rapid ecotype formation. Elife. 2023;12. pmid:36852484
  76. Reineke A, Karlovsky P, Zebitz CPW. Preparation and purification of DNA from insects for AFLP analysis. Insect Mol Biol. 1998;7: 95–99. pmid:9459433
  77. Etter PD, Bassham S, Hohenlohe PA, Johnson EA, Cresko WA. SNP discovery and genotyping for evolutionary genetics using RAD sequencing. Methods Mol Biol. 2011;772: 157–178. pmid:22065437
  78. Etter PD, Johnson E. RAD paired-end sequencing for local de novo assembly and SNP discovery in non-model organisms. Methods in Molecular Biology. 2012;888: 135–151. pmid:22665280
  79. Baird NA, Etter PD, Atwood TS, Currey MC, Shiver AL, Lewis ZA, et al. Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS One. 2008;3. pmid:18852878
  80. Bolger AM, Lohse M, Usadel B. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30: 2114–2120. pmid:24695404
  81. Zhang J, Kobert K, Flouri T, Stamatakis A. PEAR: A fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics. 2014;30: 614–620. pmid:24142950
  82. Sedlazeck FJ, Rescheneder P, von Haeseler A. NextGenMap: Fast and accurate read mapping in highly polymorphic genomes. Bioinformatics. 2013;29: 2790–2791. pmid:23975764
  83. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25: 2078–2079. pmid:19505943
  84. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The genome analysis toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20: 1297–1303. pmid:20644199
  85. Broman KW, Sen S. A Guide to QTL Mapping with R/qtl. Springer; 2009. https://doi.org/10.1007/978-0-387-92125-9
  86. Wang S, Basten CJ, Zeng Z-B. Windows QTL Cartographer 2.5. Department of Statistics, North Carolina State University, Raleigh, NC; 2012. Available: http://statgen.ncsu.edu/qtlcart/WQTLCart.htm
  87. Yandell BS, Mehta T, Banerjee S, Shriner D, Venkataraman R, Moon JY, et al. R/qtlbim: QTL with Bayesian Interval Mapping in experimental crosses. Bioinformatics. 2007;23: 641–643. pmid:17237038
  88. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25: 1754–1760. pmid:19451168
  89. Gautier M. Genome-wide scan for adaptive divergence and association with population-specific covariates. Genetics. 2015;201: 1555–1579. pmid:26482796
  90. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27: 2156–2158. pmid:21653522
  91. Knaus BJ, Grünwald NJ. vcfr: a package to manipulate and visualize variant call format data in R. Molecular Ecology Resources. Blackwell Publishing Ltd; 2017. pp. 44–53. pmid:27401132
  92. Lee E, Helt GA, Reese JT, Munoz-Torres MC, Childers CP, Buels RM, et al. Web Apollo: A web-based genomic annotation editing platform. Genome Biol. 2013;14: 1–13.
  93. Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin). 2012;6: 80–92. pmid:22728672
  94. Huerta-Cepas J, Szklarczyk D, Heller D, Hernández-Plaza A, Forslund SK, Cook H, et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 2019;47: D309–D314. pmid:30418610
  95. Szpiech ZA, Novak TE, Bailey NP, Stevison LS. Application of a novel haplotype-based scan for local adaptation to study high-altitude adaptation in rhesus macaques. Evol Lett. 2021;5: 408–421. pmid:34367665
  96. Fuhrmann N, Prakash C, Kaiser TS. Polygenic adaptation from standing genetic variation allows rapid ecotype formation.
  97. Huerta-Cepas J, Forslund K, Coelho LP, Szklarczyk D, Jensen LJ, von Mering C, et al. Fast Genome-Wide Functional Annotation through Orthology Assignment by eggNOG-Mapper. Mol Biol Evol. 2017;34: 2115–2122. pmid:28460117
  98. Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat Methods. 2014;12: 59–60. pmid:25402007
  99. Alexa A, Rahnenfuhrer J. topGO: Enrichment Analysis for Gene Ontology. R package; 2022.