Widespread selection and gene flow shape the genomic landscape during a radiation of monkeyflowers

Speciation genomic studies aim to interpret patterns of genome-wide variation in light of the processes that give rise to new species. However, interpreting the genomic “landscape” of speciation is difficult, because many evolutionary processes can impact levels of variation. Facilitated by the first chromosome-level assembly for the group, we use whole-genome sequencing and simulations to shed light on the processes that have shaped the genomic landscape during a radiation of monkeyflowers. After inferring the phylogenetic relationships among the 9 taxa in this radiation, we show that highly similar diversity (π) and differentiation (FST) landscapes have emerged across the group. Variation in these landscapes was strongly predicted by the local density of functional elements and the recombination rate, suggesting that the landscapes have been shaped by widespread natural selection. Using the varying divergence times between pairs of taxa, we show that the correlations between FST and genome features arose almost immediately after a population split and have become stronger over time. Simulations of genomic landscape evolution suggest that background selection (BGS; i.e., selection against deleterious mutations) alone is too subtle to generate the observed patterns, but scenarios that involve positive selection and genetic incompatibilities are plausible alternative explanations. Finally, tests for introgression among these taxa reveal widespread evidence of heterogeneous selection against gene flow during this radiation. Combined with previous evidence for adaptation in this system, we conclude that the correlation in FST among these taxa informs us about the processes contributing to adaptation and speciation during a rapid radiation.


Introduction
The primary goal of speciation genomics is to interpret patterns of genome-wide variation in light of the ecological and evolutionary processes that contribute to the origin of new species [1][2][3].Advances in DNA sequencing now allow us to capture patterns of genome-wide variation from organisms across the tree of life, but inferring the processes underlying these patterns remains a formidable challenge [2].This is because speciation is highly complex, involving a range of factors and processes that shape genomes through time and across different spatial and ecological settings [4].
The difficulty of inferring process from pattern is illustrated by recent efforts to characterize the genomic basis of reproductive isolation using patterns of genome-wide variation [1][2][3].Numerous studies have revealed highly heterogeneous differentiation "landscapes" between pairs of taxa at different stages in the speciation process [5][6][7][8][9][10][11][12][13].This pattern, which is characterized by peaks and valleys of relative differentiation (i.e., F ST ) across the genome, was initially thought to provide insight into the genomic architecture of porous species boundaries [14].Peaks in the differentiation landscape were interpreted as genomic regions containing loci underlying reproductive barriers, whereas valleys were thought to reflect regions that were homogenized by ongoing gene flow [6,15,16].However, as the field of speciation genomics has matured, it has become clear that heterogeneous differentiation landscapes can be influenced by factors that have nothing to do with speciation per se.
For example, it is now clear that levels of genome-wide differentiation (F ST ) can be influenced by the genomic distribution of intrinsic properties, including the recombination rate and the local density of functional sites [17].This is because these properties affect the way that natural selection impacts levels of variation across the genome.First, regions enriched for functional sequence are more likely to be subject to selection, because they provide a larger target size for mutations with fitness effects.Second, when positive or negative directional selection acts on these mutations, it can indirectly reduce levels of genetic variation at statistically associated sites [17][18][19][20][21][22][23].Because these indirect effects of selection are mediated by linkage disequilibrium (LD) between selected and neutral sites, stronger reductions in diversity (and corresponding increases in F ST ) are expected in genomic regions with low recombination, because this is where LD breaks down most gradually [24].This is also why these indirect effects of selection are referred to as "linked selection," even though physical linkage is not actually the cause.In this paper, we refer to the indirect effects of selection as "indirect selection" for short.
Although genomic features were not initially expected to play a major role in shaping patterns of between-species variation [25,26] (but see [20,22]), recent empirical studies indicate that they can have a strong impact on the topography of the differentiation landscape [1,17,[27][28][29].The most compelling evidence comes from studies that compare not one pair of taxa but several closely related taxa that share a similar distribution of genomic elements due to their recent common ancestry [1,28,29].For example, Burri and colleagues [30] examined multiple episodes of divergence in Ficedula flycatchers and found strikingly similar differentiation landscapes between distinct pairs of taxa, presumably due to a shared pattern of indirect selection across the genome.Highly correlated differentiation landscapes have also been found in other groups of taxa, including sunflowers [12], Heliconius butterflies [10,31], Darwin's finches [32], and other birds [33,34].
But what, if anything, do these parallel signatures of selection reveal about the genomic basis of adaptation and speciation?Part of the answer lies in understanding which forms of selection cause these patterns.In a recent article, Burri [29] argued that selection against deleterious mutations (i.e., background selection [BGS]) is primarily responsible for the evolution of differentiation landscapes that are correlated both among taxa and with the distribution of intrinsic features.This argument was based on 2 premises: first, deleterious mutations are far more common than beneficial ones, so BGS has a greater opportunity to generate a genomewide correlation between levels of variation and the distributions of intrinsic properties; second, although all functional elements are potential targets of BGS, the same loci are not expected to be repeatedly involved in adaptation or speciation across multiple taxa [29].Although this interpretation implies that correlated differentiation landscapes are unlikely to inform us about adaptation or speciation directly, it has been argued that this shared pattern of differentiation can be used to control for the effects of BGS when attempting to identify regions of the genome that have been affected by positive selection [29].
Even though it is often argued that correlated differentiation landscapes reflect the action of recurrent BGS, it stands to reason that they also could arise as a direct consequence of adaptation and speciation.For example, when adaptation occurs from standing variation, the rate of adaptation is not limited by the mutation rate, meaning that heterogeneous differentiation landscapes can evolve rapidly [35,36].Moreover, if adaptation is highly polygenic, positive selection will inevitably impact most of the genome [37].This could cause levels of differentiation to become correlated with the distribution of intrinsic genomic features, even across multiple taxa that are adapting to different environments.Also, correlated differentiation landscapes may arise across multiple closely related taxa due to a common basis of reproductive isolation among taxa.This is because incomplete isolating barriers generate a heterogeneous pattern of selection against gene flow across the genome.The loci underlying these porous barriers, including genetic incompatibilities and locally adapted alleles, are expected to accumulate in gene rich regions and will have stronger barrier effects in genomic regions in which the recombination rate is low [38,39].Thus, speciation also may result in genome-wide correlations between levels of differentiation and genome features, especially if reproductive isolation is highly polygenic.
Facilitated by a new chromosome-level genome assembly, genetic map, and annotation, we combine analyses of whole-genome sequencing with simulations to understand how different processes have contributed to the evolution of correlated genomic landscapes during a recent radiation.The bush monkeyflower radiation consists of 8 taxa of Mimulus aurantiacus distributed mainly throughout California (Fig 1; [40]).Together with their sister species M. clevelandii, they span a range of divergence times over the past approximately 1 million years.The plants inhabit a range of environments, including temperate coastal regions, mountain ranges, semi-arid habitats, and offshore islands [41].Crossing experiments have shown that all of these taxa are at least partially interfertile [42], and many hybridize in nature in narrow regions where their distributions overlap [43][44][45][46][47].Despite their close evolutionary relationships and opportunities for gene flow, these taxa show striking phenotypic differentiation [40].The most conspicuous trait differences are associated with their flowers, which show heritable variation in color, size, shape, and the placement of the reproductive organs [45,47].In his seminal works on plant speciation, Grant [48][49][50] postulated that these floral trait differences were due to pollinator-mediated selection by different avian and insect pollinators.Detailed studies in one pair of taxa support this hypothesis and have shown that pollinator-mediated selection can generate strong premating reproductive isolation in the face of extensive gene flow [43][44][45][46][47].
After inferring the phylogenetic relationships among these taxa, we show that highly similar diversity (π) and differentiation (F ST ) landscapes have emerged across the group.Variation in these landscapes was strongly predicted by the local density of functional elements and the recombination rate, suggesting that they have been shaped by widespread selection.Using the varying divergence times between pairs of taxa, we show that the correlations between F ST and genome features arose almost immediately after a population split and have become stronger over the course of time.Simulations of genomic landscape evolution suggest that BGS (i.e., selection against deleterious mutations) alone is too subtle to generate the observed patterns, but scenarios that involve positive selection and genetic incompatibilities are plausible alternative explanations.Finally, tests for introgression among these taxa reveal widespread evidence of heterogeneous selection against gene flow during this radiation.We discuss the implications of these results for our general understanding of genomic landscape evolution, particularly in light of recent efforts to reveal the genomic basis of adaptation and speciation.

A chromosome-level genome assembly, map, and annotation for the bush monkeyflower
To facilitate the analysis of genome-wide variation in this group, we constructed the first chromosome-level reference genome for the bush monkeyflower using a combination of long-read single molecule real time (SMRT) sequencing (PacBio), overlapping and mate-pair short-reads (Illumina), and a high-density genetic map (7,589 segregating markers across 10 linkage groups (LGs); S1 Fig; S1 Table).Contig building and scaffolding yielded 1,547 scaffolds, with an N50 size of 1.6 Mbp and a total length of 207 Mbp.The high-density map allowed us to anchor and orient 94% of the assembled genome onto 10 LGs, which is the number of chromosomes inferred from karyotypic analyses in all subspecies of M. aurantiacus and M. clevelandii [51].Analysis of assembly completeness based on conserved gene space [52] revealed that 93% of 1,440 universal single-copy orthologous genes were completely assembled, with a further 2% partially assembled (S2 Table ).Subsequent annotation yielded 23,018 predicted genes.

Phylogenetic relationships among taxa and patterns of discordance across the genome
To infer phylogenetic relationships among the taxa in this radiation, we sequenced 37 whole genomes from the 7 subspecies and 2 ecotypes of Mimulus aurantiacus (n = 4-5 per taxon) and its sister taxon M. clevelandii (n = 3; S2 Fig; S3 Table ).Close sequence similarity allowed us to align reads from all samples to the reference assembly with high confidence (an average of 91.7% reads aligned; S3 Table ).After mapping, we identified 13.2 million variable sites that were used in subsequent analyses (average sequencing depth of 21× per individual; S3 Table ).Relationships were then inferred among the 9 taxa from a concatenated alignment of genomewide SNPs using maximum-likelihood (ML) phylogenetic analysis in RAxML [53].
The tree topology obtained from this analysis (Fig 1) confirmed the same phylogenetic relationships as previous analyses based on reduced-representation sequencing and 5 different methods of phylogenetic reconstruction [40,54] and was supported by patterns of clustering from principal components analysis (PCA; S3 Fig).Individuals of each of the 7 subspecies formed monophyletic groups with 100% bootstrap support (Fig 1).Relationships within subspecies puniceus were more complex, because the red ecotype formed a monophyletic subclade within the paraphyletic yellow ecotype.This is consistent with the recent origin of red flowers from a yellow-flowered ancestor [54].
Although the whole-genome phylogeny provides a well-supported summary of the relationships among these taxa, concatenated phylogenies can obscure phylogenetic discordance in more defined genomic regions [55] (Fig 1).To test for fine-scale phylogenetic discordance, we next constructed ML phylogenies for 500-kb and 100-kb genomic windows.We then calculated a "concordance score" for each tree by computing the correlation between the distance matrix generated from each window-based tree and the whole-genome tree, with a stronger correlation indicating that 2 trees have a more similar topology.
At the 500-kb scale, only 22 (6%) trees showed the same taxon branching order as the whole-genome tree.However, concordance scores tended to be very high for all of the trees (mean = 0.964, SD = 0.039; minimum = 0.719), suggesting that variation in the topologies was due primarily to minor differences in branching order.This was confirmed by quantifying how often each node in the genome-wide tree was recovered in the set of window-based trees (Fig 1).Specifically, we found that differences in branching order were associated with the most recent splits, which included pairs of closely related taxa.For example, even though subspecies puniceus was monophyletic in the majority of trees (72.6%), individuals from the red ecotype only formed a monophyletic group in 27% of the trees.Similarly, the closely related subspecies longiflorus and calycinus were monophyletic in only 45.7% and 24.1% of trees, respectively.Higher discordance among these closely related, geographically proximate taxa is likely due to 2 factors.First, only a short time has passed since they shared a common ancestor, meaning that ancestral polymorphisms have had little time to sort among lineages.Second, ongoing gene flow between some taxa may have opposed sorting, prolonging the retention of ancestral variants among them.
Next, we examined how the level of phylogenetic discordance varied across the bush monkeyflower genome.If the discordance was due to the stochastic effects of neutral processes, variation in tree concordance scores should be distributed randomly across the genome [56].To test this prediction, we plotted the tree concordance scores across the 10 LGs (

Correlated patterns of genome-wide variation across the radiation
To gain deeper insight into the evolutionary processes that have shaped patterns of genomewide variation during this radiation, we used 3 summary statistics to quantify patterns of diversity (π), divergence (d xy ), and differentiation (F ST ) within and between these taxa (the same 500-kb and 100-kb sliding windows as above).π, or nucleotide diversity, was used to quantify the level of genetic variation within each taxon.It is defined as the average number of nucleotide differences per site between 2 sequences drawn from the population [57].Second, we calculated d xy as a measure of sequence divergence between all 36 pairs of taxa.In principle, π and d xy are the same measure, except that the former is calculated between sequences within a taxon and the latter between pairs of taxa; they have the same units and both are proportional to the average coalescence time of the sequences being compared multiplied by mutation rate.Third, we calculated F ST between each pair of taxa.Unlike d xy , which is a measure of divergence between sequences drawn from different taxa, F ST measures differentiation between the taxa relative to the total diversity in the sample.Between samples x and y, F ST is roughly equal to 1−π/d xy , where π is the mean diversity in the 2 taxa [58].F ST is strongly influenced by current levels of within-taxon diversity, whereas d xy is strongly influenced by the level of ancestral variation when divergence times are small.The use of the term "divergence" to describe both d xy and F ST has caused some confusion in the literature, leading to alternative naming schemes.Here, we follow a recently proposed convention, referring to d xy as "divergence" and F ST as "differentiation" [2].
The variation in F ST among all 36 pairs of taxa highlights the continuous nature of differentiation across the group ( These analyses confirmed that patterns of genome-wide variation were highly correlated across this group of taxa.Indeed, PC1 explained 65.9% of the variation in F ST across the 36 pairwise comparisons.Further, all comparisons loaded positively onto PC1 (mean loading = 0.78, SD = 0.18; S4 Table for all loadings), indicating that peaks and troughs of F ST tended to occur in the same genomic regions across all comparisons.Patterns of genome-wide divergence (d xy ) and diversity (π) also were highly correlated across comparisons, with PC1 explaining 69.5% and 84.7% of the variation among the window-based estimates, respectively.Again, all taxa (for π) and taxon comparisons (for d xy ) loaded positively onto PC1 (mean loading for d xy = 0.78, SD = 0.18; for π, 0.91, SD = 0.07).PC1 therefore provides a summary of the original landscapes and is effectively the same as taking the mean window-based scores for each statistic (r 2 between PC1 and mean scores > 0.995 for all 3 statistics).

Common genomic landscapes have been shaped by heterogeneous indirect selection
Observing highly similar genomic landscapes suggest that a common pattern of heterogeneous selection has shaped variation in all 9 taxa.Indeed, if a region experiences recurrent indirect selection across the phylogenetic tree, then it should show lower diversity (π) within species and lower divergence (d xy ) between species, because d xy is influenced by levels of diversity in the common ancestor [19,20,29].In agreement with this prediction, we observed a strong positive correlation between PC1 d xy and PC1 π (r = 0.84), indicating that regions of the genome with lower diversity tended to be less diverged between these taxa (and thus had lower ancestral diversity; Regions with reduced diversity also tended to show higher levels of differentiation (F ST ; r = −0.84)and tree concordance (r = −0.69).These relationships are also predicted by models of recurrent indirect selection, because local reductions in diversity decrease the amount of genetic variation in future generations, similar to a local reduction in N e .As a result, ancestral variants in impacted regions sort more rapidly than under neutrality [59].
In models of recurrent indirect selection, variation in its impact on associated sites is determined by the distribution of genomic features, including the local density of functional elements and the recombination rate [18,19].To test these theoretical predictions, we used our genome annotation and genetic map to calculate the number of protein coding genes and the average recombination rate (cM/Mbp) in each 500-kb window (  we did not observe a significant interactive effect of gene count and recombination rate on diversity (p = 0.057).This may be because the distribution of these features is correlated, making it difficult to tease apart their relative impacts.
Taken together, our results indicate that a common pattern of indirect selection has caused correlated genomic landscapes to evolve across this radiation.As predicted by theory [18,19,60] [21] and observed in diverse taxa [30,59,61], genome-wide variation in the strength of indirect selection is caused by the heterogeneous distributions of intrinsic genome features, namely, the density of functional elements and the local recombination rate.

A known adaptive locus shows a strong deviation from the common pattern of differentiation
Because widespread signatures of indirect selection are often assumed to reflect the impact of heterogeneous BGS across the genome, several authors have proposed that the correlation in F ST across multiple pairs of taxa can be used as a baseline for detecting genomic regions that have been affected by positive selection [29,62].More specifically, loci that have contributed to adaptation or speciation should be detectable as a positive deviation from the common pattern of differentiation (i.e., PC1), which is considered to reflect processes that are unrelated to ecologically relevant positive selection [29].
We have a unique opportunity to test the effectiveness of this comparative genomic approach by examining patterns of differentiation around a locus that is known to contribute to adaptation and speciation in subspecies puniceus.Using a candidate gene approach, Streisfeld and colleagues [63] showed that the shift from yellow to red flowers in puniceus was caused by a cis-regulatory mutation in the R2R3-MYB transcription factor MaMyb2.Patterns of haplotype variation within the gene indicate that the red allele was subject to strong positive selection and rapidly swept to fixation in what is now the geographic range of the red ecotype [54].
To test for a lineage-specific signature of differentiation at this locus, we examined patterns of Z-F ST in a relatively narrow region (approximately 3 Mbp) surrounding MaMyb2 (Fig 4 ; S10 Fig) .At the 500-kb scale, the comparison between the red and yellow ecotypes shows a sharp peak centered near the flower color locus.The peak is strongly elevated above PC1 Z-F ST (3.33 SDs), indicating that differentiation is indeed accentuated relative to the level observed between other taxon pairs.Further, other comparisons, including those with the red ecotype, do not show accentuated differentiation in this region, indicating that the signal is specific to this taxon pair.The peak is even more pronounced in the analysis at the 100-kb scale, rising 9.5 SDs above PC1 Z-F ST .At this scale, the signature of positive selection is visible in other comparisons that include the red ecotype, though the signal is less pronounced than in the comparison with the yellow ecotype (S10 Fig).

Correlated differentiation landscapes emerge rapidly following a population split
Although correlated landscapes may evolve due to the indirect effects of widespread BGS, it is also conceivable that they could evolve due to other evolutionary processes.To gain insight into the role that BGS may have played in shaping these patterns, we next tested several hypotheses about how BGS is expected to shape the genomic landscape over time.To do this, we used the level of genetic distance between each pair of taxa as a proxy for their divergence time and constructed a temporal picture of genomic landscape evolution that spans the first million years following a population split.
In a verbal model, Burri [29] made several predictions about how correlations between measures of variation and genome features should be impacted by BGS following a population split.First, the correlations between diversity (π) and intrinsic features should be relatively consistent over time, because heterogeneous BGS will continue to constrain patterns of diversity in each daughter population after the split.Second, π and d xy should remain highly correlated with one another, because d xy is largely influenced by ancestral diversity at these short timescales.Third, levels of genome-wide differentiation (F ST ) initially should be low and vary stochastically across the genome, due mainly to the sampling effect that accompanies a split.Therefore, F ST should not be correlated with the distribution of genomic properties early in divergence.However, as time passes, and the indirect effects of ancestral and lineage-specific BGS accumulate, levels of differentiation should gradually become more correlated with underlying genome features.
To test these hypotheses, we computed correlations between relevant window-based measures of variation and genomic features between all 36 pairs of taxa.We then plotted these correlations on corresponding measures of between-taxon d a , which ranged from very low (0.05%) between the parapatric red and yellow ecotypes of subspecies punicieus, up to 1.3% in allopatric comparisons that included M. clevelandii.We used d a (d xy − mean π), because it corrects for sequence variation present in the common ancestor, making it a better proxy of recent divergence time than d xy .
This analysis revealed very clear temporal signatures of genomic landscape evolution, some of which were consistent with the above predictions (  ).This is consistent with the prediction that diversity  ).Thus, as predicted, the differentiation landscape increasingly reflects the distribution of intrinsic features as divergence time increases.
However, 2 of the observed patterns differed markedly from existing predictions [29].First, π and d xy did not remain highly correlated over this relatively short timeframe.Although the correlation is almost perfect at the earliest time points (r > 0.99), it decays rapidly and is not significantly different from zero in the most diverged taxon pairs (Fig 5E).In addition, the correlations between F ST and proxies of the strength of indirect selection are much higher than expected at these early divergence times.For example, for the red and yellow ecotypes, the correlations between F ST and gene density (r = 0.415), recombination rate (r = −0.208),and mean π (r = −0.452)are substantial and highly significant (p < 0.0001) at a very early stage of divergence.The strength of these correlations also increases rapidly, with the diversity and differentiation landscapes almost perfectly mirroring one another in the most divergent comparisons (  differentiation landscapes in this system.Therefore, we next considered the potential for other evolutionary processes to generate these patterns.

Simulations suggest that adaptation has played an important role in genomic landscape evolution
To assess the plausibility of different modes of selection for generating the observed genomic landscapes and temporal patterns, we used individual-based simulations in SLiM [64,65].Our basic model consisted of an ancestral population (N = 10,000) that evolved for 10N generations and then split into 2 daughter populations that diverged for a further 10N generations.Each individual carried a 21 Mbp chromosome (similar to the physical size of a bush monkeyflower chromosome), partitioned into 3 equally sized regions.In the central third, all mutations were neutral, whereas some mutations in the 2 distal ends could affect fitness, depending on the scenario.Although simple, this partitioning scheme generates broad-scale variation in the strength of indirect selection across the chromosome and roughly approximates the distributions of genomic features in this system (e.g., LG 7 in Under some conditions, the BDM incompatibility model produced patterns of variation that were more similar to our empirical findings than the BGS model ( In all simulations, patterns of diversity and divergence were relatively homogenous at the time of the split.However, selection against gene flow induced by incompatibility loci on the chromosome ends caused patterns of variation to become highly structured over time.Indeed, higher π, lower d xy , and lower F ST were observed in the neutrally evolving chromosome center due to the homogenizing effects of higher gene flow in this region.As in our empirical data, the correlation between π and d xy decayed over the course of some simulations and even became negative depending on rates of gene flow and selection (S14 Fig) .Although we simulated this scenario starting with undifferentiated populations (i.e., primary contact), widespread BDMIs should produce the same patterns for populations that diverged in allopatry and came back into secondary contact.This is because gene flow will have a stronger homogenizing effect in regions of the genome that are not associated with incompatibilities [66].Highly heterogeneous differentiation landscapes, characterized by lower F ST in the chromosome center, emerged rapidly in all 3 scenarios.This pattern appeared irrespective of the parameter values examined and was not heavily influenced by other factors, like gene flow, the inclusion of BGS, or whether adaptation occurred from standing variation.However, some of these factors may have a larger influence in simulations with different nonneutral mutation rates and selection strengths.
Overall, the results of these simulations suggest that BGS is not primarily responsible for heterogeneous genomic landscapes.Although our simulations should be interpreted cautiously, because we have not thoroughly explored parameter space associated with each model, the results of other recent simulation studies also support this conclusion.For example, Matthey-Doret and Whitlock [67] simulated population divergence with BGS under different scenarios with simulations using parameters estimated from humans and stickleback.They found that BGS was unable to generate heterogeneous differentiation landscapes over short timescales, with or without gene flow.Similarly, Rettelbach and colleagues [68] simulated the evolution of diversity landscapes using empirical estimates of recombination rate and functional densities from the collared flycatcher genome.They found that BGS was able to generate modest variation in levels of diversity across chromosomes, but they concluded that it was not sufficient to explain the pronounced dips observed in empirical studies of flycatcher genomes (e.g., [30]).Both of these studies suggest that other processes (probably positive selection) are responsible for generating the observed patterns [67,68].
Our simulations suggest that divergence histories involving positive selection and/or incompatibilities are plausible explanations for heterogeneous genomic landscapes that emerge rapidly after a population split.However, in order to account for the presence of strong correlations between measures of variation and genomic features across multiple comparisons, these scenarios assume that the genomic basis of adaptation and/or reproductive isolation is diffuse, shared across multiple taxa, and evolves rapidly.Although this was once considered unlikely, recent studies suggest that these conditions may be common.For example, adaptation and speciation are now thought to be highly polygenic [37,39,69], and recent evidence suggests that adaptation draws heavily on ancestral mutations, rather than those arising independently in multiple, related descendent lineages [35,70,71].Similarly, the interaction between widespread hybrid incompatibilities (intrinsic or extrinsic) and intrinsic genomic features is thought to have caused similar patterns of genome-wide variation to evolve between different pairs of hybridizing taxa [39].
Given the rapid and extensive trait diversification that has occurred during the bush monkeyflower radiation [40,45,72], positive selection has probably played an important role in shaping patterns of genome-wide variation across the group.This includes the striking divergence of floral traits (Fig 1A), which is thought to be due to divergent pollinator-mediated selection across southern California [48][49][50].In the taxa that have been studied detail, crossing experiments have shown that most of these traits are polygenic, and there is evidence that they are targets of selection [43,45].Intrinsic genetic incompatibilities may also influence patterns of variation between some of the taxa, but they are unlikely to explain the correlated landscapes that have evolved across the whole group, because most of these taxa show little or no evidence for intrinsic incompatibilities between them [42,43].However, local adaptation also can generate a porous isolating barrier [14], so it is likely that gene flow has contributed to genomic landscape evolution among some taxa, particularly those that are geographically proximate and/or hybridize in areas where their distributions overlap.

Radiation-wide evidence for selection against gene flow
To determine if gene flow has contributed to the evolution of these correlated genomic landscapes, we first tested for evidence of introgression among these taxa using D-statistics [73].Patterson's D measures asymmetry between the numbers of sites with ABBA and BABA patterns (where A and B are ancestral and derived alleles, respectively) across 3 in-group taxa and an out-group (in our case, M. clevelandii) that have the relationship (((P1, P2), P3) O).A significant excess of either pattern gives a nonzero value of D, which is taken as evidence that gene flow has occurred between P3 and one of the in-group taxa [73].Overall, these tests provide evidence that gene flow has occurred during this radiation.All 48 of the appropriate 4-taxon combinations yielded significant nonzero values of Patterson's D (p < 0.0001; S7 Table ; S15 Fig) .The admixture proportions (f) for each test indicate that an average of 2.5% of the genome has been transferred between pairs of in-group taxa, but the proportion varies among the tests, ranging from less than 1% to more than 10% (S7 Table ).
Although gene flow initially involves the exchange of whole genomes between populations, selection against gene flow reduces effective migration (m e ) in regions of the genome that are associated with barrier loci [2,69].Therefore, porous isolating barriers should result in a heterogeneous pattern of introgression across the genome [14,74].To test this hypothesis, we calculated the f d statistic, a version of the admixture proportion modified for application to genomic windows [75].As predicted, estimates of f d varied markedly among genomic regions (Fig 2 ; S16 Fig) .Most windows showed levels of f d at or near zero, indicating that foreign alleles have been purged from these regions by selection.However, a proportion of windows showed substantial admixture proportions.In some tests, values of f d exceeded 0.5, indicating that more than half of the sites in some windows have been shared between taxa.
Because gene flow opposes differentiation, we would expect to see lower F ST in regions with a higher proportion of introgressed variants.Consistent with this prediction, we observed a substantial negative correlation (r = −0.57;p < 0.0001) between mean f d (for each window, averaged over the 48 tests) and PC1 F ST (Fig 3).This result is not driven by a limited number of the 4-taxon tests, because 44 of the 48 comparisons showed the same significant negative relationship (S8 Table ).Also consistent with models of widespread selection against gene flow, regions of the genome with higher f d scores tended to have higher diversity (r = 0.45; p < 0.0001), lower tree concordance scores (r = −0.59;p < 0.0001), fewer functional genes (r = −0.36;p < 0.0001), and a higher recombination rate (r = −0.18;p < 0.0001; Fig 3).
Overall, these results indicate that widespread gene flow has played a key role in the formation of the genomic landscape in this system.In addition to reductions in diversity and increased differentiation owing to selection against gene flow, the persistence of introgressed variants has probably resulted in higher levels of diversity in regions with fewer genes and higher recombination rate.The 4-taxon tests show that the impact of gene flow is widespread across the radiation, though some caution should be exercised when interpreting the specific pattern of introgression events.Given that there the potential for gene flow between so many taxa and ancestral lineages, it is difficult to infer the number, source, and timing of admixture events that have occurred during the radiation.For example, rather than reflecting recent gene flow between all pairs of taxa, some introgression events may have occurred deeper in history, and their consequences inherited by multiple taxa.Although more sophisticated methods will be needed to model gene flow across this group, these results clearly show that it has contributed to the rapid evolution of correlated genomic landscapes during this radiation.

Conclusions and implications for understanding genomic landscape evolution and the basis of adaptation and speciation
Facilitated by a new chromosome-level genome assembly, we have shed light on the causes of correlated genomic landscapes across a radiation of monkeyflowers.Adaptive divergence and gene flow are hallmarks of rapid radiations [76], and our data suggest that the indirect effects of selection resulting from these processes have contributed to a common pattern of differentiation among these taxa.Our "time-course" analysis shows that the common landscape emerged rapidly after populations split and has become more correlated with the distribution of genomic features as divergence time has increased.Although BGS may play some role, its effects are probably too subtle to have made a strong contribution to the correlated landscapes during the timeframe of this radiation.
In addition to enhancing our understanding of the processes that have shaped the genomic landscape during this radiation, our study contributes toward a more general understanding of the role that natural selection plays in shaping genome-wide variation.In line with the findings of other recent studies, our results indicate that little, if any, of the genome evolves free of the effects of natural selection [27,61].Moreover, our "time-course" analysis shows that between-taxon signatures of selection can emerge very rapidly after a population split and can be substantial, even between populations at the early stages of divergence.Overall, these results suggest that patterns of between-population variation cannot be understood without taking the effects of natural selection into account.
An important point arising from this work is that multiple divergence histories can generate heterogeneous differentiation landscapes that are correlated both among taxa and with the distribution of intrinsic genomic properties.When divergence is recent, possible explanations include polygenic adaptation across multiple populations and porous barriers to gene flow arising from divergent ecological selection and/or intrinsic incompatibilities.Although it has often been assumed that recurrent BGS is the primary cause of correlated landscapes, it is important to remember that all forms of selection can indirectly impact levels of variation at associated sites.In fact, our simulations show that alternative explanations may be more likely when divergence times are short and there is opportunity for gene flow among taxa.Thus, we advocate for a more nuanced approach when interpreting correlated differentiation landscapes, rather than assuming that they are caused by a single evolutionary process.
Finally, our results have important practical implications for studies attempting to identify the genomic basis of adaptation and speciation from patterns of genome-wide variation.Indeed, detecting these loci is a major goal of adaptation and speciation studies, and genome scans are now commonly used to identify promising candidate regions [2].In an effort to correct for the potentially confounding effects of BGS, it has been proposed that the correlation in F ST among multiple population pairs can be used as a baseline for outlier detection [29,62].A core premise of this method is that correlated differentiation landscapes are caused primarily by BGS, so removing this should expose the loci relevant to adaptation and/or speciation.Although this approach may be successful in identifying large-effect loci targeted by lineage-specific positive selection, as we illustrated for the MaMyb2 locus, we caution against treating the common pattern of differentiation simply as background noise.Rather than parsing out the effects of BGS, studies that use this approach may actually be discarding the genomic signature of polygenic adaptation and speciation.

Genome assembly
We used a combination of short-read Illumina and long-read SMRT sequencing to assemble the genome of a single individual from the red ecotype of M. aurantiacus subspecies puniceus (population UCSD; S3 Table ).Genomic DNA was isolated from leaf tissue using either ZR plant/seed DNA miniprep kits (Zymo Research, Irvine, CA) or GeneJet Plant Genomic DNA purification kits (Thermo Fisher, Waltham, MA).Illumina libraries were generated following the Allpaths-LG assembly pipeline [77], which included a single fragment library with average 180 bp insert size and 3 mate-pair libraries (average insert sizes: 3.5-5 kb, 5-7 kb, and 7-13 kb).Libraries were sequenced on the Illumina HiSeq 2500 using paired-end 100 bp reads.An initial scaffold-level assembly was performed with Allpaths-LG using default parameters and the haploidify function enabled.This assembly yielded 11,123 contigs (N50 = 40.5 kb) and 2,299 scaffolds (N50 = 1,310 kb), for a total assembly size of 193.3 Mbp.Long-read sequencing was performed from the same individual using 12 SMRT cells sequenced on the Pacific Biosystems RS II machine at Duke University.We obtained a total of 6.4 Gbp of sequence, which corresponds to approximately 21× coverage of the genome.The PacBio reads were used to rescaffold the Allpaths-LG scaffolds using Opera-LG [78].This reduced the number of scaffolds to 1,547 (N50 = 1,578 kb).
We then manually improved the scaffold containing the flower color gene MaMyb2 [63].We first aligned this scaffold to a previously published draft sequence assembly from this same individual [44], which was generated using Illumina short-reads and the Velvet assembler [79].We used long range PCR and cloning to generate Sanger sequences across 3 regions within 20 kb of MaMyb2 that did not assemble well.Genomic DNA was amplified using Phusion high fidelity polymerase (NEB, Ipswich, MA).PCR products were cloned into the pCR2.1 TOPO-TA vector (Life Technologies, Carlsbad, CA), and purified plasmids were sequenced with Sanger technology.Resulting sequences were aligned to the scaffold containing MaMyb2, and new PCR primers were designed to sequence internal fragments until the entire insert was sequenced.Using this approach, we sequenced a total of 9,824 bp across the 3 regions.The reference sequence in the assembly was corrected manually to match the Sanger data.
Finally, we gap filled the assembly using the PacBio data and the program PBJelly [80].Resulting scaffolds were assembled into pseudomolecules using Chromonomer (http:// catchenlab.life.illinois.edu/chromonomer/),according to the online manual.This software anchored and oriented scaffolds based on the order of markers in a high-density linkage map (see Construction of high-density linkage map) and made corrections to scaffolds when differences occurred between the genetic and physical positions of markers in the map.A final round of gap filling with PBJelly was performed to fill any gaps that were created by broken scaffolds in Chromonomer.To assess the completeness of the gene space in the assembly, we used both the BUSCO and CEGMA pipelines to estimate the proportion of 956 single copy plant genes (BUSCO) or 248 core eukaryotic genes (CEGMA) that were completely or partially assembled [52,81].The proportion of these genes present in an assembly has been shown to be correlated with the total proportion of assembled gene space and thus serves as a good predictor of assembly completeness.

Construction of high-density linkage map
We generated an outbred F 2 mapping population by crossing 2 F 1 individuals, each the product of crosses between different greenhouse-raised red and yellow ecotype plants collected from one red ecotype and one yellow ecotype population (populations UCSD and LO, respectively; S3 Table ).We then used restriction-site associated DNA sequencing (RADseq) to genotype F 1 and F 2 individuals.DNA was extracted from leaf material using Zymo ZR plant/seed DNA miniprep kits, and RAD library preparation followed the protocol outlined in Sobel and Streisfeld [43].Libraries were sequenced on the Illumina HiSeq 2000 platform using singleend 100 bp reads at the Genomics Core Facility, University of Oregon.
Reads were filtered based on quality, and errors in the barcode sequence or RAD site were corrected using the process_radtags script in Stacks version 1.35 [82,83].Loci were created using the denovo_map.plfunction of Stacks, with 3 identical raw reads required to create a stack, 2 mismatches allowed between loci for an individual, and 2 mismatches allowed when processing the catalog.SNPs were determined and genotypes called using a ML statistical model implemented in Stacks and a stringent χ 2 significance level of 0.01 to distinguish between heterozygotes and homozygotes.We then used the genotypes program implemented in Stacks to identify a set of 9,029 mappable markers.We specified a "CP" cross design (F 1 individuals coded as the parents), requiring that a marker was present in at least 75% of progeny at a minimum depth of 12 reads per individual, and we allowed automated corrections to be made to the data.
Linkage map construction was performed using Lep-MAP2 [84].The data were filtered using the Filtering module to include only individuals with less than 13% missing data and excluded markers that showed evidence for extreme segregation distortion (χ 2 test, P < 0.01).To assign markers to LGs, we used the SeparateChromosomes module with a logarithm of odds (LOD) score limit of 20 and no minimum size for LGs.This assigned 7,217 markers to 10 LGs, which matches the number of chromosomes in M. aurantiacus.The JoinSingles module was executed again with a LOD limit of 10 to join an additional 877 ungrouped markers to the 10 previously formed LGs.Fifty-seven singles that were not joined at this stage were discarded from the data set.Initial marker orders were determined using sex-averaged and sex-specific recombination rates using the OrderMarkers module.For each LG, we conducted 10 independent runs using the Kosambi mapping function (useKosambi = 1), with the data set split into 7 pseudofamilies to take advantage of parallel processing.When multiple markers had identical genotypes, only the duplicate marker with the least missing data was used in marker ordering.We retained the marker order from the run with the best likelihood.After removing markers with an error rate >0.05, the ML order was re-evaluated 10 times using the evaluateOrder flag and polished in 100 SNP windows.The map contained 8,094 informative loci from 269 F2 individuals, with an average of 3.5% ± SD 3.86 missing data per individual.
After the integration of our assembly and genetic map using the Chromonomer software [85], we made corrections to the map order based on the physical position of markers within assembled scaffolds.Using the output of Chromonomer, we identified markers that were out of order in the map compared to their local assembly order and aligned these markers to the assembly from Chromonomer using Bowtie2 version 2.2.5 [86] with the very_sensitive settings to obtain their physical order.We then re-estimated the map using the evaluateOrder flag in Lep-MAP2 as described above but with the marker order constrained to the physical order (improveOrder = 0) and with all duplicate markers included in the analysis (removeDuplicates = 0).After initial map construction, we removed 17 markers with an estimated error rate greater than 5% and estimated the map one last time using the same settings.The final map contained 7,589 markers across the 10 LGs.

Genome annotation
Prior to genome annotation, the assembly was soft-masked for repetitive elements and areas of low complexity with RepeatMasker (RepeatMasker Open-4.0)using a custom Mimulus aurantiacus library created by RepeatModeler (RepeatModeler Open-1.0),Repbase repeat libraries [87] and a list of known transposable elements provided by MAKER [88].In total, 30.99% of the genome assembly was masked by RepeatMasker.Repetitive elements were annotated with RepeatModeler.Hidden Markov Models for gene prediction were generated by SNAP [89] and Augustus [90] and were trained iteratively to the assembly using MAKER, as described by Cantarel and colleagues [91].Training was performed on the 14.5-Mbp sequence from LG9.Evidence used by MAKER for annotation included protein sequences from Arabidopsis thaliana, Oryza sativa, Solanum lycopersicum, Solanum tuberosum, Daucus carota, Vitis vinifera (all downloaded from EnsemblPlants on 9 August 2016), Salvia miltiorrhiza (downloaded from Herbal Medicine Omics Database on 9 August 2016), Mimulus guttatus version 2 (downloaded from JGI Genome Portal on 9 August 2016), and all Uniprot/swissprot proteins (downloaded on 18 August 2016) [92][93][94] (Herbal Medicine Omics Database; Uniprot).We filtered the annotations with MAKER to include (1) only evidence-based information that also contained assembled protein support and (2) those ab initio gene predictions that did not overlap with the evidence-based annotations and that contained protein family domains, as detected with InterProScan [95].

Genome resequencing and variant calling
We collected leaf tissue from each taxon (collection locations available in S3 Table and S2 Fig) and extracted DNA from dried tissue using the Zymo Plant/Seed MiniPrep DNA kit following the manufacturer's instructions.We prepared sequencing libraries using the Kapa Biosystems HyperPrep kit, and libraries with an insert size between 400 and 600 bp were sequenced on the Illumina HiSeq 4000 using paired-end 150 bp reads at the Genomics Core Facility, University of Oregon.
We filtered raw reads using the process_shortreads script in Stacks version 1.46 to remove reads with uncalled bases or poor quality scores.We then aligned the retained reads to the reference assembly using the BWA-MEM algorithm in BWA version 0.7.15 [96].An average of 91.7% of reads aligned (range: 82.6%-96.0%),and the average sequencing depth was 21× (range: 15.16-30.86×).We then marked PCR duplicates using Picard (http://broadinstitute. github.io/picard).We performed an initial run of variant calling using the UnifiedGenotyper tool in GATK version 3.8 (McKenna and colleagues, 2010) and filtered the data to remove variants with a mapping quality <50, a quality depth <4, and a Fisher Strand score >50.We then used these variants to perform base quality score recalibration for each individual, before performing another run of the UnifiedGenotyper to call final variants.After the second run of variant calling, we removed variants with a mapping quality <40, a quality depth <2, and a Fisher Strand score >60.The final data set contained 13,233,829 SNPs across the 9 taxa.Finally, we ran UnifiedGenotyper with the EMIT_ALL_SITES option to output all variant and invariant genotyped sites.

Phylogenetic analyses and PCAs
We used RAxML version 8 to reconstruct the evolutionary relationships among the 9 taxa by concatenating variant sites from across the genome.To investigate patterns of phylogenetic discordance across the genome, we also built trees from windows across the genome.We phased SNPs using BEAGLE version 4.1 [97], using a window size of 100 kb and an overlap of 10 kb.We incorporated information on recombination rate from the genetic map and did not impute missing genotypes.After phasing, we used MVFtools (https://www.github.com/jbpease/mvftools) to run RAxML from 100-kb and 500-kb nonoverlapping windows, with the M. clevelandii samples set as the outgroups.We then visualized the window trees in DensiTree version 2.01 [98].
To assess concordance between the window-based trees and the whole-genome tree, we converted trees to distance matrices using the Ape package in R [99].We then calculated the Pearson's correlation coefficient between the distance matrix from each window and the whole-genome tree, with a stronger correlation indicating higher concordance with the wholegenome tree.We used one-dimensional autocorrelation analysis to determine whether variation in tree concordance was randomly distributed across the genome.This involved estimating the autocorrelation between genomic position and tree concordance for each LG with a lag size of 2 Mbp.The significance of the observed value for each LG was determined from a null distribution of autocorrelation coefficients estimated from 1,000 random permutations of the genome-wide data.
We also conducted a PCA based on all variant sites from across the entire genome using Plink version 1.90 [100].Initially, we ran the PCA with all 37 samples, but we consecutively reran the analysis by removing different taxa in order to assess clustering patterns among more closely related samples.

Population genomic analyses
To examine how genome-wide patterns of diversity, differentiation, and divergence varied among taxa, we calculated nucleotide diversity (π), between-taxon differentiation (F ST ), and between-taxon divergence (d xy ), respectively, in nonoverlapping and overlapping 100-kb (step size = 10 kb) and 500-kb (step size = 50 kb) windows using custom Python scripts downloaded from https://github.com/simonhmartin/genomics_general.We calculated measures of differentiation and divergence across all 36 pairwise comparisons among the 9 taxa, and diversity was estimated separately for each taxon.These scripts estimated π and d xy by dividing the number of sequence differences within a window (either within or between taxa) by the total number of sites in that window.To account for missing data, the script counted the number of differences between each sample, divided by the total number of variant sites that were genotyped within those samples, and then averaged across all pairs of samples.To provide an unbiased estimate of diversity and divergence, we incorporated invariant sites into the calculation by dividing the number of pairwise differences (within and between taxa, respectively) by the total number of genotyped sites (variant and invariant) within a window.F ST was calculated following the measure of K ST [101], Eq 9), but was modified to incorporate missing data using the same approach as π and d xy .We filtered the data separately for each taxonomic comparison, so that each site was required to be genotyped in at least 3 individuals for comparisons within the M. aurantiacus complex or at least 2 individuals for each comparison involving M. clevelandii.
We summarized the variation in each statistic across comparisons using a PCA, with taxon or taxon pair as the variables.Thus, across each window, PC1 for π, F ST , and d xy provided multivariate measures that explained the greatest covariance in the data.We used a one-dimensional autocorrelation analysis and permutation test to determine if the genome-wide patterns of PC1 π, F ST , and d xy departed from a random expectation, as described above for tree concordance (see Phylogenetic analyses and PCAs).
To examine the relationships among PC1 for diversity, differentiation, and divergence, we estimated Pearson's correlation coefficient among all 3 statistics across genomic windows.Further, we estimated correlations among these 3 statistics and tree concordance, gene density, and recombination rate.Recombination rate was estimated by comparing the genetic and physical distance (in cM/Mbp) between all pairs of adjacent markers on each LG from the genetic linkage map described above.We removed the top 5% of recombination rates, because these represented unrealistically high values of recombination.A minimum of 3 estimates was required to obtain an average recombination rate estimate within each window.Gene density was calculated from the number of predicted genes in each window, as determined from the annotation described above.Genes spanning 2 windows were counted in both.
To determine how the correlations among the statistics (diversity, differentiation, divergence, recombination rate, gene count) changed with increasing divergence time, we examined the correlation coefficient among all pairs of statistics individually for each of the 36 pairwise comparisons.Because diversity was measured within taxa, we calculated the mean value of π between each pair of taxa.Also, because many of the pairwise comparisons are nonindependent, we applied the phylogenetic correction outlined by Felsenstein and Coyne and Orr [102,103] to produce a statistically independent set of data points for this analysis.
We calculated the divergence time between M. clevelandii and M. aurantiacus based on a corrected estimate of sequence divergence (d a ) between individuals of M. clevelandii and all subspecies of M. aurantiacus combined.We then converted this value into a divergence time T (in generations) using the equation: T = da/(2μ), where μ is the mutation rate, 1.5 × 10 −8 [104,105].This value was then converted into years by multiplying by a generation time of 2 years.

Simulations
To assess the plausibility of different scenarios in producing heterogeneous genomic landscapes, we implemented forward-time Wright-Fisher simulations using SLiM [64,65].The basic model consisted of an ancestral population with a fixed population size of N = 10,000 that split after 10N nonoverlapping generations into 2 daughter populations, each with a fixed size of N.These populations were then allowed to evolve for a further 10N nonoverlapping generations.We simulated a 21 Mbp chromosome with a recombination rate of 1.5 × 10 −8 and a mutation rate of 10 −8 , both per base pair and per generation.
We explored the following 6 modifications of this basic model.(i) Neutral evolution: mutations did not impact fitness; (ii) BGS: mutations in the ancestor and daughter populations were neutral in the middle third of the chromosome but can be deleterious in the chromosome ends; (iii) BDMI: neutral and deleterious mutations occurred in the ancestor.After the split, we allowed migration between the 2 daughter populations.To simulate BDMIs, a fraction of the selected mutations was deleterious in each of the populations and neutral in the other.(iv) Positive selection: mutations in the ancestor and daughter population were neutral in the middle third of the chromosome but could be beneficial in the remaining regions.(v) Local adaptation: mutations in the ancestor and daughter population were neutral in the middle third of the chromosome but could be beneficial in the remainder.After the split, we allowed migration between the daughter populations.To simulate local adaptation, a fraction of the selected mutations was beneficial in each population and neutral in the other.Unlike the BDMI simulation, we allowed local adaptation from standing variation, so some of the mutations that were neutral in the ancestral population became beneficial after the split.BGS and positive

Fig 1 .
Fig 1. Evolutionary relationships and patterns of genome-wide variation across the radiation.(A) The black tree was constructed from a concatenated alignment of genome-wide SNPs and is rooted using M. clevelandii.The 387 gray trees were constructed from 500-kb genomic windows.The first number associated with each node or taxon is the bootstrap support for that clade in the whole-genome tree, and the second number is the percentage of window-based trees in which that clade is present.Flower photos were taken by the authors.(B) Levels of differentiation (F ST ), (C) divergence (d xy ), and (D) diversity (π) within and among taxa based on the same 500-kb windows.For simplicity, F ST and d xy are shown only for comparisons with the red ecotype of subspecies puniceus.See S7 Fig for the distributions of F ST and d xy across all pairs of taxa.https://doi.org/10.1371/journal.pbio.3000391.g001 Fig 2A; see S4 Fig for results from 100-kb windows and S5 Fig for plots along each chromosome).Rather than being randomly distributed, trees with lower concordance scores tended to cluster in relatively narrow regions of all 10 chromosomes (Fig 2A; autocorrelation analysis permutation tests p = 0.001-0.023;S6 Fig).This nonrandom pattern indicates that the rate of sorting varies along chromosomes, which could be due to variation in the strength of indirect selection across the bush monkeyflower genome.

Fig 2 .
Fig 2. Common genomic landscapes mirror variation in the local properties of the genome.(A) Tree concordance scores for 500-kb nonoverlapping genomic windows plotted across the 10 bush monkeyflower chromosomes.(B-D) Plots of PC1 for F ST , d xy , and π in overlapping 500-kb windows (step size = 50 kb).PC1 explains 66%, 70%, and 85% of the variation in F ST , d xy , and π, respectively, and is Z-transformed such that above average values have positive values and below average values have negative values.(E-F) Gene count and recombination rate (cM/Mbp) in overlapping 500-kb windows.(G) Mean f d (admixture proportion) in 500-kb nonoverlapping genomic windows.See S4 Fig for the same plot made for 100-kb windows.LG, linkage group; PC1, first principal component.https://doi.org/10.1371/journal.pbio.3000391.g002 Fig 1B; S7 Fig), with mean window-based estimates ranging from 0.06 (red versus yellow ecotypes of puniceus) to more than 0.70.Distributions of divergence (d xy ) show a similar pattern (Fig 1C), with mean values ranging from 0.54% (red versus yellow ecotypes) to 1.6% (yellow ecotype versus M. clevelandii).Broad distributions of window-based estimates indicate high variability in levels of differentiation and divergence among genomic regions (Fig 1B and 1C).Window-based estimates of nucleotide diversity also varied markedly (π; Fig 1D), ranging from 0.09% to 1.26%, even though mean estimates were very similar among the in-group taxa (0.37% to 0.53%) and were only slightly lower in M. clevelandii (0.26%).As with tree concordance, variation in these summary statistics was nonrandomly distributed across broad regions of each chromosome (p < 0.005; Fig 2; S4 Fig, S5 Fig and S6 Fig).To account for the magnitude of variation in these statistics across all 9 taxa (for π) or among the 36 pairs of taxa (for d xy and F ST ), we normalized the window-based estimates using Z-transformation and plotted them across the genome (S5 Fig).Visual inspection of these data revealed that patterns of genome-wide variation in each statistic were qualitatively similar in all comparisons.We therefore used PCA to quantify their similarity and extracted a single variable (first principal component [PC1]) that summarized the common pattern (S5 Fig).

Fig 3 ,
S8 Fig for scatterplots and S9 Fig for results at 100-kb scale).
Fig 2E and 2F; S4 Fig and S5 Fig).Regions of the genome with more functional elements tended to have a lower recombination rate (r = −0.40;p < 0.0001), leading to large variation in the predicted strength of indirect selection among regions.Consistent with the theoretical predictions outlined above, we found strong correlations between PC1 π and gene count (r = −0.84;p < 0.0001) and PC1 π and recombination rate (r = 0.44; p < 0.0001; Fig 3; S8 Fig and S9 Fig), indicating that diversity is indeed lower in regions where selection is predicted to have stronger indirect effects.However,

Fig 3 .
Fig 3. Correlations between measures of diversity and intrinsic features reveal the impact of heterogeneous indirect selection.Matrix of pairwise correlations between PC1 F ST , PC1 d xy , PC1 π, tree concordance, mean f d , gene density, and recombination rate, all estimated in 500-kb nonoverlapping windows.The heat map indicates the strength of the correlation and its sign.All correlations are statistically significant at p < 0.001.(See S8 Fig for a more detailed correlation matrix and S9 Fig for the correlation matrix from 100 kb windows.)PC1, first principal component.https://doi.org/10.1371/journal.pbio.3000391.g003

Fig 5 ;
See S11 Fig for results with d xy as a proxy for time).First, the presence of strong correlations between π (mean of the 2 taxa) and the distribution of genomic features barely changed with increasing divergence time between populations (Fig 5A and 5B; S5 Table

Fig 4 .
Fig 4. A large-effect adaptive locus shows a lineage-specific signature of positive selection.Plots of Z-transformed F ST across the genome, estimated in 500-kb sliding windows (step size 50 kb).The red line shows values between the red and yellow ecotypes of subspecies puniceus, whereas the gray lines show the values of all other comparisons.The dashed blue line shows PC1 calculated across all of the comparisons.The triangle marks the position of the gene MaMyb2.A cis-regulatory mutation that is tightly linked to this gene is responsible for the shift from yellow to red flowers.See S10 Fig for the same plot made for 100 kb windows.LG, linkage group; PC1, first principal component.https://doi.org/10.1371/journal.pbio.3000391.g004 Fig 5F; r = −0.94;Fig 6; S12 Fig).As our results only partially match the above predictions, it is possible that other forces aside from BGS are responsible for the rapid emergence of

Fig 5 .
Fig 5.The range of divergence times reveals static and dynamic signatures of recurrent indirect selection.Correlations between variables (500-kb windows) for all 36 taxonomic comparisons (gray dots) plotted against the average d a as a measure of divergence time.The left panels show how the relationships between π (each window averaged across a pair of taxa) and (A) gene count and (B) recombination rate vary with increasing divergence time.The middle panels (C and D) show the same relationships but with F ST .The right panels show the relationships between (E) d xy and π and (F) F ST and π.The regressions (dashed lines) in each plot are fitted to the 8 independent contrasts (colored points) obtained using a phylogenetic correction.The color gradient shows the strength of the correlation.https://doi.org/10.1371/journal.pbio.3000391.g005 Fig 2).We implemented 6 modifications to this basic model: (i) Neutral divergence: no mutations affect fitness; (ii) BGS: non-neutral mutations are deleterious; (iii) Bateson-Dobzhansky-Muller incompatibility (BDMI): As (ii), except that after the split, a fraction of non-neutral mutations is deleterious in one population and neutral in the other (randomly chosen).(iv) Positive selection: non-neutral mutations have positive fitness effects.(v) BGS and positive selection: (ii) and (iv) combined; (vi) Local adaptation: As (iv), but after the split some nonneutral mutations are beneficial in one population and neutral in the other.Migration between populations was allowed only in scenarios (iii) and (vi).See Materials and methods for more details and S6 Table for parameter values.To summarize the results, measures of variation (π, d xy , and F ST ) were calculated in 500-kb regions at 10 time points for each simulation.This broad range of scenarios generated quite different patterns of variation, both through time and across the chromosome (Fig 7; S13 Fig).As expected, the neutral model did not produce heterogeneous genomic landscapes.BGS produced a diversity landscape characterized by higher diversity in the unconstrained central third of the chromosome (Fig 7; S13 Fig and S14 Fig).Consistent with the predictions of Burri (2017a), π and d xy remained highly correlated over the course of the simulations.However, even with a high proportion of deleterious mutations (10%) and substantial fitness effects (1%), the variation in π and d xy was modest, as was variation in the differentiation landscape (Fig 7).
The 3 models with positive selection (positive, BGS and positive, and local adaptation) all produced highly heterogeneous genomic landscapes across the range of parameter values explored (Fig 7; S13 Fig).In all cases, positive selection in the ancestor created a highly heterogeneous diversity landscape that was inherited by each daughter population and was maintained for the duration of the simulation.As in our data, π and d xy were perfectly correlated at early divergence times, but the correlation decayed rapidly and even became negative in scenarios with stronger selection.Unlike in the BDMI scenario, in which the correlation between π and d xy decayed due to higher gene flow in the chromosome center, it broke down in these simulations because positive selection caused d xy to increase more rapidly in the chromosome ends (S13 Fig and S14 Fig).

Fig 6 .
Fig 6.Emergence of a heterogeneous differentiation landscape across 1 million years of divergence.(A) Plots of F ST (500-kb windows) across the genome for pairs of taxa at early (red versus yellow), intermediate (red versus parviflorus), and late stages (red versus M. clevelandii) of divergence.(B) Average nucleotide diversity (for the red ecotype of subspecies puniceus, yellow ecotype of subspecies puniceus, subspecies parviflorus, and M. clevelandii) across the genome in 500-kb windows.The geographic distribution (parapatric or allopatric), sequence divergence (d a × 10 −2 ), and correlation between F ST and mean π are provided next to each taxon pair.LG, linkage group.https://doi.org/10.1371/journal.pbio.3000391.g006

S11Fig.
The range of divergence times reveals static and dynamic signatures of recurrent indirect selection.Correlations between variables (500-kb windows) for all 36 taxonomic comparisons (gray dots) plotted against the average d xy as a measure of divergence time.The left panels show how the relationships between π (each window averaged across a pair of taxa) and (A) gene count and (B) recombination rate vary with increasing divergence time.The middle panels (C and D) show the same relationships but with F ST .The right panels show the relationships between (E) d xy and π and (F) F ST and π.The regressions (dashed lines) in each plot are fitted to the 8 independent contrasts (colored points) obtained using a phylogenetic correction, with the regression equation and strength of the correlation given in each panel.The color gradient shows the strength of the correlation.(EPS) S12 Fig. Negative correlation between nucleotide diversity and differentiation becomes stronger with increasing divergence time.Bivariate plots of the correlation between F ST and π at varying levels of sequence divergence (d xy ).(EPS) S13 Fig. Genomic landscapes simulated under different divergence histories.Each row of plots shows patterns of within-and between-population variation (π, d xy , and F ST ) across a 21-Mb chromosome (500-kb windows) at 10 time points (in N generations, where N = 10,000) for one parameter combination of 6 scenarios: neutral divergence, BGS, BDMIs, positive selection, BGS and positive selection, and local adaptation.The gray boxes in the first column show the areas of the chromosome that are constrained by selection.Mean centered (above line) and raw values (below line) of π and d xy .The parameter Ns modulates the average selective coefficient (where s = Ns/N), whereas Prop is the proportion of new mutations that are not neutral.Nm is the average number of migrants per generation.BDMI, Bateson-Dobzhansky-Muller incompatibility; BGS, background selection.(PDF) S14 Fig. Correlations between measures of variation change over the course of the simulations.Each row of plots shows how the correlations between d xy and π (left plot) and F ST and π (right plot) change over 10 time points (in N generations, where N = 10,000) in one of 6 scenarios: neutral divergence, BGS, BDMIs, positive selection, BGS and positive selection, and local adaptation.The red line in each plot shows the correlation for the simulation shown in Fig 7 of the main text.The gray lines show how correlations change for different combinations of parameter values in each scenario.The parameter Ns modulates the average selective coefficient (where s = Ns/N), whereas Prop is the proportion of new mutations that are not neutral.Nm is the average number of migrants per generation.Correlations represent the mean values estimated from 5 replicate simulations of each set of parameter values.BDMI, Bateson-Dobzhansky-Muller incompatibility; BGS, background selection.(PDF) S15 Fig.Evidence for widespread gene flow across the bush monkeyflower radiation.Genome-wide estimates of Patterson's D (blue circles) and the admixture proportion (f; red crosses) are shown for all 48 possible 4-taxon comparisons, with M. clevelandii as the outgroup in each test.The tests are ordered by increasing values of D, and each value is significant based on a block jackknife approach.The taxa included in each test are shown in the order P1, P2, P3.D is always positive, meaning gene flow occurs between P2 and P3.(EPS) S16 Fig. Variation in levels of admixture across the genome.The gray lines show measures of a modified test of the admixture proportion, f d , estimated in 500-kb nonoverlapping windows for 48 different 4-taxon comparisons, plotted across the 10 LGs of the bush monkeyflower genome.The blue line gives the mean value of f d , calculated by taking the average value across all 48 tests in each genomic window.LG, linkage group.(EPS)