Out of the Pacific and Back Again: Insights into the Matrilineal History of Pacific Killer Whale Ecotypes

Killer whales (Orcinus orca) are the most widely distributed marine mammals and have radiated to occupy a range of ecological niches. Disparate sympatric types are found in the North Atlantic, Antarctic and North Pacific oceans, however, little is known about the underlying mechanisms driving divergence. Previous phylogeographic analysis using complete mitogenomes yielded a bifurcating tree of clades corresponding to described ecotypes. However, there was low support at two nodes at which two Pacific and two Atlantic clades diverged. Here we apply further phylogenetic and coalescent analyses to partitioned mitochondrial genome sequences to better resolve the pattern of past radiations in this species. Our phylogenetic reconstructions indicate that in the North Pacific, sympatry between the maternal lineages that make up each ecotype arises from secondary contact. Both the phylogenetic reconstructions and a clinal decrease in diversity suggest a North Pacific to North Atlantic founding event, and the later return of killer whales to the North Pacific. Therefore, ecological divergence could have occurred during the allopatric phase through drift or selection and/or may have either commenced or have been consolidated upon secondary contact due to resource competition. The estimated timing of bidirectional migration between the North Pacific and North Atlantic coincided with the previous inter-glacial when the leakage of fauna from the Indo-Pacific into the Atlantic via the Agulhas current was particularly vigorous.


Introduction
The classical interpretation of the theory of speciation suggests that an allopatric phase of two or more populations of the ancestral species is essential to prevent gene flow and achieve reproductive isolation [1,2]. Divergence and reproductive isolation can then be consolidated if the two divergent forms come into secondary contact, for example through reinforcement, e.g. resource competition leading to selection for extreme phenotypes and reduced hybrid fitness [3,4]. However, recent theoretical models have demonstrated potential mechanisms for sympatric speciation [5][6][7][8], and several empirical studies have argued that such mechanisms do operate in nature [9][10][11][12]. Sympatric speciation is likely to require strong natural selection on traits that are heritable and which are also linked to mate choice to overcome the homogenizing effect of gene flow [5][6][7][8]. Identifying the origins and subsequent dispersal and migrations of taxa can help elucidate the geographic context and underlying mechanisms and processes responsible for population structure and/or speciation (e.g. [13][14][15]). For example, did divergence take place during sympatry, or did sympatry arise from secondary contact; was divergence vicariant or with low levels of on-going gene flow; what was the relative importance of drift and selection (e.g. [13,14,16]). Applying phylogenetic and coalescent analyses to address such questions can provide insights into the species' geographic origins, historic radiations, and the mode of speciation (e.g. [13,[16][17][18][19][20]).
The killer whale (Orcinus orca) is a useful model species with which to investigate the spatial and temporal context of genetic divergences, as both geographic separation and ecological specialisation appear to be drivers of population structure [21][22][23][24]. In this study we aim to resolve the outstanding question of the mode of evolutionary and ecological divergence of three killer whale ecotypes that currently occur in partial sympatry in the eastern North Pacific.
Although currently considered a single species, the killer whale has radiated into morphologically and ecologically disparate types in the Antarctic, North Atlantic and North Pacific [25][26][27][28][29]. A previous study indicated that the worldwide pattern of mitochondrial DNA (mtDNA) diversity for this species was consistent with a historical bottleneck, followed by a rapid radiation leading to a stochastic geographic distribution of lineages [29]. A number of analyses were presented which supported this hypothesis; mtDNA haplotypes were shared between ocean basins, the phylogeny was star-shaped and mismatch distribution was unimodal, and variation at five nuclear microsatellite loci was also low [29]. However, the phylogenetic analyses used were limited to the mitochondrial control region (995 base pairs), which represents only 6% of the killer whale mitochondrial genome [23]. Many sites within this often hyper-variable region of the mammalian mtDNA are believed to be mutational hot spots [30,31] resulting in a loss of phylogenetic resolution [19]. Given these limitations, a larger phylogenetic analysis [23] that used complete mitogenomes was subsequently performed which demonstrated almost complete lineage sorting of all killer whale types, and dated divergence times of several major clades to prior to the date of the bottleneck suggested by Hoelzel et al. [29].
The reconstructed mitogenome phylogeny indicated that three Antarctic lineages (AntA1, AntA3, AntA4), all classified as type A (see [26]), and three North Atlantic lineages (ENACI2, WNAGM, ENA2S2) are sister taxa and cluster together in a paraphyletic clade with the other Antarctic lineages [24]. This suggests a more recent movement of these lineages from the Antarctic into the North Atlantic, resulting in secondary contact between North Atlantic clades [24]. The three Antarctic types for which mitogenomes were sequenced (types A, B & C) are sister taxa [23], suggesting that evolutionary divergence occurred in Antarctica and potentially in the face of ongoing gene flow. However, studies with strong evidence for sympatric speciation are typically those in which the geographic context is small and isolated enough to rule out any allopatric phase, e.g. fish in crater lakes or birds and plants on small remote oceanic islands [10][11]. Past glaciations and the strong ocean temperature gradient across the polar frontal zone that separates the Antarctic and sub-Antarctic are potential barriers that could have resulted in the Antarctic types being temporarily isolated from one another in localised refugia [32]. However, these are arguably the most morphologically divergent of killer whale types [22,33], and the only types for which natural selection on heritable traits has been demonstrated to date [34]. Stronger selection may therefore have driven phenotypic divergence despite the potential for on-going lowlevels of gene flow between Antarctic types.
There was a relatively large phylogenetic divergence between the North Pacific 'transient' type and the other two North Pacific types ('resident' and 'offshore'), which were clustered with killer whales sampled in the North Atlantic. This could imply that sympatry with the transient type arose from secondary contact. However, the chronology of divergences amongst the North Atlantic and North Pacific offshore and resident clades was not clearly resolved, and there was low support (posterior probabilities of 0.299 and 0.533) for divergences at the two nodes of these clades [23]. Therefore, the sequential radiation of currently delineated killer whale types has not yet been fully resolved.
In this study we apply novel analyses to a previously published dataset of killer whale mitogenome sequences to further investigate if the North Pacific ecotypes diverged in the Pacific and there was subsequent founding of the Atlantic populations, or if sympatry of the North Pacific ecotypes arises from secondary contact following an allopatric phase during which the ancestors of the resident and offshore lineages were in the Atlantic.

Materials and Methods
Individuals were assigned to type before sequencing based upon morphological or behavioural data (see [23,34]). The North Pacific transient type is a mammal-eating specialist, the Pacific resident type is a fish-eating specialist, and the Pacific offshore type is also thought to be piscivorous, with sharks appearing to be a key component of the diet [24,[35][36][37][38][39]. There are subtle morphological differences among the North Pacific types [25,40], which have been confirmed by molecular analysis to be reliable in determining type in the absence of predation observations [41]. When assignment of an individual to type was not possible prior to sequencing, the individual was classified as type unknown.

Phylogenetic analyses
Firstly, we aimed to improve phylogenetic resolution and increase the support at the nodes at which clades 2-5 diverge by using partitioned data, which can improve phylogenetic resolution (e.g. [42]). We used the mitogenome sequences of 143 individuals, which have been published by two previous studies [23,24] (see Table S1 for accession numbers). This included 135 complete and 8 partial mitochondrial genome sequences, and the long-finned pilot whale, Globicephala melas, as an outgroup sequence [23]. Sequences were aligned and subsequently pruned to retain only the 2 rRNA and 13 protein-encoding genes (ca. 14,038 bp in length). The control region was excluded from analysis due to suspected saturation and repetitive sequences. The resulting alignment was divided into four partitions: the first, second and third codon sites of the protein-coding genes (3,827 bp per partition) and the rRNA genes (2,557 bp). The most likely model of evolution that was compatible with models implemented in MrBayes 3.1.2 [43] was selected for each partition using jModelTest 1.1 [44] using the corrected Akaike Information Criterion (AIC). The selected models were GTR+G for the first codon position of the protein-coding genes and HKY for the second and third codon positions and the rRNA genes. Four independent Monte Carlo Markov chains (MCMC) were run simultaneously for 1,000,000 generations with the current tree and parameter values sampled every 100 generations. Convergence was judged to have occurred when the standard deviation of split frequencies across the chains was ,0.01 after 1,000,000 generations resulting in 10,000 trees, of which the first 25% (2,500) were discarded as burn-in. The potential scale reduction factor (PSRF) was 1.0 for all parameters. The majority-rule consensus tree was summarised from the final 7,500 trees, and support for clades was expressed as posterior probabilities.
Secondly, we applied an additional approximately unbiased test of phylogenetic tree selection to further investigate the most likely branching order of clades 1-5. Maximum-likelihood phylogenetic analysis of the non-partitioned protein-coding and rRNA gene sequences (14038 bp) was performed, using PhyML 3.0 [45], using the best-fit model of nucleotide substitution of HKY for the nonpartitioned data, selected as above. The transition/transversion ratio, the proportion of invariable sites, and the gamma distribution were estimated using PhyML. The starting tree was estimated using BIONJ algorithm implemented in PhyML. The site-wise likelihood values recovered from the PhyML analysis were subsequently used to evaluate the levels of statistical support for alternative topologies, each consistent with a different chronology of divergences. Statistical support for each topology was evaluated from the P-values of approximately unbiased (AU) tests [46] estimated using Consel 0.1 k [47]. The AU test estimates the probability that the tree is the true tree by calculating the approximately un-biased P-values from the change in the bootstrap probability values along the changing sequence length [46]. Trees can be rejected as representing the true tree at a significance level a,0.05 [46].

Genetic diversity measures
Step-wise founding events are expected to lead to a concurrent step-wise reduction in genetic diversity with each subsequent founding event (e.g. [48]), though this will also be influenced by demographic factors such as subsequent effective population size and immigration. We used DnaSP v5 [49] to calculate a number of genetic diversity measures for each clade with at least 4 complete sequences: the number of segregating sites (S), the number of segregating sites per nucleotide site (P S ), haplotype diversity (h h ), nucleotide diversity (p) and Tajima's D [50]. We use the sub-notation 'h' to differentiate between haplotype diversity (h h ) and Watterson's theta (h), which is an estimate of population mutation rate and is a parameter in the isolation with migration analysis below.

Isolation with migration analysis
To further investigate the temporal pattern of migration between ocean basins we estimated population sizes, the time of splitting, and migration rates by implementing the 'isolation with migration' (IM) model of population divergence [51] using IMa [52], run on the BioHPC clusters at Cornell University. Initial runs without data (-j0) were used to assess the boundaries of the assumed prior distributions for parameters and to ensure that the posterior distributions fell within the prior range. Non-partitioned sequences of the protein-coding and rRNA genes were used in the input file. The substitution rate per year for the locus (4610 25 ) was estimated by multiplying the per-site values for the complete mitogenome in Morin et al. [23] by 14038. A published generation time of 15 years estimated from long-term photo-identification studies [53] was used. Five independent replications of 100,000,000 iterations, with a 10% burn-in period under a finite sites model (HKY) were performed and then combined in an L mode analysis. To ensure convergence, the five independent runs were compared for repeatability, the mixing properties of the MCMC were assessed by visual inspection of the parameter trend plots and by examining the level of autocorrelation between initial and final parameter values using the effective sample size (ESS), and ensuring that update rates were greater than 20%. Convergence upon the stationary distribution was accepted if independent runs generated similar posterior distributions, with each having ESS of at least 50 for each parameter as recommended by Hey & Nielsen [51]. The posterior estimates of the model were scaled into demographic units using the locus mutation rate per generation (m, based upon the per year estimate above and assuming a generation time of 15 years). The population mutation rates (Watterson's theta; h 1 , h 2 and h A ) were converted into estimates of effective population size parameters (N 1 , N 2 and N 3 ) where N e = h/2m, time since splitting (t) was converted to calendar years (t = t/m), and directional migration rates (m 1 , m 2 ) were converted to the effective number of female migrants per generation (M = mm).

Results and Discussion
The results of this study provide further resolution of the chronology of phylogenetic divergence and the movement between ocean basins of killer whale types, providing new insights into the geographic context and potential mode of speciation. In particular, these new analyses strongly suggest an allopatric phase and secondary contact of the partially sympatric Pacific ecotypes. The ancestral maternal lineages of the Pacific resident and offshore ecotypes appear to have migrated back to the Pacific from the Atlantic.

Phylogenetic analyses
Phylogenetic estimates were consistent across both Bayesian analysis of the partitioned sequence data (Fig. 1, Fig. S1) and maximum-likelihood analysis of the unpartitioned sequence data (not shown), and with those proposed in previous studies [23,24]. The analysis of the partitioned data resulted in higher support at the two nodes that previously had low support [23,24], however, the support for the node at the divergence of clades 4 and 5 was still relatively low (0.55).
The only topology with strongly significant support (P = 0.974) from the approximately unbiased test run using Consel 0.1 k was consistent with a series of dispersal events rather than vicariant splits (Fig. 2). The results strongly imply that the partial sympatry between the North Pacific transient, resident and offshore lineages arises out of secondary contact following the migration of the resident and offshore lineages from the North Atlantic to the North Pacific. Migration appears to have been bidirectional as a second North Atlantic clade is nested within the North Pacific clade containing the offshores. We could reject with a level of significance of a#0.05 all topologies not consistent with the resident and offshore clades having originated from the North Atlantic. There were two alternative topologies that we were unable to significantly reject, but which had very low support (P = 0.219 and P = 0.056). These differed in the order of dispersal of the North Pacific offshore clade and the second North Atlantic clade (Fig. 2). Divergences among these clades are estimated to have occurred relatively rapidly approximately 200 (80-300 95% HPDI) KYA [23]. Such rapid divergences would explain the uncertainty over the chronology of dispersal and the relatively low support (0.55) at the node between the mixed clades 4 and 5.

Genetic diversity measures
Genetic diversity summary statistics h and p were also consistent with the inferred chronology of founding events from a North Pacific source population to the North Atlantic and then back to the North Pacific (Table 1). However, these results need to be interpreted cautiously as demographic expansion and selection can also influence these values. Values for Tajima's D were nonsignificant (P.0.1) and therefore not indicative of a strong departure from mutation-drift equilibrium. It should be noted that h h assumes that mutations are neutral, and samples are drawn from a randomly mating population that has remained constant in size. If h h tends to be greater than p, as is the case here, then this can indicate that the assumptions required for estimating h h are not met. The fact that the samples within each clade were not drawn from a single randomly mating population would make this highly likely. However, h h and p are highly correlated (r 2 = 0.872) within our dataset and p is not subject to the same assumptions. Therefore the difference in diversity summary statistics between clades does not appear to be simply an artefact of an undetected departure from mutation-drift equilibrium.
Nucleotide diversity (p) was relatively low and Tajima's D statistic was negative for all clades in Table 1, which although nonsignificant, could still suggest a historic population expansion. Hoelzel et al. [29] suggested a bottleneck event ca. 150-210 KYA. In addition to the estimated time to most recent common ancestor (TMRCA) of the two North Atlantic clades and the North Pacific resident and offshore clades being approximately 200 KYA, Morin et al. [23] also estimated the TMRCA for the North Pacific transients at 0.19 (0.10-0.31 95% HPDI) MYA. A bottleneck for several clades during the glacial maximum at approximately 200 KYA [54] does therefore appear to be a potential cause of reduced genetic diversity at both mtDNA and nuclear loci [29]. However, the analysis of the mitogenome sequences differed from analysis of the mtDNA control region [29], as it suggested that many of the known killer whale ecotypes had diverged prior to this glacial maximum [23]. A historical bottleneck at approximately 200 KYA would have accelerated coalescence and lineage sorting of the clades that had already diverged at this point in time [55]. This would also further confound interpretation of diversity measures as indicators of the chronology of divergences.

Isolation with migration analysis
The independent replicated IMa runs provided consistently similar posterior probability distributions for all parameters and displayed good mixing properties of the MCMC chains as indicated by the parameter trend line plots and ESS values of greater than 50. Although the mitochondrial genome is a single locus, the analyses produced clear curves with consistent posterior distributions, suggesting good resolution of the 6 parameters being estimated.  The results also suggest migration of maternal lineages has been bi-directional between the North Pacific and North Atlantic, but at a very low level; the number of effective migrants per 1,000 generations (15,000 years) was estimated at 0.058 (0.003-0.129 90% Highest Probability Density Interval, HPDI) into the North Atlantic, and 0.043 (0.003-0.099 90% HPDI) into the North Pacific. Based on this estimate the number of founding females of the resident and offshore lineages would be very small, potentially even a single female. The establishment and persistence of reproductively isolated and evolutionary divergent populations from a single migrant following secondary contact have been noted in other species e.g. Darwin's finches (Geospiza fortis) [56]. Such case studies highlight how stochastic events can play a key role in speciation [56]. The estimates of the number of effective females, (N Atlantic = 68,942, 90% HPDI = 33,564-103,144; N Pacific = 66,608, 90% HPDI = 41,404-90,894), appeared large, but not unrealistic, given that abundance estimates from line transect surveys have been in the tens of thousands for waters around Iceland in the North Atlantic [57], and in the thousands for a relatively small coastal area around the Aleutian chain in the North Pacific [41]. Migration from un-sampled 'ghost' populations can also moderately inflate estimated effective population size [52]. As the ancestors of North Atlantic lineages within clade 6 (Type 2 and Mid-Atlantic) have clearly migrated from the Antarctic, the estimated N e of the ancestral female population may therefore be more representative of the female N e for the genus as a whole. The estimated time of splitting between killer whale lineages in the North Pacific and North Atlantic was 630 (270-1000 90% HPDI) KYA, which is consistent with phylogenetic estimates of the time since splitting of all other killer whales from the North Pacific transients of 702 (489-956 95% HPDI) KYA [23]. This would be consistent with an initial radiation during the inter-glacial that commenced approximately 650 KYA [58].
The IMa analyses suggested that migration events of females between the North Atlantic and North Pacific have taken place in both directions (Fig. 3). There is an overlap in the peak of the probability density distributions for migration events in each direction (Fig. 3), which occurs during the previous inter-glacial period (approximately 131-114 KYA BP [54]). This peak coincides with a period of maximal 'Agulhas leakage': the vigorous exchange of fauna that occurred between the Indo-Pacific and the southwest Atlantic during the inter-glacials of the late Pleistocene, promoted by an enhanced Agulhas current around the southern tip of Africa [59]. Fish stocks thought to have colonised the North Atlantic and Mediterranean during these episodic oceanic interchanges include other top marine predators such as the great white shark (Carcharodon carcharias) [20] and species predated by killer whales such as bluefin tuna (Thunnus thynnus) and swordfish (Xiphias gladias) [60]. This could therefore have been the driver of a period of rapid episodic inter-oceanic interchange of killer whale lineages. A secondary more recent peak of migration into the North Atlantic was inferred to have occurred at the start of the current inter-glacial, which started approximately 11 KYA BP [54] (Fig. 3). Inspection of the phylogenies in the Figure  S1 and reference [23] suggests this secondary peak is probably inferred from the inclusion of a sequence of an individual sampled in the Atlantic (Newfoundland Canada) that had a haplotype (WNAUCAN) nested within the North Pacific offshore haplotypes in clade 5.

Insights into the mode of divergence
Although ecologically and sometimes morphologically disparate types of killer whale currently exist in partial sympatry in the Antarctic, Atlantic and Pacific, in at least two of these ocean basins (the North Pacific and North Atlantic) this appears to result from secondary contact rather than sympatric divergence. An isolated allopatric phase would allow the accumulation of mutations through drift and/or selection leading to genetic, ecological and phenotype divergence between isolated populations. Genetic divergence could lead to reproductive isolation through the epistatic interactions, including Dobzhansky-Muller incompatibilities between mutations accumulated in different populations [61]. However, incompatibilities could have arisen, or have been further promoted upon secondary contact through resource competition promoting specialisation and further divergence [3,4].
The new analyses of partitioned mitochondrial genome sequences presented here provide further resolution of the matrilineal history of killer whale types. However, mtDNA may not fully reflect the underlying pattern of divergence and lineage formation if there is male mediated gene flow between ecotypes upon secondary contact. Previous results based on nuclear data (microsatellites) found a mid-Atlantic population containing individuals from all three Atlantic clades, indicating that secondary contact in killer whales does not always promote reproductive isolation [24]. Microsatellite analyses also suggest low levels of gene flow between Pacific killer whale ecotypes [22]. Therefore, future work should focus on comparing nuclear gene allele frequencies among populations and types. Figure S1 Bayesian phylogeny of the samples analysed showing internal nodes of each clade. Branch colours indicate geographic origin of samples as follows: Antarctic (black), Atlantic (blue) and Pacific (red). Bold numbers indicate the basal node to the clades as used in the AU test and genetic diversity summary statistics. Posterior probabilities are given for nodes of interest. The tree is rooted with long-finned pilot whale (not shown). (TIFF)