Estimation of a Killer Whale (Orcinus orca) Population’s Diet Using Sequencing Analysis of DNA from Feces

Estimating diet composition is important for understanding interactions between predators and prey and thus illuminating ecosystem function. The diet of many species, however, is difficult to observe directly. Genetic analysis of fecal material collected in the field is therefore a useful tool for gaining insight into wild animal diets. In this study, we used high-throughput DNA sequencing to quantitatively estimate the diet composition of an endangered population of wild killer whales (Orcinus orca) in their summer range in the Salish Sea. We combined 175 fecal samples collected between May and September from five years between 2006 and 2011 into 13 sample groups. Two known DNA composition control groups were also created. Each group was sequenced at a ~330bp segment of the 16s gene in the mitochondrial genome using an Illumina MiSeq sequencing system. After several quality controls steps, 4,987,107 individual sequences were aligned to a custom sequence database containing 19 potential fish prey species and the most likely species of each fecal-derived sequence was determined. Based on these alignments, salmonids made up >98.6% of the total sequences and thus of the inferred diet. Of the six salmonid species, Chinook salmon made up 79.5% of the sequences, followed by coho salmon (15%). Over all years, a clear pattern emerged with Chinook salmon dominating the estimated diet early in the summer, and coho salmon contributing an average of >40% of the diet in late summer. Sockeye salmon appeared to be occasionally important, at >18% in some sample groups. Non-salmonids were rarely observed. Our results are consistent with earlier results based on surface prey remains, and confirm the importance of Chinook salmon in this population’s summer diet.

Introduction events are difficult to observe directly (e.g., [26]). Here, we report on a study using highthroughput sequencing of DNA from fecal samples to estimate the diet composition of the southern resident killer whale population in its core summer range.

Field Methods and Sampling
Field activities were based out of the San Juan Islands between May and September of 2006-2011 (Fig 1). Fecal samples were collected using two different previously described techniques: 1) a modification of a method developed by Ford and Ellis [9] for prey sampling that involved following a focal animal's ''fluke prints" until a sample was observed (for additional details, see [17]) or 2) using scent detection dogs to locate samples floating on the water's surface [27]. Samples were initially stored in plastic bags on ice packs and later stored at -20 C or -80 C prior to analyses. Samples were collected in U. An initial group of 244 fecal samples (subsequently reduced to 175 based on DNA quality considerations-see below) were sorted by sample collection date into 13 experimental groups (Fig 2; S1 Table). The groupings were designed to allow for comparisons between years and to capture shifts in the proportions of prey species consumed throughout the course of the late spring and summer. Samples were divided by year and samples collected in the same year were subdivided into groups using the following date ranges: early summer (May-July 25 th ), midsummer (July 26 th -September 4 th ), and late summer (September 5 th -30 th ) (Fig 2). The number of individual fecal samples contributing to each group varied due to differences in sampling effort and whale presence during different years and seasons. The date ranges for the groupings were selected after sample collection, considering natural breaks in the temporal sequence of samples and the number of samples in each grouping. The largest group included 32 samples and the smallest group included 4 samples (Fig 2; S1 Table).

DNA Extraction
We extracted DNA from fecal samples using a Qiagen QiaCube and the QIAamp DNA Stool Mini Kit (Qiagen, Valencia, CA) in a polymerase chain reaction (PCR)-free laboratory. Fecal samples affixed to substrate (sterile gauze or swabs) were trimmed and transferred into a 2mL extraction vial and feces were lysed directly on the substrate. For fecal samples not affixed to a substrate, DNA was extracted from approximately 200uL of feces. Samples containing large quantities of seawater were centrifuged at low speed prior to extraction to pellet the feces and excess seawater was decanted. For all fecal samples, DNA was eluted off extraction columns by incubating 100uL of elution buffer for 5 minutes at room temperature prior to centrifugation. Temporal distribution of fecal samples included in the analysis and of the approximated daily abundance of Chinook, sockeye and coho salmon in the San Juan Islands area. Each dot represents fecal samples collected on a specific day, with the area of the dot proportional to the number of samples on that day (smallest size = 1 sample). The dots are color-coded to indicate the within-year pools of samples that were combined for sequencing analysis, for a total of 13 year-by-season sample pools. Smoothed daily salmon abundance was estimated by local polynomial regression of daily catch-per-unit-effort data scaled by total annual run size (see S1 Text for details).
To confirm that each fecal sample was indeed killer whale feces, we amplified and sequenced the 16s region of the mitochondrial genome [28] and compared the sequence with known killer whale sequence. Samples were only collected in the proximity of known southern resident killer whale groups, and thus were unlikely to be from other populations of killer whales. In addition, the fecal samples were also genotyped at a series of microsatellite loci as previously described [27,29] to confirm the presence of killer whale DNA and in many cases identify the specific pod or individual of origin (S1 Table).

Prey Detection Primer Design
Previous work [9,10] has shown that the southern resident killer whale population does not prey on marine mammals, so our study focused exclusively on detecting fish prey. PCR primers were designed to amplify approximately 330bp of the 16s region of the mitochondrial genome of a wide range of potential fish prey species while excluding amplification of killer whale DNA. Primers were selected based on an alignment of more than 40 potential prey species representing all major fish families (S2 Table). Species were chosen based on previous studies or common presence in the whales' habitat. To avoid amplification bias among taxa, PCR was performed using an equimolar mix of two forward primers (Table 1). Primer "Salmon-F" is complementary to the priming site in salmonids and several non-salmonids and "Groundfish-F" is complementary to the sequence in most non-salmonids (S2 Table). PCR product length (excluding primers) ranged from 326bp to 341bp depending on the species.
The prey detection primers were subsequently modified to allow for indexing of individual pools during the library preparation step. Both forward primers (Salmon-F and Flatfish-F) and the reverse 16s-R were modified by the addition of Illumina primer sequence and overhang adaptors complementary to Illumina's Nextera index tag kit on the 5' ends (Table 1). Primers were HPLC purified to remove truncated primers that might compromise amplification specificity.

Quality Assessment and Sample Pooling
In order to assess the quality and initial quantity of prey DNA within each fecal sample, all fecal DNA extractions were screened via qPCR. The qPCR assay was designed to amplify a smaller (approximately 87bp) sequence nested within the 330bp 16s region described above. The assay used the same forward primers as the prey detection assay in combination with the reverse primer 16s-short-R: 5'-tccatagggtcttctcgtctt. Each reaction was carried out in a final volume of 12.5uL using 6.25uL of Power Sybr Green Master Mix (Applied Biosystems), 0.2uM of forward primers, 0.2uM of reverse primer, and 2uL of DNA. Reactions conditions were as follows: 95°C for 10min, followed by 40 cycles of 95°C for 15 sec; 60°C for 1 min, followed by a melt curve analysis. All reactions were performed using the ABI 7900HT Fast Real-Time PCR System. Each qPCR assay plate included a 1:10 dilution series of prey DNA standard. The prey DNA standard was created by pooling normalized DNA (quantified fluorometrically via a Qubit) from 8 fish species (the six salmon species plus halibut and English sole- Table 2). Fecal DNA extractions that failed to amplify or amplified poorly using the qPCR assay were removed from further analysis. A total of 175 samples remained after qPCR evaluation. Individual fecal DNA extractions were quantified against the standard curve, normalized within a group (as identified above), and pooled into one of 13 discrete pools of fecal DNA (see above for pool definitions). The normalized DNA pools were re-screened using qPCR to verify normalization. In addition to the experimental pools, two control pools were created by combining DNA extracted from 5 known southern resident killer whale fish prey species: Chinook, coho, and sockeye salmon; halibut and English sole. To create the control pools, genomic DNA was extracted from finclips or fish muscle tissue and screened via gel electrophoresis. High quality, high molecular weight genomic DNA samples were selected and quantified with the qPCR assay used to normalize fecal DNA extractions. The first control pool consisted of equal proportions of Chinook, sockeye, halibut and English sole DNA. The second control pool was made up of 40% halibut, 40% sockeye, 15% coho and 5% Chinook DNA.

Amplicon Library Preparation and Sequencing
Amplicon libraries were generated from each of the 13 experimental pools and 2 control pools using a 2-step PCR workflow provided by Illumina. The first PCR employed the modified prey detection primers to generate template libraries of individual pools: 13 separate 40uL reactions, Table 2. Reference Sequences used for species assignment. The second step of the workflow was a limited-cycle PCR that used the overhang adaptors to append P5 and P7 Illumina sequencing adapters and Illumina Nextera XT index tags to amplicons generated during the first round of PCR. The second PCR step was performed in 50uL reactions containing 8uL of gel purified PCR product from the first-step PCR, 1X NEB Phusion HF DNA Polymerase buffer (Ipswich, MA), 0.2mM of each dNTP, 5uL each of one Illumina Nextera forward and reverse index tag, and 1 unit of NEB Phusion HF DNA Polymerase. Each of the 15 DNA pools amplified in PCR #1 were re-amplified with a unique combination of 1 of 6 Nextera forward primers containing barcodes N01-N06 and 1 of 4 Nextera reverse primers containing barcodes S501-S504. We used PCR cycling conditions specified by the Illumina Nextera amplicon sequencing protocol as follows: 72°C for 3 min, 98°C for 30 sec, followed by 12 cycles of 98°C for 10 sec, 55°C for 30 sec, 72°C for 30 sec, and a final extension at 72°C for 5 min. The 15 indexed amplicon libraries were gel purified, quantified using the KAPA SybrGreen Library Quantification Kit (KAPA Biosystems, Boston, MA), normalized, and pooled prior to sequencing. The final pool of libraries was normalized to 10nM. In preparation for sequencing on an Illumina MiSeq, the pool was further diluted to 10pM and 5% PhiX was added to create diversity, optimize cluster formation on the MiSeq flowcell, and allow for real-time quality metrics during the run. Finally, 250bp paired-end sequencing was performed using a MiSeq Reagent Kit v2 for 500 cycles.

Data Analysis
Using the Illumina MiSeq Reporter Software, the indexed reads were de-multiplexed and FASTQ files were generated for each library. Reads were trimmed to remove adaptor and index sequences prior to further analysis. Because the length of the amplicon was~330bp, there was a 65-100bp region of overlap between the paired reads. We used the PANDAseq [30] software program to align and merge the paired reads into a single sequence, using the default alignment threshold of 0.6.
Merged reads were aligned against a reference sequence database using the BLAST+ command line program [31]. The reference sequence database consisted of the target 16s amplicon from 19 species including 6 salmonids (Chinook, coho, steelhead, sockeye, chum and pink salmon); 3 species of flatfish (Pacific halibut, Dover sole, and English sole); as well as lingcod, sablefish, rockfish, hake, herring, smelt, surfperch, dogfish, ratfish and killer whale ( Table 2). To account for intraspecific variation among the relatively closely related salmon species, we included 2-4 haplotypes for each salmon species except for pink salmon (Table 3). Most species pairs differed by >50bp, although some species, including the six salmon species, differed by <10bp (S3 Table). In order to further validate the power of the baseline to distinguish among the six Pacific salmon species, we also searched GenBank for 16s Oncorhynchus sequences, and found 40 such sequences (S4 Table). Of these, 39 assigned to the 'correct' species when using our baseline, the one exception being a putative O. mykiss sequence that was identified as O. kisutch using BLAST (S4 Table). This sequence, however, appears to have obtained from an unlabeled specimen in a Hong Kong market, and may therefore have been mislabeled [32]. Overall, however, this segment of the 16s gene appears to have sufficient power to distinguish among species of interest. Species identifications of the unknown sequences were based on the closest match to the reference database, after first removing alignments of <320 bp, >1 bp gap, and <95% sequence similarity to the best match in the database to reduce spurious assignments due less than full length or poor quality sequences. Sequences that had identical alignment scores to two different species were removed from further analysis.

Results
A total of 13,769,809 sequence reads were generated of which 12,586,467 (91.4%) passed the initial filter specified by the Illumina MiSeq software. Of the reads that passed the filter, 79.3% were assigned to an index. After assembling the paired reads, there were 5,168,233 total sequences, ranging from 228,906 to 518,974 sequences per group. Over the 13 pools, the mean sequence identity and length of the best-fit alignment for each of the sequences were 99.25% and 324.5 bp, respectively. After filtering for sequence length, gaps, and percent sequence identify (see above), 4,987,107 alignments remained with mean sequence identity and length of 99.5% and 325.8 bp, respectively. A total of 45,881 killer whale sequences were detected (0.9% of sequences), indicating that our primer design was generally successful at limiting amplification of host (killer whale) vs. prey DNA. For the two control pools, there were 667,983 paired reads, which were reduced to 591,593 after filtering for length, gaps, and percent sequence identity. A small number (<0.5%) of sequences resulted in identical blast scores to two different species, and these were removed from further analysis.
For the salmon sequences in the control pools, the BLAST reference sequences and the query sequences were derived from the same individuals, allowing for a rough characterization of sequencing error rates. The average percent sequence divergence between the aligned salmon control and query sequences was 0.6%, consistent with the 0.5-1.0% substitution error rate that has been previously reported for MiSeq sequencing [33].
The estimated species composition in the two control groups differed somewhat from expectations (Table 3). In control group 1 (equal proportions of four species), the two salmon species (Chinook and coho) were overrepresented (25% expected, 30% observed) at the expense of the two groundfish species (25% expected, 20% observed). In control group 2, sockeye and coho salmon were within 1 percentage point of their expected values (40% and 15%, respectively), and Chinook was overrepresented (5% expected, 12% observed) at the expense of halibut (40% expected, 34% observed).
Salmonids made up >98.6% of the sequences, with halibut and herring being the most abundant non-salmonids at <1% each (Table 4). Within sample groups, the lowest percentage of salmonid sequences was 90.5% in the mid-summer 2011 sample, with the remainder being primarily herring (9.4%). The late summer 2007 sample also had a relatively high percentage of non-salmonids (6.4% halibut). All other samples contained >99% salmonid sequences. Of the six salmonid species, Chinook salmon was the most common at 79.5% of the overall sequences (Table 4). Coho salmon was also relatively common (15% of sequences overall), and three other species (chum, sockeye and steelhead) had small overall contributions of <3% each. There were some clear patterns over the course of the summer. The early summer samples were >96% Chinook salmon in all five years. The mid-summer samples were also mostly Chinook salmon, but in 2008 and 2011 also contained some sockeye salmon (12.1% and 18.3%, respectively). The late summer samples had the lowest proportions of Chinook (30-75%, average of 51.3%), and contained substantial fractions of coho salmon (18-60%, average 43.5%). The samples were obtained from > 54 individuals across all three pods (S1 Table,  Table 5), so these results are likely to be representative of the population as a whole rather than individual level variation. Most of the samples were obtained from the west side of San Juan Island (Fig 1), consistent with the whales' frequent use of this area [34].

Discussion
Using high-throughput sequencing of DNA extracted from killer whale feces collected in the field, our results confirm earlier studies [9,10,17] indicating that salmon, and especially Chinook salmon, are by far the dominant component of the southern resident killer whale diet during the summer months. The 79.5% overall estimated proportion of Chinook salmon sequences for the entire May-September time period is nearly the same as the 80% a prior study obtained based on analysis of surface prey remains [17]. The second largest estimated diet component, coho salmon, was also similar in both studies, with 15% in the current study and 9% based on prey remains. The marked similarity in estimated diet composition between the fecal analysis and the prey remains analysis confirms the utility of the surface collection techniques [9], and suggests that at least during the summer time period the prey consumed near the surface are unlikely to be a taxonomically biased sample of the whales' diet.
Despite the overall prevalence of Chinook salmon, there was also some evidence of dietary shifts over the course of the summer. Most notable was a marked shift toward coho salmon in the late summer in all four years for which we had late summer samples (a shift that was also observed in prey remains [17]). The other apparent seasonal shift was a spike (up to 18%) in sockeye salmon in mid-summer in two of the four years. A smaller spike of sockeye in midsummer was also apparent in an earlier prey remains study on this population, but this was based on very limited data (a total of only 4 sockeye prey remains; [17]).
In a broad sense, these diet shifts from Chinook salmon to other salmon species generally coincide with the run timing of the salmon species to the San Juan Islands area. In particular, sockeye salmon and coho salmon have a mid-and late-summer run timing distribution, respectively, and these are also the time periods in which these species appear in the killer whale diet (Fig 2). However, our results also clearly confirm previous observations [9] indicating that the whales are not consuming the salmon species in proportion to their abundance. In particular, during the peak of their run, sockeye salmon are often greater than 10-fold more abundant than Chinook salmon, but Chinook salmon were estimated to contribute >70% of the whales' diet, even during the peak of sockeye runs. Even in 2010 when a particularly large run of sockeye returned to the Fraser River (Fig 2) during a time period well sampled by our fecal analysis, the estimated contribution of sockeye to the whales' diet was close to zero (Table 4). Similarly, the Fraser River has a large run of pink salmon that also returns in midsummer in odd-numbered years, but pink salmon sequences were entirely absent from the fecal samples. The primer regions for pink salmon are identical to the other species (S2 Table), so PCR failure is not a plausible explanation for the absence of this abundant species from the fecal samples.
In contrast, the shift toward coho salmon in the late summer clearly does coincide with the presence of this species in the area (Fig 2). During this time period, overall coho abundance is of similar magnitude to Chinook abundance, but coho are generally increasing during this time while Chinook are decreasing. Our results suggest that coho salmon may be a more important component of the southern resident whales' diet than has been previously appreciated. A similar fish-eating population of killer whales in Prince William Sound has been observed to eat primarily coho salmon [35], and our finding of substantial seasonal coho consumption suggests some additional diet similarities between these geographically distinct populations.
Overall, our results clearly support and extend previous estimates of southern resident killer whale diet based on prey remains and non-quantitative (presence/absence) analysis of fecal DNA. However, quantification of DNA sequence abundance in feces has some potential biases and limitations, only some of which are under experimental control, and these limitations should be considered when interpreting our results. First, our study was designed to detect fish prey only; potential prey from other taxa such as marine mammals or invertebrates would not be detected. Although prior studies [9,10,17] have indicated that these killer whales specialize on fish prey, this limitation must be kept in mind.
Second, even considering only fish prey, estimation of diet by fecal DNA sequence quantification makes a number of assumptions, including unbiased collection of feces with respect to prey consumed, equal concentration of mitochondria and digestibility in all potential prey species, and non-biased amplification and sequencing of DNA from all consumed prey [25,36]. Our primers were designed to amplify a large variety of known and potential fish prey species in an unbiased way. The reasonably close correspondence between the expected and observed species compositions in the control pools supports the idea that our results are unlikely to be substantially biased by PCR or sequencing errors, although the higher amplification rates of Chinook and Coho salmon suggest that rare species may be detected less often than they are actually represented in the diet. The controls were performed using extracted DNAs, however, not with equivalent weights or volumes of fish tissue. We have thus not controlled or tested for factors such as differences in mtDNA concentration or digestibility which have been shown to vary between species [25].
Conducting controlled diet experiments in a captive setting would be useful in testing for species quantification biases [36,37]. At this point we therefore consider the diet estimates generated in this study to be approximations, rather than precise estimates. On the other hand, we are reasonably confident in the absence from the diet of any species in our baseline that were not detected in the millions of sequences that we generated and that were demonstrated to amplify with our primers. Overall, the close correspondence between the fecal DNA results and earlier prey remains results lends credence to the estimates derived from both methods.
Estimating diet by quantifying the relative abundance of prey DNA in a predator's feces has been applied to a variety of taxa [25,38], but has had limited applications to cetaceans [26,39]. This method has the potential to be particularly useful for species, such as cetaceans, that are difficult to observe extensively in the wild. Estimation of predator diets has become increasingly important as formerly rare predator species have increased in abundance due to successful protection under laws such as the U.S. Endangered Species Act or Marine Mammal Protection Act and are preying on other species of concern [6]. Unlike other North Pacific killer whales [40], the southern resident population has not experienced consistent population growth following the implementation of protections in the mid-1970s [14]. An accurate understanding of the whales' diet, along with the diets of their potential competitors, across the full range of seasons and habitats used by the whales will be helpful in identifying recovery actions that are likely to positively impact this endangered population.
Supporting Information S1 Table. Table of Table. BLAST results of Pacific salmon 16s sequences from GenBank against our baseline.