Abstract
Understanding the distribution and abundance of marine mammals is important for assessing population dynamics and evaluating the impacts of human activities on these species. Here, we assessed the capability of microbial and small plankton communities to predict the density of Balaenopteridae whales in the Southern California Current Ecosystem in each season from 2014 to 2020 using data from the California Cooperative Oceanic Fisheries Investigations (CalCOFI). Densities of Balaenopteridae whales were estimated from visual line transect surveys for three target species – blue (Balaenoptera musculus), fin (Balaenoptera physalus), and humpback (Megaptera novaeangliae) whales – and microbial and small plankton communities were examined in concurrent water samples via metabarcoding of the 16S and 18S rRNA genes. Planktonic communities specific to each target whale species appeared as strong statistical predictors of whale estimated density, explaining 81–99% of variability and predicting density estimates to within ~1 individual per 1000 km². Our approach improved out-of-sample root mean square prediction error by up to 65% compared with simple alternative methods. Specific planktonic communities observed indicate that some predictor taxa may be ecologically associated with whales as parasites, as skin and respiratory microbiome species, or through the food chain of whale prey. However, further studies are needed to understand how these organisms function collectively as a community and interact with the “ecological habitat” that supports whales. Our results suggest that using planktonic communities to quantify the potential ecological habitat of larger organisms, like baleen whales, can enhance predictive models and may inform hypotheses about the ecological relationships between whales and the biological communities with which they co-occur.
Citation: Satterthwaite EV, Ruiz TD, Patin NV, Alksne MN, Thomas L, Dinasquet J, et al. (2026) Microbial and small zooplankton communities predict density of baleen whales in the southern California Current Ecosystem. PLoS One 21(5): e0334209. https://doi.org/10.1371/journal.pone.0334209
Editor: Vitor Hugo Rodrigues Paiva, MARE – Marine and Environmental Sciences Centre, PORTUGAL
Received: September 23, 2025; Accepted: March 6, 2026; Published: May 6, 2026
Copyright: © 2026 Satterthwaite et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Sequence data generated in this study have been deposited in the NCBI Sequence Read Archive under BioProject accession numbers PRJNA555783, PRJNA665326, and PRJNA804265. Sighting data, density estimates, NCOG sample metadata, taxonomic annotations, and data used for modeling are publicly available via Zenodo at https://doi.org/10.5281/zenodo.15678927. R code to reproduce the analyses is available on GitHub at https://github.com/ruizt/marine-mammal-edna.
Funding: This material is based upon research supported by: Office of Naval Research (N00014-22-1-2719 to EVS, SBP, BXS, and LT); National Oceanic and Atmospheric Administration (NA15OAR4320071 and NA19NOS4780181 to AEA); Simons Foundation Collaboration on Principles of Microbial Ecosystems (PriME) (970820 to AEA); US Navy Pacific Fleet (N62473-18-2-0016, N62473-19-2-0028, and N62473-16-2-0012 to SBP); and the Research, Scholarly & Creative Activities Program awarded by the Cal Poly Division of Research (to TDR). Additionally, EVS was supported by a partnership among CalCOFI participants, including Scripps Institution of Oceanography (SIO), NOAA Southwest Fisheries Science Center (NA20OAR4170258), California Department of Fish and Wildlife (P2370002), and California Sea Grant (NA22OAR4170106). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Large baleen whales play a vital role in marine ecosystems by helping to regulate ecosystem processes [1]. They are of conservation and management relevance as many populations are listed as threatened (Vulnerable, Endangered, or Critically Endangered) on the IUCN Red List of Threatened Species [2], and they provide significant cultural value to people [3]. In the eastern North Pacific, blue (Balaenoptera musculus), fin (Balaenoptera physalus), and humpback (Megaptera novaeangliae) whales are among the most widespread and frequently encountered of the family Balaenopteridae, with long-range migrations that connect low-latitude wintering grounds to mid- and high-latitude summer foraging grounds [4,5]. Each year, blue, fin, and humpback whales migrate to the productive waters of the California Current to feed on dense aggregations of krill and schooling fishes [6–8]. Their foraging grounds and migratory corridors overlap with commercial shipping routes and areas of military activity, particularly in Southern California [9,10], making the ability to predict baleen whale distribution an important management directive in the region.
The seasonal distribution and abundance of these species in the southern California Current Ecosystem (CCE) is largely shaped by oceanographic conditions that influence prey availability. Blue whales preferentially consume specific krill species, favoring Thysanoessa spinifera over Euphausia pacifica [11], and their distributions tend to be correlated with krill aggregations [6,12]. Humpback whales are opportunistic foragers, feeding on both krill and small schooling fish such as sardine, anchovy, sand lance, and herring [13–15], with diet composition likely reflecting prey community structure and underlying oceanographic conditions [7]. Fin whales are also opportunistic foragers, though less is known about their prey preferences, seasonal distribution, and general ecology; they occur year-round in the CCE [16] but, like blue whales, peak density occurs in summer months [17]. Across species, migratory phenology and interannual variability are closely linked to environmental conditions. For instance, long-term acoustic studies have documented shifts in call occurrence associated with biological productivity [18,19], sea surface temperature [20], and broader events such as marine heatwaves and decadal-scale climate fluctuations [21].
The close association between baleen whales and their preferred habitat suggests that their density in the CCE is linked to habitat-specific ecological characteristics [22]. We use the term “ecological habitat” to describe the community of small organisms whose presence reflects the ecological conditions that support the occurrence of another larger organism (following [23]). The ecological habitat of whales includes taxa linked to them through food-web pathways, such as bacteria, protists, and small zooplankton that contribute to productivity or serve as prey of whale prey. We also include taxa that may be directly or indirectly biologically associated such as whale parasites, commensal organisms, and microorganisms found on whale skin [24–30]; in their gut [31,32]; or in respiratory fluids [33,34]. Whale microbiomes, which consist of communities of microorganisms living on and in the whale, have been found to significantly differ from the microbial community in the surrounding seawater [33,35], and distinct bacterial taxa have been associated with the skin of humpback whales across the North Pacific [29]. Additional studies have observed parasites, viruses, and epibiotic fauna specific to whales [28].
Beyond studies of microbial and other taxa living on or in whales, previous research has linked prey and zooplankton biomass or abundance to baleen whale distribution [36,37], and recent work has used molecular methods to characterize baleen whale prey species [38,39]. Thus, the ecological habitat of baleen whales is not limited to potential zooplankton prey but also includes prokaryotic microbes, small eukaryotic heterotrophic microbes, phytoplankton, and protozoans.
Understanding these ecological relationships complements efforts to monitor the distributions of baleen whales over time, which is necessary for tracking population changes and understanding the effects of human activity on these species. Baleen whales are monitored using a combination of visual surveys and photo identification [17,40,41], acoustic methods [8,16], satellite imagery [42], tags [43], and, increasingly, genetic methods [44]. Sampling whales and other marine mammals is challenging due to their wide-ranging and often patchy distributions [45], low encounter rates [17], brief or intermittent surface cues [46], deep-diving or cryptic behaviors [47], and the high logistical, financial, and permitting costs of direct or invasive sampling methods.
Because large baleen whales are difficult to sample directly, yet are closely linked to their habitat and associated biota, we investigated whether microbial and small plankton communities could serve as proxies for predicting their density in the southern California Current Ecosystem. The California Current Ecosystem was chosen because it is a highly productive upwelling system and is known to be an important foraging ground for baleen whales [9,10].
We leveraged six years of ship-based marine mammal data coupled with data on the microbial and small-plankton community (hereafter planktonic communities) to predict seasonal and interannual baleen whale density. We used environmental DNA (eDNA) metabarcoding to characterize baleen whale ecological habitat and to predict their seasonal density in the California Current Ecosystem. This approach captures DNA from single-celled microbes and cells shed by multicellular organisms into the water column to generate a molecular “fingerprint” of the biological community spanning multiple trophic levels [48,49]. We used two well-established genetic markers, the 16S and 18S ribosomal RNA genes, to capture a wide range of prokaryotic and eukaryotic microbes as well as small metazoan zooplankton like copepods and krill [50,51]. We focus our analyses on blue, fin, and humpback whales, given that they are abundant in the California Current Ecosystem [17,52], forage at low trophic levels [12], and have been shown to have existing connections to various microbes [29,31]. Additionally, concurrent eDNA samples and visual sightings of baleen whales exist from the California Cooperative Oceanic Fisheries Investigations (CalCOFI), the longest integrated marine ecosystem observing program in the world. This approach allows us to evaluate the extent to which marine planktonic communities can serve as predictors of the ecological habitats of top consumers, like baleen whales. By predicting seasonal and interannual baleen whale density from microbial community composition, this work may help quantify broad ecological linkages between baleen whales and microbes on a key foraging ground, potentially guiding the inclusion of such community-level associations into future habitat suitability and species distribution modeling efforts.
Methods
Sampling area
The Southern California Bight region is situated in the southern portion of the California Current Ecosystem (CCE), a productive upwelling system that supports important baleen whale species. The California Cooperative Oceanic Fisheries Investigations (CalCOFI), one of the longest running integrated marine ecosystem monitoring programs in the world, has systematically sampled the physics, chemistry, and biology of the CCE since 1949.
Quarterly CalCOFI sampling consists of 75 stations along six “core area” transects that extend from San Diego, CA to north of Point Conception (Morro Bay, CA) and include coastal stations (∼50 m depth) and stations within the core of the California Current (CC) out to ∼250–550 km offshore. The transects are spaced approximately 40 nautical miles (nm) apart. Along the transect lines, stations are spaced approximately 40 nm apart for offshore stations and 20 nm apart for coastal stations. In this project we utilize data collected from the core sampling area comprising stations located on CalCOFI lines 93.3 to 76.7 (SE corner: 32.956, −117.305; NE corner: 35.088, −120.777; NW corner: 33.388, −124.323; SW corner: 29.846, −123.587) between 2014 and 2020.
Environmental DNA collection and amplicon sequencing
From select stations on CalCOFI cruises, 743 DNA samples were collected within the core sampling area from 2014−2020 as part of the NOAA-CalCOFI Ocean Genomics (NCOG) time series [53]. We sampled key stations on lines 80 and 90, as well as basin stations, to provide an onshore-offshore gradient across two transects. Seawater was collected from the near-surface (normally 10 m) and the subsurface chlorophyll maximum layer and filtered onto 0.22 µm Sterivex™ filters (MilliporeSigma SVGP01050) that were immediately flash frozen in liquid nitrogen and stored at −80°C. Between samples, bottles and lines were rinsed with Milli-Q water, then rinsed again three times with the sample itself before filtration began. The average volume filtered was 3.3 L. Following each cruise, samples were brought to the J. Craig Venter Institute for processing, with equipment and workspaces cleaned with 70% ethanol between uses. DNA was extracted with the Macherey-Nagel NucleoMag Plant kit (Cat. no. 744400) on an Eppendorf epMotion 5075TMX and assessed on a 1.8% agarose gel. Blank samples were included during extraction to confirm the absence of contamination.
Amplicon libraries separately targeting the V4-V5 region of the 16S rRNA gene and both the V4 and V9 regions of the 18S rRNA genes were constructed via a one-step PCR with the TruFi DNA Polymerase PCR kit (Cat. no. AZ-1702). The primers used here were established by previous studies to capture the entire microbial community with commonly-used marker gene regions, and have been shown to outperform other primer sets [51]. For 16S, the 515F-Y (5′-GTG YCA GCM GCC GCG GTA A-3′) and 926R (5′-CCG YCA ATT YMT TTR AGT TT-3′) primer set was used [54]. For 18S-V4, the V4F (5′-CCA GCA SCY GCG GTA ATT CC-3′) and V4RB (5′-ACT TTC GTT CTT GAT YR-3′) primer set modified from [55] was used. For 18S-V9, the 1389F (5′-TTG TAC ACA CCG CCC-3′) and 1510R (5′-CCT TCY GCA GGT TCA CCT AC-3′) primer set was used [56]. In addition to the extraction blank samples, negative PCR controls were also included.
Each reaction was performed with an initial denaturing step at 95°C for 1 minute followed by 30 cycles of 95°C for 15 seconds, 56°C for 15 seconds, and 72°C for 30 seconds. 2.5 µL of each PCR reaction was run on a 1.8% agarose gel to confirm amplification, then PCR products were purified with Beckman Coulter AMPure XP beads (1x) following the manufacturer’s instructions. DNA quantification of the PCR products was performed in duplicate using the Invitrogen Quant-iT PicoGreen dsDNA Assay kit (Cat. no. P7589). All samples regardless of positive amplification as well as a subset of extraction blanks and PCR controls were then combined in equal proportions where possible into multiple pools followed by another 0.8x AMPure XP bead purification on the final pool. DNA quality of each pool was evaluated on an Agilent 2200 TapeStation, and quantification was performed with the Invitrogen Qubit HS dsDNA kit (Cat. no. Q32854). Each 16S or 18S pool was sequenced on an Illumina MiSeq (2 x 300 bp for 16S and V4 or 2 x 150 bp for V9) except for the one pool for the 2014–2016 euphotic zone V9 samples, which was run on an Illumina NextSeq 500 (Mid Output, 2 x 150 bp).
Amplicons were analyzed with QIIME2 v2019.10 [57]. Briefly, paired-end reads were trimmed to remove adapter and primer sequences with cutadapt [58]. Trimmed reads were then denoised with DADA2 to produce amplicon sequence variants (ASVs). Each run was denoised separately to account for run-specific error profiles, and the denoised runs were then merged. Taxonomic annotation of ASVs was performed with the q2-feature-classifier naive Bayes classifier using the SILVA database (Release 138) for 16S ASVs and the PR2 database (v4.13.0) for 18S ASVs [59–62].
Visual surveys of baleen whales
Since 2004, visual sightings of baleen whales have been recorded during cruises along CalCOFI transects. In this project, we focus on data from 2014 to 2020 collected contemporaneously with the NCOG data described in the previous section (Environmental DNA collection and amplicon sequencing). The dataset includes marine mammal monitoring effort from 25 individual CalCOFI cruises spanning all four seasons and limited to the core sampling area.
Visual monitoring effort was conducted in “passing mode” and adapted from standard line transect marine mammal survey protocols [63,64] following methods outlined in [17]. Two trained marine mammal observers used 7x50 Fujinon binoculars to observe and record marine mammals during daylight hours as the ship transited between CalCOFI stations. Observers systematically recorded species identification, group size estimates, reticle position below the horizon, angle relative to the bow, latitude and longitude, ship’s heading, sea state, swell height, and visibility. Survey effort was suspended when sea state was greater than Beaufort 6 or when visibility was less than 1 km.
Whale sightings were only included when classified as both “on-effort” and “on-transect”. The on-effort criterion was met when two observers were actively scanning while the vessel was traveling above 10 knots in a sea state below Beaufort 6 with greater than or equal to 1 km visibility. The on-transect criterion was met when sightings were along one of the core CalCOFI transect lines.
Estimated baleen whale densities
The marine mammal visual survey data were used to estimate density (number of individuals per 1000 km²) using multiple covariate distance sampling methods [65]. This analysis involved two stages: (1) estimating each species’ detectability as a function of factors potentially affecting sighting conditions; (2) estimating species’ density per cruise given the number detected and the estimated detectability. The major advantage of this approach over simply using sighting rates (number of individuals detected per unit survey effort) is that it can account for differences in sighting rates that are caused by differences in detectability (for example, if sighting conditions tend to be worse in winter) that might otherwise confound downstream inferences about the relationship between whales and their ecological habitat (see, e.g., [66]). Distance sampling analyses were undertaken using the Distance R package [67].
Distance sampling methods for line transect surveys use the distribution of perpendicular distances of observed animals to estimate a “detection function” (i.e., probability of detection as a function of perpendicular distance and other covariates), and from this the average probability of detection within the surveyed strips [68]. For each sighting, the measured reticle position and angle relative to bow were used, together with the knowledge of observer eye height above the water, to calculate perpendicular distance for each sighting. Following [17], a perpendicular truncation distance of 2400 m was used. For each species, candidate detection functions were fitted incrementally using forward selection, starting with a key function and adding terms if the resulting model had a lower Akaike Information Criterion (AIC) score. Key functions were uniform, half-normal and hazard rate. In one set of analyses the terms added were cosine (with uniform and half-normal) or polynomial (with hazard rate) adjustment terms. In another set of analyses using just half-normal and hazard-rate key functions, the terms added were covariates affecting the scale parameter of the key functions: Beaufort sea state, swell height, observer height above water (as a factor with 3 levels) and group size. The final model chosen for inference was the one with lowest AIC over both sets. A list of all candidate models is given in supporting information (S5 Table). Goodness of fit was assessed visually by comparing the fitted detection function with histograms of observed distances, and using a Cramér-von Mises test.
Given a fitted detection function, estimated density per cruise, denoted $\hat{D}_i$, for each species was calculated as:

$$\hat{D}_i = \frac{1}{2 w L_i} \sum_{g=1}^{n_i} \frac{s_{ig}}{\hat{p}_{ig}}$$

where $L_i$ is the transect length on cruise $i$, $w$ is the truncation distance, $n_i$ is the number of detections of the species on survey $i$, $s_{ig}$ is the group size of the $g$th detection, and $\hat{p}_{ig}$ is the estimated detection probability of the $g$th detection given its covariates $\mathbf{z}_{ig}$. This detection probability is computed by averaging the detection function over the perpendicular distances:

$$\hat{p}_{ig} = \frac{1}{w} \int_0^w \hat{g}(x, \mathbf{z}_{ig}) \, dx$$

where $\hat{g}$ is the estimated detection function and $x$ is perpendicular distance. Variance in estimated density was calculated by combining variance in the estimated detection function and variance in detection rate between transect lines, as detailed by [69].
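The two-stage calculation above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation (the analysis used the Distance R package): it assumes a half-normal detection function with a single fixed scale `sigma` and no covariates, and the function names and the midpoint-rule integration are choices made here for illustration.

```python
import math


def half_normal(x, sigma):
    """Half-normal detection function g(x) with scale sigma (assumed form)."""
    return math.exp(-x * x / (2.0 * sigma * sigma))


def average_detection_probability(sigma, w, steps=10_000):
    """Average of g over [0, w], i.e. (1/w) * integral of g, by the midpoint rule."""
    dx = w / steps
    total = sum(half_normal((k + 0.5) * dx, sigma) for k in range(steps))
    return total * dx / w


def estimated_density(transect_length, w, group_sizes, detection_probs):
    """Sum of group size / detection probability, divided by the covered area 2*w*L."""
    weighted_count = sum(s / p for s, p in zip(group_sizes, detection_probs))
    return weighted_count / (2.0 * w * transect_length)


# Hypothetical cruise: 500 km of effort, 2.4 km truncation, two detections.
density_per_km2 = estimated_density(500.0, 2.4, [1, 2], [0.5, 0.5])
density_per_1000_km2 = 1000.0 * density_per_km2
```

With lengths in kilometres the result is in animals per km², so it is multiplied by 1000 to match the per-1000 km² unit used in the text.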
Analysis of amplicon relative abundances
We removed rare ASVs present in under of samples across all cruises and abundant ASVs present in more than
of samples across all cruises, as neither rare nor ubiquitous ASVs yield data with sufficient variation to provide a basis for prediction of marine mammal density. Under the assumption that all remaining ASVs were physically present across samples and cruises, geometric Bayesian multiplicative count zero imputation [70] was used to estimate relative abundances for non-detections.
To match the spatial resolution of the whale density estimates, amplicon relative abundances were aggregated to the cruise level by weighted (geometric) averaging. In detail, if $a_{ijtsd}$ denotes the relative abundance of ASV $j$ from the sample taken at station $s$ on transect $t$ and depth $d$ on cruise $i$, relative abundances were aggregated across depth and sampling location by taking a weighted geometric mean:

$$\bar{a}_{ij} = \exp\left( \frac{\sum_{t,s,d} \omega_{itsd} \log a_{ijtsd}}{\sum_{t,s,d} \omega_{itsd}} \right)$$

Weights $\omega_{itsd}$ were inversely proportional to spatial sampling density with respect to geolocation and maximized α-diversity with respect to depth; the latter criterion resulted in weights slightly favoring samples taken at max chlorophyll-a depth. The resulting quantity $\bar{a}_{ij}$ measures the average relative abundance of ASV $j$ observed across samples collected on cruise $i$.
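The weighted geometric averaging described above can be sketched as follows. This is a minimal illustration that assumes strictly positive relative abundances (as holds after zero imputation) and takes the weights as given; the function name is hypothetical, and how the weights themselves were derived is not reproduced here.

```python
import math


def weighted_geometric_mean(values, weights):
    """Weighted geometric mean: exp of the weighted average of log-values."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(v <= 0 for v in values):
        raise ValueError("relative abundances must be strictly positive")
    total_weight = sum(weights)
    log_mean = sum(w * math.log(v) for v, w in zip(values, weights)) / total_weight
    return math.exp(log_mean)


# Hypothetical relative abundances of one ASV in three samples from one cruise,
# with the second sample weighted twice as heavily as the others.
cruise_average = weighted_geometric_mean([0.002, 0.008, 0.004], [1.0, 2.0, 1.0])
```

Working on the log scale keeps the aggregate consistent with the log-ratio transformations applied afterwards.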
The centered log-ratio (CLR) transformation [71] was then applied to average relative abundances $\bar{a}_{ij}$. The “typical” average relative abundance taken across the $p$ ASVs on cruise $i$ is given by the geometric mean:

$$m_i = \left( \prod_{j=1}^{p} \bar{a}_{ij} \right)^{1/p}$$

The CLR transformation is defined as the natural logarithm of the ratio:

$$c_{ij} = \log\left( \frac{\bar{a}_{ij}}{m_i} \right)$$

This captures the factor by which the average relative abundance of a particular ASV deviates from the typical average relative abundance across all ASVs on a given cruise; for example, a value of $c_{ij} = \log 2 \approx 0.69$ indicates that ASV $j$ is twice as abundant as the typical ASV on cruise $i$.
The above aggregations and transformations were performed separately for the 16S, 18S-V4, and 18S-V9 markers, yielding three sets of $c_{ij}$; this level of detail is omitted from the notation.
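As a small illustration of the CLR transformation, the sketch below computes it for the ASVs of a single cruise. It assumes strictly positive inputs, and the function name is chosen here for illustration.

```python
import math


def clr(abundances):
    """Centered log-ratio: log of each value over the geometric mean of all values."""
    logs = [math.log(a) for a in abundances]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [value - mean_log for value in logs]


# Three hypothetical ASVs whose geometric mean is 2: the third ASV is twice as
# abundant as the typical ASV, so its CLR value equals log(2).
transformed = clr([1.0, 2.0, 4.0])
```

CLR values always sum to zero within a cruise, which is a quick sanity check on any implementation.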
Seasonal log-ratios
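Seasonality was removed from the whale density and amplicon data with a secondary log-ratio transformation using the respective seasonal averages. Writing $\hat{D}_i$ for the estimated density on cruise $i$ and $\bar{a}_{ij}/m_i$ for the ratio of the average relative abundance of ASV $j$ to the typical average relative abundance on cruise $i$, the seasonal geometric means, written as functions of the observation index (cruise) $i$, are:

$$\bar{D}(i) = \Big( \prod_{k \in Q(i)} \hat{D}_k \Big)^{1/|Q(i)|}, \qquad \bar{r}_j(i) = \Big( \prod_{k \in Q(i)} \frac{\bar{a}_{kj}}{m_k} \Big)^{1/|Q(i)|}$$

In these expressions $Q(i)$ is an index set comprising the indices of all observations made in the same quarter as observation $i$. The seasonally-adjusted estimated whale density and the seasonally-adjusted average amplicon relative abundance are then $y_i = \log\big(\hat{D}_i / \bar{D}(i)\big)$ and $x_{ij} = \log\big((\bar{a}_{ij}/m_i) / \bar{r}_j(i)\big)$, respectively. The resulting quantities are best interpreted as deviations from seasonal averages; for example, a value of $y_1 = \log(1/2) \approx -0.69$ would indicate that on the first cruise, observed density was half of the seasonal average for the corresponding quarter.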
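The seasonal adjustment can be sketched as a log-ratio to the geometric mean of all observations from the same quarter. The sketch below is illustrative: the function name and the quarter labels are hypothetical, and strictly positive inputs are assumed, so zero density estimates would need separate handling that is not shown.

```python
import math
from collections import defaultdict


def seasonal_log_ratio(values, quarters):
    """Log of each value over the geometric mean of values from the same quarter."""
    logs_by_quarter = defaultdict(list)
    for value, quarter in zip(values, quarters):
        logs_by_quarter[quarter].append(math.log(value))
    seasonal_mean_log = {
        quarter: sum(logs) / len(logs) for quarter, logs in logs_by_quarter.items()
    }
    return [
        math.log(value) - seasonal_mean_log[quarter]
        for value, quarter in zip(values, quarters)
    ]


# Hypothetical densities on four cruises: the summer geometric mean is 2, so the
# first cruise sits at half of its seasonal average.
adjusted = seasonal_log_ratio([1.0, 4.0, 3.0, 3.0], ["summer", "summer", "winter", "winter"])
```

When the adjustment is recomputed inside a cross-validation loop, the same function is simply applied to the retained cruises only.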
Log-contrast model framework
We formulated a log-contrast-type model [72] to identify and estimate statistical relationships. This model expresses the seasonally-adjusted estimated densities $y_i$ as linear functions of the seasonally-adjusted average amplicon relative abundances $x_{ij}$:

$$y_i = \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i \tag{1}$$

Due to the logratio transformations, the coefficients $\beta_j$ capture multiplicative changes in median estimated density associated with multiplicative changes in relative abundances, after adjusting for seasonality and assuming error normality. For example, a twofold change in the relative abundance of ASV $j$, relative to its seasonal average, is associated with a change in median density, relative to its seasonal average, of a factor of $2^{\beta_j}$. We specified separate models for each whale species of interest and each marker, amounting to 9 models in total.
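The multiplicative reading of the coefficients follows from a short calculation. The symbols here are introduced for illustration: $y$ is the log of density relative to its seasonal average, $x_j$ is the log of the relative abundance of one ASV relative to its seasonal average, and $\beta_j$ is the corresponding coefficient. Doubling the abundance ratio adds $\log 2$ to $x_j$ while the other terms are held fixed:

```latex
\begin{align*}
x_j \mapsto x_j + \log 2
  \quad &\Longrightarrow \quad y \mapsto y + \beta_j \log 2, \\
\exp\left(y + \beta_j \log 2\right)
  &= 2^{\beta_j} \exp(y).
\end{align*}
```

For instance, a coefficient of $\beta_j = 0.5$ corresponds to a factor of $\sqrt{2} \approx 1.41$ in median density, and $\beta_j = -1$ corresponds to a halving.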
Variable selection and parameter estimation
A partial least squares (PLS) latent variable framework [73,74] was used for variable selection and parameter estimation. PLS allows for estimation of the full set of model coefficients even when least squares is ill-posed due to the number of covariates (ASVs) exceeding the number of samples (cruises) [75]. Writing the log-contrast model (Eq. 1) in linear model form $y = X\beta + \varepsilon$, the PLS framework stipulates a set of $K$ latent variables or “components”:

$$T = XW, \qquad W = [\, w_1 \; \cdots \; w_K \,]$$

The columns of $W$ or component “loadings” are estimated sequentially by component $k = 1, \dots, K$ by maximizing the correlation with the response and the variance of the latent component, subject to an orthogonality constraint with respect to previous latent components and a unit-norm constraint (when $k = 1$ the orthogonality constraint is omitted):

$$\hat{w}_k = \arg\max_{w} \; \operatorname{corr}^2(y, Xw)\, \operatorname{var}(Xw) \quad \text{subject to} \quad w^\top w = 1, \quad w^\top X^\top X \hat{w}_l = 0 \;\; (l < k) \tag{2}$$

The SIMPLS algorithm [76] was used to compute estimates of the loading matrix $\hat{W}$. Subsequently, a linear model was fit with the latent components $\hat{T} = X\hat{W}$ as covariates and least squares estimates were back-propagated to obtain coefficient estimates for the model as originally specified:

$$\hat{\beta} = \hat{W} \big( \hat{T}^\top \hat{T} \big)^{-1} \hat{T}^\top y$$
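To make the latent-component idea concrete, the sketch below fits a single-response PLS regression in plain Python. Note the substitution: it uses the classical NIPALS-style PLS1 recursion rather than SIMPLS (the two are known to give the same fit when there is one response variable), it assumes the columns of `X` and the response `y` are already centered, and the function name is hypothetical.

```python
import math


def pls1_coefficients(X, y, n_components, tol=1e-12):
    """PLS1 regression coefficients for centered X (list of rows) and centered y."""
    n, p = len(X), len(X[0])
    Xk = [row[:] for row in X]      # working copy of X, deflated after each component
    beta = [0.0] * p
    direct_weights, x_loadings = [], []
    for _ in range(n_components):
        # Weight vector: direction in covariate space of maximal covariance with y.
        w = [sum(Xk[i][j] * y[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < tol:
            break                   # no covariance with y left to explain
        w = [v / norm for v in w]
        t = [sum(Xk[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(Xk[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(y[i] * t[i] for i in range(n)) / tt  # regression of y on the scores
        # Express the component in terms of the original X: t = X r.
        r = w[:]
        for r_prev, load_prev in zip(direct_weights, x_loadings):
            proj = sum(a * b for a, b in zip(load_prev, w))
            r = [rj - proj * rp for rj, rp in zip(r, r_prev)]
        direct_weights.append(r)
        x_loadings.append(load)
        beta = [b + q * rj for b, rj in zip(beta, r)]
        # Deflate X so that the next component is orthogonal to this one.
        for i in range(n):
            for j in range(p):
                Xk[i][j] -= t[i] * load[j]
    return beta


# Toy centered data in which y = 2 * x1 - x2 exactly.
X_toy = [[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0], [1.0, 0.0], [-1.0, 0.0]]
y_toy = [1.0, -2.0, 1.0, 2.0, -2.0]
beta_two_components = pls1_coefficients(X_toy, y_toy, n_components=2)
```

With as many components as covariates the fit coincides with ordinary least squares on this toy data; with fewer components the coefficients are regularized toward the leading covariance directions, which is what makes the method usable when ASVs outnumber cruises.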
The log-contrast models (Eq. 1) were specified using a small subset of candidate amplicons identified through variable selection to improve model interpretability and prioritize identification of relatively stronger associations. The selection procedure resulted from applying the stability selection method of [77] to sparse partial least squares (sPLS) estimates of $\beta$ [78]. The sPLS method introduces an $\ell_1$ penalty to the PLS optimization problem (Eq. 2), which has the effect of shrinking small component loadings $w_{jk}$ to exactly zero and inducing sparsity in the loading matrix $W$. In [78], the authors approximate the solution to the resulting problem using a surrogate approach with a hyperparameter $\eta$ controlling the degree of sparsity in the sPLS estimate; this results in a sparse coefficient estimate $\hat{\beta}^{\eta}$. Stability selection is a computationally-intensive procedure that sidesteps the problem of hyperparameter tuning by estimating the selection probability of each variable for a “path” of hyperparameter values in a specified region $H$. Selection probability estimates are obtained by computing $\hat{\beta}^{\eta}$ repeatedly from subsamples of the data.
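We used leave-one-out partitions to estimate, for each hyperparameter value $\eta$ in the region $H$ and each candidate amplicon $j$, the probability of selecting that amplicon using the sPLS method:

$$\hat{\pi}_j^{\eta} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\big\{ \hat{\beta}_j^{\eta, (-i)} \neq 0 \big\}$$

where $\hat{\beta}^{\eta, (-i)}$ is the sPLS estimate computed with cruise $i$ held out and $n$ is the number of cruises. The “stable set” consists of frequently-selected amplicons, specifically those variables whose estimated selection probability exceeds a threshold $\pi_{\mathrm{thr}}$ for at least one $\eta \in H$:

$$\hat{S} = \Big\{ j : \max_{\eta \in H} \hat{\pi}_j^{\eta} > \pi_{\mathrm{thr}} \Big\}$$

In [77], the authors provide heuristics for choosing the region $H$ to control the expected number of falsely selected variables (per-family error rate); we determined $H$ according to their method in order to control the per-family error rate at 0.5.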
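The leave-one-out estimate of selection probabilities and the resulting stable set can be sketched generically. In the sketch below, `select` stands in for any sparse estimator (such as an sPLS fit) and is a hypothetical callable that returns the indices of variables with nonzero coefficients; the function names, the toy selector, and the example threshold are all illustrative assumptions.

```python
def selection_probabilities(n_obs, select, etas):
    """Leave-one-out selection frequency of each variable for each hyperparameter.

    `select(train_indices, eta)` is a hypothetical callable returning the set of
    variable indices chosen when fitting on the given observations.
    """
    probabilities = {}
    for eta in etas:
        counts = {}
        for held_out in range(n_obs):
            train = [i for i in range(n_obs) if i != held_out]
            for j in select(train, eta):
                counts[j] = counts.get(j, 0) + 1
        probabilities[eta] = {j: c / n_obs for j, c in counts.items()}
    return probabilities


def stable_set(probabilities, threshold):
    """Variables whose selection probability exceeds the threshold for some eta."""
    return {
        j
        for by_variable in probabilities.values()
        for j, prob in by_variable.items()
        if prob > threshold
    }


def toy_select(train, eta):
    """Toy selector: always picks variable 0; picks 1 only if observation 0 is kept."""
    chosen = {0}
    if 0 in train:
        chosen.add(1)
    if eta < 0.5 and 1 not in train:
        chosen.add(2)
    return chosen


toy_probabilities = selection_probabilities(4, toy_select, etas=[0.3, 0.7])
toy_stable = stable_set(toy_probabilities, threshold=0.8)
```

Because only the maximum over the hyperparameter path matters, a variable enters the stable set as soon as it is selected consistently at any single sparsity level.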
In the context of our model, the sPLS coefficient estimate $\hat{\beta}^{\eta}$ depends not only on the sparsity hyperparameter $\eta$ but also on the number $K$ of latent components. We addressed this by estimating stable sets $\hat{S}_K$ for each candidate value of $K$ and choosing the number of components that optimized mean square prediction error estimated from leave-one-out partitions of the data. We re-computed seasonal adjustments when forming data partitions so that subsamples did not incorporate information about held-out observations via seasonal averages. Once the stable set $\hat{S}$ was estimated for each model, we computed SIMPLS estimates of the model coefficients with the number of latent components $K$ used to determine the stable set.
Model validation
We sought to further assess the consistency of the variable selection procedure by comparing stable sets obtained under perturbations of the data partitions used to estimate selection probabilities. An “outer validation” was performed by constructing a set of nested leave-one-out partitions and performing the entire stability selection procedure holding out one observation (cruise) at a time. This yielded $n$ stable sets $\hat{S}^{(-1)}, \dots, \hat{S}^{(-n)}$ (one from holding out each of the $n$ cruises) which we then compared for consistency using a thresholded Jaccard index:

$$J_t = \frac{\Big| \Big\{ j : \frac{100}{n} \sum_{i=1}^{n} \mathbf{1}\big\{ j \in \hat{S}^{(-i)} \big\} \geq t \Big\} \Big|}{\Big| \bigcup_{i=1}^{n} \hat{S}^{(-i)} \Big|}$$

This measures the proportion of amplicons selected at least $t$% of the time among the stable sets obtained from the outer validation procedure. We chose $t = 50$ to look at the proportion of amplicons selected more often than not across validation runs as a measure of consistency of the variable selection procedure.
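A thresholded Jaccard index of this kind is straightforward to compute from a collection of stable sets. The sketch below is one plausible reading of the description: the numerator counts amplicons appearing in at least `t` percent of the sets and the denominator counts amplicons appearing in any set. The function name and the toy sets are hypothetical.

```python
def thresholded_jaccard(stable_sets, t):
    """Share of all selected variables that appear in at least t percent of the sets."""
    n = len(stable_sets)
    union = set().union(*stable_sets)
    if not union:
        return 0.0  # nothing was ever selected, so there is nothing to compare
    frequent = {
        j for j in union
        if 100.0 * sum(j in s for s in stable_sets) / n >= t
    }
    return len(frequent) / len(union)


# Three hypothetical stable sets from an outer validation with three held-out cruises.
toy_sets = [{1, 2}, {1, 3}, {1, 2}]
consistency = thresholded_jaccard(toy_sets, t=50)
```

Setting `t=100` recovers the ordinary Jaccard index generalized to several sets (size of the intersection over size of the union), so lower thresholds give a more forgiving measure of agreement.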
Narrative review of direct relationships between baleen whales and microbes/small zooplankton
A narrative review was conducted to identify known direct relationships between blue, humpback, and fin whales and bacteria, microbes, and small plankton from the existing literature for comparison with potential relationships identified by our models. The review focused on studies that examine microbial and planktonic interactions with these baleen whale species. The Publish or Perish software was utilized to search the academic literature database Google Scholar for peer-reviewed articles and reports. Specific title-keyword combinations were searched to capture relevant studies related to bacteria, microbes, and plankton associated with blue, humpback, and fin whales. Search terms included the common name of each whale species in the title and “bacteria,” “plankton,” or “microbes” in the keywords.
The titles of these papers were reviewed to assess relevance to the research question, focusing on studies that explored direct connections between baleen whales and microbes, bacteria, or plankton. Studies that examined direct interactions between microbes or small plankton and baleen whales were selected. Interactions included topics such as: baleen whale health and disease, including microbial diversity and baleen whale pathology; feeding ecology, including studies that analyzed prey-plankton dynamics and their relationship with baleen whale foraging behavior; baleen whale microbiome, such as studies on respiratory, gut, or skin microbiomes; baleen whale parasitology and pathology, which examined diseases and parasites associated with baleen whale health; and baleen whale strandings and carcasses. Studies unrelated to direct baleen whale-microbe relationships were excluded from the review. This search strategy yielded 18 relevant papers.
These selected papers were examined for relevant information that was entered into a structured table that contained the following information: citation, title of the paper, url/link to the paper, notes, whale species (e.g., blue, humpback, or fin whales), type of relationship between the whale species and bacteria, microbes, or plankton, type of sample or data used in the study, location, method, taxonomic classification of the bacteria/microbe/small plankton (Kingdom/Domain, Phylum, Class, Order, Infraorder, Family, Genus, Accepted Name).
For each microbial taxon mentioned in the selected studies, a detailed taxonomic classification was retrieved programmatically in Python using the World Register of Marine Species (WoRMS) API and programmatically in R using the NCBI (National Center for Biotechnology Information) database (S4 Table).
Results
Qualitative assessment of microbial community data, baleen whale sightings, and density estimates identified spatial and temporal patterns in both baleen whale sighting rates and density estimates across species. Relatively homogeneous sampling across space and time suggests minimal bias in measurement of microbial communities (Section Datasets). Statistical models relating baleen whale density estimates to microbial community data identified small sets of ASVs that explained a large proportion of variation in whale density estimates after adjusting for seasonality (Section Estimated Relationships). The relative abundances of taxa within these subcommunities provide accurate predictions of density of target baleen whale species, and community predictors outperform naive forecasting methods (Section Model Predictions). Through a narrative review of existing literature, we found that some of the taxa in our study have previously been documented as microbial associates of baleen whales (Section Narrative Review Findings). Lastly, despite some taxonomic overlap in microbial subcommunity predictors of blue, humpback, and fin whales, our models suggest that each species is related to a distinct subcommunity (Section Communities of Taxonomic Annotations).
Datasets
Fig 1 shows the locations along CalCOFI survey transects of (a) NCOG samples sequenced and (b) on-effort whale sightings used in the analysis; sightings tended to occur nearer to shore. The distribution of NCOG samples across space and time was approximately uniform (Table 1); data for a given cruise and genetic marker typically comprised 20–40 samples collected across 5–6 transects during a 2–3 week period. However, the number of samples sequenced varied by cruise, from as few as 14 samples collected in spring 2014 and winter 2019 to as many as 56 collected in winter 2018, as did the length of the survey period and the spatial distribution of sampling locations.
(A) Sampling locations for NCOG samples used in the analysis and (B) sighting locations recorded from visual survey data for target species during 2014-2020. All on-effort, on-transect sightings are shown; in the line transect analysis, the small number of sightings at perpendicular distances greater than 2400 m were truncated.
S1 Table shows NCOG sample counts by marker, cruise, and transect, providing a more fine-grained look at the spatial distribution of these samples. Certain transects – namely, 080.0, 090.0, and 093.3 – were consistently sampled more densely. S2 Table lists the unique ASVs retained in the analysis after filtering out rare and ubiquitous ASVs for the 16S, 18S-V4, and 18S-V9 markers, along with their taxonomic classifications, if known. There were 6234, 6824, and 9511 such “candidates”, respectively. Known taxonomic classifications among candidates (i.e., after relative abundance filtering) exhibited relatively less overlap between 16S and 18S markers (9.9% between 16S and 18S-V9 and 0.7% between 16S and 18S-V4 at the genus level) than between the two 18S markers (51.2% between 18S-V4 and 18S-V9 at the genus level).
Whale sightings exhibited clear seasonal variation (Table 2): blue whales were the most seasonal and rarely sighted except in summer; fin whales were also more common in summer, but sighted throughout the year in most years; humpback whales were sighted year-round but in greater numbers during spring. Sample sizes for the distance sampling analysis and AIC values for the fitted detection function models are given in S5 Table. The selected (i.e., lowest-AIC) models were uniform with one cosine adjustment for blue whales; half-normal with sea state, group size, and observer platform height as covariates for fin whales; and half-normal with swell, platform height and swell as covariates for humpback whales. Selected models for all three species fit the distance data well (Cramér-von Mises test p-values of 0.70, 0.31, and 0.59, respectively). Estimated density varied by season in a similar manner to the sighting rates, with strong inter-annual variability (Fig 2).
Estimated density (number of individuals per 1000 km2) over time across years (left) and by season (right) for each whale species; seasonal averages are shown in red.
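The lowest-AIC selection of detection function models described above can be sketched as follows. This is an illustrative helper only; the model labels and AIC values in the usage note are hypothetical placeholders, not values from S5 Table.

```python
def rank_by_aic(aic_by_model):
    """Return (model, AIC, delta AIC) rows sorted from best to worst.

    Delta AIC is the difference between a model's AIC and the lowest AIC
    among the candidates, so the selected model has delta AIC = 0.
    """
    best = min(aic_by_model.values())
    rows = [(name, aic, aic - best) for name, aic in aic_by_model.items()]
    return sorted(rows, key=lambda row: row[2])
```

For example, `rank_by_aic({"half-normal": 102.5, "uniform + cosine": 100.0})` places the hypothetical "uniform + cosine" model first with a delta AIC of 0.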
Estimated relationships
Our analysis identified sparse well-fitting models, selecting stable sets (Eq. 2) comprising between 23 and 60 ASVs depending on marker and whale species (between 0.24% and 0.85% of candidates) that explain an estimated 81–99% of variation in density estimates after adjusting for seasonality (Table 3). Residual diagnostic checks indicated no issues with model specification or time dependence between successive surveys (S1 Fig). Optimal model hyperparameters varied slightly; models used between 4 and 11 latent components (a maximum of 12 were available for hyperparameter optimization). The selected ASVs spanned 7–19 classes, 8–25 orders, and 9–28 families, again depending on target species and marker.
Our model selection procedure exhibited some sensitivity to data perturbation. Depending on the model and taxonomic level, between 22% and 68% of selected taxa were robust to leave-one-out data perturbations. Specifically, Table 4 shows the modified Jaccard index (Eq. 4) computed for each model at the ASV, family, order, and class levels. This measure quantifies the proportion of taxa, among all those enumerated across models fit to leave-one-out data partitions, that are included in the stable set more often than not.
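One way to read the verbal description of this stability measure is sketched below. Eq. 4 is not reproduced in this section, so the code is an assumption-laden paraphrase rather than the exact index: it takes the union of taxa selected across leave-one-out fits and reports the fraction selected in more than half of those fits. The function names are hypothetical.

```python
from collections import Counter


def selection_frequencies(selected_sets):
    """Fraction of leave-one-out fits in which each taxon was selected."""
    n_fits = len(selected_sets)
    counts = Counter(taxon for s in selected_sets for taxon in set(s))
    return {taxon: c / n_fits for taxon, c in counts.items()}


def majority_stability(selected_sets):
    """Share of all ever-selected taxa that are selected more often than not.

    Assumed reading of the 'modified Jaccard index'; the published Eq. 4
    may differ in detail.
    """
    freqs = selection_frequencies(selected_sets)
    if not freqs:
        return 0.0
    majority = [taxon for taxon, f in freqs.items() if f > 0.5]
    return len(majority) / len(freqs)
```

With three toy fits selecting {a, b, c}, {a, b}, and {a, d}, taxa a and b are selected in most fits while c and d are not, giving a value of 0.5.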
S3 Table enumerates selected ASVs by whale species and marker (i.e., by model) along with taxonomic classifications, if known, and an estimated measure of association (i.e., estimated model coefficient in Eq. 1) quantifying multiplicative change in whale sightings per doubling of ASV relative abundance after adjusting for seasonality. For example, an estimate of 1.338 would indicate that every doubling of the relative abundance of that particular microbe (relative to its seasonal average) is associated with an estimated 33.8% increase in estimated blue whale sightings (relative to its seasonal average). Estimates greater than 1 indicate a positive association, and estimates less than 1 indicate a negative association.
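The interpretation of these estimates can be made concrete with a short sketch. It assumes only what the text states, namely that each estimate is a multiplicative change per doubling of relative abundance; extending this to other fold changes assumes a log-linear relationship, which is our reading and not a restatement of Eq. 1. The function names are hypothetical.

```python
import math


def percent_change_per_doubling(multiplier):
    """Convert a per-doubling multiplier into a percent change."""
    return (multiplier - 1.0) * 100.0


def multiplier_for_fold_change(multiplier, fold):
    """Implied multiplier for an arbitrary fold change in relative abundance.

    Assumes the effect compounds per doubling, i.e. multiplier ** log2(fold).
    """
    return multiplier ** math.log2(fold)
```

Under this reading, an estimate of 1.338 corresponds to a 33.8% increase per doubling, a factor of about 1.79 for a fourfold increase in relative abundance, and a factor of about 0.75 for a halving.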
Model predictions
The selected amplicons summarized in the previous section and enumerated fully in the supporting information (S3 Table) were predictive of deviations of whale sightings from seasonal averages (Table 5). Combining predicted deviations with estimated seasonal trends, out-of-sample predictions of density were within 0.69–1.41 individuals per 1000 km2 of observed values on average, depending on whale species and marker. For context, this result represents an 11–52% reduction in prediction error compared with imputing the seasonal average, and a 20–65% reduction in prediction error compared with carrying forward the last observation from the same season, again depending on whale species and marker. While all markers produced comparable gains in predictive power, the selected 18S-V9 amplicons yielded the best predictions for all target species.
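The two naive baselines and the percent reduction in prediction error can be sketched as follows. This is a simplified illustration under stated assumptions: the seasonal average is computed leaving out the survey being predicted, and the carry-forward baseline has no prediction for the first survey of each season. The exact procedures used in the study may differ, and all function names are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error over pairs that have a prediction."""
    pairs = [(o, p) for o, p in zip(observed, predicted) if p is not None]
    return math.sqrt(sum((o - p) ** 2 for o, p in pairs) / len(pairs))


def seasonal_mean_baseline(seasons, values):
    """Predict each survey from the mean of the other surveys in its season."""
    preds = []
    for i, season in enumerate(seasons):
        others = [v for j, (s, v) in enumerate(zip(seasons, values))
                  if s == season and j != i]
        preds.append(sum(others) / len(others) if others else None)
    return preds


def last_same_season_baseline(seasons, values):
    """Predict each survey from the previous survey in the same season."""
    last_seen, preds = {}, []
    for season, value in zip(seasons, values):
        preds.append(last_seen.get(season))  # None if no earlier survey
        last_seen[season] = value
    return preds


def percent_reduction(baseline_error, model_error):
    """Percent reduction in error of a model relative to a baseline."""
    return 100.0 * (1.0 - model_error / baseline_error)
```

For instance, a model RMSE of 1.0 against a baseline RMSE of 2.0 corresponds to a 50% reduction in prediction error.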
Fig 3 shows leave-one-out predictions from all nine models compared with observed values. Notably, our models predicted spikes in density and periods of low density with comparable accuracy. Precision, as measured by 90% bootstrap percentile intervals, was comparable across predictions on the log scale, which translates to greater uncertainty for predictions of higher values.
At left, separate time series are shown for each marker and distinguished by color and line dash. At right, separate panels compare the observations and predictions for each species/marker combination on the log scale; the solid line represents perfect predictive accuracy. The vertical line ranges show 90% bootstrap percentile intervals quantifying prediction uncertainty.
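To illustrate why comparable precision on the log scale implies wider intervals for larger predictions, the sketch below computes a percentile interval from a set of bootstrap draws and back-transforms it. It is a generic illustration, not the study's bootstrap procedure; the draws are assumed to be log-scale predictions from refitted models, and the function names are hypothetical.

```python
import math


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of sorted values, with q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def percentile_interval(draws, level=0.90):
    """Bootstrap percentile interval covering the central `level` of draws."""
    ordered = sorted(draws)
    alpha = (1 - level) / 2
    return percentile(ordered, alpha), percentile(ordered, 1 - alpha)


def back_transform(log_interval):
    """Map a log-scale interval to the original scale."""
    lo, hi = log_interval
    return math.exp(lo), math.exp(hi)
```

An interval of plus or minus 0.5 log units around a prediction of 1 back-transforms to roughly (0.61, 1.65), whereas the same log-scale width around a prediction of 10 gives roughly (6.1, 16.5), an interval ten times wider on the original scale.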
Communities of taxonomic annotations
A total of 148 unique taxonomic annotations were identified as predictive of baleen whales. Of these, 20% (29) were shared across all three species (blue, humpback, and fin whales), and 21% (31) were shared by two species (blue/fin, blue/humpback, or fin/humpback). The remaining 59% (88) were unique to a single species: 27 annotations were unique to blue whales, 43 to fin whales, and 18 to humpback whales (S4 Table).
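The shared and unique counts above amount to a set comparison across the three whale species. A minimal sketch of that tabulation follows, using a hypothetical `overlap_summary` helper and assuming each species' predictive annotations are available as a set of strings.

```python
def overlap_summary(by_species):
    """Tabulate annotations shared by all, shared by some, or unique.

    `by_species` maps a species name to its set of taxonomic annotations.
    """
    all_taxa = set().union(*by_species.values())
    # Number of species whose predictive set contains each annotation.
    counts = {t: sum(t in s for s in by_species.values()) for t in all_taxa}
    n_species = len(by_species)
    shared_all = sum(1 for c in counts.values() if c == n_species)
    unique = {name: sum(1 for t in s if counts[t] == 1)
              for name, s in by_species.items()}
    shared_some = len(all_taxa) - shared_all - sum(unique.values())
    return {"total": len(all_taxa), "shared_all": shared_all,
            "shared_some": shared_some, "unique": unique}
```

Applied to the reported counts, 29, 31, and 27 + 43 + 18 = 88 of 148 annotations correspond to 19.6%, 20.9%, and 59.5%, which round to the percentages given above.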
Narrative review findings
We found 18 publications that documented 457 microbial and plankton taxa associated with blue, fin, and humpback whales, based on studies conducted in various regions throughout the world (S4 Table). After accounting for duplicate taxa, 403 unique taxa remained. These studies illustrate taxa that are known to be associated, either internally or externally, with fin, humpback, and blue whales. They highlight various aspects of baleen whale biology and ecology, including fecal, digestive, respiratory, and skin microbiomes; prey composition; and feeding habitats. They also document a range of epibiotic parasitic and commensal organisms, such as barnacles and diatoms. A complete list of taxa known to be associated with whales from the narrative review is contained in S4 Table.
Overlap of annotations with literature identified in narrative review
Of the 148 unique taxonomic annotations that comprised the microbial communities predictive of baleen whales in our study, 23% of the annotations (34 out of 148) were found within the existing literature exploring baleen whale microbial parasites, commensals, prey, or respiratory-associated microbes, matching at the genus (17 out of 148) or family level (17 out of 148) (S4 Table). The rest of the annotations either matched at higher taxonomic levels, including order, class, and phylum (41%; 61 out of 148) or did not have any associated matches in the literature (36%; 53 out of 148) (S4 Table).
Discussion
In this work we leveraged the microbial and small plankton community composition for prediction of blue, fin, and humpback whale density. We found that biological communities, as identified from marker genes capturing both prokaryotic and eukaryotic plankton, were strong predictors of whale density across multiple seasons and years. To our knowledge, these are some of the first data showing that the ecological habitat of baleen whales can predict their density and track interannual variability. These results align with a growing body of evidence suggesting that baleen whale distributions in the California Current are tightly coupled to environmental conditions that drive prey availability [19,22,79], which in turn are linked to microbial and planktonic community composition.
From among roughly six to ten thousand candidate microbial taxa per genetic marker (S2 Table), our analyses found small groups of taxa that represent the ecological habitat of baleen whales and are strongly associated with and strongly predictive of the density of blue, fin, and humpback whales (Tables 3, 5, and S3 Table). In total, these groups of ASVs represented 148 unique taxonomic annotations identified across marker genes, with 20% shared among all three whale species, 21% shared between two species, and 59% unique to a single species. These results suggest that there is some overlap in the ecological habitat among blue, humpback, and fin whales, but each species is also related to a distinct planktonic community, with fin whales having the largest number of unique associated taxa.
Additionally, we compared the predictive microbial and small plankton annotations from our study to taxa found in our literature review of previously documented connections between microbes and small plankton with baleen whales. We found that 23% of the predictive taxa matched taxa from the literature we reviewed, at the genus or family level. Many of the taxa that matched are known to be prey, parasites, and commensals of baleen whales. For example, members of the genera Sphingomonas, Pseudoalteromonas, Acinetobacter, and Pseudomonas have been found in blow samples of humpback whales [33]. Additionally, members of the genus Psychrobacter have been found on skin and in blow samples of humpback whales [30,33] and in blow samples of blue whales [34]. Serratia spp. have been observed in the digestive tract of fin whales [80]. Additionally, both calanoid and cyclopoid (including Poecilostomatoida) copepods have been detected as prey in humpback whale feces [32]. In addition to these examples, many other taxa in our communities were found in the existing literature on microbial whale associates (S4 Table). Our findings suggest that some of these signals may be ecologically relevant – either directly through food-web or symbiotic interactions, or indirectly by reflecting environmental features of whale habitats, such as distinct water masses or seasonal patterns (S4 Table). Given that the rest either matched at higher taxonomic levels (41%) or had no documented matches in our literature search (36%), and some were uncultured or unassigned, this suggests that future work could examine these specific taxa further. This could include investigating some of the taxa identified in this study in greater detail to identify ecological relationships with baleen whales, applying other genetic methods to improve taxonomic resolution, or further exploring novel or understudied microbial associations relevant to whale habitats.
Ultimately, our focus is not on individual ASVs in isolation, but rather on their combined presence as a collective community. This suite of microbes and small plankton likely represents an integral component of the ecological habitat surrounding baleen whales, contributing to the complex collection of organisms that may influence or reflect their density. Our results point to a consistent community-level association between large cetaceans and microbes or small plankton, which may reflect physical and chemical signatures of oceanic water masses [81–84]. Marine microbial diversity is vast, and while marker gene amplicons (e.g., 16S or 18S rRNA) are valuable for broad community characterization, they often provide insufficient resolution to infer fine-scale taxonomy or ecological niche [85–87]; thus, we refrain from presenting individual microbial taxa as predictors of whales. However, microbial communities are well-known to transform nutrients and other small molecules in ways that shape the surrounding food web [88–90]. We therefore postulate that microbial communities exist that are characteristic of habitats favorable to whale species, whether due to indirect relationships (e.g., because they reflect the biochemical habitat of prey items like krill and fish) or direct relationships (e.g., microbial associates of whales).
Notably, our molecular data encompass both prokaryotic and eukaryotic plankton, which allowed us to characterize microbial diversity across different domains of life. There are likely many bacterial-bacterial and bacterial-plankton interactions, as well as other as-yet uncharacterized ecological relationships among the predictor taxa. Taken together, the community-level synergy is likely greater than the sum of its parts and encompasses the holistic ecological habitat of baleen whales. We present a first step toward understanding and untangling the complex linkages between the environment, the base of the food web, and top consumers. However, further research is needed to more deeply examine the groups of ASVs that collectively define these microbial communities.
Our statistical analysis framework combines log-ratio methodology for compositional data analysis [71,72] with sparse partial least squares [78] for interpretable dimension reduction, and we used stability selection [77] to improve the robustness of the variable selection procedure to small perturbations in training data. We found that the resulting estimation procedure in our analysis exhibits a range of selection consistencies, depending on the model. Among the 9 models in our analysis, an estimated 30–40% of ASVs were consistently selected when holding out one data point at a time (Table 4). Although this is surprisingly low, given that stability methods aim precisely at achieving consistent selection, there are several possible explanations for this result. First, the modest sample size entails that each data point exerts considerable influence on model fit. Among the 25 cruises in our analysis, one observation constitutes 4% of available data. Second, the density estimates are sparse: most cruises record few sightings, and the limited number of cruises (~4 per year) may mean that we miss the full range of whale densities in a given year (e.g., minimum or maximum). Although spikes in sightings comprise only 8–16% of available data, depending on whale species, it is reasonable to speculate that these cruises capture the most information about potential ecological correlations. By comparison with the remaining 84–92% of the data, removal of one or two of these high-sightings observations substantially alters the variation in the time series. Thus, depending on which observations are held out, fitted models may describe fundamentally different ecological processes: either small variations among low densities or large fluctuations between high and low densities. Third, our analysis did not account for (a) uncertainty in taxonomic classification of ASVs or (b) potential biases inherent to eDNA methods.
These include extraction, primer, and amplification biases, which can cause some species to be missed and others to appear more or less abundant than they really are. Strong correlations among amplicon relative abundances due to either factor could lead to instability in variable selection under small perturbations of the data; this is a well-documented phenomenon in the statistical literature [77,91,92]. Fourth, partial least squares estimation is known to be sensitive to outliers—a fact which has produced proposals for robust PLS estimators [93]. It is therefore plausible that sparse partial least squares would exhibit high selection variability under data perturbations, especially in light of the data sparsity discussed above. Finally, varying uncertainty in density estimation by cruise (Fig 2) is not accounted for in the modeling framework, but may produce uneven signal strength of associations between microbial communities and baleen whale density depending on which cruises are used to fit models. All of these factors may contribute to the wide range of selection consistency observed in our work.
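As background for the log-ratio methodology mentioned at the start of this discussion of the statistical framework, the sketch below shows the centered log-ratio (CLR) transform, one common choice for compositional data. Which log-ratio transformation and zero-handling strategy were used in this study is not stated in this section, so both the CLR and the pseudocount here are assumptions for illustration.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's ASV counts.

    A pseudocount (an assumed, simple zero-handling choice) is added so
    that zero counts have a finite logarithm. The result sums to zero and,
    without a pseudocount, does not depend on sequencing depth.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]
```

Because each value is expressed relative to the sample's geometric mean, multiplying all counts in a sample by a constant leaves the transformed values unchanged when no pseudocount is applied, which is the property that motivates log-ratio methods for relative abundance data.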
Overall, our findings suggest that planktonic communities can serve as a predictor of baleen whale densities. This study expands on previous research focused on specific planktonic prey as whale predictors to integrate the full planktonic community, including direct and indirect relationships to baleen whales, as an ecological habitat predictor. This study presents a reliable approach to predict baleen whale densities across long temporal (6 years) and large spatial (over 200,000 km2) scales using metabarcoding-derived communities of microbes and small plankton found in the water column.
By demonstrating links between microbial community composition and large whale density, this work can be used to generate potential explanatory variables to predict cetacean density. Current habitat and density surface models typically explain only a small proportion of variation in the input data, which is often derived from line transect surveys. Given the low detectability of whales and the challenges of sampling their eDNA, plankton eDNA may serve as a complementary proxy for predicting whale density. Thus, planktonic community measures may help to explain more of the variation in whale density and help to improve existing density surface models (e.g., [66]). For example, a microbial community index could be used to characterize the ecological habitat of specific whale species. Establishing such an index offers an additional tool for monitoring and managing whale populations.
This work can also inform hypotheses about the ecological relationships between whales and bacterioplankton, phytoplankton, and zooplankton. Additionally, important insights can be gained about planktonic communities that support higher trophic levels off the coast of California; such communities could effectively serve as ecological “fingerprints” of habitat suitability and be used to monitor changes in habitat quality. This also highlights the potential for using whales and their ecological habitats as sentinels for detecting and tracking changes in marine ecosystems.
This holistic predictive framework is broadly transferable beyond marine mammal research, management, and conservation. For example, future work could leverage large microbiome programs like the Tara Oceans expedition or the Earth Microbiome Project to identify ecological habitats relevant to the conservation, population management, or reintroduction of other species. We believe that assessing the overall ecological habitat offers a predictive approach that can be extended and tested for other species, ecosystems, regions, and longer time scales.
Supporting information
S1 Table. Numbers of NCOG samples tabulated by marker, cruise, and transect.
https://doi.org/10.1371/journal.pone.0334209.s001
(XLSX)
S2 Table. Taxonomic annotations of candidate ASVs used in the analysis.
Table divided into (a) 16S, (b) 18S-V4, and (c) 18S-V9 markers.
https://doi.org/10.1371/journal.pone.0334209.s002
(XLSX)
S3 Table. Estimated model coefficients and taxonomic annotations of selected ASVs used for prediction.
Table divided into (a) 16S, (b) 18S-V4, and (c) 18S-V9 models.
https://doi.org/10.1371/journal.pone.0334209.s003
(XLSX)
S4 Table. Summary of narrative review findings.
Table divided into (a) taxa identified by our models with supporting citations from the existing literature, if any, and (b) taxa reported in the existing literature with known direct relationships between blue, humpback, and fin whales and bacteria, microbes, and small plankton.
https://doi.org/10.1371/journal.pone.0334209.s004
(XLSX)
S5 Table. Summary of density estimation results.
(a) per-species sample size of detections before and after truncation for line transect analysis; (b) list of all line transect detection function models fitted for each species, with corresponding AIC values and delta AIC (i.e., the difference in AIC between that model and the lowest-AIC model for that species).
https://doi.org/10.1371/journal.pone.0334209.s005
(XLSX)
S1 Fig. Residual diagnostic checks for density models.
A) residuals against fitted values for each model (whale species and marker combination) and B) partial autocorrelations for each model to assess possible time dependence across successive surveys not accounted for in the models.
https://doi.org/10.1371/journal.pone.0334209.s006
(TIFF)
Acknowledgments
We are thankful to Matthew Robbins for his support in developing code to programmatically retrieve taxonomic information from the World Register of Marine Species (WoRMS) about taxa in our literature review. We are grateful to Ryan Kelly, Ole Shelton, and Eiren Jacobson for their valuable input. We also thank the reviewers (Matthew Harke, Chloe Robinson, and an anonymous reviewer) and the editor for their constructive comments and suggestions that improved the manuscript. We are grateful to all of the people who helped to make the CalCOFI cruises and associated sample and data collection possible. We are thankful to John Hildebrand and Joshua Jones for the continued support in organizing marine mammal visual observations during CalCOFI cruises, and Katherine Whitaker as well as other observers who collect the visual observations at sea.
References
- 1. Bowen W. Role of marine mammals in aquatic ecosystems. Mar Ecol Prog Ser. 1997;158:267–74.
- 2. International Union for Conservation of Nature (IUCN). The IUCN Red List of Threatened Species. https://www.iucnredlist.org
- 3. Cook D, Malinauskaite L, Davíðsdóttir B, Ögmundardóttir H, Roman J. Reflections on the ecosystem services of whales and valuing their contribution to human well-being. Ocean & Coastal Management. 2020;186:105100.
- 4. Calambokidis J, Steiger GH, Rasmussen K, Urbán J, Balcomb KC, Salinas M. Migratory destinations of humpback whales (Megaptera novaeangliae) that feed off California, Oregon and Washington. Mar Ecol Prog Ser. 2000;192:295–304.
- 5. Hazen EL, Palacios DM, Forney KA, Howell EA, Becker E, Hoover AL. WhaleWatch: A dynamic management tool for predicting blue whale (Balaenoptera musculus) density in the California Current. J Appl Ecol. 2017;54:1415–28.
- 6. Fiedler PC, Reilly SB, Hewitt RP, Demer D, Philbrick VA, Smith S, et al. Blue whale habitat and prey in the California Channel Islands. Deep Sea Research Part II: Topical Studies in Oceanography. 1998;45(8–9):1781–801.
- 7. Fleming AH, Clark CT, Calambokidis J, Barlow J. Humpback whale diets respond to variance in ocean climate and ecosystem conditions in the California Current. Glob Chang Biol. 2016;22(3):1214–24. pmid:26599719
- 8. Scales KL, Schorr GS, Hazen EL, Bograd SJ, Miller PI, Andrews RD. Should I stay or should I go? Modelling year-round habitat suitability and drivers of residency for fin whales in the California Current. Diversity and Distributions. 2017;23(10):1204–15.
- 9. Calambokidis J, Kratofil MA, Palacios DM, Lagerquist BA, Schorr GS, Hanson MB, et al. Biologically important areas II for cetaceans within US and adjacent waters—West Coast Region. Front Mar Sci. 2024;11:1283231.
- 10. Calambokidis J, Steiger GH, Curtice C, Harrison J, Ferguson MC, Becker E. Biologically important areas for selected cetaceans within US waters—West Coast region. Aquat Mamm. 2015;41:39–53.
- 11. Nickels CF, Sala LM, Ohman MD. The morphology of euphausiid mandibles used to assess selective predation by blue whales in the southern sector of the California Current System. J Crustac Biol. 2018;38(5):563–73.
- 12. Croll DA, Marinovic B, Benson S, Chavez FP, Black N, Ternullo R. From wind to whales: trophic links in a coastal upwelling system. Mar Ecol Prog Ser. 2005;289:117–30.
- 13. Baker CS, Herman LM, Perry A, Lawton WS, Straley JM, Straley JH. Population characteristics and migration of summer and late-season humpback whales (Megaptera novaeangliae) in southeastern Alaska. Mar Mamm Sci. 1985;1:304–23.
- 14. Geraci JR, Anderson DM, Timperi RJ, Staubin DJ, Early GA, Prescott JH. Humpback whales (Megaptera novaeangliae) fatally poisoned by dinoflagellate toxin. Can J Fish Aquat Sci. 1989;46:1895–8.
- 15. Clapham PJ, Leatherwood S, Szczepaniak I, Brownell RLJ. Catches of humpback and other whales from shore stations at Moss Landing and Trinidad, California, 1919–1926. Mar Mamm Sci. 1997;13:368–94.
- 16. Širović A, Rice A, Chou E, Hildebrand J, Wiggins S, Roch M. Seven years of blue and fin whale call abundance in the Southern California Bight. Endang Species Res. 2015;28(1):61–76.
- 17. Campbell GS, Thomas L, Whitaker K, Douglas AB, Calambokidis J, Hildebrand JA. Inter-annual and seasonal trends in cetacean distribution, density and abundance off southern California. Deep Sea Research Part II: Topical Studies in Oceanography. 2015;112:143–57.
- 18. Oestreich WK, Abrahms B, McKenna MF, Goldbogen JA, Crowder LB, Ryan JP. Acoustic signature reveals blue whales tune life-history transitions to oceanographic conditions. Funct Ecol. 2022;36(4):882–95.
- 19. Ryan JP, Oestreich WK, Benoit-Bird KJ, Waluk CM, Rueda CA, Cline DE, et al. Audible changes in marine trophic ecology: Baleen whale song tracks foraging conditions in the eastern North Pacific. PLoS One. 2025;20(2):e0318624. pmid:40009591
- 20. Szesciorka AR, Ballance LT, Širović A, Rice A, Ohman MD, Hildebrand JA, et al. Timing is everything: Drivers of interannual variability in blue whale migration. Sci Rep. 2020;10(1):7710. pmid:32382054
- 21. ZoBell VM, Posdaljian N, Lenssen KL, Wiggins SM, Hildebrand JA, Baumann-Pickering S, et al. Climatic and economic fluctuations revealed by decadal ocean soundscapes. J Acoust Soc Am. 2025;157(6):4233–51. pmid:40471053
- 22. Palacios DM, Bailey H, Becker EA, Bograd SJ, DeAngelis ML, Forney KA, et al. Ecological correlates of blue whale movement behavior and its predictability in the California Current Ecosystem during the summer-fall feeding season. Mov Ecol. 2019;7:26. pmid:31360521
- 23. Satterthwaite E, Allen A, Lampe R, Gold Z, Thompson A, Bowlin N, et al. Toward Identifying the Critical Ecological Habitat of Larval Fishes: An Environmental DNA Window into Fisheries Management. Oceanog. 2023.
- 24. Rausch RL, Rice DW. Ogmogaster trilineatus sp. n. (Trematoda: Notocotylidae) from the fin whale, Balaenoptera physalus L. Proc Helminthol Soc Wash. 1970;37(2):196–200.
- 25. Measures LN. Annotated list of metazoan parasites reported from the blue whale, Balaenoptera musculus. J Helminthol Soc Wash. 1993;60(1):62–6.
- 26. Félix F, Bearson B, Falconí J. Epizoic barnacles removed from the skin of a humpback whale after a period of intense surface activity. Mar Mamm Sci. 2006;22(4):979–84.
- 27. Kane EA, Olson PA, Gerrodette T, Fiedler PC. Prevalence of the commensal barnacle Xenobalanus globicipitis on cetacean species in the eastern tropical Pacific Ocean, and a review of global occurrence. Fish Bull. 2008;106(4):395–404.
- 28. Ten S, Raga JA, Aznar FJ. Epibiotic fauna on cetaceans worldwide: A systematic review of records and indicator potential. Front Mar Sci. 2022;9:846558.
- 29. Apprill A, Mooney TA, Lyman E, Stimpert AK, Rappé MS. Humpback whales harbour a combination of specific and variable skin bacteria. Environ Microbiol Rep. 2011;3(2):223–32. pmid:23761254
- 30. Apprill A, Robbins J, Eren AM, Pack AA, Reveillaud J, Mattila D, et al. Humpback whale populations share a core skin bacterial community: towards a health index for marine mammals? PLoS One. 2014;9(3):e90785. pmid:24671052
- 31. Glaeser SP, Silva LMR, Prieto R, Silva MA, Franco A, Kämpfer P, et al. A Preliminary Comparison on Faecal Microbiomes of Free-Ranging Large Baleen (Balaenoptera musculus, B. physalus, B. borealis) and Toothed (Physeter macrocephalus) Whales. Microb Ecol. 2022;83(1):18–33. pmid:33745062
- 32. Reidy RD, Lemay MA, Innes KG, Clemente-Carvalho RBG, Janusson C, Dower JF, et al. Fine-scale diversity of prey detected in humpback whale feces. Ecol Evol. 2022;12(12):e9680. pmid:36619710
- 33. Apprill A, Miller CA, Moore MJ, Durban JW, Fearnbach H, Barrett-Lennard LG. Extensive Core Microbiome in Drone-Captured Whale Blow Supports a Framework for Health Monitoring. mSystems. 2017;2(5):e00119-17. pmid:29034331
- 34. Domínguez-Sánchez CA, Álvarez-Martínez RC, Gendron D, Acevedo-Whitehouse K. Common core respiratory bacteriome of the blue whale Balaenoptera musculus, in the Gulf of California. bioRxiv. 2022.
- 35. Vendl C, Slavich E, Wemheuer B, Nelson T, Ferrari B, Thomas T, et al. Respiratory microbiota of humpback whales may be reduced in diversity and richness the longer they fast. Sci Rep. 2020;10(1):12645. pmid:32724137
- 36. Friedlaender AS, Hazen EL, Nowacek DP, Halpin PN, Ware C, Weinrich MT, et al. Diel changes in humpback whale Megaptera novaeangliae feeding behavior in response to sand lance Ammodytes spp. behavior and distribution. Mar Ecol Prog Ser. 2009;395:91–100.
- 37. Pendleton D, Sullivan P, Brown M, Cole T, Good C, Mayo C, et al. Weekly predictions of North Atlantic right whale Eubalaena glacialis habitat reveal influence of prey abundance and seasonality of habitat preferences. Endang Species Res. 2012;18(2):147–61.
- 38. Carroll EL, Gallego R, Sewell MA, Zeldis J, Ranjard L, Ross HA, et al. Multi-locus DNA metabarcoding of zooplankton communities and scat reveal trophic interactions of a generalist predator. Sci Rep. 2019;9(1):281. pmid:30670720
- 39. Boyse E, Robinson KP, Beger M, Carr IM, Taylor M, Valsecchi E, et al. Environmental DNA reveals fine‐scale spatial and temporal variation of marine mammals and their prey species in a Scottish marine protected area. Environmental DNA. 2024;6(4).
- 40. Falcone EA, Keene EL, Keen EM, Barlow J, Stewart J, Cheeseman T, et al. Movements and residency of fin whales (Balaenoptera physalus) in the California Current System. Mamm Biol. 2022;102(4):1445–62.
- 41. Barlow J, Forney KA. Abundance and population density of cetaceans in the California Current ecosystem. Fish Bull. 2007;105(4):509–27.
- 42. Cubaynes HC, Fretwell PT, Bamford C, Gerrish L, Jackson JA. Whales from space: Four mysticete species described using new VHR satellite imagery. Marine Mammal Science. 2018;35(2):466–91.
- 43. Irvine LM, Mate BR, Winsor MH, Palacios DM, Bograd SJ, Costa DP, et al. Spatial and temporal occurrence of blue whales off the U.S. West Coast, with implications for management. PLoS One. 2014;9(7):e102959. pmid:25054829
- 44. Baker CS, Steel D, Nieukirk S, Klinck H. Environmental DNA (eDNA) from the wake of the whales: droplet digital PCR for detection and species identification. Front Mar Sci. 2018;5:133.
- 45. Irvine LM, Palacios DM, Lagerquist BA, Mate BR. Scales of Blue and Fin Whale Feeding Behavior off California, USA, With Implications for Prey Patchiness. Front Ecol Evol. 2019;7.
- 46. Croll DA, Acevedo-Gutiérrez A, Tershy BR, Urbán-Ramírez J. The diving behavior of blue and fin whales: is dive duration shorter than expected based on oxygen stores? Comp Biochem Physiol A Mol Integr Physiol. 2001;129(4):797–809.
- 47. Tyack PL, Johnson M, Soto NA, Sturlese A, Madsen PT. Extreme diving of beaked whales. J Exp Biol. 2006;209(Pt 21):4238–53. pmid:17050839
- 48. Valentini A, Taberlet P, Miaud C, Civade R, Herder J, Thomsen PF, et al. Next-generation monitoring of aquatic biodiversity using environmental DNA metabarcoding. Mol Ecol. 2016;25(4):929–42. pmid:26479867
- 49. Deiner K, Bik HM, Mächler E, Seymour M, Lacoursière-Roussel A, Altermatt F, et al. Environmental DNA metabarcoding: Transforming how we survey animal and plant communities. Mol Ecol. 2017;26(21):5872–95. pmid:28921802
- 50. Bucklin A, Lindeque PK, Rodriguez-Ezpeleta N, Albaina A, Lehtiniemi M. Metabarcoding of marine zooplankton: prospects, progress and pitfalls. J Plankton Res. 2016;38(3):393–400.
- 51. McNichol J, Berube PM, Biller SJ, Fuhrman JA. Evaluating and Improving Small Subunit rRNA PCR Primer Coverage for Bacteria, Archaea, and Eukaryotes Using Metagenomes from Global Ocean Surveys. mSystems. 2021;6(3):e0056521. pmid:34060911
- 52. Becker EA, Forney KA, Redfern JV, Barlow J, Jacox MG, Roberts JJ, et al. Predicting cetacean abundance and distribution in a changing climate. Diversity and Distributions. 2018;25(4):626–43.
- 53. James CC, Barton AD, Allen LZ, Lampe RH, Rabines A, Schulberg A, et al. Influence of nutrient supply on plankton microbiome biodiversity and distribution in a coastal upwelling region. Nat Commun. 2022;13(1):2448. pmid:35508497
- 54. Parada AE, Needham DM, Fuhrman JA. Every base matters: assessing small subunit rRNA primers for marine microbiomes with mock communities, time series and global field samples. Environ Microbiol. 2016;18(5):1403–14. pmid:26271760
- 55. Berdjeb L, Parada A, Needham DM, Fuhrman JA. Short-term dynamics and interactions of marine protist communities during the spring-summer transition. ISME J. 2018;12(8):1907–17. pmid:29599520
- 56. Amaral-Zettler LA, McCliment EA, Ducklow HW, Huse SM. A method for studying protistan diversity using massively parallel sequencing of V9 hypervariable regions of small-subunit ribosomal RNA genes. PLoS One. 2009;4(7):e6372. pmid:19633714
- 57. Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, et al. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat Biotechnol. 2019;37(8):852–7. pmid:31341288
- 58. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 2011;17(1):10–2.
- 59. Bokulich NA, Kaehler BD, Rideout JR, Dillon M, Bolyen E, Knight R, et al. Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2’s q2-feature-classifier plugin. Microbiome. 2018;6(1):90. pmid:29773078
- 60. Guillou L, Bachar D, Audic S, Bass D, Berney C, Bittner L, et al. The Protist Ribosomal Reference database (PR2): a catalog of unicellular eukaryote small sub-unit rRNA sequences with curated taxonomy. Nucleic Acids Res. 2013;41(Database issue):D597-604. pmid:23193267
- 61. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–30.
- 62. Pruesse E, Quast C, Knittel K, Fuchs BM, Ludwig W, Peplies J, et al. SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res. 2007;35(21):7188–96. pmid:17947321
- 63. Barlow J. Preliminary estimates of cetacean abundance off California, Oregon, and Washington based on a 1996 ship survey and comparisons of passing and closing modes. 1997.
- 64. Barlow J, Forney KA. Abundance and population density of cetaceans in the California Current ecosystem. Fish Bull. 2007;105(4):509–27.
- 65. Marques TA, Thomas L, Fancy SG, Buckland ST. Improving Estimates of Bird Density Using Multiple-Covariate Distance Sampling. The Auk. 2007;124(4):1229–43.
- 66. Roberts JJ, Best BD, Mannocci L, Fujioka E, Halpin PN, Palka DL, et al. Habitat-based cetacean density models for the U.S. Atlantic and Gulf of Mexico. Scientific Reports. 2016;6:22615.
- 67. Miller DL, Rexstad E, Thomas L, Marshall L, Laake JL. Distance sampling in R. J Stat Softw. 2019;89:1–28.
- 68. Buckland ST, Anderson DR, Burnham KP, Laake JL, Borchers DL, Thomas L. Introduction to Distance Sampling. Oxford: Oxford University Press. 2001.
- 69. Marques FFC, Buckland ST. Covariate models for the detection function. Advanced Distance Sampling. Oxford: Oxford University Press. 2004. 31–47.
- 70. Martín-Fernández JA, Hron K, Templ M, Filzmoser P, Palarea-Albaladejo J. Bayesian-multiplicative treatment of count zeros in compositional data sets. Stat Model. 2015;15(2):134–58.
- 71. Aitchison J. The Statistical Analysis of Compositional Data. Journal of the Royal Statistical Society Series B: Statistical Methodology. 1982;44(2):139–60.
- 72. Aitchison J, Bacon-Shone J. Log contrast models for experiments with mixtures. Biometrika. 1984;71(2):323–30.
- 73. Wold S, Sjöström M, Eriksson L. PLS-regression: a basic tool of chemometrics. Chemometr Intell Lab Syst. 2001;58(2):109–30.
- 74. Abdi H. Partial least squares regression and projection on latent structure regression (PLS Regression). WIREs Computational Stats. 2010;2(1):97–106.
- 75. Carrascal LM, Galván I, Gordo O. Partial least squares regression as an alternative to current regression methods used in ecology. Oikos. 2009;118(5):681–90.
- 76. De Jong S. SIMPLS: an alternative approach to partial least squares regression. Chemometrics and Intelligent Laboratory Systems. 1993;18(3):251–63.
- 77. Meinshausen N, Bühlmann P. Stability selection. J R Stat Soc Ser B Stat Methodol. 2010;72(4):417–73.
- 78. Chun H, Keleş S. Sparse partial least squares regression for simultaneous dimension reduction and variable selection. J R Stat Soc Series B Stat Methodol. 2010;72(1):3–25. pmid:20107611
- 79. Irvine LM, Lagerquist BA, Schorr GS, Falcone EA, Mate BR, Palacios DM. Ecological drivers of movement for two sympatric marine predators in the California current large marine ecosystem. Mov Ecol. 2025;13(1):19. pmid:40102967
- 80. Herwig RP, Staley JT. Anaerobic bacteria from the digestive tract of North Atlantic fin whales (Balaenoptera physalus). FEMS Microbiology Letters. 1986;38(6):361–71.
- 81. Agogué H, Lamy D, Neal PR, Sogin ML, Herndl GJ. Water mass-specificity of bacterial communities in the North Atlantic revealed by massively parallel sequencing. Mol Ecol. 2011;20(2):258–74. pmid:21143328
- 82. Samo TJ, Pedler BE, Ball GI, Pasulka AL, Taylor AG, Aluwihare LI, et al. Microbial distribution and activity across a water mass frontal zone in the California Current Ecosystem. Journal of Plankton Research. 2012;34(9):802–14.
- 83. Djurhuus A, Boersch-Supan PH, Mikalsen S-O, Rogers AD. Microbe biogeography tracks water masses in a dynamic oceanic frontal system. R Soc Open Sci. 2017;4(3):170033. pmid:28405400
- 84. Saunders JK, McIlvin MR, Dupont CL, Kaul D, Moran DM, Horner T, et al. Microbial functional diversity across biogeochemical provinces in the central Pacific Ocean. Proc Natl Acad Sci U S A. 2022;119(37):e2200014119. pmid:36067300
- 85. Shapiro BJ, Polz MF. Ordering microbial diversity into ecologically and genetically cohesive units. Trends Microbiol. 2014;22(5):235–47. pmid:24630527
- 86. Cordero OX, Polz MF. Explaining microbial genomic diversity in light of evolutionary ecology. Nat Rev Microbiol. 2014;12(4):263–73. pmid:24590245
- 87. Salazar G, Sunagawa S. Marine microbial diversity. Curr Biol. 2017;27(11):R489–94. pmid:28586685
- 88. Gralka M, Szabo R, Stocker R, Cordero OX. Trophic Interactions and the Drivers of Microbial Community Assembly. Curr Biol. 2020;30(19):R1176–88. pmid:33022263
- 89. Kieft B, Li Z, Bryson S, Hettich RL, Pan C, Mayali X, et al. Phytoplankton exudates and lysates support distinct microbial consortia with specialized metabolic and ecophysiological traits. Proc Natl Acad Sci U S A. 2021;118(41):e2101178118. pmid:34620710
- 90. Dal Bello M, Lee H, Goyal A, Gore J. Resource-diversity relationships in bacterial communities reflect the network structure of microbial metabolism. Nat Ecol Evol. 2021;5(10):1424–34. pmid:34413507
- 91. Zhao P, Yu B. On model selection consistency of Lasso. The Journal of Machine Learning Research. 2006;7:2541–63.
- 92. Bach FR. Bolasso: model consistent Lasso estimation through the bootstrap. In: Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 2008. 33–40.
- 93. Hoffmann I, Serneels S, Filzmoser P, Croux C. Sparse partial robust M regression. Chemometr Intell Lab Syst. 2015;149:50–9.