## Figures

## Abstract

Stable isotope mixing models (SIMMs) provide a powerful methodology for quantifying relative contributions of several sources to a mixture. They are widely used in the fields of ecology, geology, and archaeology. Although SIMMs have been rapidly evolved in the Bayesian framework, the underdetermination of mixing space remains problematic, i.e., the estimated relative contributions are incompletely identifiable. Here we propose a statistical method to quantitatively diagnose underdetermination in Bayesian SIMMs, and demonstrate the applications of our method (named *β*-dependent SIMM) using two motivated examples. Using a simulation example, we showed that the proposed method can rigorously quantify the expected underdetermination (i.e., intervals of *β*-dependent posterior) of relative contributions. Moreover, the application to the published field data highlighted two problematic aspects of the underdetermination: 1) ordinary SIMMs was difficult to quantify underdetermination of each source, and 2) the marginal posterior median was not necessarily consistent with the joint posterior peak in the case of underdetermination. Our study theoretically and numerically confirmed that *β*-dependent SIMMs provide a useful diagnostic tool for the underdetermined mixing problem. In addition to ordinary SIMMs, we recommend reporting the results of *β*-dependent SIMMs to obtain a biologically feasible and sound interpretation from stable isotope data.

**Citation: **Osada Y, Matsubayashi J, Tayasu I (2021) Diagnosing underdetermination in stable isotope mixing models. PLoS ONE 16(10):
e0257818.
https://doi.org/10.1371/journal.pone.0257818

**Editor: **Hideyuki Doi, University of Hyogo, JAPAN

**Received: **January 22, 2021; **Accepted: **September 13, 2021; **Published: ** October 1, 2021

**Copyright: ** © 2021 Osada et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability: **All relevant data are within the manuscript and Supporting Information files.

**Funding: **This project was supported financially by JST CREST (Grant number JPMJCR13A3) and JSPS KAKENHI (Grant number JP21H04784) through grants awarded to YO, JM and IT. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests: ** The authors have declared that no competing interests exist.

## Introduction

In animal ecology, the development of a methodology for quantifying trophic interactions between consumers and their dietary sources has a long history [1, 2]. Stable isotope mixing models (SIMMs) are popular statistical tools not only for ecologists to estimate the relative contribution of each dietary source to consumers based on isotopic signatures [3, 4], but also used in the other fields such as climatology, oceanography, sedimentology and archaeology [5–7]. Using a Bayesian framework, the applicability of SIMMs to complex isotopic mixing spaces of realistic systems has rapidly improved [8]. The improvements include the incorporation of measurement errors [9], isotopic correlations [10], element concentrations [11, 12], dietary routings [13], additional residual errors of unknown sources [12], and hierarchical structures of consumer populations [14] and food-webs [15]. More recently, most of them are applicable as an open-source program [16].

However, the considerable advances in SIMMs have made it tempting to overlook a fundamental issue in mixing models, the underdetermined mixing problem [7, 16–19] (see S1 Appendix in S1 File). As the terminology suggests, this is a statistical problem related to the underdetermination of estimates, originating from the analysis of many dietary sources against few elements and/or problematic isotopic geometries (e.g., three or more sources are arranged on the same line) in mixing spaces [17]. Let consider a simple example of underdetermined mixing problem for Bayesian models. In this example, we are interested in a consumer species, which has two candidate dietary sources with their relative contributions, *θ* and 1−*θ* (*θ* can be either 1.0, 0.5 or 0.0 for simplicity). For sound mixing spaces, the unique relative contributions of these sources have the maximal posterior probability, e.g., *P*(*θ* = 0.5)>*P*(*θ* = 1.0)≥*P*(*θ* = 0.0). Statistically, such relative contributions are referred to as being identifiable (or estimable). On the other hands, the underdetermined mixing problem has two or more relative contributions with the maximal posterior probability, e.g., *P*(*θ* = 0.5) = *P*(*θ* = 1.0)≥*P*(*θ* = 0.0). Importantly, this problem may result in the inappropriate interpretation and wrong secondary use of SIMM results; in the case of underdetermination, SIMM results cannot be summarized only by a representative value (mean, median or mode) of the marginal posterior distribution.

The underdetermination is a general statistical problem but particularly notorious for SIMMs, in which increasing the number of elements, rather than the number of isotope samples, is needed to improve the identifiability of relative contributions. Previous studies proposed several methods to diagnose this problem (e.g., graphical checking [17], posterior correlation [18], posterior multi-modality [18] and normalized source polygon area [19]). However, these methods are unsuitable for diagnosing complex mixing spaces because their diagnostic signals are sensitive to various sources of data uncertainty. Here we present an alternative method to quantitatively diagnose the underdetermination for existing Bayesian SIMMs (Fig 1). In this paper, we explained how our method obtains the intervals of joint posterior peaks as an accurate diagnostic of underdetermination, and demonstrated its applications to two motivated examples; a simple toy simulation [17] (Fig 2) and a published field dataset for Brent geese (*Branta bernicla*) [12, 20] (Fig 3). The proposed method uses a simple, general statistical framework (i.e., *β*-dependent posterior probability), and thus achieves accurate diagnosability and broad applicability to all Bayesian SIMMs.

First, we calculate the posteriors of both ordinary and *β*-dependent SIMM (panel b) models from isotopic mixing space (panel a). Second, ΔBDP (i.e., 95% highest posterior interval of *β*-dependent posterior) is overlaid on the ordinary posterior (panel c), which make us easy diagnostic checking.

The mixing space (left panel) represents the isotopic value of an individual consumer (a black circle), and the means and standard deviations of isotopic values of four sources (colored circles and bars). The posterior distribution of each relative contribution (right panel) was calculated from ordinary (blue area) and *β*-dependent SIAR model (red area). The theoretical relative contribution is represented in Fig 1C.

The mixing space (left panel) represents the fractionally corrected isotopic values of individual consumers (black circles), and the means and standard deviations of isotopic values of four sources (colored circles and bars). The posterior distribution of each relative contribution was calculated from ordinary (blue area) and *β*-dependent SIAR model (red area).

## Materials and methods

### Diagnosing underdetermination in SIMMs

Since SIMM is one of Bayesian statistical models, its posterior probability, *p*(** θ**|

**,**

*φ***), is derived from a prior,**

*X**π*(

**), and a data probability,**

*θ**p*(

**|**

*X***,**

*θ***), as follows: (1) where**

*φ***is a vector of estimated quantities (e.g., relative contributions),**

*θ***is a vector of user-given quantities, and**

*φ***is a vector of isotopic data. As an example of Eq (1), we described the model of SIAR [12] in S2 Appendix in S1 File. To diagnose underdetermination, we introduced a**

*X**β*-dependent posterior probability: (2)

*β*-dependent posterior probability with

*β*= 1 reduces to the ordinary posterior probability. For large

*β*(→ ∞), the posterior probability of estimates other than the join posterior peak become zeros (see S3 Appendix in S1 File for this proof). Thus,

*β*-dependent posterior distributions with arbitrarily large

*β*correspond to the distributions of the joint posterior peaks. An intrinsic issue of the underdetermined problem is low identifinability among the relative contributions of two or more sources, and thereby provides wide intervals of

*β*-dependent posterior distributions (BDP). Importantly, the model complexity of SIMMs does not influence diagnosability of our method. In this paper, we used the ΔBDP (the BDP width of 95% highest density probability interval) as a quantitative diagnostic for underdetermination. The ΔBDP will be near zero when the underdetermined problem does not exist in the focal mixing space. Note that

*β*has little influence on ΔBDP if the

*β*is sufficiently large (e.g., > 500).

Our method is non-Bayesian despite adoption of the full Bayesian setup for standard Markov chain Monte Carlo calculations. Because it works just as the device to numerically obtain the joint posterior peaks (i.e., BDP), simultaneous calculations of the ordinary posterior are needed to interpret the whole results of Bayesian SIMMs (Fig 1). Note that our method is closely related to previous established methods of obtaining maximum likelihood estimates (MLE) [21, 22] (also see S3 Appendix in S1 File). The difference between our method and their MLE methods is whether priors are exponentiated or not.

### Motivated examples

We provide two motivated examples. The first example is a two-isotope simulation in which the consumer signature is at the centroid of four source signatures (Fig 2; see S4 Appendix in S1 File for simulation details). This simulation was used to diagnose the underdetermination by previous diagnostic methods [17, 18]. It also helps to confirm the theoretical validity of our method: the BDP of each relative contribution theoretically becomes 23.9–26.0% in our setting. The second motivated example is a set of published field data collected to investigate the relative contributions of four different dietary sources (two seagrasses, green algae, and terrestrial grasses) to Brent geese by using carbon and nitrogen stable isotope ratios [12, 20] (Fig 3).

In these examples, we estimated relative contributions of sources to the mixture using ordinary SIAR and *β*-dependent SIAR (i.e., SIAR with *β*-dependent posterior probability). In the latter, we set *β* = 1,000 because sufficiently large *β* qualitatively unchanged the results (see S3 Appendix in S1 File). As described above, we used ΔBDP to diagnose underdetermination for each source relative contribution. We also conducted correlation analyses between the estimated relative contribution of a source and the others (S5 Appendix in S1 File). The mean determination coefficient () is considered as another indicator of underdetermination [18], which will approach one if underdetermination exists. The posterior multi-modality was not used as an indicator of underdetermination because our examples exhibit no multi-modality. To obtain the posterior distributions of both SIARs numerically, the Gibbs sampling and importance sampling was implemented using R 4.1.1 [23] (our implementation is available as an R package ‘siarbeta’ at https://github.com/yutakaos/archives/tree/master/simm/siarbeta). We assigned uninformative or vague priors for all parameters according to previous studies [12, 16].

### Different error structure parameterizations

The recent publications [16, 24] compared the model performance between different error structure parameterizations. To investigate the influence of error structure parameterizations on underdetermination, we conducted an additional analysis for published geese data using the Stock’s parameterization model (i.e., Eq 4 in [24]). The details of model formulations and results are described in S6 Appendix in S1 File.

## Results

In the toy simulation, the estimated relative contributions of sources to the mixture were moderately correlated ( = 0.676, 0.674, 0.675 and 0.676; Table 1, S1 Fig in S1 File). On the other hand, the *β*-dependent SIAR exhibited relatively narrow BDP for all sources (23.5–26.4% for all sources; Table 1, Fig 2). These intervals were much the same as the expected theoretical intervals (i.e., 23.9–26.0%).

Using the published geese data, the estimated relative contributions of dietary items were weakly correlated ( = 0.186, 0.122, 0.133 and 0.330; Table 1, S2 Fig in S1 File). However, ΔBDP varies substantially among different sources. Terrestrial grasses had a narrow BDP (5.3–6.3%), while *Enteromorpha* spp. had a relatively wide BDP (29.1–38.3%). The BDP of *Zostera* and *Ulva lactuca* were intermediate (55.9–59.2% and 0.0–5.4%, respectively). Interestingly, all the marginal posterior medians fell outside the BDP (Table 1, Fig 3). We also found that changing error structure parameterization from SIAR to Stock’s model leads to wider ΔBDP but improves the consistency between ordinary and beta-dependent marginal posteriors (S1 Table, S3 Fig in S1 File).

## Discussion

In our simulation, our method provides the expected theoretical intervals for relative contributions of each source to the mixture. This confirms that *β*-dependent posterior distributions can quantitatively diagnose the underdetermined mixing problem. Because *β*-dependent posterior probability can be applied even for complex mixing systems, our method balances accurate diagnosability and broad applicability. This is the biggest advantage over previous methods such as graphical checking, posterior correlation, posterior multi-modality and normalized source polygon area [17–19].

Using the published geese data, we found two problematic aspects of underdetermined mixing problems highlighted by the analysis of *β*-dependent SIMM. First, ΔBDP vary among dietary sources, and the mean determination coefficient from ordinary SIMM cannot quantify the variation (Table 1, Fig 3). This may not be surprising because the BDPs only depend on the geometry of the isotopic mixing space, while the mean determination coefficients depend on both isotopic geometry and other data uncertainties. Second, the marginal posterior medians of ordinary SIMM are necessarily inconsistent with the joint posterior peaks. In this example, all the medians fell outside of the BDPs. This inconsistency was serious for *Enteromorpha* spp. whose relative contribution was most underdetermined (Fig 3). It is explained by failing to approximate marginal posterior distributions to Gaussian distributions [25] due to underdetermination. Furthermore, the slight difference of model structures may unexpectedly influence the underdetermination. We found that the Stock’s error parameterization improved the consistency between ordinary and beta-dependent marginal posteriors at the cost of expanding ΔBDP for published geese dataset. These results emphasize the importance of quantitatively diagnosing underdetermination in SIMMs even in seemingly sound mixing spaces.

The serious underdetermined mixing problem results in high correlation and multi-modality for estimated relative contributions [18]. However, our results showed that these indicators may be a rough diagnostic. For example, *Zostera* had higher mean determination coefficient than those of *Ulva lactuca* (0.186 vs 0.133), but the inverse relationship was observed for ΔBDP (0.8% vs 5.3%). The multi-modality cannot even be detected for our examples. Checking correlation and multi-modality should be recognized as a preliminary tool for diagnosing serious underdetermined problems.

Our method exponentiates not only data probability but also prior probability because our interest is to diagnose underdetermination of posterior probability. Therefore, our method inherits the benefits and limitations of Bayesian methods. Specifically, we can utilize the information from other investigations (e.g., stomach contents analysis) as informative priors, while we can obtain trivial results when data has little information. For Bayesian users, it is important to understand that underdetermined mixing problems are resolved either by improving data probability (i.e., additional isotope elements) or by using informative priors. Future works should aim at determining appropriate priors for underdetermined mixing problems, although it is beyond the scope of this study.

SIMMs are widely used in stable isotope studies to improve the biological interpretation of isotope data. However, misinterpretation may result from underdetermined mixing problems, even in sophisticated field studies. In the case of the Brent geese data, the relative contribution of *Enteromorpha* spp. involved high uncertainty due to underdetermination probably because the isotopic signatures were on the inside of the polygon composed of the other sources. Such problematic isotopic geometries frequently occur in isotopic studies [26–29], requiring a comprehensive discussion of the potential influence of underdetermination on their results. Furthermore, the secondary use of representative estimates for additional analyses is common in isotopic studies [30, 31]. We provide two recommendations for SIMM users. First, the users should report the BDPs in addition to results of ordinary SIMMs, and discuss the influence of underdetermination on their results for sources with moderate ΔBDPs (e.g., ~10%). If interested sources have large ΔBDPs (e.g., 10%~), more isotope elements should be used to remedy underdetermined mixing problems [17]. The allowable underdetermination criterion is context-dependent but for most practical cases, the mixing problem with ΔBDP < 10% may have little influence on the interpretation of SIMM results. Second, the users should not use the marginal posterior medians and modes for secondary use. Instead, we can use the samples from posterior distributions of ordinary or *β*-dependent SIMMs.

Recently, there have been the remarkable developments of biological tracer analysis (e.g., compound-specific stable isotopes of amino acids and fatty acids [32]). These developments should increase the utility and reliability of Bayesian SIMMs. However, we may need to continue struggling the underdetermined problem because our interest will expand to the system with more mixing sources and finer resolution (e.g., species-level to population-level). Hopefully, our method will contribute to sound interpretation and secondary use of isotopic information in many practical settings.

## References

- 1. Hyslop EJ. Stomach contents analysis—a review of methods and their application. J Fish Biol. 1980; 17: 411–429.
- 2. Kohn MH, Wayne RK. Facts from feces revisited. Trends Ecol Evol. 1997; 12: 223–227. pmid:21238046
- 3. Phillips DL. Mixing models in analyses of diet using multiple stable isotopes: a critique. Oecologia. 2001; 127: 166–170. pmid:24577645
- 4. Phillips DL, Gregg JW. Source partitioning using stable isotopes: coping with too many sources. Oecologia. 2003; 136: 261–269. pmid:12759813
- 5. Mann PJ, Eglinton TI, McIntyre CP, Zimov N, Davydova A, Vonk JE, et al. Utilization of ancient permafrost carbon in headwaters of Arctic fluvial networks. Nat Commun. 2015; 6: 7856. pmid:26206473
- 6. Walling DE. The evolution of sediment source fingerprinting investigations in fluvial systems. J Solis Sedemints. 2013; 13: 1658–1675.
- 7. Makarewicz CA, Sealy J. Dietary reconstruction, mobility, and the analysis of ancient skeletal tissues: Expanding the prospects of stable isotope research in archaeology. J Archaeol Sci. 2015; 56: 146–158.
- 8. Moore JW, Semmens BX. Incorporating uncertainty and prior information into stable isotope mixing models. Ecol Lett. 2008; 11: 470–480. pmid:18294213
- 9. Ward EJ, Semmens BX, Schindler DE. Including source uncertainty and prior information in the analysis of stable isotope mixing models. Environ Sci Technol. 2010; 44: 4645–4650. pmid:20496928
- 10. Hopkins JB, Ferguson JM. Estimating the diets of animals using stable isotopes and a comprehensive Bayesian mixing model. Plos One. 2012; 7: e28478. pmid:22235246
- 11. Phillips DL, Koch PL. Incorporating concentration dependence in stable isotope mixing models. Oecologia. 2002; 130: 114–125. pmid:28547016
- 12. Parnell AC, Inger R, Bearhop S, Jackson AL. Source partitioning using stable isotopes: coping with too much variation. Plos One. 2010; 5: e9672. pmid:20300637
- 13. Fernandes R, Millard AR, Brabec M, Nadeau MJ, Grootes P. Food reconstruction using isotopic transferred signals (FRUITS): a Bayesian model for diet reconstruction. Plos One. 2014; 9: e87436. pmid:24551057
- 14. Semmens BX, Ward EJ, Moore JW, Darimont CT. Quantifying inter- and intra-population niche variability using hierarchical Bayesian stable isotope mixing models. Plos One. 2009; 4: e6187. pmid:19587790
- 15. Kadoya T, Osada Y, Takimoto G. IsoWeb: a Bayesian isotope mixing model for diet analysis of the whole food web. Plos One. 2012; 7: e41057. pmid:22848427
- 16. Stock BC, Jackson AL, Ward EJ, Parnell AC, Phillips DL, Semmens BX. Analyzing mixing systems using a new generation of Bayesian tracer mixing models. PeerJ. 2018; 6: e5096. pmid:29942712
- 17. Fry B. Alternative approaches for solving underdetermined isotope mixing problems. Mar Ecol Prog Ser. 2013; 472: 1–13.
- 18. Semmens BX, Ward EJ, Parnell AC, Phillips DL, Bearhop S, Inger R, et al. Statistical basis and outputs of stable isotope mixing models: comment on Fry (2013). Mar Ecol Prog Ser. 2013; 490: 285–289.
- 19. Brett MT. Resource polygon geometry predicts Bayesian stable isotope mixing model bias. Mar Ecol Prog Ser. 2014; 514: 1–12.
- 20. Inger R, Ruxton GD, Newton J, Colhoun K, Robinson JA, Jackson AL et al. Temporal and intrapopulation variation in prey choice of wintering geese determined by stable isotope analysis. J Anim Ecol. 2006; 75: 1190–1200. pmid:16922855
- 21. Robert CP. Prior feedback: a Bayesian approach to maximum likelihood distribution. Comput Stat. 1993; 8: 279–294.
- 22. Lele SR, Dennis B, Lutscher F. Data cloning: easy maximum likelihood estimation for complex ecological models using Bayesian Markov chain Monte Carlo methods. Ecol Lett. 2007; 10: 551–563. pmid:17542934
- 23.
R Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing. Vienna, Austria. 2020. Available from: http://www.R-project.org/
- 24. Stock B. C. and Semmens B. X. 2016. Unifying error structures in commonly used biotracer mixing models. Ecology, 97, 2562–2569. pmid:27859126
- 25.
Gelman A, Carlin JB., Stern HS, Rubin DB. Large-sample inference and frequency properties of Bayesian inference. In: Bayesian Data analysis 2nd ed. Boca Raton: Chapman and Hall/CRC Press; 2003. pp. 101–113.
- 26. McClellan CM, Braun-McNeill J, Avens L, Wallace BP, Read AJ. Stable isotopes confirm a foraging dichotomy in juvenile loggerhead sea turtles. J Exp Mar Biol Ecol. 2010; 387: 44–51.
- 27. Rutz C, Bluff LA, Reed N, Troscianko J, Newton J, Inger R, et al. The ecological significance of tool use in New Caledonian crows. Science. 2010; 329: 1523–1526. pmid:20847272
- 28. Votier SC, Bearhop S, Witt MJ, Inger R., Thompson D, Newton J. Individual responses of seabirds to commercial fisheries revealed using GPS tracking, stable isotopes and vessel monitoring systems. J Appl Ecol. 2010; 47: 487–497.
- 29. Miller TW, Bosley KL, Shibata J, Brodeur RD, Omori K, Emmett R. Contribution of prey to Humboldt squid Dosidicus gigas in the northern California Current, revealed by stable isotope analyses. Mar Ecol Prog Ser. 2013; 477: 123–134.
- 30. Inger R, Harrison XA, Ruxton GD, Newton J, Colhoun K, Gudmundsson GA, et al. Carry-over effects reveal reproductive costs in a long-distance migrant. J Anim Ecol. 2010; 79: 974–982. pmid:20579179
- 31. Matsubayashi J, Morimoto J, Mano T, Aryal A, Nakamura F. Using stable isotopes to understand the feeding ecology of the Hokkaido brown bear (Ursus arctos) in Japan. Ursus. 2014; 25: 87–97.
- 32. Boecklen WJ, Yarnes CT, Cook BA, James AC. On the use of stable isotopes in trophic ecology. Annu Rev Ecol Evol Syst. 2011; 42: 411–40.