Tracking Contributions to Human Body Burden of Environmental Chemicals by Correlating Environmental Measurements with Biomarkers

The work addresses current knowledge gaps regarding causes for correlations between environmental and biomarker measurements and explores the underappreciated role of variability in disaggregating exposure attributes that contribute to biomarker levels. Our simulation-based study considers variability in environmental and food measurements, the relative contribution of various exposure sources (indoors and food), and the biological half-life of a compound, on the resulting correlations between biomarker and environmental measurements. For two hypothetical compounds whose half-lives are on the order of days for one and years for the other, we generate synthetic daily environmental concentrations and food exposures with different day-to-day and population variability as well as different amounts of home- and food-based exposure. Assuming that the total intake results only from home-based exposure and food ingestion, we estimate time-dependent biomarker concentrations using a one-compartment pharmacokinetic model. Box plots of modeled R2 values indicate that although the R2 correlation between wipe and biological (e.g., serum) measurements is within the same range for the two compounds, the relative contribution of the home exposure to the total exposure could differ by up to 20%, thus providing the relative indication of their contribution to body burden. The novel method introduced in this paper provides insights for evaluating scenarios or experiments where sample, exposure, and compound variability must be weighed in order to interpret associations between exposure data.


Introduction
Correlation coefficients between environmental and biomarker measurements are widely used in environmental health assessments and epidemiology to explain the exposure associations between environmental media and human body burdens [1][2][3][4]. As a result, considerable attention and effort have been given to the interpretation of these coefficients [5][6][7]. However, there is limited information available on how the variance in environmental measurements, the relative contribution of exposure sources, and the elimination half-life affect the reliability of the resulting correlation coefficients. To address this information gap, we conducted a simulation study for various exposure scenarios of home-based exposure (e.g., inhalation, dermal uptake, non-dietary dust ingestion) to explore the impacts of pathway-specific scales of exposure variability on the resulting correlation coefficients between environmental and biomarker measurements.
Biomonitoring data, including those from blood, urine, and hair, have been used extensively to identify and quantify human exposures to environmental and occupational contaminants [8,9]. However, because the measured levels in biologic samples result from multiple sources, exposure routes, and environmental media, the levels mostly fail to reveal how the exposures are linked to the source or route of exposure [10]. Thus, comparison of biologic samples with measurements from a single environmental medium (e.g., dust or air) results in weak correlations that lack statistical significance. In addition, cross-sectional biological sample sets that track a single marker have large population variability and do not capture longitudinal (i.e., day-to-day) variability, especially for compounds such as pesticides and phthalates, whose biologic half-lives can be relatively short (on the order of days). Therefore, where the day-to-day variability of biological sample measurements is large, using a small number of biological measurements as the dependent variable in epidemiologic studies can result in exposure misclassification and raise questions of reliability [11].
For chemicals frequently found at higher levels in indoor residential environments than in outdoor environments, it is common to assume that the major contributions to cumulative intake are home-based exposure and/or food ingestion. This simplification can be further justified because people generally spend more than 70 percent of their time indoors [12,13]. Compounds with significant indoor sources and long half-lives in the human body (on the order of years for chemicals such as polybrominated diphenyl ethers, PBDEs) have been found to have positive associations between indoor dust or air concentrations and serum concentrations in U.S. populations [4,14-17]. On the other hand, extant research has not reported significant associations between indoor samples and biomarkers for chemicals primarily associated with food-based exposures, for example, bisphenol-A [18] and perfluorinated compounds [19]. For chemicals with both home- and food-based exposure pathways and short body half-lives (on the order of days), as is the case for many pesticides, a significant association between indoor samples and biomarkers is found less frequently, or is relatively weak, compared to PBDEs [1,20-23]. To better interpret these types of findings, we provide here a simulation study for various exposure scenarios to explore the role of the chemical properties and exposure conditions that are likely to give rise to a significant contribution from indoor exposures. We then assess for these situations the magnitude and variance of the associated correlation coefficients between biomarker and indoor levels.
The objectives of this study are (1) to generate simulated correlation coefficients between environmental measurements and biomarkers with different contributions of home-based exposure to total exposure and different day-to-day and population variability of intake from both residential (home) environments and food, (2) to interpret the contribution of home-based exposure to human body burden for two hypothetical compounds whose half-lives are on the order of days and years, and (3) to determine how the pattern of variability in exposure attributes impacts the resulting correlation coefficients linking biomarker levels to exposure media concentrations.

1. Overview
In this study, our first step is to synthetically generate daily environmental concentrations and food exposure concentrations based on variations of day-to-day intake from residential environments and food as well as different relative contributions of home-based and food-based exposure. As different chemicals are likely to have different relative contributions from the home-based and food-based exposure pathways, we conduct our simulations across the full range of relative contributions between the two pathways to address all plausible scenarios for various compounds. We combine the simulated home-based exposures associated with indoor environmental concentrations and food concentrations, assuming that the total intake results only from home-based exposure and food ingestion. From these inputs we estimate time-dependent biomarker concentrations using a one-compartment pharmacokinetic model. We then compute correlation coefficients between simulated environmental and biomarker concentrations.
In order to facilitate numerous simulations, several simplifications are made regarding (1) a representative environmental medium for home exposure, (2) a distribution of environmental (inhalation/dermal) and food intake, and (3) sources of exposure. First, we select chemical concentrations from indoor wipe samples (C wipe ) as a way to represent home-based exposures that result from all potential exposure routes, including inhalation, non-dietary dust ingestion, and dermal uptake. The resulting home-based exposure (E home ) is assumed to be linearly related to C wipe ; for simplicity, E home and C wipe are assumed to be equal. In addition, we assume that a contaminated food intake rate represents food exposures (E food ). Second, we select C wipe and E food from log-normal distributions of variability across both population and time [6]. Lastly, we assume that the total intake accounting for biomonitoring data results from E home and E food , excluding any other exposure pathways.
Calculating the correlation coefficient between environmental and biomarker measurements requires a number of steps. First, we generate synthetic wipe concentrations for a subject's home i on day 1 (C wipe,i,1 ) and food exposure for a subject i on day 1 (E food,i,1 ). Second, we generate a wipe concentration for a subject's home i on a given day j (C wipe,i,j ) by correlating it with the wipe concentration on the previous day (C wipe,i,j-1 ). We then apply the same approach to generate synthetic food exposures for a subject i on a given day j (E food,i,j ). Third, we vary the contribution of home exposure to total exposure (X 1 ) to generate different contributions of home and food exposures, based on the assumption that E home is linearly related and equal to C wipe . Fourth, we add E home,i,j and E food,i,j to obtain a total daily intake rate for a subject i on a given day j. Fifth, time-dependent biological concentrations are estimated using a one-compartment pharmacokinetic model. Finally, we compute Pearson's correlation coefficients between wipe and biological (e.g., serum) concentrations for our simulated population of 500 on each of 30 days.
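The final step listed above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `pearson_r` and `daily_r2`, and the layout of the inputs as one list of daily values per subject, are assumptions made for this example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def daily_r2(wipe, serum):
    """R^2 across subjects, computed separately for each day.

    wipe[i][j] and serum[i][j] hold the value for subject i on day j
    (a hypothetical layout chosen for this sketch).
    """
    n_days = len(wipe[0])
    r2 = []
    for j in range(n_days):
        wipe_day = [subject[j] for subject in wipe]
        serum_day = [subject[j] for subject in serum]
        r2.append(pearson_r(wipe_day, serum_day) ** 2)
    return r2
```

With 500 simulated subjects and 30 days, `daily_r2` would return 30 values, one per day, which is the kind of collection a box plot of R 2 summarizes. The sketch does not guard against a day on which every subject has the same value, where the correlation is undefined.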

Monte Carlo Simulations
2.2.1. Simulated home and food exposures. We assumed that wipe concentrations across the population are log-normally distributed with mean (m wipe = 1.0 mg/g) and standard deviation (s wipe ) expressed as a coefficient of variation (CV wipe_pop = s wipe /m wipe ). We used three different CVs (1.0, 2.0, and 4.0) in order to generate synthetic wipe concentrations for a subject's home i on day 1 (C wipe,i,1 ). We estimated the parameters (a, mean and b, standard deviation) of the associated normal distribution, ln (C wipe,i,1 ), by the following method of moments [24].
where a wipe_pop and b wipe_pop are the mean and standard deviation of ln (C wipe,i,1 ), respectively. We then used the following lognormal inverse cumulative distribution function (cdf) to generate wipe concentrations for 500 homes with a residential receptor population for the first day of exposure.
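For reference, the two steps just described can be written out under the standard method-of-moments relations for a log-normal variable. This is a reconstruction under that assumption rather than a quotation of the paper's Equations 1 and 2. Here \mu_{wipe} stands for the arithmetic mean (m wipe in the text), \alpha and \beta stand for the log-scale parameters a and b, and \Phi^{-1} is the inverse cdf of the standard normal distribution.

```latex
\begin{align}
\beta_{\mathrm{wipe\_pop}} &= \sqrt{\ln\left(1 + CV_{\mathrm{wipe\_pop}}^{2}\right)} \\
\alpha_{\mathrm{wipe\_pop}} &= \ln\left(\mu_{\mathrm{wipe}}\right) - \tfrac{1}{2}\,\beta_{\mathrm{wipe\_pop}}^{2} \\
C_{\mathrm{wipe},i,1} &= F^{-1}\left(p \mid \alpha_{\mathrm{wipe\_pop}}, \beta_{\mathrm{wipe\_pop}}\right)
  = \exp\left(\alpha_{\mathrm{wipe\_pop}} + \beta_{\mathrm{wipe\_pop}}\,\Phi^{-1}(p)\right)
\end{align}
```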
where C wipe,i,1 is the wipe concentration selected with probability of p from the inverse lognormal cdf with parameters a wipe_pop and b wipe_pop for a subject's home i on a day 1. Since the wipe concentration for a subject's home i on a given day j (C wipe,i,j ) is likely to be correlated to that on a previous day (C wipe,i,j-1 ), we used a log-Gaussian random walk to generate auto-correlated C wipe,i,j .
In other words, we first generated random numbers that are lognormally distributed using mean (m = 0) and standard deviation (s wipe_day = 1.0, 2.0, and 4.0). Then, we multiplied each generated number by 1 or -1, chosen at random, and computed cumulative sums. This allows us to approximate the temporal autocorrelation expected for the same house from day to day. In addition, since wipe concentrations should be positive, they were scaled up to ensure positive values while maintaining the distribution of concentrations from the random walk. The method used to generate 'home' and 'food' exposures is the same, but the simulated numbers are different because we used a random number generator for each exposure source. Thus, Equations 1 through 3 are used to generate food exposures for each simulated subject i on a given day j (E food,i,j ) by replacing m wipe , s wipe_day , and CV wipe_pop with m food , s food_day , and CV food_pop . Auto-correlated wipe concentrations and food exposures are provided in Figure S1.
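The generation procedure described in this subsection can be sketched as follows, using only the Python standard library. This is a sketch, not the authors' implementation. Two details are assumptions because the text does not specify them: the walk starts at the day-1 value, and the series is shifted upward by a constant when it dips to zero or below. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def lognormal_params(mean, cv):
    """Method of moments: log-scale mean (a) and sd (b) from arithmetic mean and CV."""
    b = math.sqrt(math.log(1.0 + cv ** 2))
    a = math.log(mean) - 0.5 * b ** 2
    return a, b


def day1_concentration(mean, cv, rng):
    """Draw one day-1 value through the inverse log-normal cdf."""
    a, b = lognormal_params(mean, cv)
    p = rng.random()
    while p == 0.0:  # the inverse cdf is undefined at p = 0
        p = rng.random()
    return math.exp(a + b * NormalDist().inv_cdf(p))


def random_walk_series(start, s_day, n_days, rng):
    """Auto-correlated daily series for one home (or one subject's food exposure).

    Steps are log-normal magnitudes (log-mean 0, log-sd s_day) with a random
    sign, accumulated from the day-1 value. Starting at the day-1 value and
    shifting the series upward when needed are assumptions of this sketch.
    """
    series = [start]
    for _ in range(n_days - 1):
        step = rng.choice((1.0, -1.0)) * rng.lognormvariate(0.0, s_day)
        series.append(series[-1] + step)
    lowest = min(series)
    if lowest <= 0.0:
        shift = -lowest + 0.01 * start  # constant shift keeps the shape of the walk
        series = [value + shift for value in series]
    return series


rng = random.Random(1)  # fixed seed so the sketch is repeatable
wipe = [
    random_walk_series(day1_concentration(1.0, 1.0, rng), 1.0, 30, rng)
    for _ in range(500)
]
```

Food exposures would be generated by calling the same functions with the food parameters and a separate random number generator.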

Biological concentrations.
Because we assumed that the biological levels result from different combinations of average home exposure (E home ) and food exposure (E food ), we computed the relative E food to E home ratio using the following equation.
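Consistent with the definitions that follow, and assuming the ratio is taken between the average exposures from each source, the relationship can be written as shown here. This is a reconstruction under that assumption, not a quotation of the paper's Equation 4.

```latex
\[
\frac{E_{\mathrm{food}}}{E_{\mathrm{home}}} = \frac{X_{2}}{X_{1}},
\qquad X_{1} + X_{2} = 100\%
\]
```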
where X 1 and X 2 are the percent contribution of exposure from home and food, respectively. Here, because we assumed that total exposure (E total ) is equal to the sum of E home and E food , the sum of X 1 and X 2 is 100%.
Using the different contributions to exposure from the home (X 1 ), we added E home,i,j and E food,i,j to obtain a total daily intake rate for a subject i on a given day j (I i,j ). Then, we used the one-compartment pharmacokinetic model described in Equation 5 to estimate time-dependent biological concentrations [3,25] using serum as the representative biological medium.
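A one-compartment model with first-order elimination has the general form shown in the first line here. The daily update in the second line assumes that intake is constant within each day and uses a one-day time step. Both the notation and the discretization are assumptions of this sketch rather than a quotation of Equation 5.

```latex
\begin{align}
\frac{dC_{\mathrm{serum},i}}{dt} &= \frac{f\,I_{i}(t)}{V} - k\,C_{\mathrm{serum},i}(t) \\
C_{\mathrm{serum},i,j} &= C_{\mathrm{serum},i,j-1}\,e^{-k\,\Delta t}
  + \frac{f\,I_{i,j}}{V\,k}\left(1 - e^{-k\,\Delta t}\right), \qquad \Delta t = 1\ \text{day}
\end{align}
```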
where C serum,i,j is the serum concentration of the compound for a subject i at time j (mg/L), k is an excretion rate coefficient of the compound (1/day), f is the fraction of the ingested compound present in the blood after absorption across the gastrointestinal tract and distribution throughout the body (unitless), V is the volume of blood (L), and I i,j is the intake rate of the compound for a subject i at time j (mg/day), summed from E home,i,j and E food,i,j . In this model, the excretion rate coefficient k can be expressed as ln(2)/t 1/2 where t 1/2 is the half-life of the compound in the human body. We assumed that the fraction f is 1 for all compounds and that the blood volume V is about 5 L for all subjects [26]. This approach can also be applied to urine concentrations and can be adjusted as needed.
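The behavior of such a model can be illustrated with a short Python sketch. It is not the authors' code. The daily update rule assumes intake is constant within each day, and starting each subject at the steady state implied by the first day's intake is a further assumption made here. The name `serum_series` is hypothetical.

```python
import math


def serum_series(intake, half_life_days, f=1.0, volume_l=5.0, c0=None):
    """Daily serum concentrations (mg/L) from daily intake rates (mg/day).

    Uses the exact one-day solution of dC/dt = f*I/V - k*C with intake held
    constant within each day. Starting at the steady state implied by the
    first day's intake (when c0 is None) is an assumption of this sketch.
    """
    k = math.log(2.0) / half_life_days  # elimination rate coefficient, 1/day
    decay = math.exp(-k)                # fraction remaining after one day
    c = f * intake[0] / (volume_l * k) if c0 is None else c0
    series = []
    for rate in intake:
        c = c * decay + f * rate / (volume_l * k) * (1.0 - decay)
        series.append(c)
    return series


# Constant intake of 1 mg/day for 30 days, with the two half-lives
# used in the text (3 days and 2.3 years).
short = serum_series([1.0] * 30, half_life_days=3.0)
long_ = serum_series([1.0] * 30, half_life_days=2.3 * 365.0)
```

Because the elimination rate k is small for the long half-life compound, each day's intake changes its concentration only slightly, which is the buffering effect discussed in the results.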

3. Sensitivity Analysis
Identifying the most important sources of overall exposure variability allows researchers to concentrate resources on obtaining the most important exposure data [6]. Thus, we conducted a sensitivity analysis to determine which sources of variability have relatively more influence on the R 2 value for a given home-exposure contribution. Four types of exposure variability, s wipe_day , CV wipe_pop , s food_day , and CV food_pop , were considered in our study. We computed the mean R 2 for compounds with short and long half-lives by varying one type of exposure variability (e.g., s wipe_day ) from 0.2 to 4.0 while fixing the other three at 1.0, and then repeated this computation for each of the other types.
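The one-at-a-time design described above can be sketched generically in Python. The helper `one_at_a_time` and the stand-in `placeholder_model` are hypothetical names introduced for this illustration. The placeholder formula is arbitrary and has no physical meaning; in practice it would be replaced by a function that runs the full simulation and returns the mean R 2.

```python
def one_at_a_time(model, baseline, grid):
    """Vary each parameter over `grid` while holding the others at baseline.

    `model` is any callable that takes keyword arguments named like the keys
    of `baseline` and returns a number.
    """
    results = {}
    for name in baseline:
        results[name] = []
        for value in grid:
            settings = dict(baseline)
            settings[name] = value
            results[name].append((value, model(**settings)))
    return results


def placeholder_model(s_wipe_day, cv_wipe_pop, s_food_day, cv_food_pop):
    """Stand-in for the full simulation; the formula is arbitrary."""
    return s_wipe_day / (s_wipe_day + cv_wipe_pop + s_food_day + cv_food_pop)


baseline = {
    "s_wipe_day": 1.0,
    "cv_wipe_pop": 1.0,
    "s_food_day": 1.0,
    "cv_food_pop": 1.0,
}
grid = [0.2, 1.0, 2.0, 4.0]
sweep = one_at_a_time(placeholder_model, baseline, grid)
```

Each entry of `sweep` holds the (value, output) pairs for one variability type, which corresponds to one block of a table such as Table 1.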

1. Correlation Coefficient and Home Exposure Contribution
In this study, we applied various exposure scenarios to investigate the relationship between R 2 and the relative contribution of home exposure to total exposure for compounds with different biological half-lives. Figure 1 shows that the R 2 between wipe and serum concentrations increases with the increasing contribution of home exposure. Overall, as the home contribution increases, the gap between the median R 2 for a long half-life compound (empty box) and that for a short half-life compound (filled box) increases. In addition, the median R 2 is almost always larger for a compound with a short biological half-life compared to a compound with a long half-life when these compounds have the same average exposure contribution from the home environment. This is because biologic concentrations for the compound with a short half-life are more sensitive to home exposure with large variance, while concentrations for the compound with a long half-life remain relatively stable due to the longer body retention of the compound, which to a large extent buffers the variations. This result also indicates that for compounds primarily associated with food-based exposure, in other words, for those with little contribution from home exposure (e.g., BPA and outdoor-use pesticides) [1,18,20-23], the R 2 value becomes very small, as expected. In addition, for compounds with a large fraction of exposure resulting from indoor residential environments, such as PBDEs, the median R 2 at 90-100% of home contribution is approximately 0.6 [4,14-17] as shown in Figure 1.
To look at the results in Figure 1 from a different point of view, we plot the percent of home exposure contribution against different R 2 values in Figure 2 to reveal the relationship between the biological half-life of the compound and the relative contribution of home exposure. This figure illustrates that, although the R 2 between wipe and serum concentrations for two compounds with different half-lives is within the same range, the relative contribution of home exposure to total exposure differs by up to 20% between compounds. For example, when the R 2 values for two compounds with different half-lives are between 0.3 and 0.4, the resulting contribution from the home environment for the short half-life compound is 20% smaller than that for the long half-life compound.
In actual exposure situations, we expect the day-to-day variability of wipe concentrations for semivolatile organic compounds to be small, due to their strong persistence on surface materials and dust [27]. In this study, we did not include the day-to-day variability associated with the relationship between the concentration in the home environment and the resulting exposure. There are two basic models for relating home concentrations to exposure. First, there are models that assume that exposure is driven by direct surface contacts, which are likely to have high day-to-day variability [28]. Second, there are models that assume that air-to-skin transdermal uptake becomes more significant than dermal uptake from surface contacts, and air-to-skin transfer is likely to be less variable day to day [12,29]. These model choices are important because R 2 values will also be linked to whether intake is primarily associated with air-to-skin transdermal uptake or dermal uptake from surface contacts. Thus, under the same conditions used in Figures 1 and 2, but with equal contributions from home and food (X 1 = X 2 = 0.5), we also investigated relative changes of R 2 with different day-to-day variability of wipe concentrations (i.e., s wipe_day ), with results shown in Figure 3. The gap between the median R 2 for a long half-life compound (empty box) and a short half-life compound (filled box) increases with increasing s wipe_day . For all values of s wipe_day , the median R 2 for a compound with a short half-life is larger than that for a compound with a long half-life. This result indicates that day-to-day variability of wipe concentrations determines not only the magnitude of R 2 for both compounds, but also the relative magnitude of R 2 between compounds.

2. Influential Source of Variability on Correlation Coefficients
We determined the sensitivity of the mean R 2 value to each of the four different types of variability (i.e., s wipe_day , CV wipe_pop , s food_day , and CV food_pop ) across a range of scales of variability (e.g., CV = 0.2, 1.0, 2.0, and 4.0). Table 1 shows the mean R 2 at a specific variability for two compounds with different biological half-lives (t 1/2 ). In terms of changes in mean R 2 between compounds, a compound with a 2.3 year half-life is shown to be less sensitive to day-to-day variability of food concentrations (i.e., s food_day ) than one with a 3 day half-life and both compounds have similar sensitivity to day-to-day variability of wipe concentrations (i.e., s wipe_day ). In terms of changes in mean R 2 within compounds, R 2 is most sensitive to day-to-day variability of wipe concentrations for both compounds. In addition, the contributions of population variability of wipe concentrations and food exposures to changes in mean R 2 are minimal.

Implications/Limitations
Because some indoor contaminants are considered potential threats to human health, many studies have applied significant resources to examine the relationship between exposure to indoor pollutants and adverse health effects. However, these studies are potentially limited by the use of a single or a few environmental and biological samples. The significant implications of this situation are reflected in our results. Multi-day, multi-person sample analyses are costly and labor-intensive. In addition, the resulting R 2 values from these studies are either not interpreted or poorly interpreted in terms of variability and contribution of exposure sources and the biological half-life of a compound. In this regard, the simulation study in this paper provides an important step towards interpreting the relative contribution of home-based exposure to human body burden for two compounds whose biological half-lives are significantly different (days versus years). Although these two compounds do not cover the full range of chemical substances, bracketing half-lives allows us to quantify the significance of source, measurement, and exposure pattern variability for disaggregating body burden. In particular, it shows that exposure variability and different contributions of exposure sources are more interconnected than commonly considered in many experimental studies. The work also brings to attention the need to understand the impact of a chemical half-life on the relationship between environmental exposures and biomonitoring data. The sensitivity of the resulting R 2 values to day-to-day variability of wipe concentrations and food exposures also points to the importance of understanding variability and contribution of exposure sources.
Table 1. Mean R 2 at a specific variability for four types of variability (coefficient of variation (CV) or standard deviation (s)) for two compounds with different half-lives (t 1/2 ).
Finally, future work includes computing the relative number of samples needed for various levels of confidence to disaggregate body burden for various types of compounds (half-lives), environments, and exposure pathways. Despite the lack of experimental data, the simulated results provide key insights on the role of the variability and contribution of exposure sources and biological half-lives in quantifying a relationship between indoor exposure and human body burden. This approach will be useful for designing future exposure and epidemiologic studies that include indoor environmental samples and biomonitoring data.
Figure S1. Randomly selected example of auto-correlated wipe concentrations (top) and food exposures (bottom) from the log-Gaussian random walk.