Tailoring Bayesian Additive Regression Trees (BART) for environmental mixture studies

Kaizong Ye; Zhen Chen; Shanshan Zhao

doi:10.1371/journal.pone.0348002

Abstract

Background

Various methods have been developed to investigate the complex and collective effects of environmental mixtures on human health. Tree ensemble methods, such as Bayesian Additive Regression Trees (BART), are known for their stability and accuracy in variable selection and outcome prediction for high-dimensional correlated data in the statistical literature, but their use has not been well studied for environmental mixtures.

Methods

We tailored the original BART model for environmental mixtures analysis to achieve both robust identification of toxic agents and accurate prediction of health outcomes. Our modified BART approach allowed for a smooth response surface and incorporated covariate adjustment for both continuous and binary outcomes. It supported both component-wise variable selection and hierarchical variable selection to accommodate scientifically meaningful groupings of chemicals. To facilitate interpretation, we used a Generalized Additive Model (GAM) approximation to quantify the marginal contributions of individual chemicals. The performance of the modified BART was evaluated through simulations and a case study with the National Health and Nutrition Examination Survey (NHANES) 2001–2002 data to examine the effects of persistent organic pollutants (POPs) on leukocyte telomere length. All results were compared with the Bayesian Kernel Machine Regression (BKMR), a widely used method in mixtures analysis.

Results

Our simulation studies demonstrated that the modified BART produced results comparable to or superior to BKMR in recovering the true exposure-response surface for both continuous and binary outcomes, with R² consistently above 0.7. Specifically, when chemical groups were considered, modified BART with hierarchical variable selection achieved higher R² (0.82–0.99 for continuous outcomes and 0.73–0.95 for binary outcomes) than BKMR (0.59–0.67 and 0.47–0.59, respectively) on independent test datasets. Modified BART also reduced the computational time by 70% to 99.8% compared to BKMR. Both methods effectively identified relevant chemical groups under hierarchical variable selection, but modified BART more effectively distinguished important components within groups. In the NHANES case study, three chemicals (2,3,4,7,8-pncdf, PCB126 and PCB169) were identified by modified BART as having near-linear positive effects on leukocyte telomere length based on GAM approximation plots.

Conclusions

Modified BART is a robust and scalable response surface model alternative to BKMR for analyzing environmental mixtures data. It is particularly advantageous for large datasets, binary outcomes, and grouped chemicals. GAM approximation provides practical insights into interpreting individual chemical effects estimated from complex response surface models.

Citation: Ye K, Chen Z, Zhao S (2026) Tailoring Bayesian Additive Regression Trees (BART) for environmental mixture studies. PLoS One 21(5): e0348002. https://doi.org/10.1371/journal.pone.0348002

Editor: Li Yang, Sichuan University, CHINA

Received: June 24, 2025; Accepted: April 8, 2026; Published: May 11, 2026

This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Data Availability: The data used for the case study can be found online at https://github.com/lizzyagibson/SHARP.Mixtures.Workshop. The modified BART package as well as the functions curated for our analyses can be accessed at https://github.com/Shiny1818/ModSoftBART_against_BKMR.

Funding: This study was supported by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences with grants ZIA ES103307 and ES103308 to SZ. Research of ZC was supported by the Intramural Research Program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD). The contributions of the NIH author(s) were made as part of their official duties as NIH federal employees, are in compliance with agency policy requirements, and are considered Works of the United States Government. However, the findings and conclusions presented in this paper are those of the author(s) and do not necessarily reflect the views of the NIH or the U.S. Department of Health and Human Services.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Humans are simultaneously exposed to a wide range of environmental chemicals through air, water, food, consumer products, and other pathways. Studying mixtures of these environmental exposures is crucial for advancing our understanding of both their individual and combined impacts on human health [1–3]. Key objectives of environmental mixtures studies include characterizing exposure patterns, identifying toxic agents within mixtures, and quantifying their individual and cumulative effects. It is also desirable to incorporate meaningful chemical grouping information into analysis, since chemicals within the same group often co-exist, share biological pathways, and respond similarly to interventions. However, environmental mixtures data are often high-dimensional and highly correlated, and exhibit complex interactions and non-linear relationships with health outcomes. These features present major challenges for traditional regression methods. Therefore, it is important to develop specialized statistical methods to gain deeper insights into the health effects of environmental mixtures.

Various regression models have been used to explore the association between environmental mixtures and health outcomes, including regularized regressions, index models, and response surface models. Regularized regressions, such as elastic net regression [4] and LASSO (least absolute shrinkage and selection operator) [5], can be used to identify relevant chemicals through variable selection. Index models, such as weighted quantile sum regression (WQS) [6], quantile-based g-computation (qgcomp) [7], and partial linear single index model (PLSI) [8], use weighted indices of chemicals to explore their cumulative effects on health outcomes. Response surface models are a class of flexible models designed to explore the non-linear relationships between exposures and various outcomes in different research areas [9–16]. In particular, Bayesian Kernel Machine Regression (BKMR) [17,18] is one of the most popular approaches for mixtures analysis, primarily due to its model flexibility and visualization tools. It has been used in diverse contexts to investigate the health effects of air pollution [19,20], heavy metals [21], and endocrine disruptors [22], among others. However, kernel methods are generally computationally intensive, and BKMR can have convergence issues due to its reliance on Markov Chain Monte Carlo (MCMC), which can become particularly burdensome for high-dimensional or large datasets.

There is growing interest in the use of tree-based models in both statistical and biomedical research [23–27], primarily due to their ability to recursively partition the predictor space [28] to improve estimation precision and capture complex non-linear relationships. However, individual trees can be unstable, motivating the development of ensemble methods that integrate multiple weak learners. Among these, the Bayesian Additive Regression Trees (BART) model [29] uses MCMC sampling to approximate the posterior distribution over trees. Its variant, soft BART, extends BART by incorporating probabilistic splits and sparsity-inducing priors to generate smooth response surfaces for high-dimensional data [30,31]. These features make soft BART particularly suitable for mixtures analysis, where chemical effects tend to vary gradually rather than abruptly, and a small subset of exposures contributes most to the overall effect. BART also offers computational efficiency through its inherent feature selection and Bayesian backfitting MCMC algorithm [29], with fast prediction and robustness to missing data, outliers, and mixed data types [32]. Recent studies have leveraged BART to investigate health effects of multiple environmental exposures, highlighting its advantages in modeling complex interactions and cumulative risks from mixtures, whereas BKMR may face scalability issues [33–35].

In this study, we focus on tailoring the BART model for environmental mixture analysis to achieve both reliable identification of important chemicals and accurate prediction of health outcomes. Specifically, the modified BART accommodates both continuous and binary outcomes, allows for covariate adjustment, and incorporates biological grouping structures. Additionally, to facilitate interpretation of nonparametric response surface modeling results from both modified BART and BKMR, we introduce a lower-dimensional approximation technique based on the Generalized Additive Model (GAM) [36] to summarize the marginal effects of individual chemicals. We demonstrate the performance of the modified BART through extensive simulations and a case study using data from the National Health and Nutrition Examination Survey (NHANES) 2001–2002 cycle.

Methods

We denote the sample size as $n$. For the $i$-th ($i = 1, \dots, n$) subject, we observe a health outcome $y_i$, environmental exposures $\mathbf{x}_i = (x_{i1}, \dots, x_{ip})^{\top}$, and a set of covariates $\mathbf{z}_i$. We first focus on a continuous outcome to describe the original BART model and our proposed modified BART model, followed by an extension to a binary outcome.

Modified BART for environmental mixtures studies

The original BART model with a continuous outcome can be defined as

$$y_i = f(\mathbf{x}_i) + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2). \tag{1}$$

The flexible surface function $f$ is defined with an ensemble of trees as

$$f(\mathbf{x}) = \sum_{t=1}^{T} g(\mathbf{x}; \mathcal{T}_t, \mathcal{M}_t),$$

and each tree is defined through $(\mathcal{T}_t, \mathcal{M}_t)$ as

$$g(\mathbf{x}; \mathcal{T}_t, \mathcal{M}_t) = \sum_{\ell \in \mathcal{L}_t} \mu_{t\ell} \, \phi(\mathbf{x}; \mathcal{T}_t, \ell).$$

Here $t$ ($t = 1, \dots, T$) is the tree index, $T$ is the total number of trees, $\mathcal{T}_t$ denotes the decision tree topology structure, $\mathcal{L}_t$ denotes all leaf nodes associated with $\mathcal{T}_t$, $\mu_{t\ell}$ represents the predicted value at leaf $\ell$, $\mathcal{M}_t$ denotes all predicted values from the tree, and $\phi(\mathbf{x}; \mathcal{T}_t, \ell)$ defines the association of $\mathbf{x}$ to leaf node $\ell$. Fig 1 illustrates a simple binary decision tree. The root node applies a splitting rule to one exposure. If this condition is not satisfied, the path proceeds to the right and reaches a leaf node with its own predicted value. Otherwise, it reaches an internal node with a second splitting rule. If this rule holds, the path descends left to a second leaf node; otherwise it terminates at a third leaf node. The tree topology, its three leaf nodes, and their predicted values together define the binary decision tree, which corresponds to a step function. BART further sums these binary decision trees to create a more complicated step function, which can approximate a non-linear surface.
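The step-function view of a single tree and of a sum of trees can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the tree shape follows the verbal description of Fig 1, while the splitting variables, cutpoints, leaf values, and the function names `tree_predict` and `ensemble_predict` are hypothetical.

```python
def tree_predict(x, c1, c2, mu_left, mu_mid, mu_right):
    """Three-leaf decision tree with hypothetical rules x[0] <= c1 and x[1] <= c2."""
    if not (x[0] <= c1):   # root rule not satisfied: go right to a leaf
        return mu_right
    if x[1] <= c2:         # internal rule holds: descend left to a leaf
        return mu_left
    return mu_mid          # internal rule fails: remaining leaf


def ensemble_predict(x, trees):
    """Sum of step functions, one per tree, as in a BART-style ensemble."""
    return sum(tree_predict(x, *params) for params in trees)


# Two hypothetical trees, each given as (c1, c2, mu_left, mu_mid, mu_right).
trees = [(0.0, 0.5, 1.0, 2.0, 3.0), (1.0, -0.5, 0.1, 0.2, 0.3)]
print(ensemble_predict([-1.0, 0.0], trees))
```

Each tree contributes a piecewise-constant value, so the ensemble is itself piecewise constant; adding trees refines the partition rather than smoothing it.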

Fig 1. Schematic plot of an example binary decision tree. Note: The plot shows the exposures used for splitting, the branch splitting rules, and the predicted values of the leaf nodes.

https://doi.org/10.1371/journal.pone.0348002.g001

To further improve the smoothness of the fitted response surface, Linero and Yang [30] proposed soft BART by replacing the indicator function with a soft decision rule, such as a logistic function $\psi(x; c, \tau) = \{1 + \exp[-(x - c)/\tau]\}^{-1}$, where $c$ is the cutpoint and $\tau$ is the bandwidth. Under this framework, within each tree, every subject is assigned to all leaves with probabilities determined by these logistic functions, and the estimate is an average of all the predicted values in these leaves weighted by the probabilities.
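A soft decision rule of this kind can be sketched as follows. The sketch assumes a three-leaf tree of the same shape as in Fig 1 and a logistic gate that gives the probability of taking the right branch; the cutpoints, bandwidth, leaf values, and function names are hypothetical and are not taken from the modBART code.

```python
import math


def gate(x, c, tau):
    """Logistic probability of taking the right branch at cutpoint c with bandwidth tau."""
    z = (x - c) / tau
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)            # numerically stable form for negative z
    return e / (1.0 + e)


def soft_tree_predict(x, c1, c2, tau, mu_left, mu_mid, mu_right):
    """Probability-weighted average of the leaf values of a three-leaf soft tree."""
    p_root_right = gate(x[0], c1, tau)
    p_inner_right = gate(x[1], c2, tau)
    weights = [
        (1.0 - p_root_right) * (1.0 - p_inner_right),  # left, then left
        (1.0 - p_root_right) * p_inner_right,          # left, then right
        p_root_right,                                  # right at the root
    ]
    leaves = [mu_left, mu_mid, mu_right]
    return sum(w * m for w, m in zip(weights, leaves)), weights


pred, weights = soft_tree_predict([0.0, 0.0], 0.0, 0.0, 1.0, 1.0, 2.0, 3.0)
print(pred, weights)
```

The leaf weights always sum to one. As the bandwidth shrinks, the weights concentrate on a single leaf and the soft tree approaches the hard step function.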

We make several modifications to soft BART to better suit environmental mixtures analysis, and name the result the modified BART. Covariate adjustment is essential for epidemiological studies. Since these covariates are usually established risk factors, they need to remain in the model in fixed parametric forms rather than being combined with the exposures in the tree structure. Because the original soft BART model does not include a separate covariate term, we modify the model as

$$y_i = f(\mathbf{x}_i) + \mathbf{z}_i^{\top} \boldsymbol{\beta} + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2), \tag{2}$$

where $\boldsymbol{\beta}$ is a vector of coefficients for the covariates $\mathbf{z}_i$, and $\top$ denotes the transpose. These covariates can be continuous or binary. Here we include covariate effects linearly, but more complicated non-linear forms can be incorporated if desired.

This model is fitted via a Bayesian backfitting MCMC algorithm, similar to BART and soft BART. Briefly, in each MCMC iteration, we first compute the ensemble fit $f(\mathbf{x}_i)$ using the current tree structures, and fit a Bayesian linear regression of the residuals $y_i - f(\mathbf{x}_i)$ on the covariates $\mathbf{z}_i$ to estimate $\boldsymbol{\beta}$. Then the algorithm cycles through the trees. For the $t$-th tree, it first computes the partial residuals that leave this tree out, i.e., the outcome minus the covariate term and the fits of all other trees, and then runs a Metropolis-Hastings step to propose a modified tree structure (e.g., grow, prune, or change-split). The acceptance ratio is calculated from three quantities: the likelihood ratio of observing these partial residuals given the proposed versus the current tree structure, the ratio of the prior probabilities of the proposed and current tree structures, and the ratio of the probability of proposing the current structure from the proposed one to the probability of proposing the proposed structure from the current one. With soft BART, each subject is assigned to all the leaf nodes with logistic probabilities; therefore, we integrate out the leaf parameters to compute a marginal likelihood used in the acceptance ratio calculation. If the proposal is accepted, the leaf parameters are updated using a Gaussian conjugate posterior; otherwise the parameters remain unchanged. The final step in each iteration updates other parameters, including the splitting probability for each exposure and variance parameters. This procedure is repeated many times to estimate model parameters, provide posterior inclusion probabilities (PIPs) for each exposure, and predict outcomes.
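In symbols, the Metropolis-Hastings step described above has the usual three-factor form. The notation here is assumed for illustration and may differ from that in S1 File: $\mathcal{T}_t$ is the current structure of the $t$-th tree, $\mathcal{T}_t^{*}$ the proposal, $\mathbf{r}_t$ the vector of partial residuals, $p(\mathbf{r}_t \mid \cdot)$ the marginal likelihood with leaf parameters integrated out, $\pi$ the tree prior, and $q$ the proposal distribution.

```latex
A(\mathcal{T}_t \to \mathcal{T}_t^{*}) = \min\left\{ 1,\;
  \underbrace{\frac{p(\mathbf{r}_t \mid \mathcal{T}_t^{*})}{p(\mathbf{r}_t \mid \mathcal{T}_t)}}_{\text{likelihood ratio}}
  \times
  \underbrace{\frac{\pi(\mathcal{T}_t^{*})}{\pi(\mathcal{T}_t)}}_{\text{prior ratio}}
  \times
  \underbrace{\frac{q(\mathcal{T}_t \mid \mathcal{T}_t^{*})}{q(\mathcal{T}_t^{*} \mid \mathcal{T}_t)}}_{\text{proposal ratio}}
\right\}
```

The proposal is accepted with probability $A$, so a move that greatly improves the fit to the partial residuals can still be rejected if it is improbable under the prior.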

Variable selection is accomplished through two mechanisms as in Linero and Yang [30]. First, we use a sparsity-inducing prior on the splitting probabilities $s = (s_1, \dots, s_p)$, where $s_j$ is the probability of splitting on exposure $j$, and assume a Dirichlet prior $s \sim \text{Dirichlet}(\alpha/p, \dots, \alpha/p)$. When $\alpha$ is small, this prior encourages most splitting probabilities to be near zero, so that only a few exposures are used for splitting. Second, we carefully choose the bandwidth parameter $\tau$ in the soft decision functions, where a smaller $\tau$ further restricts the splitting and reduces its influence on the leaf weights. These two mechanisms jointly promote sparse trees to achieve variable selection.
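The sparsity-inducing behaviour of a Dirichlet prior with small concentration can be seen in a short simulation. This sketch uses only the Python standard library; the concentration values, the number of exposures, and the function name `sample_dirichlet` are illustrative assumptions rather than the settings used in the paper.

```python
import random


def sample_dirichlet(concentration, rng):
    """Draw from a Dirichlet distribution by normalising independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(1)
p = 15  # hypothetical number of exposures

# Small total concentration: mass piles up on very few exposures.
sparse = sample_dirichlet([0.5 / p] * p, rng)
# Large concentration: mass is spread almost evenly.
diffuse = sample_dirichlet([50.0] * p, rng)

print(round(max(sparse), 3), round(max(diffuse), 3))
```

Under the small-concentration prior, one or two splitting probabilities dominate a typical draw, which is what lets the sampler concentrate splits on a few exposures.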

The modified BART also allows incorporation of scientifically meaningful chemical grouping information. Specifically, we adopt the variable grouping prior discussed in Linero and Yang [30] to capture the hierarchical relationships inherent in grouped exposure data. When exposures are partitioned into $G$ groups, with group $g$ containing $p_g$ components, the inclusion probability for the $j$-th component in group $g$ is modeled as $s_{gj} = u_g v_{gj}$, where $u_g$ is the group-level inclusion probability, and $v_{gj}$ is the within-group conditional inclusion probability. We assume Dirichlet priors for both $u$ and $v$ to induce sparsity. This formulation yields variable selection at multiple levels, allowing identification of both important groups and important components within groups.
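The two-level structure amounts to multiplying a group-level probability by a within-group conditional probability. A minimal sketch, with hypothetical groups and probabilities chosen only for illustration:

```python
def component_probs(group_probs, within_probs):
    """Multiply each group-level probability by its within-group conditional probabilities."""
    return [[u * v for v in vs] for u, vs in zip(group_probs, within_probs)]


group_probs = [0.6, 0.4]                        # hypothetical group-level probabilities
within_probs = [[0.5, 0.5, 0.0], [0.9, 0.1]]    # hypothetical conditional probabilities
probs = component_probs(group_probs, within_probs)
print(probs)
```

Because each set of conditional probabilities sums to one within its group, the component-level probabilities sum to one overall, and a component can be effectively excluded either through its group or within its group.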

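For a binary outcome $y_i$, we take a latent variable approach by assuming a continuous latent $y_i^{*}$ modeled as in Equation (2), with the error variance fixed at 1, and letting $y_i = I(y_i^{*} > 0)$. In each MCMC iteration, given the current tree structures and parameters, we sample $y_i^{*}$ from a truncated normal distribution conditional on $y_i$,

$$y_i^{*} \mid y_i \sim \begin{cases} N_{+}\big(f(\mathbf{x}_i) + \mathbf{z}_i^{\top} \boldsymbol{\beta},\, 1\big), & y_i = 1, \\ N_{-}\big(f(\mathbf{x}_i) + \mathbf{z}_i^{\top} \boldsymbol{\beta},\, 1\big), & y_i = 0, \end{cases}$$

where $N_{+}$ and $N_{-}$ are normal distributions truncated below and above 0, respectively. We then follow the same procedure above to update tree structures and parameters based on $y_i^{*}$. After fitting, the continuous prediction $\hat{y}_i^{*}$ can be converted to a predicted probability as $\hat{P}(y_i = 1) = \Phi(\hat{y}_i^{*})$, where $\Phi$ is the standard normal cumulative distribution function.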
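The latent-variable step can be sketched with inverse-CDF sampling from a unit-variance truncated normal. This is a simplified illustration using the Python standard library, not the sampler in the modBART package; the function names are hypothetical, and the clipping constant makes the sketch unreliable when the latent mean is far in the tail opposite to the observed outcome.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def sample_latent(y, mean, rng):
    """Draw from N(mean, 1) truncated to (0, inf) if y == 1, else to (-inf, 0]."""
    p_zero = STD_NORMAL.cdf(-mean)          # probability that the latent value is <= 0
    if y == 1:
        u = rng.uniform(p_zero, 1.0)
    else:
        u = rng.uniform(0.0, p_zero)
    u = min(max(u, 1e-12), 1.0 - 1e-12)     # keep inv_cdf away from 0 and 1
    return mean + STD_NORMAL.inv_cdf(u)


def predicted_probability(latent_mean):
    """Convert a latent-scale prediction to a probability with the probit link."""
    return STD_NORMAL.cdf(latent_mean)


rng = random.Random(0)
print(sample_latent(1, 0.3, rng), sample_latent(0, 0.3, rng), predicted_probability(0.3))
```

Once the latent values are drawn, the tree and coefficient updates proceed exactly as for a continuous outcome, with the latent values in place of the observed outcome.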

More technical details on the modified BART are provided in S1 File, and the method is implemented in the R package modBART (https://github.com/Shiny1818/ModSoftBART_against_BKMR).

Brief Overview of BKMR

In BKMR [17], $f$ in Equation (2) is modeled using a normal prior with mean 0 and a regularized Gaussian kernel defined as

$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\Big\{ -\sum_{m=1}^{p} r_m (x_{im} - x_{jm})^2 \Big\},$$

where $\big(f(\mathbf{x}_1), \dots, f(\mathbf{x}_n)\big)^{\top} \sim N(\mathbf{0}, \tau \mathbf{K})$, $\tau$ is a regularization parameter, and $r_m$ are auxiliary parameters that control the smoothness of $f$. Inference is done through a Bayesian MCMC. Component-wise variable selection in BKMR is achieved by imposing a spike-and-slab prior on the parameters $r_m$ to identify important exposures within the mixture. When biological grouping information exists, BKMR can perform hierarchical variable selection; more technical details are given in Bobb et al. [17]. We used the R bkmr package (https://cran.r-project.org/web/packages/bkmr/index.html) to fit this model. For a binary outcome, a probit BKMR model [18] links a latent normal variable to the observed outcome through $y_i = I(y_i^{*} > 0)$, similar to the modified BART model. When using the kmbayes function in the bkmr package to fit the probit BKMR, it is recommended to set to address issues with non-positive definite precision matrices, though this leads to much longer computational time.
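A Gaussian kernel matrix with exposure-specific weights can be computed as below. The sketch assumes the kernel form $\exp\{-\sum_m r_m (x_{im} - x_{jm})^2\}$ from Bobb et al.; the data, the weights, and the function name `gaussian_kernel` are hypothetical, and this is not the bkmr implementation.

```python
import math


def gaussian_kernel(x_rows, r):
    """Kernel matrix K[i][j] = exp(-sum_m r[m] * (x[i][m] - x[j][m])**2)."""
    n = len(x_rows)
    return [
        [
            math.exp(-sum(rm * (a - b) ** 2
                          for rm, a, b in zip(r, x_rows[i], x_rows[j])))
            for j in range(n)
        ]
        for i in range(n)
    ]


x_rows = [[0.0, 0.0], [1.0, 2.0], [1.0, -3.0]]   # three subjects, two exposures
K = gaussian_kernel(x_rows, r=[1.0, 0.0])         # second exposure switched off
print(K)
```

Setting a weight to zero removes that exposure from the kernel entirely, which is why a spike-and-slab prior on the weights performs variable selection. The matrix is n by n, so the cost of working with it grows quickly with sample size.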

Marginal approximation to facilitate interpretation of chemical individual effect

Both modified BART and BKMR are flexible response surface models that can capture complicated non-linear relationships between exposures and outcomes, but their fitted surfaces can be difficult to interpret. BKMR provides useful visualization tools to inspect the effects of various exposure combinations at pre-selected quantiles [17], but it does not give a single overall marginal effect for each exposure.

To enhance interpretability for both modified BART and BKMR, we use the generalized additive model (GAM) based low-dimensional approximation technique proposed by Woody et al. [36]. Let $\hat{f}(\mathbf{x})$ denote the estimated response surface from either modified BART or BKMR. We approximate this multivariate surface by a sum of individual response curves,

$$\hat{f}(\mathbf{x}) \approx \gamma_0 + \sum_{j=1}^{p} \gamma_j(x_j),$$

where $\gamma_j$ are smooth functions (e.g., splines, kernels, tensor products) of exposure $x_j$. In this study, we used thin plate regression splines for their practical performance and theoretical properties [37]. As discussed in Woody et al. [36], this approximation can be restricted to a subset of important exposures identified by modified BART or BKMR. This approximation decomposes a complex response surface into additive one-dimensional response curves to represent each exposure's marginal effect averaged over the distribution of other exposures, an advantage over methods like partial dependence plots provided by BKMR that require fixing other exposures. Interactions between pre-specified exposures, such as $\gamma_{jk}(x_j, x_k)$, can also be included. Based on our experience, marginal effects and two-way interactions typically capture most variability for environmental mixtures data. We do not recommend higher order interactions because of interpretation difficulties and computational burden for GAMs. The effectiveness of this approximation is evaluated using an $R^2$ metric as outlined by Woody et al. [36], which measures how much variation in the posterior samples the approximation explains.

This approximation can be done through the gam function in R package mgcv (https://cran.r-project.org/web/packages/mgcv/index.html).
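The goodness-of-approximation metric can be sketched as follows, assuming it takes the familiar form of one minus the ratio of the residual to the total sum of squares, computed between the fitted surface and its additive approximation. The function name and toy values are hypothetical; in practice the metric would be computed for each posterior draw of the surface.

```python
def summary_r2(surface, approximation):
    """Share of variation in the fitted surface explained by its approximation."""
    mean = sum(surface) / len(surface)
    ss_res = sum((f - g) ** 2 for f, g in zip(surface, approximation))
    ss_tot = sum((f - mean) ** 2 for f in surface)
    return 1.0 - ss_res / ss_tot


surface = [1.0, 2.0, 3.0]                          # toy fitted surface values
print(summary_r2(surface, [1.0, 2.0, 3.0]))        # exact additive approximation
print(summary_r2(surface, [2.0, 2.0, 2.0]))        # constant approximation
```

A value near one indicates that the additive curves capture nearly all of the fitted surface, whereas a low value suggests that interactions omitted from the approximation matter.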

Simulation Settings

We conducted extensive simulation studies to evaluate the performance of modified BART in settings reflecting environmental mixtures analyses, in comparison with BKMR. For all simulations, we used total sample size and , equally divided into independent training and testing datasets (i.e., respectively). Simulations were repeated 500 times.

We generated exposures from , where is a covariance matrix structured as block-diagonal:

with , , , and representing the covariance matrices for four exposure groups of dimensions , , , and , respectively. Off-diagonal elements were in , in , in , and in to reflect moderate to high within group correlations. Exposures between groups were uncorrelated. We generated two covariates with and , with coefficients and as in Equation (2).
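Generating correlated exposures from a block-diagonal covariance can be sketched with a hand-written Cholesky factorisation. The sketch assumes mean-zero, unit-variance multivariate normal exposures with a common correlation within each group; the group sizes, correlations, and function names below are hypothetical and are not the simulation settings of the paper.

```python
import math
import random


def block_diag_cov(sizes, rhos):
    """Unit-variance covariance with correlation rho inside each group, 0 between groups."""
    p = sum(sizes)
    cov = [[0.0] * p for _ in range(p)]
    start = 0
    for size, rho in zip(sizes, rhos):
        for i in range(start, start + size):
            for j in range(start, start + size):
                cov[i][j] = 1.0 if i == j else rho
        start += size
    return cov


def cholesky(a):
    """Lower-triangular factor L with L L^T = a, for a positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_exposures(cov, n, rng):
    """Draw n mean-zero multivariate normal rows with covariance cov."""
    low = cholesky(cov)
    p = len(cov)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        rows.append([sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(p)])
    return rows


cov = block_diag_cov(sizes=[2, 3], rhos=[0.5, 0.8])   # hypothetical groups
exposures = sample_exposures(cov, n=5, rng=random.Random(0))
```

Exposures in different groups are independent by construction, while exposures within a group share the specified correlation.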

We investigated three distinct functions to generate a continuous outcome ,

where is a modified sigmoid function. This set of exposure-response functions captures non-linear main effects only (), linear main effects with interactions (), and non-linear main effects with interactions (). The modified sigmoid function was chosen due to its smoothness and wide use to model dose-response relationships in toxicology [38]. Out of the 15 exposures, only 5 exposures ( in group 1 and in group 2) were assumed to be relevant to the outcome.

We fitted modified BART and BKMR on the training sets and obtained the predicted values and on both training and testing sets. The modified BART model was trained with 20 or 50 trees; we also tried 100 and 200 trees, which showed negligible performance gains. To evaluate model predictive performance, we regressed on the true , separately for the training and testing sets, and reported the average intercept, slope, standard error (SE) of the regression model, and R² across replications. We also reported MSE for in both training and testing sets. Variable selection accuracy was assessed through PIPs for each exposure under component-wise variable selection, and through group and conditional PIPs under hierarchical variable selection. To visualize marginal effects, we applied the GAM approximation to the fitted exposure-response surfaces from both models. As a reference, we provided true individual dose-response curves from by fixing other exposures at their quartiles.
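The calibration-style evaluation described above, regressing predictions on the truth, can be sketched as a simple least-squares fit. The function name is hypothetical, and the "SE" computed here is the residual standard error of the regression, which is an assumption about what the reported SE refers to.

```python
import math


def calibration_summary(predicted, truth):
    """Regress predicted on true values; return intercept, slope, residual SE, R^2 and MSE."""
    n = len(truth)
    mean_x = sum(truth) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in truth)
    syy = sum((y - mean_y) ** 2 for y in predicted)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(truth, predicted))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(truth, predicted))
    return {
        "intercept": intercept,
        "slope": slope,
        "se": math.sqrt(rss / (n - 2)),          # residual standard error (assumed meaning)
        "r2": sxy ** 2 / (sxx * syy),
        "mse": sum((y - x) ** 2 for x, y in zip(truth, predicted)) / n,
    }


truth = [0.0, 1.0, 2.0, 3.0]
predicted = [2.0 * x + 1.0 for x in truth]       # toy predictions, biased on purpose
summary = calibration_summary(predicted, truth)
print(summary)
```

In this toy case R² equals one even though the predictions are biased, which illustrates why the intercept, slope, and MSE are reported alongside R².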

We separately implemented a recent fast BKMR algorithm which uses random Fourier features to accelerate the Gaussian process in the original BKMR algorithm [39]. Fast BKMR with 20 and 200 basis functions were compared with modified BART and BKMR under the same simulation setting with . We also conducted a simulation with 50 exposures, with four groups of 15, 20, 10 and 5 exposures, to examine performance of modified BART in a higher dimensional setting with sparse signals. We kept the first two exposures in the first group and the first three within the second group to have effects on the outcome, while all the other simulations were kept the same. This introduced more irrelevant chemicals within each group. We reported the same metrics as previously discussed for these additional simulations.

For binary outcomes, we generated latent continuous variable from , and set , as discussed before. All other settings and fitting procedures matched the continuous case. To avoid the indefinite kernel issues in the probit BKMR model, we set the argument to TRUE in the kmbayes function, as suggested by the BKMR authors. In a separate simulation, we used the default setting to explore the failure rate of kmbayes for binary outcomes. To assess model predictive performance, we regressed on and reported the average intercept, slope, standard error and R² across simulation replications, and additionally reported AUC for this binary outcome setting. For variable selection, component-wise and hierarchical variable selection PIPs were reported. We also compared the GAM approximations for both models, with the true individual response curve from as references.
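The AUC reported for binary outcomes can be computed directly from its rank interpretation: the probability that a randomly chosen case receives a higher predicted score than a randomly chosen non-case, with ties counted as one half. A minimal sketch with toy data follows; the function name `auc` is hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of cases and non-cases."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, scores))
```

The pairwise form is quadratic in sample size, which is acceptable for illustration; sorting-based implementations are preferable for large test sets.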

All experiments for BKMR and modified BART were conducted in the same computational environment on a high-performance computing cluster with Intel Xeon Gold 6252 and Platinum 8276 CPUs, using up to 80 nodes on 224 CPUs. We reported overall computational time, including model fitting and prediction sampling, averaged across replications.

NHANES 2001–2002 data on the relationship between POPs and LTL

Gibson et al. [40] used data from the 2001–2002 cycle of the National Health and Nutrition Examination Survey (NHANES) to compare several mixtures analysis methods, including BKMR, for the relationship between leukocyte telomere length (LTL) and persistent organic pollutants (POPs). We used the same data and preprocessing procedure and restricted our analysis to individuals over the age of 20. We included 18 POPs with at least 60% of samples above the limit of detection. The 18 POP exposures were categorized into three groups: non-dioxin-like polychlorinated biphenyls (non-dioxin-like PCBs), non-ortho PCBs, and toxic equivalent POPs (mPFD) [40,41]. The non-dioxin-like PCBs group included PCBs 74, 99, 138, 153, 170, 180, 187, and 194; the non-ortho PCBs group consisted of PCBs 126 and 169; and the mPFD group comprised PCB 118, four dibenzo-furans (2,3,4,7,8-pncdf, 1,2,3,4,7,8-hxcdf, 1,2,3,6,7,8-hxcdf, 1,2,3,4,6,7,8-hxcdf), and three chlorinated dibenzo-p-dioxins (1,2,3,6,7,8-hxcdd, 1,2,3,4,6,7,8-hpcdd, 1,2,3,4,6,7,8,9-ocdd) [40]. Exposure values below the limits of detection (LOD) were imputed as . We adjusted for 13 covariates, including age, age², sex (male, female), race/ethnicity (non-Hispanic Black, non-Hispanic White, Mexican American, other), educational attainment (college or more, some college, high school graduate, less than high school), BMI (≥30, 25–29.9, < 25), serum cotinine, white blood cell count, percent lymphocytes, percent monocytes, percent neutrophils, percent eosinophils, and percent basophils [40]. In the final dataset, all 18 exposures, LTL, and serum cotinine were log-transformed and scaled. Covariates related to blood cell counts and distributions were also scaled. After excluding observations with incomplete exposure and covariate information, the final dataset comprised 1003 participants. The study was approved by the Institutional Review Board of the National Center for Health Statistics.

We fitted both modified BART and BKMR, with and without grouping information, to compare findings. For the modified BART, we trained with 20 and 50 trees and performed both component-wise and hierarchical variable selection. For BKMR, we used a 100-point knot matrix constructed from the chemical data to ensure adequate coverage of the input space as recommended by Gibson et al. [40]. For both models, we summarized the variable selection results with component-wise, group-level and within-group PIPs. Marginal chemical effects from both models were approximated and visualized using GAM plots. We also provided partial marginal effect estimates from BKMR by fixing other exposures at their first, second and third quartiles for comparison.

Results

Simulation Results: Continuous Outcome

To assess prediction accuracy, we summarized the average intercept, slope, SE, and R² from regressing on true and MSE for , from modified BART and BKMR across 500 replications. Results for component-wise variable selection are summarized in S1 Table, and results for hierarchical variable selection are in Table 1. Optimal performance is indicated by an intercept close to zero, a slope close to one, high R², low SE, and low MSE.


Table 1. Simulation results for 15 exposures and a continuous outcome, with hierarchical variable selection for modified BART and BKMR.

https://doi.org/10.1371/journal.pone.0348002.t001

Under component-wise variable selection (S1 Table), BKMR performed slightly better than modified BART in nonlinear scenarios () with a small sample size (); however, the differences were minimal and became negligible as sample size increased. In linear settings (), both methods showed comparable performance, with results close to optimal.

Under hierarchical variable selection (Table 1), modified BART performed consistently better than BKMR in both training and testing datasets, with intercepts closer to zero, slopes closer to one, higher R² values, lower SEs and lower MSE. Notably, modified BART approached near-optimal performance as sample size increased, whereas BKMR's performance improved little. For instance, when , in both linear and nonlinear scenarios, the estimated slope and R² values for BKMR on the test sets remained around , whereas modified BART achieved for nonlinear settings and for linear settings. In addition, MSE was substantially lower with modified BART (approximately 0.50 and 0.59 for and , 0.48 and 0.52 for , in training and testing sets, respectively) than with BKMR (approximately 1.0 and 1.1 for and , 0.79 and 0.81 for , respectively).

Modified BART also demonstrated better scalability in all simulations, especially under hierarchical variable selection, in which it ran 15–80 times faster than BKMR for and . For example, with a sample size of 1000, modified BART took about 4 minutes with 20 trees and under 10 minutes with 50 trees averaged across 500 replicates, while BKMR took 276–352 minutes across the three exposure-response functions. Moreover, the number of trees (modBART-20 vs. modBART-50) in modified BART had minimal impact on results.

We also reported average PIPs to assess variable selection accuracy, for both the component-wise and hierarchical selections, with and across 500 replications in Table 2. Under component-wise variable selection, both methods accurately identified important exposures, with similar PIPs (above 0.99 for all relevant exposures, and below 0.1 for all irrelevant exposures). Under hierarchical variable selection, both methods correctly assigned group PIPs near 1 to groups 1 and 2, since from group 1 and from group 2 affected the outcome. Larger differences were observed in conditional PIPs within the relevant groups. Modified BART assigned PIPs close to 1 for important exposures and below 0.28 for unimportant ones, while BKMR tended to distribute conditional probabilities more uniformly across important exposures (around 0.5 for group 1 and around 0.33 for group 2) and assigned 0 to unimportant exposures. PIPs for and are provided in S4 and S5 Tables, respectively. Overall patterns were similar to : both models successfully identified important exposures under component-wise variable selection and important groups under hierarchical variable selection. However, with component-wise variable selection, modified BART tended to assign slightly higher PIPs to some unimportant exposures. Under hierarchical variable selection, BKMR yielded conditional PIPs near zero or exactly zero for important exposures and , selecting only as important in group 2. This likely reflects differences in priors: modified BART employs a prior that allows an equal chance of selection, whereas BKMR effectively favors only a single component from each selected group per iteration.

Table 2. Average PIPs for 15 exposures and a continuous outcome, with both component-wise and hierarchical variable selection for modified BART and BKMR. The true relationship is a non-linear main effects only model.

https://doi.org/10.1371/journal.pone.0348002.t002

Average marginal effects of individual exposures through GAM approximation for and with are displayed in S1-1 (), S1-2 () and S1-3 () Figures for component-wise variable selection, and in Fig 2 (), S2-1 () and S2-2 () Figures for hierarchical variable selection. With component-wise variable selection, all marginal effects were approximated well with both methods, recovering the corresponding linear or non-linear relationships from the true model. For reference, we also overlaid the true marginal effects for these exposures by fixing the other exposures at their quartiles. The approximated marginal effects from both models generally followed the overall trends of the reference curves, which is consistent with their interpretation as averages over these curves. In Fig 2, with the true relationship as a non-linear main effects only model and hierarchical variable selection, the marginal effects from modified BART and BKMR differed slightly, particularly in the tails, consistent with the differences in Table 1. Nevertheless, both methods correctly captured the non-linear marginal effects for these exposures under . In contrast, consistent with the and hierarchical variable selection PIPs in S4 and S5 Tables, GAM plots showed that BKMR estimated near-zero marginal effects for across the entire range under and , while modified BART indicated clear effects (S2-1 and S2-2 Figs).

Fig 2. Average marginal effects for three exposures, in simulations with 15 exposures and a continuous outcome, using hierarchical variable selection for modified BART with 20 trees and BKMR. The true relationship is a non-linear main effects only model. Note: All simulations were replicated 500 times. The reference lines are true effects of each exposure by fixing all other exposures at their quartiles.

https://doi.org/10.1371/journal.pone.0348002.g002

Comparison results for modified BART, BKMR and fast BKMR with under component-wise variable selection are shown in S8 Table. With 20 basis functions, computation time was reduced to a level comparable to modified BART, but prediction accuracy was substantially worse, with estimated slope < 0.5 and R² < 0.6. Increasing the number of basis functions to 50, 100 and 200 increased computation time without improving prediction accuracy. Therefore, we did not pursue this approach further.

Hierarchical variable selection results with 50 exposures and are reported in S9 Table. Modified BART maintained superior prediction and variable selection accuracy, and computational scalability relative to BKMR.

Simulation Results: Binary Outcome

Results for binary outcomes with both component-wise variable selection (S2 Table) and hierarchical variable selection (Table 3) showed patterns similar to those for continuous outcomes. Under component-wise variable selection, BKMR yielded slopes slightly closer to 1 and larger for nonlinear and , while the results were similar with . However, the computational advantages of modified BART were even more pronounced. For example, when , run time was only about 2 minutes for modified BART with 20 trees, while BKMR required 106–136 minutes. When sample size increased to , BKMR became impractical, with run times between 750 and 1020 minutes, while modified BART remained under 4 minutes. We also evaluated the default in the kmbayes function. BKMR failed in more than 50% of replications at and in all replications at and (S3 Table).

Table 3. Simulation results for 15 exposures and a binary outcome, with hierarchical variable selection for modified probit BART and probit BKMR.

https://doi.org/10.1371/journal.pone.0348002.t003

With hierarchical variable selection, modified BART consistently outperformed BKMR across all settings, particularly at larger sample sizes. Slopes for modified BART were mostly around 0.9 in both training and testing datasets, while they were around 0.5 for BKMR. AUC values were also consistently higher for modified BART (0.82–0.93) than for BKMR (0.77–0.87). Computation time was again substantially shorter for modified BART and increased slowly with sample size. For example, when , run time was under 3 minutes for modified BART with 20 trees, while BKMR required more than 150 minutes. With , BKMR required over 1300 minutes, whereas modified BART remained under 5 minutes.
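The two accuracy summaries reported here can be illustrated with a short Python sketch. It assumes that the slope is the least-squares slope of the estimated response surface regressed on the true surface, and that AUC is computed from the pairwise ranking of cases and non-cases; the exact definitions used in the simulations may differ, and the toy numbers below are invented for illustration.

```python
def slope_and_r2(x, y):
    """Least-squares slope of y on x and the squared Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, sxy ** 2 / (sxx * syy)

def auc(labels, scores):
    """Probability that a random case outscores a random non-case (ties count 1/2)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

truth = [-1.0, -0.5, 0.0, 0.5, 1.0]   # hypothetical true surface values
shrunk = [0.5 * t for t in truth]      # an estimate shrunk halfway toward zero
slope, r2 = slope_and_r2(truth, shrunk)
labels = [0, 0, 1, 0, 1]               # hypothetical binary outcomes
area = auc(labels, shrunk)
```

In this toy example the shrunken estimate has a slope of 0.5 even though it is perfectly correlated with the truth, which shows why the slope and a correlation-based measure carry different information: a slope well below 1 indicates that the magnitude of the exposure-response surface is underestimated.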

PIPs for with (Table 4) further highlight the similarity of the two methods in identifying key exposures under component-wise variable selection and important chemical groups under hierarchical variable selection. However, modified BART showed better discrimination within groups, assigning conditional PIPs above 0.97 for all relevant exposures. Although it also assigned conditional PIPs around 0.55 to two irrelevant exposures within groups 1 and 2, likely due to the randomness induced by the prior, these were readily distinguishable from the truly important exposures with PIPs near 1. In contrast, BKMR tended to distribute importance across all relevant exposures, making it less straightforward to distinguish relevant from irrelevant exposures. PIPs for other settings are provided in the S6 and S7 Tables with similar patterns as the continuous settings.
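To make the distinction between group-level and conditional PIPs concrete, the sketch below computes both from a set of posterior inclusion indicators. It assumes one common convention: the group PIP is the share of posterior draws in which at least one group member is included, and the conditional PIP of an exposure is its inclusion frequency among those draws. The definitions used by modified BART are given in S1 File and may differ; the function `pips` and the toy draws are hypothetical.

```python
def pips(draws, groups):
    """Group and conditional PIPs from 0/1 inclusion vectors.

    draws:  list of posterior draws, each a list of 0/1 inclusion indicators
    groups: dict mapping a group name to the exposure indices it contains
    """
    n = len(draws)
    out = {}
    for name, members in groups.items():
        # draws in which the group is "active" (any member included)
        in_group = [d for d in draws if any(d[j] for j in members)]
        group_pip = len(in_group) / n
        cond = {
            j: (sum(d[j] for d in in_group) / len(in_group) if in_group else 0.0)
            for j in members
        }
        out[name] = (group_pip, cond)
    return out

# Four toy posterior draws over four exposures (invented for illustration)
draws = [
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
groups = {"g1": [0, 1], "g2": [2, 3]}
result = pips(draws, groups)
```

Here group `g1` has a group PIP of 0.75, and within it exposure 0 has a conditional PIP of 1 while exposure 1 has 1/3. A clear gap of this kind between members of the same group is what allows relevant and irrelevant components to be separated.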

Table 4. Average PIPs for 15 exposures and a binary outcome, with both component-wise and hierarchical variable selection for modified probit BART and probit BKMR. The true relationship is a non-linear main effects only model , with .

https://doi.org/10.1371/journal.pone.0348002.t004

Fig 3 shows GAM approximations of marginal effects for selected exposures and under with hierarchical variable selection. Both methods recovered the general shapes of the true partial effects similarly for and , but modified BART more closely followed the nonlinear trends for across quartiles, whereas BKMR smoothed over variation in these regions and produced flatter estimates. Results for other settings are provided in S3-1 to S3-3 Figures for component-wise variable selection ( to ) and S4-1 () and S4-2 () Figures for hierarchical variable selection. Consistent with the findings above, marginal effects from both models were largely comparable under component-wise variable selection, aside from some divergence in the tails. Under hierarchical variable selection, however, more substantial differences emerged in both the shape and magnitude of the marginal effects, with modified BART more closely matching the reference curves than BKMR. This is most apparent in S4-1 Fig under the linear function , where the marginal effect of from modified BART was nearly parallel to the three reference lines as expected, while BKMR showed a smaller slope.

Fig 3. Average marginal effects for exposures , and , in simulations with 15 exposures and a binary outcome, using hierarchical variable selection for modified probit BART with 20 trees and probit BKMR. The true relationship is a non-linear main effects only model , with . Note: All simulations were replicated 500 times. The reference lines are the true effects of each exposure, obtained by fixing all other exposures at their quartiles.

https://doi.org/10.1371/journal.pone.0348002.g003

NHANES data analysis results

We applied both modified BART and BKMR to the NHANES 2001–2002 data, to assess associations between 18 POPs and the outcome of interest, log-LTL. Results are summarized in Table 5.

Table 5. PIPs for 18 POPs and log-LTL from the NHANES 2001-2002 data, with both component-wise and hierarchical variable selection for modified BART and BKMR.

https://doi.org/10.1371/journal.pone.0348002.t005

When grouping was ignored, both models generally agreed across chemicals, assigning similar component PIPs. The most important chemical was 2,3,4,7,8-pncdf (PIPs = 0.708 for modified BART and 0.830 for BKMR), with PCB126 as the next most important (PIPs = 0.398 for modified BART and 0.463 for BKMR).

When grouping was incorporated, findings were consistent with Gibson et al. [40]. Group-level PIPs were broadly consistent across methods, with both identifying the mPFD group as the most important (group PIPs = 0.804 for modified BART and 0.863 for BKMR), followed by the non-ortho PCBs (group PIPs = 0.774 for modified BART and 0.677 for BKMR). Within the mPFD group, 2,3,4,7,8-pncdf was identified as the most important chemical (conditional PIPs = 0.715 for modified BART and 0.876 for BKMR). Both methods also identified PCB126 within the non-ortho PCBs group as an important exposure (conditional PIPs = 0.837 for modified BART and 0.650 for BKMR). Modified BART additionally assigned a higher conditional PIP (0.671) to PCB169. Overall, the conditional PIP patterns were consistent with the simulation studies, where BKMR tended to favor the most important exposures within significant groups and modified BART produced more discriminative conditional PIPs for each exposure. Notably, conditional PIPs for PCB118 differed substantially (0.489 for modified BART vs. 0.052 for BKMR). As noted in Gibson et al. [40], PCB118 is a key mono-ortho dioxin-like PCB whose toxicity differs from the other furans and dioxins in the mPFD group. In this analysis, we decided to retain the same grouping definition to align with Gibson et al. [40]. Despite this imperfect grouping, modified BART assigned PCB118 a much higher conditional PIP than BKMR, which may be desirable in practice.

Marginal effect estimates through GAM approximation with hierarchical variable selection are shown in Fig 4. The reference lines represent partial dependence curves derived from BKMR’s fitted results, without centering of the sampled posterior estimates. Most exposures showed approximately linear relationships, consistent with Gibson et al. [40]. For the most important exposure, 2,3,4,7,8-pncdf, both modified BART and BKMR showed near-linear trajectories, although modified BART suggested a smaller effect size. For PCB126 and PCB169, modified BART and BKMR produced similar marginal effect curves. Results from component-wise variable selection are provided in Supplementary S5 Fig. Most exposures showed similar curves, with minor tail divergences in some cases. For 2,3,4,7,8-pncdf, the overall patterns were consistent with the hierarchical variable selection results, but differences in marginal effect magnitudes were less pronounced.

Fig 4. Marginal effects of 18 POPs on log-LTL from the NHANES 2001-2002 data, using hierarchical variable selection for modified BART with 20 trees and BKMR.

Note: All chemicals were log-transformed and scaled. The reference lines are partial dependence curves from BKMR, obtained by fixing all other exposures at their quartiles.

https://doi.org/10.1371/journal.pone.0348002.g004

Conclusion and discussion

In this paper, we introduce and extend BART, a Bayesian tree ensemble method, to enhance the analytical toolkit for environmental chemical mixtures analysis. While the technical framework underlying our approach has been discussed in the statistical literature, our modified BART model allows for smooth exposure-response surfaces, adjustment for covariates, and hierarchical variable selection for grouped exposures. This extension enhances the capacity of BART to handle the complex structure of chemical mixtures, which is especially important in environmental epidemiology, where exposures often co-occur and interact in ways that challenge conventional models.

A major benefit of our approach is its ability to model group-level and component-level sparsity simultaneously. Unlike BKMR, which assumes equal probabilities for exposure inclusion within a group, our use of Dirichlet priors allows flexible group structures and improved differentiation of important mixture components. This is crucial for mixture studies, where identifying the “bad actor” within a mixture is essential for informing potential public health interventions. Our model structure avoids dilution of important signals and leads to clearer identification of key contributors to health outcomes. Across both continuous and binary outcomes, performance evaluations consistently demonstrated that modified BART outperformed BKMR in the context of hierarchical variable selection, and that BKMR tended to underestimate conditional PIPs for exposures, potentially leading to unreliable conclusions. Moreover, the modified BART model offers notable advantages over BKMR in computational efficiency. By leveraging an efficient C++ implementation, our model substantially reduces computation time, making it more practical for analyzing large-scale exposure data.

A further advantage of our approach is its robustness for binary outcomes, which are frequently encountered in environmental and public health research (e.g., presence or absence of disease). The default probit BKMR algorithm often fails with binary outcomes, especially with moderate and large sample sizes. Alternative strategies to avoid such failure can lead to degraded performance and substantially increased computation time. In comparison, our modified probit BART model maintains stable performance and reasonable computation time, highlighting its reliability in real-world mixtures applications.
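For readers less familiar with probit models, the sketch below shows the link that both probit approaches rely on: a latent response surface value is mapped to an outcome probability through the standard normal distribution function, which is equivalent to thresholding the surface plus standard normal noise at zero. This is a generic illustration of the probit link only, not the sampling algorithm of either method; the function names and the value 0.5 are chosen for this example.

```python
import math
import random

def probit_probability(f):
    """P(Y = 1) under a probit link, i.e. the standard normal CDF at f."""
    return 0.5 * (1.0 + math.erf(f / math.sqrt(2.0)))

def simulate_binary(f_values, rng):
    """Latent-variable form: y = 1 when f plus standard normal noise exceeds 0."""
    return [1 if f + rng.gauss(0.0, 1.0) > 0 else 0 for f in f_values]

rng = random.Random(42)
f_values = [0.5] * 20000          # a constant latent surface value, for illustration
y = simulate_binary(f_values, rng)
empirical = sum(y) / len(y)       # observed proportion of events
expected = probit_probability(0.5)
```

The simulated event proportion should be close to the probability given by the link function, which is the correspondence that latent-variable samplers for probit models exploit.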

Another contribution of our proposed framework is the integration of a low-dimensional approximation via GAM fitting. This technique improves the interpretability of accurate but complicated nonparametric response-surface models, including modified BART and BKMR, and makes it easier to understand marginal effects of individual exposures. This is particularly valuable in communicating results to policymakers and stakeholders in public health.
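The idea of a low-dimensional additive summary can be sketched as follows. Fitted values from any flexible model are projected onto a sum of one-dimensional functions, one per exposure. To keep the example self-contained, the sketch uses backfitting with simple bin-mean smoothers in place of the penalized spline smoothers a GAM would normally use, and the "fitted values" are generated from a known function rather than from a fitted BART or BKMR model. All names and settings are assumptions made for illustration.

```python
import math
import random

def bin_index(x, lo, hi, n_bins):
    """Map x in [lo, hi] to one of n_bins equal-width bins (assumes hi > lo)."""
    k = int((x - lo) / (hi - lo) * n_bins)
    return min(max(k, 0), n_bins - 1)

def additive_summary(X, fitted, n_bins=10, n_iter=20):
    """Backfit fitted ~ intercept + sum_j g_j(x_j) with bin-mean smoothers.

    Returns the intercept and, for each exposure, the centred step function
    g_j as a list of bin heights (a crude marginal-effect summary).
    """
    n, p = len(X), len(X[0])
    lo = [min(r[j] for r in X) for j in range(p)]
    hi = [max(r[j] for r in X) for j in range(p)]
    bins = [[bin_index(r[j], lo[j], hi[j], n_bins) for r in X] for j in range(p)]
    intercept = sum(fitted) / n
    g = [[0.0] * n_bins for _ in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            sums, counts = [0.0] * n_bins, [0] * n_bins
            for i in range(n):
                # partial residual: remove the intercept and all other components
                others = sum(g[k][bins[k][i]] for k in range(p) if k != j)
                b = bins[j][i]
                sums[b] += fitted[i] - intercept - others
                counts[b] += 1
            means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
            # centre g_j (weighted by bin counts) so the intercept stays identifiable
            centre = sum(m * c for m, c in zip(means, counts)) / n
            g[j] = [m - centre for m in means]
    return intercept, g

random.seed(0)
X = [[random.uniform(-2, 2) for _ in range(3)] for _ in range(2000)]
# Stand-in for posterior-mean predictions from a fitted black-box model
fitted = [math.sin(r[0]) + 0.5 * r[1] for r in X]
intercept, g = additive_summary(X, fitted)
```

In this toy setting the third exposure does not enter the fitted surface, so its estimated component stays close to zero, while the second component rises roughly linearly. Plotting each `g[j]` against the bin midpoints gives the kind of per-exposure curve that the GAM approximation is meant to provide.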

Although there are concerns regarding overfitting with machine learning methods, BART-based methods are specifically designed to mitigate overfitting and limit model complexity through strong Bayesian regularization. In particular, BART imposes priors that heavily penalize deep trees, ensuring that individual trees are shallow and contribute weakly to the overall fit. It also applies shrinkage to terminal node parameters to reduce overfitting within each tree. In addition, BART averages over a large number of trees, which reduces variance and stabilizes estimates. Together, these built-in mechanisms substantially reduce the risk of overfitting. We assessed predictive performance of modified BART with independent testing datasets in the simulations, and the results showed no evidence of overfitting.
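The strength of this regularization can be seen from the priors of the original BART formulation of Chipman et al. [29], in which a node at depth d splits with prior probability α(1 + d)^(−β) and terminal node values receive a normal prior with standard deviation 0.5 / (k√m) after the outcome is rescaled to [−0.5, 0.5]. The sketch below evaluates both using the defaults proposed there (α = 0.95, β = 2, k = 2). These are the original BART defaults and are shown for illustration only; the hyperparameters used by modified BART may differ.

```python
import math

def split_probability(depth, alpha=0.95, beta=2.0):
    """Prior probability that a node at the given depth is split further."""
    return alpha * (1 + depth) ** (-beta)

def leaf_prior_sd(n_trees, k=2.0):
    """Prior SD of each terminal node value for an outcome rescaled to [-0.5, 0.5]."""
    return 0.5 / (k * math.sqrt(n_trees))

probs = [split_probability(d) for d in range(4)]   # depths 0, 1, 2, 3
sd_20 = leaf_prior_sd(20)                           # ensemble of 20 trees
```

The split probability falls from 0.95 at the root to about 0.24 at depth 1 and below 0.06 at depth 3, so deep trees are rare a priori, and with 20 trees each terminal node value has a prior standard deviation of about 0.056 on the rescaled outcome, so each tree contributes only a small part of the overall fit.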

Several algorithms have been proposed to mitigate the computational burden of the original BKMR, including fast BKMR [39], which uses a Fourier approximation to the kernels and Hamiltonian Monte Carlo for posterior sampling, and a variational inference algorithm [42]. We applied fast BKMR to our simulation settings with (S8 Table). With 20 basis functions, computation time was reduced to a level comparable to modified BART, but prediction accuracy was substantially worse, with estimated slope < 0.5 and R² < 0.6. Increasing the number of basis functions to 50, 100 and 200 increased computation time without improving prediction accuracy. Although fast BKMR is a promising approach for reducing computational burden, the accuracy of the Fourier approximation depends on several factors, including correlations among exposures, number of exposures, sample size and sparsity level. In addition, variable selection is not available in the current fast BKMR implementation. For the variational inference algorithm of BKMR, we were unable to obtain code and therefore did not include it in comparisons. We consider both approaches to have the potential to improve BKMR scalability, and would like to revisit comparisons when robust implementations become available with desirable features, such as variable selection and multiple outcome types.

For many environmental studies, it is important to adjust for confounders. However, we did not include confounder adjustment in the current manuscript, because BART focuses on variable selection and prediction and is not inherently a causal model. Although BART can be used to establish causal relationships, this requires working explicitly within a causal inference framework, which can be particularly complicated in the mixtures setting and is beyond the scope of this manuscript. It is not easy to establish a Directed Acyclic Graph (DAG) to provide a basis for confounder identification. Multiple exposures may influence the outcome through different biological pathways, and each pathway may be confounded by a distinct set of confounders. These confounder sets may overlap, but their effects can differ substantially across pathways. Whether confounders have overall effects on the outcome depends on which exposures and pathways are important, which cannot be decided prior to fitting BART. In addition, including all potential confounders in BART could dramatically increase the number of input variables, resulting in unstable performance and heavy computational burden. Moreover, because exposures and potential confounders would be modeled simultaneously, exposure effects could be incorrectly attributed to confounders, which is undesirable given that a primary goal of mixtures analysis is to identify toxic agents. Hahn et al. (2020) [43] also noted that regularized prediction models generally do not perform well when confounding is present, and instead proposed Bayesian Causal Forest (BCF), a BART-based causal inference method that allows clear separation of exposure effects and confounding effects. We are interested in extending such methods to mixtures analysis with explicit confounder adjustment.

In summary, we developed a modified BART model for estimating complex exposure-response relationships in environmental mixtures with covariate adjustment, improved handling of grouped exposures, greater computational efficiency, and robust performance across outcome types. Modified BART is a practical alternative to the BKMR, particularly with binary outcomes, hierarchical variable selection and larger sample sizes. To aid interpretation, we introduce a complementary post hoc approach using low-dimensional approximation via GAM fitting, which clarifies individual exposure effects without altering the core model. Future work will explore extending the model to survival outcomes and confounder adjustment to broaden its applicability in public health research.

Supporting information

S1 File. Details of modified BART.

https://doi.org/10.1371/journal.pone.0348002.s001

(DOCX)

S1 Table. Simulation results for 15 exposures and a continuous outcome, with component-wise variable selection for modified BART and BKMR.

https://doi.org/10.1371/journal.pone.0348002.s002

(DOCX)

S2 Table. Simulation results for 15 exposures and a binary outcome, with component-wise variable selection for modified probit BART and probit BKMR.

https://doi.org/10.1371/journal.pone.0348002.s003

(DOCX)

S3 Table. Simulation results for 15 exposures and a binary outcome with component-wise variable selection for modified probit BART and default probit BKMR with .

https://doi.org/10.1371/journal.pone.0348002.s004

(DOCX)

S4 Table. Average PIPs for 15 exposures and a continuous outcome, with both component-wise and hierarchical variable selection for modified BART and BKMR.

The true relationship is a linear main effects with interactions model , with .

https://doi.org/10.1371/journal.pone.0348002.s005

(DOCX)

S5 Table. Average PIPs for 15 exposures and a continuous outcome, with both component-wise and hierarchical variable selection for modified BART and BKMR.

The true relationship is a non-linear main effects and interactions model , with .

https://doi.org/10.1371/journal.pone.0348002.s006

(DOCX)

S6 Table. Average PIPs for 15 exposures and a binary outcome, with both component-wise and hierarchical variable selection for modified probit BART and probit BKMR.

The true relationship is a linear main effects and interactions model , with .

https://doi.org/10.1371/journal.pone.0348002.s007

(DOCX)

S7 Table. Average PIPs for 15 exposures and a binary outcome, with both component-wise and hierarchical variable selection for modified probit BART and probit BKMR.

The true relationship is a non-linear main effects and interactions model , with .

https://doi.org/10.1371/journal.pone.0348002.s008

(DOCX)

S8 Table. Simulation results for 15 exposures and a continuous outcome, with component-wise variable selection for modified BART, BKMR and fast BKMR.

https://doi.org/10.1371/journal.pone.0348002.s009

(DOCX)

S9 Table. Simulation results for 50 exposures and a continuous outcome, with hierarchical variable selection for modified BART and BKMR.

https://doi.org/10.1371/journal.pone.0348002.s010

(DOCX)

S1-1 Fig. Average marginal effects for exposures , and , in simulations with 15 exposures and a continuous outcome, using component-wise variable selection for modified BART with 20 trees and BKMR.

The true relationship is a non-linear main effects only model , with .

https://doi.org/10.1371/journal.pone.0348002.s011

(DOCX)

S1-2 Fig. Average marginal effects for exposures , and , in simulations with 15 exposures and a continuous outcome, using component-wise variable selection for modified BART with 20 trees and BKMR.

The true relationship is a linear main effects and interactions model , with .

https://doi.org/10.1371/journal.pone.0348002.s012

(DOCX)

S1-3 Fig. Average marginal effects for exposures , and , in simulations with 15 exposures and a continuous outcome, using component-wise variable selection for modified BART with 20 trees and BKMR.

The true relationship is a non-linear main effects and interactions model , with.

https://doi.org/10.1371/journal.pone.0348002.s013

(DOCX)

S2-1 Fig. Average marginal effects for exposures , and , in simulations with 15 exposures and a continuous outcome, using hierarchical variable selection for modified BART with 20 trees and BKMR.

The true relationship is a linear main effects and interactions model , with .

https://doi.org/10.1371/journal.pone.0348002.s014

(DOCX)

S2-2 Fig. Average marginal effects for exposures , and , in simulations with 15 exposures and a continuous outcome, using hierarchical variable selection for modified BART with 20 trees and BKMR.

The true relationship is a linear main effects and interactions model , with .

https://doi.org/10.1371/journal.pone.0348002.s015

(DOCX)

S3-1 Fig. Average marginal effects for exposures , and , in simulations with 15 exposures and a binary outcome, using component-wise variable selection for modified probit BART with 20 trees and probit BKMR.

The true relationship is a non-linear main effects only model , with .

https://doi.org/10.1371/journal.pone.0348002.s016

(DOCX)

S3-2 Fig. Average marginal effects for exposures , and , in simulations with 15 exposures and a binary outcome, using component-wise variable selection for modified probit BART with 20 trees and probit BKMR.

The true relationship is a linear main effects and interactions model , with .

https://doi.org/10.1371/journal.pone.0348002.s017

(DOCX)

S3-3 Fig. Average marginal effects for exposures , and , in simulations with 15 exposures and a binary outcome, using component-wise variable selection for modified probit BART with 20 trees and probit BKMR.

The true relationship is a non-linear main effects and interactions model , with .

https://doi.org/10.1371/journal.pone.0348002.s018

(DOCX)

S4-1 Fig. Average marginal effects for exposures , and , in simulations with 15 exposures and a binary outcome, using hierarchical variable selection for modified probit BART with 20 trees and probit BKMR.

The true relationship is a linear main effects and interactions only model , with .

https://doi.org/10.1371/journal.pone.0348002.s019

(DOCX)

S4-2 Fig. Average marginal effects for exposures , and , in simulations with 15 exposures and a binary outcome, using hierarchical variable selection for modified probit BART with 20 trees and probit BKMR.

The true relationship is a non-linear main effects and interactions model , with .

https://doi.org/10.1371/journal.pone.0348002.s020

(DOCX)

S5 Fig. Marginal effects of 18 POPs on log-LTL from the NHANES 2001–2002 data, using component-wise variable selection for modified BART with 20 trees and BKMR.

https://doi.org/10.1371/journal.pone.0348002.s021

(DOCX)

Acknowledgments

We thank Dr. Daniel Zilber for his valuable assistance with revising the manuscript.

References

1. Hamra GB, Buckley JP. Environmental exposure mixtures: questions and methods to address them. Curr Epidemiol Rep. 2018;5(2):160–5. pmid:30643709
2. Joubert BR, Kioumourtzoglou M-A, Chamberlain T, Chen HY, Gennings C, Turyk ME, et al. Powering Research through Innovative Methods for Mixtures in Epidemiology (PRIME) Program: Novel and Expanded Statistical Methods. Int J Environ Res Public Health. 2022;19(3):1378. pmid:35162394
3. Braun JM, Gennings C, Hauser R, Webster TF. What can epidemiological studies tell us about the impact of chemical mixtures on human health? Environmental Health Perspectives. 2016;124(1):A6–9.
4. Zou H, Hastie T. Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society Series B: Statistical Methodology. 2005;67(2):301–20.
5. Tibshirani R. Regression Shrinkage and Selection Via the Lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology. 1996;58(1):267–88.
6. Carrico C, Gennings C, Wheeler DC, Factor-Litvak P. Characterization of Weighted Quantile Sum Regression for Highly Correlated Data in a Risk Analysis Setting. J Agric Biol Environ Stat. 2015;20(1):100–20. pmid:30505142
7. Keil AP, Buckley JP, O’Brien KM, Ferguson KK, Zhao S, White AJ. A Quantile-Based g-Computation Approach to Addressing the Effects of Exposure Mixtures. Environ Health Perspect. 2020;128(4):47004. pmid:32255670
8. Wang Y, Wu Y, Jacobson MH, Lee M, Jin P, Trasande L, et al. A family of partial-linear single-index models for analyzing complex environmental exposures with continuous, categorical, time-to-event, and longitudinal health outcomes. Environ Health. 2020;19(1):96. pmid:32912175
9. Abdul Basit M, Imran M, Khan SA, Alhushaybari A, Sadat R, Ali MR. Partial differential equations modeling of bio-convective sutterby nanofluid flow through paraboloid surface. Sci Rep. 2023;13(1):6152. pmid:37061555
10. Basit MA, Imran M, Akgül A, Khan Hassani M, Alhushaybari A. Mathematical analysis of heat and mass transfer efficiency of bioconvective Casson nanofluid flow through conical gap among the rotating surfaces under the influences of thermal radiation and activation energy. Results in Physics. 2024;63:107863.
11. Basit MA, Imran M, Mohammed WW, Ali MR, Hendy AS. Thermal analysis of mathematical model of heat and mass transfer through bioconvective Carreau nanofluid flow over an inclined stretchable cylinder. Case Studies in Thermal Engineering. 2024;63:105303.
12. Basit MA, Imran M, Safdar R, Tahir M, Ali MR, Hendy AS, et al. Thermally radiative bioconvective nanofluid flow on a wavy cylinder with buongiorno model: A sensitivity analysis using response surface methodology. Case Studies in Thermal Engineering. 2024;55:104178.
13. Imran M, Basit MA, Yasmin S, Khan SA, Elagan SK, Akgül A, et al. A proceeding to numerical study of mathematical model of bioconvective Maxwell nanofluid flow through a porous stretching surface with nield/convective boundary constraints. Sci Rep. 2024;14(1):1873. pmid:38253571
14. Amar E, Popov V, Sharma VM, Andreev Batat S, Halperin D, Eliaz N. Response Surface Methodology (RSM) Approach for Optimizing the Processing Parameters of 316L SS in Directed Energy Deposition. Materials (Basel). 2023;16(23):7253. pmid:38067997
15. Qu C, Yu S, Luo L, Zhao Y, Huang Y. Optimization of ultrasonic extraction of polysaccharides from Ziziphus jujuba Mill. by response surface methodology. Chem Cent J. 2013;7(1):160. pmid:24059696
16. Louhıchı G, Bousselmı L, Ghrabı A, Khounı I. Process optimization via response surface methodology in the physico-chemical treatment of vegetable oil refinery wastewater. Environ Sci Pollut Res Int. 2019;26(19):18993–9011. pmid:29987464
17. Bobb JF, Valeri L, Claus Henn B, Christiani DC, Wright RO, Mazumdar M, et al. Bayesian kernel machine regression for estimating the health effects of multi-pollutant mixtures. Biostatistics. 2015;16(3):493–508. pmid:25532525
18. Bobb JF, Claus Henn B, Valeri L, Coull BA. Statistical software for analyzing the health effects of multiple concurrent exposures via Bayesian kernel machine regression. Environ Health. 2018;17(1):67. pmid:30126431
19. Li H, Deng W, Small R, Schwartz J, Liu J, Shi L. Health effects of air pollutant mixtures on overall mortality among the elderly population using Bayesian kernel machine regression (BKMR). Chemosphere. 2022;286(Pt 1):131566. pmid:34293557
20. Zhao N, Smargiassi A, Hudson M, Fritzler MJ, Bernatsky S. Investigating associations between anti-nuclear antibody positivity and combined long-term exposures to NO2, O3, and PM2.5 using a Bayesian kernel machine regression approach. Environ Int. 2020;136:105472. pmid:31991236
21. Kupsco A, Kioumourtzoglou M-A, Just AC, Amarasiriwardena C, Estrada-Gutierrez G, Cantoral A, et al. Prenatal Metal Concentrations and Childhood Cardiometabolic Risk Using Bayesian Kernel Machine Regression to Assess Mixture and Interaction Effects. Epidemiology. 2019;30(2):263–73. pmid:30720588
22. Frenoy P, Perduca V, Cano-Sancho G, Antignac J-P, Severi G, Mancini FR. Application of two statistical approaches (Bayesian Kernel Machine Regression and Principal Component Regression) to assess breast cancer risk in association to exposure to mixtures of brominated flame retardants and per- and polyfluorinated alkylated substances in the E3N cohort. Environ Health. 2022;21(1):27. pmid:35216589
23. Banerjee M, George J, Song EY, Roy A, Hryniuk W. Tree-based model for breast cancer prognostication. J Clin Oncol. 2004;22(13):2567–75. pmid:15226324
24. Banerjee M, Noone A. Tree‐Based Methods for Survival Data. Wiley Series in Probability and Statistics. Wiley. 2007. 265–85.
25. Hu L, Li L. Using Tree-Based Machine Learning for Health Studies: Literature Review and Case Series. Int J Environ Res Public Health. 2022;19(23):16080. pmid:36498153
26. Grinsztajn L, Oyallon E, Varoquaux G. Why Do Tree-Based Models Still Outperform Deep Learning on Typical Tabular Data? In: Advances in Neural Information Processing Systems 35, 2022. 507–20.
27. Kern C, Klausch T, Kreuter F. Tree-based Machine Learning Methods for Survey Research. Surv Res Methods. 2019;13(1):73–93. pmid:32802211
28. Linero AR. A review of tree-based Bayesian methods. CSAM. 2017;24(6):543–59.
29. Chipman HA, George EI, McCulloch RE. BART: Bayesian additive regression trees. Ann Appl Stat. 2010;4(1).
30. Linero AR, Yang Y. Bayesian Regression Tree Ensembles that Adapt to Smoothness and Sparsity. Journal of the Royal Statistical Society Series B: Statistical Methodology. 2018;80(5):1087–110.
31. Linero AR. SoftBart: soft Bayesian additive regression trees. In: 2022. https://arxiv.org/abs/2210.16375
32. Malehi AS, Jahangiri M, Vizureanu P. Classic and bayesian tree-based methods. Enhanced Expert Systems. 2019:27–51.
33. Park SK, Zhao Z, Mukherjee B. Construction of environmental risk score beyond standard linear models using machine learning methods: application to metal mixtures, oxidative stress and cardiovascular disease in NHANES. Environ Health. 2017;16(1):102. pmid:28950902
34. Vuong AM, Xie C, Jandarov R, Dietrich KN, Zhang H, Sjödin A, et al. Prenatal exposure to a mixture of persistent organic pollutants (POPs) and child reading skills at school age. Int J Hyg Environ Health. 2020;228:113527. pmid:32521479
35. Zhang T, Geng G, Liu Y, Chang HH. Application of Bayesian Additive Regression Trees for Estimating Daily Concentrations of PM2.5 Components. Atmosphere (Basel). 2020;11(11):1233. pmid:34322279
36. Woody S, Carvalho CM, Murray JS. Model Interpretation Through Lower-Dimensional Posterior Summarization. Journal of Computational and Graphical Statistics. 2020;30(1):144–61.
37. Wood SN. Generalized additive models: an introduction with R. 2017.
38. Ritz C, Baty F, Streibig JC, Gerhard D. Dose-Response Analysis Using R. PLOS ONE. 2016;10(12):e0146021.
39. Zhang D, Eick SM, Chang HH. Approximate Bayesian Kernel Machine Regression via Random Fourier Features for Estimating Joint Health Effects of Multiple Exposures. In: 2025. https://arxiv.org/abs/1411.4000
40. Gibson EA, Nunez Y, Abuawad A, Zota AR, Renzetti S, Devick KL, et al. An overview of methods to address distinct research questions on environmental mixtures: an application to persistent organic pollutants and leukocyte telomere length. Environ Health. 2019;18(1):76. pmid:31462251
41. Mitro SD, Birnbaum LS, Needham BL, Zota AR. Cross-sectional associations between exposure to persistent organic pollutants and leukocyte telomere length among U.S. adults in NHANES, 2001-2002. Environ Health Perspect. 2016;124(5):651–8.
42. Small R, Coull BA. A variational inference algorithm for BKMR in the cross-sectional setting. arXiv preprint. 2018.
43. Hahn PR, Murray JS, Carvalho CM. Bayesian Regression Tree Models for Causal Inference: Regularization, Confounding, and Heterogeneous Effects (with Discussion). Bayesian Anal. 2020;15(3).
