One health at the last mile: Multi-scale predictors of Schistosoma japonicum infection in southwest China across two decades of control

  • William W. Zou,

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Environmental and Occupational Health, Colorado School of Public Health, University of Colorado Anschutz, Aurora, Colorado, United States of America

  • Elise N. Grover,

    Roles Data curation, Methodology, Software, Supervision, Writing – review & editing

    Affiliation Department of Environmental and Occupational Health, Colorado School of Public Health, University of Colorado Anschutz, Aurora, Colorado, United States of America

  • Liu Yang,

    Roles Conceptualization, Data curation, Funding acquisition, Project administration, Resources, Supervision, Writing – review & editing

    Affiliation Sichuan Center for Disease Control and Prevention, Chengdu, Sichuan, China

  • Elizabeth J. Carlton

    Roles Conceptualization, Data curation, Funding acquisition, Project administration, Resources, Supervision, Writing – original draft, Writing – review & editing

    elizabeth.carlton@cuanschutz.edu

    Affiliation Department of Environmental and Occupational Health, Colorado School of Public Health, University of Colorado Anschutz, Aurora, Colorado, United States of America

Abstract

In China, schistosomiasis is targeted for elimination. As the country approaches elimination, it is critical to evaluate how the dynamics of transmission are changing in remaining pockets of disease. We have been studying areas of schistosomiasis reemergence and persistence in Sichuan, China since 2007. This study used gradient boosting machines to identify key predictors of infection across two periods: 2007–2010, when schistosomiasis had reemerged, and 2016–2019, when schistosomiasis was approaching elimination. We also evaluated how key predictors of infection have shifted over time and whether combinations of predictors amplified risk. We considered predictors describing agriculture, domestic animals, socio-economic status, water and sanitation infrastructure, and demographics at individual, household and village scales. Our reemergence and elimination models demonstrated strong predictive performance (AUC-PR = 0.92 and 0.85, respectively). In both periods, a person’s age and village-level agricultural practices, including the average area of dry crops and rice planted and night soil use, were among the most influential factors. Village-level factors dominated in 2007–2010, while household and individual predictors increased in predictive importance in 2016–2019. Between 2007–2010 and 2016–2019, there were increases in the importance of household agricultural practices, such as the area of dry crops and rice cultivated, and of household cat and dog ownership, while the importance of factors describing water and sanitation infrastructure decreased. In the elimination period, our models found that the combination of high village dry crop cultivation and lack of improved sanitation amplified infection probability. Our findings suggest that adding precision interventions targeting high-risk households on top of existing community-wide measures may accelerate schistosomiasis elimination. Practitioners should consider adding agricultural, sanitation and animal infection data to end-game surveillance programs, while researchers evaluate the consistency of these findings in other low-endemic settings and explore causal pathways to inform adaptive, locally tailored strategies.

Author summary

Schistosomiasis is a parasitic disease that has been a target of disease control efforts globally, with China aiming to eliminate the disease as a public health problem. In China, disease control efforts have been successful in reducing the spread and prevalence of the disease, though there are remaining pockets of low levels of transmission. Our study compared the most important predictors of schistosomiasis infection risk between two periods: 2007–2010, when schistosomiasis had reemerged, and 2016–2019, when schistosomiasis was approaching elimination. We found that village-level factors were the most important predictors of infection in both the earlier and later periods, while household- and individual-level factors increased in importance in the later period. For example, the areas of rice and other crops cultivated were positively associated with infection. The importance of potential animal hosts, measured by ownership of cats and dogs, also increased over time. We also found that the age of peak disease risk shifted from 40–60 to >80 years of age over our 13-year study period. Our results indicate that the factors behind disease may be changing, potentially due to the selective pressures of decades of disease control and large-scale socioeconomic changes such as urbanization.

Introduction

As the World Health Organization (WHO) pushes for global schistosomiasis elimination by 2030, there is an urgency to understand the epidemiological dynamics that characterize the final stages of disease control [1]. Schistosomiasis is caused by multiple human-infecting Schistosoma species, with Schistosoma mansoni found in Africa and the Americas, S. haematobium in Africa, and S. japonicum in Asia [2]. Together, these parasites impose a substantial global disease burden, driven by persistent transmission in settings shaped by poverty, surface water contact, and limited sanitation infrastructure. WHO’s goal for schistosomiasis is “elimination as a public health problem,” operationalized as reducing the prevalence of heavy-intensity infections to <1% [1]. Key questions about the edge of elimination for schistosomiasis remain open and under-explored: Do the drivers of transmission shift? Do interventions need to be adapted as prevalence declines? Identifying who remains at risk and which factors sustain transmission is important for effective control efforts in the face of changing ecological, economic, behavioral, and demographic conditions.

China offers a strong case study for investigating these questions. Intestinal schistosomiasis caused by S. japonicum is a zoonotic disease system, with humans and a wide range of non-human mammals capable of serving as definitive hosts, potentially contributing to environmental contamination and human risk [3]. In the S. japonicum life cycle, eggs excreted in mammalian feces hatch in freshwater and infect amphibious Oncomelania snails, which eventually release cercariae and perpetuate mammalian infection risk upon contact with contaminated water [4]. Once hyperendemic for intestinal schistosomiasis with a seroprevalence of 34.8% in 1982, China has achieved large-scale reductions through a national control program involving mass drug administration, snail control, and targeted treatment of bovine reservoirs [5]. By 2020, seroprevalence had dropped to 2.4%, and the number of endemic counties had declined substantially [6]. Despite this progress, schistosomiasis remains endemic in 113 counties as of 2023 [7], including the rural and mountainous regions of Sichuan province, where seroprevalence remains at 1.12% [8]. While humans and bovines have long been the primary hosts targeted for control, emerging evidence suggests that other animals, such as dogs, pigs and rodents, may also contribute to ongoing transmission [9,10]. Simultaneously, broader socio-economic shifts, including an aging rural population and land-use change, may be reshaping patterns of exposure in ways that legacy interventions do not fully address [11,12]. Understanding how these factors impact infections, whether their importance has shifted over time, and the extent to which different exposures interact to amplify risk is critical for refining China’s control strategy and for informing global efforts in similar late-stage landscapes.

To capture these evolving dynamics, we used infection, household and demographic data collected from 2007 to 2019 in rural Sichuan, China and analyzed them using Gradient Boosting Machines (GBMs) to compare predictors of infection during two critical time periods: 2007–2010, when schistosomiasis reemerged in parts of Sichuan [13], and 2016–2019, when the country approached its elimination targets [14]. These methods are well-suited for evaluating a suite of potential risk factors and for modeling complex, non-linear relationships and interactions between predictors, and they build on prior work by Grover et al. and Buchwald et al. [15–17]. Machine learning approaches offer new opportunities to uncover key predictors of infection, alone and in combination, and to explore how the relative importance of environmental, agricultural, water, sanitation and hygiene (WASH), and socio-economic variables has changed in response to decades of control [18]. By evaluating shifts in predictor importance, we aim to provide insight into the changing dynamics of transmission and identify the conditions under which human infections are most likely to occur. While our analysis is specific to S. japonicum in Sichuan, China, we also aim to provide a model for identifying key predictors of infection in other contexts approaching elimination, one that can be used to inform surveillance strategies and the design and evaluation of targeted interventions.

Materials and methods

Ethics statement

This study was approved by the Sichuan Institutional Review Board (China), the University of California, Berkeley (USA), Committee for the Protection of Human Subjects, and the Colorado Multiple Institutional Review Board (USA). All participants provided written, informed consent. Children provided assent and their parents or guardians provided written informed permission for their children to participate in the study.

Study region and design

Data were collected in 2007, 2010, 2016, and 2019 in Sichuan, China, as part of an ongoing study of schistosomiasis reemergence and persistence in areas targeted for elimination. The study has focused on areas where the disease was suspected to be present despite aggressive local control measures, adding new areas of suspected transmission over time. It is not intended to provide a representative sample of Sichuan province. Village selection and survey methods have been described in detail elsewhere [16] and in the appendix.

Human infection data

For each of the study years, infection status was determined by examining up to three stool samples over three consecutive days using the miracidium hatching test, examining 30 grams of stool per sample [13]. In 2007 and 2010 one stool sample was also tested using the Kato-Katz thick smear procedure examining three slides per sample [13]. Infection surveys were conducted in November and December of 2007, 2010 and 2019, and in June and July of 2016 [13]. People were classified as infected if any of the tests were positive. Individuals who tested positive were notified and referred to the local schistosomiasis control station for treatment.

Infection risk factors

Data on potential risk factors were collected from household and demographic surveys. During the summers of 2007, 2010, 2016 and 2019, the head of each participating household completed a survey with closed-ended questions regarding socioeconomic status, domestic and farm animal ownership, sanitation and water access, and agricultural practices. Demographic surveys were conducted as part of the census and administered to the participant directly, or to the head of household. In 2010, demographic surveys were only administered to new participants. All surveys were conducted in the local dialect by trained staff from the Sichuan Center for Disease Control and Prevention and the county’s Schistosomiasis Control Stations.

We selected candidate predictors of human infection based on six categories that we hypothesized could contribute to human S. japonicum infection risk: 1) agricultural practices, 2) animal reservoirs, 3) individual demographic characteristics, 4) occupational risk factors, 5) socioeconomic status (SES), and 6) WASH indicators (Table 1). Additionally, we included a county predictor to account for differences in schistosomiasis control program administration and other differences between counties not captured in the aforementioned risk factors.

Table 1. Description of candidate predictors for S. japonicum infection by year, including variables that describe agricultural practices, potential animal reservoirs, individual characteristics, occupation, socio-economic status, and access to water, sanitation and hygiene (WASH) infrastructure.

https://doi.org/10.1371/journal.pntd.0013573.t001

In the study region, common agricultural practices include growing a range of crops throughout the year (e.g., rice, corn, vegetables, wheat, rapeseed) and using night soil (a mix of human and animal waste extracted from stool pits), which is applied as an agricultural fertilizer, in addition to chemical fertilizers. We included estimates of total annual crop area, and the annual volume of night soil applied to crops as predictors of human infection. Because rice farming involves flooding fields in ways that can lead to snail habitat formation and distinct water contact patterns, we also categorized crop area and night soil use into two groups: “rice crops” and “dry crops” (i.e., all other crops).

We generated a nine-point composite asset score as a proxy for SES, following Grover et al. [16]. Household assets were derived from the household survey, wherein the head of household was asked whether they owned any of the following eight assets: tractors, televisions, air conditioners, refrigerators, computers, cars or trucks, motorcycles, and washing machines. The asset score also included an indicator of whether the home was constructed from either brick or concrete (as opposed to adobe). The composite score was calculated by summing up all reported assets for a given household, yielding a score between zero and nine.
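The scoring rule described above can be sketched in a few lines. This is an illustrative Python sketch only (the study's analysis was done in R), and the field names such as `house_material` are hypothetical stand-ins for the survey variables.

```python
# Hypothetical field names for the eight surveyed assets.
ASSETS = [
    "tractor", "television", "air_conditioner", "refrigerator",
    "computer", "car_or_truck", "motorcycle", "washing_machine",
]

def asset_score(household):
    """Sum eight binary asset indicators plus a brick/concrete house indicator (0 to 9)."""
    owned = sum(1 for asset in ASSETS if household.get(asset, False))
    sturdy_house = 1 if household.get("house_material") in ("brick", "concrete") else 0
    return owned + sturdy_house

example = {"television": True, "refrigerator": True, "motorcycle": True,
           "house_material": "brick"}
print(asset_score(example))
```

A household reporting no assets and an adobe house scores zero; one reporting all eight assets and a brick or concrete house scores nine.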

We developed village-level variables that summarized agricultural, animal reservoir, SES and WASH related risks to provide broader context on community-wide exposure and risk factors and capture environmental and socio-economic influences that may impact schistosomiasis infection risk beyond the individual or household scale. Village-level variables were constructed from household survey data collected from all other households located in the same village, excluding data from the household itself to avoid interdependence between household-level and village-level variables. We aggregated continuous household variables to village-mean values and binary household variables to a village-proportion value.
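The leave-one-out construction described above can be illustrated as follows. This is a minimal Python sketch under assumed field names (`id`, `village`, and the variable passed as `key` are hypothetical); it is not the authors' R code. Binary variables coded 0/1 give village proportions under the same calculation.

```python
from collections import defaultdict

def leave_one_out_village_means(households, key):
    """For each household, average `key` over all OTHER households in its village."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for h in households:
        totals[h["village"]] += h[key]
        counts[h["village"]] += 1

    result = {}
    for h in households:
        n_other = counts[h["village"]] - 1
        if n_other == 0:
            result[h["id"]] = None  # no other households to summarise
        else:
            # Subtract the household's own value before averaging.
            result[h["id"]] = (totals[h["village"]] - h[key]) / n_other
    return result

households = [
    {"id": 1, "village": "A", "dry_crop_mu": 2.0},
    {"id": 2, "village": "A", "dry_crop_mu": 4.0},
    {"id": 3, "village": "A", "dry_crop_mu": 6.0},
    {"id": 4, "village": "B", "dry_crop_mu": 1.0},
]
print(leave_one_out_village_means(households, "dry_crop_mu"))
```

Excluding the household's own value means that a household-level predictor and its village-level counterpart never share the same observation.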

We established criteria for including, removing, or collapsing variables. Only predictors that were available in all study years were included in this analysis. We decided a priori to exclude or transform variables with less than 10% or over 90% of observations in a single category. For example, for variables like occupation where the original formulation included several rare (<10%) categories, we opted for a binary formulation of the variable (farmer vs. non-farmer). We also removed highly collinear variables. For example, “House material” was excluded from this analysis because it was already a component of our SES asset score variable. We evaluated multicollinearity using Pearson correlation coefficients and planned to select representative predictors from clusters of predictors with values >0.7 or <-0.7 but no variables had correlations above the threshold [19; S2 Fig]. The final dataset included 25 predictors: one county-level, four individual-level, ten household-level, and ten village-level variables (Table 1).
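The collinearity screen can be sketched as a pairwise Pearson check against the ±0.7 threshold. The Python below is an illustrative sketch with made-up column names and values, not the study's code.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def collinear_pairs(columns, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:
            flagged.append((a, b, r))
    return flagged

# Made-up example data: x and y are perfectly correlated, z is not.
columns = {"x": [1, 2, 3, 4, 5], "y": [2, 4, 6, 8, 10], "z": [3, 1, 4, 1, 5]}
print(collinear_pairs(columns))
```

In the study no pair exceeded the threshold, so a screen of this kind would have returned an empty list.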

Statistical analysis

We used GBMs to identify key predictors of schistosomiasis infection risk, evaluate their relationships with infection outcomes, and assess interactions between predictors. GBMs combine multiple regression trees into an ensemble, enabling them to fit complex, non-linear relationships, which is particularly useful for capturing the non-linear dynamics of disease transmission. Compared to traditional regression models, GBMs can provide stronger predictive performance when modeling biological processes like schistosomiasis transmission, which often involve complex interdependencies [15]. Models were constructed in R using the “gbm” and “dismo” packages [20,21].

We bifurcated the data to evaluate potential changes in infection risk factors between 2007–2010 and 2016–2019. The first period (2007–2010) covers the “reemergence” period, the period shortly after S. japonicum reemergence was recognized in the region, and a period when infections were relatively higher (human infection prevalence 8.63% in 2007 and 2010). The second period (2016–2019) covers the “elimination” period, a period when infections were increasingly rare (human infection prevalence 4.51% in 2016 and 2019) and S. japonicum elimination was being aggressively pursued.

We partitioned the two datasets into spatially balanced (similar proportion of observations across villages) and temporally balanced (similar proportion of observations across the years) training (70%), evaluation (20%) and test (10%) sets for the analysis to prevent target leakage and improve the generalizability of the results. This was performed using the “BalancedSampling” package in R [22]. We employed 5-fold cross-validation on our training sets so that the model’s performance was assessed across different subsets of the data to reduce the risk of overfitting and improve generalizability.
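To make the 70/20/10 partition concrete, the sketch below stratifies by village and year and splits within each stratum. This is a simpler stand-in technique, not the spatially balanced sampling algorithm of the “BalancedSampling” package, and the record fields are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(records, fractions=(0.7, 0.2, 0.1), seed=0):
    """Split records into train/evaluation/test within each (village, year) stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for r in records:
        strata[(r["village"], r["year"])].append(r)

    train, evaluation, test = [], [], []
    for key in sorted(strata):
        group = strata[key]
        rng.shuffle(group)
        n_train = round(fractions[0] * len(group))
        n_eval = round(fractions[1] * len(group))
        train += group[:n_train]
        evaluation += group[n_train:n_train + n_eval]
        test += group[n_train + n_eval:]  # remainder goes to the test set
    return train, evaluation, test

records = [{"id": i, "village": v, "year": y}
           for v in ("A", "B") for y in (2007, 2010) for i in range(10)]
train, evaluation, test = stratified_split(records)
print(len(train), len(evaluation), len(test))
```

Splitting within strata keeps each village and year represented in similar proportions across the three sets, which is the property the paragraph above describes.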

To address class imbalance in our outcome (8.58% infection in the reemergence period; 4.51% infection in the elimination period), we over-sampled the minority class using Random Walk Oversampling (RWO), implemented via the “imbalance” package [23], which generated synthetic infected observations based on the variance and mean of the infected observations.
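The idea behind random walk oversampling can be sketched for continuous predictors. The sketch assumes the commonly described form of the method, in which each synthetic value is an observed minority value shifted by a Gaussian step scaled by the feature's standard deviation divided by the square root of the minority sample size. It is not the “imbalance” package's implementation, and it omits the handling of categorical predictors.

```python
import math
import random
import statistics

def random_walk_oversample(minority, n_new, seed=0):
    """Generate n_new synthetic rows from a list of continuous minority-class rows."""
    rng = random.Random(seed)
    n = len(minority)
    n_features = len(minority[0])
    # Per-feature standard deviation of the observed minority class.
    sds = [statistics.pstdev([row[j] for row in minority]) for j in range(n_features)]

    synthetic = []
    for k in range(n_new):
        base = minority[k % n]  # cycle through the observed minority rows
        step = [sds[j] / math.sqrt(n) * rng.gauss(0.0, 1.0) for j in range(n_features)]
        synthetic.append([base[j] - step[j] for j in range(n_features)])
    return synthetic

infected = [[1.0, 5.0], [3.0, 5.0], [5.0, 5.0]]
print(random_walk_oversample(infected, n_new=6))
```

Because the steps are small relative to the feature's spread, the synthetic rows stay close to the mean and variance of the observed infected group, which matches the description above.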

The datasets also contained missing values across several predictors (3.27% of data were missing due to ambiguous or incomplete responses from survey participants). We addressed missing data in the training datasets using the “randomForest” package [24] to impute missing values based on the median value for continuous variables and the mode value for binary variables.
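Median and mode imputation of this kind can be illustrated as below. The Python sketch uses hypothetical column names and encodes missing values as `None`; it mirrors the rule stated above rather than the “randomForest” package's code.

```python
import statistics

def impute(rows, continuous, binary):
    """Fill None with the column median (continuous) or mode (binary)."""
    fill = {}
    for col in continuous:
        fill[col] = statistics.median([r[col] for r in rows if r[col] is not None])
    for col in binary:
        fill[col] = statistics.mode([r[col] for r in rows if r[col] is not None])
    return [
        {col: (fill[col] if value is None and col in fill else value)
         for col, value in r.items()}
        for r in rows
    ]

rows = [
    {"age": 30, "farmer": 1},
    {"age": None, "farmer": 1},
    {"age": 50, "farmer": None},
    {"age": 70, "farmer": 0},
]
print(impute(rows, continuous=["age"], binary=["farmer"]))
```

In a train/evaluation/test workflow, the fill values would normally be computed on the training rows only, consistent with the paragraph's statement that imputation was applied to the training datasets.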

Optimal model hyperparameters, including learning rate (0.01–0.3), tree complexity (1–5), and number of trees (900–1000), were selected using a grid search that iterated over one hundred combinations of hyperparameter values, allowing us to identify the strongest combination for each model. We assessed model performance using the area under the precision-recall curve (AUC-PR), sensitivity, specificity, accuracy, and kappa values. We evaluated our models based on AUC-PR because it is robust to imbalanced datasets [25]. To account for model uncertainty, we refit the full modeling pipeline across 20 bootstrapped iterations using different random seeds and used the resulting set of performance metrics to generate confidence intervals. Because most imputation methods include a stochastic component, changing the random seed can yield slightly different imputed values, especially when missingness is non-trivial; refitting the full pipeline with different seeds therefore also tests whether results are robust to imputation-induced variability.
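One common estimator of AUC-PR is average precision: the mean of the precision values at the rank of each true positive. The sketch below implements that estimator in Python on made-up labels and scores; the paper does not state which estimator was used, so this is an assumption, and ties in the scores are not treated specially.

```python
def average_precision(y_true, scores):
    """Average precision: mean precision at the rank of each positive case."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(y_true)
    true_positives = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            true_positives += 1
            total += true_positives / rank  # precision at this rank
    return total / n_pos

labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(average_precision(labels, scores))
```

Unlike the area under the ROC curve, this quantity has a baseline roughly equal to the prevalence of the positive class, which is why it is informative when infections are rare.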

Our primary analytical goal was to rank predictors by relative importance, defined as the proportion of the total reduction in squared error that each variable contributes to the models. As secondary goals, we aimed to describe (i) the shape of non-linear marginal effects and (ii) the strength of two-way interactions in relation to S. japonicum infection risk.
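The definition of relative importance can be made concrete with a small sketch: sum the error reduction attributed to each variable across all splits in all trees, then scale so the values total 100. The split list below is made up, and the (variable, improvement) representation is an assumption for illustration rather than the “gbm” package's internal format.

```python
from collections import defaultdict

def relative_importance(splits):
    """Scale summed per-variable error reductions so they total 100."""
    totals = defaultdict(float)
    for variable, improvement in splits:
        totals[variable] += improvement
    grand_total = sum(totals.values())
    return {v: 100.0 * t / grand_total for v, t in totals.items()}

# Each tuple is (splitting variable, reduction in squared error at that split).
splits = [("age", 1.0), ("dry_crop_area_v", 6.0), ("age", 1.0)]
print(relative_importance(splits))
```

Because the values are proportions of a model-specific total, ranks are easier to compare between the two period models than the raw percentages.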

For the six most influential predictors from each time period, we graphed the marginal effects as Partial Dependence Plots (PDPs) using the “pdp” package [26]. These plots display the average predicted outcome (in this case, probability of infection) as a function of a single predictor, marginalizing over the joint distribution of all other variables in the study. Uncertainty around each curve was quantified with 95% confidence bands, generated from 1,000 bootstrap replicates of the training data, which reflect the range of predicted infection probabilities at each predictor value. The resulting curves reflect nonlinear relationships between our predictors and the marginal probability of infection, as estimated by our BRT models.
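The partial dependence calculation itself is short: for each grid value of the predictor of interest, overwrite that predictor in every row, predict, and average. The Python sketch below uses a hypothetical `toy_model` in place of a fitted GBM and omits the bootstrap confidence bands.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        predictions = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(predictions) / len(predictions))
    return curve

def toy_model(row):
    """Hypothetical stand-in for a fitted model's predicted infection probability."""
    base = 0.1 if row["age"] < 40 else 0.3
    return base + 0.05 * row["farmer"]

rows = [{"age": 25, "farmer": 1}, {"age": 70, "farmer": 0}]
print(partial_dependence(toy_model, rows, "age", grid=[30, 60]))
```

Averaging over the observed rows is what marginalizes the other predictors, so the curve shows the model's average response to the chosen predictor rather than the response for any single individual.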

We examined pairwise interactions between predictors because transmission dynamics are shaped by complex interdependencies among environmental, socio-economic, and biological factors. Pairwise interactions were quantified using Elith et al.’s customized “Boosted Regression Tree (BRT)” package [15]. All possible pairwise interactions were made available to the model, but they were not all forced into the final fit. Instead, only those splits that lower out-of-bag deviance while applying shrinkage, a small tree depth, and stochastic subsampling were included in the final BRT model. Regularization settings were included to limit model complexity and overfitting and improve generalizability. The most important interactions were identified using interaction size, a value that functions similarly to relative importance in that it quantifies the strength of pairwise interactions based on the additional variance explained by allowing the two variables to interact, beyond their additive effects alone. The interaction sizes are unitless and do not follow a fixed scale, allowing them to identify relatively strong interactions within a single model. However, they are not directly comparable across models or to relative importance scores, as they reflect localized interaction effects rather than global predictive contributions. We selected an interaction size of >5 as a threshold for reporting. We created three-dimensional partial plots using the “BRT” package for the top three most important interactions to examine the direction of these relationships.
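One way to quantify departure from additivity, in the spirit of the interaction size described above, is to predict over a grid of two predictors, remove the additive row and column effects, and take the mean squared residual. The Python sketch below shows only that idea on made-up grids; it does not reproduce the scaling applied by the BRT code, so its values are not on the same scale as the reported interaction sizes.

```python
def interaction_strength(grid):
    """Mean squared residual after removing additive row and column effects.

    grid[i][j] is the model prediction at level i of predictor A and level j of
    predictor B, with the other predictors held fixed.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    grand = sum(sum(row) for row in grid) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in grid]
    col_means = [sum(grid[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    squared = [
        (grid[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n_rows) for j in range(n_cols)
    ]
    return sum(squared) / len(squared)

additive = [[11, 21], [12, 22]]   # prediction = effect of A + effect of B
amplifying = [[0, 0], [0, 1]]     # risk only when both predictors are high
print(interaction_strength(additive), interaction_strength(amplifying))
```

A purely additive surface yields zero, while a surface where risk appears only when both predictors are high yields a positive value, which is the pattern the three-dimensional partial plots are meant to reveal.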

Results

Our analysis included 3,033 observations from people aged 6–96 years old across two rural counties and 51 villages in Sichuan province, collected from 2007 to 2019. In the study villages, populations ranged from 7–70 households and 12–156 residents. There was a marked decline in infections from 2007 to 2019, even with our sampling strategy focused on areas thought to have ongoing transmission: in 2007, 8.46% of the 1,974 people tested were infected, and by 2019, 1.02% of the 689 people tested were infected (Table 1).

Our models assessing predictors of schistosomiasis infection during the reemergence and elimination periods demonstrated strong predictive performance, with mean AUC-PR values of 0.92 and 0.85, respectively. Both models also had high mean specificity (0.98 and 0.96) and accuracy (0.97 and 0.95) (Table 2). Mean model sensitivity and kappa values were higher for the reemergence period (sensitivity = 0.90, kappa = 0.95) than for the elimination period (sensitivity = 0.79, kappa = 0.88).
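For readers less familiar with these metrics, the sketch below shows how sensitivity, specificity, accuracy and Cohen's kappa follow from a 2×2 confusion matrix. The counts are made up for illustration and are not the study's results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, accuracy and Cohen's kappa from confusion-matrix counts."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals.
    expected = ((tp + fp) / n) * ((tp + fn) / n) + ((fn + tn) / n) * ((fp + tn) / n)
    kappa = (accuracy - expected) / (1 - expected)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "kappa": kappa}

print(classification_metrics(tp=40, fp=10, fn=10, tn=40))
```

Kappa discounts the agreement expected by chance, so with a rare outcome it can fall well below accuracy when the model misses many infected individuals.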

Table 2. Predictive performance of the reemergence (2007-2010) and elimination models (2016-2019) with 95% confidence intervals across 20 bootstrapped iterations.

https://doi.org/10.1371/journal.pntd.0013573.t002

In the reemergence period (2007–2010), the three most important variables (ranked 1–25 with 1 being the most important) were all village-level variables: dry crop area (V), rice area (V) and night soil rice (V) (Fig 1). Age (I), improved sanitation (V), dry crop area (H) and bovine ownership (V) were also strong predictors of infection risk. Of the top seven predictors, five were village-level, one was at the household-level, and one was at the individual-level. The least important were all household-level predictors (improved sanitation, cat ownership, well water usage, and dog ownership).

Fig 1. Change in the ranked importance of predictors from 2007-2010 (reemergence period) to 2016-2019 (elimination period).

Red lines indicate a decrease in ranked importance from the reemergence period to the elimination period. Blue lines indicate an increase.

https://doi.org/10.1371/journal.pntd.0013573.g001

During the elimination period (2016–2019), dry crop area (V), night soil rice (V), age (I) and dry crop area (H) remained strong predictors of infection (Fig 1). Meanwhile, there were large increases (change of ≥7 ranks) in the ranked importance of cat ownership (H), education (I), dog ownership (H), assets (V), and improved sanitation (H), all of which were previously ranked low. Of the seven most important predictors, three were village-level, two were household-level, and two were individual-level predictors. The least influential predictors (ranked ≥20) were well water usage (H & V), occupation (I), county, bovine ownership (H), sex (I), night soil dry crops (H), and cat ownership (V).

Fig 2. Partial dependence plots (PDP) of the six most important predictors of human schistosomiasis infection risk in 2007-2010 (reemergence model).

The PDPs display the change in the average predicted infection risk as predictors vary over their marginal distribution while holding all other variables constant. Fitted curves (dashed lines), smoothing splines (solid blue lines) and 95% confidence intervals based on 1000 bootstrap replicates of the data set (shading) are shown. The full distribution of the predictors is displayed as rug ticks on the top of the plot.

https://doi.org/10.1371/journal.pntd.0013573.g002

We saw evidence that the relationships between predictors and infection risk displayed a mixture of linear and non-linear associations. In 2007–2010, the marginal change in the probability of infection increased when dry crop area (V) exceeded 8 mu, or when rice crop area (V) exceeded 2.5 mu (Fig 2). We found evidence of a positive monotonic relationship between infection risk and night soil use on rice (V), rising from an infection probability of 0.01% when 0 buckets of night soil were used on rice crops, to 0.70% when 67 buckets of night soil were used. Infection risk peaked at 0.14% for individuals between the ages of 40–60. Improved sanitation (V) had a net negative, albeit non-linear, association with infection.

In 2016–2019, infection probability was flat when dry crop area (V) and night soil rice (V) were less than 10 mu and 23 buckets, respectively, after which point there was a steep increase in infection probability, peaking at 1.7% at 14 mu of dry crops (V), and 0.43% at 25 buckets of night soil on rice (V) (Fig 3). We saw evidence that infection risk was higher in individuals over 80 years of age, and for individuals in households planting >10 mu of dry crops. Larger areas of rice crops at the village and household level were also associated with increased infection risk.

Fig 3. Partial dependence plots (PDP) of the top six predictors of human schistosomiasis infection risk in 2016-2019 (elimination model).

The PDP (blue line) displays the change in the average predicted infection risk as predictors vary over their marginal distribution while holding all other variables constant. Fitted curves (dashed lines), smoothing splines (solid blue lines) and 95% confidence intervals based on 1000 bootstrap replicates of the data set (shading) are shown. The full distribution of the predictors is displayed as rug ticks on the top of the plot.

https://doi.org/10.1371/journal.pntd.0013573.g003

Table 3 presents the strongest pairwise interactions in 2007–2010 based on our BRT analysis, showing the estimated combined effects of predictors on human schistosomiasis infection risk as measured by their interaction size. The interaction between dry crop area (V) and improved sanitation (V) was the most important interaction, followed by night soil rice crops (V) and dry crop area (H), and bovines (V) and well water (V). The interaction between dry crop area (V) and improved sanitation (V) had a moderate negative relationship (Fig 4). Infection risk was highest for individuals living in households that simultaneously reported planting moderately large areas of dry crops (15–17 mu) and were surrounded by households reporting high night soil usage on rice crops (≥60 buckets). Infection risk was also higher for individuals living in villages with a higher presence of bovines (average household ownership >1.5) and higher well water usage (100%).

Table 3. Pairwise interactions that were the strongest predictors of S. japonicum infection (interaction size > 5) in the reemergence period, 2007-2010.

https://doi.org/10.1371/journal.pntd.0013573.t003

Fig 4. Three-dimensional partial dependence plots of the three most important pairwise interactions in 2007-2010.

Each panel shows the GBM’s fitted value (z-axis; higher values indicate higher predicted S. japonicum infection risk) as a function of two predictors (x- and y-axes), with all other covariates averaged over their observed distributions. Left-to-right: (A) village-level dry crop area and village-level improved sanitation; (B) village-level night soil use on rice fields and household-level dry crop area; (C) village-level bovine presence and village-level well-water use.

https://doi.org/10.1371/journal.pntd.0013573.g004

Table 4 presents the highest ranked pairwise interactions in 2016–2019 based on our BRT analysis, showing the combined effects of various predictors on human schistosomiasis infection risk, measured by their interaction size. The interaction between the dry crop area (V) and improved sanitation (H) was the most important interaction, followed by dogs (H) and age (I), then dry crop area (V) and age (I). Three of the six most important pairwise interactions included dry crop area (V) and age (I). Infection risk was highest for those over 80 who also owned dogs, or those over 80 who also lived in villages where the reported area of dry crop land was high (>10 mu) (Fig 5).

Table 4. Pairwise interactions that were the strongest predictors of S. japonicum infection (interaction size > 5) in the elimination period, 2016-2019.

https://doi.org/10.1371/journal.pntd.0013573.t004

Fig 5. Three-dimensional partial dependence plots of the three most important pairwise interactions in 2016-2019.

Each panel shows the GBM’s fitted value (z-axis; higher values indicate higher predicted S. japonicum infection risk) as a function of two predictors (x- and y-axes), with all other covariates averaged over their observed distributions. Left-to-right: (A) village-level dry crop area and household improved sanitation; (B) household dog presence and individual age; (C) village-level dry crop area and individual age.

https://doi.org/10.1371/journal.pntd.0013573.g005

Discussion

Our findings suggest that the environmental and socioeconomic predictors of S. japonicum transmission are dynamic, with predictor importance shifting as the disease moves from reemergence to near elimination, although the predictive contribution of agricultural land use was consistent. Dry crop area (V), rice area (V), and night soil rice (V) consistently emerged as strong predictors across both time periods, indicating that farming practices likely maintain a role in schistosome exposure. Dry crops such as maize, wheat, and rapeseed are often grown on raised embankments or in rotation with paddy rice, creating seasonally moist pockets that may allow snail hosts to persist even when surrounding fields appear dry and may also support other potential reservoirs [27]. Rice paddies, by contrast, require farmers and livestock to spend prolonged periods in water, increasing exposure opportunities [28]. Night soil use may contribute to infection risk by depositing viable S. japonicum eggs directly into snail habitats while providing soil and water with organic matter that further boosts snail survival [28]. Night soil use emerged as one of the most influential predictors in both the reemergence and elimination phases of our analysis. Its persistent importance aligns with other studies showing associations between night soil use and human schistosomiasis infections [29].

While broader indicators of village-level agriculture (e.g., village-level dry crop area) were stronger predictors during the reemergence period, individual and household-level variables (e.g., household-level dry crop area and rice crop area) rose in importance during the elimination period. This trend may reflect a transition from widespread environmental exposure to more specific, household-based transmission risks, suggesting that granular control strategies may be needed as S. japonicum approaches elimination. The importance of variables describing sanitation also shifted over time. In the reemergence period, village-level improved sanitation (e.g., the proportion of households in the village with improved toilets) was moderately predictive of infection risk, suggesting that broad community-level improved sanitation access shaped transmission dynamics during this time. However, the importance of village-level sanitation was markedly lower in the elimination period, when far more participants lived in households with improved sanitation (19% in 2007 vs 55% in 2019). In contrast, household-level sanitation (access to a biogas or three-compartment toilet) became a more important predictor during the elimination phase. This shift may reflect a narrowing of transmission pathways: as widespread environmental contamination declines due to village-level improvements, remaining infections may be increasingly influenced by practices and exposures within a given household. These may be directly related to sanitation, or sanitation may be a proxy for other variables such as poverty and/or water contact. It may also suggest that community-level sanitation interventions reach a point of saturation, so that village-level coverage loses predictive value once high levels of coverage are attained.

The unexpected rise in importance of cat and dog ownership during the elimination period also requires further exploration. One hypothesis is that cats and dogs play distinct roles in household-level transmission ecology. For example, cat ownership might be protective if cats control rodent populations that serve as S. japonicum reservoir hosts. Recent research indicates that rodents may have become important zoonotic reservoirs of S. japonicum in endemic regions of China, with increased infection prevalence in mountainous regions such as Sichuan [9]. Conversely, dogs may be associated with higher risk due to their limited role in rodent control and their own susceptibility to S. japonicum infection [10]. Dogs are more likely than cats to accompany people outdoors, enter irrigation channels or streams, and spend time in or near water, which could increase their exposure to cercariae in snail-infested habitats and, by extension, their potential to contribute to transmission. It would be worthwhile to evaluate the consistency of these findings in other contexts. Concurrent testing of humans and other animal hosts in this context would be ideal for future studies.

At the individual level, age was the most important predictor of schistosomiasis infection in both the reemergence and elimination periods. Age generally had a positive relationship with infection risk in both time periods, but peak risk shifted rightward to older individuals over time, from a peak at 40–60 years of age in the reemergence period to a peak beyond 80 years of age in the elimination period. One possible explanation for this is the rapid urbanization of the country, which may have led younger individuals to work in urban areas while older individuals stayed in rural villages to continue farming. The age distribution between the two periods supports this hypothesis, with a more right-skewed distribution in the earlier years and a more left-skewed distribution in the later period (rug ticks, Figs 2 and 3; S1 Fig) showing aging village populations. Recent literature on the sustainability of farms in China provides further evidence for this finding, with younger farmers realizing “higher incomes by working in non-agricultural sectors in cities” [30]. Our prior work found that travel was associated with lower water contact [17], suggesting that the older populations who are left in the villages are bearing the highest water contact levels. The elimination period also saw education become a more important predictor of schistosomiasis infection, with our results suggesting that those with little or no schooling had a slightly higher infection risk. The mechanism behind this may be that individuals who received less education are more likely to work in the agricultural sector and thus risk exposure to S. japonicum.

Although our models achieved good predictive performance, several limitations merit consideration. First, class imbalance, particularly the small number of infected cases in 2019 (n = 7), could bias estimates and limit generalizability. As such, our findings from 2019 should be interpreted as hypothesis-generating. We addressed imbalance using Random Walk Oversampling, but synthetic examples may not fully capture true heterogeneity in infection risk. Second, while we used cross-validation and bootstrapping to prevent overfitting, the ensemble nature of boosted models means that overfitting cannot be entirely ruled out. Third, missingness in key predictors may have introduced bias despite imputation, particularly in variables derived from self-reported surveys. Fourth, many variables were drawn from household surveys that were administered with changing survey instruments (e.g., different survey questions regarding seasonal crops) over the past decade, although we tried to ensure variable calculations were comparable over the study period. Fifth, the comparability of infection estimates across years may be influenced by differences in survey timing (summer in 2016, fall in all other years) and small changes in diagnostic protocols (in 2007 and 2010 we screened all participants with Kato-Katz and miracidial hatch testing, and later limited Kato-Katz testing to those with at least one positive hatch test because of the high labor and low added value of the test in our population). Because schistosomiasis transmission risk and praziquantel treatment schedules can vary seasonally, differences in survey timing could have influenced measured infection outcomes and, in turn, model predictions for 2016. We did not have sufficient repeated within-year infection measurements to formally adjust for seasonality.

Sixth, the generalizability of our results to other endemic regions is uncertain; the agroecological and control context of our study area may differ substantially from other settings in China or Southeast Asia and should be considered in future studies. Our goal was to provide an analytical model that could be replicated in other contexts. Further validation with external datasets is necessary to assess the generalizability of our findings to other areas in China and beyond. Seventh, this study does not provide province-representative prevalence estimates for Sichuan, as it was designed to explore characteristics of persistence in higher-risk settings, selecting and adding villages to the sample where transmission was suspected despite local control efforts.
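To make the oversampling limitation noted above concrete, the sketch below shows one simplified reading of random walk oversampling for continuous features: each synthetic case is a randomly chosen minority case perturbed by Gaussian noise scaled by the feature's standard deviation divided by the square root of the number of minority cases. This is an assumption-laden illustration, not the implementation used in the study; the function name, the use of the population standard deviation, and the toy data are all hypothetical, and categorical predictors are not handled.

```python
import math
import random
from statistics import pstdev
from typing import List, Sequence


def random_walk_oversample(
    minority: Sequence[Sequence[float]], n_new: int, seed: int = 0
) -> List[List[float]]:
    """Generate n_new synthetic minority rows by perturbing observed minority rows."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    n = len(minority)
    n_features = len(minority[0])
    # Per-feature spread among minority cases (population SD is an assumption here)
    sds = [pstdev([row[f] for row in minority]) for f in range(n_features)]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)  # pick an observed infected case to walk from
        synthetic.append(
            [base[f] - (sds[f] / math.sqrt(n)) * rng.gauss(0.0, 1.0)
             for f in range(n_features)]
        )
    return synthetic


# Hypothetical minority class: [age, constant-valued feature]
minority_rows = [[62.0, 5.0], [71.0, 5.0], [80.0, 5.0]]
synthetic_rows = random_walk_oversample(minority_rows, n_new=10, seed=1)
```

Because every synthetic row is anchored to an observed case, the generated data cannot represent infection patterns that were never sampled, which is why a very small number of true cases remains a limitation even after rebalancing.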

A key strength of using GBM is its ability to model complex, non-linear relationships and interactions across multi-scale predictors, while simultaneously adjusting for a wide set of measured covariates that may confound observed associations. In this sense, the analysis may serve as a useful step toward causal understanding and, ultimately, the design of targeted interventions by identifying predictors and conditions that are consistently associated with infection. However, GBM results should be interpreted with caution: variable importance and response patterns reflect predictive contribution within the observed data structure rather than causal effects. As with any observational analysis, residual confounding from unmeasured or poorly measured factors, measurement error, and potential selection or surveillance biases may influence inferred predictor rankings. Consequently, the findings are best viewed as evidence to prioritize mechanisms and interventions for more targeted causal studies rather than definitive estimates of causal impact.

Overall, our findings support the need for adaptive, context-specific control strategies as schistosomiasis moves toward elimination. Agricultural exposures remain central but increasingly heterogeneous, while the importance of sanitation and companion animal presence appears to evolve over time. Beyond China, this modeling framework could be applied to other endemic regions, including Brazil and sub-Saharan Africa, where programs must increasingly prioritize targeted interventions as prevalence declines. By integrating routinely collected infection data with environmental, WASH, and livelihood indicators, similar analyses could help identify persistent transmission “hot spots” and begin to tailor surveillance and elimination strategies to the most important drivers of infection in local transmission environments. A next step for future analyses would be to translate these predictors into an implementation-ready risk stratification tool and evaluate, from a program perspective, whether risk-based targeting reduces costs and staff burden while maintaining sensitivity compared with population-wide surveillance in low-prevalence settings.
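The trade-off described above, between screening burden and sensitivity under risk-based targeting, can be expressed with two simple quantities. The sketch below is a hypothetical illustration: the predicted risks, infection labels, and threshold are invented toy values, not study results, and `targeting_summary` is an assumed helper name.

```python
from typing import Dict, Sequence


def targeting_summary(
    risks: Sequence[float], infected: Sequence[int], threshold: float
) -> Dict[str, float]:
    """Summarize a rule that screens only people with predicted risk >= threshold."""
    screened = [r >= threshold for r in risks]
    n_screened = sum(screened)
    true_pos = sum(1 for s, y in zip(screened, infected) if s and y)
    total_pos = sum(infected)
    return {
        # Share of the population that would still need testing (proxy for cost)
        "fraction_screened": n_screened / len(risks),
        # Share of true infections that the targeted rule would catch
        "sensitivity": true_pos / total_pos if total_pos else float("nan"),
    }


# Hypothetical predicted risks and infection status for eight people
toy_risks = [0.02, 0.10, 0.40, 0.05, 0.70, 0.01, 0.30, 0.03]
toy_infected = [0, 0, 1, 0, 1, 0, 0, 0]
summary = targeting_summary(toy_risks, toy_infected, threshold=0.25)
```

Repeating this calculation across a range of thresholds would trace out how much testing effort can be saved before sensitivity begins to fall, which is the comparison against population-wide surveillance that a program evaluation would need.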

Conclusion

Our findings provide insights into the shifting epidemiology of S. japonicum infection in Sichuan province across two different phases of disease control. Our boosted regression models allowed us to explore the associations and interactions between agricultural, socioeconomic, and individual risk factors at multiple spatial scales. The shift from village-level predictors being dominant during the reemergence period to a more balanced distribution of influential predictors across village, household, and individual levels during the elimination period suggests that as transmission declines, more localized and individualized factors play a greater role in determining infection risk. This aligns with broader trends of urbanization and shifting agricultural practices in China, where younger individuals are leaving rural areas, leading to an older and potentially more exposed farming population.

These findings are intended to inform surveillance and control prioritization rather than to estimate causal effects. In an elimination setting, the persistent importance of agriculture-related indicators, particularly night soil and paddy-associated measures, supports focusing case detection and environmental monitoring in villages and households where these practices remain. More broadly, our results reinforce the need for adaptive strategies that evolve with the epidemiological phase of control: community-wide interventions may be most impactful during reemergence, while elimination may require supplemental targeted interventions.

Supporting information

S1 File. Sampling Strategy.

A comprehensive description of our sampling strategy.

https://doi.org/10.1371/journal.pntd.0013573.s001

(PDF)

S1 Table. A summary and description of the predictors included in our analysis.

https://doi.org/10.1371/journal.pntd.0013573.s002

(PDF)

S1 Fig. Age distribution of participants: histograms for each of the four survey years.

https://doi.org/10.1371/journal.pntd.0013573.s003

(PNG)

S2 Fig. Heatmap of Pearson correlation coefficients between all predictors retrieved from village-level, household and individual demographic and infection surveys.

https://doi.org/10.1371/journal.pntd.0013573.s004

(PNG)

Acknowledgments

We thank the late Dr. Robert Spear for his mentorship and early work on schistosomiasis in Sichuan that laid the foundation for this study.

References

  1. World Health Organization. Ending the neglect to attain the Sustainable Development Goals: A road map for neglected tropical diseases 2021–2030. 2020. Available from: https://www.who.int/publications/i/item/9789240010352
  2. Zhang Y, Ming Y. Burden of schistosomiasis in global, regional, and national 1990-2019: A systematic analysis for the Global Burden of Disease Study 2019. Travel Med Infect Dis. 2024;61:102751. pmid:39173939
  3. Rudge JW, Webster JP, Lu DB, Wang TP, Fang GR, Basáñez MG. Identifying host species driving transmission of schistosomiasis japonica, a multihost parasite system, in China. Proceedings of the National Academy of Sciences. 2013;110(28):11457–62.
  4. Nelwan ML. Schistosomiasis: Life Cycle, Diagnosis, and Control. Curr Ther Res Clin Exp. 2019;91:5–9. pmid:31372189
  5. Zhou Y, Zheng M, Gong Y, Huang J, Wang J, Xu N, et al. Changing seroprevalence of schistosomiasis japonica in China from 1982 to 2020: A systematic review and spatial analysis. PLoS Negl Trop Dis. 2024;18(9):e0012466. pmid:39226311
  6. Zhang L, Li S, Xu J, Cao C, Li S. Epidemic characteristics of schistosomiasis—China, 2016–2023. China CDC Wkly. 2025;7(14):467.
  7. Zhang L, He J, Yang F, Dang H, Li Y, Guo S, et al. Progress of schistosomiasis control in People’s Republic of China in 2023. Zhongguo Xuexi Chongbing Fangzhi Zazhi. 2024;36(3):221.
  8. Chen PU, Yu ZH, Jiajia WA, Nannan WA, Jingye SH, Liang XU, et al. Effectiveness of the integrated schistosomiasis control programme in Sichuan Province from 2015 to 2023. Chinese Journal of Schistosomiasis Control. 2025;37(3):284.
  9. Zou H-Y, Yu Q-F, Qiu C, Webster JP, Lu D-B. Meta-analyses of Schistosoma japonicum infections in wild rodents across China over time indicates a potential challenge to the 2030 elimination targets. PLoS Negl Trop Dis. 2020;14(9):e0008652. pmid:32877407
  10. Carabin H, McGarvey ST, Sahlu I, Tarafder MR, Joseph L, DE Andrade BB, et al. Schistosoma japonicum in Samar, the Philippines: infection in dogs and rats as a possible risk factor for human infection. Epidemiol Infect. 2015;143(8):1767–76. pmid:25274409
  11. Spear RC. Internal versus external determinants of Schistosoma japonicum transmission in irrigated agricultural villages. J R Soc Interface. 2012;9(67):272–82. pmid:21752808
  12. Li Q, Huang J, Luo R, Liu C. China’s Labor Transition and the Future of China’s Rural Wages and Employment. China and World Economy. 2013;21(3):4–24.
  13. Carlton EJ, Bates MN, Zhong B, Seto EYW, Spear RC. Evaluation of mammalian and intermediate host surveillance methods for detecting schistosomiasis reemergence in southwest China. PLoS Negl Trop Dis. 2011;5(3):e987. pmid:21408127
  14. Wang W, Bergquist R, King CH, Yang K. Elimination of schistosomiasis in China: Current status and future prospects. PLoS Negl Trop Dis. 2021;15(8):e0009578. pmid:34351907
  15. Elith J, Leathwick JR, Hastie T. A working guide to boosted regression trees. J Anim Ecol. 2008;77(4):802–13. pmid:18397250
  16. Grover E, Paull S, Kechris K, Buchwald A, James K, Liu Y, et al. Predictors of bovine Schistosoma japonicum infection in rural Sichuan, China. Int J Parasitol. 2022;52(8):485–96. pmid:35644269
  17. Buchwald AG, Grover E, Van Dyke J, Kechris K, Lu D, Liu Y, et al. Human Mobility Associated With Risk of Schistosoma japonicum Infection in Sichuan, China. Am J Epidemiol. 2021;190(7):1243–52. pmid:33438003
  18. Gong Y-F, Zhu L-Q, Li Y-L, Zhang L-J, Xue J-B, Xia S, et al. Identification of the high-risk area for schistosomiasis transmission in China based on information value and machine learning: a newly data-driven modeling attempt. Infect Dis Poverty. 2021;10(1):88. pmid:34176515
  19. Faust CL, Castellanos AA, Peel AJ, Eby P, Plowright RK, Han BA, et al. Environmental variation across multiple spatial scales and temporal lags influences Hendra virus spillover. Journal of Applied Ecology. 2023;60(7):1457–67.
  20. Ridgeway G, Edwards D, Kriegler B, Schroedl S, Southworth H, Greenwell B. Generalized Boosted Regression Models. 2024.
  21. Hijmans RJ, Phillips S, Leathwick J, Elith J. dismo: Species Distribution Modeling. 2024.
  22. Grafström A, Prentius W, Lisic J. Balanced and spatially balanced sampling. 2024.
  23. Cordón I. Imbalance: Preprocessing algorithms for imbalanced datasets. 2022.
  24. Liaw A, Wiener M, Breiman L, Cutler A. randomForest: Breiman and Cutler’s Random Forests for classification and regression. 2024.
  25. Sofaer HR, Hoeting JA, Jarnevich CS. The area under the precision‐recall curve as a performance metric for rare binary events. Methods Ecol Evol. 2019;10(4):565–77.
  26. Greenwell BM. pdp: Partial Dependence Plots. 2024.
  27. Lv C, Li Y-L, Deng W-P, Bao Z-P, Xu J, Lv S, et al. The Current Distribution of Oncomelania hupensis Snails in the People’s Republic of China Based on a Nationwide Survey. Trop Med Infect Dis. 2023;8(2):120. pmid:36828536
  28. Seto EYW, Lee YJ, Liang S, Zhong B. Individual and village-level study of water contact patterns and Schistosoma japonicum infection in mountainous rural China. Trop Med Int Health. 2007;12(10):1199–209. pmid:17956502
  29. Carlton EJ, Liu Y, Zhong B, Hubbard A, Spear RC. Associations between schistosomiasis and the use of human waste as an agricultural fertilizer in China. PLoS Negl Trop Dis. 2015;9(1):e0003444. pmid:25590142
  30. Ren C, Zhou X, Wang C, Guo Y, Diao Y, Shen S, et al. Ageing threatens sustainability of smallholder farming in China. Nature. 2023;616(7955):96–103. pmid:36813965