Abstract
Hemileia vastatrix, the causal agent of coffee leaf rust, poses a persistent threat to Arabica coffee production in Kenya, where smallholder farmers face recurrent yield losses and limited access to effective control strategies. Effective disease management requires predictive frameworks capable of quantifying both infection risk and associated uncertainty under real-world farm conditions. This study presents a comparative evaluation of Bayesian hierarchical inference and supervised machine learning approaches for predicting coffee rust incidence, using their complementary strengths to generate probabilistic predictions. The models were developed using longitudinal data from 9,850 plot-level observations across six major coffee-producing counties in Kenya. Microclimatic moisture variables, particularly leaf wetness duration and relative humidity, emerged as the dominant predictors of infection. Partial dependence and SHAP analyses revealed strong nonlinear threshold effects: elevated humidity and prolonged leaf wetness sharply increased infection probability, while proximity to infected farms intensified spatial transmission dynamics. A key finding is that parsimonious, interpretable models performed competitively with complex algorithms. Logistic regression achieved the highest discriminative performance (AUC-ROC = 0.867), matching or exceeding more complex ensemble methods while maintaining transparency and computational efficiency. Ensemble models such as random forests achieved slightly higher classification accuracy, highlighting complementary strengths across approaches. The Bayesian hierarchical model contributed additional value by quantifying uncertainty and accounting for unobserved heterogeneity across counties. These findings demonstrate that interpretable models can perform as well as complex machine learning algorithms in this context, an important insight for resource-limited agricultural settings.
The proposed framework offers a scalable, transparent decision-support tool for precision disease management and enhances the resilience of smallholder coffee systems in Kenya and similar tropical environments.
Citation: Wanyonyi M, Akelo JG, Njenga VN, Keraro FO, Kioko TM (2026) Comparative and complementary use of Bayesian inference and supervised learning for predictive modeling of coffee rust incidence among Kenyan smallholder farmers. PLOS Clim 5(4): e0000754. https://doi.org/10.1371/journal.pclm.0000754
Editor: Girma Gezimu Gebre, ZALF: Leibniz-Zentrum fur Agrarlandschaftsforschung (ZALF) e V, GERMANY
Received: October 25, 2025; Accepted: April 1, 2026; Published: April 21, 2026
Copyright: © 2026 Wanyonyi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The anonymized dataset used in this study is publicly available in the Zenodo repository at https://doi.org/10.5281/zenodo.17861841. All personally identifiable information, including farmer identities and precise geographic coordinates, has been removed in accordance with data protection agreements with KALRO.
Funding: The authors received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Coffee leaf rust, caused by the obligate fungus Hemileia vastatrix, remains one of the most devastating pathogens affecting Arabica coffee production in the tropics. Reported yield losses can reach up to 50%, with severe socioeconomic consequences across affected regions [1,2]. In Kenya, a major producer and exporter of Arabica coffee, rust outbreaks have become increasingly frequent and severe, threatening a sector that employs over 600,000 smallholder farmers [3]. Beyond direct yield reductions, the disease undermines household food security, reduces farm incomes, and diminishes the competitiveness of Kenyan coffee in global markets.
Hemileia vastatrix depends critically on microclimatic conditions for sporulation, dispersal, and infection. Urediniospores require free moisture on leaf surfaces for germination, and infection success increases dramatically under prolonged leaf wetness (exceeding 10 hours) and high relative humidity (>75%). These conditions reduce leaf boundary layer resistance and facilitate spore adhesion and penetration. Temperature also modulates pathogen development, with optimal ranges of 18–24°C [2,4]. Such microclimatic dependencies underscore the importance of precise environmental monitoring in predictive rust modeling.
Current management practices rely primarily on fungicides and cultural strategies, but high costs, limited accessibility, and the absence of timely, location-specific guidance constrain these approaches. Effective disease prediction systems are therefore essential to enable preventive and cost-effective interventions at the farm level. Recent advances in remote sensing and machine learning have improved the capacity to monitor and predict coffee rust outbreaks. Studies using aerial imagery and vegetation indices have enabled image-based detection of canopy stress and disease hotspots [5,6]. Deep learning applications and wireless sensor networks have further enhanced early detection across diverse agroecological environments [7–9]. Regression-based models linking microclimate to rust dynamics have also been developed [4,10].
Despite these advances, most existing modeling approaches exhibit four persistent limitations. First, they provide deterministic forecasts with limited uncertainty quantification, reducing their value for decision support. Second, they lack scalability in complex shade-grown coffee systems by failing to account for hierarchical relationships among farms, villages, and counties. Third, they rely on inadequate longitudinal datasets for Kenyan smallholder systems, limiting generalizability. Fourth, remote sensing approaches often perform poorly in heterogeneous shade-grown environments due to canopy complexity and microclimatic variability. Although machine learning has been increasingly applied to predict coffee rust, most studies emphasize predictive accuracy without addressing interpretability or uncertainty quantification. For instance, spectroradiometer-based models and deep learning methods applied in Peruvian agroforestry systems have demonstrated high performance [11,12]. However, they still lack the probabilistic estimates needed for practical farm-level risk management.
Furthermore, few approaches combine mechanistic epidemiological understanding, such as leaf wetness duration and latent inoculum dynamics, with data-driven learning, which limits biological interpretability [4]. To address these gaps, this study presents a comparative evaluation of Bayesian hierarchical inference and supervised machine learning approaches for predicting coffee rust incidence in Kenyan smallholder systems, using their complementary strengths to generate probabilistic predictions. Bayesian inference provides a principled framework for quantifying uncertainty and modeling multilevel structure. Supervised learning algorithms, including Random Forests, Support Vector Machines, and Gradient Boosting methods, offer strong predictive capabilities [8,13–16]. By examining these approaches in parallel, we assess their relative performance and identify how each can contribute to more robust disease forecasting.
The specific objectives of this research were: (i) to estimate probabilistic determinants of coffee rust incidence using a Bayesian hierarchical logistic model; (ii) to compare predictive performance across eight supervised learning algorithms; and (iii) to incorporate post hoc explanation tools, namely SHapley Additive ExPlanations (SHAP) and Partial Dependence Plots, into a scalable decision support system for precision coffee disease management. This study provides a replicable, uncertainty-aware framework for predicting plant disease and informing climate-informed, data-driven interventions in smallholder coffee systems in Kenya.
Materials and methods
Study area and data sources
This research used secondary, anonymous data gathered by the Coffee Research Institute (CRI) within the Kenya Agricultural and Livestock Research Organization (KALRO). No personal or identifiable farmer information was included in the dataset, which CRI collected between 2018 and 2023 with technical assistance from World Coffee Research (WCR). There was no direct contact with human participants, as all data were obtained in accordance with existing KALRO research procedures and institutional data-sharing agreements. The study met all requirements for integrity and data protection set out in the University of Embu’s research integrity and data protection policies; therefore, no formal ethical approval was required.
The sample includes six major Arabica-producing counties in Kenya: Bungoma, Kericho, Kiambu, Kirinyaga, Murang’a, and Nyeri. These counties span a broad ecological gradient in altitude, microclimate, and management practices, which allows the dataset to capture diverse production environments. This ecological diversity provides a representative foundation for developing predictive models that generalize across smallholder coffee systems. The data include field observations of coffee leaf rust (Hemileia vastatrix) incidence, together with microclimatic, spatial, and agronomic measurements. This combination supports a comprehensive evaluation of environmental and management drivers of disease dynamics and enables the development of scalable, data-driven prediction models for smallholder coffee farming in Kenya.
Study variables
Rust incidence was the primary response variable and was recorded as a binary outcome indicating the presence or absence of visible coffee leaf rust infection on a given plot, as shown in S1 Fig and S2 Fig. Rust severity, which was also measured but used descriptively rather than as the main modelling outcome, was quantified as the percentage of leaf area affected by rust on each observed plant. These two measures are related but distinct: incidence indicates whether infection occurred, while severity reflects the extent of damage when it did.
Predictor variables included daily relative humidity, daily temperature, precipitation, leaf wetness duration, elevation, canopy normalized difference vegetation index (NDVI), coffee variety, plant age, shade percentage, fungicide application, fungicide application frequency per season, past outbreak history, lagged incidence from the previous week, and proximity to the nearest infected farm. Collectively, these predictors represent the environmental and agronomic factors known to influence disease dynamics.
Data preprocessing
Data preprocessing involved several steps to ensure consistency and analytical validity. First, units were standardised, and gaps in meteorological records were filled using kriging at the county level. Continuous predictor variables were then normalised to zero mean and unit variance. Categorical variables, such as coffee variety and fungicide use, were encoded using one-hot and binary encodings. To prevent temporal leakage, lagged predictors were constructed based on values from preceding observation periods. Exploratory analyses identified spatial clustering in the data, which informed the use of spatiotemporal cross-validation during model evaluation to account for spatial dependence. Multicollinearity was assessed using variance inflation factors (VIFs), and predictors with VIFs greater than 10 were considered to exhibit substantial multicollinearity.
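As an illustration, the preprocessing steps above might be sketched as follows. This is a minimal Python example on simulated records; the column names and data are hypothetical stand-ins, not the actual KALRO dataset, and kriging of meteorological gaps is omitted for brevity:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Illustrative plot-level records; column names are hypothetical stand-ins
# for the study's predictors, not the actual KALRO field names.
df = pd.DataFrame({
    "humidity": rng.normal(75, 8, 200),
    "leaf_wetness": rng.normal(11.5, 3, 200),
    "variety": rng.choice(["Ruiru11", "SL28", "SL34"], 200),
    "incidence": rng.integers(0, 2, 200),
})

# 1) Normalise continuous predictors to zero mean and unit variance.
for col in ["humidity", "leaf_wetness"]:
    df[col] = (df[col] - df[col].mean()) / df[col].std()

# 2) One-hot encode categorical predictors such as coffee variety.
df = pd.get_dummies(df, columns=["variety"], dtype=float)

# 3) Lagged incidence from the preceding observation period (shifting
#    forward in time so no future information leaks into the features).
df["incidence_lag1"] = df["incidence"].shift(1)
df = df.dropna().reset_index(drop=True)

# 4) Variance inflation factors: regress each predictor on the others;
#    VIF_j = 1 / (1 - R^2_j), with VIF > 10 flagging multicollinearity.
cols = ["humidity", "leaf_wetness", "incidence_lag1"]
X = df[cols].to_numpy()
vifs = {}
for j, name in enumerate(cols):
    target = X[:, j]
    A = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
    beta, *_ = np.linalg.lstsq(A, target, rcond=None)
    r2 = 1 - np.var(target - A @ beta) / np.var(target)
    vifs[name] = 1.0 / (1.0 - r2)
```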
Class distribution and balancing using SMOTE
The dataset contained 9,850 plot-time observations of rust incidence, with 6,295 observations classified as infected and 3,555 as noninfected. This imbalance posed a risk of biasing supervised learning models toward the majority class and reducing their ability to detect positive cases.
Methodological considerations for SMOTE application
The Synthetic Minority Oversampling Technique (SMOTE) generates synthetic samples of the minority class by interpolating between existing minority observations and their nearest neighbours in the feature space [13,14]. Formally, given a minority sample $x_i$ and one of its nearest neighbours $x_{nn}$, SMOTE generates a new synthetic sample $x_{new}$ as

$$x_{new} = x_i + \lambda \,(x_{nn} - x_i),$$

where $\lambda \sim \mathrm{Uniform}(0, 1)$ is a random scalar. This method increases minority class representation while preserving the structure of the feature space and avoiding simple duplication.
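A minimal numpy sketch of this interpolation rule (illustrative only; an actual analysis would typically use an implementation such as imbalanced-learn's SMOTE):

```python
import numpy as np

def smote_sample(X_min, k=5, n_new=100, seed=0):
    """Bare-bones sketch of the SMOTE rule (not the full imblearn API):
    pick a minority point x_i, one of its k nearest minority neighbours
    x_nn, and interpolate x_new = x_i + lam * (x_nn - x_i)."""
    rng = np.random.default_rng(seed)
    # Pairwise distances within the minority class only.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]           # k nearest neighbours per point
    base = rng.integers(0, len(X_min), n_new)   # random anchors x_i
    nbr = nn[base, rng.integers(0, k, n_new)]   # a neighbour x_nn of each anchor
    lam = rng.uniform(0.0, 1.0, (n_new, 1))     # lambda ~ Uniform(0, 1)
    return X_min[base] + lam * (X_min[nbr] - X_min[base])

X_min = np.random.default_rng(1).normal(size=(40, 3))
X_syn = smote_sample(X_min, n_new=60)
```

Because each synthetic point lies on a segment between two minority points, the generated samples stay within the minority class's convex hull.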
We acknowledge that applying SMOTE to spatiotemporal epidemiological data requires careful consideration, as synthetic samples may distort underlying spatial and temporal dependencies. To address this concern, we implemented several safeguards:
- SMOTE was applied only after the training-test split, ensuring that no synthetic data influenced test set evaluation. All performance metrics were computed exclusively on the original, unmodified test sets to prevent inflated predictive performance estimates.
- SMOTE was applied within each cross-validation fold during model development. For each fold of the five-fold cross-validation, SMOTE was applied only to the training subset, while the validation fold remained unchanged. This approach prevented data leakage and ensured unbiased performance estimation.
- SMOTE was used exclusively for supervised machine learning models (logistic regression, random forest, gradient boosting methods, support vector machine, artificial neural networks, and naive Bayes). The Bayesian hierarchical model was trained on the original, imbalanced data without SMOTE augmentation to preserve the probabilistic structure of the data-generating process.
- Temporal and spatial structure was respected by applying SMOTE only within training subsets defined by the spatiotemporal cross-validation framework, ensuring that synthetic samples were generated from observations within the same fold.
After applying SMOTE with these safeguards, the training dataset contained an equal number of positive and negative observations (5,036 each, matching the majority-class count in the 80% training split). This balanced distribution allowed supervised learning models to learn from data with improved class representation while maintaining the integrity of the spatiotemporal structure.
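The fold-wise safeguard can be illustrated as follows. This is a hedged sketch on simulated data: simple minority resampling with replacement stands in for SMOTE interpolation so the example stays self-contained, and only the training subset of each fold is balanced while the validation fold is left untouched:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 4))
y = (X[:, 0] + rng.normal(scale=1.5, size=600) > 1.0).astype(int)  # imbalanced

aucs = []
for tr, va in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    X_tr, y_tr = X[tr], y[tr]
    # Balance ONLY the training subset. Resampling the minority class with
    # replacement stands in here for SMOTE interpolation; the validation
    # fold keeps the original class distribution, so metrics are unbiased.
    counts = np.bincount(y_tr)
    minority = counts.argmin()
    idx = np.flatnonzero(y_tr == minority)
    extra = rng.choice(idx, counts.max() - counts.min(), replace=True)
    X_bal = np.vstack([X_tr, X_tr[extra]])
    y_bal = np.concatenate([y_tr, y_tr[extra]])

    model = LogisticRegression().fit(X_bal, y_bal)
    aucs.append(roc_auc_score(y[va], model.predict_proba(X[va])[:, 1]))

mean_auc = float(np.mean(aucs))
```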
Supervised learning algorithms
Several supervised learning algorithms were applied to the dataset. These include logistic regression, naïve Bayes classifier, random forests, gradient boosting methods (XGBoost, LightGBM, CatBoost), support vector machines (SVMs), and artificial neural networks (ANNs). For logistic regression, the model took the form:

$$\Pr(Y_{it} = 1 \mid X_{it}) = \frac{1}{1 + \exp\!\left[-\left(\beta_0 + \sum_j \beta_j X_{ijt}\right)\right]},$$

where $Y_{it}$ is the incidence for farm $i$ at time $t$, and $X_{ijt}$ are the predictors. The naïve Bayes classifier was specified as:

$$\hat{Y} = \arg\max_{y} \; P(Y = y) \prod_j P(X_j \mid Y = y),$$

assuming conditional independence among predictors. Random forest models were constructed as ensembles of $B$ classification trees:

$$\hat{Y}(X) = \operatorname{mode}\left\{T_b(X)\right\}_{b=1}^{B}.$$

Gradient boosting methods iteratively improved performance by fitting weak learners $h_m$ to residuals:

$$F_m(X) = F_{m-1}(X) + \eta \, h_m(X),$$

where $\eta$ is the learning rate. For support vector machines, the optimization problem was defined as:

$$\min_{w,\, b,\, \xi} \; \tfrac{1}{2}\|w\|^2 + C \sum_i \xi_i,$$

subject to $y_i(w^\top x_i + b) \geq 1 - \xi_i$ and $\xi_i \geq 0$. The artificial neural network employed a single hidden layer with rectified linear unit (ReLU) activation:

$$\hat{y} = \sigma\!\left(W_2 \, \mathrm{ReLU}(W_1 x + b_1) + b_2\right).$$
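For illustration, a compact scikit-learn sketch of fitting such a model suite on simulated data is shown below. Scikit-learn's GradientBoostingClassifier stands in for XGBoost/LightGBM/CatBoost, and MLPClassifier provides the single-hidden-layer ReLU network; the data and hyperparameters are illustrative, not the study's:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(42)
X = rng.normal(size=(800, 6))
logit = 1.2 * X[:, 0] + 0.8 * X[:, 1] - 0.5 * X[:, 2]
y = (rng.uniform(size=800) < 1 / (1 + np.exp(-logit))).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "naive_bayes": GaussianNB(),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "grad_boost": GradientBoostingClassifier(random_state=0),
    "svm": SVC(probability=True, random_state=0),
    "ann": MLPClassifier(hidden_layer_sizes=(16,), activation="relu",
                         max_iter=2000, random_state=0),
}
# Discrimination on the held-out split, one AUC-ROC per algorithm.
aucs = {name: roc_auc_score(y_te, m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1])
        for name, m in models.items()}
```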
Data split and model training
The dataset was split into training and test sets at an 80:20 ratio. All model development, including feature engineering and resampling, was performed exclusively on the training set to prevent information leakage. Model performance was evaluated on the held-out test set.
Within the training set, a spatiotemporal cross-validation strategy was implemented for hyperparameter tuning and internal validation across all supervised learning algorithms. Spatial folds were defined using county boundaries, resulting in six spatial blocks corresponding to the six sampled counties. Temporal variation was incorporated by dividing each county block into five temporal folds, ensuring that the training and validation sets differed in both space and time. This approach reduced the risk of over-optimistic error estimates arising from spatial and temporal data dependencies.
Hyperparameter optimization was performed separately for each supervised learning algorithm within the cross-validation framework. For logistic regression, the regularization strength was tuned. For random forest, the number of trees, maximum depth, and minimum sample splits were optimized. For gradient boosting models (XGBoost, LightGBM, and CatBoost), learning rate, number of boosting iterations, maximum depth, and subsampling parameters were tuned. For artificial neural networks, the number of hidden units, activation functions, and learning rate were selected through cross-validation. For support vector machines, the cost parameter and kernel-specific parameters were optimized, while the smoothing parameter was tuned for naive Bayes, where applicable. The optimal hyperparameters were selected based on cross-validated performance on the training data.
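A sketch of this tuning setup using scikit-learn's GridSearchCV with county-level grouping. GroupKFold approximates only the spatial blocking (the temporal sub-folds are omitted for brevity), and all data, county labels, and grid values here are simulated placeholders:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, GroupKFold

rng = np.random.default_rng(0)
n = 600
X = rng.normal(size=(n, 5))
y = (X[:, 0] + rng.normal(scale=1.0, size=n) > 0).astype(int)
county = rng.integers(0, 6, n)  # stand-in labels for the six county blocks

# Hold out entire counties in each fold so validation plots are spatially
# disjoint from the training plots.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300],
                "max_depth": [4, None],
                "min_samples_split": [2, 10]},
    scoring="roc_auc",
    cv=GroupKFold(n_splits=6),
)
search.fit(X, y, groups=county)  # groups route into the GroupKFold splitter
```

The best hyperparameters and cross-validated score are then available as `search.best_params_` and `search.best_score_`.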
Bayesian hierarchical logistic model for incidence
To model the probability of coffee rust incidence while accounting for heterogeneity across counties, we specified a Bayesian hierarchical logistic regression. The observation model was given as

$$y_i \sim \mathrm{Bernoulli}(p_i),$$

where $y_i$ indicates whether rust incidence was observed on plot $i$ and $p_i$ is the probability of incidence.
The linear predictor included both fixed environmental and management effects as well as county-level random effects:

$$\mathrm{logit}(p_i) = \beta_0 + \mathbf{x}_i^\top \boldsymbol{\beta} + u_{c[i]}.$$

Here, $\mathbf{x}_i$ includes relative humidity, temperature, precipitation, leaf wetness, elevation, NDVI, shade percentage, plant age, fungicide use, fungicide frequency, coffee variety, past outbreak history, lagged incidence, and distance to the nearest infected farm. The county-level random effect was modeled as

$$u_c \sim \mathcal{N}(0, \sigma_u^2),$$

capturing unobserved heterogeneity across counties.
We placed weakly informative priors on all model parameters: zero-centered normal priors on the intercept $\beta_0$ and the fixed-effect coefficients $\boldsymbol{\beta}$, and a half-normal prior on the county-level standard deviation $\sigma_u$.
Posterior inference was carried out using Hamiltonian Monte Carlo with four chains, 2,000 iterations per chain, and a 1,000-iteration warmup. Convergence diagnostics ($\hat{R}$) were close to 1 for all parameters, and effective sample sizes were sufficiently large to ensure stable estimates. The posterior summaries of fixed and random effects are reported together with their 94% highest density intervals (HDI).
To evaluate the adequacy of the model, posterior predictive checks were performed by generating replicated datasets from the fitted model and comparing the simulated and observed incidence distributions. The replicated data closely matched the observed patterns, indicating that the model was well calibrated. Model fit and out of sample predictive performance were further assessed using the Widely Applicable Information Criterion (WAIC) and Leave One Out Cross Validation.
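The hierarchical specification above can be written down as an unnormalised log posterior. The numpy sketch below assumes illustrative standard-normal priors on the coefficients and a half-normal prior on the county-level standard deviation; the paper's exact prior scales are not reproduced here, and the data are simulated:

```python
import numpy as np

def log_posterior(beta0, beta, u, log_sigma_u, X, y, county):
    """Unnormalised log posterior of the hierarchical logistic model:
    y_i ~ Bernoulli(p_i), logit(p_i) = beta0 + x_i' beta + u_{c[i]},
    u_c ~ N(0, sigma_u^2), with illustrative weakly informative priors."""
    sigma_u = np.exp(log_sigma_u)
    eta = beta0 + X @ beta + u[county]
    # Bernoulli log-likelihood via log-sigmoid for numerical stability:
    # log p = eta - log(1 + e^eta), log(1 - p) = -log(1 + e^eta).
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    # County random effects: u_c ~ N(0, sigma_u^2).
    lp_u = -0.5 * np.sum((u / sigma_u) ** 2) - len(u) * np.log(sigma_u)
    # Illustrative N(0, 1) priors on intercept and coefficients.
    lp_beta = -0.5 * (beta0 ** 2 + np.sum(beta ** 2))
    # Half-normal prior on sigma_u plus the log-Jacobian of the log transform.
    lp_sigma = -0.5 * sigma_u ** 2 + log_sigma_u
    return loglik + lp_u + lp_beta + lp_sigma

rng = np.random.default_rng(0)
n, p, C = 300, 4, 6
X = rng.normal(size=(n, p))
county = rng.integers(0, C, n)
u_true = rng.normal(0, 0.3, C)
beta_true = np.array([1.0, 0.5, -0.5, 0.0])
eta = -1.0 + X @ beta_true + u_true[county]
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-eta))).astype(int)

lp = log_posterior(-1.0, beta_true, u_true, np.log(0.3), X, y, county)
```

In practice this density would be handed to an HMC sampler (e.g. via Stan or PyMC) rather than evaluated by hand; the sketch only makes the model's structure concrete.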
Model evaluation and scalability analysis
Model performance was assessed using several complementary criteria. Discriminative ability was evaluated using the AUC-ROC and AUC-PR. Calibration was assessed by comparing predicted probabilities with observed incidence rates, and overall accuracy was computed as the proportion of correctly classified observations. The spatiotemporal block cross-validation scheme ensured that training and test folds were independent in both space and time. Model outputs were also compared with the threshold-based forecasting rules currently used in rust management to provide a relative benchmark.
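These criteria map directly onto standard scikit-learn utilities; a brief sketch with placeholder predictions (the probabilities below are synthetic, not model outputs from the study):

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import (accuracy_score, average_precision_score,
                             roc_auc_score)

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)
# Noisy probabilities correlated with the labels (placeholder predictions).
y_prob = np.clip(0.3 * y_true + rng.uniform(0, 0.7, 1000), 0, 1)

auc_roc = roc_auc_score(y_true, y_prob)           # discrimination (ROC)
auc_pr = average_precision_score(y_true, y_prob)  # discrimination (PR)
acc = accuracy_score(y_true, y_prob >= 0.5)       # thresholded accuracy
# Calibration: fraction of positives vs mean predicted probability per bin.
frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=10)
```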
Computational scalability was examined to determine the practicality of applying the models in resource-limited agricultural extension settings. The scalability metrics included model size at the end of training, time per training iteration, inference time for batches of 1,000 predictions, and peak memory use during training. All machine learning models were implemented in Python using the scikit-learn library, and scalability measurements were obtained within a standardized computational environment.
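A minimal sketch of how such scalability metrics could be collected with the Python standard library. Timings and memory figures are machine-dependent, and the model and data here are illustrative:

```python
import pickle
import time
import tracemalloc

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y = (X[:, 0] > 0).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0)

# Peak memory and wall-clock time for training.
tracemalloc.start()
t0 = time.perf_counter()
model.fit(X, y)
train_time = time.perf_counter() - t0
_, peak_bytes = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Serialized model size at the end of training.
model_bytes = len(pickle.dumps(model))

# Inference time for a batch of 1,000 predictions.
X_batch = rng.normal(size=(1000, 10))
t0 = time.perf_counter()
model.predict(X_batch)
infer_time = time.perf_counter() - t0
```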
Results
Descriptive statistics
Table 1 summarizes the key agroecological and management predictors used to model coffee leaf rust incidence on Kenyan smallholder farms. The analysis included 9,850 plot-time observations spanning diverse environmental and agronomic conditions. Average daily relative humidity was 74.86% (SD = 8.07), and mean daily temperature was 19.26°C (SD = 2.40), reflecting the moderate thermal conditions typical of Arabica coffee growing areas in the Kenyan highlands. Mean precipitation was 4.85 mm per day, and average leaf wetness duration was 11.57 hours per day, both indicating microclimatic conditions favorable for Hemileia vastatrix sporulation and infection.
Mean elevation was 1,651 m above sea level (SD = 289.76), consistent with the upper midland zones where coffee is cultivated in Kenya. Average plant age was approximately eight years (M = 7.99, SD = 5.60), indicating that most coffee stands were in their prime productive period. Mean shade coverage was 35% (SD = 17.56), suggesting moderate canopy complexity that can influence disease microclimate by modifying humidity and light interception. Fungicides were applied in 45.2% of observations, with a mean application frequency of 1.13 times per season (SD = 1.48). A history of past rust outbreaks was reported for 17.8% of farms, and mean lagged weekly incidence was 0.19, indicating moderate temporal continuity in disease occurrence. Mean Normalized Difference Vegetation Index (NDVI) was 0.59 (SD = 0.08), reflecting generally healthy canopy conditions. Average distance to the nearest infected farm was 811 m (SD = 813.42), revealing substantial spatial heterogeneity in rust pressure. Mean incidence across the dataset was 0.36 (SD = 0.48), and mean rust severity was 19.98% (SD = 28.64), indicating considerable variation in infection intensity across farms and seasons.
These descriptive patterns underscore the complex, multifactorial nature of coffee rust epidemiology in smallholder systems. The observed variation in microclimate and management practices suggests that effective prediction models must account for both environmental gradients and farmer decision making regarding fungicide use, shade management, and spatial distribution.
Coffee rust severity distribution analysis
S3 Fig shows the distribution of coffee rust severity, measured as the percentage of leaf area infected by Hemileia vastatrix, across 9,850 observations from smallholder farms. The severity distribution exhibited strong right skew, with most plots concentrated at the lower end of the scale (0–20%). This pattern aligns with the class imbalance in incidence data, where 63.9% of plots were classified as rust-free. Mean severity was 19.98% (SD = 28.64), indicating substantial variation in disease expression across farms. Although low severity cases predominated, a notable proportion of moderate to severe infections (20–100%) was observed, demonstrating persistent high disease pressure in some areas. The extended upper tail of the distribution, reaching 100% severity, represents instances of complete leaf area loss and highlights the potential for catastrophic yield reductions in uncontrolled or severely affected situations.
The observed severity distribution is consistent with known coffee rust epidemiology. Environmental factors including high humidity, prolonged leaf wetness, and frequent precipitation create favorable conditions for spore germination and infection, while management factors such as fungicide application, shade control, and plant age also influence disease progression. Spatial proximity to previously infected farms further amplifies infection intensity, underscoring the importance of local dispersal dynamics. The strong skew and concentration of low severity in most farms imply that a subset of highly affected farms contributes disproportionately to the overall disease burden. These findings support spatially targeted interventions, early detection, and risk based fungicide scheduling to prevent outbreak development and reduce economic losses.
Exploratory analysis of key predictors by disease incidence
S4 Fig presents boxplots comparing the distribution of key environmental and spatial predictors between plots with (1) and without (0) recorded coffee rust infection. The variables examined include daily relative humidity, leaf wetness duration, precipitation, NDVI, shade percentage, and distance to the nearest infected farm. These represent the microclimatic and canopy structural factors hypothesized to influence disease development.
Clear differences emerged between infected and non-infected groups for most predictors, indicating strong microclimatic regulation of coffee rust epidemics. Relative humidity and leaf wetness duration were substantially higher in infected plots, suggesting that prolonged leaf moisture is a dominant factor in spore germination and infection. Infected plots exhibited median relative humidity above 75%, compared to approximately 72% in non-infected plots, and median leaf wetness duration exceeded 12 hours per day in infected plots versus lower durations in healthy plots—conditions known to favor Hemileia vastatrix sporulation.
Precipitation patterns followed similar trends, with higher rainfall observed in infected plots. This confirms that moisture and wet canopy conditions promote rust development, and that regular rainfall episodes sustain environments conducive to fungal growth through elevated humidity. NDVI values were slightly higher in infected plots, indicating denser or more vigorous canopies that generate humid microenvironments supporting pathogen survival.
Shade percentage showed a modest positive association with infection, suggesting that increased canopy cover may reduce evaporative drying and prolong leaf wetness. Conversely, distance to the nearest infected farm was substantially shorter for infected plots, highlighting spatial contagion and probable pathogen transmission through windborne urediniospores or rain splash.
These boxplots indicate that higher relative humidity, longer leaf wetness duration, and shorter distance to infected farms are the strongest discriminators between healthy and diseased plots. The results align with previous epidemiological studies and support the inclusion of these predictors in subsequent regression and machine learning models.
Class distribution and SMOTE balancing
The dataset exhibited substantial class imbalance prior to applying the Synthetic Minority Oversampling Technique (SMOTE). Non-infected plots numbered 6,295 (63.9%), while infected plots numbered 3,555 (36.1%), as shown in Fig 1. This imbalance risked biasing supervised learning models toward the majority class, potentially underestimating disease risk for minority cases.
SMOTE generates synthetic samples of the minority class by interpolating between existing minority observations and their nearest neighbors in feature space. This approach increases minority class representation while preserving the intrinsic structure of the data, avoiding simple duplication. After applying SMOTE to the training data only, the class distribution was balanced with 5,036 instances per class (Fig 2), based on an 80:20 train-test split where the original training set contained approximately 5,036 majority class observations (80% of 6,295). This balanced distribution enhanced the classifiers’ ability to detect positive rust incidence, particularly for infrequent or emerging outbreaks, as reflected in subsequent improvements in recall and F1 scores across algorithms. All performance metrics were computed on the original, unmodified test data to prevent inflated estimates.
Logistic regression analysis
Binary logistic regression with robust standard errors examined the influence of microclimatic, agronomic, and spatial factors on coffee rust probability. Table 2 presents coefficient estimates, confidence intervals, and odds ratios.
Moisture-related microclimatic variables strongly increased infection risk. Higher daily relative humidity (OR = 2.70, p < .001), greater precipitation (OR = 1.59, p < .001), and longer leaf wetness duration (OR = 3.03, p < .001) were all positively associated with disease probability. These findings confirm that prolonged leaf wetness and high humidity create favorable conditions for pathogen germination and infection.
Plant characteristics also influenced disease risk. Older plants (OR = 1.32, p < .001) and denser shade cover (OR = 1.25, p < .001) were linked to higher rust incidence, likely because they sustain microclimates that favor pathogen survival. Vegetation vigor, measured by NDVI, showed a smaller but significant positive effect (OR = 1.17, p < .001).
Higher temperatures (OR = 0.77, p < .001) and greater distance to infected farms (OR = 0.53, p < .001) reduced disease likelihood, suggesting that warmer conditions and spatial isolation limit pathogen spread. Fungicide use significantly lowered disease odds (OR = 0.65, p < .001), confirming its effectiveness when applied appropriately. Application frequency showed only a marginal effect (OR = 0.91, p = .060), indicating limited benefits from additional applications beyond optimal levels.
Cultivar differences were evident. Ruiru 11 exhibited lower infection probability compared to traditional cultivars (OR = 0.59, p < .001), consistent with its known genetic resistance. SL28 (OR = 1.24, p = .035) and SL34 (OR = 1.23, p = .064) showed higher susceptibility, reflecting cultivar-specific vulnerability. Past outbreak history (OR = 1.69, p < .001) and lagged incidence (OR = 2.97, p < .001) strongly predicted current infection, emphasizing temporal persistence and localized carryover effects.
Variance inflation factors (Table 3) were all below 5, indicating no substantial multicollinearity. The highest association was between fungicide use and application frequency, but this did not compromise model stability.
Bayesian hierarchical logistic regression
We fitted a Bayesian hierarchical logistic regression model to estimate the effects of environmental, agronomic, and spatial factors on coffee rust incidence while accounting for county level heterogeneity. The model included county level random intercepts to allow partial pooling of information across the six sampled counties. Table 4 presents posterior summaries for all parameters.
The Bayesian model incorporated the same core predictors as the logistic regression, with the addition of several composite variables derived from field measurements:
- Management Intensity: A composite score reflecting overall management activity, combining fungicide application frequency, pruning practices, and weeding frequency (standardized composite)
- Farm Density: Number of coffee farms within a 1 km radius, derived from spatial coordinates
- Canopy Structure: Derived from LiDAR measurements of canopy height variability and vertical complexity
- Soil Moisture Retention: Estimated from soil texture analysis and organic matter content, categorized into low, medium, and high retention classes
These additional variables were included in the Bayesian model to better capture farm level heterogeneity and spatial processes that the hierarchical structure can accommodate, while the supervised learning models focused on the core predictor set for comparability across algorithms.
All parameters showed excellent convergence ( for all parameters). The estimated county level variance was small (
, 94% HDI [0.000, 0.125]), indicating that most spatial variation in coffee rust incidence was captured by the fixed effects. Posterior distributions were narrow, reflecting high confidence in key predictors.
Microclimatic moisture strongly increased infection probability, consistent with the logistic regression results. Higher daily relative humidity (94% HDI [0.932, 1.077]), greater precipitation (94% HDI [0.414, 0.532]), and longer leaf wetness duration (94% HDI [1.038, 1.188]) were associated with elevated rust risk. Plant age (94% HDI [0.213, 0.329]), shade (94% HDI [0.149, 0.268]), and NDVI (94% HDI [0.092, 0.209]) also showed positive effects.
Higher daily temperatures (94% HDI [-0.337, -0.152]) and greater distance to infected farms (94% HDI [-0.716, -0.574]) reduced infection likelihood. The composite variables all showed positive associations: management intensity (94% HDI [0.051, 0.460]), farm density (94% HDI [0.021, 0.476]), canopy structure (94% HDI [0.345, 0.647]), and soil moisture retention (94% HDI [0.928, 1.236]).
Note on coefficient interpretation
We observed that the coefficient for Past Outbreak History differed in sign between the logistic regression (positive) and the Bayesian model (negative). This apparent contradiction arises from differences in model specification and variable coding. In the logistic regression, Past Outbreak History was entered as a simple binary indicator (0/1). In the Bayesian model, this variable was interacted with management intensity and soil moisture retention to better capture conditional effects. The negative main effect in the Bayesian model represents the baseline effect when interacting variables are at their mean values, while the positive effect in the logistic regression represents the unconditional marginal effect. This highlights the importance of interpreting coefficients within their specific model contexts rather than as direct equivalents.
Posterior diagnostics and convergence assessment
Posterior inference used Hamiltonian Monte Carlo with four chains, each running 2,000 iterations including a 1,000 iteration warmup. Convergence was assessed through trace plots, posterior predictive checks, and information criteria.
S5 Fig shows posterior density and trace plots for the intercept and county level variance component. Trace plots indicate well mixed chains with no trends or autocorrelation, suggesting stable exploration of the posterior distribution. Density plots are continuous and unimodal, confirming convergence and absence of sampling anomalies. The posterior mean for the intercept is approximately -1.0, representing baseline log odds of infection when predictors are at mean values. The narrow posterior for the county level variance indicates minimal residual heterogeneity after accounting for environmental and management effects.
S6 Fig shows the pairwise correlation between the intercept and the county level standard deviation. No strong correlations are evident, indicating that intercept estimates remain stable across different levels of county variance, supporting the hierarchical model specification.
Posterior predictive checks (S7 Fig) compared observed incidence distributions with replicated datasets generated from the fitted model. The close match between observed and predicted distributions indicates that the model adequately represents the underlying data generating process and is well calibrated for probabilistic prediction.
Model fit and out of sample predictive performance were assessed using the Widely Applicable Information Criterion (WAIC) and Leave One Out Cross Validation (LOO-CV). Table 5 summarizes these results. Both criteria produced low estimates of deviance with small standard errors, indicating strong generalizability and minimal overfitting. All Pareto-k diagnostics were below 0.7, confirming reliable importance sampling. Effective sample sizes exceeded 1,000 for all parameters, and R̂ values were 1.00, indicating adequate mixing and independence of posterior draws.
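For reference, WAIC is computed from the pointwise log-likelihood matrix of posterior draws. The sketch below uses synthetic draws rather than output from the fitted model, so the resulting value is purely illustrative.

```python
import numpy as np

def waic(log_lik):
    """WAIC from an (S draws x N observations) pointwise log-likelihood matrix:
    -2 * (lppd - p_waic), where p_waic is the summed posterior variance."""
    lppd = np.sum(np.log(np.mean(np.exp(log_lik), axis=0)))
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))
    return -2.0 * (lppd - p_waic)

rng = np.random.default_rng(0)
ll = rng.normal(-0.5, 0.05, size=(1000, 200))  # synthetic draws, not model output
w = waic(ll)
```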
Note on evaluation consistency
The Bayesian model was evaluated using LOO-CV and WAIC, which do not explicitly account for the spatial structure inherent in the data. This differs from the spatiotemporal block cross-validation used for the supervised learning models. While the county level random effects in the Bayesian model capture some spatial heterogeneity, the evaluation metrics themselves do not enforce spatial separation between training and test observations. This methodological inconsistency should be considered when comparing predictive performance across approaches.
Performance evaluation of machine learning algorithms
Table 6 presents predictive performance for eight supervised learning algorithms. Metrics include accuracy, sensitivity (recall), specificity, precision, F1 score, and Matthews correlation coefficient (MCC), providing a comprehensive assessment of discriminative ability and generalization.
Random Forest achieved the highest overall accuracy (0.7843) and specificity (0.8173), correctly identifying non-infected plots more reliably than other models. However, its sensitivity (0.7257) was lower than that of Logistic Regression, indicating some underdetection of infected cases. Logistic Regression achieved the highest sensitivity (0.7904) and the best balance across metrics (F1 = 0.7210, MCC = 0.5464), while maintaining interpretability and computational efficiency—key advantages for applied agricultural settings.
Support Vector Machine (SVM) and CatBoost also demonstrated strong performance, with F1 scores exceeding 0.70 and MCC values above 0.52, indicating good classification of both positive and negative instances. LightGBM and XGBoost showed slightly lower sensitivity, suggesting more conservative behaviour at the default classification threshold. Naive Bayes produced moderate results (accuracy = 0.7548, F1 = 0.6894), consistent with its independence assumptions. Artificial Neural Network (ANN) showed the lowest accuracy (0.7452), which may reflect the relatively modest dataset size and the single hidden layer architecture employed.
Overall, ensemble and linear models outperformed deep or probabilistic classifiers in this context. Logistic Regression, Random Forest, SVM, and CatBoost offered the strongest tradeoffs among accuracy, sensitivity, and specificity, making them suitable candidates for operational coffee rust risk prediction in smallholder systems.
Note on cross-validation implementation
Although the Methods section describes a spatiotemporal block cross-validation strategy with six spatial blocks (counties) and five temporal folds, the results in Table 6 are presented as aggregate performance metrics averaged across folds. This aggregation was performed to provide an overview of model performance, but we acknowledge that this masks potential variation across counties and time periods. Future work should report stratified performance metrics to better characterize spatial and temporal heterogeneity in model performance.
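The spatial component of the block cross-validation described in the Methods amounts to leave-one-county-out splitting, which can be sketched as follows (the county labels here are generic placeholders, not the six study counties):

```python
import numpy as np

def spatial_block_splits(counties):
    """Yield (county, train_idx, test_idx), holding out one county per fold."""
    counties = np.asarray(counties)
    for c in np.unique(counties):
        test = np.flatnonzero(counties == c)
        train = np.flatnonzero(counties != c)
        yield c, train, test

labels = ["C1", "C2", "C1", "C3", "C2", "C3"]   # placeholder county labels
folds = list(spatial_block_splits(labels))
```

Stratified reporting would simply record the performance metrics per yielded fold instead of averaging them.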
Fig 3 presents ROC curves for all eight supervised learning models. ROC curves plot true positive rate against false positive rate across classification thresholds, with area under the curve (AUC) providing a summary measure of discriminative ability.
Higher AUC values indicate stronger discrimination between infected and non-infected plots.
All models demonstrated AUC values above 0.80, indicating predictive power substantially better than random chance. Logistic Regression achieved the highest AUC (0.867), showing superior discrimination between infected and non-infected plots across varying agroecological conditions. This aligns with its strong sensitivity and F1 score from Table 6, confirming that logistic regression balances detection accuracy with interpretability.
Kernel and ensemble methods also performed well: SVM (AUC = 0.854), CatBoost (0.850), Random Forest (0.848), and LightGBM (0.845) all showed strong discrimination, reflecting their ability to capture nonlinear relationships among predictors. ANN exhibited the lowest AUC (0.808), suggesting limited generalization consistent with its lower accuracy in Table 6. Naive Bayes achieved an AUC of 0.828, reasonable given its simplifying independence assumptions.
These results demonstrate that all models provide useful discrimination for early warning and decision support systems. Logistic Regression offers the best combination of interpretability and predictive power, while ensemble learners provide competitive performance with greater computational demands.
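The AUC values above have a direct ranking interpretation: the probability that a randomly chosen infected plot receives a higher predicted risk than a randomly chosen non-infected plot. A minimal sketch via the rank-based (Mann-Whitney) formulation:

```python
import numpy as np

def auc_roc(y_true, scores):
    """AUC as P(score of random positive > score of random negative),
    computed from average ranks (ties share their mean rank)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    order = scores.argsort()
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    for s in np.unique(scores):          # average ranks for tied scores
        mask = scores == s
        ranks[mask] = ranks[mask].mean()
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

y = [0, 0, 1, 1, 0, 1]                   # toy labels, not study data
p = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
```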
Precision–recall (PR) curve analysis
Fig 4 shows Precision-Recall curves for all eight models. PR curves focus on the positive (infected) class, making them particularly informative for imbalanced data. Precision measures the proportion of predicted positives that are truly positive, while recall (sensitivity) measures the proportion of actual positives correctly identified.
Higher AUC-PR values indicate better tradeoffs between precision and recall in detecting infected plots.
All models achieved AUC-PR values above 0.70, indicating adequate detection of true infections across probability thresholds. Logistic Regression again led with AUC-PR of 0.792, demonstrating excellent balance between detecting infected plots (recall) and maintaining prediction accuracy (precision). This strong performance aligns with its high AUC-ROC and underscores its operational utility.
Kernel and ensemble learners performed well: SVM (0.774), CatBoost (0.769), LightGBM (0.762), and Random Forest (0.759) showed consistent precision-recall tradeoffs across microclimatic conditions. XGBoost (0.748) and Naive Bayes (0.723) exhibited slightly lower precision at higher recall values, indicating greater false positive risk. ANN had the lowest AUC-PR (0.714), reflecting lower precision stability and potential overfitting.
These results confirm that Logistic Regression, SVM, and CatBoost provide stable detection of rare infections, making them suitable for early warning systems where minimizing false alerts is important.
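The AUC-PR values above correspond to average precision: the mean of the precision values attained at each rank where a true infection is retrieved. A minimal sketch (toy data, and assuming untied scores for simplicity):

```python
import numpy as np

def average_precision(y_true, scores):
    """AUC-PR as average precision: mean precision at each true-positive rank."""
    order = np.argsort(scores)[::-1]            # sort by descending score
    y = np.asarray(y_true)[order]
    precision = np.cumsum(y) / np.arange(1, len(y) + 1)
    return precision[y == 1].mean()

y = [0, 0, 1, 1, 0, 1]                          # toy labels, not study data
p = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
ap = average_precision(y, p)
```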
Calibration analysis
Model calibration was assessed by comparing predicted probabilities with observed frequencies of coffee rust incidence. Fig 5 shows calibration curves for all eight models; the diagonal dashed line represents perfect calibration where predicted probabilities match empirical outcomes.
Brier scores: Logistic Regression (0.182), CatBoost (0.189), Random Forest (0.194), LightGBM (0.195), XGBoost (0.197), SVM (0.199), Naive Bayes (0.208), ANN (0.215).
Logistic Regression and CatBoost demonstrated the best calibration, with curves closely following the diagonal across the probability range. This indicates well calibrated probabilistic outputs suitable for decision support applications where accurate risk probabilities are essential. Their Brier scores (0.182 and 0.189, respectively) confirmed strong calibration.
Random Forest showed slight overestimation in the mid to high probability range (0.5-0.9), indicating overconfident predictions of infection presence (Brier score = 0.194). XGBoost and LightGBM exhibited modest underestimation in lower probability bins but approached the diagonal at higher bins, with Brier scores of 0.197 and 0.195 respectively. SVM and Naive Bayes showed reasonable but less consistent calibration in the middle range, likely due to class overlap and kernel boundary smoothing (Brier scores = 0.199 and 0.208). ANN displayed the most variable calibration, underestimating low probabilities and overestimating high probabilities (Brier score = 0.215), suggesting that post hoc calibration methods such as Platt scaling or isotonic regression would improve its probabilistic outputs.
Overall, inherently probabilistic models like Logistic Regression and CatBoost showed superior calibration, enhancing their suitability for disease forecasting and early warning applications in smallholder coffee systems.
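The two calibration diagnostics used above, the Brier score and binned calibration curves, can be sketched directly; the bin count and toy data below are illustrative choices.

```python
import numpy as np

def brier_score(y_true, p_hat):
    """Mean squared difference between predicted probability and outcome."""
    y_true, p_hat = np.asarray(y_true, float), np.asarray(p_hat, float)
    return np.mean((p_hat - y_true) ** 2)

def calibration_curve(y_true, p_hat, n_bins=10):
    """(mean predicted probability, observed frequency) per occupied bin."""
    p_hat = np.asarray(p_hat, float)
    y_true = np.asarray(y_true, float)
    bins = np.minimum((p_hat * n_bins).astype(int), n_bins - 1)
    pairs = []
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            pairs.append((p_hat[mask].mean(), y_true[mask].mean()))
    return pairs
```

A perfectly calibrated model's curve lies on the diagonal, i.e. each pair has matching coordinates, and its Brier score approaches the irreducible outcome variance.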
Model scalability and deployment feasibility
Table 7 presents computational scalability metrics for all eight supervised learning algorithms, including training time, model size, inference latency, and peak RAM usage. These metrics are critical for assessing deployment feasibility in resource constrained smallholder agricultural settings.
Traditional linear models were most computationally efficient. Logistic Regression required minimal training time (0.0135 s), model size (0.0010 MB), and inference latency (0.2204 ms per 1,000 predictions), with moderate RAM usage (565.97 MB). Naive Bayes was similarly efficient, with the shortest training time (0.0064 s), comparable model size (0.0014 MB), and fast inference (0.2823 ms per 1,000 predictions). These characteristics make both models suitable for on device or edge computing applications.
Among ensemble methods, XGBoost and CatBoost offered favorable accuracy-scalability tradeoffs. Both achieved subsecond inference times and model sizes below 0.35 MB, demonstrating that optimized gradient boosting implementations can deliver strong predictive performance with manageable computational demands. LightGBM also showed practical training times (0.2682 s) and moderate inference latency (1.9435 ms per 1,000 predictions), with reasonable memory footprint (580.63 MB).
Random Forest and SVM were most resource intensive. Random Forest had the largest model size (18.92 MB) and slowest inference (7.7146 ms per 1,000 predictions). SVM required the longest training time (18.1222 s), limiting its practicality for real time or embedded applications. ANN showed intermediate scalability: slower training (12.4286 s) but reasonable inference latency (0.3396 ms per 1,000 predictions) and modest model size (0.1189 MB), suggesting that optimized lightweight architectures could be viable.
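The latency and model-size measurements in Table 7 can in principle be reproduced with standard-library timing and serialization. The linear scorer below is a stand-in for any fitted model, and the weights are placeholders.

```python
import pickle
import time

def predict_batch(weights, rows):
    """Linear-score predictions for a batch (stand-in for a fitted model)."""
    return [sum(w * x for w, x in zip(weights, row)) for row in rows]

weights = [1.1, 0.47, -0.2]                  # illustrative model parameters
rows = [[0.5, 0.2, -0.1]] * 1000             # batch of 1,000 plots

t0 = time.perf_counter()
scores = predict_batch(weights, rows)
latency_ms = (time.perf_counter() - t0) * 1000.0     # ms per 1,000 predictions
model_size_mb = len(pickle.dumps(weights)) / (1024 * 1024)
```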
Note on Bayesian model scalability
The Bayesian hierarchical model, which required Hamiltonian Monte Carlo with four chains and 2,000 iterations, was excluded from this scalability analysis. Its computational demands are substantially higher than any supervised learning model evaluated, with training times on the order of hours rather than seconds. While this limits its use for real time prediction, its primary value lies in uncertainty quantification and inference rather than operational deployment.
Partial dependence analysis
Partial dependence plots (PDPs) show the marginal effects of key environmental predictors on coffee rust probability, holding all other variables constant. Figs 6–8 display PDPs for NDVI, daily relative humidity, and leaf wetness duration. All predictors were standardized (mean = 0, SD = 1) to enable direct comparison of relative effects.
NDVI showed a weak but consistent positive relationship with infection probability (Fig 6). Probability increased from approximately 0.32 to 0.38 across the NDVI range, representing a relative increase of about 19% (Table 8). This modest gradient suggests that denser canopies with higher NDVI create more humid microclimates that extend leaf wetness and facilitate infection. However, the shallow slope indicates that vegetation density is secondary to direct moisture variables.
Daily relative humidity demonstrated a strong exponential relationship with infection probability (Fig 7). The PDP shows a relative increase of approximately 267% across the standardized humidity gradient (0.15 to 0.55), meaning infection likelihood more than triples as humidity increases from low to high levels relative to baseline. The sharp increase beyond a standardized humidity value of 0.5 indicates a clear threshold effect, consistent with the biological requirement for high atmospheric moisture to stimulate spore germination.
Leaf wetness duration was the most influential predictor (Fig 8). Infection probability increased nearly linearly across the wetness range from 0.10 to 0.65, corresponding to a relative increase of approximately 550% over baseline. This near linear relationship underscores the central role of surface moisture in urediniospore germination and leaf penetration, identifying prolonged wetness as the primary driver of epidemic formation.
These patterns, summarized in Table 8, indicate that microclimatic monitoring should prioritize leaf wetness duration and relative humidity thresholds. Early warning systems can be improved by tracking humidity and wetness levels above which infection risk increases sharply. While NDVI contributes less directly, canopy management remains important for indirectly moderating microclimate and reducing conditions favorable for rust development.
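The partial dependence computation itself is simple: force one feature to each grid value for every observation and average the model's predictions. The model and data below are illustrative stand-ins, not the fitted classifiers.

```python
import numpy as np

def partial_dependence(model, X, feature, grid):
    """Average prediction with column `feature` forced to each grid value."""
    pd_vals = []
    for g in grid:
        Xg = X.copy()
        Xg[:, feature] = g
        pd_vals.append(model(Xg).mean())
    return np.array(pd_vals)

# Illustrative model: logistic in standardized "leaf wetness" (feature 0)
model = lambda X: 1.0 / (1.0 + np.exp(-(-1.0 + 1.1 * X[:, 0] + 0.3 * X[:, 1])))
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
pd_curve = partial_dependence(model, X, feature=0, grid=np.linspace(-2, 2, 9))
```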
SHAP analysis
SHAP (SHapley Additive exPlanations) analysis quantified each predictor’s contribution to model predictions, enabling global and local interpretability. Fig 9 presents the SHAP summary plot, ranking predictors by importance and showing the direction of their effects.
Moisture related variables dominated feature importance. Leaf wetness duration and daily relative humidity showed the largest positive effects, confirming their central role in the moisture dependent infection cycle of Hemileia vastatrix. Infection probability increased markedly when relative humidity exceeded approximately 75% and when leaf wetness persisted beyond ten hours.
Spatial and temporal variables followed in importance. Distance to infected farm (negative direction) confirmed that closer proximity increases inoculum dispersal through wind and rain splash. Precipitation (positive) sustains canopy moisture and aids spore transmission. Lagged incidence (positive) captures pathogen persistence and local inoculum buildup over time.
Management variables showed moderate influence. Fungicide frequency (negative) indicates effective pathogen suppression with repeated applications. Shade percentage (positive) suggests that denser shade prolongs leaf wetness. Plant age (positive) reflects inoculum accumulation and reduced physiological resistance in older trees.
Temperature showed mixed effects, consistent with nonlinear relationships reflecting optimal ranges for pathogen activity. NDVI contributed modestly through microclimatic modulation. Cultivar type and elevation ranked lowest in importance, suggesting that environmental and management heterogeneity exert greater influence than fixed plant characteristics.
Table 9 summarizes directional effects with biological interpretations. The consistency between SHAP results, partial dependence analysis, and Bayesian posterior estimates supports the ecological validity of the comparative modeling approach. SHAP’s ability to reveal nonlinear and interaction effects enhances interpretability beyond traditional regression, particularly for identifying critical humidity and wetness thresholds relevant for early warning systems.
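The attribution idea behind SHAP can be illustrated with an exact Shapley computation on a tiny model. The study used the SHAP library's efficient approximations; the enumeration below is exponential in the number of features and serves only to show the definition, with illustrative coefficients and baseline.

```python
import numpy as np
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values for f at x; absent features take baseline values."""
    n = len(x)
    phi = np.zeros(n)
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for size in range(n):
            for S in combinations(others, size):
                w = factorial(size) * factorial(n - size - 1) / factorial(n)
                z_without = baseline.copy()
                for k in S:
                    z_without[k] = x[k]
                z_with = z_without.copy()
                z_with[j] = x[j]
                phi[j] += w * (f(z_with) - f(z_without))
    return phi

# For a linear model, Shapley values reduce to beta_j * (x_j - baseline_j)
beta = np.array([1.1, 0.47, -0.2])
f = lambda z: float(z @ beta)
x = np.array([0.5, -0.3, 0.8])
base = np.zeros(3)
phi = shapley_values(f, x, base)
```

The attributions sum to the difference between the prediction at x and at the baseline, which is the additivity property that makes SHAP summary plots interpretable.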
Discussion
Environmental and spatial drivers
This study provides an evidence-driven assessment of environmental and spatial predictors of coffee leaf rust in Kenyan smallholder systems. Rust dynamics depended primarily on microclimatic moisture, particularly relative humidity and leaf wetness duration, as well as spatial proximity to infection sources. These findings align with established coffee rust epidemiology and extend the evidence base to East African production systems where longitudinal data have been limited.
The dominance of microclimatic moisture agrees with [2], who showed that urediniospore germination requires surface wetness. Continuous wetness and precipitation further regulate infection [4,10]. The nonlinear thresholds observed, where risk rises sharply above 75% relative humidity and ten hours of leaf wetness, are consistent with results from Latin American and Asian systems. Spatial and temporal processes also contributed. The negative association between disease occurrence and distance to infected farms confirms clustered transmission patterns [5,7], while lagged effects align with temporal autocorrelation [8].
Management and agronomic implications
Farm management practices significantly influenced disease outcomes. The negative effect of fungicide frequency supports the protective effect of timely fungicide application [3]. Shade and canopy vigor showed mixed effects: dense canopies promote humidity and infection persistence, while moderate shading improves microclimatic balance [11,12]. Older coffee trees exhibited higher susceptibility, consistent with progressive inoculum accumulation [1].
Methodological contributions
This study contributes to coffee rust prediction by combining Bayesian hierarchical inference and supervised learning. Previous research emphasized detection accuracy without uncertainty quantification [5–7]. Our approach explicitly communicates uncertainty while maintaining predictive performance. The convergence of logistic regression, ensemble learners, and Bayesian posterior estimates supports internal consistency.
A key insight is that interpretable models performed competitively with complex algorithms. Logistic regression achieved the highest AUC-ROC (0.867) while remaining transparent and computationally efficient, an important finding for resource-limited settings where model interpretability facilitates adoption.
Limitations and future work
Several limitations warrant acknowledgment. First, generalizability is constrained by the absence of external validation. The model was developed and tested using data from six Kenyan counties without independent validation in other regions or countries. Second, although spatiotemporal cross-validation was implemented, the results in Table 6 reflect aggregate performance rather than variation across counties or time periods. Future work should report stratified performance metrics.
Third, the use of SMOTE for class balancing introduces potential methodological concerns. Although SMOTE was applied with safeguards, including post-split application, use within folds, and preservation of temporal ordering, it remains imperfect for spatiotemporal epidemiological data. Synthetic samples may violate spatial and temporal dependencies, and SMOTE was applied only to supervised models, not the Bayesian analysis. Alternative approaches, such as cost-sensitive learning or weighted likelihoods, should be explored.
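The dependency concern stems from how SMOTE generates samples: each synthetic point is a convex interpolation between a minority observation and one of its nearest minority neighbours, so it can land between plots that are far apart in space or time. A minimal sketch of that interpolation step (toy data):

```python
import numpy as np

def smote_sample(X_minority, k=3, n_new=5, seed=0):
    """Synthetic minority samples by interpolating toward a random
    k-nearest neighbour: x_new = x + u * (x_nn - x), u ~ U(0, 1)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X_minority, float)
    new = []
    for _ in range(n_new):
        i = rng.integers(len(X))
        d = np.linalg.norm(X - X[i], axis=1)
        nn = np.argsort(d)[1:k + 1]      # k nearest, excluding the point itself
        j = rng.choice(nn)
        u = rng.random()
        new.append(X[i] + u * (X[j] - X[i]))
    return np.array(new)

X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synth = smote_sample(X_min)
```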
Fourth, residual spatial dependence may persist despite hierarchical structuring. Spatial autoregressive models or mechanistic dispersal kernels could better characterize cross-farm spore movement. Fifth, microclimatic variables were measured at daily scales, overlooking sub-daily fluctuations critical for infection initiation. Higher resolution measurements, such as hourly humidity and continuous leaf wetness, would improve precision.
Sixth, potential observation biases, including variability in farmer reporting, fungicide application recall, and visual rust assessment, were not formally quantified. Seventh, probabilistic calibration, while assessed using Brier scores, could be further improved through post hoc calibration methods such as Platt scaling or isotonic regression.
Finally, socioeconomic and behavioral variables, such as extension access, risk perception, and input affordability, were unavailable but likely influence control adoption. Future work should address these gaps through expanded data collection, external validation studies, and integration of mechanistic dispersal models.
Conclusion
Summary of key findings
This study demonstrates that the comparative and complementary use of Bayesian hierarchical inference and supervised machine learning provides an interpretable, uncertainty-aware framework for predicting coffee leaf rust in Kenyan smallholder systems. Microclimatic moisture, particularly leaf wetness duration and relative humidity, emerged as the dominant drivers of infection. Partial dependence and SHAP analyses confirmed the presence of nonlinear moisture thresholds beyond which infection risk increases sharply. Spatial proximity to infected farms and lagged incidence highlighted the contagious nature of the disease.
A key finding is that parsimonious, interpretable models performed competitively with complex algorithms. Logistic regression achieved the highest discriminative performance (AUC-ROC = 0.867) while maintaining transparency and computational efficiency, an important insight for resource-limited agricultural settings, where model interpretability facilitates adoption by extension services and farmers.
Methodological and practical implications
This study’s primary contribution is not an integrated hybrid model but rather a rigorous comparative evaluation of Bayesian and machine learning approaches, each offering distinct strengths. Bayesian inference quantified uncertainty and accounted for county-level heterogeneity; supervised learning algorithms provided strong predictive capability; and post hoc explanation tools (SHAP, partial dependence) enhanced interpretability.
The framework is practical for deployment: logistic regression and naive Bayes are sufficiently lightweight for mobile or edge-based applications, while ensemble methods (CatBoost, XGBoost) can support cloud-based regional forecasting. This scalability, combined with interpretability, positions the approach for real-world decision support.
Broader significance
While developed for Kenyan coffee systems, the comparative framework, emphasizing transparency, uncertainty quantification, and rigorous validation, offers a template for disease prediction in other data-constrained agricultural contexts. However, transferability to other regions or crops requires external validation, which remains a priority for future work. The study provides a foundation for climate-informed, data-driven interventions that can enhance resilience in smallholder farming systems.
Supporting information
S1 Fig. Coffee leaf with coffee leaf rust disease.
Visual example of Hemileia vastatrix infection showing characteristic orange-brown rust pustules on the leaf surface. Symptoms include circular to irregular lesions with powdery orange urediniospores on the underside of leaves.
https://doi.org/10.1371/journal.pclm.0000754.s001
(TIF)
S2 Fig. Coffee leaf without coffee leaf rust disease.
Healthy coffee leaf showing no visible signs of Hemileia vastatrix infection, with uniform green coloration and no sporulating lesions.
https://doi.org/10.1371/journal.pclm.0000754.s002
(TIF)
S3 Fig. Distribution of coffee rust severity (percentage of leaf area affected) among Kenyan smallholder farms.
The severity distribution exhibited strong right skew, with most plots concentrated at the lower end of the scale (0–20%). Mean severity was 19.98% (SD = 28.64), indicating substantial variation in disease expression across farms. The extended upper tail reaching 100% severity represents instances of complete leaf area loss.
https://doi.org/10.1371/journal.pclm.0000754.s003
(TIF)
S4 Fig. Boxplots showing the distribution of key microclimatic and spatial variables across plots with (1) and without (0) coffee rust incidence.
Relative humidity and leaf wetness duration were substantially higher in infected plots, while distance to the nearest infected farm was substantially shorter for infected plots, highlighting spatial contagion and probable pathogen transmission through windborne urediniospores or rain splash.
https://doi.org/10.1371/journal.pclm.0000754.s004
(TIF)
S5 Fig. Posterior density (left) and trace (right) plots for the intercept and the county level variance parameter.
Trace plots indicate well mixed chains with no trends or autocorrelation, suggesting stable exploration of the posterior distribution. Density plots are continuous and unimodal, confirming convergence and absence of sampling anomalies.
https://doi.org/10.1371/journal.pclm.0000754.s005
(TIF)
S6 Fig. Posterior pairwise relationship between the intercept and the county level standard deviation.
No strong correlations are evident, indicating that intercept estimates remain stable across different levels of county variance, supporting the hierarchical model specification.
https://doi.org/10.1371/journal.pclm.0000754.s006
(TIF)
S7 Fig. Posterior predictive check comparing observed coffee rust incidence distribution with posterior predictive distribution.
The close match between observed and predicted distributions indicates that the model adequately represents the underlying data generating process and is well calibrated for probabilistic prediction.
https://doi.org/10.1371/journal.pclm.0000754.s007
(TIF)
Acknowledgments
The authors are grateful to the Coffee Research Institute (CRI) of the Kenya Agricultural and Livestock Research Organization (KALRO), Nairobi, for providing access to complete field and microclimate data, which was utilized for this study. We especially acknowledge the efforts of the CRI data management and field surveillance teams, who have worked to gather, organize, and preserve high-quality longitudinal data in the participating counties.
References
- 1. Tadesse Y, Amare D, Kesho A. Coffee leaf rust disease and climate change. World J Agric Sci. 2021;17(1):418–29.
- 2. Talhinhas P, Batista D, Diniz I, Vieira A, Silva DN, Loureiro A, et al. The coffee leaf rust pathogen Hemileia vastatrix: one and a half centuries around the tropics. Mol Plant Pathol. 2017;18(8):1039–51. pmid:27885775
- 3. Gichuru E, Alwora G, Gimase J, Kathurima C. Coffee Leaf Rust (Hemileia vastatrix) in Kenya—A Review. Agronomy. 2021;11(12):2590.
- 4. Merle I, Tixier P, Virginio Filho E de M, Cilas C, Avelino J. Forecast models of coffee leaf rust symptoms and signs based on identified microclimatic combinations in coffee-based agroforestry systems in Costa Rica. Crop Protection. 2020;130:105046.
- 5. Marin DB, Ferraz GA e S, Santana LS, Barbosa BDS, Barata RAP, Osco LP, et al. Detecting coffee leaf rust with UAV-based vegetation indices and decision tree machine learning models. Computers and Electronics in Agriculture. 2021;190:106476.
- 6. Geronimo-Isidro F, Figueroa-Jimenez JJ, Gabayno-Laguatan NA, Lacuesta-Jalotjot TEJ, Cabangbang-Jaranilla JT, Macabago SAB, et al. Spatiotemporal modeling of Hemileia vastatrix using multiple machine learning algorithms: implications for disease surveillance of the coffee leaf rust disease in the Philippines. J Plant Pathol. 2025;107(4):2011–26.
- 7. Velásquez D, Sánchez A, Sarmiento S, Toro M, Maiza M, Sierra B. A method for detecting coffee leaf rust through wireless sensor networks, remote sensing, and deep learning: Case study of the Caturra variety in Colombia. Applied Sciences. 2020;10(2):697.
- 8. Araaf RT, Minn A, Ahamed T. Coffee Leaf Rust Disease Detection and Implementation of an Edge Device for Pruning Infected Leaves via Deep Learning Algorithms. Sensors (Basel). 2024;24(24):8018. pmid:39771754
- 9. Hitimana E, Kuradusenge M, Sinayobye OJ, Ufitinema C, Mukamugema J, Murangira T, et al. Revolutionizing Coffee Farming: A Mobile App with GPS-Enabled Reporting for Rapid and Accurate On-Site Detection of Coffee Leaf Diseases Using Integrated Deep Learning. Software. 2024;3(2):146–68.
- 10. Pozza EA, Santos ÉR dos, Gaspar NA, Vilela XM de S, Alves M de C, Colares MRN. Coffee Rust Forecast Systems: Development of a Warning Platform in a Minas Gerais State, Brazil. Agronomy. 2021;11(11):2284.
- 11. Chemura A, Mutanga O, Sibanda M, Chidoko P. Machine learning prediction of coffee rust severity on leaves using spectroradiometer data. Trop plant pathol. 2017;43(2):117–27.
- 12. Ocaña-Zuñiga C, Quiñones-Huatangari L, Barboza E, Peña NC, Zamora SH, Ojeda JMP. Coffee Rust Severity Analysis in Agroforestry Systems Using Deep Learning in Peruvian Tropical Ecosystems. Agriculture. 2024;15(1):39.
- 13. Wanyonyi M, Morris ZN, Musyoka FM, Kitavi DM. Enhanced machine learning and hybrid ensemble approaches for Coronary Heart Disease prediction. PLoS One. 2025;20(12):e0328338. pmid:41452880
- 14. Wanyonyi M, Morris ZN, Musyoka FM, Makaa D, Kitavi DM. Balancing interpretability and accuracy: A comparative evaluation of logistic regression and machine learning approaches for coronary heart disease risk prediction in low-resource settings. IAENG International Journal of Computer Science. 2026.
- 15. Mutinda JK, Kyalo TM, Mukolwe JA, Munyao JN, Omondi MA, Nzomo WN, et al. Explainable AI for Breast Cancer Diagnosis: Comparative Analysis of ML Models Using Random Forest Feature Selection and SHAP Interpretability. Asian J Res Com Sci. 2025;18(10):30–46.
- 16. Jepkoech J, Mugo DM, Kenduiywo BK, Too EC. Arabica coffee leaf images dataset for coffee leaf disease detection and classification. Data Brief. 2021;36:107142. pmid:34095388