
Federated learning based reference evapotranspiration estimation for distributed crop fields

  • Muhammad Tausif,

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Project administration, Resources, Software, Writing – review & editing

    Affiliation Department of Computer Science & Information Technology, The Superior University, Lahore, Punjab, Pakistan

  • Muhammad Waseem Iqbal,

    Roles Conceptualization, Project administration, Supervision

    Affiliation Department of Computer Science & Information Technology, The Superior University, Lahore, Punjab, Pakistan

  • Rab Nawaz Bashir,

    Roles Supervision, Validation

    Affiliations Department of Computer Science, COMSATS University Vehari, Vehari, Punjab, Pakistan, Artificial Intelligence & Data Analytics (AIDA) Lab, College of Computer & Information Sciences (CCIS), Prince Sultan University, Riyadh, Saudi Arabia

  • Bayan AlGhofaily,

    Roles Software, Visualization

    Affiliation Artificial Intelligence & Data Analytics (AIDA) Lab, College of Computer & Information Sciences (CCIS), Prince Sultan University, Riyadh, Saudi Arabia

  • Alex Elyassih,

    Roles Funding acquisition, Investigation

    Affiliation Artificial Intelligence & Data Analytics (AIDA) Lab, College of Computer & Information Sciences (CCIS), Prince Sultan University, Riyadh, Saudi Arabia

  • Amjad Rehman Khan

    Roles Formal analysis, Funding acquisition

    arkhan@psu.edu.sa

    Affiliation Artificial Intelligence & Data Analytics (AIDA) Lab, College of Computer & Information Sciences (CCIS), Prince Sultan University, Riyadh, Saudi Arabia

Abstract

Water resource management and sustainable agriculture rely heavily on accurate Reference Evapotranspiration (ETo) estimates. Efforts have been made to simplify ETo estimation using machine learning models. However, existing approaches are limited to a single specific area, whereas ETo estimates are needed for multiple locations with diverse weather conditions. This study proposes ETo estimation for multiple locations with distinct weather conditions using a federated learning approach. Traditional centralized approaches require aggregating all data in one place, which can be problematic due to privacy concerns and data transfer limitations. Federated learning, in contrast, trains models locally and combines their knowledge, yielding more generalized ETo estimates across different regions. Three geographical locations in Pakistan with diverse weather conditions were selected, and the proposed model was implemented using their weather data from 2012 to 2022. At each location, three machine learning models, Random Forest Regressor (RFR), Support Vector Regressor (SVR), and Decision Tree Regressor (DTR), were evaluated for local Evapotranspiration (ET) estimation and as the federated global model. A feature importance analysis was also performed to assess the impact of weather parameters on machine learning performance at each location. The evaluation reveals that RFR-based federated learning outperformed the other models, with a coefficient of determination (R2) of 0.97, Root Mean Squared Error (RMSE) of 0.44 mm day−1, Mean Absolute Error (MAE) of 0.33 mm day−1, and Mean Absolute Percentage Error (MAPE) of 8.18%. RFR also outperformed the other local machine learning models at each selected site. The analysis results suggest that maximum temperature and wind speed are the most influential factors in ET predictions.

1 Introduction

Evapotranspiration (ET) is a key water cycle component in water resource management and planning, with many application areas [1], e.g., water management, drought monitoring, and irrigation scheduling [2]. Although ET computation is critical for water management, the interrelationship between weather parameters and ET makes it complex, and managing water across multiple locations is even more challenging. Therefore, solutions for ET estimation at multiple locations that account for their weather parameters are required to help agriculturists.

In agriculture, farming and irrigation use about 70% of the world’s fresh water [3] and significantly impact sustainability [4]. Water management uses Reference Evapotranspiration (ETo) to improve irrigation practices. Accurate ETo calculation that accounts for a location’s weather conditions is essential for implementing sustainable practices such as irrigation water conservation.

ETo is a standard reference for comparing ET rates across locations and weather conditions. It can be computed by estimating the water transpired and evaporated by a reference crop under specific weather conditions [5]. Different weather parameters, e.g., solar radiation, temperature, precipitation, vapor pressure deficit, and wind speed, may influence ETo. ETo can be estimated using a variety of methods, including Penman-Monteith (PM), Thornthwaite, Hargreaves [6], satellite-based, and machine-learning-based approaches [7–9]. Machine learning-based approaches are able to capture more complex relationships between input and output variables [10–12]. Based on weather data, these methods use machine and deep learning algorithms to estimate ETo. Each of these approaches has its advantages and disadvantages. Using historical data, efforts have been made to establish empirical relationships between weather parameters and ETo.

Machine learning (ML) algorithms have made remarkable progress in ET estimation with limited weather parameters. ML algorithms such as Support Vector Regression (SVR), Random Forest Regressor (RFR), and Artificial Neural Networks (ANN) have proven very effective for accurate ETo estimation with limited weather parameters [13], because they can capture complex relationships between the weather parameters. Traditional methods for estimating ETo, e.g., the Penman-Monteith equation, are complex and computationally demanding. Moreover, simple ML approaches also face limitations, particularly related to data centralization, and may not effectively capture the diverse climatic conditions of different regions.

Applications of ML techniques to ETo prediction have demonstrated their accuracy even with limited weather data. However, such models are often limited to specific areas: existing solutions were proposed for a specific location and tailored to local weather conditions [14, 15], and it is very hard to apply them in different contexts with the same accuracy [16]. The inherent variations in the weather conditions of different locations further complicate ETo modeling with limited weather parameters [17]. The relationship between weather and ETo can be very complex, particularly across several locations with diverse weather patterns [18], and a universal ETo model using limited weather parameters is difficult to optimize across the diverse weather parameters of different locations [17]. Traditional machine learning methods for estimating ETo are effective within certain contexts but are constrained by their dependence on local datasets, which limits their generalizability across diverse geographical locations. This limitation stems not from the models themselves but from the restricted scope of the input data available to them. A model is therefore needed that generalizes well for ETo prediction at different locations with distinct weather conditions. This study addresses this limitation by implementing a federated learning (FL) approach, which allows localized model training and generalizes across these local models using a global model [14]. The proposed approach explores the generalization ability of a federated model that learns ETo predictions from the diverse weather parameters of multiple locations.

FL has the potential to handle different weather conditions across multiple locations using a single global model. Current methods have made some progress in certain areas; however, handling data diversity when estimating ETo across a wider range of locations with a single model remains an open challenge. FL can handle data diversity by leveraging multiple models for different locations [19], and it enables learning from data distributed across multiple sources without centralizing the data [20, 21]. FL has the potential to address the issues of data heterogeneity, decentralized data distribution, real-time adaptability, and resource efficiency. These advantages over traditional machine learning and deep learning approaches make FL a novel approach for ETo estimation at multiple locations with distinct weather conditions [22]. The architecture of the proposed FL model is illustrated in Fig 1.

FL can handle the diverse weather parameters of different locations for ETo estimation using data from multiple locations [23]. It integrates knowledge from the datasets of multiple locations, improving accuracy and generalizing the ETo estimation model across locations. Furthermore, FL improves geographical coverage and preserves privacy while extracting insights from the datasets of multiple locations [24, 25]. Considering these advantages, this paper proposes a novel model to improve the accuracy of ETo prediction for multiple locations. The proposed model leverages the collective power of distributed data [1, 26] and enhances the precision and spatial coverage of ETo prediction. By training on a diverse set of localized data from various regions, FL-based ETo estimation promotes model generalization and privacy and enhances water balance analysis, ensuring that the models are robust and applicable across different environmental conditions.

The main contributions of the paper are as follows:

  • A novel FL-based model is proposed for estimating ETo at three (3) geographical locations in Pakistan, providing a global automated solution for ETo prediction that can connect agriculturists around the globe.
  • The performance of local and global machine learning models in ETo prediction is explored and compared.
  • The impact of different weather parameters on ETo prediction is explored using the feature importance scores of the machine learning models.

The remaining sections of the paper are organized as follows: Section 2 reviews existing literature and studies related to ETo estimation, machine learning models, and relevant methodologies. Section 3 provides a detailed overview of this study’s materials, data sources, and methodologies. Section 4 presents the results and discussion, and Section 5 concludes by summarizing the study’s key findings.

2 Related work

In recent years, machine learning techniques have been a focus of the research community in the agriculture domain [27–29]. This section explores recent approaches for ETo estimation using machine learning.

Dong et al. [30] proposed a solution to improve the accuracy of ETo estimation by analyzing the spatial and temporal variation of ETo in China. The study used three ML models: multiple adaptive regression, convolutional neural networks (CNN), and extreme learning machines (ELM). CNN provided the best ETo estimates.

Rai et al. [31] carried out a comparative analysis of various machine learning models for predicting monthly ETo using India’s weather data from 2009 to 2016. Among all the models examined, the SVR model yielded the highest accuracy in reconstructing water requirements.

Bellido et al. [32] discussed the application of neural networks and other machine learning models for determining ETo in the Andalusia region of southern Spain. The multi-layer perceptron, ELM, SVM, generalized regression neural network (GRNN), RF, and XGBoost were assessed using the coefficient of determination (R2), the root mean squared error (RMSE), and the Nash-Sutcliffe model efficiency coefficient (NSE). Notably, the ELM approach performed best, with an R2 of 0.89, NSE of 0.89, and RMSE of 0.67 mm day−1.

Krishna et al. [33] employed diverse cognitive computing models to predict ETo. Using various input factors, the study concluded that the second-order neural network was the most accurate, with an RMSE of 0.065 mm day−1 and an R2 of 0.99.

Ayaz et al. [34] applied different machine learning models in India and New Zealand, focusing on using only temperature data. They tested Long Short-Term Memory (LSTM), XGBR, SVR, and RF models. With all weather inputs, the LSTM model performed best with 99% accuracy, but when only temperature was used, accuracy dropped to 86%.

Samman et al. [35] examined the performance of four machine learning models for ETo estimation using data from five Iraqi stations. SVM, RF, Bagged Trees (BaTs), and Boosting Trees (BoTs) were used to model daily ETo. The RFR model provided the most accurate ETo estimates at all sites, while SVM provided the poorest results. RFR significantly improved estimation accuracy compared to the SVM, BoT, and BaT models across locations, with RMSE improvements ranging from 8% to 94% during the test period.

Mirzania et al. [36] proposed an ETo estimation approach for Australia. The study evaluated three models: the innovative Gunner algorithm, SVR, and a hybrid innovative Gunner support vector regression. The AIG-SVR model performed best, with r and RMSE values of 0.945 and 1.124 at Marree Aero station and 0.951 and 0.476 at St Helen Aerodrome station, respectively.

Khan et al. [16] proposed a method for reclaiming saline soil that employs the Internet of Things (IoT) and ML to estimate ETo from monthly data. LSTM and ensemble LSTM models predict ET based on field temperature, irrigation water salinity, and soil salinity. The ensemble LSTM-based model was more accurate than the single LSTM model, achieving an accuracy of 92% for ETo estimation.

Rashid et al. [37] developed an ETo estimation method using four machine learning models with different input combinations. Across all input combinations, the RF model was the most effective, with MAE, R2, and RMSE values of 0.76 mm day−1, 0.85, and 0.82 mm day−1, respectively.

Yu et al. [38] assessed the performance of different machine learning models for ETo estimation with various input combinations, such as minimum and maximum temperature, wind speed, solar radiation, relative humidity, atmospheric pressure, and sunshine duration. The study evaluated three machine learning models: ANN, SVR, and ELM. The SVR model proved the most accurate, with an R of 0.881, RMSE of 0.925 mm day−1, MAE of 0.59 mm day−1, and NSE of 0.744.

Zhang et al. [39] provide a detailed analysis of the specific characteristics of FL and its future potential. FL is important in numerous application contexts, particularly in IoT frameworks. The study also addresses the challenges of applying FL within IoT frameworks and outlines practical aspects of implementing FL, including the need for corresponding development tools.

Manoj et al. [40] introduced federated crop yield prediction on datasets distributed across multiple client devices. ResNet-16 and ResNet-28 regression models were trained with the “federated averaging” technique to ensure decentralized training, and the results were compared with other deep learning and machine learning models. The research indicates that federated averaging was effective when applied with the ResNet-16 regression model and the Adam optimizer.

Kumar et al. [41] address the data privacy and security issues that affect the implementation of smart agriculture (SA). The study proposes PEFL, a private FL framework based on deep privacy encoding. To enhance privacy, PEFL employs perturbation-based encoding, while a long short-term memory autoencoder enhances the model’s memory capacity. Despite a somewhat ambiguous boundary between normal and attack patterns, PEFL outperforms non-FL and other FL methods on the ToN-IoT dataset.

Nguyen et al. [42] focused on the growing popularity of FL in IoT networks, presenting an overview of FL and IoT along with recent advancements. The study also examines the potential of FL to enable a range of IoT services, including data sharing, caching, attack detection, localization, mobile crowdsensing, and privacy protection. FL is extensively analyzed across critical IoT domains such as healthcare, transportation, UAVs, smart cities, and industry, highlighting its transformative impact.

Imteaj et al. [43] examined how distributed machine learning models could be trained on resource-constrained IoT devices. The study describes prior research on FL and the assumptions underlying its widespread use on IoT devices, discusses the difficulties of integrating FL into IoT environments, and presents a thorough analysis of new obstacles to using FL in diverse IoT scenarios.

Recent studies have also contributed valuable insights toward advancing ETo estimation. Remote sensing techniques lay the foundation for using satellite data to understand river dynamics [44], and another study [45] explores the potential of machine learning and satellite data for enhancing seasonal water supply forecasts. These studies collectively underscore the multidisciplinary approach required to refine reference evapotranspiration modeling and emphasize the importance of remote sensing data in ETo estimation.

Although existing approaches have achieved significant improvements (as shown in Table 1), they still suffer from limited geographical coverage, and a notable gap remains in the literature concerning the application of traditional and machine learning methods across multiple geographical locations with diverse weather conditions. Most existing studies rely on localized datasets and effectively model ETo only for specific areas, which limits their ability to generalize findings across locations with diverse weather parameters.

Table 1. Comparison of the state-of-the-art approaches.

https://doi.org/10.1371/journal.pone.0314921.t001

For instance, while numerous researchers have successfully applied machine learning techniques such as Support Vector Regression (SVR) and Random Forest Regressor (RFR) to predict ETo in single climatic contexts, the results are often not transferable to regions with different meteorological characteristics. This limitation is primarily due to the inherent variability in weather parameters (such as temperature, humidity, wind speed, and solar radiation) that influence ETo differently in distinct environments.

There is a need to integrate FL techniques to estimate ETo across multiple locations simultaneously. Using an FL framework, this study aims to exploit localized data from diverse climatic conditions while ensuring model generalization. This approach not only addresses the existing limitations in the literature but also provides a comprehensive solution for improving ETo estimates relevant to agricultural practices and water resource management across different geographical settings.

3 Material and methods

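This section details the key components and methodologies used in this ETo estimation study based on ML algorithms and FL. The PM equation is recognized as a standard [59]; however, it is complex because it requires many input variables. The PM equation is written as

$$ET_o = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\frac{900}{T + 273}\,u_2\,(e_s - e_a)}{\Delta + \gamma\,(1 + 0.34\,u_2)} \qquad (1)$$

where ETo is the reference evapotranspiration (mm day−1), Rn is the net radiation at the crop surface (MJ m−2 day−1), G is the soil heat flux density (MJ m−2 day−1), T is the mean daily air temperature at 2 m height (°C), u2 is the wind speed at 2 m height (m s−1), es and ea are the saturation and actual vapor pressures (kPa), Δ is the slope of the saturation vapor pressure curve (kPa °C−1), and γ is the psychrometric constant (kPa °C−1).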
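A minimal sketch of evaluating the standard FAO-56 daily form of the PM equation for a single day is shown below. It assumes the intermediate terms (Δ, γ, Rn, G, and the vapor pressure deficit es − ea) have already been computed; the function name `fao56_penman_monteith` and the input values are illustrative assumptions, not the study’s code or data.

```python
def fao56_penman_monteith(delta, rn, g, gamma, t_mean, u2, vpd):
    """Daily reference evapotranspiration ETo (mm day-1), FAO-56 PM form.

    delta  : slope of the saturation vapor pressure curve (kPa degC-1)
    rn, g  : net radiation and soil heat flux density (MJ m-2 day-1)
    gamma  : psychrometric constant (kPa degC-1)
    t_mean : mean daily air temperature at 2 m (degC)
    u2     : wind speed at 2 m (m s-1)
    vpd    : vapor pressure deficit es - ea (kPa)
    """
    # Radiation term: 0.408 converts MJ m-2 day-1 into mm day-1 of evaporated water
    radiation = 0.408 * delta * (rn - g)
    # Aerodynamic term driven by wind speed and the vapor pressure deficit
    aerodynamic = gamma * (900.0 / (t_mean + 273.0)) * u2 * vpd
    # Denominator includes the reference-surface resistance correction (0.34 * u2)
    return (radiation + aerodynamic) / (delta + gamma * (1.0 + 0.34 * u2))


# Illustrative daily inputs (hypothetical, not taken from the study's datasets)
eto = fao56_penman_monteith(delta=0.122, rn=13.28, g=0.0, gamma=0.0666,
                            t_mean=16.9, u2=2.078, vpd=0.589)
print(f"ETo = {eto:.2f} mm/day")
```

With these illustrative inputs the function returns roughly 3.9 mm day−1. Keeping the radiation and aerodynamic terms separate mirrors the two parts of the numerator, which makes it easy to see which driver dominates on a given day.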

A reliable ETo estimate is essential for water resource management, agriculture, and environmental sustainability. Using FL, we address the challenges of aggregating data from diverse geographic regions while maintaining the model’s generalizability and the data’s privacy.

3.1 Study area

Pakistan’s diverse climate and geography result in varying ET rates across different regions [60–64]. The study area comprises Punjab, Pakistan’s second-largest province, located in the eastern part of the country, as shown in Fig 2.

Fig 2. Geographical locations of the experiment cities.

https://doi.org/10.1371/journal.pone.0314921.g002

Agriculture is an important sector of Pakistan’s economy. This sector directly supports the country’s population and accounts for 26 percent of gross domestic product [65]. The major crops include cotton, wheat, rice, sugarcane, fruits, and vegetables. Multan, Faisalabad, and Rawalpindi are three major cities in Pakistan known for their fertile lands and significant contributions to Pakistan’s agriculture sector. Geographically, Punjab is located between 24–37° N and 62–75° E. The majority of Punjab falls within the arid and semi-arid zones, and the province can be categorized into three distinct regions. The Multan region is arid, with relatively high temperatures and a notably harsh climate. The Faisalabad area is semi-arid, with a generally milder climate than the southern region. Rawalpindi features a tropical and semi-arid climate and typically experiences high temperatures. These weather distinctions within Punjab are essential when studying weather and agricultural phenomena in the region, as they significantly affect factors such as ETo and water resource management.

3.2 Dataset

The data for this study were collected from three stations in Punjab, Pakistan: Multan (30.1575° N, 71.5249° E), Rawalpindi (33.5651° N, 73.0169° E), and Faisalabad (31.4187° N, 73.0791° E). Daily data for 2012–2022 were obtained from NASA data sources [66], and the Penman-Monteith equation was used to calculate daily ETo. The selected stations cover the southern, central, and upper parts of Punjab. Daily data were collected for three key parameters: maximum temperature (Tmax), wind speed (WS), and relative humidity (RH). The weather at these stations is distinct, with significant differences in weather characteristics. Figs 3–5 show separate plots for the three datasets; these 3D scatter plots visualize the relationships between the selected features (Tmax, WS, and RH) and ETo for each dataset. Adding climatic conditions to traditional ML models is helpful but does not address challenges such as data centralization, regional variability, and privacy. FL addresses these challenges, making it well suited for ETo estimation in distributed crop fields.

Fig 3. Relationship of the selected features with ETo at Multan dataset.

https://doi.org/10.1371/journal.pone.0314921.g003

Fig 4. Relationship of the selected features with ETo at Faisalabad dataset.

https://doi.org/10.1371/journal.pone.0314921.g004

Fig 5. Relationship of the selected features with ETo at Rawalpindi dataset.

https://doi.org/10.1371/journal.pone.0314921.g005

The dataset displays diverse weather characteristics at Multan, Faisalabad, and Rawalpindi (presented in Table 2). Temperatures in Multan and Faisalabad are higher, with Multan recording the highest Tmax at 50°C and Faisalabad following closely at 49.5°C. Rawalpindi, on the other hand, is comparatively milder, with a maximum Tmax of 47.4°C. Rawalpindi shows the highest maximum RH of 94.5%, and Multan the lowest minimum RH of 7.3%. WS varies significantly across cities, with Faisalabad recording the highest maximum WS of 5.67 m s−1. The ETo rate varies greatly in Multan, indicating substantial water loss, whereas it is lower in Faisalabad and Rawalpindi due to their relatively milder climates. These observations underscore the diverse meteorological conditions among the regions.

Table 2. Summary statistics of weather variables in Multan, Faisalabad, and Rawalpindi.

https://doi.org/10.1371/journal.pone.0314921.t002

Fig 6 presents a set of violin plots describing the distributions of five climatic variables. Each violin shows the probability density of the data at different values, with the plot’s width representing the density and a central box plot indicating the interquartile range, median, and potential outliers. The subplots are labeled A to E. Faisalabad and Multan show a wider spread of Tmax values, from around 10°C to 50°C, while mean temperatures range from approximately 10°C to 40°C. Faisalabad and Multan have similar wind speed distributions, reaching up to 7 m s−1, and display a broader range of ETo values, with distributions extending up to around 15 mm day−1. Rawalpindi’s ETo values show less variation. These distributions are significant for understanding regional climatic conditions and their agricultural implications.

Fig 6. Violin plots for each variable in each dataset.

https://doi.org/10.1371/journal.pone.0314921.g006

Scatter plots are generated to allow comparison and visualization of data characteristics across datasets, as shown in Figs 7–9. These figures illustrate how each dataset is associated with ETo estimates and show the different statistical properties of each dataset, enabling a clear understanding of their characteristics. Variations in data distributions could significantly affect the ability of FL models to generalize and predict ETo. ETo is simulated more accurately when Tmax, wind speed, and humidity are included as model inputs. Tmax, WS, and RH strongly correlate with ETo because they directly govern the rate at which water converts from liquid to vapor. Including these parameters in ETo models therefore better captures the complex dynamics of water loss processes.

Fig 8. Distribution plot of Faisalabad dataset.

https://doi.org/10.1371/journal.pone.0314921.g008

Fig 9. Distribution plot of Rawalpindi dataset.

https://doi.org/10.1371/journal.pone.0314921.g009

The correlation plots are shown in Figs 10–12. Correlation plots visually represent the relationships between variables in a dataset, often using color-coded matrices in which the color intensity, or the slope of a trend line, indicates the strength and direction (positive or negative) of the correlation between each pair of variables. In our study, these plots show the relationships between ETo and the feature sets, and the correlation results indicate a strong relationship between weather parameters and ETo. Fig 13 shows the weather parameters across the different datasets over the years.
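The matrices behind such correlation plots can be sketched with pandas. The snippet below computes pairwise Pearson coefficients on synthetic data; the column names follow the study’s features, but the values and the toy ETo relationship are assumptions for illustration only, and the text does not state which correlation coefficient the study used.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
# Hypothetical daily records; column names mirror the study's features
tmax = rng.uniform(10, 50, n)       # maximum temperature (degC)
ws = rng.uniform(0.5, 6.0, n)       # wind speed (m s-1)
rh = rng.uniform(10, 90, n)         # relative humidity (%)
# Toy ETo signal (assumed): rises with Tmax and WS, falls with RH, plus noise
eto = 0.15 * tmax + 0.6 * ws - 0.03 * rh + rng.normal(0, 0.3, n)

df = pd.DataFrame({"Tmax": tmax, "WS": ws, "RH": rh, "ETo": eto})
corr = df.corr(method="pearson")    # pairwise Pearson coefficients in [-1, 1]
print(corr.round(2))
# Features ranked by their correlation with ETo
print(corr["ETo"].drop("ETo").sort_values(ascending=False))
```

Passing `corr` to a heatmap function (for example seaborn’s `heatmap`) would produce a color-coded matrix of the kind described above, with the diagonal fixed at 1.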

Fig 10. Correlation among climatic variables at Multan dataset.

https://doi.org/10.1371/journal.pone.0314921.g010

Fig 11. Correlation among climatic variables at Faisalabad dataset.

https://doi.org/10.1371/journal.pone.0314921.g011

Fig 12. Correlation among climatic variables at Rawalpindi dataset.

https://doi.org/10.1371/journal.pone.0314921.g012

Fig 13. Comparison of input variables and ETo over the years at all sites.

https://doi.org/10.1371/journal.pone.0314921.g013

3.3 Machine learning models

3.3.1 Decision Tree Regressor (DTR).

DTR is a decision tree used for supervised regression tasks (decision trees can also be used for classification). It works by building a tree of decisions based on the features in the dataset, such as temperature, wind speed, or humidity. DTR recursively divides the data into smaller groups until they cannot be divided further or a stopping criterion is met. DTR accommodates various data types and is adept at capturing complex non-linear relationships within the dataset.
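As a hedged illustration of the splitting and stopping behavior, the sketch below fits scikit-learn’s `DecisionTreeRegressor` to synthetic Tmax/WS/RH data. The study’s library and hyperparameters are not stated; `max_depth=3` and `min_samples_leaf=10` are assumed stopping criteria, and the target is a toy stand-in for ETo.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
# Hypothetical feature matrix: columns are Tmax (degC), WS (m s-1), RH (%)
X = np.column_stack([rng.uniform(10, 50, 300),
                     rng.uniform(0.5, 6.0, 300),
                     rng.uniform(10, 90, 300)])
# Toy target standing in for ETo (mm day-1); not the study's data
y = 0.15 * X[:, 0] + 0.6 * X[:, 1] - 0.03 * X[:, 2]

# Stopping criteria: at most 3 levels of splits and at least 10 samples per leaf
tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=10, random_state=0)
tree.fit(X, y)

# Each leaf predicts the mean target of its samples, so the model is piecewise constant
preds = tree.predict(X)
print("depth:", tree.get_depth(), "leaves:", tree.get_n_leaves())
print("distinct predictions:", len(np.unique(preds)))
```

Because each leaf outputs a single value, a depth-3 tree can produce at most eight distinct ETo estimates; loosening the stopping criteria yields finer groups at the cost of a higher risk of overfitting.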

3.3.2 Random Forest Regressor (RFR).

RFR uses an ensemble learning approach, acting like a team of DTR models working together to improve predictions. It combines many trees and produces a final prediction by averaging the individual trees’ predictions. Each tree is trained on a random subset (bootstrap sample) of the training data, and at each split only a random subset of features is considered. This ensemble approach helps prevent the model from memorizing the training data (overfitting), which could lead to poor predictions on new data. RFR can handle large datasets with high dimensionality.
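The averaging behavior can be checked with scikit-learn’s `RandomForestRegressor`, assuming that library. The data are synthetic, and the hyperparameters (`n_estimators=50`, `max_features=1`) are illustrative rather than those used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = np.column_stack([rng.uniform(10, 50, 400),   # Tmax (degC), hypothetical
                     rng.uniform(0.5, 6.0, 400), # WS (m s-1)
                     rng.uniform(10, 90, 400)])  # RH (%)
y = 0.15 * X[:, 0] + 0.6 * X[:, 1] - 0.03 * X[:, 2] + rng.normal(0, 0.3, 400)

# bootstrap=True: each tree is fit on a bootstrap sample of the rows;
# max_features=1: one randomly chosen feature is considered at each split
rf = RandomForestRegressor(n_estimators=50, bootstrap=True, max_features=1,
                           random_state=0).fit(X, y)

# The forest's prediction equals the mean of its individual trees' predictions
per_tree = np.stack([t.predict(X[:5]) for t in rf.estimators_])
manual_mean = per_tree.mean(axis=0)
print(np.allclose(manual_mean, rf.predict(X[:5])))
```

Individual trees in `per_tree` can disagree noticeably for the same input; averaging them is what smooths out the variance of any single overfitted tree.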

3.3.3 Support Vector Regressor (SVR).

SVR is a supervised machine learning technique for regression tasks. SVR finds a best-fit function (hyperplane) such that most data points lie within a tolerance margin ε around it, allowing some points to fall outside but not too many. Rather than trying to make every prediction perfect, SVR penalizes only the points that fall outside this margin. SVR is memory efficient because only the support vectors are used in the decision function, and kernel functions allow it to model non-linear relationships in regression tasks.
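The ε-margin idea can be made concrete with the ε-insensitive loss that SVR minimizes (together with a regularization term), followed by a scikit-learn `SVR` fit on synthetic data. The helper `epsilon_insensitive_loss` is hypothetical, and the kernel, `C`, and `epsilon` values are assumptions, not the study’s settings.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the epsilon-margin, linear in the distance outside it."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)


# Error 0.05 lies inside the margin (no penalty); errors 0.3 and 0.5 are penalized
print(epsilon_insensitive_loss(np.array([1.0, 1.0, 1.0]),
                               np.array([1.05, 1.3, 0.5]), epsilon=0.1))

rng = np.random.default_rng(2)
X = np.column_stack([rng.uniform(10, 50, 200),   # Tmax (degC), hypothetical
                     rng.uniform(0.5, 6.0, 200), # WS (m s-1)
                     rng.uniform(10, 90, 200)])  # RH (%)
y = 0.15 * X[:, 0] + 0.6 * X[:, 1] - 0.03 * X[:, 2] + rng.normal(0, 0.3, 200)

# RBF kernel for non-linear fits; scaling matters because the kernel is distance-based
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.2))
model.fit(X, y)
svr = model[-1]
# Only support vectors (points on or outside the margin) enter the decision function
print("support vectors:", len(svr.support_), "of", len(y))
```

Widening `epsilon` tolerates larger errors without penalty and typically reduces the number of support vectors, which is the source of SVR’s memory efficiency.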

3.4 Feature importance analysis

Feature importance analysis determines the relative importance of different weather parameters for ETo at each location. The Gini impurity metric is used to assess the importance of the weather parameters in predicting the ETo of each location. Gini impurity is a commonly used criterion in decision tree algorithms to evaluate the quality of a split at each node. The Gini impurity of a dataset is calculated using Eq 2:

$$G = 1 - \sum_{i=1}^{n} p_i^2 \qquad (2)$$

where:

  • $p_i$ is the proportion of instances in class i relative to the total number of instances.
  • n is the number of distinct ETo values (classes).

When a feature is used to split the data at a node, the Gini impurity is calculated for the parent node and for the child nodes after the split. The decrease in Gini impurity resulting from this split indicates how well the feature separates the data into different ETo value ranges; the larger the reduction in impurity, the more important the weather feature is for ETo determination. For each feature, the decrease in Gini impurity is summed over all splits on that feature across all trees in the Random Forest model, and this total is divided by the number of trees to obtain the average Gini importance score of the feature.
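The split criterion can be sketched in plain NumPy. Because the Gini impurity is defined over class proportions, the example assumes ETo has been binned into discrete classes; the toy node, the bin labels, and the thresholds are hypothetical, and the helpers `gini` and `gini_decrease` are illustrative names. Note that scikit-learn’s `RandomForestRegressor` uses squared error (variance reduction) as its default impurity rather than Gini.

```python
import numpy as np


def gini(labels):
    """Gini impurity G = 1 - sum_i p_i^2 over the classes present."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def gini_decrease(labels, mask):
    """Impurity decrease when a node is split into labels[mask] and labels[~mask]."""
    left, right = labels[mask], labels[~mask]
    n = len(labels)
    # Child impurities are weighted by the fraction of samples each child receives
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted_children


# Hypothetical node: ETo binned into classes 0 (low), 1 (medium), 2 (high)
eto_class = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
tmax = np.array([15, 18, 20, 28, 30, 32, 40, 44, 47])   # degC
rh = np.array([60, 30, 80, 40, 70, 50, 65, 35, 55])      # %

print(round(gini(eto_class), 4))                         # parent impurity
print(round(gini_decrease(eto_class, tmax <= 25), 4))    # split on Tmax
print(round(gini_decrease(eto_class, rh <= 50), 4))      # split on RH
```

Here the Tmax split removes half of the parent impurity (from 2/3 to 1/3), whereas the RH split barely changes it, so Tmax would be credited with the larger importance at this node.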

The resulting scores indicate the relative importance of each weather parameter in predicting ETo. A higher Gini importance score suggests a feature significantly impacts the ETo predictions.
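Assuming scikit-learn is used, `RandomForestRegressor.feature_importances_` implements this mean-decrease-in-impurity score: each tree’s normalized impurity decreases are averaged over the forest and renormalized to sum to 1. For regression the impurity is squared error rather than Gini. The synthetic target below deliberately ignores RH; none of the data come from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
names = ["Tmax", "WS", "RH"]
X = np.column_stack([rng.uniform(10, 50, 500),   # Tmax (degC), hypothetical
                     rng.uniform(0.5, 6.0, 500), # WS (m s-1)
                     rng.uniform(10, 90, 500)])  # RH (%), unused by the toy target
y = 0.2 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, 500)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Mean decrease in impurity: average the per-tree scores, then normalize to sum to 1
per_tree = np.mean([t.feature_importances_ for t in rf.estimators_], axis=0)
manual = per_tree / per_tree.sum()
print(np.allclose(manual, rf.feature_importances_))
for name, score in sorted(zip(names, rf.feature_importances_), key=lambda p: -p[1]):
    print(f"{name}: {score:.3f}")
```

Impurity-based scores can favor features with many distinct values, so permutation importance is a common cross-check when the ranking drives conclusions about weather drivers.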

The Gini impurity criterion is used to assess the importance of weather parameters for ETo estimation because it handles continuous variables without assumptions about their distribution. Gini impurity is also computationally efficient and integrates seamlessly with RF models, whose ensemble learning approach enhances the predictive accuracy of the model.

3.5 Federated Learning (FL) framework for ETo estimation

The proposed FL framework adopts a decentralized architecture, enabling multiple clients to collaboratively train a global model for estimating reference evapotranspiration (ETo). Each client trains on its local dataset, which contains weather parameters relevant to ETo estimation, as shown in Fig 14. The central server orchestrates the training process by coordinating model updates across clients. This collaborative approach is particularly advantageous for estimating ETo in distributed crop fields, where local weather data varies and direct data centralization may not be feasible due to privacy, bandwidth, or regulatory concerns.

3.5.1 FL framework design and methodology.

The core of the framework involves three key components: client initialization, local training, and global aggregation. The workflow of the federated learning process applied to ETo estimation is as follows:

  1. Client Initialization: Each client initializes its local model by receiving parameters from the global model, which is maintained by the central server. The local model represents an initial estimate for ETo based on the global understanding of weather patterns.
  2. Local Training: Clients train their local models using their respective datasets, which include historical weather data such as maximum temperature, wind speed, and relative humidity collected over the period from 2012 to 2022. Each client optimizes its local model parameters based on a loss function defined specifically for ETo prediction. The loss function typically measures the error between predicted and actual ETo values at the local level, helping each client refine its model.
  3. Model Update: After local training, clients compute updates to their model parameters. These updates are derived from the gradient of the loss function with respect to the local model parameters, representing the direction and magnitude of adjustments needed to improve the model’s accuracy. Clients then send these updates to the central server for aggregation.
  4. Aggregation: The server aggregates the updates from all clients to form a new global model, which incorporates the knowledge from all participating regions. The aggregation process involves averaging the model parameter updates from each client. This ensures that the global model reflects both the local weather conditions (which vary by region) and the generalizable patterns across all locations. The typical optimization objective in FL can be expressed as:
    $$F(\theta) = \frac{1}{n}\sum_{i=1}^{n} L_i(\theta) \qquad (3)$$
    where F(θ) is the global loss function to be minimized, representing the overall model performance across all clients, θ represents the global model parameters, n is the total number of participating clients, and Li(θ) is the local loss function for client i, which is computed using the client’s local dataset and global model parameters θ.
    The goal of FL is to minimize the average of these local losses across all clients, ensuring that the global model performs well across diverse weather conditions.
    The process of aggregation is mathematically represented by the following equation for the global model update:
    $$\Delta\theta_{\text{global}} = \frac{1}{n}\sum_{i=1}^{n} \Delta\theta_i \qquad (4)$$
    where n is the number of participating clients, Δθi is the update computed by client i based on its local training, and Δθglobal represents the aggregated update applied to the global model.
    This aggregation ensures that the final model is a synthesis of all local models, enhancing the model’s ability to generalize across different climates and geographical regions.
  5. Clients and Communication Process: In this study, three clients represent three distinct geographical locations: Multan, Faisalabad, and Rawalpindi. Each client collects weather data over the period from 2012 to 2022, focusing on parameters like maximum temperature, wind speed, and relative humidity. The communication process is designed to minimize the need for large-scale data transfer and ensure that local data privacy is maintained. The communication process includes the following steps:
    (a) Data Characteristics: Each client’s dataset is unique, reflecting local weather conditions and variations in ETo across the regions. This diversity in the data is essential for training a robust and generalized model that can adapt to different climates.
    (b) Communication Rounds: The FL process consists of 20 communication rounds. In each round, the central server distributes the updated global model to all clients and receives their local model updates. This iterative process continues until convergence is achieved, meaning that the global model performs satisfactorily across all regions.

Algorithm 1 summarizes the FL training procedure. It starts with Initialization (lines 1-3), where the global model parameters w are set and the learning rate (α = 0.025), regularization strength (λ = 0.05), and number of training rounds (T = 20) are defined; these values were selected using random search. During the Training Process (lines 4-11), in each training round t (line 4), each client i (line 5) updates its local model parameters wi based on the global model w and its local dataset (Xi, Yi) (lines 6-7). This update adjusts the local model parameters using the gradients of the local loss function and the regularization term. After all clients have completed their local updates, the Global Model Update (lines 9-10) sets the global model parameters to the average of the client parameters. Finally, the Convergence Check (lines 12-14) determines whether the model has converged or whether training should continue, up to the specified number of rounds T. This ensures that the iterative process runs until the convergence criteria are met or the maximum number of rounds is reached.

Algorithm 1 FL with three clients

1: Initialization:
2:   Initialize global model parameters w
3:   Initialize learning rate α, regularization strength λ, and number of training rounds T
     End Initialization
4: for t = 1 to T do
5:   for each client i do
6:     Update local model parameters wi based on the global model w and the local dataset (Xi, Yi)
7:     $w_i \leftarrow w - \alpha\left(\nabla L(w, X_i, Y_i) + \lambda \nabla R(w)\right)$ ▷ where L(w, Xi, Yi) is the local loss function and R(w) is the regularization term
8:   end for
9:   After all clients have updated their local models, update the global model as follows:
10:  $w \leftarrow \frac{1}{3}\sum_{i=1}^{3} w_i$
11: end for
12: Convergence Check:
13:   Check for convergence or end the training after T rounds
14: End Convergence Check
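The loop in Algorithm 1 follows the federated averaging pattern. The NumPy sketch below assumes a linear model, the local loss L = ½·mean((Xw − y)²), and the regularizer R(w) = ½‖w‖²; none of these are specified in the text. The three clients hold synthetic, hypothetical data rather than the Multan, Faisalabad, and Rawalpindi datasets, and `local_steps` is a hypothetical parameter (Algorithm 1 as written corresponds to `local_steps=1`). The sketch illustrates only the gradient-based update in line 7; it does not show how tree ensembles such as RFR would be aggregated.

```python
import numpy as np

ALPHA, LAMBDA, T = 0.025, 0.05, 20  # values reported for Algorithm 1


def loss_grad(w, X, y):
    """Gradient of the assumed local loss L = 0.5 * mean((X @ w - y) ** 2)."""
    return X.T @ (X @ w - y) / len(y)


def reg_grad(w):
    """Gradient of the assumed regularizer R(w) = 0.5 * ||w||^2."""
    return w


def client_update(w, X, y, local_steps=1):
    """Line 7 of Algorithm 1; local_steps > 1 (hypothetical) repeats the step locally."""
    w_i = w.copy()
    for _ in range(local_steps):
        w_i = w_i - ALPHA * (loss_grad(w_i, X, y) + LAMBDA * reg_grad(w_i))
    return w_i


def global_loss(w, clients):
    """F(w) as in Eq (3): the unweighted mean of the client losses."""
    return float(np.mean([0.5 * np.mean((Xc @ w - yc) ** 2) for Xc, yc in clients]))


def federated_training(clients, n_features, local_steps=1):
    w = np.zeros(n_features)  # line 2: initialize global model
    history = [global_loss(w, clients)]
    for _ in range(T):  # line 4: training rounds
        local_ws = [client_update(w, Xc, yc, local_steps) for Xc, yc in clients]  # lines 5-8
        w = np.mean(local_ws, axis=0)  # line 10: average the client parameters
        history.append(global_loss(w, clients))
    return w, history


# Three hypothetical clients whose feature distributions are shifted (non-IID)
rng = np.random.default_rng(1)
true_w = np.array([1.2, 0.6, -0.3, 0.2])  # last weight multiplies a bias column
clients = []
for shift in (0.5, 0.0, -0.5):
    X = np.column_stack([rng.normal(shift, 1.0, (200, 3)), np.ones(200)])
    y = X @ true_w + rng.normal(0.0, 0.1, 200)
    clients.append((X, y))

w, history = federated_training(clients, n_features=4, local_steps=5)
print("global loss:", round(history[0], 3), "->", round(history[-1], 3))
```

Averaging parameters as in line 10 and averaging updates as in Eq (4) give the same result here, because every client starts each round from the same global w.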

3.6 Evaluation metrics

To test the reliability of the results, the models were trained and tested using 10-fold cross-validation to identify the best-performing model, and evaluation metrics were then computed for each model to compare their performance. This study used R2, RMSE, MAE, MAPE, and NSE as evaluation metrics to quantify how well each model estimates ETo. R2 measures the model’s goodness of fit to the observed ETo values, that is, the proportion of the variance in observed ETo that the model explains. It is obtained by Eq (5):

$$R^2 = 1 - \frac{SSR}{SST} \qquad (5)$$

where SSR is the sum of squared residuals, $SSR = \sum_{i=1}^{n}\left(ETo_{\text{pred},i} - ETo_{\text{obs},i}\right)^2$, and SST is the total sum of squares, $SST = \sum_{i=1}^{n}\left(ETo_{\text{obs},i} - \overline{ETo}_{\text{obs}}\right)^2$.
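The 10-fold cross-validation protocol can be sketched with scikit-learn's `KFold` and `cross_validate`. The synthetic inputs, target, and forest size below are illustrative assumptions, not the study's data or settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_validate

rng = np.random.default_rng(7)
n = 400
# Hypothetical inputs: Tmax (°C), wind speed (m/s), relative humidity (%)
X = np.column_stack([
    rng.uniform(15, 45, n),
    rng.uniform(0.5, 6.0, n),
    rng.uniform(20, 90, n),
])
y = 0.15 * X[:, 0] + 0.4 * X[:, 1] - 0.02 * X[:, 2] + rng.normal(0, 0.2, n)  # synthetic target

cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_validate(
    RandomForestRegressor(n_estimators=100, random_state=0),
    X, y, cv=cv,
    scoring=["r2", "neg_root_mean_squared_error", "neg_mean_absolute_error"],
)

# scikit-learn reports errors as negative scores ("higher is better"); flip the sign back
r2_folds = scores["test_r2"]
rmse_folds = -scores["test_neg_root_mean_squared_error"]
mae_folds = -scores["test_neg_mean_absolute_error"]
print(f"R2 {r2_folds.mean():.3f}  RMSE {rmse_folds.mean():.3f}  MAE {mae_folds.mean():.3f}")
```

Each of the ten folds serves once as the held-out test set, so every metric is available as ten per-fold values that can be averaged or compared statistically.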

RMSE measures the average magnitude of the errors between predicted and observed ETo values and is computed as:

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(ETo_{\text{pred},i} - ETo_{\text{obs},i}\right)^2} \qquad (6)$$

where $ETo_{\text{pred},i}$ is the ETo estimated by the proposed model, $ETo_{\text{obs},i}$ is the observed ETo, $n$ is the number of data points, and $i$ indexes the data points in the dataset.

MAE measures the average absolute magnitude of the errors between predicted and observed ETo values:

$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|ETo_{\text{pred},i} - ETo_{\text{obs},i}\right| \qquad (7)$$

where $ETo_{\text{pred},i}$ is the value estimated by the proposed model, $ETo_{\text{obs},i}$ is the value measured from real-world data, $i$ indexes the data points, and $n$ is the total number of data points in the dataset. The equation sums the individual absolute errors and divides by $n$ to give their mean.

MAPE computes the average percentage difference between predicted and observed ETo values:

$$MAPE = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{ETo_{\text{obs},i} - ETo_{\text{pred},i}}{ETo_{\text{obs},i}}\right| \qquad (8)$$

The Nash-Sutcliffe Efficiency (NSE) is used to measure the performance of the ML-based models. It assesses predictive accuracy by comparing the model’s predictions to the observed data and is expressed by Eq (9):

$$NSE = 1 - \frac{\sum_{t=1}^{n}\left(ETo_{\text{obs},t} - ETo_{\text{pred},t}\right)^2}{\sum_{t=1}^{n}\left(ETo_{\text{obs},t} - \overline{ETo}_{\text{obs}}\right)^2} \qquad (9)$$

where $\overline{ETo}_{\text{obs}}$ is the mean of the observed ETo values, $n$ is the total number of observations, and $t$ indexes the observations in the dataset.
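The five metrics translate directly into NumPy. This is a minimal sketch; the sample observed and predicted ETo values are hypothetical.

```python
import numpy as np


def _as_arrays(obs, pred):
    return np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)


def r2(obs, pred):
    """Eq (5): 1 - SSR / SST."""
    obs, pred = _as_arrays(obs, pred)
    ssr = np.sum((pred - obs) ** 2)
    sst = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - ssr / sst)


def rmse(obs, pred):
    """Eq (6): square root of the mean squared error (same unit as ETo)."""
    obs, pred = _as_arrays(obs, pred)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))


def mae(obs, pred):
    """Eq (7): mean absolute error (same unit as ETo)."""
    obs, pred = _as_arrays(obs, pred)
    return float(np.mean(np.abs(pred - obs)))


def mape(obs, pred):
    """Eq (8): mean absolute percentage error in %; observed values must be non-zero."""
    obs, pred = _as_arrays(obs, pred)
    return float(100.0 * np.mean(np.abs((obs - pred) / obs)))


def nse(obs, pred):
    """Eq (9): 1 - sum of squared errors / sum of squared deviations from the observed mean."""
    obs, pred = _as_arrays(obs, pred)
    return float(1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2))


# Hypothetical observed and predicted ETo values (mm/day)
obs = [4.0, 5.0, 6.0, 5.0]
pred = [4.5, 5.0, 5.5, 5.0]
for name, fn in [("R2", r2), ("RMSE", rmse), ("MAE", mae), ("MAPE", mape), ("NSE", nse)]:
    print(name, round(fn(obs, pred), 4))
```

With R2 defined as in Eq (5), R2 and NSE in Eq (9) are algebraically the same quantity (both evaluate to 0.75 for the sample values), which is consistent with the identical R2 and NSE values reported in the results.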

4 Results

The study examined three machine learning models: RFR, DTR, and SVR. The experiments use three separate weather datasets from Multan, Faisalabad, and Rawalpindi. Performance was evaluated using R2, RMSE, MAE, MAPE, and NSE, and the results compare the models across the three geographical locations. To our knowledge, the proposed FL method is the first to automatically estimate ETo for distributed fields using FL; it is therefore compared with traditional machine learning models rather than with baseline models. At Multan, the RFR model achieved the highest R2 and NSE values (0.98) and a MAPE of 6.72%, indicating an excellent fit to the data. Its low RMSE (0.42 mm day−1) and MAE (0.32 mm day−1) indicate precise and accurate ETo predictions. The SVR and DTR models also performed well, with R2 values above 0.95 and low error metrics. For Faisalabad, the RFR model achieved R2 and NSE values of 0.97.

For Faisalabad, RFR outperformed the other models with RMSE = 0.31 mm day−1, MAE = 0.23 mm day−1, and MAPE = 5.21%. For the Rawalpindi dataset, RFR again outperformed the other models with R2 = 0.96, NSE = 0.96, and MAPE = 8.31%, indicating a good fit, along with a low RMSE (0.37 mm day−1) and MAE (0.28 mm day−1). The DTR and SVR models also performed reasonably, with R2 values of 0.93, but exhibited higher errors than RFR. A Kruskal-Wallis test was also performed to compare the performance of RFR, SVR, and DTR on the evaluation metrics R2, RMSE, MAE, MAPE, and NSE; the results are given in Table 3. The p-values for all metrics are greater than 0.05, so the test indicates no significant difference in performance among the models across the local and federated settings. The p-values for RMSE and MAE lie closest to the 0.05 threshold. An ANOVA was also performed as a reliability analysis; it yields an F-ratio of 14.6 and a p-value of 0.000077, which is significant at p < 0.05.
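Tests of this kind can be run with `scipy.stats.kruskal` and `scipy.stats.f_oneway`. In the sketch below the per-fold R2 scores are hypothetical placeholders, not the study's results, and treating cross-validation folds as the samples of each group is an assumption.

```python
import numpy as np
from scipy import stats

# Hypothetical per-fold R2 scores for each model (placeholders, not the study's results)
rng = np.random.default_rng(3)
r2_rfr = rng.normal(0.97, 0.01, 10)
r2_svr = rng.normal(0.95, 0.01, 10)
r2_dtr = rng.normal(0.94, 0.01, 10)

# Kruskal-Wallis H-test: rank-based, no normality assumption
h_stat, p_kw = stats.kruskal(r2_rfr, r2_svr, r2_dtr)

# One-way ANOVA: compares group means, assuming normality and equal variances
f_stat, p_anova = stats.f_oneway(r2_rfr, r2_svr, r2_dtr)

print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_kw:.4g}")
print(f"ANOVA F = {f_stat:.2f}, p = {p_anova:.4g}")
```

A p-value above 0.05 means the test fails to reject the hypothesis of equal performance; it does not by itself show that the models perform identically.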

The radar charts in Fig 15 show how the three models (RFR, SVR, DTR) perform on the five metrics for each city. Within each city, the models’ performance across the metrics is relatively consistent, with no significant outliers. In the federated approach (Fig 16), RFR and SVR perform better, with lower errors (MAE, MAPE, RMSE) and higher R2 and NSE, whereas DTR has higher error metrics and lower R2 and NSE values on average. A radar chart was selected instead of a Smith chart because it makes it easier to compare multiple variables across a single category.

Fig 15. Performance comparison of ML algorithms across multiple evaluation metrics.

https://doi.org/10.1371/journal.pone.0314921.g015

Fig 16. Performance of federated learning on different ML models across multiple evaluation metrics.

https://doi.org/10.1371/journal.pone.0314921.g016

The Taylor diagram in Fig 17 shows the correlation and standard deviation of the different models, offering a comprehensive overview of model accuracy and variability. Fig 17 suggests that the federated approach generally performs well with RFR and SVR, but less well with DTR.
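A Taylor diagram places each model by its standard deviation and its correlation with the observations; the distance from the model's point to the reference point is the centered RMS difference, linked to the other two statistics by the law of cosines. The sketch below computes these statistics for hypothetical observed and predicted ETo series.

```python
import numpy as np


def taylor_stats(obs, pred):
    """Statistics plotted on a Taylor diagram for one model."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    sd_obs, sd_pred = obs.std(), pred.std()  # population standard deviations
    corr = np.corrcoef(obs, pred)[0, 1]      # Pearson correlation
    # Centered RMS difference: RMS of the errors after removing each series' mean
    crmsd = np.sqrt(np.mean(((pred - pred.mean()) - (obs - obs.mean())) ** 2))
    return sd_obs, sd_pred, corr, crmsd


# Hypothetical ETo series (mm/day)
obs = np.array([3.1, 4.2, 5.0, 5.8, 6.4, 4.9])
pred = np.array([3.4, 4.0, 5.2, 5.5, 6.6, 4.7])
sd_obs, sd_pred, corr, crmsd = taylor_stats(obs, pred)

# Law-of-cosines identity that lets one point encode all three statistics
lhs = crmsd ** 2
rhs = sd_pred ** 2 + sd_obs ** 2 - 2 * sd_pred * sd_obs * corr
print(round(corr, 3), round(crmsd, 3), np.isclose(lhs, rhs))
```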

Fig 17. Performance evaluation of different models using Taylor diagram on various datasets.

https://doi.org/10.1371/journal.pone.0314921.g017

The error box plots in Fig 18 show that RFR consistently outperforms DTR and SVR across all locations and performance metrics. RFR exhibits the highest R2 and NSE values and the lowest MAPE, MAE, and RMSE values, indicating greater predictive accuracy and reliability. DTR generally shows the poorest performance, with the lowest R2 and NSE values and the highest MAPE, MAE, and RMSE values. SVR falls in between, performing better than DTR but not as well as RFR. The FL models show a similar trend, with RFR having the lowest MAPE values, as shown in Fig 19. In the federated setting, DTR generally performs better than SVR in terms of RMSE and MAE, but worse in terms of R2, NSE, and MAPE.

Fig 18. Box plot analysis of evaluation metrics for RFR, DTR, and SVR models.

https://doi.org/10.1371/journal.pone.0314921.g018

Fig 19. Box plot analysis of evaluation metrics for RFR, DTR, and SVR models using federated learning.

https://doi.org/10.1371/journal.pone.0314921.g019

A feature importance analysis was also performed to determine the impact of different weather parameters on ETo; the results are shown in Fig 20. This analysis clarifies the relationship between the weather features and ETo: Tmax and WS were found to be the most influential parameters for ETo determination. The study also compares FL, a novel learning approach, with separately trained traditional machine learning models. Comparing the three regression models, RFR, SVR, and DTR, revealed varying performance across the evaluation criteria. RFR outperformed the other models with an R2 value of 0.97 while maintaining lower errors, with an RMSE of 0.44, an MAE of 0.33 mm day−1, and a MAPE of 8.18%. The DTR results closely followed those of RFR, with R2 = 0.96 and similar errors: RMSE = 0.48, MAE = 0.35 mm day−1, and MAPE = 8.50%. With R2 = 0.97 and RMSE = 0.44, this federated model explains a large proportion of the variance in the target variable. The RMSE of the FL approach is higher than that of the best-performing individual model (e.g., Multan’s RFR at RMSE 0.4064); although the difference is relatively small, the federated predictions remain accurate. The MAE (0.33 mm day−1) and MAPE (8.18%) of the federated model demonstrate its ability to provide accurate ETo estimates. Table 4 reports the performance metrics of the different models across all datasets.

Table 4. Performance metrics of different models across datasets.

https://doi.org/10.1371/journal.pone.0314921.t004

4.1 Discussion

The study implemented the ML and FL models in Python on Google Colab using the Scikit-learn, Keras, and TensorFlow Federated libraries. The configuration included a single-core hyper-threaded Xeon processor, 12.72 GB of RAM, and an NVIDIA Tesla K80, P100, or T4 GPU (depending on availability), providing substantial computational resources. Training time may vary with internet connectivity and the availability of Google services. The RFR model performed consistently well across all three datasets, with high R2 values and low error metrics. The SVR and DTR models also produced competitive results, but RFR often outperformed them. Model selection depends on many factors, including the specific application, interpretability, and computational capacity. The FL approach combines knowledge learned from multiple regions and proved highly effective, especially with RFR. In this study, RFR was the most reliable model for accurate ETo estimation across diverse crop field settings. This finding is important for agricultural management because it enables more precise water resource optimization and planning. The high R2 and low error metrics of RFR indicate its strong capability to handle the spatial variability in ETo data, making it a valuable tool for improving irrigation efficiency and crop yields. DTR showed more variability in its performance: although it can capture non-linear relationships, it did not perform as consistently as RFR. In terms of error metrics, SVR generally lagged behind RFR and DTR, likely due to its sensitivity to the choice of kernel and hyperparameters. These findings emphasize the importance of choosing the right machine learning model for a particular geographical location when estimating ETo, and they show how well machine learning models predict ETo for distributed crop fields.
The RFR-based model appears appropriate for accurate ETo predictions, but the application’s requirements should be considered when choosing the final model. This study also uses feature importance analysis to prioritize and select the most suitable features for the ETo prediction model. A model that focuses on the most significant factors may shorten training time and improve interpretability. Identifying low-value features may also be useful during data preparation and quality control. FL is beneficial when data are distributed across multiple locations or clients, such as geographic regions, because it allows models to be trained locally on specific datasets while preserving data privacy. Three datasets (Multan, Faisalabad, and Rawalpindi) are used to build the federated global model, and the results indicate strong performance across multiple evaluation metrics. The R2 value of the federated model is comparable to that of the best-performing individual models on each dataset, indicating excellent generalization. Although the RMSE of the federated model is slightly higher than that of the best individual models, FL preserves data privacy, and this trade-off is acceptable given the accuracy of the predictions. The MAE and MAPE of the federated model also indicate reliable ETo estimates across multiple locations. Compared with the individual models for each dataset, the FL method predicts ETo effectively, which is beneficial when data are distributed across multiple locations and model generalization is a key concern. The particular requirements of the application should therefore be considered when choosing between individual models and FL. Federated models may exhibit slightly higher RMSE, MAE, and MAPE, indicating a modest compromise in accuracy.
However, this trade-off enhances their ability to generalize across diverse datasets, making them more robust, albeit less optimized for specific individual datasets. Distributed learning can be an effective method for geographically dispersed data, but in many real-world applications, transferring local weather data to a central location for model training may be infeasible due to bandwidth limitations or data security regulations. Federated learning addresses these concerns by allowing models to be trained locally at each site.

The study focuses on three locations in Pakistan, each characterized by distinct weather conditions. Expanding the research to broader geographical areas could significantly enhance the model’s adaptability and generalization, since training a model with data from various regions, each with unique weather conditions, geographical features, and farming practices, is essential for achieving high accuracy. For future work, it is recommended to implement the proposed solution in regions with even more diverse weather conditions and to incorporate advanced deep learning approaches to further refine the model’s performance.

5 Conclusion

The study proposed an FL approach for estimating reference evapotranspiration (ETo) across multiple locations with distinct weather parameters. By employing various machine learning algorithms, including support vector machines, decision tree regression, and random forest regression (RFR), the research aimed to analyze and predict ETo effectively. The results demonstrated that the RFR model consistently outperformed other models at local and global levels, highlighting its robustness in ETo predictions. Feature importance analysis identified maximum temperature and wind speed as key weather parameters influencing ETo estimation. This research offers insights into the complex relationships between weather variables and ETo. However, the model’s adaptability might be limited by the study’s focus on three specific locations in Pakistan. Future work should explore the application of this approach in regions with more diverse weather conditions and consider the integration of deep learning techniques for further improvement.

The study proposed an FL approach for ETo estimation at multiple locations with distinct weather parameters. Various machine learning algorithms, including Support Vector Regression (SVR), DTR, and RFR, were used to analyze and predict reference evapotranspiration (ETo). The implementation of the proposed solution shows that the RFR model outperformed the other models at the local and global levels, with R2 = 0.95, MAPE = 10.35, RMSE = 0.49, NSE = 0.95, and MAE = 0.38 (mm day−1). The performance of federated learning is satisfactory for estimating ETo with a single machine learning model trained on data from different locations. For the local models, the RFR model achieved R2 = 0.98, MAPE = 6.72, RMSE = 0.42, NSE = 0.98, and MAE = 0.30 (mm day−1) on the Multan dataset; R2 = 0.97, MAPE = 5.46, NSE = 0.97, RMSE = 0.32, and MAE = 0.24 (mm day−1) on the Faisalabad dataset; and R2 = 0.96, MAPE = 8.31, RMSE = 0.37, NSE = 0.96, and MAE = 0.27 (mm day−1) on the Rawalpindi dataset. The RFR-based model outperformed SVR and DTR in ETo predictions at both global and local levels. A feature importance analysis revealed that maximum temperature and wind speed are the dominant weather parameters in ETo estimation. The study provides a deeper understanding of the relationships between weather parameters and reference evapotranspiration. Restricting the study to three locations in Pakistan may limit the model’s adaptability to other regions. Implementing the solution in areas with more diverse weather conditions and using deep learning approaches are recommended for future work. The findings of this study underscore the potential of FL to enhance ETo predictions across varying climatic conditions, paving the way for improved agricultural management practices.

References

  1. 1. Boobalan P, Ramu SP, Pham QV, Dev K, Pandya S, Maddikunta PKR, et al. Fusion of federated learning and industrial Internet of Things: A survey. Computer Networks. 2022;212:109048.
  2. 2. Mostafa RR, Kisi O, Adnan RM, Sadeghifar T, Kuriqi A. Modeling potential evapotranspiration by improved machine learning methods using limited climatic data. Water. 2023;15(3):486.
  3. 3. Xie C, Chen PY, Zhang C, Li B. Improving privacy-preserving vertical federated learning by efficient communication with admm. arXiv preprint arXiv:220710226. 2022.
  4. 4. Zouzou Y, Citakoglu H. General and regional cross-station assessment of machine learning models for estimating reference evapotranspiration. Acta Geophysica. 2023;71(2):927–947.
  5. 5. Zhuge W, Yue Y, Shang Y. Spatial-temporal pattern of human-induced land degradation in Northern China in the Past 3 decades RESTREND approach. International Journal of Environmental Research and Public Health. 2019;16(13):2258. pmid:31248024
  6. 6. Cobaner M, Citakoğlu H, Haktanir T, Kisi O. Modifying Hargreaves–Samani equation with meteorological variables for estimation of reference evapotranspiration in Turkey. Hydrology Research. 2017;48(2):480–497.
  7. 7. Zotarelli L, Dukes MD, Romero CC, Migliaccio KW, Morgan KT. Step by step calculation of the Penman-Monteith Evapotranspiration (FAO-56 Method). Institute of Food and Agricultural Sciences University of Florida. 2010;8.
  8. 8. Manikumari N, Vinodhini G, Murugappan A. Modelling of reference evapotransipration using climatic parameters for irrigation scheduling using machine learning. Hydrological Sciences Journal. 2020;65(16):2669–2677.
  9. 9. Han X, Wei Z, Zhang B, Li Y, Du T, Chen H. Crop evapotranspiration prediction by considering dynamic change of crop coefficient and the precipitation effect in back-propagation neural network model. Journal of Hydrology. 2021;596:126104.
  10. 10. Uncuoglu E, Citakoglu H, Latifoglu L, Bayram S, Laman M, Ilkentapar M, et al. Comparison of neural network, Gaussian regression, support vector machine, long short-term memory, multi-gene genetic programming, and M5 Trees methods for solving civil engineering problems. Applied Soft Computing. 2022;129:109623.
  11. 11. Bayram S, Çıtakoğlu H. Modeling monthly reference evapotranspiration process in Turkey: application of machine learning methods. Environmental Monitoring and Assessment. 2023;195(1):67.
  12. 12. Citakoglu H, Cobaner M, Haktanir T, Kisi O. Estimation of monthly mean reference evapotranspiration in Turkey. Water Resources Management. 2014;28:99–113.
  13. 13. Hu Z, Bashir RN, Rehman AU, Iqbal SI, Shahid MMA, Xu T. Machine learning based prediction of reference evapotranspiration (et 0) using iot. IEEE Access. 2022;10:70526–70540.
  14. 14. Nauman MA, Saeed M, Saidani O, Javed T, Almuqren L, Bashir RN, et al. IoT and Ensemble Long-Short-Term-Memory-Based Evapotranspiration Forecasting for Riyadh. Sensors. 2023;23(17). pmid:37688039
  15. 15. Nauman MA, Saeed M, Saidani O, Javed T, Almuqren L, Bashir RN, et al. IoT and Ensemble Long-Short-Term-Memory-Based Evapotranspiration Forecasting for Riyadh. Sensors. 2023;23(17). pmid:37688039
  16. 16. Khan AA, Nauman MA, Bashir RN, Jahangir R, Alroobaea R, Binmahfoudh A, et al. Context Aware Evapotranspiration (ETs) for Saline Soils Reclamation. IEEE Access. 2022;10:110050–110063.
  17. 17. Bashir RN, Saeed M, Al-Sarem M, Marie R, Faheem M, Karrar AE, et al. Smart reference evapotranspiration using Internet of Things and hybrid ensemble machine learning approach. Internet of Things. 2023;24:100962.
  18. 18. Tausif M, Dilshad S, Umer Q, Iqbal MW, Latif Z, Lee C, et al. Ensemble learning-based estimation of reference evapotranspiration (ETo). Internet of Things. 2023;24:100973.
  19. 19. Babar M, Qureshi B, Koubaa A. Review on Federated Learning for digital transformation in healthcare through big data analytics. Future Generation Computer Systems. 2024;160:14–28.
  20. 20. Siddique AA, Alasbali N, Driss M, Boulila W, Alshehri MS, Ahmad J. Sustainable collaboration: Federated learning for environmentally conscious forest fire classification in Green Internet of Things (IoT). Internet of Things. 2024;25:101013.
  21. 21. Kaleem S, Sohail A, Babar M, Ahmad A, Tariq MU. A hybrid model for energy-efficient Green Internet of Things enabled intelligent transportation systems using federated learning. Internet of Things. 2024;25:101038.
  22. 22. El Hanjri M, Kabbaj H, Kobbane A, Abouaomar A. Federated learning for water consumption forecasting in smart cities. In: ICC 2023-IEEE International Conference on Communications. IEEE; 2023. p. 1798–1803.
  23. 23. Supriya Y, Gadekallu TR. Particle Swarm-Based Federated Learning Approach for Early Detection of Forest Fires. Sustainability. 2023;15(2):964.
  24. 24. Ullah F, Srivastava G, Xiao H, Ullah S, Lin JCW, Zhao Y. A Scalable Federated Learning Approach for Collaborative Smart Healthcare Systems with Intermittent Clients using Medical Imaging. IEEE Journal of Biomedical and Health Informatics. 2023.
  25. 25. Pandya S, Srivastava G, Jhaveri R, Babu MR, Bhattacharya S, Maddikunta PKR, et al. Federated learning for smart cities: A comprehensive survey. Sustainable Energy Technologies and Assessments. 2023;55:102987.
  26. 26. Friha O, Brik B, Touati F, Al-Fuqaha A, Afyouni I. FELIDS: Federated learning-based intrusion detection system for agricultural Internet of Things. Journal of Parallel and Distributed Computing. 2022;165:17–31.
  27. 27. Mahjoub T, Mnaouer AB, Said MB, Boujemaa H. LoRa signal propagation and path loss prediction in Tunisian date palm oases. Computers and Electronics in Agriculture. 2024;222:109027.
  28. 28. Ahmed RA, Hemdan EED, El-Shafai W, Ahmed ZA, El-Rabaie ESM, Abd El-Samie FE. Climate-smart agriculture using intelligent techniques, blockchain and Internet of Things: Concepts, challenges, and opportunities. Transactions on Emerging Telecommunications Technologies. 2022;33(11):e4607.
  29. 29. Boulila W, Alzahem A, Koubaa A, Benjdira B, Ammar A. Early detection of red palm weevil infestations using deep learning classification of acoustic signals. Computers and Electronics in Agriculture. 2023;212:108154.
  30. 30. Dong J, Wang Y, Shen Y, Wang C, Chen J. Nation-scale reference evapotranspiration estimation by using deep learning and classical machine learning models in China. Journal of Hydrology. 2022;604:127207.
  31. 31. Rai P, Kumar A, Kumar M, Kushwaha S, Chauhan A. Evaluation of machine learning versus empirical models for monthly reference evapotranspiration estimation in Uttar Pradesh and Uttarakhand States, India. Sustainability. 2022;14(10):5771.
  32. 32. Bellido-Jiménez JA, Estévez J, García-Marín AP. New machine learning approaches to improve reference evapotranspiration estimates using intra-daily temperature-based variables in a semi-arid region of Spain. Agricultural Water Management. 2021;245:106558.
  33. Krishnashetty PH, Balasangameshwara J, Sreeman S, Desai S, Kantharaju AB. Cognitive computing models for estimation of reference evapotranspiration: A review. Cognitive Systems Research. 2021;70:109–116.
  34. Ayaz A, Rajesh M, Singh SK, Rehana S, et al. Estimation of reference evapotranspiration using machine learning models with limited data. AIMS Geosci. 2021;7(3):268–290.
  35. Sammen SS, Kisi O, Al-Janabi AMS, Elbeltagi A, Zounemat-Kermani M. Estimation of Reference Evapotranspiration in Semi-Arid Region with Limited Climatic Inputs Using Metaheuristic Regression Methods. Water. 2023;15(19):3449.
  36. Mirzania E, Vishwakarma DK, Bui QAT, Band SS, Dehghani R. A novel hybrid AIG-SVR model for estimating daily reference evapotranspiration. Arabian Journal of Geosciences. 2023;16(5):1–14.
  37. Rashid Niaghi A, Hassanijalilian O, Shiri J. Estimation of reference evapotranspiration using spatial and temporal machine learning approaches. Hydrology. 2021;8(1):25.
  38. Yu H, Wen X, Li B, Yang Z, Wu M, Ma Y. Uncertainty analysis of artificial intelligence modeling daily reference evapotranspiration in the northwest end of China. Computers and Electronics in Agriculture. 2020;176:105653.
  39. Zhang T, Gao L, He C, Zhang M, Krishnamachari B, Avestimehr AS. Federated learning for the Internet of things: Applications, challenges, and opportunities. IEEE Internet of Things Magazine. 2022;5(1):24–29.
  40. Manoj T, Makkithaya K, Narendra V. A federated learning-based crop yield prediction for agricultural production risk management. In: 2022 IEEE Delhi Section Conference (DELCON). IEEE; 2022. p. 1–7.
  41. Kumar P, Gupta GP, Tripathi R. PEFL: Deep privacy-encoding-based federated learning framework for smart agriculture. IEEE Micro. 2021;42(1):33–40.
  42. Nguyen DC, Ding M, Pathirana PN, Seneviratne A, Li J, Poor HV. Federated learning for internet of things: A comprehensive survey. IEEE Communications Surveys & Tutorials. 2021;23(3):1622–1658.
  43. Imteaj A, Thakker U, Wang S, Li J, Amini MH. A survey on federated learning for resource-constrained IoT devices. IEEE Internet of Things Journal. 2021;9(1):1–24.
  44. Gleason CJ, Durand MT. Remote sensing of river discharge: A review and a framing for the discipline. Remote Sensing. 2020;12(7):1107.
  45. Fleming SW, Rittger K, Oaida Taglialatela CM, Graczyk I. Leveraging next-generation satellite remote sensing-based snow data to improve seasonal water supply predictions in a practical machine learning-driven river forecast system. Water Resources Research. 2024;60(4):e2023WR035785.
  46. Zhu B, Feng Y, Gong D, Jiang S, Zhao L, Cui N. Hybrid particle swarm optimization with extreme learning machine for daily reference evapotranspiration prediction from limited climatic data. Computers and Electronics in Agriculture. 2020;173:105430.
  47. Gong D, Hao W, Gao L, Feng Y, Cui N. Extreme learning machine for reference crop evapotranspiration estimation: Model optimization and spatiotemporal assessment across different climates in China. Computers and Electronics in Agriculture. 2021;187:106294.
  48. Duhan D, Singh MC, Singh D, Satpute S, Singh S, Prasad V. Modeling reference evapotranspiration using machine learning and remote sensing techniques for semiarid subtropical climate of Indian Punjab. Journal of Water and Climate Change. 2023.
  49. Aly MS, Darwish SM, Aly AA. High performance machine learning approach for reference evapotranspiration estimation. Stochastic Environmental Research and Risk Assessment. 2023; p. 1–25.
  50. Nagappan M, Gopalakrishnan V, Alagappan M. Prediction of reference evapotranspiration for irrigation scheduling using machine learning. Hydrological Sciences Journal. 2020;65(16):2669–2677.
  51. Dias SHB, Filgueiras R, Fernandes Filho EI, Arcanjo GS, Silva GHd, Mantovani EC, et al. Reference evapotranspiration of Brazil modeled with machine learning techniques and remote sensing. PLoS ONE. 2021;16(2):e0245834. PMID: 33561147.
  52. Mokari E, DuBois D, Samani Z, Mohebzadeh H, Djaman K. Estimation of daily reference evapotranspiration with limited climatic data using machine learning approaches across different climate zones in New Mexico. Theoretical and Applied Climatology. 2022;147:575–587.
  53. Reis MM, da Silva AJ, Junior JZ, Santos LDT, Azevedo AM, Lopes ÉMG. Empirical and learning machine approaches to estimating reference evapotranspiration based on temperature data. Computers and Electronics in Agriculture. 2019;165:104937.
  54. Elbeltagi A, Srivastava A, Al-Saeedi AH, Raza A, Abd-Elaty I, El-Rawy M. Forecasting long-series daily reference evapotranspiration based on best subset regression and machine learning in Egypt. Water. 2023;15(6):1149.
  55. Rajput J, Singh M, Lal K, Khanna M, Sarangi A, Mukherjee J, et al. Data-driven reference evapotranspiration (ET0) estimation: a comparative study of regression and machine learning techniques. Environment, Development and Sustainability. 2023; p. 1–28.
  56. Santos PABd, Schwerz F, Carvalho LGd, Baptista VBdS, Marin DB, Ferraz GAeS, et al. Machine Learning and Conventional Methods for Reference Evapotranspiration Estimation Using Limited-Climatic-Data Scenarios. Agronomy. 2023;13(9):2366.
  57. Estévez J, Bellido-Jiménez JA, Liu X, García-Marín AP. Monthly precipitation forecasts using wavelet neural networks models in a semiarid environment. Water. 2020;12(7):1909.
  58. Achite M, Jehanzaib M, Sattari MT, Toubal AK, Elshaboury N, Wałęga A, et al. Modern techniques to modeling reference evapotranspiration in a semiarid area based on ANN and GEP models. Water. 2022;14(8):1210.
  59. Allen R, Smith M, Perrier A, Pereira LS, et al. An update for the definition of reference evapotranspiration. ICID Bulletin. 1994;43(2):1–34.
  60. Elbeltagi A, Raza A, Hu Y, Al-Ansari N, Kushwaha N, Srivastava A, et al. Data intelligence and hybrid metaheuristic algorithms-based estimation of reference evapotranspiration. Applied Water Science. 2022;12(7):152.
  61. Wang J, Raza A, Hu Y, Buttar NA, Shoaib M, Saber K, et al. Development of monthly reference evapotranspiration machine learning models and mapping of Pakistan—A comparative study. Water. 2022;14(10):1666.
  62. Salahudin H, Shoaib M, Albano R, Inam Baig MA, Hammad M, Raza A, et al. Using Ensembles of Machine Learning Techniques to Predict Reference Evapotranspiration (ET0) Using Limited Meteorological Data. Hydrology. 2023;10(8):169.
  63. Raza A, Khaliq A, Hu Y, Zubair N, Acharki S, Zubair M, et al. Water Resources and Irrigation Management Using GIS and Remote Sensing Techniques: Case of Multan District (Pakistan). In: Surface and Groundwater Resources Development and Management in Semi-arid Region: Strategies and Solutions for Sustainable Water Management. Springer; 2023. p. 137–156.
  64. Raza A, Saber K, Hu Y, L Ray R, Ziya Kaya Y, Dehghanisanij H, et al. Modelling reference evapotranspiration using principal component analysis and machine learning methods under different climatic environments. Irrigation and Drainage. 2023;72(4):945–970.
  65. Rehman A, Jingdong L, Shahzad B, Chandio AA, Hussain I, Nabi G, et al. Economic perspectives of major field crops of Pakistan: An empirical study. Pacific Science Review B: Humanities and Social Sciences. 2015;1(3):145–158.
  66. NASA. The Data Access Viewer; 2023. Available from: https://power.larc.nasa.gov/data-access-viewer/.