
Enhancing autonomous agriculture control systems in greenhouses for sustainable resource usage using deep learning techniques

  • Iman Hindi,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliations Industrial Engineering Department, Al Hussein Technical University, Amman, Jordan, Computer Engineering Department, The University of Jordan, Amman, Jordan

  • Adham Alsharkawi,

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Resources, Supervision, Validation, Writing – review & editing

    a.sharkawi@ju.edu.jo

    Affiliation Mechatronics Engineering Department, The University of Jordan, Amman, Jordan

  • Malik Al-Ajlouni,

    Roles Formal analysis, Validation, Writing – review & editing

    Affiliation Department of Horticulture and Crop Science, The University of Jordan, Amman, Jordan

  • Bassam Qarallah

    Roles Investigation, Validation, Writing – review & editing

    Affiliation Department of Horticulture and Crop Science, The University of Jordan, Amman, Jordan

Abstract

Greenhouse climate control is essential for optimizing crop growth while minimizing resource consumption in controlled environment agriculture. Traditional rule-based and fixed-action strategies often struggle to achieve a balance between these objectives. This paper proposes a reinforcement learning (RL) based framework for greenhouse climate control, integrating deep learning models to predict both crop growth and resource consumption. The framework enables an RL agent to optimize greenhouse control setpoints dynamically, maximizing crop yield while ensuring sustainable resource usage. The proposed system incorporates a Multi-Layer Perceptron (MLP) model to predict internal greenhouse climate conditions, a Long Short-Term Memory (LSTM) model for crop parameter estimation, and a separate LSTM model for forecasting daily resource consumption. These models collectively simulate a greenhouse environment in which an RL agent learns to regulate temperature, CO2 concentration, and irrigation levels by interacting with the virtual environment. A custom reward function is designed to guide the agent, considering key crop parameters (stem elongation, stem thickness, and cumulative trusses) alongside resource consumption metrics, including heating, electricity, CO2, and irrigation costs. To enhance the adaptability of the RL agent, a feature-selection mechanism identified the most influential climate and control features, reducing observation complexity and accelerating convergence. Retraining under stochastic weather conditions strengthened robustness to dynamic environments, enabling the agent to consistently outperform fixed-action strategies. Evaluation revealed a stable Pareto frontier between yield and resource consumption, confirming that the framework accurately captured the productivity and sustainability trade-off and remained robust across varying reward-weight settings.
Comparative analysis of multiple RL algorithms, namely Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), Soft Actor-Critic (SAC), and Twin Delayed Deep Deterministic Policy Gradient (TD3), demonstrated that TD3 outperforms the other algorithms, achieving the highest cumulative rewards and reaching optimal policies faster. Experimental evaluations demonstrate that the proposed TD3-based greenhouse control system achieves higher crop yield growth rates while optimizing resource usage, outperforming conventional greenhouse control strategies. This study presents a novel data-driven, adaptive greenhouse management approach that bridges the gap between crop growth modeling and autonomous climate control, contributing to sustainable and intelligent agricultural practices.

1. Introduction

In the past decade, climate change has significantly impacted global agriculture, creating challenges that demand innovative solutions to ensure food security. As governments and organizations prioritize sustainable energy usage to mitigate carbon footprints, the agricultural sector has increasingly turned to greenhouse farming to address extreme weather conditions such as heat waves, frost, and heavy rainfall. Greenhouses offer a controlled environment that protects crops and optimizes growing conditions, making them an essential tool in meeting the food demands of a global population expected to increase by 2.3 billion by 2050 [1].

Jordan, one of the world’s most water-scarce nations, faces unique agricultural challenges due to its arid climate, limited water resources, and high sensitivity to climate variability. Agriculture in Jordan consumes 51% of the country’s available water [2], posing critical concerns about resource sustainability. As climatic pressures increase, the need for innovative farming solutions and efficient resource management becomes more urgent. Greenhouse farming is particularly promising in Jordan as it enhances agricultural productivity while mitigating external climate-related constraints. However, the efficient management of greenhouse environments presents significant complexity, requiring real-time adjustments to parameters such as light, water, temperature, CO2 levels, and nutrients to meet plant growth needs.

Traditional manual control methods often fall short, leading to inefficient resource usage, suboptimal crop yields, and reliance on experienced growers, who are scarce in Jordan. To overcome these challenges, the agricultural sector is increasingly adopting advanced technologies, including artificial intelligence (AI), deep learning, and reinforcement learning (RL) [3]. These technologies enable farmers to improve decision-making, enhance resource efficiency, and minimize environmental impacts, while ensuring higher crop productivity [4].

This study presents a novel AI-based solution to automate and optimize greenhouse climate control. The proposed approach integrates reinforcement learning (RL) with machine learning models to predict crop growth and resource consumption, providing a robust framework for real-time control of greenhouse parameters. By continuously monitoring and adjusting factors like temperature, irrigation, and CO2 levels, the RL agent optimizes plant growth and reduces unnecessary resource use. Additionally, a feature selection mechanism is incorporated to enhance model efficiency by focusing on the most influential predictors, thereby simplifying the training process while maintaining accuracy. Key contributions of this research include:

  • Improved Productivity: Maximizes crop yields through optimal greenhouse control.
  • Resource Efficiency: Reduces excessive use of critical resources like water, energy, and CO2 while lowering operational costs.
  • Climate Resilience: Enhances the ability of greenhouses to withstand extreme weather conditions, ensuring stable crop production.
  • Generalizability: Provides insights that are applicable to different climatic conditions, extending the utility of the proposed framework.
  • Enhanced Research Capability: Offers tools for monitoring plant health and studying the impact of environmental variables on crop growth.

The primary objective of this study is to develop a robust, AI-driven greenhouse control system capable of improving crop yields and resource efficiency. Specific objectives include:

  • Designing a Robust AI Model: Developing an AI model that can adapt to varying climatic conditions to optimize greenhouse control
  • Addressing System Complexity: Using reinforcement learning algorithms to identify optimal control policies in non-linear, dynamic greenhouse environments.
  • Resource and Crop Growth Prediction: Employing Long Short-Term Memory (LSTM) models for forecasting daily resource consumption and crop growth parameters.
  • Integrating Feature Selection: Enhancing model efficiency by selecting the most influential features, thereby reducing computational complexity while preserving performance.

The research assumes that the dataset used was recorded from advanced greenhouse infrastructure, including sensors and actuators for monitoring and controlling the environment, and a uniform soil structure ensuring consistent water and nutrient distribution. The research focuses on cherry tomato cultivation as a representative crop. The data are sourced from the 2nd International Autonomous Greenhouse Challenge [5], which provides a robust dataset for model training and evaluation and contains the data from the six teams participating in the challenge [6].

This research presents an innovative framework integrating reinforcement learning, deep learning, and feature selection to optimize greenhouse climate control, addressing critical challenges in controlled environment agriculture. By leveraging AI-driven approaches, the study tackles complexities such as resource overuse, climatic variability, and the scarcity of experienced growers, enabling the optimization of crop yields while ensuring sustainable resource efficiency. This work bridges the gap between advanced AI solutions and practical implementation, paving the way for more sustainable agricultural practices in greenhouse management.

2. Literature review

Increasing crop yield quantity and quality has been a primary focus for researchers globally. Efforts range from developing smart agricultural monitoring systems to designing robust frameworks that accurately detect plant needs based on critical factors such as soil moisture, weather conditions, and irrigation levels. The recent emergence of AI-based technologies, particularly machine learning (ML) and reinforcement learning (RL), has driven significant advancements in this domain. These technologies aim to replace traditional control systems, offering precision and efficiency in addressing the optimization challenges inherent in greenhouse environments.

2.1. Irrigation efficiency and smart watering

Efficient water management is critical for plant health. Researchers have employed IoT systems and AI algorithms to optimize irrigation processes. For instance, Gutiérrez-Gordillo et al. [7] identified leaf diameter as a key indicator of water stress. Data collected from IoT sensors has been used to train models, such as ensemble learning approaches [8–10], decision trees [11], and K-Nearest Neighbors (KNN) [12], for irrigation prediction. These models rely on soil moisture and weather data to determine water needs accurately.

The studies in [13–15] used collected soil moisture levels for on/off irrigation control with predefined thresholds and evaluated various machine learning algorithms. Sema et al. [16] used an IoT system to collect farming data and trained an Artificial Neural Network (ANN) as an irrigation controller, acting as a classifier that turns the pump on or off based on ranges of soil moisture, air humidity, and air temperature.

Ben Abdallah et al. [17] utilized Random Forest models combined with FAO’s CLIMWAT tool [18] to estimate water requirements with high accuracy (RMSE = 0.0509, R² = 0.99). However, mathematical modeling limits real-world adaptability, as noted by researchers using deep learning classifiers for soil moisture conditions [19]. In [20], the authors used an estimate of evapotranspiration (ETMPT) derived from infrared thermometry to determine variable rate irrigation (VRI).

LSTM-based systems have also emerged for predictive irrigation. Lyu et al. [21] and Kashyap et al. [22] demonstrated LSTM’s ability to forecast soil moisture and water needs. Reinforcement learning (RL) models, such as deep Q-networks, have further enhanced irrigation optimization. Alibabaei et al. [23] trained RL agents to maximize crop yield while minimizing water usage. Their comparison of on-policy and off-policy RL approaches revealed that on-policy methods reduced water consumption by 20% [24].

Kamyshova et al. [25] proposed an innovative Phyto-indicator-based approach using computer vision for irrigation prescriptions, while Amir et al. [26] and Yumang et al. [27] focused on greenhouse-specific systems for cherry tomato irrigation using machine learning and fuzzy logic, respectively. Hindi et al. [28] introduced a cascaded-output ANN model, achieving superior irrigation scheduling accuracy using data from the 2nd International Autonomous Greenhouse Challenge.

2.2. Monitoring plant status and growth rate

Monitoring plant health and growth stages is vital for decision-making. Advanced imaging and AI techniques have been widely adopted. Petropoulou et al. [29] and Lin et al. [30] utilized depth imaging and segmentation algorithms for lettuce growth monitoring, optimizing plant spacing and harvest timing. A two-stage CNN architecture by Gang et al. [31] estimated greenhouse lettuce growth indices such as fresh weight and leaf area with high accuracy.

Zhang et al. [32] developed a U-net mapping model for phenotypic parameter extraction from point clouds, demonstrating high performance in biomass estimation. These advancements facilitate precise growth predictions and enhance crop management.

Recent studies also highlight the role of Digital Twin frameworks integrated with RL to monitor crop growth and adapt control strategies in real-time [33].

2.3. Agricultural environmental modeling

Agricultural environmental modeling integrates crop simulation and machine learning to address changing environmental conditions. Traditional models like DSSAT and APSIM simulate crop growth based on environmental variables. Recent advancements incorporate machine learning for improved predictions [34]. For example, Van Klompenburg et al. highlighted the synergy between machine learning and simulation models [35].

Recent agricultural intelligence studies have continued to advance predictive and control methodologies beyond conventional statistical models. Sha et al. [36] proposed the ZHPO-LightXBoost model to handle limited-sample prediction problems for pesticide residues, demonstrating how hybrid machine-learning pipelines can maintain accuracy under data scarcity, an issue also critical to greenhouse control. Similarly, Gong et al. [37] introduced a LiDAR–Inertial–Ultrasonic SLAM system for plant-factory automation, exemplifying precise environmental sensing and autonomous navigation in controlled cultivation environments. These advances reflect the broader frontier of AI-driven agriculture, where data efficiency and automation are central to sustainability goals.

Moreover, IoT and cloud-based greenhouse monitoring systems are gaining traction [38], enabling real-time environmental monitoring and precision control while reducing human intervention.

Digital Twin technology, as reviewed by Purcell et al., creates virtual agricultural system replicas, optimizing real-time decisions [39]. Multi-objective optimization frameworks combine genetic algorithms with simulation models to balance environmental and economic goals [40].

Beyond agriculture, emerging AI frameworks such as the Knowledge-driven Two-stage Modulation Network (KTMN) by Shi et al. [41] illustrate progress in multi-modal learning and adaptive reasoning. Although developed for visual question answering, such adaptive knowledge-fusion techniques share conceptual similarities with reinforcement-learning frameworks that balance multiple dynamic objectives.

2.4. Autonomous farm control systems

The integration of AI and automation has transformed greenhouse management. RL-based frameworks are at the forefront of this innovation. Iacomi et al. [42] demonstrated a double deep Q-learning model for irrigation and chemigation, optimizing resource use and crop yield. Similarly, DRL models like DDPG have been applied for automatic sugar beet control [43]. Elavarasan and Vincent developed a Deep Recurrent Q-Network model that integrates recurrent neural networks with Q-learning to predict crop yields [44].

Recent works have expanded RL applications with model predictive control (MPC) to enhance robustness under uncertainty. Mallick et al. [45] and Morcego Seix et al. [46] proposed RL-MPC hybrid models for greenhouse climate control, outperforming traditional MPC under complex and stochastic environments.

Notable advancements include the CropGym environment for RL applications in nitrogen management [47] and DRLIC systems for irrigation scheduling using wireless sensor networks [48]. Kerkhof and Keviczky’s predictive control models for cherry tomato greenhouses combined data-driven approaches with meteorological data [49].

Additionally, Lee et al. presented an AI-powered greenhouse system (AI-GECS) that integrates weather forecasts and crop physiological indicators, employing a hybrid CLSTM-CNN-BP model to optimize microclimate control [50]. Astiaso Garcia showed that AI-driven greenhouse systems substantially improve energy efficiency, though challenges remain in CO2 and water usage optimization [51].

Furthermore, Fan et al. [52] investigated novel deep eutectic solvents with high CO₂ adsorption capacity, emphasizing the importance of efficient carbon-management solutions. Their findings conceptually align with this study’s emphasis on optimizing CO₂ utilization within greenhouse systems to achieve sustainable climate control.

Platero-Horcajadas et al. demonstrated the successful integration of IoT technologies with RL for greenhouse control, achieving energy savings and reducing the need for human intervention [53]. Such hybrid approaches combining RL, IoT, MPC, and Digital Twins are shaping the future of autonomous greenhouse management, driving toward more sustainable and resilient agriculture.

Recent studies on greenhouse automation have explored reinforcement learning for climate control, model predictive control based on simplified crop or energy models, and digital twin–driven optimization frameworks. While these approaches demonstrate promising improvements in isolated objectives, such as temperature regulation, energy efficiency, or yield maximization, they typically rely on single-objective formulations, static or manually tuned policies, or computationally intensive physics-based models.

In contrast, the present work advances the state of the art by integrating data-driven predictive models (MLP for climate dynamics and LSTM for crop growth and resource consumption) within a unified multi-objective reinforcement learning framework. This design enables explicit and interpretable trade-offs between productivity and sustainability, robustness to stochastic weather variability, and practical feasibility for deployment without reliance on high-fidelity digital twins.

Our RL-based greenhouse framework advances prior work by enabling adaptive, data-driven control in uncertain and dynamic environments. LSTM models forecast climate conditions, crop growth, and resource consumption, providing accurate predictive signals for decision making. Reinforcement learning agents then optimize control policies that balance yield with resource efficiency, adapting dynamically to stochastic weather and operational variability. This integrated approach enhances robustness, captures the yield-resource trade-off, and addresses key sustainability challenges in modern greenhouse management.

3. Dataset description

The dataset utilized in this research originates from the 2nd International Autonomous Greenhouse Challenge (2022), conducted at the Greenhouse Horticulture Business Unit, Wageningen Research, in Bleiswijk, The Netherlands. This dataset provides comprehensive time-series data collected over a six-month period of cherry tomato production across six high-tech greenhouse compartments. It includes external meteorological data, greenhouse climate conditions, climate control setpoints, actuator responses, daily resource consumption, and crop growth observations.

The dataset was generated during a competition in which five international teams (The Automators, AICU, IUA.CAAS, Digilog, and Automatoes) developed AI-driven strategies to autonomously manage greenhouse operations. Their objective was to maximize net profit by minimizing resource consumption (water, energy, CO2) while ensuring optimal crop yield and quality. The AI-controlled compartments were compared against a manually operated reference compartment, managed by experienced Dutch growers, providing a benchmark for evaluating AI-driven greenhouse management.

The dataset includes both raw and processed sensor data, capturing climate control actions, irrigation metrics, actuator statuses, and realized setpoints. Advanced sensor networks and climate measurement devices were employed, complemented by manual crop growth observations and irrigation sample analyses. This dataset serves as a valuable foundation for training and validating AI models, supporting the development of reinforcement learning-based greenhouse control frameworks for sustainable and optimized agricultural practices.

4. Methodology

The methodology employed in this study is illustrated in Fig 1, which outlines the structure of the proposed autonomous greenhouse control system. The approach is organized into two main components: the first is a virtual environment for greenhouse climate, crop parameter, and resource consumption estimation, and the second is a reinforcement learning (RL) agent for optimizing greenhouse setpoints.

thumbnail
Fig 1. The structure of the proposed methodology for autonomous greenhouse control system.

https://doi.org/10.1371/journal.pone.0344946.g001

The autonomous control cycle begins with sensor acquisition of external weather and internal climate; the data enter the MLP-based climate estimation and LSTM-based crop/resource forecasting models. The estimated observations are passed to the RL agent, which determines weekly setpoints for CO₂, heating, ventilation, and irrigation, completing a closed-loop optimization of yield and sustainability.

4.1. Virtual environment simulator

The virtual environment simulates greenhouse operations by integrating machine learning models to estimate outcomes based on external weather and control setpoints. Fig 2 presents the virtual environment architecture, consisting of:

thumbnail
Fig 2. The structure of the proposed virtual environment simulator for autonomous greenhouse control system.

https://doi.org/10.1371/journal.pone.0344946.g002

  • Greenhouse Climate Estimation: An MLP model predicts greenhouse climate based on outside weather and control setpoints.
  • Crop Parameter Estimation: An LSTM model predicts weekly crop parameters, including stem elongation, thickness, and cumulative trusses.
  • Resource Consumption Estimation: Another LSTM model predicts daily resource consumption, such as heating, CO2 usage, electricity, and irrigation water.

The environment enables RL agents to train by simulating realistic greenhouse dynamics, facilitating the optimization of crop yield and resource efficiency.

4.1.1. Greenhouse climate estimation.

A Multi-Layer Perceptron (MLP) model was developed to predict greenhouse climate parameters. Tables 1–3 detail the external weather and control setpoints used as inputs and the greenhouse climate parameters used as outputs. Data normalization, exploratory data analysis (EDA), and temporal correlation analyses (Figs 3–7) informed feature selection and model design.

thumbnail
Fig 3. Visualization of the correlation of CO2air with outside weather parameters.

https://doi.org/10.1371/journal.pone.0344946.g003

thumbnail
Fig 4. Visualization of the correlation of HumDef with outside weather parameters.

https://doi.org/10.1371/journal.pone.0344946.g004

thumbnail
Fig 5. Visualization of the correlation of Rhair with outside weather parameters.

https://doi.org/10.1371/journal.pone.0344946.g005

thumbnail
Fig 6. Visualization of the correlation of Tair with outside weather parameters.

https://doi.org/10.1371/journal.pone.0344946.g006

thumbnail
Fig 7. Visualization of the correlation of Total_PAR with outside weather parameters.

https://doi.org/10.1371/journal.pone.0344946.g007

The MLP architecture consisted of two hidden layers with ReLU activation. The model achieved strong predictive accuracy (MAE = 0.06) on test data, as shown in the training curves in Fig 8.

thumbnail
Fig 8. Training and validation loss curves over the epochs.

https://doi.org/10.1371/journal.pone.0344946.g008
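For concreteness, the forward pass of such a two-hidden-layer ReLU network can be sketched in plain NumPy. The layer widths (64 and 32) and the input count are illustrative assumptions; only the two ReLU hidden layers and the five climate outputs (CO2air, HumDef, Rhair, Tair, Total_PAR) follow the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

class ClimateMLP:
    """Two-hidden-layer MLP mapping weather + setpoint features to climate
    estimates. Layer widths are illustrative; weights here are random
    placeholders for the trained parameters."""
    def __init__(self, n_in, n_h1, n_h2, n_out):
        self.W1 = rng.normal(0, 0.1, (n_in, n_h1)); self.b1 = np.zeros(n_h1)
        self.W2 = rng.normal(0, 0.1, (n_h1, n_h2)); self.b2 = np.zeros(n_h2)
        self.W3 = rng.normal(0, 0.1, (n_h2, n_out)); self.b3 = np.zeros(n_out)

    def predict(self, x):
        h1 = relu(x @ self.W1 + self.b1)
        h2 = relu(h1 @ self.W2 + self.b2)
        return h2 @ self.W3 + self.b3   # CO2air, HumDef, Rhair, Tair, Total_PAR

# 10 normalized weather/setpoint inputs -> 5 climate outputs
model = ClimateMLP(n_in=10, n_h1=64, n_h2=32, n_out=5)
batch = rng.random((4, 10))             # a small batch of inputs in [0, 1]
preds = model.predict(batch)
print(preds.shape)  # (4, 5)
```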

4.1.2. Crop parameter and resource consumption estimation.

An LSTM-based model was trained to predict the weekly crop parameters listed in Table 4 from time-series data normalized to the range (0–1) with MinMaxScaler. The model architecture incorporated 1D convolutional layers for feature extraction and LSTM layers for sequential pattern learning. Feature importance analysis using a random forest (Fig 9) guided the selection of key features, improving model performance. The LSTM model achieved an MAE of 0.16 on the test set (Table 5), outperforming the MLP model.

thumbnail
Table 4. Description of crop parameters and corresponding units.

https://doi.org/10.1371/journal.pone.0344946.t004

thumbnail
Table 5. MAE test loss of crop parameters for trained LSTM and MLP models with and without feature selection.

https://doi.org/10.1371/journal.pone.0344946.t005

thumbnail
Fig 9. Correlation of Stem Elongation with greenhouse control setpoints.

https://doi.org/10.1371/journal.pone.0344946.g009
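A minimal sketch of the LSTM recurrence at the heart of this crop-parameter model is shown below in plain NumPy. The full architecture described above additionally stacks 1D convolutional layers before the LSTM; all dimensions and weights here are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTM:
    """Single LSTM cell + linear head for weekly crop-parameter regression
    (stem elongation, stem thickness, cumulative trusses)."""
    def __init__(self, n_in, n_hidden, n_out):
        z = n_in + n_hidden
        self.Wf = rng.normal(0, 0.1, (z, n_hidden)); self.bf = np.zeros(n_hidden)
        self.Wi = rng.normal(0, 0.1, (z, n_hidden)); self.bi = np.zeros(n_hidden)
        self.Wo = rng.normal(0, 0.1, (z, n_hidden)); self.bo = np.zeros(n_hidden)
        self.Wc = rng.normal(0, 0.1, (z, n_hidden)); self.bc = np.zeros(n_hidden)
        self.Wy = rng.normal(0, 0.1, (n_hidden, n_out)); self.by = np.zeros(n_out)
        self.n_hidden = n_hidden

    def predict(self, seq):                     # seq: (timesteps, n_in) in [0, 1]
        h = np.zeros(self.n_hidden); c = np.zeros(self.n_hidden)
        for x in seq:
            z = np.concatenate([x, h])
            f = sigmoid(z @ self.Wf + self.bf)  # forget gate
            i = sigmoid(z @ self.Wi + self.bi)  # input gate
            o = sigmoid(z @ self.Wo + self.bo)  # output gate
            c = f * c + i * np.tanh(z @ self.Wc + self.bc)
            h = o * np.tanh(c)
        return h @ self.Wy + self.by            # three crop parameters

model = TinyLSTM(n_in=8, n_hidden=16, n_out=3)
week = rng.random((24, 8))                      # e.g. 24 hourly feature vectors
print(model.predict(week).shape)  # (3,)
```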

Resource consumption models predicted daily usage metrics, including heating, electricity, CO2, and irrigation (Table 6). The feature importance analysis (Figs 10–15) identified critical factors such as humidity, temperature, and irrigation intervals. The LSTM model demonstrated better performance (MAE = 0.16) than the MLP model (Table 7).

thumbnail
Table 6. Description of resource consumption parameters and corresponding units.

https://doi.org/10.1371/journal.pone.0344946.t006

thumbnail
Table 7. MAE loss of resource consumptions for trained LSTM and MLP models with and without feature selection on test set.

https://doi.org/10.1371/journal.pone.0344946.t007

thumbnail
Fig 10. Correlation of Heat Consumption with climate setpoints.

https://doi.org/10.1371/journal.pone.0344946.g010

thumbnail
Fig 11. Correlation analysis for High Electrical Consumption.

https://doi.org/10.1371/journal.pone.0344946.g011

thumbnail
Fig 12. Low Electrical Consumption correlation with setpoints.

https://doi.org/10.1371/journal.pone.0344946.g012

thumbnail
Fig 13. Correlation between CO2 Consumption with setpoints.

https://doi.org/10.1371/journal.pone.0344946.g013

thumbnail
Fig 14. Irrigation Consumption correlation with setpoints.

https://doi.org/10.1371/journal.pone.0344946.g014

thumbnail
Fig 15. Important features for predicting resource consumption using random forest.

https://doi.org/10.1371/journal.pone.0344946.g015

While the LSTM-based models in this study demonstrated strong predictive performance for both crop parameters and resource consumption, it is important to acknowledge that more advanced time series architectures may further enhance adaptability under complex and irregular environmental conditions. Recent studies, such as [54], have introduced Transformer-based models for dynamic system degradation prediction, highlighting their superiority over traditional RNNs in capturing long-term dependencies and irregular fluctuations. In future work, the LSTM framework presented here can be extended with a Transformer architecture, allowing a direct comparison of adaptability to extreme weather sequences and environmental variability. Such exploration would improve the robustness and resilience of greenhouse control systems under challenging scenarios.

4.1.3. Reinforcement learning gym environment.

The reinforcement learning environment was designed to optimize greenhouse setpoints for temperature, humidity, CO2 levels, and irrigation, balancing crop yield and resource efficiency.

The interaction between models and the RL agent was designed around a weekly control cycle, consistent with crop growth and resource reporting in the dataset. Each environment step spans 2016 timesteps, corresponding to 5-minute intervals across 7 days. At the beginning of each step, the RL agent proposes a set of weekly control setpoints. These actions are combined with weather inputs and processed by the MLP climate estimator to predict greenhouse conditions at hourly resolution. The predicted climate variables and control setpoints are then used as inputs to two LSTM models: one forecasting weekly crop parameters and the other daily resource consumption, which is aggregated into weekly totals. The outputs of these models are synchronized by indexing the corresponding time window (2016 timesteps per week), ensuring that all predictions reflect the same control actions and environmental conditions. The updated state, including weather, climate, crop, and resource values, is then provided back to the RL agent before the next decision step. This structure maintains temporal consistency between predictions and decisions while aligning the simulation with practical weekly greenhouse management cycles.
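The weekly cycle described above can be sketched as a single environment step, with stub functions standing in for the trained MLP and LSTM predictors. All function bodies and dimensions here are placeholders for the learned models, kept only to show the data flow.

```python
import numpy as np

STEPS_PER_WEEK = 2016  # 7 days x 24 h x 12 five-minute intervals

# Stub predictors standing in for the trained MLP/LSTM models (assumed interfaces).
def mlp_climate(weather, setpoints):            # hourly climate for one week
    return np.clip(0.5 * weather + 0.5 * setpoints.mean(), 0.0, 1.0)

def lstm_crop(climate, setpoints):              # weekly crop parameters
    return np.array([climate.mean(), setpoints.mean(), climate.max()])

def lstm_resources(climate, setpoints):         # daily usage: 7 days x 4 resources
    return np.full((7, 4), setpoints.mean() * climate.mean())

def env_step(state, action):
    """One environment step = one week of greenhouse operation."""
    climate = mlp_climate(state["weather"], action)
    crop = lstm_crop(climate, action)
    resources = lstm_resources(climate, action).sum(axis=0)  # weekly totals
    return {"weather": state["weather"], "climate": climate,
            "crop": crop, "resources": resources}

rng = np.random.default_rng(2)
state = {"weather": np.full(168, 0.4)}          # one week of hourly weather
action = rng.random(6)                          # six setpoints in [0, 1]
state = env_step(state, action)
print(state["resources"].shape)  # (4,)
```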

Observation and Action Spaces: The RL agent observed greenhouse climate, crop parameters, resource consumption, and external weather through a dictionary-structured state. The agent adjusted continuous setpoints (e.g., CO2, irrigation intervals, heating levels) to optimize greenhouse conditions (Table 8). Each action setpoint is selected by the agent from the continuous range (0–1).

thumbnail
Table 8. Action set points which the RL agent can take to control greenhouse environment.

https://doi.org/10.1371/journal.pone.0344946.t008

Reward Function: Designing an effective reward function is pivotal for guiding a reinforcement learning (RL) agent toward achieving the dual objectives of maximizing crop growth and minimizing resource consumption in a greenhouse environment. The reward function must account for multiple objectives, balancing productivity and sustainability, while ensuring stable and effective control strategies.

The reward function (Equation 1) balanced crop growth metrics (stem elongation, thickness, trusses) with resource penalties (e.g., heating and irrigation). Efficiency factors and stability penalties ensured optimal and consistent control actions. The inclusion of incentives (e.g., additional rewards for high crop quality) further guided the agent’s learning.

reward = α · crop_reward − β · resource_penalty + δ · efficiency_factor − γ · stability_penalty + big_rewards − punishments (1)

Where: α = 1.0 and β = 0.2 are scaling factors for the crop reward and resource penalty, respectively, and δ = 0.01, γ = 0.001 are scaling factors for efficiency factor and stability penalty, respectively. Punishment and big rewards capture additional incentives or penalties based on crop and resource conditions.

The crop growth reward encourages the RL agent to enhance critical crop parameters, including Stem Elongation, Stem Thickness, and Cumulative Trusses.

Each parameter is normalized against its maximum value to ensure consistency across different scales. The crop growth reward, crop_reward, is computed as a weighted sum of these normalized values, as shown in Equation 2:

crop_reward = w1 · (SE / SEmax) + w2 · (ST / STmax) + w3 · (CT / CTmax) (2)

where SE, ST, and CT denote Stem Elongation, Stem Thickness, and Cumulative Trusses, respectively.

Where: w1 = 0.4, w2 = 0.3, and w3 = 0.3 are weights assigned to each parameter.

To discourage excessive use of resources, a resource penalty, the resource_penalty, is applied. The penalty considers key resources such as: Heat Consumption, CO2 Consumption, Electricity Consumption (High and Low), and Irrigation.

The penalty is computed as Equation 3 below:

resource_penalty = p1 · (Heat / Heatmax) + p2 · (CO2 / CO2max) + p3 · (Elec / Elecmax) + p4 · (Irr / Irrmax) (3)

Where: p1 = 0.2, p2 = 0.3, p3 = 0.2, and p4 = 0.3 are the respective weights that adjust the importance of each resource.

To guide the agent toward desirable outcomes, additional incentives and penalties are incorporated:

  • Incentives for High Crop Quality: A reward of +0.2 is given when an individual crop parameter exceeds its threshold (≥ 0.7), with a larger reward of +1.0 when all parameters reach ≥ 0.8.
  • Penalties for Low Crop Quality: A penalty of −0.1 is imposed if any parameter falls below 0.5, with a severe penalty of −1.0 if all parameters fall below 0.5.
  • Resource Overuse Punishment: An extra penalty of −1.0 is applied when resource consumption exceeds predefined thresholds, triggering episode termination.
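The incentive/penalty schedule above can be sketched as a single bonus function. The thresholds follow the text; the exact stacking of per-parameter and all-parameter bonuses is an assumption of this sketch.

```python
def quality_bonus(params, hi=0.7, top=0.8, lo=0.5):
    """Incentive/penalty schedule for crop-quality parameters (illustrative:
    how per-parameter and all-parameter terms combine is assumed)."""
    bonus = 0.0
    if all(p >= top for p in params):
        bonus += 1.0                                  # all parameters excellent
    else:
        bonus += 0.2 * sum(p >= hi for p in params)   # per-parameter incentive
    if all(p < lo for p in params):
        bonus -= 1.0                                  # severe penalty
    else:
        bonus -= 0.1 * sum(p < lo for p in params)    # per-parameter penalty
    return bonus
```

For example, three parameters at 0.9 earn the full +1.0 bonus, while three at 0.4 incur the severe −1.0 penalty.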

A stability penalty promotes smooth control actions by discouraging abrupt changes. It is calculated as Equation 4:

stability_penalty = S · Σ (a_t − a_(t−1))² (4)

where a_t denotes the control action vector at timestep t.

Where: S = 0.001 is the scaling factor for the stability penalty.

The efficiency factor measures the agent’s ability to achieve crop growth with minimal resource use. It is computed as Equation 5:

efficiency_factor = crop_reward / (resource_penalty + ε) (5)

where ε is a small constant preventing division by zero.
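The stability and efficiency terms can be sketched together as follows. The squared-difference form of the stability penalty and the ratio form of the efficiency factor are assumed readings of Equations 4 and 5.

```python
def stability_penalty(action, prev_action, S=0.001):
    """Penalize abrupt control changes: S times the summed squared difference
    between consecutive action vectors (assumed form of Equation 4)."""
    return S * sum((a - b) ** 2 for a, b in zip(action, prev_action))

def efficiency_factor(crop_r, resource_p, eps=1e-8):
    """Crop growth achieved per unit of resource penalty (assumed form of
    Equation 5); eps guards against division by zero."""
    return crop_r / (resource_p + eps)
```

An unchanged action vector incurs zero stability penalty, so the agent is only charged when it moves the setpoints.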

The weighting coefficients (α, β, w1–w3, p1–p4) were selected according to agronomic significance and pilot tuning. Greater weight was assigned to stem elongation and irrigation cost, which most directly influence biomass gain and resource sustainability.

Termination criteria: Termination criteria are employed to ensure sustainable greenhouse management by limiting episodes to 23 steps representing 23 weeks, enforcing crop quality thresholds, and penalizing excessive resource consumption. These criteria, illustrated in Fig 16, guide the agent’s learning by balancing crop productivity and resource efficiency.
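The three termination conditions described above can be sketched as a single check. This is a minimal sketch: the cap values and the all-parameters quality test are illustrative placeholders, not the paper's exact thresholds.

```python
def episode_done(step, crop_params, resource_totals, resource_caps,
                 max_steps=23, quality_floor=0.5):
    """Termination logic: 23 weekly steps, a crop-quality floor, and
    resource caps (threshold values are illustrative)."""
    if step >= max_steps:
        return True                               # end of 23-week growing cycle
    if all(p < quality_floor for p in crop_params):
        return True                               # crop quality collapsed
    if any(r > c for r, c in zip(resource_totals, resource_caps)):
        return True                               # resource overuse
    return False
```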

Fig 16. The structure of termination criteria during each episode.

https://doi.org/10.1371/journal.pone.0344946.g016

4.1.4. Environment enhancement with feature selection.

The enhancement process starts with evaluating the initial environment, which uses pre-trained estimators on all features, establishing a baseline for the RL agent’s training performance. A feature selection mechanism, using a random forest model, is then introduced to identify and focus on critical features, reducing system complexity. This maintains the agent’s baseline performance while improving learning efficiency by concentrating on significant features. Fig 17 illustrates the enhanced environment with feature selection integrated before each estimation process.

Fig 17. Enhanced RL environment structure with feature selection.

https://doi.org/10.1371/journal.pone.0344946.g017

Feature importance was computed using Random Forest models, ensuring temporal stability of selected features across the control horizon. While alternative feature-selection approaches such as PCA or SHAP could be considered, Random Forest importance was selected for its robustness and interpretability in nonlinear, multivariate greenhouse environments.
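Once importance scores are available, the selection step reduces to a top-k ranking. In the sketch below, the scores are assumed to come from a fitted Random Forest (e.g. scikit-learn's `feature_importances_` attribute); the function name and `k` value are illustrative.

```python
import numpy as np

def select_top_features(importances, feature_names, k=10):
    """Keep the k features with the highest Random Forest importance scores
    (scores assumed precomputed from a fitted forest)."""
    order = np.argsort(importances)[::-1][:k]   # indices by descending importance
    return [feature_names[i] for i in order]
```

The retained feature names then define the reduced observation space passed to each estimator.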

The RL agent was then retrained in the environment after applying feature selection, which improved training efficiency by reducing model complexity. Figs 18–20 show the key features identified for each estimator (MLP climate prediction, LSTM crop growth, LSTM resource consumption) using Random Forest importance analysis. Models trained on the selected features demonstrated faster convergence and lower loss values (Figs 21 and 22).

Fig 18. Feature importance for predicting greenhouse control parameters.

https://doi.org/10.1371/journal.pone.0344946.g018

Fig 19. Feature importance distribution for resource consumption.

https://doi.org/10.1371/journal.pone.0344946.g019

Fig 20. Feature importance for predicting all crop parameters.

https://doi.org/10.1371/journal.pone.0344946.g020

Fig 21. MLP model’s train and validation MAE loss over training epochs.

https://doi.org/10.1371/journal.pone.0344946.g021

Fig 22. LSTM model train and validation MAE loss over training epochs.

https://doi.org/10.1371/journal.pone.0344946.g022

For each estimator (MLP climate prediction, LSTM crop growth, LSTM resource consumption), the top features identified by Random Forest importance ranking were used as inputs to reduce model complexity. Table 9 lists the top selected features per model, ranked by their importance scores. This selection preserved predictive performance while reducing observation-space dimensionality, thereby accelerating RL agent convergence.

Table 9. Top selected features per model based on Random Forest importance ranking.

https://doi.org/10.1371/journal.pone.0344946.t009

4.1.5. Environment enhancement with stochastic external weather.

The RL agent’s robustness was evaluated by extending the virtual environment to include random weather data generation. Initially tested with historical weather data, the environment was later modified to simulate dynamic and unpredictable conditions to assess the models’ adaptability. Observing a performance decline, the RL agent was re-trained in the enhanced environment, enabling it to learn strategies for adapting to stochastic weather patterns.
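One minimal way to realize such stochastic weather generation is to perturb the historical series with channel-scaled noise, regenerated at the start of every episode. The paper's exact generator is not specified in this section, so the scheme below is an illustrative assumption.

```python
import numpy as np

def randomize_weather(historical, rng, scale=0.1):
    """Perturb each historical weather channel (columns) with zero-mean
    Gaussian noise proportional to that channel's standard deviation.
    Illustrative sketch, not the paper's actual generator."""
    noise = rng.normal(0.0, scale * historical.std(axis=0),
                       size=historical.shape)
    return historical + noise
```

Calling this with a fresh `rng` draw per episode exposes the agent to a different weather realization each time while preserving the historical channels' overall scale.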

4.2. Training and evaluation of reinforcement learning algorithms

To address the challenges of greenhouse climate control, multiple reinforcement learning (RL) algorithms were implemented, trained, and compared within a virtual environment. The algorithms evaluated include Proximal Policy Optimization (PPO) [55], Deep Deterministic Policy Gradient (DDPG) [56], Soft Actor-Critic (SAC) [57], and Twin Delayed Deep Deterministic Policy Gradient (TD3) [58]. Among these, PPO was selected as the baseline reference model due to its stability and effectiveness in handling continuous action spaces [55].

4.2.1. Description of reinforcement learning algorithms used.

The performance of each RL algorithm was evaluated against the PPO baseline and fixed-action strategies employed by other teams in the Autonomous Greenhouse Challenge. The virtual environment was enhanced to incorporate stochastic weather data generation, challenging the algorithms to adapt and generalize under dynamic and unpredictable conditions. A brief overview of the RL algorithms is provided below:

Proximal Policy Optimization (PPO): PPO leverages a policy-gradient approach with a clipped surrogate objective to ensure stable training. By limiting policy updates, it avoids large deviations that may destabilize learning [55].

Deep Deterministic Policy Gradient (DDPG): DDPG is an actor-critic algorithm designed for continuous action spaces. It employs deterministic policy updates alongside experience replay and target networks for stable training [56].

Soft Actor-Critic (SAC): SAC is an off-policy algorithm that optimizes a trade-off between expected reward and entropy, encouraging exploration. Its design makes it particularly effective in environments with stochastic dynamics [57].

Twin Delayed Deep Deterministic Policy Gradient (TD3): TD3 extends DDPG by addressing overestimation bias through twin Q-networks and improving stability with policy delay and target smoothing techniques [58].

4.2.2. Training configuration and hyperparameters.

Each RL algorithm was configured with tailored hyperparameters to ensure robust performance and effective learning. Table 10 summarizes the key hyperparameters used for training, which include custom network architectures, entropy tuning, exploration noise, and soft target updates. These configurations facilitated consistent evaluation across all algorithms.

To ensure a fair and reproducible comparison, all reinforcement learning algorithms (PPO, DDPG, SAC, and TD3) were trained under identical environments, observation spaces, episode horizons, and evaluation protocols, and were allocated the same computational budget in terms of total interaction steps and training duration. Hyperparameter tuning strategies and values are reported in Table 10.

Stochastic weather conditions were intentionally incorporated into the environment and regenerated at the beginning of each episode, exposing all agents to diverse environmental realizations during training and evaluation. This design enhances robustness and prevents overfitting to a single weather trajectory, while maintaining a consistent and fair comparison since all algorithms experience the same stochastic generation mechanism.

For each algorithm, extensive training and hyperparameter tuning were conducted, and the reported results correspond to a representative finalized configuration. The hyperparameters were selected to balance computational efficiency and learning performance. For instance, PPO’s small learning rate ensures training stability, while SAC’s automatic entropy tuning dynamically balances exploration and exploitation. Similarly, TD3 mitigates overestimation bias with twin Q-networks and policy delay, while DDPG enhances exploration through Ornstein-Uhlenbeck noise.

4.2.3. Training process and monitoring.

The training process spanned 120,000 steps and involved iterative interactions with the environment. At each timestep, the RL agent observed greenhouse conditions, selected control actions based on its policy, executed the actions in the environment, and recorded the resulting states and rewards. Periodic policy and value updates were performed using experience replay buffers. To monitor progress, the following strategies were employed:

  1. Evaluation Callback: Agents were evaluated every 10,000 steps, and the best-performing models were saved for reference.
  2. Checkpoint Callback: Model checkpoints were saved every 10,000 steps, enabling recovery if necessary.
  3. Visualization Tools: Training progress was visualized using TensorBoard, which logged metrics such as cumulative rewards, loss values, and episodic rewards.

4.2.4. Performance evaluation.

After training, the RL algorithms were assessed based on their ability to optimize greenhouse control. Performance metrics included cumulative rewards, crop growth parameters, and resource consumption levels. The results were benchmarked against the PPO baseline and fixed-action strategies from the Autonomous Greenhouse Challenge. The analysis highlighted the effectiveness of trained RL agents in maximizing crop yields while maintaining resource efficiency under both static and dynamic environmental conditions. In summary, the methodology integrates advanced ML and RL techniques for autonomous greenhouse control. Key innovations include a custom virtual environment simulating dynamic greenhouse conditions, feature selection to enhance model efficiency and RL agent performance, and a robust RL framework capable of optimizing greenhouse control strategies under varying environmental conditions.

5. Results

This section presents experimental results evaluating the RL-based framework designed to optimize greenhouse control strategies. The results are organized as follows: (1) performance evaluation of the PPO-baseline RL agent in different greenhouse environments, with and without feature selection and with and without random weather generation, (2) comparison of different RL algorithms during training, (3) comparative analysis of the best-performing RL agent (TD3) against fixed-action strategies, and (4) evaluation of crop yield and resource consumption metrics. The findings are discussed in the context of their implications for sustainable agriculture.

5.1. Performance of PPO baseline RL agent

The PPO-RL agent’s performance was assessed over multiple training episodes, simulating complete growing cycles for cherry tomato crops. Key performance indicators included cumulative rewards and training curves. Training performance was recorded for the PPO-RL agent in (1) the full-feature environment versus the feature-selected environment, and (2) environments with and without random weather generation.

5.1.1. Evaluation with and without feature selection.

The PPO-RL agent underwent extensive training in environments with and without feature selection. The cumulative rewards shown in Fig 23 demonstrate the agent’s ability to learn and adapt to greenhouse conditions. Training in the feature-selected environment achieved optimal strategies in fewer episodes compared to the full-featured environment, reducing the total training time and complexity.

Fig 23. Cumulative rewards over episodes of full featured environment compared with environment with feature selection.

https://doi.org/10.1371/journal.pone.0344946.g023

The feature-selected environment reduced training time and improved efficiency by focusing on critical features. The RL agent achieved optimal strategies more quickly, confirming the benefits of reduced complexity.

5.1.2. Evaluation under stochastic weather.

To evaluate robustness, the RL agent was tested under randomly generated weather conditions. Initially trained on historical weather data, the agent exhibited reduced performance under dynamic conditions; retraining in the stochastic environment significantly improved its adaptability (Figs 24–26).

Fig 24. Explained variance of the RL agent over the training episodes for stochastic and deterministic environment.

https://doi.org/10.1371/journal.pone.0344946.g024

Fig 25. Evaluation mean-episode reward of RL-Agent for stochastic and deterministic environment.

https://doi.org/10.1371/journal.pone.0344946.g025

Fig 26. Training value loss of RL agent over the training episodes for stochastic and deterministic environment.

https://doi.org/10.1371/journal.pone.0344946.g026

Fig 27 shows that while the PPO agent outperformed team strategies under historical weather data, its performance declined under random weather conditions (Fig 28).

Fig 27. Accumulated rewards of PPO agent’s relative to the team strategies on historical weather Environment.

https://doi.org/10.1371/journal.pone.0344946.g027

Fig 28. Accumulated rewards of PPO and team strategies under random weather conditions.

https://doi.org/10.1371/journal.pone.0344946.g028

Re-training the PPO agent in the random weather environment significantly improved performance, as demonstrated in Fig 29, underscoring its ability to adapt to stochastic inputs.

Fig 29. Improved rewards of the PPO agent after re-training in the random weather generation environment relative to the team strategies.

https://doi.org/10.1371/journal.pone.0344946.g029

5.2. Performance comparison between RL algorithms

This section compares the performance of four reinforcement learning (RL) algorithms (PPO, DDPG, SAC, and TD3) on the virtual greenhouse environment. The comparison focuses on their ability to maximize rewards, adapt to stochastic weather conditions, and optimize resource efficiency.

The performance of PPO, DDPG, SAC, and TD3 algorithms was compared in the greenhouse environment. Fig 30 illustrates the rollout mean rewards over training episodes. TD3 achieved the highest rewards and stable convergence, outperforming SAC, DDPG, and PPO. The average rewards achieved by each algorithm are summarized in Fig 31, with TD3 leading at 51.8, followed by SAC (46.7), DDPG (36.0), and PPO (29.4).

Fig 30. Rollout mean rewards over episodes for PPO, DDPG, SAC, and TD3.

https://doi.org/10.1371/journal.pone.0344946.g030

The superior performance of TD3 in our experiments can be attributed to its algorithmic enhancements over baseline actor-critic methods. By employing twin Q-networks, TD3 mitigates overestimation bias common in continuous control, ensuring more reliable value estimates. Policy delay prevents premature updates that could destabilize training, while target policy smoothing reduces the impact of noise on policy updates. These features align well with the greenhouse control problem, where multi-variable coupling and actuator limits demand stable, incremental policy adjustments. Compared to SAC, TD3’s deterministic policy outputs allow for finer-grained control, and its exploration strategy better balances exploitation of known optimal setpoints with exploration under stochastic weather conditions.

5.3. Comparative analysis of the best agent (TD3) with teams’ fixed strategies

The TD3 agent was benchmarked against fixed-action strategies from competing teams, with comparisons focusing on cumulative rewards, crop yield, and resource consumption. The comparative evaluation follows the setup of the 2nd International Autonomous Greenhouse Challenge, in which participating teams applied diverse AI and data-driven control algorithms, while a reference team operated the greenhouse manually using expert-defined fixed setpoints. Specifically, the participating teams (The Automators, AICU, IUA.CAAS, Digilog, and Automatoes) employed autonomous AI-based algorithms for climate, irrigation, and crop management, while the reference compartment reflected traditional expert-driven greenhouse operation. All teams serve as baselines for evaluating the efficiency of data-driven policies. The proposed TD3 agent’s control performance was compared against both static data-driven and expert-controlled strategies, demonstrating superior adaptability and sustainability. The TD3 agent achieved the most favorable trade-off among all tested algorithms, realizing a 24.05% reduction in irrigation consumption while maintaining yield performance and only modestly increasing heating and electricity use.

5.3.1. Cumulative rewards.

The cumulative rewards for TD3 and the team strategies were calculated. Fig 32 shows that the TD3 agent consistently achieved higher cumulative rewards than all team strategies, highlighting its adaptability and dynamic optimization capabilities.

Fig 32. Cumulative rewards of TD3 compared to team strategies.

https://doi.org/10.1371/journal.pone.0344946.g032

5.3.2. Crop yield parameters.

Table 11 summarizes crop yield metrics, including stem elongation, stem thickness, and cumulative trusses. The TD3 agent outperformed traditional strategies, with particularly significant improvements in cumulative trusses, as illustrated in Figs 33–35. These improvements highlight the TD3 agent’s ability to optimize crop growth while balancing resource allocation.

Table 11. Average crop yield parameters across all teams for full episode.

https://doi.org/10.1371/journal.pone.0344946.t011

Fig 33. Weekly greenhouse stem elongation result during full episode for all teams.

https://doi.org/10.1371/journal.pone.0344946.g033

Fig 34. Weekly greenhouse normalized stem thickness results during full episode for all teams.

https://doi.org/10.1371/journal.pone.0344946.g034

Fig 35. Weekly greenhouse normalized cumulative trusses results during full episode for all teams.

https://doi.org/10.1371/journal.pone.0344946.g035

5.3.3. Resource consumption metrics.

Resource usage metrics of the TD3 agent, including CO2, heat, electricity, and irrigation, were compared across all strategies. Table 12 and Figs 36–40 show that the TD3 agent maintained competitive resource efficiency while achieving higher crop yields. The adaptive nature of the TD3 agent allowed it to dynamically optimize resource use based on greenhouse conditions.

Table 12. Cumulative sum of resource usage across all teams.

https://doi.org/10.1371/journal.pone.0344946.t012

Fig 36. Greenhouse heat consumption during full episode for all teams.

https://doi.org/10.1371/journal.pone.0344946.g036

Fig 37. Greenhouse electricity low result during full episode for all teams.

https://doi.org/10.1371/journal.pone.0344946.g037

Fig 38. Greenhouse electricity high result during full episode for all teams.

https://doi.org/10.1371/journal.pone.0344946.g038

Fig 39. Greenhouse CO2 consumption during full episode for all teams.

https://doi.org/10.1371/journal.pone.0344946.g039

Fig 40. Greenhouse Irrigation result during full episode for all teams.

https://doi.org/10.1371/journal.pone.0344946.g040

The RL-based approach demonstrated superior adaptability and efficiency in greenhouse control, achieving:

  • Improved crop yields, particularly in cumulative trusses, compared to fixed-action strategies.
  • Competitive resource efficiency across metrics, with the TD3 agent achieving efficient resource utilization, most notably a 24.05% reduction in irrigation.
  • Enhanced robustness under stochastic weather conditions after re-training.

The experimental results indicate that the Twin Delayed Deep Deterministic Policy Gradient (TD3) agent consistently achieved a balanced performance in resource utilization optimization compared to other benchmark strategies, as evaluated within the simulated greenhouse environment. The agent’s adaptive policy enabled dynamic decision-making, which resulted in satisfactory crop outcomes while effectively managing resource consumption.

Table 13 presents the calculated deviation ratios of the TD3 agent’s resource consumption relative to the average values reported by other participating teams. The deviation ratio for each resource was computed using the following formula:

Table 13. The resource consumption deviation ratios for TD3 agent relative to the average of other teams.

https://doi.org/10.1371/journal.pone.0344946.t013

Deviation Ratio (%) = ((TD3 consumption − average team consumption) / average team consumption) × 100 (6)

The deviation ratios underscore the TD3 agent’s strategic approach to balancing resource efficiency with crop performance across key metrics. A deviation of −1.86% in CO2 consumption suggests a slightly more conservative dosing strategy, likely maintaining photosynthetic efficiency while reducing CO2 usage. The 0.63% increase in high electricity consumption may be attributed to targeted lighting during critical growth phases, while the 1.59% increase in low electricity usage reflects optimized energy management during off-peak periods. Additionally, the 1.6% rise in heat consumption indicates a focus on maintaining stable temperatures to support consistent crop yield quality. Most notably, the −24.05% deviation in irrigation highlights the agent’s ability to implement advanced water conservation measures without compromising crop productivity.
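The deviation-ratio computation of Equation 6 amounts to a one-line percentage calculation, sketched below with illustrative names.

```python
def deviation_ratio(agent_value, team_average):
    """Percentage deviation of the agent's resource use from the team average
    (Equation 6); negative values indicate savings relative to the teams."""
    return 100.0 * (agent_value - team_average) / team_average
```

For instance, an agent consuming 75.95 units of irrigation against a team average of 100 units yields the −24.05% figure reported above.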

5.3.4. Reward sensitivity analysis.

To evaluate the robustness of the framework, we performed a structured sweep of the reward weights α (crop yield) and β (resource consumption). The Pareto frontier (Fig 41) highlights the trade-off between yield maximization and resource efficiency, with non-dominated solutions spanning multiple operating points. The corresponding heatmap (Fig 42) confirms that increasing α systematically favors yield-oriented policies, while higher β values promote resource-efficient behaviors.
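Extracting a Pareto frontier from the sweep results reduces to non-dominated filtering of (yield, resource_cost) pairs. The sketch below shows one standard filter, assuming yield is maximized and cost minimized; it is illustrative, not the paper's analysis code.

```python
def pareto_front(points):
    """Non-dominated filter for (yield, resource_cost) pairs: keep a point if
    no other point has at least as much yield AND at most as much cost, with
    strict improvement in at least one objective."""
    front = []
    for y, c in points:
        dominated = any(y2 >= y and c2 <= c and (y2 > y or c2 < c)
                        for y2, c2 in points)
        if not dominated:
            front.append((y, c))
    return front
```

Applied to the logged evaluation episodes of each (α, β) setting, this yields the frontier of operating points plotted in Fig 41.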

Fig 41. Pareto frontier of yield versus resource consumption under different reward weightings.

https://doi.org/10.1371/journal.pone.0344946.g041

Fig 42. Sensitivity heatmap of average reward as a function of α (yield weight) and β (resource weight).

https://doi.org/10.1371/journal.pone.0344946.g042

The Pareto points are clustered within a narrow band, which is expected given the normalization of crop growth and resource caps. Within this compressed scale, the agent’s solutions delineated a consistent frontier, indicating that incremental yield gains necessarily required proportional increases in resource use. Complementary sensitivity analysis further confirmed this observation, as varying (α, β) shifted the mean reward surface smoothly, showing robustness to different preference settings.

These results indicate that the observed performance gains are not driven by a narrowly tuned choice of α and β but remain stable across a range of yield-resource trade-off configurations, demonstrating robustness with respect to the primary reward weights.

In summary, the TD3 agent achieved a favorable trade-off between resource usage and yield optimization. Minor increases in specific parameters (e.g., heating and electricity) are justified by their role in enhancing crop growth conditions. The significant reduction in irrigation consumption stands out as a key indicator of the agent’s potential in advancing sustainable greenhouse management. These findings affirm the applicability of reinforcement learning in optimizing resource use while supporting productive and sustainable agricultural practices.

6. Discussion

This study evaluates the application of a reinforcement learning (RL) based framework for optimizing greenhouse control strategies, addressing the complexities of managing dynamic environments. The RL agent demonstrated notable improvements in crop growth and resource efficiency, highlighting its potential to advance sustainable agricultural practices. Among the RL algorithms tested, the TD3 agent consistently outperformed others, achieving higher cumulative rewards, better crop yield metrics, and balanced resource usage. The findings align with existing literature on AI-driven agricultural management while extending its scope to include a unified multi-objective RL framework.

The RL agent displayed several strengths that underscore its effectiveness in greenhouse management:

  • Crop Growth Management: The TD3 agent achieved significant improvements in crop metrics such as stem elongation, stem thickness, and cumulative trusses, and consistently outperformed other strategies in cumulative trusses, highlighting its ability to optimize crop yield effectively.
  • Resource Efficiency: The TD3 RL agent exhibited balanced resource utilization, achieving reductions in CO2 and irrigation consumption while maintaining competitive electricity and heat usage. These results emphasize the agent’s capability to optimize resource allocation for sustainability.
  • Reward Maximization: The TD3 agent attained the highest cumulative rewards among all algorithms, reflecting its ability to optimize trade-offs between crop growth and resource efficiency dynamically.

6.1. Implications for greenhouse management

The study highlights the potential of RL-based systems to revolutionize greenhouse management by optimizing crop yields and resource use. These AI-driven approaches address the dual challenges of global food demand and climate change by enhancing sustainability in agricultural practices. The integration of RL into autonomous greenhouses represents a significant step toward efficient, scalable, and sustainable farming systems.

6.2. Training performance of PPO baseline

The Proximal Policy Optimization (PPO) algorithm served as the baseline for evaluating RL performance. The agent’s training results revealed the following:

  • Rollout Mean Rewards: As shown in Fig 30, the PPO agent displayed a steady upward trend in cumulative rewards across episodes, indicating effective learning and policy convergence. However, its performance lagged behind that of the TD3 agent due to its conservative policy updates.
  • Explained Variance: Stabilization near 0.8 (Fig 24) suggests that the PPO agent accurately predicted outcomes based on its policy, ensuring effective learning and decision-making.
  • Value Loss: A steep initial reduction in value loss, followed by gradual stabilization (Fig 26), demonstrated the PPO agent’s ability to minimize errors over time.

While the PPO agent performed competitively, more robust algorithms had to be trained and tested to surpass leading teams such as “IUA.CAAS” (Fig 29).

6.3. Cumulative rewards and strategy comparison

  1. Impact of Random Weather Generation: Testing the PPO agent and fixed-action strategies under randomized weather conditions (Fig 28) revealed a decline in performance across all strategies. This finding highlights the limited adaptability of fixed-action policies and underscores the importance of refining RL-based approaches for stochastic environments.
  2. Re-Training Under Random Weather: Re-training the RL agent under these randomized weather conditions significantly enhanced its robustness, allowing it to outperform fixed-action strategies in varying environments (Fig 29). This improvement demonstrates the RL agent’s capacity to generalize under dynamic conditions, a critical requirement for real-world greenhouse applications.
  3. Fixed-Action Strategy Robustness: Fixed-action strategies, while effective under historical weather data, showed significant performance drops in randomized environments. This limitation validates the advantage of adaptive AI techniques, such as reinforcement learning, in managing complex systems.

6.4. Performance comparison between RL algorithms

The comparative analysis of PPO, DDPG, SAC, and TD3 revealed significant insights into the strengths and limitations of each algorithm.

The TD3 agent demonstrated the best performance, achieving the highest and most stable cumulative rewards (Figs 30 and 31). Its twin Q-networks, policy delay, and target smoothing mechanisms enhanced stability and optimization under complex greenhouse constraints. SAC performed well, with rewards close to TD3. Its entropy-based exploration enabled effective adaptation to stochastic conditions, though its policy updates were less efficient in some scenarios.

DDPG struggled with exploration in high-dimensional action spaces, resulting in lower rewards and less stable learning compared to TD3 and SAC. While PPO exhibited stable learning, its conservative policy updates limited its reward improvement, making it less effective than TD3, SAC and DDPG.

These findings reinforce the suitability of TD3 for complex greenhouse systems where nonlinear interactions and continuous action spaces dominate. Unlike PPO, which converges conservatively, or DDPG, which is prone to instability, TD3 achieves a balance between stability and adaptability, making it more resilient under dynamic climate conditions. This aligns with recent control engineering research emphasizing the reduction of function approximation error and the stabilization of policy updates in safety-critical environments.

In direct comparison, PPO provides stable but overly cautious updates; DDPG, though designed for continuous spaces, suffers from noisy and unstable exploration; and SAC improves adaptability through entropy maximization but often converges more slowly. By contrast, TD3 integrates deterministic policy outputs with stability-enhancing mechanisms such as twin critics, policy delay, and target smoothing. These properties produce reliable value estimates, smoother control trajectories, and enhanced robustness under both regular and stochastic climate inputs. Importantly, TD3’s smooth deterministic actions also reduce actuator wear and prevent abrupt microclimate changes, which are critical for crop health. This comparative analysis underscores why TD3 not only achieved superior numerical performance in our results but also aligns most closely with the operational requirements of real-world autonomous greenhouse management.

6.5. TD3 comparison with fixed-action strategies

The TD3 agent was compared against fixed-action strategies used by teams in the Autonomous Greenhouse Challenge, with the following observations:

Crop Yield Parameters: The TD3 agent consistently outperformed all teams in cumulative trusses (Fig 35), reflecting its ability to dynamically adapt resource usage to optimize plant growth. Teams such as “The Automators” and “Reference” fell behind due to suboptimal resource distribution strategies.

Resource Usage Optimization: The TD3 agent demonstrated efficient resource utilization, with notable reductions in irrigation (24.05% lower than the average of other teams) and CO2 consumption (1.86% lower than the average). These results underscore the agent’s capacity for sustainable water and energy management (Figs 36–40, Table 12).

Although our experiments did not explicitly simulate continuous extreme events such as prolonged rain or sandstorms, the stochastic weather generator already introduced large and irregular fluctuations in temperature, radiation, and humidity. These fluctuations represent a proxy for stress-testing, as they deviate substantially from normal seasonal variations and require the RL agent to adapt dynamically. The results (Figs 24–29) show that the retrained TD3 agent maintained stable performance despite these abrupt changes, underscoring its inherent climate resilience.

In particular, the stochastic weather generator used in this study incorporates high-variance patterns such as multi-day heatwaves, cold spells, and prolonged low radiation periods, which emulate real extreme climate events encountered in greenhouse operations. This ensures that the agent’s robustness is evaluated under both typical and stress-inducing environmental scenarios.

In practical terms, climate extremes can be considered as extended versions of the fluctuations already modeled in our randomized scenarios. The ability of the RL agent to generalize under such irregular dynamics provides confidence in its robustness. Future work will extend this by incorporating disaster-oriented scenarios (e.g., multi-day rain, heat waves, or sandstorms) following the approaches in recent resilience-focused greenhouse studies, thereby quantifying the agent’s performance under rare but high-impact events.
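A minimal sketch of how such multi-day stress events might be layered on a seasonal baseline is shown below; the magnitudes, probabilities, and function names are illustrative assumptions, not the study's actual generator:

```python
import numpy as np

def stochastic_temperature(days=30, base=20.0, seed=0):
    """Illustrative daily temperature series: seasonal baseline plus
    Gaussian noise, with an occasional multi-day heatwave block."""
    rng = np.random.default_rng(seed)
    t = np.arange(days)
    series = base + 3.0 * np.sin(2 * np.pi * t / 365.0) \
                  + rng.normal(0.0, 1.5, days)
    if rng.random() < 0.5:                    # heatwave in roughly half of episodes
        start = int(rng.integers(0, days - 5))
        series[start:start + 5] += rng.uniform(5.0, 8.0)  # 5-day heatwave
    return series
```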

Prediction errors from the MLP-based climate model and the LSTM-based crop and resource models propagate into the reinforcement learning loop as structured noise rather than deterministic bias. To mitigate potential error amplification, the RL agent was trained under stochastic weather perturbations and evaluated using reward sensitivity and Pareto-front robustness analyses. This design ensures that learned policies remain stable despite moderate prediction inaccuracies, as evidenced by consistent convergence behavior and smooth trade-offs across varying reward weight configurations.

6.6. Trade-off and sensitivity robustness

The sensitivity of the proposed framework to reward weights was systematically investigated to assess how different values of the coefficients α (crop yield weight) and β (resource consumption weight) influence the learned trade-off between productivity and sustainability. Unlike exploratory tuning, we performed a structured sweep of α and β across a defined grid and evaluated performance using logged evaluation episodes.
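A hedged sketch of the scalarized reward and the structured (α, β) sweep is given below; the exact normalization of the yield and cost terms used in the paper is not reproduced here:

```python
import itertools

def weighted_reward(yield_score, resource_cost, alpha=1.0, beta=1.0):
    """Scalarized multi-objective reward: alpha weights (normalized) crop
    yield, beta weights (normalized) resource consumption."""
    return alpha * yield_score - beta * resource_cost

def sweep(yield_score, resource_cost, grid=(0.5, 1.0, 1.5)):
    """Evaluate the reward over a defined (alpha, beta) grid, mirroring
    the structured sensitivity sweep."""
    return {(a, b): weighted_reward(yield_score, resource_cost, a, b)
            for a, b in itertools.product(grid, grid)}
```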

The Pareto frontier analysis (Fig 41) clearly illustrates the inherent trade-off: higher yields are only attainable at the cost of increased resource consumption, and vice versa. This confirms that the framework naturally spans multiple operational regimes that stakeholders can select based on strategic priorities.

To further quantify sensitivity, a heatmap of mean reward across (α, β) configurations was generated (Fig 42). The results show that increasing α systematically shifts the policy toward yield-maximizing behaviors, whereas higher β values promote resource efficiency. The transition is smooth and continuous, underscoring the robustness of the reward design. Importantly, the reward landscape does not collapse into degenerate solutions, suggesting that the formulation generalizes well across a spectrum of stakeholder preferences.

The limited spread of points on the Pareto plots should not be interpreted as a lack of variability, but rather as a consequence of normalization and intrinsic greenhouse constraints. This concentration highlights the stability of the RL agent, which consistently identifies balanced strategies. The complementary heatmap further supports this robustness, showing that systematic adjustments to α and β produce interpretable shifts in the yield–resource balance rather than unstable or erratic responses.

Together, Figs 41 and 42 demonstrate that the proposed framework provides a structured and tunable trade-off frontier that adapts to diverse sustainability objectives. These findings confirm that the reward function offers a flexible and broadly applicable mechanism to balance productivity with resource efficiency in practical greenhouse management.

6.7. Algorithmic interpretation and deployment implications

The TD3 agent’s twin-critic structure mitigates overestimation of aggressive CO₂ dosing, avoiding wasted inputs. Policy delay prevents abrupt irrigation or heating oscillations that could stress plants, while target-policy smoothing stabilizes exploration under stochastic weather. Table 14 summarizes behavioral differences among TD3, DDPG, and SAC.

Table 14. Behavioral comparison of RL algorithms under stochastic greenhouse control.

https://doi.org/10.1371/journal.pone.0344946.t014

To further illustrate these behaviors, a comparative evaluation of TD3, DDPG, and SAC agents was conducted using the same stochastic-weather greenhouse environment. TD3 exhibited markedly smoother policy convergence and fewer control oscillations in both irrigation and heating setpoints. In contrast, DDPG occasionally overestimated aggressive CO₂ dosing, leading to transient spikes in resource use, while SAC showed slower convergence and higher variance during exploration. These observations align with the theoretical strengths of TD3’s architecture (twin-critic stability and delayed policy updates), which enable consistent decision-making under biological and environmental uncertainty.

6.8. Alignment with existing literature

This research aligns with and expands upon existing studies in AI-driven agricultural management. Unlike prior works focusing on single objectives such as irrigation or energy optimization, this study integrates multi-objective RL to simultaneously optimize crop growth and resource efficiency.

By incorporating dynamic weather generation and multi-variable greenhouse control, the study addresses limitations in traditional approaches that rely on static environments or predefined setpoints. The TD3 agent’s superior performance builds on foundational works by demonstrating adaptability to stochastic conditions and robustness in optimizing crop yield and resource use.

6.9. Cross-disciplinary universality

The principles underlying our RL-based resource optimization extend beyond agriculture and are applicable to other dynamic, resource-constrained systems. Similar challenges exist in energy systems, manufacturing, and process industries, where sustainability optimization requires balancing efficiency, performance, and resource recovery. For example, the reversible loss recovery model for battery systems presented in [59] provides a theoretical basis for managing the balance between consumption and recovery in real time. Drawing parallels to greenhouse management, incorporating intermittent recovery phases for heating and CO₂ supply equipment could further enhance system sustainability. Such cross-disciplinary perspectives underscore the universality of our framework and open pathways for integrating methodologies from other domains to strengthen climate-resilient agricultural practices.

6.10. Practical deployment potential

Practical deployment of the proposed framework can be achieved by integrating the RL agent with existing IoT-based greenhouse management systems. Real-time sensor data can be streamed to a cloud-hosted RL service, which computes optimal setpoints and communicates them to local actuator controllers.

The full deployment loop operates as follows: (1) sensors measure temperature, humidity, and CO₂ concentration; (2) a local edge device runs the MLP model for immediate climate state estimation; (3) state data and short-term forecasts from the LSTM models are transmitted to the cloud-based TD3 agent for decision-making; (4) the agent computes optimal weekly setpoints for heating, ventilation, and irrigation; and (5) these setpoints are returned to the edge controller, which executes the actions through the actuator interface. The combined inference footprint of the MLP-LSTM-TD3 pipeline remains lightweight (<50 MB RAM and <0.1 s per forward pass on an NVIDIA Jetson Nano), making deployment feasible on low-power edge hardware or hybrid cloud–edge systems. In case of communication loss or model failure, the rule-based controller automatically re-engages to maintain safe operating limits and prevent damage to crops or equipment.
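The five-step loop above can be sketched as a single control cycle; every object and method name here is a hypothetical placeholder for the corresponding sensor, model, or actuator service, not an interface from the paper:

```python
def control_cycle(sensors, edge_mlp, cloud_agent, actuators, fallback):
    """One pass of the edge-cloud deployment loop, with the rule-based
    failsafe engaged on communication loss (all interfaces illustrative)."""
    raw = sensors.read()                          # (1) temp, humidity, CO2
    try:
        state = edge_mlp.estimate(raw)            # (2) local climate estimate
        setpoints = cloud_agent.decide(state)     # (3)-(4) LSTM forecasts + TD3
    except ConnectionError:
        setpoints = fallback.safe_setpoints(raw)  # rule-based safe limits
    actuators.apply(setpoints)                    # (5) execute via edge controller
    return setpoints
```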

Periodic retraining with updated data would ensure adaptability to seasonal changes and equipment aging. This deployment pathway supports both on-premises and remote decision support, making it viable for large-scale commercial adoption.

In real-world deployments, additional operational challenges must be considered. Sensor noise and measurement drift may introduce uncertainty into state estimation, which can be mitigated through sensor calibration routines, redundancy, and filtering techniques at the edge level. Actuator delays and response latencies, particularly in heating and ventilation systems, may affect control precision and should be accounted for through conservative action constraints and delayed-reward awareness during training. Model retraining frequency represents a trade-off between adaptability and operational stability; in practice, retraining can be scheduled periodically or triggered by performance degradation indicators. From a computational perspective, the lightweight inference footprint enables on-site deployment on embedded hardware, while more computationally intensive retraining processes can be performed offline or in the cloud.
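As one concrete example of the edge-level filtering mentioned above, an exponential moving average can damp sensor noise before state estimation. This is a standard technique offered as a sketch, not necessarily the filter used in the study:

```python
def ema_filter(readings, alpha=0.2):
    """Exponentially weighted moving average over a sensor series;
    smaller alpha smooths more aggressively."""
    smoothed, s = [], readings[0]
    for x in readings:
        s = alpha * x + (1.0 - alpha) * s
        smoothed.append(s)
    return smoothed
```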

While the proposed RL framework demonstrated robust performance under stochastic environmental conditions, long-term deployment must consider gradual equipment performance degradation. For example, heating system efficiency and CO₂ dosing accuracy may deteriorate over extended operation periods. Integrating predictive maintenance models, such as the dynamic operating life prediction framework described in [50], could enable the RL agent to adjust control strategies based on real-time equipment health indicators, ensuring sustained optimization.

6.11. Limitations

While the study demonstrates the effectiveness of RL-based frameworks for greenhouse management, certain limitations remain. The agent’s performance is influenced by the quality and diversity of training data, and limited training scenarios may reduce adaptability to unforeseen conditions. Moreover, the RL agent was tested only in a virtual simulation; real-world implementation may face challenges related to data collection, sensor accuracy, and integration with existing infrastructure.

Furthermore, the experimental validation relies on a single crop (cherry tomato) and a specific high-tech greenhouse dataset, which may limit the direct generalizability of the learned control policies to other crops, greenhouse configurations, or climatic regions without additional retraining or adaptation.

Although stochastic environmental conditions are regenerated across episodes to enhance robustness and each algorithm underwent extensive training and hyperparameter tuning, the reported results summarize representative performance from a single finalized training configuration per algorithm. Future work could further strengthen statistical confidence by aggregating performance across multiple independent training runs.

In addition, while our study employed LSTM models for time series forecasting of crop parameters and resource consumption, recent advances suggest that Transformer-based architectures may offer superior capability in modeling long-term and irregular environmental fluctuations. As highlighted in [60], future work comparing LSTM and Transformer models could provide further improvements in adaptability, particularly under extreme or unpredictable climate conditions.

Future research could explore model-based reinforcement learning (MBRL) frameworks, leveraging predictive models to simulate interactions and guide RL training. This approach could enable scalable, data-efficient training while ensuring adaptability to dynamic real-world conditions.

This study highlights the potential of RL-based systems to modernize greenhouse management by achieving a balance between crop yield optimization and resource efficiency. The TD3 agent’s superior performance underscores the transformative role of AI in advancing sustainable agricultural practices. By addressing existing limitations and expanding its scope, this research paves the way for scalable and adaptive solutions in response to global challenges in food security and climate change.

7. Conclusions

This study highlights the transformative potential of reinforcement learning (RL) in advancing autonomous greenhouse management systems. By dynamically optimizing critical parameters for crop growth and resource consumption, the proposed RL-based framework addresses pressing challenges in modern agriculture, including food security and resource sustainability.

The RL agent demonstrated its ability to adapt to varying environmental conditions, balancing short-term goals of high productivity with long-term sustainability. The agent effectively optimized control strategies to maximize crop yields while ensuring efficient resource utilization.

Among the tested RL algorithms, the Twin Delayed Deep Deterministic Policy Gradient (TD3) agent emerged as the most robust, consistently achieving the highest cumulative rewards, superior crop yield parameters (stem elongation, thickness, and cumulative trusses), and optimized resource consumption metrics (heat, electricity, CO2, and irrigation). Its twin-critic structure, policy delay, and smoothing mechanisms enabled reliable control under complex and stochastic greenhouse conditions, outperforming PPO, DDPG, and SAC. These results confirm TD3’s suitability for managing the dynamic and nonlinear nature of greenhouse environments, where both stability and adaptability are critical.

The RL agent outperformed fixed-action strategies under historical weather conditions and demonstrated significantly higher adaptability when trained in randomized weather environments. The retrained agent achieved higher cumulative rewards, showcasing its ability to generalize and respond to stochastic changes, a critical advantage over static strategies.

The TD3-based greenhouse controller achieved an effective balance between maximizing crop yield and minimizing resource consumption, demonstrating clear improvements over fixed-action strategies. Sensitivity analyses verified the universality and stability of the reward design, confirming adaptability to different sustainability priorities. The framework’s modular design and cross-domain applicability position it as a scalable step toward intelligent, sustainable agriculture.

A promising extension of this work lies in the integration of Model-Based Reinforcement Learning (MBRL) approaches. By leveraging predictive environment models, MBRL could enable more efficient training and deployment of RL agents in real-world greenhouse systems. This approach ensures continuous improvement and adaptability, bridging the gap between simulation and practical applications.

In conclusion, the proposed RL framework showcases its ability to optimize greenhouse operations by dynamically adapting to environmental conditions. It sets the foundation for future advancements in smart agriculture, paving the way for scalable, sustainable, and efficient agricultural practices.

8. Recommendations

To further enhance the applicability, robustness, and scalability of RL-based frameworks in autonomous greenhouse management, the following recommendations are proposed:

  • Practitioners should implement RL-based systems to optimize decision-making processes and resource usage. Training programs should be developed to equip operators with the necessary skills to leverage AI-driven systems effectively, enabling dynamic and adaptable control over greenhouse conditions.
  • To maintain adaptability, RL agents must undergo regular retraining using updated data reflecting changing environmental conditions and crop-specific requirements. This iterative process ensures sustained high performance and addresses evolving agricultural demands.
  • RL agents should be trained with stochastic elements, such as random weather generation, pest infestations, and resource constraints, to improve their robustness under real-world uncertainties. Incorporating these dynamic scenarios during training enhances the agent’s generalizability and reliability in practical applications.
  • Future research should refine the virtual greenhouse environment by incorporating diverse historical greenhouse records and conducting real-world benchmarking experiments. Advanced validation techniques, such as cross-validation with diverse datasets and sensitivity analyses, can further improve the model’s accuracy and reliability.
  • The integration of IoT sensors into greenhouse systems can provide real-time data on critical environmental parameters such as temperature, humidity, soil moisture, and CO2 levels. This data enables the RL agent to make informed decisions and adapt dynamically to changes, creating a real-time feedback loop for precise resource management.
  • Expanding RL frameworks to include diverse crop species, regional conditions, and growth environments can validate their scalability and adaptability. This approach ensures applicability across various agricultural systems, promoting broader adoption of AI-driven practices.
  • Interdisciplinary collaborations with agronomists, environmental scientists, and agricultural practitioners are essential for refining RL models. Experts can provide insights into factors such as pest management, nutrient cycles, and soil health, ensuring the RL framework addresses real-world complexities effectively.
  • Future research should explore the integration of MBRL, which enables simultaneous training of the RL agent and the environment model. This approach allows the system to dynamically adapt to new data and scenarios, bridging the gap between virtual simulations and practical agricultural applications.
  • Real-world implementation of RL frameworks is critical for validating their performance and refining their effectiveness. Controlled greenhouse experiments can provide valuable insights into the practical utility of these systems, ensuring their readiness for large-scale deployment. Additionally, the assumption of uniform soil properties simplifies real-world variability. Incorporating heterogeneous soil-sensor feedback will enable spatially adaptive irrigation and temperature control.
  • RL-based systems should be leveraged to enhance sustainability in agricultural practices. By optimizing resource usage and improving crop yields, these systems contribute to global efforts to address food security and climate change challenges.

By addressing these future directions, the proposed RL framework can be refined and expanded, contributing to a sustainable transformation of agricultural practices in response to global food production challenges.

Supporting information

S1 File. Supporting information.

https://doi.org/10.1371/journal.pone.0344946.s001

(Supporting Information.ZIP)

References

  1. Office of the Director ADED. High level expert forum - global agriculture towards 2050: how to feed the world 2050. Rome: Tome; 2009. https://www.fao.org/fileadmin/user_upload/lon/HLEF2050_Global_Agriculture.pdf
  2. Utilities JW. Ministry of water and irrigation utilities performance monitoring unit (UPMU) aqaba wastewater treatment plant. Amman: Ministry of Water and Irrigation; 2020. www.mwi.gov.jo
  3. Hassan SI, Alam MM, Illahi U, Al Ghamdi MA, Almotiri SH, Su’ud MM. A systematic review on monitoring and advanced control strategies in smart agriculture. IEEE Access. 2021;9:32517–48.
  4. Saggi MK, Jain S. A survey towards decision support system on smart irrigation scheduling using machine learning approaches. Arch Comput Methods Eng. 2022;29(6):4455–78. pmid:35573028
  5. Hemming S, de Zwart HF, Elings A, Righini I, Petropoulou A. Autonomous greenhouse challenge, first edition (2018). 4TU.ResearchData; 2019.
  6. Hemming S, de Zwart F, Elings A, Petropoulou A, Righini I. Cherry tomato production in intelligent greenhouses-sensors and ai for control of climate, irrigation, crop yield, and quality. Sensors (Basel). 2020;20(22):6430. pmid:33187119
  7. Gutiérrez-Gordillo S, García-Tejero IF, Durán Zuazo VH, García Escalera A, Ferrera Gil F, Amores-Agüera JJ, et al. Assessing the water-stress baselines by thermal imaging for irrigation management in almond plantations under water scarcity conditions. Water. 2020;12(5):1298.
  8. Chen Y-A, Hsieh W-H, Ko Y-S, Huang N-F. An ensemble learning model for agricultural irrigation prediction. In: 2021 International conference on information networking (ICOIN), 2021. 311–6.
  9. Vij A, Vijendra S, Jain A, Bajaj S, Bassi A, Sharma A. IoT and machine learning approaches for automation of farm irrigation system. Procedia Comp Sci. 2020;167:1250–7.
  10. Vijay R, Priya YV, Reddy PC, Monisha S, Ramasamy V. A IoT based Smart Auto Irrigation system. In: 2023 International conference on computer communication and informatics (ICCCI), 2023. 1–4.
  11. Cardoso J, Gloria A, Sebastiao P. A methodology for sustainable farming irrigation using WSN, NB-IoT and machine learning. In: SEEDA-CECNSM 2020 - 5th South-East Europe Design Automation, Computer Engineering, Computer Networks and Social Media Conference, 2020.
  12. Chandra S, Bhilare S, Asgekar M, Ramya RB. Crop water requirement prediction in automated drip irrigation system using ML and IoT. In: 2021 International Conference on Nascent Technologies in Engineering, ICNET 2021 - Proceedings, 2021.
  13. Garg S, Pundir P, Jindal H, Saini H, Garg S. Towards a multimodal system for precision agriculture using IoT and machine learning. In: 2021 12th International conference on computing communication and networking technologies (ICCCNT), 2021. 1–7.
  14. Cardoso J, Gloria A, Sebastiao P. Improve irrigation timing decision for agriculture using real time data and machine learning. In: 2020 International conference on data analytics for business and industry: way towards a sustainable economy (ICDABI), 2020. 1–5.
  15. Lakshmi C, Pagadala A, Bandam SM, Penugonda A, Gangaraju B, Ganapavarapu T, et al. Analysing agricultural information through machine learning and artificial intelligence for SMART IRRIGATION. In: 2023 2nd International Conference on Vision Towards Emerging Trends in Communication and Networking Technologies (ViTECoN), 2023. 1–6.
  16. Al-Faydi SNM, Al-Talb HNY. IoT and artificial neural network-based water control for farming irrigation system. In: 2022 2nd International conference on computing and machine intelligence, ICMI 2022 - Proceedings, 2022.
  17. Ben Abdallah E, Grati R, Boukadi K. A machine learning-based approach for smart agriculture via stacking-based ensemble learning and feature selection methods. In: 2022 18th International conference on intelligent environments (IE), 2022. 1–8.
  18. FAO. CLIMWAT - Climatic database to be used with CROPWAT. Land and Water. Accessed 2025 August 19. https://www.fao.org/land-water/land/land-governance/land-resources-planning-toolbox/category/details/en/c/1026544/
  19. Permana IKYT, Mantoro T, Irawan E, Dwi Handayani DO, Safitri C. Increased efficiency of smart water systems for vegetable plants using the deep learning classification approach. In: 5th International conference on computing engineering and design, ICCED 2019. 2019.
  20. Brown HE, Jamieson PD, Hedley C, Maley S, George MJ, Michel AJ. Using infrared thermometry to improve irrigation scheduling on variable soils. Agric For Meteorol. 2021;307:108033.
  21. Lyu L, Caballero JM, Juanatas RA. Design of irrigation control system for vineyard based on LoRa wireless communication and dynamic neural network. In: 2022 7th International Conference on Business and Industrial Research (ICBIR), 2022. 373–8.
  22. Kashyap PK, Kumar S, Jaiswal A, Prasad M, Gandomi AH. Towards precision agriculture: IoT-enabled intelligent irrigation systems using deep learning neural network. IEEE Sensors J. 2021;21(16):17479–91.
  23. Alibabaei K, Gaspar PD, Assunção E, Alirezazadeh S, Lima TM. Irrigation optimization with a deep reinforcement learning model: case study on a site in Portugal. Agric Water Manag. 2022;263:107480.
  24. Alibabaei K, Gaspar PD, Assunção E, Alirezazadeh S, Lima TM, Soares VNGJ, et al. Comparison of on-policy deep reinforcement learning A2C with off-policy DQN in irrigation optimization: a case study at a site in Portugal. Computers. 2022;11(7):104.
  25. Kamyshova G, Osipov A, Gataullin S, Korchagin S, Ignar S, Gataullin T, et al. Artificial neural networks and computer vision’s-based phytoindication systems for variable rate irrigation improving. IEEE Access. 2022;10:8577–89.
  26. Yumang AN, Garcia LAP, Mandapat GA. IoT-based monitoring of temperature and humidity with fuzzy control in cherry tomato greenhouses. 2023. 480–5.
  27. Amir A, Butt M, Van Kooten O. Using Machine Learning Algorithms to Forecast the Sap Flow of Cherry Tomatoes in a Greenhouse. IEEE Access. 2021;9:154183–93.
  28. Hindi I, Mashagbeh MA, Alsharkawi A. Optimizing cherry tomato crop irrigation: a robust daily schedule incorporating weather, soil, and irrigation data through cascaded-output ANN. In: 15th International conference on information and communication systems, ICICS. 2024.
  29. Petropoulou AS, van Marrewijk B, de Zwart F, Elings A, Bijlaard M, van Daalen T, et al. Lettuce production in intelligent greenhouses-3D imaging and computer vision for plant spacing decisions. Sensors (Basel). 2023;23(6):2929. pmid:36991638
  30. Lin Z, Fu R, Ren G, Zhong R, Ying Y, Lin T. Automatic monitoring of lettuce fresh weight by multi-modal fusion based deep learning. Front Plant Sci. 2022;13:980581.
  31. Gang M-S, Kim H-J, Kim D-W. Estimation of greenhouse lettuce growth indices based on a two-stage CNN using RGB-D images. Sensors (Basel). 2022;22(15):5499. pmid:35898004
  32. Zhang Y, Li M, Li G, Li J, Zheng L, Zhang M, et al. Multi-phenotypic parameters extraction and biomass estimation for lettuce based on point clouds. Measurement. 2022;204:112094.
  33. Goldenits G, Mallinger K, Raubitzek S, Neubauer T. Current applications and potential future directions of reinforcement learning-based Digital Twins in agriculture. Smart Agricul Tech. 2024;8:100512.
  34. Walshe D, McInerney D, De Kerchove RV, Goyens C, Balaji P, Byrne KA. Detecting nutrient deficiency in spruce forests using multispectral satellite imagery. Inter J Appl Earth Observ Geoinform. 2020;86:101975.
  35. Kallenberg MGJ, Overweg H, van Bree R, Athanasiadis IN. Nitrogen management with reinforcement learning and crop growth models. Environ Data Science. 2023;2.
  36. Sha X, Zhu Y, Sha X, Guan Z, Wang S. ZHPO-LightXBoost an integrated prediction model based on small samples for pesticide residues in crops. Environ Model Softw. 2025;188:106440.
  37. Gong L, Gao B, Sun Y, Zhang W, Lin G, Zhang Z. PreciseSLAM: robust, real-time, LiDAR-inertial-ultrasonic tightly-coupled SLAM with ultraprecise positioning for plant factories. IEEE Trans Industrial Inform. 2024;20(6):8818–27.
  38. Zhang W. Greenhouse monitoring system integrating NB-IOT technology and a cloud service framework. Nonlinear Eng. 2024;13(1).
  39. Purcell W, Neubauer T. Digital twins in agriculture: a state-of-the-art review. Smart Agricul Tech. 2023;3:100094.
  40. Kropp I, Nejadhashemi AP, Deb K, Abouali M, Roy PC, Adhikari U. A multi-objective approach to water and nutrient efficiency for sustainable agricultural intensification. Agric Syst. 2019;173:289–302.
  41. Shi J, Han D, Chen C, Shen X. KTMN: knowledge-driven two-stage modulation network for visual question answering. Multimedia Syst. 2024;30(6):1–17.
  42. Iacomi C, Roșca I, Madjar R, Iacomi B, Popescu V. Automation and computer-based technology for small vegetable farm holders. 2014. https://www.academia.edu/download/91116274/art74.pdf
  43. Liu T, Yuan Q, Wang Y. Hierarchical optimization control based on crop growth model for greenhouse light environment. Comp Electron Agriculture. 2021.
  44. Elavarasan D, Durairaj Vincent PM. Crop yield prediction using deep reinforcement learning model for sustainable agrarian applications. IEEE Access. 2020;8:86886–901.
  45. Mallick S, Airaldi F, Dabiri A, Sun C, De Schutter B. Reinforcement learning-based model predictive control for greenhouse climate control. Smart Agricul Tech. 2025;10:100751.
  46. Morcego B, Yin W, Boersma S, van Henten E, Puig V, Sun C. Reinforcement learning versus model predictive control on greenhouse climate control. Comput Electron Agric. 2023;215:108372.
  47. Decardi-Nelson B, You F. Improving resource use efficiency in plant factories using deep reinforcement learning for sustainable food production. Chem Eng Trans. 2023;103:79.
  48. Ding X, Du W. DRLIC: deep reinforcement learning for irrigation control. In: 2022 21st ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN), 2022. 41–53.
  49. Kerkhof L, Keviczky T. Predictive control of autonomous greenhouses: A data-driven approach. In: 2021 European control conference, ECC 2021, 2021. 1229–35.
  50. Lee M-H, Yao M-H, Kow P-Y, Kuo B-J, Chang F-J. An artificial intelligence-powered environmental control system for resilient and efficient greenhouse farming. Sustainability. 2024;16(24):10958.
  51. Hoseinzadeh S, Astiaso Garcia D. Ai-driven innovations in greenhouse agriculture: reanalysis of sustainability and energy efficiency impacts. Energy Convers Manag: X. 2024;24:100701.
  52. Fan J, Zhang X, He N, Song F, Wang X. Investigation on novel deep eutectic solvents with high carbon dioxide adsorption performance. J Environ Chem Eng. 2025;13(5):117870.
  53. Platero-Horcajadas M, Pardo-Pina S, Cámara-Zapata J-M, Brenes-Carranza J-A, Ferrández-Pastor F-J. Enhancing greenhouse efficiency: integrating IoT and reinforcement learning for optimized climate control. Sensors (Basel). 2024;24(24):8109. pmid:39771844
  54. Tang X, Shi L, Li M, Xu S, Sun C. Health state estimation and long-term durability prediction for vehicular PEM fuel cell stacks under dynamic operational conditions. IEEE Trans Power Electron. 2025;40(3):4498–509.
  55. Schulman J, Wolski F, Dhariwal P, Radford A, Openai OK. Proximal policy optimization algorithms. 2017. https://arxiv.org/pdf/1707.06347
  56. Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, et al. Continuous control with deep reinforcement learning. In: 4th International conference on learning representations, ICLR 2016 - conference track proceedings. 2015. https://arxiv.org/pdf/1509.02971
  57. Haarnoja T, Zhou A, Hartikainen K, Tucker G, Ha S, Tan J. Soft Actor-Critic Algorithms and Applications. 2018. https://arxiv.org/pdf/1812.05905
  58. Fujimoto S, Van Hoof H, Meger D. Addressing function approximation error in actor-critic methods. PMLR. 2018. 1587–96. https://proceedings.mlr.press/v80/fujimoto18a.html
  59. Meng X, Sun C, Mei J, Tang X, Hasanien HM, Jiang J, et al. Fuel cell life prediction considering the recovery phenomenon of reversible voltage loss. J Power Sources. 2025;625:235634.
  60. Meng X, Mei J, Tang X, Jiang J, Sun C, Song K. The degradation prediction of proton exchange membrane fuel cell performance based on a transformer model. Energies. 2024;17(12):3050.