Finding optimum climatic parameters for high tomato yield in Benin (West Africa) using frequent pattern growth algorithm

  • Sèton Calmette Ariane Houetohossou,

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Writing – original draft, Writing – review & editing

    harianecalmet@gmail.com

    Affiliation Laboratoire de Biomathématiques et d’Estimations Forestières, University of Abomey-Calavi, Cotonou, Benin

  • Vinasetan Ratheil Houndji,

    Roles Conceptualization, Formal analysis, Methodology, Supervision, Writing – review & editing

    Affiliations Laboratoire de Biomathématiques et d’Estimations Forestières, University of Abomey-Calavi, Cotonou, Benin, Institut de Formation et de Recherche en Informatique, University of Abomey-Calavi, Cotonou, Benin

  • Rachidatou Sikirou,

    Roles Writing – review & editing

    Affiliation Laboratoire de Défense des Cultures, Centre de Recherches Agricoles d’Agonkanmey, Institut National des Recherches Agricoles du Bénin (INRAB), Cotonou, Republic of Benin

  • Romain Glèlè Kakaï

    Roles Conceptualization, Methodology, Supervision, Validation, Writing – review & editing

    Affiliation Laboratoire de Biomathématiques et d’Estimations Forestières, University of Abomey-Calavi, Cotonou, Benin

Abstract

Tomato is one of the most appreciated vegetables in the world. Predicting its yield and optimizing its cultivation are important for global food security. This paper addresses the challenge of finding optimum climatic values for a high tomato yield. The Frequent Pattern Growth (FPG) algorithm was used to establish associations between tomato yield and six climate variables: minimum and maximum temperatures, maximum humidity, sunshine (Sun), rainfall, and evapotranspiration (ET), collected over 26 years in the three agro-ecological Zones of Benin. Monthly climate data were aggregated with yield data over the same period. After aggregation, the data were transformed into ‘low’, ‘medium’, and ‘high’ attributes using threshold values defined for each Zone. Then, the rules were generated with the minimum support set to 0.2 and the minimum confidence set to 0.8. Only rules with the consequent ‘high yield’ were retained. The best yield patterns were observed in the Guinean Zone, followed by the Sudanian Zone. The results indicated that high tomato yield was associated with low ET in all areas considered. Minimum and maximum temperatures, maximum humidity, and Sun were medium in every Zone. Moreover, rainfall was high in the Sudanian Zone, unlike the other regions where it remained medium. These results are useful in assessing the impact of climate variability on tomato production. Thus, they can help farmers make informed decisions on cultivation practices to optimize production in a changing environment. In addition, the approach of this study can be applied in other regions and adapted to other crops.

Introduction

Agriculture is a significant component of the economy of developing countries. It contributes 30 to 60 percent of the gross domestic product (GDP) in about two-thirds of these countries [1]. In Benin, it is the most important sector, providing up to 36% of the GDP [2]. In particular, vegetable production is essential in reducing poverty by increasing employment opportunities and promoting the country’s economic development [3]. The top five vegetables for investments in Benin are tomato (Solanum lycopersicum L.), chilli pepper (Capsicum frutescens L.), habanero pepper (Capsicum chinense J.), onion (Allium cepa L.), and carrot (Daucus carota L.) [4]. Among them, tomato is the most important in terms of cultivated area and production [5]. It is highly valued because it is used in most cooked dishes [4].

Farmers need to predict yield in advance to be more efficient and maximize profits. Predicting yield is essential for agricultural risk management and future forecasting decisions. Yield depends on the climate, soil, water, nutrient availability, disease and pest management, farming practices, choice of variety, and growing methods [6]. Statistical methods such as simple and multiple linear regressions can be used to predict crop yield. Bahrami [7] used Backward Multiple Linear Regressions based on Relative Importance Metrics to determine the effect of climatic parameters on rainfed wheat yield and showed that sunshine duration was the most important parameter for growth. Linear regression models assume a linear relationship between the independent and dependent variables. However, the relationships between the predictors and dependent variables can be non-linear and complex. Machine learning models can overcome this shortcoming [8]. Various machine learning approaches have been actively employed in recent studies related to yield prediction, owing to the non-linear spatiotemporal nature of crop yields [9]. Machine Learning (ML) is a component of Artificial Intelligence that has proven effective in providing concrete solutions to many challenges in agriculture, including yield prediction, disease detection, weed detection, crop quality, and species recognition [10]. For example, the authors of [11] applied machine learning models to predict the yield of six crops (rice, maize, cassava, cotton, yams, and bananas) in some West African countries. For this purpose, they used historical data on the yield of the target crops as well as climate, weather, and chemical data. The Decision Tree model performed best with a coefficient of determination of 95.3%. In addition, Gómez [12] used satellite and climate data to model wheat yield in Mexico. After testing several ML models, the Random Forest (RF) provided the best prediction.
Moreover, using a LASSO, RF, XGBoost regression method and the Long Short Term Memory (LSTM), Zhang [13] developed an approach that integrated optical, fluorescence, satellite thermal, and environmental data to predict maize yield in four agro-ecological Zones in China and showed that combining these multi-source data explained more than 75% of the yield variation. Furthermore, Rao [14] used three association algorithms, Apriori, Eclat, and AprioriTid, to determine the climatic and soil attributes that contribute to the excellent performance of paddy rice in India. They found that the Eclat algorithm performed better than the other two.

Also, in India, a prediction model was proposed to forecast crop yield in the districts of Tamil Nadu based on available previous data [15]. The Apriori algorithm was applied to a pre-processed dataset obtained through the modified K-means algorithm to predict the yield of various crops. Moreover, Supro [16] used the Apriori algorithm to generate association rules on the attributes area, production, yield, temperature, precipitation, humidity, and wind speed for predicting paddy crop yield in the Larkana district of India.

The aforementioned studies have focused on cereals. No work links climate attributes to tomato yield in Africa, although this could help optimize production. Indeed, by using weather data, farmers can predict tomato yield and adjust their agricultural practices accordingly. This can help maximize production by adapting the amount of water and fertilizers applied and predicting market fluctuations. In addition, predicting tomato yields based on weather conditions can help farmers better manage the risks associated with extreme weather conditions, such as droughts, floods, or winds. Accurate forecasts can help farmers protect their crops, reduce production losses, and minimize economic impacts. Moreover, farmers can plan yields and forecast stocks to ensure a constant and stable market supply, thus contributing to food security [17].

In this paper, we propose to use an association rule algorithm, frequent pattern growth (FP Growth), to find the optimum climatic attributes to maximize tomato yield in Benin. In data mining, association rule mining is a popular and extensively studied method for discovering relationships of interest between two or more variables stored in large databases [18]. The study used weather and tomato yield data collected in Benin’s three agro-ecological Zones over a period of 26 years. The weather data considered are temperature, humidity, sunshine, rainfall, and evapotranspiration.

The remaining sections of the paper are organized as follows. The first section briefly outlines the most prevalent association rule approaches in the literature. The second section describes the proposed methodology, covering the study area and the database used, the pre-processing of the data, and the metrics used to evaluate the chosen approach. The third and fourth sections present and discuss the main results obtained. Finally, the last section concludes.

Brief description of the most common association rule algorithms

Association rule mining analyzes datasets for patterns or co-occurrences in a database. An association rule has two components, an antecedent (if) and a consequent (then), and expresses a frequent ‘if-then’ association between them. The rules are computed from itemsets consisting of two or more elements. Popular association rule algorithms include Apriori, Equivalence Class Clustering and bottom-up Lattice Traversal (ECLAT), and Frequent Pattern Growth (FP Growth). A brief description of the three algorithms is provided below.

Apriori

Created by Agrawal [19], the Apriori algorithm makes multiple passes over the database; a classic practical application is recommending products based on those already in the user’s cart. It finds frequent patterns among the elements stored in a database and generates association rules from these frequent elements. Its fundamental principle is that all non-empty subsets of a frequent itemset must also be frequent. The algorithm starts by generating candidate itemsets, combining each item with every other item in the preceding itemset. The frequent itemsets are then obtained from the candidates by pruning those whose support does not meet the selected minimum threshold. It proceeds bottom-up: frequent itemsets are extended one item at a time, a step known as candidate generation. The main disadvantage is that it requires several database scans to calculate the support of the candidates.
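The level-wise procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `apriori`, the toy transactions, and the item labels such as `ET_low` are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # One pass over the database per candidate: the cost Apriori is known for.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]): support(frozenset([i])) for i in items}
    current = {c: s for c, s in current.items() if s >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        prev = list(current)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c: support(c) for c in candidates}
        current = {c: s for c, s in current.items() if s >= min_support}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical toy database: each transaction is one month of categorized data.
toy = [
    {"ET_low", "Tmin_medium", "Yield_high"},
    {"ET_low", "Tmin_medium", "Yield_high"},
    {"ET_low", "RR_medium"},
    {"ET_high", "Tmin_medium"},
]
frequent_sets = apriori(toy, min_support=0.5)
```

Each call to `support` rescans the whole list of transactions, which is the repeated-scan cost mentioned above.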

ECLAT

The ECLAT algorithm works on a vertical data layout and explores itemsets in the manner of a depth-first graph search, which makes it faster than the Apriori algorithm. Its main idea is to use the intersections of transaction ID sets (tidsets) to calculate a candidate’s support value and to avoid generating subsets that are not present in the prefix tree [20]. On the first call, all single items are used with their tidsets. After that, the procedure is called recursively, and in each recursive call, each item-tidset pair is checked and combined with the other item-tidset pairs. The process is repeated until no candidate item-tidset pairs can be combined.
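The tidset idea can be illustrated with a short Python sketch. It is a simplified illustration, assuming a minimum support expressed as a count; the function name `eclat` and the toy item labels are hypothetical.

```python
def eclat(transactions, min_count):
    """Return {frozenset(itemset): support_count} using tidset intersections."""
    # Vertical layout: each item maps to the set of transaction ids containing it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def recurse(prefix, candidates):
        # candidates: list of (item, tidset) pairs that may extend the prefix.
        for i, (item, tids) in enumerate(candidates):
            if len(tids) < min_count:
                continue
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Combine with the remaining items: the support of the larger
            # itemset is the size of the intersection of the two tidsets.
            extensions = [
                (other, tids & other_tids)
                for other, other_tids in candidates[i + 1:]
            ]
            recurse(itemset, extensions)

    recurse(frozenset(), sorted(tidsets.items(), key=lambda kv: kv[0]))
    return frequent

# Hypothetical toy database of categorized monthly observations.
toy = [
    {"ET_low", "Tmin_medium", "Yield_high"},
    {"ET_low", "Tmin_medium", "Yield_high"},
    {"ET_low", "RR_medium"},
    {"ET_high", "Tmin_medium"},
]
eclat_sets = eclat(toy, min_count=2)
```

No transaction is rescanned after the vertical layout is built; every support value comes from a set intersection.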

Fp growth

The Frequent Patterns Growth algorithm finds sets of frequent items without generating candidates. It works in two steps. The first step consists of constructing a compact data structure called FP-Tree. The second step is devoted to directly extracting frequent sets from the FP-Tree. The FP-Tree has been proposed by Kamber [21]. Each path in the FP-Tree represents a set of relevant frequent information, and the paths’ nodes are arranged in descending order of their frequency. In the FP-tree, the information in the dataset is highly compressed because all overlapping itemsets share the same prefix path [22]. It traverses the database only twice and does not require candidate generation.
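The first step, building the FP-Tree in two database scans, can be sketched as follows. The sketch covers tree construction only (the mining step is omitted), and the names `Node`, `build_fp_tree`, `count_nodes`, and the toy item labels are hypothetical.

```python
from collections import Counter

class Node:
    """One FP-Tree node: an item, its count, and its children."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_count):
    # Scan 1: count item frequencies and keep only the frequent items.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}

    root = Node(None)
    # Scan 2: insert each transaction with its items in descending frequency
    # order, so transactions with common frequent items share a prefix path.
    for t in transactions:
        ordered = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),  # ties broken alphabetically
        )
        node = root
        for item in ordered:
            node = node.children.setdefault(item, Node(item, node))
            node.count += 1
    return root

def count_nodes(node):
    """Number of nodes below the given node."""
    return sum(1 + count_nodes(child) for child in node.children.values())

# Hypothetical toy database of categorized monthly observations.
toy = [
    {"ET_low", "Tmin_medium", "Yield_high"},
    {"ET_low", "Tmin_medium", "Yield_high"},
    {"ET_low", "RR_medium"},
    {"ET_high", "Tmin_medium"},
]
tree = build_fp_tree(toy, min_count=2)
```

In this toy case the eight frequent item occurrences are stored in four nodes, because the first three transactions share the prefix starting at `ET_low`.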

Materials and methods

This section explains the study environment and the data used. It illustrates the variation of these data in time, presents the preprocessing, and details the technique and performance metrics.

Study area and dataset presentation

The country of study is the Republic of Benin, a West African country (Fig 1). We used secondary data collected at the Kandi, Savè, and Cotonou synoptic stations in the Sudanian, Sudano-Guinean, and Guinean Zones, respectively. The government of Benin has defined two poles for vegetable production: poles 1 and 7. Pole 1 is located in the Sudanian Zone and covers the districts of Malanville and Karimama, which do not have synoptic stations. Therefore, we chose the synoptic station of Kandi, the closest station to Karimama and Malanville and located in the same department as these two districts. Pole 7, located in the Guinean Zone, covers several communities with a single synoptic station: Cotonou. In addition, data from the synoptic station of Savè, situated in the Sudano-Guinean transition Zone, were collected. The choice of the Savè station allows the study to cover the three types of climate in the country.

The secondary climate data from the three synoptic stations (Kandi, Savè, and Cotonou) were collected at the ‘Direction de la Météo-Bénin’ of the ‘Agence pour la Sécurité de la Navigation Aérienne en Afrique (ASECNA)’ from 1995 to 2020. These are minimum temperature (Tmin) in °C, maximum temperature (Tmax) in °C, minimum humidity (Umin) in %, maximum humidity (Umax) in %, rainfall (RR) in mm, sunshine (Sun) in hours (h), and evapotranspiration (ET) in mm. Annual tomato yield data from 1995 to 2020 were obtained from the ‘Direction de la Statistique Agricole du Benin (DSA)’. The average monthly yield was computed from the annual yields and merged with the monthly average of each climatic variable. The final database consists of 936 observations (26 years × 12 months × 3 stations) on eight variables, including the yield. Fig 2 shows the data distribution for each parameter and the yield at each station. The rainfall boxplots show a similar pattern for the three synoptic stations, with more outliers observed at the Cotonou station (Fig 2A). Sunshine duration is higher at the synoptic station of Kandi than at the other two stations (Fig 2C). The same trend is observed for evapotranspiration and maximum temperature (Fig 2B & 2D). On the other hand, the minimum temperature and the maximum and minimum humidity are higher at the Cotonou station than at the other two stations (Fig 2E–2G). Similarly, tomato yields are much higher in Cotonou (Fig 2H).

Fig 2. Distribution of climate parameters and tomato yield from 1995 to 2020 in the three districts.

A: Rainfall. B: Evapotranspiration. C: Sunshine. D: Maximum temperature. E: Minimum temperature. F: Maximum humidity. G: Minimum humidity. H: Tomato yield.

https://doi.org/10.1371/journal.pone.0297983.g002

Temporal variation of the climatic parameters over the years

Fig 3 shows the annual average of the parameters in the study areas over 26 years. Kandi station recorded the highest values of ET (6.6 mm), Tmax (35.6°C), and Sun (8.2 h). The Sun trends are similar in Cotonou and Savè. A slight variation was observed in Tmax at Cotonou between 1995 and 2015, where Tmin and Umin were high. However, Umin was low in Savè and Kandi. Cotonou and Savè showed the same variations for Umax. The yield reached up to 5000 kg/ha at Cotonou, while in the other two Zones it remained at almost 500 kg/ha.

Fig 3. Temporal variation of climate parameters and tomato yield in the three districts.

A: Rainfall. B: Evapotranspiration. C: Sunshine. D: Maximum temperature. E: Minimum temperature. F: Maximum humidity. G: Minimum humidity. H: Tomato yield.

https://doi.org/10.1371/journal.pone.0297983.g003

Data pre-processing

A correlation analysis was performed between the dependent and independent variables, and among the independent variables, using a threshold of 80%. Variables correlated above this threshold were candidates for removal. No explanatory variable was correlated with the response variable (Fig 4A). However, the correlation analysis between the explanatory variables indicated that the maximum temperature was correlated with the minimum humidity at more than 80%. In addition, there was a high correlation of 85% between maximum and minimum humidity (Fig 4B). Therefore, we removed the minimum humidity from the predictors.
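As an illustration of this screening step, the sketch below flags pairs of predictors whose absolute Pearson correlation exceeds 0.8. It assumes Pearson correlation (the text does not name the coefficient), and the values in `toy` are invented for the example, not taken from the study data.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlated_pairs(columns, threshold=0.8):
    """List the variable pairs whose |r| is above the threshold."""
    return [
        (a, b)
        for a, b in combinations(columns, 2)
        if abs(pearson(columns[a], columns[b])) > threshold
    ]

# Hypothetical monthly values, for illustration only.
toy = {
    "Tmax": [30.0, 32.0, 34.0, 36.0, 33.0],
    "Umin": [70.0, 62.0, 55.0, 45.0, 60.0],
    "Umax": [95.0, 92.0, 88.0, 80.0, 91.0],
    "RR": [2.0, 0.5, 3.0, 1.0, 2.5],
}
flagged = correlated_pairs(toy)
```

A flagged pair only indicates redundancy; which member of the pair to drop remains a modelling decision, as with the removal of minimum humidity above.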

Fig 4. Correlation analysis between variables.

A: Correlation between predictors and response variable. B: Correlation between predictors.

https://doi.org/10.1371/journal.pone.0297983.g004

From the daily climate data, we calculated the monthly average of each parameter to have twelve entries per year. These monthly means were merged with the monthly averages of the yield data. This resulted in a matrix of 936 observations on seven attributes: minimum temperature, maximum temperature, maximum humidity, rainfall, sunshine, evapotranspiration, and tomato yield. The attributes were then categorized as ‘low’, ‘medium’, or ‘high’ based on the threshold values specified for each agro-ecological Zone in Tables 1–3. The thresholds were defined as the mean ± one standard deviation of the data collected over the 26 years in each agro-ecological Zone. Tables 1–3 provide the threshold values for the Sudanian, Sudano-Guinean, and Guinean Zones for each variable. After this, the attributes were transformed into dummy variables with the pandas package in Python.
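The categorization can be sketched as follows. This is a minimal standard-library illustration rather than the pandas code used in the study; the use of the sample standard deviation, the strict inequalities at the cut points, the function names, and the toy ET values are all assumptions.

```python
from statistics import mean, stdev

def zone_thresholds(values):
    """Return (lower, upper) cut points as mean - SD and mean + SD."""
    m, s = mean(values), stdev(values)  # sample SD: an assumption
    return m - s, m + s

def categorize(value, lower, upper):
    """Map a numeric value to 'low', 'medium', or 'high'."""
    if value < lower:
        return "low"
    if value > upper:
        return "high"
    return "medium"

def one_hot(label, prefix):
    """Dummy (one-hot) encoding of a single categorical label."""
    return {f"{prefix}_{level}": label == level
            for level in ("low", "medium", "high")}

# Hypothetical monthly ET values (mm) for one Zone.
et_values = [2.0, 3.0, 4.0, 5.0, 6.0]
lower, upper = zone_thresholds(et_values)
labels = [categorize(v, lower, upper) for v in et_values]
dummies = [one_hot(label, "ET") for label in labels]
```

Each row of `dummies` has exactly one true entry, which is the transaction format that frequent itemset algorithms expect.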

Table 1. Threshold values of variables for Sudanian Zone.

https://doi.org/10.1371/journal.pone.0297983.t001

Table 2. Threshold values of variables for Sudano-Guinean Zone.

https://doi.org/10.1371/journal.pone.0297983.t002

Table 3. Threshold values of variables for the Guinean Zone.

https://doi.org/10.1371/journal.pone.0297983.t003

The association rule technique used and evaluation metrics

Several algorithms can be used to establish the rules, including Apriori and frequent pattern growth (FP Growth). We focused on FP Growth to establish the association rules because it is more efficient and scalable than the Apriori algorithm [23], and Garg [24] demonstrated that Eclat is less efficient than FP Growth. The minimum support was set to 0.2. We chose a minimum support value that is not too small, so that the number of frequent itemsets does not grow too large at the expense of execution efficiency [25]. In addition, this value prevents insignificant itemsets from being generated [25]. Rules were filtered to keep those with at least three antecedents and only high yield as the consequent. The rules retained were the five most relevant ones whose confidence is at least 0.8 and whose lift is greater than or equal to 1. Three metrics were used to evaluate the constructed rules: support, confidence, and lift. For the association rule X → Y, the support indicates how frequently items X and Y appear together in the database. The confidence is the proportion of transactions containing X that also contain Y, that is, how often the ‘if-then’ statement holds [26]. The lift measures the importance of a rule by comparing the confidence of the rule with the expected confidence [26]. A lift below 1 indicates a negative association between X and Y, a lift of 1 indicates that X and Y are independent, and a lift above 1 indicates a positive association. The following formulas were considered to calculate the parameters [26]:

$$\mathrm{Support}(X \rightarrow Y) = \frac{\mathrm{freq}(X, Y)}{N} \tag{1}$$

$$\mathrm{Confidence}(X \rightarrow Y) = \frac{\mathrm{freq}(X, Y)}{\mathrm{freq}(X)} \tag{2}$$

$$\mathrm{Lift}(X \rightarrow Y) = \frac{\mathrm{Confidence}(X \rightarrow Y)}{\mathrm{Support}(Y)} \tag{3}$$

where X is the antecedent, Y is the consequent, freq(·) is the number of transactions containing the given items, Support(Y) = freq(Y)/N, and N is the total number of transactions.
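The three metrics and the filtering criteria can be illustrated with a short Python sketch using the standard definitions (support as the fraction of transactions containing an itemset, confidence as support of X and Y together divided by support of X, lift as confidence divided by support of Y). The function names, the label `Yield_high`, and the toy transactions are hypothetical, and the sketch does not reproduce the FP Growth mining step itself.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    s_xy = support(transactions, set(antecedent) | set(consequent))
    conf = s_xy / support(transactions, antecedent)
    return {
        "support": s_xy,
        "confidence": conf,
        "lift": conf / support(transactions, consequent),
    }

def keep_rule(antecedent, consequent, metrics,
              min_conf=0.8, min_lift=1.0, min_antecedents=3):
    """Filter described in the text: high yield only, >= 3 antecedents."""
    return (
        set(consequent) == {"Yield_high"}
        and len(antecedent) >= min_antecedents
        and metrics["confidence"] >= min_conf
        and metrics["lift"] >= min_lift
    )

# Hypothetical categorized monthly observations.
toy = [
    {"ET_low", "Tmin_medium", "Umax_high", "Yield_high"},
    {"ET_low", "Tmin_medium", "Umax_high", "Yield_high"},
    {"ET_low", "Tmin_medium", "Umax_medium", "Yield_medium"},
    {"ET_high", "Tmin_medium", "Umax_medium", "Yield_low"},
]
x, y = {"ET_low", "Tmin_medium", "Umax_high"}, {"Yield_high"}
m = rule_metrics(toy, x, y)
kept = keep_rule(x, y, m)

# A shorter antecedent fails both the length and the confidence criterion.
x2 = {"ET_low", "Tmin_medium"}
m2 = rule_metrics(toy, x2, y)
kept2 = keep_rule(x2, y, m2)
```

How the "five most relevant" rules are ranked among those passing the filter is not specified in the text, so the sketch stops at the filter.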

Results

The most relevant rules in each agro-ecological Zone define associations between the attributes minimum temperature, maximum temperature, ET, sunshine, maximum humidity, and rainfall for high yield of tomato in Benin (Tables 4–6). The rules were filtered to retain the most pertinent ones with the consequent ‘high yield’. The supports for the rules were above the minimum threshold. In addition, each rule established has a confidence of 1 and a lift equal to 1, attesting to their reliability. Overall trends indicated high tomato yield with medium values of minimum temperature, maximum temperature, maximum humidity, and sunshine, whatever the area considered (Tables 4–6). At the same time, ET was low (Tables 4–6), and rainfall was high in the Sudanian Zone (Table 4) and medium in the other two Zones (Tables 5 and 6). In the following subsections, we detail the rules separately for each Zone.

Rules from Sudanian Zone

In the Sudanian Zone, the most favorable conditions for high tomato yield were either ET Low, Tmin medium, and Umax High or Tmax medium, Umax Medium, and RR High. Tomato yield in the Sudanian Zone was also high when ET was low, Umax Medium, and RR Medium, or Tmax Medium, ET low, and RR Medium. High yield was also achieved in this Zone when Tmax, Tmin, and RR were medium (Table 4).

More explicitly, the first rule stated that tomato yield was high when ET was less than 3.009 mm, Tmin was between 19.024 and 25.120°C, and Umax was greater than 94.264%. The yield was also high when Tmax was between 31.733 and 37.793°C and Umax between 52.850 and 94.264%, with RR greater than 0.323 mm. The third rule stated that high yield is obtained with ET less than 3.009 mm, Umax between 52.850 and 94.264%, and RR within ]0.109–0.323 mm[. According to the fourth rule, the yield was high with Tmax within ]31.733–37.793°C[, ET less than 3.009 mm, and RR within ]0.109–0.323 mm[. The last rule of the Zone suggested Tmax between 31.733 and 37.793°C, Tmin between 19.024 and 25.120°C, and RR within ]0.109–0.323 mm[ for high tomato yield.
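Read as a decision rule, the first Sudanian rule can be written as a simple predicate. The thresholds below are those quoted in the text; the function name, the strict inequalities at the boundaries, and the example months are assumptions made for illustration.

```python
def sudanian_rule_1(et_mm, tmin_c, umax_pct):
    """True when a month matches the first Sudanian rule quoted in the text:
    low ET, medium Tmin, and high Umax."""
    return (
        et_mm < 3.009                   # ET low
        and 19.024 < tmin_c < 25.120    # Tmin medium
        and umax_pct > 94.264           # Umax high
    )

# Hypothetical monthly averages, not taken from the study data.
matches = sudanian_rule_1(et_mm=2.5, tmin_c=22.0, umax_pct=96.0)
too_much_et = sudanian_rule_1(et_mm=3.5, tmin_c=22.0, umax_pct=96.0)
```

A match indicates that the month falls in the pattern the rule associates with high yield; it is not a yield forecast in itself.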

Rules from Sudano–Guinean Zone

In the Sudano-Guinean Zone, only four rules met the stated conditions.

Tomato yield was high either when ET was low and Tmin and RR medium, or ET low and Tmin and Umax medium, or Tmax, Tmin, and Umax medium, or Tmin, Umax, and Sun medium. The first rule specified that the yield was above 322.760 kg/ha in this area when ET is less than 2.387 mm and Tmin is in the range ]21.968–23.825°C[, with RR between 4.504 and 8.347 mm. The second rule projected a high yield when ET is less than 2.387 mm and Tmin is in the interval ]21.968–23.825°C[, with Umax within ]88.161–95.710%[. According to the third rule, the yield was high when Tmax is within ]31.079–36.545°C[ and Tmin within ]21.968–23.825°C[, with Umax in ]88.161–95.710%[. Regarding the fourth rule, the yield was high with Tmin, Umax, and Sun within ]21.968–23.825°C[, ]88.161–95.710%[, and ]4.664–7.994 h[, respectively.

Rules from Guinean Zone

In the Guinean Zone, the yield was high either when ET was low and Tmin and Sun medium, or Tmax, Tmin, and RR medium, or ET low and Umax and Sun medium, or Tmin, Umax, and Sun medium, or ET low and Tmin and Umax medium. The first rule stated that when ET is less than 1.971 mm and Tmin is between 24.424 and 26.213°C, with Sun between 4.664 and 7.528 h, the yield was high, that is, greater than 3618.280 kg/ha. The second rule gave a high yield with Tmax between 29.214 and 32.282°C, Tmin between 24.424 and 26.213°C, and RR between 1.331 and 3.250 mm. From the third rule, the yield was high with ET below 1.971 mm, Umax between 88.161 and 95.710%, and Sun within ]4.664–7.528 h[. The fourth rule stated that tomato yield was high with Tmin, Umax, and Sun in the intervals ]24.424–26.213°C[, ]88.161–95.710%[, and ]4.664–7.528 h[, respectively. The last rule gave a high yield when ET is below 1.971 mm, Tmin ranges from 24.424 to 26.213°C, and Umax is between 88.161 and 95.710%.

Discussion

The present study linking climatic variables to tomato yield is an original analysis of one of the most important vegetables in the Republic of Benin. The results indicated that conditions conducive to high tomato yield were average for most of the climatic parameters considered in this study. However, the rainfall was high in the Sudanian Zone. This is to be expected because this Zone’s monthly rainfall is lower than that in the other two Zones (Tables 1–3). Thus, the high rainfall threshold in the Sudanian Zone is below the average of the other Zones. The best yield patterns were observed in the Guinean Zone, followed by the Sudanian Zone. Performance in the Sudano-Guinean Zone was poor compared to the other two Zones. Thus, considering climatic data, the Sudano-Guinean Zone is unsuitable for tomato production. Our results align with those of Dwamena [27], who used multiple regression to assess the impact of minimum temperature, maximum temperature, and relative humidity variations on maize, cassava, and yam yields in Ghana. The results indicated that increased rainfall does not produce higher cassava yields [28]. Going in the same direction, Zhou and Guo [29, 30] demonstrated that high precipitation during flowering limits tomato growth. Indeed, heavy rainfall can lead to waterlogging, which affects the respiration of crop roots [29, 30]. Our results indicated that high tomato yield was associated with low ET in the three areas. ET refers to the loss of water through evaporation from the soil and transpiration by the plants themselves. High ET induces water stress in tomato plants, which lose more water than they can absorb through their roots [31]. This leads to plant wilting, reduced growth, reduced fruit size, and eventually plant death if the water stress persists. When tomato plants suffer from water stress due to high ET, fruit production can be reduced. Flowers may abort before setting fruit, and growing fruits may dry out and fall prematurely.
High ET affects the quality of tomato fruits [32]. Fruits become smaller, less juicy, and less flavorsome due to reduced water content. In addition, tomato plants subjected to prolonged water stress due to high ET become more vulnerable to disease. A weakened root system can be more susceptible to soil-borne infection, leading to diseases such as downy mildew or root rot. As a result, tomato plants subjected to high ET may not give high yields. We also found that the temperature and humidity were average in the regions with high yields. Our results are in line with the findings of several previous studies. Ezin [33] worked on food security under the threat of climate change in Benin. They experimented with three regions of the country (Cotonou, Bohicon, and Natitingou) and applied two treatments: 40°C considered as high and 27°C as normal. The study concluded that high temperature caused flower abortion and desiccation, resulting in low tomato yield [33]. Another study by Ayankojo [34] produced similar results. The study found that high temperatures resulted in lower fruit production and yield reduction. Similarly, Bhandari [35] observed that tomato production per hectare decreased when the maximum temperature was above 28°C. Tomatoes are sensitive to heat stress, which leads to reduced fruit production. However, Fernanda [36] found that temperature in Portugal is highly important in tomato yield prediction and negatively affects productivity from 21°C. They also observed that yield decreased with increasing relative humidity. According to Bhandari [35], yield increased when relative humidity was between 75% and 95%, which is consistent with our findings. In the same line, Fernanda [36] reported that humidity values greater than 71% positively impacted the average prediction of tomato yield. Furthermore, the results of this study are partially in line with those of Supro [16]. 
These authors worked on rice yield prediction and optimization using the Apriori algorithm and neural networks to improve agribusiness. They established several associations, including one which states that paddy rice yield was high in the Larkana region of India when humidity was medium. A high-humidity environment is more conducive to the appearance of pests and diseases, resulting in reduced crop yields [29, 30]. However, our results are less consistent with the findings of Rao [14], who established relationships between climatic and soil parameters and paddy yield in the Nanjangud taluk region of India with the ECLAT algorithm. They reported high yield when temperature and rainfall were high while soil pH and nitrogen were medium. This difference can be explained by the fact that rice and tomato are different species with specific requirements. For example, rice consumes more water than tomato. The water requirement for tomatoes is between 1.62 and 4.58 mm per day [37]. However, according to Aryal [38], rice requires an average of 7.905 mm of water daily. Therefore, while rice yields are high with high rainfall, tomato yields can be high with less rainfall. In addition, the study areas are different, and the authors did not give details on the threshold values used to classify the attributes as low, medium, and high. This work is in the same direction as Ranjani [39], who used association rule algorithms to set up a farmer recommendation system. This system utilizes information about soil, weather, region, season, and past production to recommend the most profitable crops for cultivation in the appropriate environmental conditions. By presenting a comprehensive list of potential crops, the system aids farmers in deciding which crop to grow. Moreover, the system incorporates historical production data, allowing farmers to gain insight into the market demand and cost of different crops.

A variation in tomato yields was observed between different areas. Several factors may explain this variability. Changing climatic parameters significantly affect tomato yield. Areas benefiting from favorable climatic conditions, such as moderate temperatures, adequate rainfall, and optimal sunshine, tend to have higher yields than regions subject to extreme or unfavorable weather conditions. In addition, soil nutrient composition plays an important role in tomato production [40]. Furthermore, when tomato production is not carried out in managed environments, plants may be confronted with varying pest and disease infestation levels, which harms its yield. Pests such as aphids, nematodes, or diseases such as downy mildew, bacterial wilt, and fungal infections can lead to yield losses if not managed effectively [41]. In addition, the choice of tomato varieties or cultivars influences yield variations [42]. Different cultivars have varying genetic characteristics such as disease resistance, tolerance of environmental conditions, or productivity [42]. Thus, selecting appropriate cultivars adapted to specific regions helps optimize yields. Similarly, advanced agricultural technologies, such as greenhouse cultivation, precision irrigation, controlled environment systems, and improved post-harvest handling, can contribute to variations in tomato yields.

The present study has some limitations. Besides the climatic parameters, yield is also influenced by soil pH and soil nutrient content, which were not addressed in this study. However, our research constitutes an essential starting point for testing the rules found in experiments with fertilizers of different types and doses, in order to predict tomato yield from all these parameters in real time.

Conclusion

This paper used the Frequent Pattern Growth algorithm to establish rules between yield and climate parameters in Benin’s three agro-ecological Zones. The database comprised climatic and yield data collected over 26 years in Benin. The rules obtained revealed that the attributes giving high tomato yield varied from one region to another. In particular, rainfall was high in the Sudanian Zone but medium in the other two areas. On the other hand, the attributes minimum temperature, maximum temperature, and maximum humidity were medium regardless of the Zone considered. The best yield patterns were observed in the Guinean Zone. This work can be extended to other vegetables requiring approximately the same climatic conditions as tomatoes. This study can also be improved by using data specific to each growth phase of the tomato in combination with fertilizer data.

Acknowledgments

We gratefully acknowledge the support provided by the German Academic Exchange Service (DAAD) and Artificial Intelligence for Development (AI4D) in Africa. The AI4D program is funded by the International Development Research Centre (IDRC) and the Swedish International Development Cooperation Agency (SIDA), and managed by the African Centre for Technology Studies (ACTS). Their support was instrumental in the completion of this research.

References

  1. FAO. The Role of Agriculture in the Development of Least-developed Countries and their Integration into the World Economy. FAO (Food and Agriculture Organization of the United Nations), page 7, 2002. ISSN 1463-4236.
  2. MAEP (Ministère de l’Agriculture, de l’Elevage et de la Pêche). Stratégie nationale pour l’e-Agriculture au Bénin 2020-2024. pages 1–57, 2019.
  3. Ceylan R. and Alidou M. Factors affecting the most preferred local tomato variety “akikon” purchasing prices in Benin. Eurasian Journal of Agricultural Economics, 1:65–75, 04 2021.
  4. D. Houessou, C. Gbedomon, J. v. d. Broek, K. Gandji, and F. Thoto. Roadmap to strengthen the vegetables sector in Benin Exploring business links with the dutch private sector. pages 1–72, 2021.
  5. G. A. C. Mensah, R. Sikirou, F. Assogba-Komlan, B. B. Yarou, S.-K. Midingoyi, J. Honfoga, et al. Mieux produire la tomate en toute période au bénin. Référentiel Technico-Economique (RTE). MAEP/INRAB/FIDA/ProCar/PADMAR/World Vegetable Center/Bénin. Dépôt légal N° 11553, du 26/08/2019, Bibliothèque Nationale (BN) du Bénin, 3ème trimestre. 2019. ISBN 9789998253131.
  6. Suruliandi A., Mariammal G., and Raja S. P. Crop prediction based on soil and environmental characteristics using feature selection techniques. Mathematical and Computer Modelling of Dynamical Systems, 27(1):117–140, 2021. ISSN 17445051.
  7. Bahrami Mehdi, Shabani Ali, Mahmoudi Mohammad Reza, and Didari Shohreh. Determination of Effective Weather Parameters on Rainfed Wheat Yield Using Backward Multiple Linear Regressions Based on Relative Importance Metrics. Complexity, volume 2020, pages 6168252, 2020.
  8. Chen X., Zheng H., Wang H., and Yan T. Can machine learning algorithms perform better than multiple linear regression in predicting nitrogen excretion from lactating dairy cows. Sci Rep., 12(1):12478, 2022. pmid:35864287
  9. Ju S., Lim H., Ma J. W., Kim S., Lee K., Zhao S., et al. Optimal county-level crop yield prediction using modis-based variables and weather data: A comparative study on machine learning models. Agricultural and Forest Meteorology, 307:108530, 2021. ISSN 0168-1923.
  10. Cravero A., Pardo S., Sepúlveda S., and Muñoz L. Challenges to use machine learning in agricultural big data: A systematic literature review. Agronomy, 12(3), 2022. ISSN 2073-4395.
  11. Cedric L. S., Adoni W. Y. H., Aworka R., Zoueu J. T., Mutombo F. K., Krichen M., et al. Crops yield prediction based on machine learning models: Case of west african countries. Smart Agricultural Technology, 2:100049, 2022. ISSN 2772-3755.
  12. Gómez D., Salvador P., Sanz J., and Casanova J. L. Modelling wheat yield with antecedent information, satellite and climate data using machine learning methods in mexico. Agricultural and Forest Meteorology, 300:108317, 2021. ISSN 0168-1923.
  13. Zhang Y., Yun L., Cao J., and Tao . Combining optical, fluorescence, thermal satellite, and environmental data to predict county-level maize yield in china using machine learning approaches. Remote Sensing, 12:21, 12 2019.
  14. Rao P. R., Gowda S. P., and Prathibha R. J. Paddy Yield Predictor Using Temperature, Rainfall, Soil pH, and Nitrogen. Lecture Notes in Electrical Engineering, 545(July):245–253, 2019. ISSN 18761119.
  15. Manjula E. A Model for Prediction of Crop Yield. International Journal of Computational Intelligence and Informatics, Vol. 6: No. 4, March 2017, 6(4):298–305, 2017.
  16. Supro I. A., Mahar J. A., and Mahar S. A. Rice yield prediction and optimization using association rules and neural network methods to enhance. Indian Journal of Science and Technology, (2), 2020.
  17. V. Gitz, A. Meybeck, L. Lipper, C. Young, and S. Braatz. Climate change and food security: Risks and responses. 2016. ISBN 9789251089989.
  18. Zhang S. and Wu X. Fundamentals of association rules in data mining and knowledge discovery. Wiley Interdisc. Rew.: Data Mining and Knowledge Discovery, 1:97–116, 03 2011.
  19. R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. Proc. of 20th Int’l conf. on VLDB, pages 487–499, 1994.
  20. Kaur M. and Grag Urvashi. ECLAT Algorithm for Frequent Itemsets Generation. International Journal of Computer Systems, 01(January 2014):82–84, 2014.
  21. M. Kamber, J. Han, and J. Chiang. Metarule-guided mining of multi-dimensional association rules using data cubes. Int. Conf. Knowledge Discovery and Data Mining (KDD’97), pages 207–210, 1997.
  22. Kavitha M. M. and Tamil Selvi S. T. Comparative Study on Apriori Algorithm and Fp Growth Algorithm with Pros and Cons. International Journal of Computer Science Trends and Technology (IJCST), 4(4):161–164, 2016. ISSN 2347-8578.
  23. Mythili M. and Mohamed Shanavas A. Performance Evaluation of Apriori and FP-Growth Algorithms. International Journal of Computer Applications, 79(10):34–37, 2013.
  24. Garg K. and Kumar D. Comparing the Performance of Frequent Pattern Mining Algorithms. International Journal of Computer Applications, 69(25):21–28, 2013.
  25. E. Hikmawati and K. Surendro. How to determine minimum support in association rule. In Proceedings of the 2020 9th International Conference on Software and Computer Applications, ICSCA 2020, pages 6–10, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450376655.
  26. T. Denmat, M. Ducasse, and O. Ridoux. Data mining and cross-checking of execution traces. a re-interpretation of jones, harrold, and stasko test information visualization. In 20th IEEE/ACM International Conference on Automated Software Engineering (ASE 2005), pages 396–399, 11 2005.
  27. Dwamena Harriet Achiaa, Tawiah Kassim, and Kodua Amanda Serwaa Akuoko. The Effect of Rainfall, Temperature, and Relative Humidity on the Yield of Cassava, Yam, and Maize in the Ashanti Region of Ghana. International Journal of Agronomy, 2022:9077383, January 2022. Publisher: Hindawi. ISSN: 1687-8159.
  28. Y.R. Pandey and B.N. Chaudhary. Evaluation of tomato varieties and their planting dates for commercial production under Jumla agro-ecological condition. In: Proceedings of the Fourth National Horticultural Research Workshop, pages 380–385, 2004.
  29. Zhou L. and Zhou S. (2012). Post-disaster adaptability to extreme weather events. China Population Resources and Environment, 22, 167–174.
  30. Guo J. and Chen J. (2022). The Impact of Heavy Rainfall Variability on Fertilizer Application Rates: Evidence from Maize Farmers in China. International Journal of Environmental Research and Public Health, 19(23), 15906. pmid:36497975
  31. Hao Shuxue, Cao Hongxia, Wang Hubing, and Pan Xiaoyan. The physiological responses of tomato to water stress and re-water in different growth periods. Scientia Horticulturae, 249:143–154, 2019.
  32. Aires E. S., Ferraz A. K. L., Carvalho B. L., Teixeira F. P., Putti F. F., de Souza E. P., et al. Foliar Application of Salicylic Acid to Mitigate Water Stress in Tomato. Plants, 11(13):1775, 2022. pmid:35807727
  33. Ezin V., Ibouraïma Y., Kochoni G. M. E., and Ahanchede A. Agriculture and food security under threat of change climate in benin. African Journal of Agricultural Research, 13:1389–1399, 07 2018.
  34. Ayankojo Ibukun T and Morgan Kelly T. Increasing Air Temperatures and Its Effects on Growth and Productivity of Tomato in South Florida. Plants, 9(9):1245, 2020. pmid:32967258
  35. Bhandari Roshan, Neupane Nilhari, and Adhikari Danda Pani. Climatic change and its impact on tomato (lycopersicum esculentum l.) production in plain area of Nepal. Environmental Challenges, 4:100129, 2021.
  36. María Fernanda and Restrepo Suescún. Machine learning approaches for tomato crop yield prediction in precision agriculture.
  37. Sharma P., Kothari M., and Lakhawat S. Water requirement on drip irrigated tomatoes grown under shade net house. Engineering and Technology in India, 6(1):12–18, 2015. ISSN 09761268.
  38. Aryal S. Rainfall And Water Requirement Of Rice During Growing Period. Journal Of Agriculture and Environment, 13:1–4, 2013. ISSN 2091-1009.
  39. J. Ranjani, V. K. Kalaiselvi, A. Sheela, D. Deepika Sree, and G. Janaki. Crop Yield Prediction Using Machine Learning Algorithm. Proceedings of the 2021 4th International Conference on Computing and Communications Technologies, ICCCT 2021, 2020:611–616, 2021.
  40. Gao F., Li H., Mu X., Gao H., Zhang Y., Li R., et al. Effects of Organic Fertilizer Application on Tomato Yield and Quality: A Meta-Analysis. Applied Sciences, 13(4):2184, 2023.
  41. Panno S., Davino S., Caruso A. G., Bertacca S., Crnogorac A., Mandić A., et al. A Review of the Most Common and Economically Important Diseases That Undermine the Cultivation of Tomato Crop in the Mediterranean Basin. Agronomy, 11(11):2188, 2021.
  42. Bihon W., Ognakossan K. E., Tignegre J.-B., Hanson P., Ndiaye K., and Srinivasan R. Evaluation of Different Tomato (Solanum lycopersicum L.) Entries and Varieties for Performance and Adaptation in Mali, West Africa. Horticulturae, 8(7):579, 2022.