
Prediction of Hexaconazole Concentration in the Top Most Layer of Oil Palm Plantation Soil Using Exploratory Data Analysis (EDA)

  • Zainol Maznah ,

    Affiliation Analytical and Quality Development Unit, Product Development and Advisory Services Division (PDAS), Malaysian Palm Oil Board (MPOB), Persiaran Institusi, Bandar Baru Bangi Kajang, Selangor, Malaysia

  • Muhamad Halimah,

    Affiliation Analytical and Quality Development Unit, Product Development and Advisory Services Division (PDAS), Malaysian Palm Oil Board (MPOB), Persiaran Institusi, Bandar Baru Bangi Kajang, Selangor, Malaysia

  • Mahendran Shitan,

    Affiliation Laboratory of Computational Statistics and Operations Research, Institute for Mathematical Research, University Putra Malaysia, UPM Serdang, Selangor, Malaysia

  • Provash Kumar Karmokar,

    Affiliations Laboratory of Computational Statistics and Operations Research, Institute for Mathematical Research, University Putra Malaysia, UPM Serdang, Selangor, Malaysia, Department of Statistics, University of Rajshahi, Rajshahi, Bangladesh

  • Sulaiman Najwa

    Affiliation Analytical and Quality Development Unit, Product Development and Advisory Services Division (PDAS), Malaysian Palm Oil Board (MPOB), Persiaran Institusi, Bandar Baru Bangi Kajang, Selangor, Malaysia


Abstract

Ganoderma boninense is a fungus that can infect oil palm trees and cause a serious disease called basal stem rot (BSR). This disease causes the death of more than 80% of oil palm trees midway through their economic life, and hexaconazole is one of the fungicides that can control this fungus. Hexaconazole can be applied by the soil drenching method, and it is of interest to know how the concentration of its residue in the soil changes with time after treatment. Hence, a field study was conducted to determine the actual concentration of hexaconazole in soil. In the present paper, a new approach that can be used to predict the concentration of pesticides in soil is proposed. The statistical analysis revealed that Exploratory Data Analysis (EDA) techniques were appropriate for this study. The EDA techniques were used to fit a robust, resistant model and to predict the concentration of the residue in the topmost layer of the soil.

Introduction

Oil palm is the most important agricultural sector in Malaysia and contributes significantly to the country's economy. In 2015, the total production of crude palm oil (CPO) was 19,961,581 tons, higher than the 19,667,016 tons recorded in 2014 [1]. To produce this large volume of quality oil, those concerned, such as palm oil producers, need to manage production carefully. In this regard, the application of chemicals to oil palm is necessary: fertilizers to meet nutrient requirements and pesticides for crop protection. Pesticides are used mainly to control weeds, pests and fungi. Among them, cypermethrin, deltamethrin, endosulfan, fluroxypyr-MHE, chlorpyrifos, thiram and hexaconazole are notable, and several studies have investigated their behaviour [2–5].

Currently, basal stem rot (BSR), caused by the fungus Ganoderma boninense, is the most serious disease of oil palm in Malaysia. Cases of BSR have been reported in Johor, Negeri Sembilan and Malacca in Malaysia [6,7], as well as in other regions, including Africa, Papua New Guinea, Indonesia and Thailand [8]. In more than 80% of cases, the disease kills oil palm trees midway through their economic life [9]. The primary infection of oil palms by Ganoderma species results from direct contact of living roots with colonized debris [10]. The best approaches to controlling this disease are the removal of infected palms, soil mounding, fungicide treatment, or a combination of these methods. Chemical treatments are considered immediate, short-term control measures, and the use of a systemic fungicide together with an appropriate application technique may help to slow the progress of BSR [11]. Hexaconazole is one of the fungicides that can be used against Ganoderma species [5,10,11], and it is applied by the soil drenching method. It is therefore of interest to know how the concentration of its residue in the soil changes with time after treatment.

The term Exploratory Data Analysis (EDA) was introduced by John W. Tukey, who showed how simple graphical and qualitative techniques can be used to explore data with an open mind. EDA can improve the results of statistical hypothesis testing by forcing one to look at the data in an unbiased way before formulating hypotheses, which are subsequently tested using conventional statistics (confirmatory data analysis) [12]. EDA, a distinct approach to data analysis, was developed by Tukey and his associates in the early 1960s, and Behrens [13] discussed its philosophical underpinnings and general heuristics. EDA typically begins by examining each variable individually: combing through the data, checking the shapes of distributions and looking for outliers and rogue values. The analyst then turns to relationships between pairs of variables and finally considers multivariate relationships [14].

It has been over 37 years since EDA was introduced, and since then a number of publications have integrated EDA into various scientific disciplines [15–18]. Numerous studies have examined the environmental behaviour of pollutants and agrochemicals, and agricultural scientists routinely manage sizeable amounts of scientific data originating from field studies [3–5], laboratory experiments [19,20], observations [21,22], computer models [23,24] and simulations [25,26].

However, the use of EDA for environmental pollutants such as pesticides, polycyclic aromatic hydrocarbons (PAH) and air pollution has not previously been reported. The concentrations of a pesticide in soil, surface water and groundwater are often predicted using mathematical simulation models such as VARLEACH [27,28], PESTLA [29], PELMO [30] and LEACHP [31]. With such models, pesticide degradation in the field is predicted on the basis of model parameters and actual or predicted on-site temperature and moisture data. We therefore decided to take a different approach, based on EDA, as an alternative. The present paper reports a field study conducted to determine and predict the hexaconazole concentration in the topmost layer of the soil under real environmental conditions. We started our analysis with the data obtained from the field experiment and transformed the data to construct the statistical model.

Materials and Methods

Ethics Statement: N/A

Experimental site

The experimental site, described in our previous work [3], is located in Bangi Lama, Selangor (101°47’E, 02°54’N). The research station is owned by Universiti Kebangsaan Malaysia (UKM) and was jointly developed by the Malaysian Palm Oil Board (MPOB). The trial field consists of 225 seven-year-old DxP palms. A randomized complete block design with three replicates for each plot was used to accommodate three treatments, namely the recommended dosage (0.639 kg ha⁻¹), double the recommended dosage (1.278 kg ha⁻¹) and a control (without fungicide treatment). Commercial-grade hexaconazole (Anvil®) was applied to the experimental plots using the soil drenching method.

Soil sampling.

Soil samples were collected from 0–50 cm depth (0–10, 10–20, 20–30, 30–40 and 40–50 cm) using a soil auger; however, only the topmost layer (0–10 cm) was used for the analyses in the present study. Samples were collected on days 0 (the day of spraying), 1, 3, 7, 21, 70, 90 and 120 after treatment, with three replicates on each sampling day. The soil samples were air-dried, sieved through a 2-mm mesh and stored in black polyethylene bags at −4°C prior to analysis.


Daily rainfall and temperature were recorded from May to September 2009. Monthly rainfall was 115.1, 78.9, 66.2, 237.7 and 205.5 mm for May, June, July, August and September, respectively (Fig 1). The monthly mean maximum temperature ranged from 33.8°C to 34.9°C during the study period (Fig 1).

Recovery study.

Soil samples (25 g) were spiked with hexaconazole at five concentration levels, viz. 0.8, 0.5, 0.2, 0.1 and 0.01 mg kg⁻¹. The pesticide was then extracted from the soil, and the hexaconazole residue was quantified by gas chromatography.

Determination of hexaconazole in soil.

Hexaconazole was determined using a method similar to that in our earlier published papers [3,5]. Twenty-five grams of soil were shaken with 100 mL of dichloromethane and placed in an ultrasonic bath for 30 min. The dichloromethane extract was filtered through filter paper (Whatman No. 4) containing sodium sulfate. Then, 50 mL of the extract was concentrated by evaporation to 10 mL. This concentrate was transferred to a graduated micro-vial and evaporated to dryness under nitrogen gas. Next, the residue was re-dissolved in 1 mL of acetone before injection into the GC-ECD. Standard hexaconazole solutions were used to quantify the analyte; each solution was injected twice, with five replications.
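The arithmetic that turns a GC peak area into a soil concentration can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes an external-standard calibration that is linear in peak area, and it assumes the full 100 mL of dichloromethane extract is recovered after filtration, so that the 50 mL aliquot carries half of the extracted analyte. The function names and the standard peak areas are hypothetical.

```python
from statistics import mean


def fit_calibration(conc_mg_per_l, peak_area):
    """Least-squares calibration line: area = intercept + slope * conc."""
    cx, ay = mean(conc_mg_per_l), mean(peak_area)
    sxx = sum((c - cx) ** 2 for c in conc_mg_per_l)
    sxy = sum((c - cx) * (a - ay) for c, a in zip(conc_mg_per_l, peak_area))
    slope = sxy / sxx
    return ay - slope * cx, slope


def vial_concentration(area, intercept, slope):
    """Invert the calibration line to get mg/L in the 1 mL acetone vial."""
    return (area - intercept) / slope


def soil_concentration(c_vial_mg_per_l, final_ml=1.0, extract_ml=100.0,
                       aliquot_ml=50.0, soil_g=25.0):
    """Convert the vial concentration to mg/kg soil (assumed volumes and mass)."""
    mg_in_vial = c_vial_mg_per_l * final_ml / 1000.0      # mg of analyte in the vial
    mg_in_extract = mg_in_vial * extract_ml / aliquot_ml  # scale aliquot up to whole extract
    return mg_in_extract / (soil_g / 1000.0)              # mg per kg of soil


# Hypothetical standards whose peak areas lie exactly on area = 50 + 1000 * conc
standards = [0.01, 0.1, 0.2, 0.5, 0.8]
areas = [50 + 1000 * c for c in standards]
b0, b1 = fit_calibration(standards, areas)
c_vial = vial_concentration(550.0, b0, b1)   # 0.5 mg/L in the vial
print(round(soil_concentration(c_vial), 4))  # 0.04 (mg/kg in soil)
```

With the volumes and mass assumed here, each 1 mg L⁻¹ found in the final vial corresponds to 0.08 mg kg⁻¹ in the soil; recovery correction is not applied.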

Gas chromatography (GC) analysis.

Hexaconazole was analysed using an HP 6890 gas chromatograph (GC) fitted with an electron capture detector (ECD) and an autosampler. The capillary column was an HP-5MS column (30 m × 0.25 mm ID, 0.25 μm film thickness). Samples of 2.0 μL were injected into the programmable splitless injector. The instrumental conditions were similar to those in our earlier published papers [3,5].

Exploratory data analysis.

For any given data set, traditional statistical methods such as the method of least squares can be used to estimate the unknown parameters by minimizing the sum of squared residuals. However, least squares estimation rests on several assumptions, and if the data violate them, the estimates and results can be misleading. First, the errors are usually assumed to be normally distributed with zero mean and constant variance, and statistical inference based on the normality assumption is known to be vulnerable to outliers [32]. Second, the errors are assumed to be uncorrelated. It is also desirable, though not absolutely necessary, to have a good number of observations. Although the method of least squares often gives optimal estimates of the unknown parameters, it is very sensitive to outliers and influential observations, which can seriously mislead the prediction and validation of the fitted model.
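The sensitivity of least squares to a single outlying point can be seen in a small Python sketch. It fits a straight line with the closed-form OLS estimator to hypothetical data lying exactly on y = 10 − x, and then refits after one of the eight values is replaced by an outlier. The data are invented for illustration and are not the study data.

```python
def ols_line(x, y):
    """Closed-form ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


x = [1, 2, 3, 4, 5, 6, 7, 8]
y_clean = [10 - xi for xi in x]   # exactly on a line with slope -1
y_outlier = y_clean[:-1] + [10]   # last value (2) replaced by an outlier (10)

print(ols_line(x, y_clean)[1])    # -1.0
print(ols_line(x, y_outlier)[1])  # about -0.333
```

One value out of eight moves the estimated slope from −1 to about −1/3, which illustrates why small data sets with suspect points call for a resistant alternative.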

Various diagnostic checks on the behaviour of the residuals are described in the literature, and the validity of a fitted model should be judged with their help. Diagnostic plots are effective tools for checking how adequately a regression model fits a data set. The residuals are assumed to be independent of the fitted values, so a plot of residuals against fitted values should show no systematic pattern; because the sample correlation between least squares residuals and fitted values is exactly zero by construction when the model includes an intercept, the plot is used to look for curvature, unequal spread or outlying points.

In the Results and Discussion section, we show how these assumptions are violated for the study data sets. In particular, the data sets are not normally distributed, and diagnostic analysis revealed that the models fitted by traditional statistical methods fit poorly. Furthermore, each data set comprised only eight observations.

Hence, it is useful to seek an alternative procedure that can properly model such a small number of data points. The alternative suggested here is a resistant fit after straightening the plot. Details about straightening plots and fitting resistant lines can be found in several references [33–35]. A resistant fit is an EDA tool that is robust and does not rely heavily on assumptions. First, the half-slope ratio should be checked; if this ratio is not approximately 1, a transformation is required, and the Ladder of Powers is usually used to re-express the data. Once the half-slope ratio is reasonably close to 1, a linear resistant fit can be made.
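The half-slope ratio and the resistant line can be sketched in Python using the usual three-group construction: sort the points by x, split them into left, middle and right thirds, and take the median x and median y of each third as summary points (x_L, y_L), (x_M, y_M), (x_R, y_R). The left and right half-slopes are (y_M − y_L)/(x_M − x_L) and (y_R − y_M)/(x_R − x_M), and their ratio is right over left. This is a sketch under assumptions: the group-size rule for n = 3k + 1 and n = 3k + 2 follows the common textbook convention, ties in x are not handled, and the resistant line is a single pass without the iterative slope polishing of Tukey's full procedure. The data are hypothetical first-order decay values on the study's sampling days.

```python
import math
from statistics import median


def group_sizes(n):
    """Sizes of the left, middle and right thirds (common EDA convention)."""
    k, r = divmod(n, 3)
    if r == 0:
        return k, k, k
    if r == 1:
        return k, k + 1, k
    return k + 1, k, k + 1


def summary_points(x, y):
    """Median (x, y) of each third after sorting by x; ties in x are not treated specially."""
    pts = sorted(zip(x, y))
    nl, nm, _ = group_sizes(len(pts))
    thirds = [pts[:nl], pts[nl:nl + nm], pts[nl + nm:]]
    return [(median(p[0] for p in g), median(p[1] for p in g)) for g in thirds]


def half_slope_ratio(x, y):
    """Right half-slope divided by left half-slope; close to 1 for a straight pattern."""
    (xl, yl), (xm, ym), (xr, yr) = summary_points(x, y)
    b_left = (ym - yl) / (xm - xl)
    b_right = (yr - ym) / (xr - xm)
    return b_right / b_left


def resistant_line(x, y):
    """One-pass resistant line (no iterative polishing); returns (intercept, slope)."""
    (xl, yl), (xm, ym), (xr, yr) = summary_points(x, y)
    slope = (yr - yl) / (xr - xl)
    intercept = ((yl - slope * xl) + (ym - slope * xm) + (yr - slope * xr)) / 3
    return intercept, slope


# Hypothetical first-order decay sampled on the study's eight sampling days
days = [0, 1, 3, 7, 21, 70, 90, 120]
conc = [2.0 * math.exp(-0.05 * d) for d in days]
log_conc = [math.log(c) for c in conc]

print(round(half_slope_ratio(days, conc), 3))      # far from 1: re-express
print(round(half_slope_ratio(days, log_conc), 3))  # 1.0 after the log
print(resistant_line(days, log_conc))              # about (0.693, -0.05)
```

Because these synthetic values are exactly exponential, the log re-expression makes the ratio exactly 1; with field data the ratio will only come close to 1, as with the 0.929 reported for the recommended dosage.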

Results and Discussion

Recovery study

The mean recoveries (± relative standard deviation, RSD) of hexaconazole spiked in soil samples at 0.8, 0.5, 0.2, 0.1 and 0.01 mg kg⁻¹ were 100 ± 1.92%, 105 ± 3.74%, 100 ± 5.64%, 102 ± 1.20% and 106 ± 4.28%, respectively. The detection limit of hexaconazole was 0.2 μg L⁻¹.
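Recovery and RSD are simple ratios, sketched below in Python. The sketch assumes that recovery is the mean measured concentration divided by the spiked concentration and that RSD is the sample standard deviation (n − 1 denominator) divided by the mean, both in percent. The function name and the triplicate values are hypothetical, since the raw replicate data are not given.

```python
from statistics import mean, stdev


def recovery_and_rsd(measured_mg_per_kg, spiked_mg_per_kg):
    """Mean recovery (%) and relative standard deviation (%) of spiked replicates."""
    m = mean(measured_mg_per_kg)
    recovery = 100.0 * m / spiked_mg_per_kg
    rsd = 100.0 * stdev(measured_mg_per_kg) / m  # sample SD, n - 1 denominator
    return recovery, rsd


# Hypothetical triplicate results for a 0.8 mg/kg spike
rec, rsd = recovery_and_rsd([0.78, 0.80, 0.82], 0.8)
print(f"{rec:.0f} ± {rsd:.2f}%")  # 100 ± 2.50%
```

Since RSD is scale-free, computing it on the measured concentrations or on the individual recoveries gives the same value.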

Field study

The data collected from the topmost layer of soil in the experimental plots are shown in Table 1. The soil in the trial plot had a sandy loam texture, containing 27.29% clay, 62.62% sand and 10.09% silt, with 0.86% total carbon and a cation exchange capacity (CEC) of 6.55.

Statistical modelling for the recommended dosage

A scatter plot of hexaconazole concentration (mg kg⁻¹) against time is given in Fig 2. The figure shows clearly that the concentration declined with time. Commonly used models, namely the linear, exponential, power, logarithmic (natural and base 10) and log-linear models, were fitted to these data by ordinary least squares (OLS). The results of the fitted models appear in Table 2. However, these models were not very useful because the normality assumption underlying them was violated, as discussed below.

Fig 2. Scatter plot of hexaconazole concentration for the recommended dosage

A box plot of the data (Fig 3) indicates that the data are skewed to the left. The skewness (−0.505) and kurtosis (2.350) values support this; for a normal distribution, the skewness is 0 and the kurtosis is 3. There is therefore evidence that the data were not normally distributed. The diagnostic plots of the OLS residuals are shown in Fig 4.
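Since a normal distribution is quoted as having kurtosis 3, the values above appear to be the moment coefficients (kurtosis, not excess kurtosis). Assuming that definition, with population moments (divisor n), they can be computed as in this Python sketch; the function name and example data are hypothetical. Under these assumptions the results should match SciPy's `scipy.stats.skew` and `scipy.stats.kurtosis(..., fisher=False)` with their default `bias=True`.

```python
def moment_skew_kurtosis(values):
    """Moment coefficients g1 = m3 / m2**1.5 and b2 = m4 / m2**2 (divisor n)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


print(moment_skew_kurtosis([1, 2, 3, 4, 5]))   # (0.0, 1.7): symmetric, light-tailed
print(moment_skew_kurtosis([1, 1, 1, 1, 10]))  # positive skew from one large value
```

With only eight observations, as here, these sample coefficients are quite variable, so they are best read alongside the box plot and Q–Q plot rather than on their own.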

The plot of residuals versus fitted values is shown in the top panel of Fig 4; from it, observations 6 and 8 may be suspected as potential outliers. The normal Q–Q plot in the middle panel of Fig 4 shows that the same observations lie somewhat far from the straight line, indicating that these points may be the source of the violation of the normality assumption. An observation with an extreme value of a predictor variable is known as a high-leverage point, and it can have an unusually large effect on the estimated regression coefficients; if the model is fitted in the presence of such observations, the whole inference may be misleading. The plot of residuals versus leverage in the bottom panel of Fig 4 indicates that observation 8 lies outside the boundary line, while observation 6 lies close to it. Hence, the data set also has leverage problems.
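For a straight-line OLS fit, the leverage of observation i is h_i = 1/n + (x_i − x̄)²/Σ(x_j − x̄)², so the last sampling day, which lies farthest from the mean day, carries the most leverage. Cook's distance combines residual and leverage as D_i = e_i² h_i / (p s² (1 − h_i)²), with p = 2 parameters and s² the residual variance. The boundary lines in residuals-versus-leverage plots are commonly Cook's distance contours (R's `plot.lm`, for example, draws them at 0.5 and 1 by default); we assume, but cannot confirm, that this is what Fig 4 shows. The Python sketch below uses the study's sampling days with hypothetical concentrations and a linear model in Day.

```python
def leverage_and_cooks(x, y):
    """Hat values and Cook's distances for a simple linear OLS fit (p = 2)."""
    n, p = len(x), 2
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)       # residual variance
    h = [1 / n + (xi - mx) ** 2 / sxx for xi in x]  # leverage of each point
    cooks = [e * e * hi / (p * s2 * (1 - hi) ** 2) for e, hi in zip(resid, h)]
    return h, cooks


days = [0, 1, 3, 7, 21, 70, 90, 120]                     # sampling days from the study
conc = [1.00, 0.95, 0.90, 0.80, 0.60, 0.30, 0.20, 0.15]  # hypothetical values
h, d = leverage_and_cooks(days, conc)
print([round(v, 2) for v in h])  # leverages; they sum to p = 2
print(h.index(max(h)) + 1)       # 8: observation 8 (day 120) has the highest leverage
```

The leverages depend only on the sampling days, so observation 8 is a high-leverage point under a linear-in-Day model regardless of the concentrations measured.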

Since the data were shown not to be normally distributed, and diagnostic analysis revealed outliers together with leverage problems, traditional least squares regression may not be suitable for analysing this data set.

Hence, as an alternative, the EDA technique known as the resistant fit was used in this study after straightening the plot. Examining the half-slope ratio is essential for obtaining an appropriate resistant fit. The half-slope ratio for this data set was 0.596, which is not close to 1, indicating that the relationship was not linear, so a transformation was required. A natural logarithm (logₑ) transformation of the concentration was chosen, and the recalculated half-slope ratio was 0.929, which is reasonably close to 1. The fitted resistant line for ln(concentration) against day is Eq (1), and in terms of the concentration the fitted model is Eq (2).

Fig 5 shows the fitted model together with the observed data points, and Table 3 lists the fitted values and residuals.

The H-spread of the residuals was 0.224; the procedure for computing the H-spread can be found in [33]. Prediction intervals were then constructed using an equation based on the fitted model and this H-spread. The predicted concentrations of hexaconazole and the corresponding prediction intervals are given in Table 4, and Fig 6 shows the fitted line and prediction intervals together with the observed data points.
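The H-spread is the distance between Tukey's upper and lower hinges. A Python sketch using the usual depth rule is shown below: the median has depth (n + 1)/2, the hinges have depth (⌊median depth⌋ + 1)/2, and a half-integer depth averages the two neighbouring ordered values. The procedure in [33] may differ in detail, and the function names and example values are hypothetical.

```python
import math


def value_at_depth(ordered, depth):
    """Value at a (possibly half-integer) depth counted from 1 in an ordered list."""
    lo, hi = math.floor(depth), math.ceil(depth)
    return (ordered[lo - 1] + ordered[hi - 1]) / 2


def h_spread(values):
    """Upper hinge minus lower hinge, using Tukey's depth rule."""
    s = sorted(values)
    depth_median = (len(s) + 1) / 2
    depth_hinge = (math.floor(depth_median) + 1) / 2
    lower = value_at_depth(s, depth_hinge)        # lower hinge, counted from the bottom
    upper = value_at_depth(s[::-1], depth_hinge)  # upper hinge, counted from the top
    return upper - lower


print(h_spread([1, 2, 3, 4, 5, 6, 7, 8]))  # hinges 2.5 and 6.5, so 4.0
```

For eight residuals, as in each data set here, the hinges fall at depth 2.5, i.e. midway between the second and third values from each end.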

Statistical modelling for the double recommended dosage

The scatter plot of hexaconazole concentration for the double dosage is given in Fig 7. The half-slope ratio (Conc. against Day) was 0.317, which is not close to 1, indicating that the relationship was not linear. A transformation was therefore required, and Day was re-expressed as Day raised to the power 0.4. The half-slope ratio of the transformed data was 1.007, which is very close to 1.
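Choosing the power for Day can be framed as a search along the ladder of powers for the exponent whose half-slope ratio is closest to 1. The Python sketch below is not necessarily the authors' procedure: it assumes a simple grid of positive exponents in steps of 0.05 (positive only, because Day = 0 rules out logarithms and negative powers), uses the same three-group half-slope ratio as in the Methods, and uses hypothetical concentrations built to be exactly linear in √Day, so the search should return 0.5.

```python
from statistics import median


def half_slope_ratio(x, y):
    """Right half-slope / left half-slope from three-group median summary points."""
    pts = sorted(zip(x, y))
    k, r = divmod(len(pts), 3)
    nl, nm = {0: (k, k), 1: (k, k + 1), 2: (k + 1, k)}[r]
    thirds = [pts[:nl], pts[nl:nl + nm], pts[nl + nm:]]
    (xl, yl), (xm, ym), (xr, yr) = [
        (median(p[0] for p in g), median(p[1] for p in g)) for g in thirds
    ]
    return ((yr - ym) / (xr - xm)) / ((ym - yl) / (xm - xl))


def best_power(x, y, powers):
    """Exponent p whose re-expression x**p gives a half-slope ratio closest to 1."""
    return min(powers, key=lambda p: abs(half_slope_ratio([xi ** p for xi in x], y) - 1))


days = [0, 1, 3, 7, 21, 70, 90, 120]
conc = [1.0 - 0.1 * d ** 0.5 for d in days]  # hypothetical, linear in sqrt(Day)
grid = [k / 20 for k in range(1, 41)]        # 0.05, 0.10, ..., 2.00
print(best_power(days, conc, grid))          # 0.5
```

For the field data, the authors report Day^0.4 with a half-slope ratio of 1.007; a grid search of this kind simply returns whichever tried exponent lands nearest 1.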

Fig 7. Scatter plot of hexaconazole concentration for the double recommended dosage

Thus, the fitted resistant line for the double dosage is Eq (3).

Fig 8 shows the fitted model together with the observed data points. The fitted values and residuals of the fitted model for the double dosage are shown in Table 5.

Here, the H-spread of the residuals was 0.350, and prediction intervals were again constructed using an equation based on the fitted model and this H-spread.

The predicted concentrations and corresponding prediction intervals are given in Table 6, and the fitted line and prediction intervals together with the observed data points are shown in Fig 9. To compare the rates of decline of hexaconazole concentration for the single and double dosages, both models were plotted together (Fig 10). According to these models, the rates of decline were not the same: the double dosage declined faster than the single dosage during the first week after treatment.

Table 6. Predicted values and intervals for the double recommended dosage

Conclusion

Statistical models were constructed for hexaconazole concentration in the topmost layer of the soil for both the single and double dosages, using transformations and fitted resistant lines. These models are useful because they are simple and robust and do not rely on many assumptions. Predicted values and prediction intervals were also constructed for both dosages, and EDA was found to serve as a guide for predicting the concentration of hexaconazole after treatment. The advantages of such prediction models are that they save the time and effort of collecting field samples, reduce the cost of manpower and chemicals, and indicate suitable sampling intervals after treatment. As these models were developed from only eight observations, they can be improved as more observations become available. Other important variables, such as rainfall, preferential flow and microorganisms, could also be included in the future.

Acknowledgments

This is joint work between MPOB and Universiti Putra Malaysia. We thank the Director General of MPOB for permission to publish this article. We are extremely grateful to the technical staff of the Pesticides Laboratory, MPOB, for their invaluable technical assistance. Our sincere thanks also go to the Institute for Mathematical Research and the Department of Mathematics, Universiti Putra Malaysia, for their support in completing this project.

Author Contributions

  1. Conceptualization: ZM MH MS PKK SN.
  2. Funding acquisition: MH.
  3. Methodology: ZM.
  4. Software: MS PKK.
  5. Supervision: MH MS.
  6. Writing – original draft: ZM MH MS PKK SN.
  7. Writing – review & editing: ZM PKK SN.

References

  1. MPOB. Malaysian Palm Oil Board. 2015.
  2. Cheah UB, Ma CK, Dzolkhifli O, Ainie K, Chung GF. Persistence of cypermethrin, deltamethrin and endosulfan in an oil palm agroecosystem. Proceedings of PIPOC International Palm Oil Congress (Chemistry and Technology). 2001: 105–113.
  3. Halimah M, Maznah Z, Ismail BS, Idris AS. Determination of hexaconazole in field samples of an oil palm plantation. Drug Test Anal. 2012; 4 (Suppl. 1): 1–6.
  4. Maznah Z, Ismail BS, Halimah M. Fate of thiram in an oil palm nursery during the wet season. J Oil Palm Res. 2012; 24: 1397–1403.
  5. Maznah Z, Halimah M, Ismail BS, Idris AS. Dissipation of the fungicide hexaconazole in oil palm plantation. Environ Sci Pollut Res. 2015; 22: 19648–19657.
  6. Khairuddin H. Basal stem rot of oil palm: incidence, etiology and control. PhD dissertation, Universiti Pertanian Malaysia, Serdang, Selangor. 1990.
  7. Benjamin M, Chee KH. Basal stem rot of oil palm: a serious problem on inland soils. MAPPS Newsletter. 1995; 19(1).
  8. Idris AS, Kushairi A, Ismail S, Ariffin D. Selection for partial resistance oil palm progenies to Ganoderma basal stem rot. J Oil Palm Res. 2004; 16(2): 12–18.
  9. Singh G. Ganoderma: the scourge of oil palms in the coastal areas. The Planter. 1991; 67: 421–444.
  10. Pilotti CA, Sanderson FR, Aitken EAB. Genetic structure of a population of Ganoderma boninense on oil palm. Plant Pathology. 2003; 52: 455–463.
  11. Idris AS, Arifurrahman R, Kushairi A. Hexaconazole as a preventive treatment for managing Ganoderma in oil palm. MPOB Information Series, TT No. 75. 2010.
  12. Hinterberger H. Exploratory data analysis. In: Liu L, Özsu MT, editors. Encyclopedia of Database Systems. Springer US; 2009. p. 1080.
  13. Behrens JT. Principles and procedures of exploratory data analysis. Psychological Methods. 1997; 2(2): 131–160.
  14. Curtis DA, Araki CJ. Whatever happened to exploratory data analysis? An evaluation of behavioral science statistics textbooks. Annual Meeting of the American Educational Research Association, Chicago, IL, April 21–25, 2003.
  15. Almorza D, Garcia MH. Results of exploratory data analysis in the broken stick model. Journal of Applied Statistics. 2008; 35(9): 979–983.
  16. Košmelj K, Blejec A, Kompan D. Exploratory data analysis as an efficient tool for statistical analysis: a case study from analysis of experiments. In: Ferligoj A, Mrvar A, editors. Developments in Applied Statistics. Metodološki zvezki 19. Ljubljana: FDV; 2003.
  17. Leith RM, Hipel KW, Goertz H. Exploratory data analysis. Canadian Water Resources Journal. 1991; 16: 81–92.
  18. Nikas C, Baklavas G. Savings and remitting attitudes of Albanian emigrants: an exploratory data analysis. Southeast European and Black Sea Studies. 2009; 9(4): 481–495.
  19. Bromilow RH, Evans AA, Nicholls PH. Factors affecting degradation rates of five triazole fungicides in two soil types: 1. Laboratory incubations. Pest Sci. 1999; 55: 1129–1134.
  20. Chai LK, Wong MH, Hansen HCB. Degradation of chlorpyrifos in humid tropical soils. J Environ Manage. 2013; 125: 28–32. doi: 10.1016/j.jenvman.2013.04.005. pmid:23632002
  21. Diekmann F. Data practices of agricultural scientists: results from an exploratory study. Journal of Agricultural & Food Information. 2012; 13: 14–34.
  22. Halimah M, Tan YA, Nik Sasha KK, Zuriati Z, Rawaida AI, Choo YM. Determination of life cycle inventory and greenhouse gas emissions for a selected oil palm nursery in Malaysia: a case study. J Oil Palm Res. 2013; 25(3): 343–347.
  23. Ismail BS, Maznah Z. Comparison between field experiment and PERSIST model simulation: dissipation of fenvalerate in a Malaysian agricultural soil. Bull Environ Contam Toxicol. 2005; 74: 1143–1150. pmid:16158853
  24. Ismail BS, Ngan CK. Dissipation of chlorothalonil, chlorpyrifos, and profenofos in a Malaysian agricultural soil: a comparison between the field experiment and simulation by the PERSIST model. J Environ Sci Health. 2005; 40: 341–353.
  25. Beulke S, Dubus IG, Brown CD, Gottesbüren B. Simulation of pesticide persistence in the field on the basis of laboratory data. A review. J Environ Qual. 2000; 29: 1371–137.
  26. Scorza RP, Boesten JJTI. Simulation of pesticide leaching in a cracking clay soil with the PEARL model. Pest Manag Sci. 2005; 61(5): 432–448. doi: 10.1002/ps.1004. pmid:15643643
  27. Walker A, Hollis JM. Prediction of pesticide mobility in soil and their potential to contaminate surface and groundwater. In: Hewitt HG, Caseley JC, Copping BT, Grayson BT, Tyson D, editors. Comparing Glasshouse and Field Pesticide Performance. Technical Monograph No. 59. Farnham, Surrey: British Crop Protection Council. pp. 221–224.
  28. Walker A, Welch SL. The relative movement and persistence in soil of chlorsulfuron, metsulfuron-methyl and triasulfuron. Weed Res. 1989; 29: 143–152.
  29. Boesten JJTI, Van der Linden AMA. Modeling of the influence of sorption and transformation on pesticide leaching and persistence. J Environ Qual. 1991; 20: 425–435.
  30. Nicholls PH. Simulation of the movement of bentazon in soils using CALF and PRZM models. J Environ Sci Health. 1994; 29: 1157–1166.
  31. Wagenet RJ, Rao PSC. Modeling pesticide fate in soils. In: Cheng HH, editor. Pesticides in the Soil Environment: Processes, Impacts and Modeling. Soil Sci Soc Amer; 1990. pp. 351–399.
  32. Lange KL, Little RJA, Taylor JMG. Robust statistical modeling using the t distribution. Journal of the American Statistical Association. 1989; 84(408): 881–896.
  33. Shitan M, Vazifedan T. Exploratory Data Analysis for Almost Anyone. Serdang, Selangor: UPM Press; 2011.
  34. Tukey JW. Exploratory Data Analysis. Reading, Massachusetts: Addison-Wesley; 1977.
  35. Velleman PF, Hoaglin DC. Applications, Basics, and Computing of Exploratory Data Analysis. Boston, Massachusetts: Duxbury Press; 1981.