Prediction of Hexaconazole Concentration in the Top Most Layer of Oil Palm Plantation Soil Using Exploratory Data Analysis (EDA)

Ganoderma boninense is a fungus that infects oil palm trees and causes a serious disease called basal stem rot (BSR). This disease causes the death of more than 80% of oil palm trees midway through their economic life, and hexaconazole is one of the fungicides that can control the fungus. Hexaconazole can be applied by the soil drenching method, and it is of interest to know how the concentration of its residue in the soil changes with time after treatment. Hence, a field study was conducted to determine the actual concentration of hexaconazole in soil. In the present paper, a new approach for predicting the concentration of pesticides in the soil is proposed. Statistical analysis indicated that Exploratory Data Analysis (EDA) techniques were appropriate for this study. The EDA techniques were used to fit a robust, resistant model and to predict the concentration of the residue in the topmost layer of the soil.


Introduction
The oil palm industry is the most important agricultural sector in Malaysia and contributes significantly to the country's economy. In 2015, the total production of crude palm oil (CPO) was 19,961,581 tons, higher than the 19,667,016 tons recorded in 2014 [1]. To produce this large volume of quality oil products, palm oil producers and other concerned parties need to manage production carefully. The application of chemicals, such as fertilizers for nutrient requirements and pesticides for crop protection, is necessary in this regard. Pesticides are used mainly to control weeds, pests, and fungi. Among the pesticides, cypermethrin, deltamethrin, endosulfan, fluroxypyr-MHE, chlorpyrifos, thiram, and hexaconazole are notable, and several studies have investigated their behaviour [2][3][4][5].
Basal stem rot (BSR), which is caused by the fungus Ganoderma boninense, is currently the most serious disease of oil palm in Malaysia. Cases of BSR have been reported in Johor, Negeri Sembilan and Malacca [6][7], and also in other regions including Africa, Papua New Guinea, Indonesia, and Thailand [8]. This disease causes the death of oil palm trees in more than 80% of cases midway through their economic life [9]. The primary infection of oil palms by Ganoderma species is due to direct contact of living roots with colonized debris [10]. The best approach to controlling this disease consists of removal of infected palms, soil mounding, fungicide treatment, or a combination of these methods. Chemical treatments are considered immediate, short-term control measures. The use of a systemic fungicide, together with an appropriate application technique, may help to slow the progress of BSR [11]. Hexaconazole is one of the fungicides that can be used against Ganoderma species [5][10][11], and it is applied by the soil drenching method. It is of interest to know how the concentration of the residue in the soil changes with time after treatment.
The term Exploratory Data Analysis (EDA) was introduced by John W. Tukey, who showed how simple graphical and qualitative techniques can be used to explore data open-mindedly. EDA can help to improve the results of statistical hypothesis testing by forcing one to look at the data without bias before formulating hypotheses, which are subsequently tested using conventional statistics (confirmatory data analysis) [12]. EDA was developed as a data analysis tool by Tukey and his associates in the early 1960s, while Behrens [13] discussed its philosophical underpinnings and general heuristics. EDA typically begins by examining each variable individually: combing through the data, checking the shapes of distributions, and looking for outliers and rogue values. The exploratory data analyst then looks at relationships between pairs of variables and finally considers multivariate relationships [14].
However, the use of EDA for environmental pollutants such as pesticides, polycyclic aromatic hydrocarbons (PAH) and air pollutants has not previously been reported. The prediction of pesticide concentrations in soil, surface water, and groundwater has often involved mathematical simulation models such as VARLEACH [27][28], PESTLA [29], PELMO [30] and LEACHP [31]. With such models, pesticide degradation in the field is predicted on the basis of the model parameters and actual or predicted on-site temperature and moisture data. As an alternative, we took a different approach based on EDA. The present paper reports a field study conducted to determine and predict the hexaconazole concentration in the topmost layer of the soil under real environmental conditions. We started our analysis with the data obtained from the field experiment and transformed the data to construct the statistical model.

Methodology
Ethics Statement: N/A

Experimental site
The experimental site, described in our previous work [3], is located in Bangi Lama, Selangor (101˚47'E, 02˚54'N). The research station is owned by Universiti Kebangsaan Malaysia (UKM) and was jointly developed with the Malaysian Palm Oil Board (MPOB). The trial field consists of 225 seven-year-old DxP palm trees. A randomized complete block design was used, with three replicates for each plot, to accommodate three treatments: the recommended dosage (0.639 kg ha⁻¹), double the recommended dosage (1.278 kg ha⁻¹) and a control (without fungicide treatment). Commercial-grade hexaconazole (Anvil®) was applied to the experimental plots using the soil drenching method.
Soil sampling. The soil samples were collected from a depth of 0-50 cm (in 0-10, 10-20, 20-30, 30-40 and 40-50 cm increments) using a soil auger. However, in the present study only the topmost layer (0-10 cm depth) was used for analysis. The soil samples were collected on days 0 (the day of spraying), 1, 3, 7, 21, 70, 90 and 120 after treatment, with three replicates on each sampling day. The soil samples were air-dried, sieved through a 2-mm mesh and stored in black polyethylene bags at -4˚C prior to analysis.
Climate. Daily rainfall and temperature were recorded from May to September 2009. The rainfall recorded was 115.1 mm, 78.90 mm, 66.20 mm, 237.70 mm and 205.5 mm for May, June, July, August, and September, respectively (Fig 1). The monthly mean maximum temperature ranged from 33.8˚C to 34.9˚C during the study period (Fig 1).
Recovery study. Soil samples (25 g) were spiked with hexaconazole at five concentration levels, viz. 0.8, 0.5, 0.2, 0.1 and 0.01 mg kg⁻¹. The pesticide was then extracted from the soil and the hexaconazole residue was quantified by gas chromatography.
Determination of hexaconazole in soil. Hexaconazole was determined using a method similar to that in earlier published papers [3,5]. Twenty-five grams of soil was shaken with 100 mL of dichloromethane and placed in an ultrasonic bath for half an hour. The extract was filtered through filter paper (Whatman 4) containing sodium sulfate. Then, 50 mL of the extract was evaporated down to 10 mL. The concentrated extract was transferred to a graduated micro-vial and evaporated to dryness under nitrogen gas. Next, the residue was re-dissolved in 1 mL of acetone before being injected into the GC-ECD. A standard hexaconazole solution was used to quantify the analyte, with each solution injected twice with five replications.
Gas chromatography (GC) analysis. The analysis of hexaconazole was carried out using a gas chromatograph (GC) (HP 6890) fitted with an electron capture detector (ECD) and an auto-sampler injector. The capillary column was an HP 5% MS column (30 m x 0.25 mm ID, 0.25 μm film thickness). Sample volumes of 2.0 μL were injected into the programmable splitless injector. The instrumental conditions were similar to those in the earlier published papers [3,5].
Exploratory data analysis. For any given data set, traditional statistical methods such as the method of least squares can be used to obtain parameter estimates, where the unknown parameters are estimated by minimizing the sum of squared residuals. However, traditional least squares estimators rest on several assumptions, and if the data violate them, the estimates and results can be misleading. For statistical inference, some distributional assumptions must be made: usually, it is assumed that the errors are normally distributed with zero mean and constant variance. Statistical inference based on the normality assumption is known to be vulnerable to outliers [32]. It is also assumed that the errors are uncorrelated, and it is desirable, though not absolutely necessary, to have a good number of observations. Although the method of least squares often gives optimal estimates of the unknown parameters, it is very sensitive to outliers and influential observations. Outliers can be a serious problem and can consequently mislead the prediction and validation of the fitted model.
Various diagnostic checks on the behaviour of the residuals are described in the literature, and the validity of a fitted model should be judged with their help. Diagnostic plots are effective tools for checking the adequacy of a fitted regression model for a data set. Residuals are assumed to be independent of the fitted values, meaning that the correlation between residuals and fitted values should be zero.
The effect of violating these assumptions is shown below for the study data sets. In particular, the data sets were not normally distributed, and diagnostic analysis revealed that the models obtained by traditional statistical methods fitted poorly. Furthermore, each data set comprised only eight observations.
Hence, it is useful to look for an alternative procedure by which this small number of data points can be modelled properly. The alternative method suggested is a resistant fit after straightening out the plot. Details about straightening out plots and fitting resistant lines can be found in several references [33][34][35]. A resistant fit is an EDA tool that is robust and does not rely heavily on assumptions. First, the half-slope ratio is checked; if this ratio is not approximately 1, a transformation is required. Usually, the Ladder of Powers is used for re-expressing the data. Once the half-slope ratio is reasonably close to 1, a linear resistant fit can be made.
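The half-slope check and the resistant line described above can be illustrated with a minimal Python sketch. It assumes the common three-group (median-median) construction without iterative slope refinement; the source does not state which variant of the procedure in [33][34][35] was used. The function names, the rule for allocating leftover points to the groups, and the concentration values are illustrative assumptions. Only the sampling days are taken from the study.

```python
import math
from statistics import median


def summary_points(x, y):
    """Return (median x, median y) for the left, middle and right thirds."""
    pairs = sorted(zip(x, y))
    k, r = divmod(len(pairs), 3)
    # Assumed allocation of leftover points: one extra point goes to the
    # middle group, two extra points go to the outer groups.
    sizes = {0: (k, k, k), 1: (k, k + 1, k), 2: (k + 1, k, k + 1)}[r]
    points, start = [], 0
    for size in sizes:
        group = pairs[start:start + size]
        points.append((median(p[0] for p in group),
                       median(p[1] for p in group)))
        start += size
    return points


def half_slope_ratio(x, y):
    """Right half-slope divided by left half-slope; close to 1 if straight."""
    (xl, yl), (xm, ym), (xr, yr) = summary_points(x, y)
    left = (ym - yl) / (xm - xl)
    right = (yr - ym) / (xr - xm)
    return right / left


def resistant_line(x, y):
    """Non-iterated three-group resistant line; returns (intercept, slope)."""
    pts = summary_points(x, y)
    (xl, yl), _, (xr, yr) = pts
    slope = (yr - yl) / (xr - xl)
    intercept = sum(py - slope * px for px, py in pts) / 3
    return intercept, slope


days = [0, 1, 3, 7, 21, 70, 90, 120]               # sampling days in the study
conc = [5.0 * math.exp(-0.02 * d) for d in days]   # hypothetical values only
log_conc = [math.log(c) for c in conc]             # one rung on the Ladder of Powers

print("half-slope ratio, raw:", half_slope_ratio(days, conc))
print("half-slope ratio, log:", half_slope_ratio(days, log_conc))
print("resistant line (intercept, slope):", resistant_line(days, log_conc))
```

Because each summary point is a median, a single aberrant observation within a group has little influence on the fitted slope, which is the sense in which the fit is resistant.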

Recovery study
The recovery and relative standard deviation percent of hexaconazole spiked in soil samples at levels of 0.8, 0.5, 0.2, 0.1 and 0.01 mg kg⁻¹ were 100 ± 1.92%, 105 ± 3.74%, 100 ± 5.64%, 102. ± 1.20% and 106 ± 4.28%, respectively. The detection limit of hexaconazole was 0.2 μg L⁻¹.

Field study
The data were collected from the topmost soil layer of the experimental plot and are shown in Table 1. The soil in the trial plot was a sandy loam containing 27.29% clay, 62.62% sand, 10.09% silt and 0.86% total carbon, with a cation exchange capacity (CEC) of 6.55.
Log-linear models were fitted to the above data by the traditional ordinary least squares (OLS) method. The results of the fitted models appear in Table 2. However, these models were not very useful because the normality assumptions used in building them were violated, as discussed below.
A box plot of the data is shown in Fig 3, indicating that the data are skewed to the left. The skewness value of -0.505 and the kurtosis value of 2.350 support the result of the box plot. For a normal distribution, the skewness is zero and the kurtosis is 3. Therefore, there is evidence that the data were not normally distributed. The diagnostic plots of the OLS residuals revealed the presence of outliers together with leverage problems. Since the data sets were not normally distributed and outliers and leverage problems were present, traditional least squares regression methods may not be suitable for analysing this data set.
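The skewness and kurtosis comparison against the normal reference values of 0 and 3 can be sketched as follows. This is an illustration only: it assumes the simple moment-based estimators, and the source does not say which estimator or software produced the values -0.505 and 2.350. The function name and the sample values are hypothetical and are not the data of Table 1.

```python
def moment_skewness_kurtosis(values):
    """Moment-based skewness and (non-excess) kurtosis of a sample."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4 (dividing by n, not n - 1).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5   # 0 for a symmetric distribution
    kurtosis = m4 / m2 ** 2     # 3 for a normal distribution
    return skewness, kurtosis


# Hypothetical sample with a long left tail (negative skewness expected).
sample = [0.2, 1.1, 1.6, 1.9, 2.0, 2.1, 2.2, 2.3]
skew, kurt = moment_skewness_kurtosis(sample)
print("skewness:", skew, "kurtosis:", kurt)
```

With only eight observations these statistics are rough guides, which is why they are read here alongside the box plot and not as a formal test.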
Hence, as an alternative, the EDA technique known as the resistant fit was used in this study after straightening out the plot. For an appropriate resistant fitted model, examination of the half-slope ratio is essential. The half-slope ratio for this data set was 0.596, which is not close to 1, indicating that the data were not linear. Hence, a transformation was required for this data set. A natural logarithm (log e) transformation of the concentration was chosen, and the recalculated half-slope ratio was 0.929, which is reasonably close to 1. A resistant line was therefore fitted to the transformed data and re-expressed in terms of the concentration (Table 3).
The H-spread value for the residuals was 0.224. The procedure for computing the H-spread value can be found in [33]. The predicted intervals were constructed using the equation
Predicted Conc. ± 1.5 × H-spread value of residuals
The predicted concentrations of hexaconazole and the corresponding predicted intervals are given in Table 4. Fig 6 shows the fitted line and predicted intervals together with the observed data points.
The scatter plot of the hexaconazole concentration for the double dosage is given in Fig 7. The half-slope ratio (Conc. against Day) was 0.317, which is not close to 1, indicating that the data were not linear. Hence, a transformation was required, and Day raised to the power of 0.4 was chosen. The half-slope ratio of the transformed line was 1.007, which is very close to 1, and a resistant line was fitted to the transformed data (Table 5).
Here the H-spread value for the residuals was 0.350. Again, the predicted intervals were constructed using the equation
Predicted Conc. ± 1.5 × H-spread value of residuals
The predicted concentrations and corresponding predicted intervals are given in Table 6, and the fitted line and predicted intervals together with the observed data points are shown in Fig 9.
To compare the decline rates of hexaconazole concentration for the single and double dosages, both models were plotted together, as shown in Fig 10. Based on these models, the rates of decline were not the same: the double dosage declined at a faster rate than the single dosage during the first week after treatment.
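The interval rule used for both dosages, predicted concentration ± 1.5 × H-spread of the residuals, can be sketched in Python as below. The hinge computation follows the usual depth rule for Tukey's hinges and is an assumption about the procedure in [33]. The function names and the predicted value of 1.0 are hypothetical; only the H-spread of 0.224 is taken from the text.

```python
import math


def h_spread(values):
    """Difference between the upper and lower hinges (fourths) of a batch."""
    s = sorted(values)
    n = len(s)
    # Depth of the hinges: (floor(depth of median) + 1) / 2.
    depth = (math.floor((n + 1) / 2) + 1) / 2
    lo, hi = math.floor(depth), math.ceil(depth)
    lower_hinge = (s[lo - 1] + s[hi - 1]) / 2
    upper_hinge = (s[n - lo] + s[n - hi]) / 2
    return upper_hinge - lower_hinge


def predicted_interval(predicted, residual_h_spread):
    """Return (lower, upper) = predicted -/+ 1.5 * H-spread of residuals."""
    half_width = 1.5 * residual_h_spread
    return predicted - half_width, predicted + half_width


# H-spread reported for the single-dosage residuals; the predicted value
# of 1.0 is a placeholder, not a result from the study.
lower, upper = predicted_interval(1.0, 0.224)
print("interval:", lower, upper)
```

Since the half-width depends only on the H-spread of the residuals, the interval has the same width at every sampling day on the scale on which it is computed.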

Conclusion
Statistical models were constructed for the hexaconazole concentration in the topmost layer of the soil for both the single and double dosages. Transformations and fitted resistant lines were applied for these dosages. These models are useful because they are simple, robust and do not rely on too many assumptions. The predicted values and intervals for both single and double dosages were also constructed, and it was found that EDA can serve as a guide in predicting the concentration of hexaconazole after treatment. The advantages of using these prediction models are that they save the time and effort of collecting field samples, reduce the cost of manpower and chemicals, and indicate suitable sampling intervals after treatment. As these models were developed with only eight observations, they can be improved gradually if more observations become available. The inclusion of other important variables such as rainfall, preferential flow and microorganisms can also be considered in the future.