Global Warming: Predicting OPEC Carbon Dioxide Emissions from Petroleum Consumption Using Neural Network and Hybrid Cuckoo Search Algorithm

Background: Global warming is attracting attention from policy makers due to its impacts, such as floods, extreme weather, a 0.7°C increase in temperature, heat waves and storms. These disasters result in loss of human life and billions of dollars in property damage. Global warming is believed to be caused by the emissions of greenhouse gases due to human activities, including the emissions of carbon dioxide (CO2) from petroleum consumption. Limitations of the previous methods of predicting CO2 emissions and the lack of work on the prediction of the Organization of the Petroleum Exporting Countries (OPEC) CO2 emissions from petroleum consumption have motivated this research.
Methods/Findings: The OPEC CO2 emissions data were collected from the Energy Information Administration. The adaptability and performance of the Artificial Neural Network (ANN) motivated its choice for this study. To improve the effectiveness of the ANN, the cuckoo search algorithm was hybridised with accelerated particle swarm optimisation for training the ANN to build a model for the prediction of OPEC CO2 emissions. The proposed model predicts OPEC CO2 emissions for 3, 6, 9, 12 and 16 years with improved accuracy and speed over the state-of-the-art methods.
Conclusion: An accurate prediction of OPEC CO2 emissions can serve as a reference point for propagating the reorganisation of economic development in OPEC member countries with the view of reducing CO2 emissions to Kyoto benchmarks and, hence, reducing global warming. The policy implications are discussed in the paper.


Introduction
to improve the forecast accuracy of the FRBPNN. Results indicated that the collaborative fuzzy neural network outperforms the FRBPNN and statistical methods in the forecasting of global CO2. Bao and Hui [20] applied the Grey model to forecast CO2 emissions in Shijiazhuang, China. The model was used to project the CO2 emissions of Shijiazhuang from 2010 to 2020. In another study, the energy-related CO2 emissions of developing countries were forecasted using the Grey model [21]. The Grey model is not effective with a large sample of data; it requires small samples of observations to be robust [22], lacks fitting ability and is deficient in nonlinear modelling [23]. This motivated Tan and Zhang [23] to use the genetic algorithm (GA) to improve the fitting ability of the Grey model and to combine the GA-fitted Grey model with the BPNN to improve its nonlinear approximation ability. The model was used to predict energy load with improved performance.
However, the BPNN is a gradient-based algorithm that can become stuck in local minima, converges slowly, depends heavily on parameter settings, and generates complex error surfaces with multiple local minima [24][25]. Fuzzy systems lack the capability of learning from input data; human language is used to represent the input and output of the systems. Thus, incomplete or wrong rules cannot be handled well by fuzzy systems, and tuning the systems is not a straightforward task [26]. The GA discards previous knowledge of the problem if the population changes [27], and requires many parameter settings that undermine its robustness [28].
Studies on the prediction of OPEC CO2 emissions from petroleum consumption are scarce in the literature, despite the increasing consumption of petroleum and emissions of CO2 by the OPEC countries. The limitations of the previous studies and the lack of work on the prediction of OPEC CO2 emissions from petroleum consumption motivated the present research.
To circumvent the limitations of gradient descent algorithms, several biologically inspired global algorithms have been proposed for training the ANN, such as GA, particle swarm optimisation (PSO), artificial bee colony (ABC) and, more recently, the cuckoo search algorithm (CS). The CS was found to be more effective than the GA, PSO, and ABC [29]. In this paper, we propose to hybridise the CS and Accelerated PSO (APSO) for training the ANN (HCSNN) to build a model for the prediction of OPEC CO2 emissions. Preliminary experiments [30] showed that the HCSNN can improve the prediction accuracy and convergence speed of the ANN more than the GA, ABC, CS, and APSO.
In our approach, the communication capability of the cuckoo birds in the hybrid CS is improved by introducing APSO to search for a better location in which the optimal nest can share information with the cuckoo, unlike in previous studies. In the literature, Valian et al. [31] modified the CS by using a variable probability of worse nests and a variable step size when generating new solutions, instead of a constant probability of worse nests and a constant step size. Abubakar et al. [32] adopted the modified CS of Walton et al. [33], in which information exchange between the eggs and crossover are added to the model, and the distance to the location of a new egg is computed using an inverse golden ratio. Abubakar et al. [32] used this modified CS to train a Functional Link ANN to build a model for the prediction of temperature and relative humidity in Malaysia. Abubakar et al. [34] further applied the model proposed in [32] to the prediction of climate change via temperature and ozone.

Cuckoo Search Algorithm
The cuckoo search algorithm is a relatively new optimisation algorithm [35] developed by Yang and Deb [36] that is currently attracting attention from the research community, and this attention is expected to continue into the future [28]. The CS is a global search algorithm for finding a global optimum solution. In CS, the fitness can simply be taken as proportional to the objective function value. Obtaining an optimised solution of a complicated problem using CS does not require an exhaustive search. Cuckoos are fascinating birds due to their aggressive reproduction strategies. The three types of brood parasitism strategy are: (1) intraspecific brood parasitism; (2) cooperative breeding; (3) nest takeover. Direct conflict between the host birds and cuckoos is possible. The host birds either abandon the nest or throw the alien eggs out of the nest, and then produce new eggs. In a Lévy flight, animals and birds search for food in a random or quasi-random manner, thus following a random walk, since the next move depends on the present position and the transition probability to the next state [37]. This behaviour has been applied in CS optimisation, where it has shown better performance than other distribution-based random walks in exploring large-scale search spaces. The Lévy flight distribution is expressed in Eq (1) in terms of its Fourier transform, following Yang [38], where α is the scaling parameter and s is the step length. Only special cases of the parameters have an inverse transform with an explicit analytical formula. Eq (1) reduces to Eq (2) if λ = 2.
The inverse integral of the transform in Eq (2) produces the Gaussian distribution, and the inverse integral is expressed in Eq (3), where M is the cost function and μ is the location parameter. When s → ∞, Eq (3) becomes an expression involving the gamma function $\Gamma(y) = \int_0^{\infty} t^{y-1} e^{-t}\,dt$; when y = n is a positive integer, Γ(n) = (n−1)!. The three idealised rules on which Yang and Deb [36] based the CS as an optimisation algorithm are: (1) each cuckoo lays one egg at a time and puts it in a randomly chosen nest; (2) the nests with the best-quality eggs are carried over to the next generation; (3) the number of available host nests is fixed, and the egg laid by a cuckoo is discovered by the host bird with a probability P_a ∈ [0,1], the probability of worse nests being abandoned. The fitness function is taken to be the objective function itself for maximisation or minimisation problems. To generate a new solution x_i^(t+1) for cuckoo i, a Lévy flight is performed as expressed in Eq (6), where α_1 is the Lévy flight step size and the product is an entry-wise multiplication. The Lévy flight provides a random walk whose random step lengths are drawn from the Lévy distribution, which allows for occasional large steps. The CS initialises a population of n nests and randomly selects the best nest via Lévy flight. Thus, the cuckoo birds are always looking for a better place in order to reduce the chance of their eggs being discarded. The CS requires the setting of parameters for execution, such as n. However, the most critical parameters required to obtain the optimal solution from CS are P_a and α_1 [39]. The pseudo-code for the CS is shown in Fig 1.
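The three rules above can be illustrated with a minimal Python sketch. It is not the pseudo-code of Fig 1: the Lévy step is drawn with Mantegna's algorithm using an exponent of 1.5, abandoned nests are chosen at random among the non-best nests, and the names `levy_step`, `cuckoo_search`, `bounds` and `history` are hypothetical. All of these are assumptions made for illustration; only n = 25, P_a = 0.25 and α_1 = 1 follow values quoted in this paper.

```python
import math
import random


def levy_step(rng, lam=1.5):
    """Draw one heavy-tailed step length (Mantegna's algorithm, an assumption)."""
    sigma = (math.gamma(1 + lam) * math.sin(math.pi * lam / 2)
             / (math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / lam)


def cuckoo_search(objective, dim, n=25, pa=0.25, alpha1=1.0,
                  generations=200, bounds=(-5.0, 5.0), seed=0):
    """Minimise `objective` with a simplified cuckoo search."""
    rng = random.Random(seed)
    lo, hi = bounds

    def random_nest():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    nests = [random_nest() for _ in range(n)]
    fitness = [objective(x) for x in nests]
    history = [min(fitness)]  # best fitness after each generation
    for _ in range(generations):
        for i in range(n):
            # Rule 1: cuckoo i lays one egg, reached by a Levy flight ...
            egg = [min(hi, max(lo, x + alpha1 * levy_step(rng))) for x in nests[i]]
            f_egg = objective(egg)
            j = rng.randrange(n)  # ... and puts it in a randomly chosen nest
            if f_egg < fitness[j]:
                nests[j], fitness[j] = egg, f_egg
        # Rule 2: the best nest is always carried over to the next generation.
        best = min(range(n), key=fitness.__getitem__)
        # Rule 3: each other nest is abandoned with probability pa and rebuilt.
        for i in range(n):
            if i != best and rng.random() < pa:
                nests[i] = random_nest()
                fitness[i] = objective(nests[i])
        history.append(min(fitness))
    best = min(range(n), key=fitness.__getitem__)
    return nests[best], fitness[best], history
```

Because an egg only replaces a nest when it is better and the best nest is never abandoned, the best fitness recorded in `history` cannot get worse from one generation to the next.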

Accelerated Particle Swarm Optimisation
Particle Swarm Optimisation. The choreographed behaviour of birds and insects motivated Kennedy and Eberhart [40] to propose PSO. A number of individuals in PSO refine their knowledge of the given search space. Each individual in PSO is a particle characterised by a position and a velocity. In PSO, two pieces of information are responsible for adjusting the particle trajectory: the best location found so far by the particle itself and the global best location reached by the entire swarm. Like other optimisation techniques, PSO uses an evaluation function to assign a fitness value.
Global best is the highest fitness value reached by the swarm, while local best is the highest fitness value that an individual particle has attained. Global and local best are remembered by each particle. PSO randomly initialises a population of solutions and searches for the optimum solution over successive generations. The basic steps involved in PSO operation from the initial stage to the optimum solution are depicted in Fig 2.
Accelerated Particle Swarm Optimisation. The APSO is a modified version of the standard PSO proposed by Yang et al. [41]. In APSO, convergence is accelerated by using only the global best, unlike the standard PSO, which uses both the global best and the individual best. The individual best is used to increase diversity in order to obtain a quality solution, which can also be achieved using other sources of randomness. Thus, it is not compulsory to use the individual best except in solving highly nonlinear and multimodal problems. The APSO was found to improve on the performance of the standard PSO. Compared to other variants of PSO, APSO has only two parameters: α and β, the learning parameters or acceleration constants (α ≈ β).
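The idea of steering every particle with the global best alone can be sketched in Python as follows. This is an illustrative sketch, not the implementation used in this study: it uses the simplified velocity-free position update commonly given for APSO, x ← (1 − β)x + βg* + αε, draws ε from a standard normal distribution, and sets β = 0.5; these choices and the names `apso_minimise`, `bounds` and `history` are assumptions. Only α = 0.7 follows the value adopted later in this paper.

```python
import random


def apso_minimise(objective, dim, n=20, alpha=0.7, beta=0.5,
                  iterations=100, bounds=(-5.0, 5.0), seed=0):
    """Minimise `objective` with a simplified accelerated PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    swarm = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    g_best = list(min(swarm, key=objective))  # copy, since particles move in place
    f_best = objective(g_best)
    history = [f_best]
    for _ in range(iterations):
        for x in swarm:
            for d in range(dim):
                # Pull towards the global best plus a random perturbation;
                # no individual (personal) best is stored or used.
                moved = (1 - beta) * x[d] + beta * g_best[d] + alpha * rng.gauss(0.0, 1.0)
                x[d] = min(hi, max(lo, moved))
        challenger = min(swarm, key=objective)
        f_challenger = objective(challenger)
        if f_challenger < f_best:
            g_best, f_best = list(challenger), f_challenger
        history.append(f_best)
    return g_best, f_best, history
```

Dropping the individual best removes one memory per particle and one term from the update, which is where the reduction to two parameters (α and β) comes from.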

Neural Network
The ANN comprises nodes in the input, hidden, and output layers. Nodes in the input layer feed inputs to nodes in the hidden layers, and the signals continue in a forward direction up to the nodes in the output layer. The nodes in the input layer are configured based on the independent variables in the dataset, and the dependent variable determines the output nodes [42][43]. There can be more than one hidden layer; however, theoretical works, such as [44], argued that one hidden layer is sufficient to approximate any complex non-linear function. The number of nodes in the hidden layer is commonly determined through trial and error [45]. A typical structure of the ANN is shown in Fig 3. The ANN is an algorithm for processing information in parallel and can model complex and nonlinear associations using input-output training data collected from the application domain. The intrinsic capabilities of the NN enable the algorithm to provide a nonlinear mapping of input and output vectors [43].
The NN can adapt itself to perform the task once the optimal weights and bias of the NN are established [46]. There are several gradient-based training algorithms for the optimisation of the NN weights and bias, such as the Levenberg-Marquardt, backpropagation, resilient backpropagation, scaled conjugate gradient, conjugate gradient with Powell-Beale restarts, Polak-Ribiere conjugate gradient, Fletcher-Reeves conjugate gradient, BFGS quasi-Newton and one-step secant algorithms. The most commonly used NN training algorithm is the backpropagation algorithm [47]. The backpropagation algorithm is a gradient method for minimising the error cost function. However, these gradient-based algorithms are susceptible to limitations such as over-training of the NN, which can cause the training data to be overfitted and degrade the prediction accuracy. They can also become stuck in local minima, depending on the shape of the error surface, and suffer from saturation, a slow rate of convergence and so on [48]. Thus, training the NN using the HCS is a suitable alternative because these limitations of the gradient-based algorithms can be avoided.

The Organization of the Petroleum Exporting Countries' CO2 Emissions Dataset
The dataset for the OPEC CO2 emissions from the consumption of petroleum, in million metric tons (mmt), from 1980 to 2011 was collected from [49], a credible source of energy data [50]. The data are yearly because they are only available on a yearly basis; data availability determined the collection period and frequency [51]. The data comprise the CO2 emissions of the 12 OPEC countries and the total OPEC CO2 emissions. The dataset has 13 columns and 32 rows. The basic statistics of the dataset are presented in Table 1, showing the maximum, minimum, mean and standard deviation (SD) of the CO2 emissions of each OPEC country and of OPEC as a whole for the data collection period.
The total OPEC CO2 emission is the dependent variable, whereas the CO2 emissions from the 12 member countries of the OPEC, as shown in Fig 4, are the independent variables representing the inputs. Therefore, the CO2 emissions of the 12 OPEC countries are used as the inputs to predict the OPEC CO2 emissions from petroleum consumption.
The dataset was normalised to a range of [-1,1] using Eq (7) to improve prediction accuracy and convergence speed [52].
Here n_o is the normalised dataset, k_i is the raw dataset, x_min is the minimum value of the dataset and p_max is the maximum value of the dataset. The OPEC CO2 emissions dataset was analysed using correlation to investigate the relationships among the independent variables and between the dependent and independent variables. Successful prediction requires that the variables involved in the task be positively related [53]. Table 2 is the correlation matrix of the variables involved in the prediction. All the variables were found to be positively correlated, which makes them suitable for the prediction.
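The two preprocessing steps described here, scaling to [-1, 1] and checking pairwise correlation, can be sketched in Python as follows. The scaling formula is the usual min-max mapping and is only assumed to match Eq (7); the function names and the sample numbers are made up for illustration and are not the EIA data.

```python
import math


def normalise(column):
    """Min-max scale a column to [-1, 1] (assumed form of Eq (7))."""
    lo, hi = min(column), max(column)
    return [2 * (k - lo) / (hi - lo) - 1 for k in column]


def pearson(a, b):
    """Pearson correlation coefficient between two equally long columns."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Made-up yearly values for one country and for the OPEC total (illustration only).
country = [10.0, 12.0, 15.0, 19.0]
total = [100.0, 118.0, 151.0, 190.0]

scaled_country = normalise(country)
r = pearson(country, total)  # one entry of a correlation matrix such as Table 2
```

Min-max scaling is a positive linear transformation, so it leaves the correlation coefficients unchanged; the correlation analysis gives the same result on raw or normalised data.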

The Design of the Proposed Hybrid Cuckoo Search Neural Network (HCSNN)
The major components of the proposed method are presented in a flowchart in Fig 5. The major stages comprise the dataset, modelling, and evaluation. In the proposed approach, CS is hybridised with APSO to build the HCS. In the proposed HCS, the communication capability of the cuckoo birds is improved by introducing APSO to search for a better location in which the optimal nest can share information with the cuckoo. Thus, the HCS chooses the optimal nest among all the nests via Lévy flight, unlike in the standard CS (refer to the Cuckoo Search Algorithm section). The HCS performs the search using Eq (8) [40][41]. The standard equation of the CS is given in Eq (6). Eq (9) is the proposed equation, in which the velocity vector v_i^(t+1) is taken from Eq (8), the standard equation of the APSO [41]. The proposed Eq (9) is thus derived from Eq (6) and Eq (8).
Here v_i^(t+1) is the updated velocity vector, v_i^t and x_i^t are the current velocity and position vectors of the particle, and ε_n represents the random vector, typically drawn from [0,1]. The current global best is represented by g*. The mean square error (MSE) is chosen as the objective function because the HCSNN performance is to be compared with other meta-heuristic algorithms for evaluation purposes. The MSE is better than other performance indicators, such as the normalised mean square error and the sum of squared errors, for comparing the performance of different algorithms on the same dataset [43].
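For reference, the symbols defined above fit the standard APSO velocity update as it is usually written in the literature, and the MSE objective has the usual form. The block below is a sketch under those assumptions: it is not a reproduction of Eq (8) or Eq (9) of this paper, and N, y_k and ŷ_k (number of samples, actual value and predicted value) are symbols introduced here for illustration.

```latex
% Standard APSO velocity and position update (assumed to correspond to Eq (8))
v_i^{t+1} = v_i^{t} + \alpha\,\varepsilon_n + \beta\,\bigl(g^{*} - x_i^{t}\bigr),
\qquad
x_i^{t+1} = x_i^{t} + v_i^{t+1}

% Mean square error over N samples, used as the objective to be minimised
\mathrm{MSE} = \frac{1}{N}\sum_{k=1}^{N}\bigl(y_k - \hat{y}_k\bigr)^{2}
```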
To assess the performance of the proposed method thoroughly, the HCSNN was evaluated across several training and test datasets with varying data partition ratios (training-testing), given that the training data has an effect on the performance of the prediction [54]; a similar practice was used in [55]. Therefore, five different data partition ratios were used in this study, and each was run 10 times, because meta-heuristic algorithms are not deterministic and must be run more than once to compute the mean, best and worst results. The best solution is typically obtained from multiple executions of the algorithm [56]. The input neurons of the ANN are set to 12 because there are 12 independent variables in the dataset, and the output neuron is set to 1 because only one dependent variable is used (refer to the dataset section). The hidden neurons were fixed to 5 as suggested by experimental trials. There are many activation functions, but tanh is preferred in the hidden layer of the ANN for solving prediction problems [57], and a linear function is used in the output layer as recommended by Beale et al. [58]. The objective of the HCS is to train the ANN by optimising its weights and bias. The HCS, like other meta-heuristic algorithms, requires initialisation and the setting of parameter values before it can run. There is no systematic, universally agreed method of obtaining the best settings of meta-heuristic algorithms [28]. In this study, we adopted the parameters P_a = 0.25, α_1 = 1, n = 25 [36], and α = 0.7 [41]. The proposed HCSNN was run for a maximum of 1000 generations to build an HCSNN with bias for the prediction of OPEC CO2 emissions. The pseudo-code of the HCSNN proposed in this research is presented in Fig 6. To evaluate the effectiveness of our method, we used the standard CS, GA, PSO and ABC to optimise the weights and bias of the ANN to build the CSNN, GANN, PSONN and ABCNN for the prediction of OPEC CO2 emissions. The results of the proposed and comparative methods are compared.
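How a meta-heuristic can train the network described above is easiest to see when all weights and biases are packed into one flat vector that the optimiser treats as a candidate solution (a nest). The Python sketch below assumes one particular packing order and the names `predict`, `mse` and `N_PARAMS`, none of which come from Fig 6; only the 12-5-1 layout, the tanh hidden layer, the linear output and the MSE objective follow the text.

```python
import math

N_IN, N_HID, N_OUT = 12, 5, 1
# Hidden weights + hidden biases + output weights + output bias
N_PARAMS = N_IN * N_HID + N_HID + N_HID * N_OUT + N_OUT


def predict(params, x):
    """Forward pass of a 12-5-1 network: tanh hidden layer, linear output.

    Assumed layout of `params`: hidden weights (row per hidden neuron),
    then hidden biases, then output weights, then the output bias.
    """
    w1_end = N_IN * N_HID
    b1_end = w1_end + N_HID
    w2_end = b1_end + N_HID
    hidden = []
    for h in range(N_HID):
        z = params[w1_end + h] + sum(params[h * N_IN + i] * x[i] for i in range(N_IN))
        hidden.append(math.tanh(z))
    return params[w2_end] + sum(params[b1_end + h] * hidden[h] for h in range(N_HID))


def mse(params, inputs, targets):
    """Objective for the optimiser: mean square error over a dataset."""
    errors = [(predict(params, x) - y) ** 2 for x, y in zip(inputs, targets)]
    return sum(errors) / len(errors)
```

With this packing, each candidate solution has 12 × 5 + 5 + 5 × 1 + 1 = 71 dimensions, and any optimiser that minimises a function of a real vector can be pointed at `mse` without needing gradients.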

Results and Discussion
The numerical results of the experiments conducted using the OPEC CO2 emissions datasets are shown and discussed in this section. Experimental simulation analysis shows that it is possible to predict OPEC CO2 emissions for 3, 6, 9, 12, and 16 years using the proposed HCSNN.

Sensitivity of the ANN and CS configuration parameters
The results of the experimental trials to investigate several configurations of the ANN with regard to a variety of CS parameter settings are presented in Table 3. The experiments were repeated for different numbers of hidden layer neurons, starting from 2 and increasing in steps of 1 up to 7. The experiment was stopped at 7 hidden layer neurons in all the trials because the prediction performance was observed to deteriorate from 6 hidden neurons onwards. A similar phenomenon was observed by Uzer et al. [59] in their experiments. The trials indicated that 5 hidden neurons were the best for the ANN. As shown in Table 3, the CS parameter settings influence the ANN performance. This is not surprising because meta-heuristic algorithms are sensitive to parameter settings, even though the CS requires the setting of only the P_a and α_1 parameters [60]. Among the CS parameters used in the experimental trials, the values suggested by Yang and Deb [36] were found to be the best settings for the CS. Many studies [61][62][63] adopted the CS settings proposed by Yang and Deb [36] for the execution of CS because of their performance.

Comparing HCSNN with the Standard CSNN and APSONN
The experiments were implemented using MATLAB R2012b on a machine with an Intel Core 2 Quad CPU at 2.33 GHz, 2 GB of RAM and a 32-bit operating system. The source code can be requested directly from abdullahdirvi@gmail.com. The comparison between the proposed HCSNN and the basic CSNN and APSONN was performed first. Subsequently, the proposed HCSNN was compared with other established algorithms (GANN and ABCNN). Chiroma et al. [64] similarly compared their proposed meta-heuristic method of modelling oil consumption with other meta-heuristic algorithms. Tables 4-7 summarise the simulation results; the first column is the data partition ratio, whereas the second, third, and fourth columns are the mean, best, and worst results, respectively. The results were obtained for each of the algorithms after the experiments on both the training and test OPEC CO2 emissions datasets.
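The mean, best and worst columns follow from repeating each stochastic run and summarising the resulting errors. A minimal Python sketch of that bookkeeping is given below; `summarise_runs` and `toy_run` are hypothetical names, and `toy_run` is only a stand-in that returns a pseudo-random error instead of training a network.

```python
import random
import statistics


def summarise_runs(run_once, runs=10):
    """Repeat a stochastic experiment and report mean, best and worst error."""
    scores = [run_once(seed) for seed in range(runs)]
    return {
        "mean": statistics.mean(scores),
        "best": min(scores),    # lowest error, e.g. lowest MSE
        "worst": max(scores),   # highest error
    }


def toy_run(seed):
    """Stand-in for one training run; a real run would return the final MSE."""
    return random.Random(seed).uniform(0.0, 1.0)


summary = summarise_runs(toy_run)
```

Passing the seed into each run keeps the repetitions independent yet reproducible, so a table built this way can be regenerated exactly.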
Tables 4-7 report the performance of the proposed HCSNN, CSNN, and APSONN on the training and test OPEC CO2 emissions datasets. The HCSNN was found to converge to the optimal solution faster than the CSNN and APSONN on both the training and test datasets. Therefore, the proposed HCSNN can be considered the best of the three, since a good algorithm should converge to the best solution within a short period of time [28,65]. The proposed HCSNN has improved on the performance of the CSNN and APSONN prediction methods. This signifies that the proposed HCSNN has the capability of providing a better solution in a short period of time. The performance gains of the proposed HCSNN over the standard CSNN and APSONN can probably be attributed to the hybridisation of the standard CS and APSO, which improves the communication capability of the cuckoos to search for a better location where the optimal nest can share information with the cuckoo; hence, the hybrid converges to the optimal solution faster than either the CS or the APSO alone.

Comparing Performance of HCSNN, GANN, and ABCNN
A proposed method based on a meta-heuristic algorithm is required to be compared with other meta-heuristic algorithms [66]. The performance of the proposed HCSNN was therefore compared with that of the GANN and ABCNN. The results showed that the proposed HCSNN can provide better accuracy and convergence speed than the GANN and ABCNN on both the training and test datasets. These results further validate the effectiveness and robustness of the HCSNN in the prediction of OPEC CO2 emissions. The performance of the HCSNN can probably be attributed to two properties of the CS: (1) the CS strikes a balance between local and global search; (2) the CS requires few parameters to run successfully, unlike the GA and ABC, which require more parameter settings than the CS.

Predicted vs. Actual OPEC CO2 Emissions from Petroleum Consumption
The patterns of the actual OPEC CO2 emissions and of those predicted by the algorithms (HCSNN, GANN, ABCNN, and APSONN) are depicted in Figs 7-11. The prediction is based on the OPEC CO2 emissions test dataset reserved for evaluation purposes. The predictions are for 3, 6, 9, 12, and 16 years, respectively. The performance indicators are in Tables 4-7 and Tables 8-11. It can be observed that the OPEC CO2 emissions predicted by the proposed HCSNN fit the actual OPEC CO2 emissions more closely than those of the other comparison algorithms. In the 3, 6, 9, and 12 year predictions (Figs 7-10), the other compared algorithms also fitted close to the actual OPEC CO2 emissions, except for the GANN. The discarding of previous knowledge by the GA could possibly be responsible for the low performance of the GANN. In the 16 year predictions, the ABCNN and CSNN move further away from the actual OPEC CO2 emissions. This shows that these algorithms are not robust as the number of predicted years increases. Thus, the ABCNN, GANN, and CSNN are not consistent in their performance. The APSONN performance is consistent. However, the proposed HCSNN is both consistent and more accurate than the APSONN in the prediction of OPEC CO2 emissions, as the HCSNN maintained similar performance throughout the prediction periods. Therefore, the HCSNN is robust, accurate and fast in the prediction of OPEC CO2 emissions. A solution intended for real-world problems is required to be robust, in addition to being accurate and fast to converge, as argued by Yang and Deb [28].
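The five prediction horizons can be related to the 32 yearly observations with a short Python sketch. It assumes that the test set for each horizon consists of the most recent years of the 1980 to 2011 series; the paper states the horizons and the data range but not the exact split, so this chronological split and the name `chronological_split` are assumptions.

```python
def chronological_split(rows, test_years):
    """Hold out the last `test_years` observations as the test set."""
    if not 0 < test_years < len(rows):
        raise ValueError("test_years must be between 1 and len(rows) - 1")
    return rows[:-test_years], rows[-test_years:]


years = list(range(1980, 2012))  # 32 yearly observations, 1980 to 2011

partitions = {}
for horizon in (3, 6, 9, 12, 16):
    train, test = chronological_split(years, horizon)
    partitions[horizon] = (len(train), len(test))
```

Under this assumption the training sets shrink from 29 to 16 years as the horizon grows, which is one reason why prediction can become harder for the longer horizons.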

Policy Implications
Because protection against global warming requires a holistic approach, accurate prediction of OPEC CO2 emissions from petroleum consumption can give member countries a better estimate of the CO2 emissions expected in the future, thus allowing OPEC to create a robust CO2 emissions policy involving the 12 member countries. OPEC members are skeptical, however, about the reduction of CO2. This is because it may increase the price of oil to consumers, thus decreasing demand from developed countries, which account for 60% of total oil consumption in the world [67]. This can obstruct development and reduce revenue generation in OPEC countries, given that the main source of government revenue in OPEC countries is the sale of petroleum. Reducing oil consumption means reducing CO2 emissions. If some OPEC countries reduce CO2 emissions while others do not, this can affect the other members' CO2 emissions (see Table 2). Thus, a holistic approach is required, in which all the member countries put measures in place that will drastically reduce CO2 emissions in all the countries, if meaningful results are to be achieved. However, reducing oil consumption must be done with caution, given that oil consumption is significantly positively related to economic development, as described in [3].
Since OPEC members are developing countries, the reduction of CO2 must be done with precautions in order not to slow down economic development and the generation of revenue. An accurate prediction of OPEC CO2 emissions can serve as a reference point for the OPEC secretariat to propagate the reorganisation of economic development in member countries with the view of managing CO2 emissions. Evidence of the dangers of CO2 emissions can be used to convince member countries to embark on economic development that results in minimal petroleum consumption and reduced CO2 emissions. In view of the economic implications, the reduction of CO2 emissions in OPEC countries must be enforced with caution. Considering the contributions of OPEC countries to global warming, it is important for OPEC to adapt its policies on climate change so that they enforce stringent measures for the members to adopt an energy-efficient economy.
Meng et al. [18] argued that the CO2 emissions emanating from developing countries have attracted unprecedented attention because of their economic development and increasing consumption of fossil energy. The efforts being taken by the developing countries in monitoring and controlling the emissions of CO2 have become a premise for further maintaining the Kyoto benchmark on climate change alleviation.
The prediction of future CO2 emissions is one of the major factors in the management, control and modification of state-of-the-art policies related to CO2 emissions [18,21]. The management and control of the emissions of CO2 drastically reduce the negative effects of global warming [10][11].
The limitations of our study are as follows. The prediction was performed based on historical data. As such, the predicted future CO2 emissions can be affected by prolonged wars or famines that bring down economic growth and, as a result, decrease CO2 emissions in the future. Also, the data were collected at a yearly frequency. Therefore, the prediction horizon is limited to a yearly basis.

Conclusions
This paper proposed a method for the prediction of OPEC CO2 emissions based on CS, ANN, and APSO to improve accuracy and convergence speed. The dataset required for the modelling was collected from the Energy Information Administration. We built an HCSNN model to predict OPEC CO2 emissions. Intensive experiments were conducted with the HCSNN and other meta-heuristic algorithms, such as the CSNN, PSONN, GANN, and ABCNN, to predict OPEC CO2 emissions for 3, 6, 9, 12, and 16 years. Comparative results indicated that the proposed HCSNN improved on the prediction accuracy and convergence speed of the comparison meta-heuristic algorithms in all the years. Accurate and timely prediction of OPEC CO2 emissions can allow OPEC member countries to adapt OPEC policies related to climate change appropriately. This is because the higher the prediction accuracy of CO2 emissions, the better informed the decisions taken on climate change policies and, hence, the greater the reduction in the contributions of the OPEC countries to global warming. In the future, the method presented in this study will be modified to investigate its effectiveness in estimating CO2 loss from streams [68]. The method presented in this research could be implemented in software to develop a decision system capable of advising OPEC policy makers with predicted values of CO2 emissions.