An improved gray prediction model for China’s beef consumption forecasting

To balance supply and demand in China's beef market, beef consumption must be forecast scientifically and effectively. Beef consumption is affected by many factors and is characterized by gray uncertainty; therefore, gray theory can be used to forecast it. In this paper, the structural defects and unreasonable parameter design of the traditional gray model are analyzed. A new gray model, termed EGM(1,1,r), is then built, and its modeling conditions and error checking methods are studied. EGM(1,1,r) is then used to simulate and forecast China's beef consumption. The results show that both the simulation and prediction precisions of the new model are better than those of other gray models. Finally, the new model is used to forecast China's beef consumption for the period from 2019 to 2025. The findings will serve as an important reference for the Chinese government in formulating policies to ensure the balance between the supply of and demand for Chinese beef.


Introduction
Beef has a rich protein content and low fat content. Moreover, the amino acid composition of beef is much closer to human needs than that of pork and can improve the body's resistance to disease. Hence, beef has gradually become one of the most popular meat products in China [1]. Since China is the most populous country in the world, a small increase in the proportion of beef consumption will lead to a substantial increase in total beef consumption in China. Therefore, scientifically and effectively predicting the level of Chinese beef consumption over the short and long terms and grasping the developmental trend and overall scale of Chinese beef consumption are of great value to the Chinese government. The results can help the Chinese government adjust the supply of domestic beef, formulate beef import and export policies, promote the effective supply of beef, and ensure the balance of supply and demand in the Chinese beef market and the healthy development of the beef industry.
The gray prediction method is one of the most important time series prediction methods and has been used to solve uncertainty problems characterized by small data sets and poor information. Gray forecasting models have been applied in many fields, such as agriculture, society, energy, industry, economics, and ocean research, and have successfully solved a large number of problems in management, production, and uncertainty research. With the rapid spread of gray forecasting applications, the traditional GM(1,1) model has been extended into many model types, such as GM(1,N) (gray model with N variables and one first-order equation), GM(0,N) (gray model with N variables and a zero-order equation), GM(2,1) (gray model with one variable and one second-order equation), DGM(1,1) (discrete gray model with one variable and one first-order equation), the Verhulst model, and SAIGM (self-adapting intelligent gray model). All of them aim to improve prediction and simulation accuracy [2][3][4][5]. Given the small-sample character of the data on beef consumption in China, we built the EGM(1,1,r) model (the even form of the gray model with one variable and one first-order equation, with accumulation generation of order r) to address this problem.

Data characteristics and method selection
The total beef consumption in a country or region is affected by many factors [6][7][8][9][10], such as the total population, the age structure, the income per capita, the relative price of beef, the quality and variety of beef, the consumption habits of residents, and the attitudes and preferences of consumers. Some of these factors can be quantified, such as the total population and the income per capita, while others are difficult to quantify, such as the consumption habits of residents. The mathematical statistical model is a common prediction model [11][12][13][14]. It is mainly based on the relationship between the dependent variable (the amount of beef consumption) and the independent variables (population, income per capita, etc.), and the dependent variable is predicted on the assumption that the trends of the independent variables are known in advance. This assumption introduces great uncertainty into the prediction of the dependent variable, because it is difficult to know the exact data for some independent variables [15][16][17]. Because the factors affecting total beef consumption are complex and some indexes are difficult to quantify, it is difficult to forecast beef consumption using the traditional causality prediction model [18][19][20][21][22]. On the other hand, the statistical data on China's beef consumption are limited. China proposed the reform target of creating a 'socialist market economy' in 1991 and has gradually achieved the transformation from a planned economy to a market economy. This transformation stage has been called 'the transition period of China's market economy' by the United States. In 2001, China became a 'market economy' country, based on its accession to the WTO (World Trade Organization). Before China formally joined the WTO (i.e., before 2001), beef consumption in China followed the typical consumption pattern of a planned economy guided by the central government.
In other words, at that time the output and consumption of beef in China were controlled from the top down by government planning. The government decided important factors such as the beef production yields and the allocated uses each year. Hence, the data obtained before 1991 have little referential value for studying the beef consumption in the current market economy environment.
Therefore, the statistical data concerning beef consumption from 1991 to 2018 serve as the modeling sample in this study. To verify the model's predictive performance, the real data for 2016-2018 are used as benchmark data for evaluating the model. Thus, only 25 data points, i.e., those for 1991-2015, are available for building the model, which makes this a 'small sample' problem. Hence, models that require large samples, such as ARIMA (autoregressive integrated moving average), RBFNN (radial basis function neural network), ELM (extreme learning machine) and SVM (support vector machine) [23][24][25][26][27][28][29], are not suitable. A series of 25 data points is too short to guarantee that the existing "inertia" of the series will continue in the future, and the mean and variance of a data series change markedly when fewer than 30 data points are available. Moreover, the residuals of the 25 data points are widely scattered and do not lie along a single line, so statistical testing of them is not meaningful; for example, whichever test is applied to the 25 samples, the p value is greater than 0.05. The gray prediction model is a commonly used method for studying prediction problems in uncertain systems with small samples [5,30]. The method considers the fact that the evolution of a system is affected by many complex factors (the factors are uncertain, hence the name 'Gray Factor'), in which any tiny change in a factor may have a great influence on the variation of the dependent variable (the butterfly effect), and in which the trend of the dependent variable is the result of the influence of many complex factors [4]. In other words, the dependent variable itself already contains the effect of many factors, and it is formally expressed as a certain number. (When the result is certain, it is referred to as a 'White Result'.)
Therefore, we can extract the evolutionary trend of the dependent variable from its own variations and then predict the future developmental trend of the system [31]. The total consumption of beef is affected by many factors. The usual statistical approach to factor analysis is correlation and regression, but it presupposes that the sample is large enough and follows a typical distribution, and its calculation process is relatively complicated. Modeling is also very difficult when the relationship between the independent and dependent variables is nonlinear. Although the Markov model needs only a small amount of data, its calculation accuracy is low and its storage complexity is high. Gray prediction models based on GM(1,1) and DGM(1,1) build an exponential function to make predictions, mainly through accumulation generation of the raw data sequence, but their simulation and prediction accuracy is not ideal: because the exponential form is monotonic, its pattern of change does not conform to the oscillation characteristics of the original sequence. Although the self-adapting intelligent gray prediction model (SAIGM) can automatically optimize parameters and select a reasonable model structure to adapt to the real data characteristics of the modeling sequence, the mean relative simulation percentage error (MRSPE) of the SAIGM model is still sometimes high.
In this paper, the gray system model is used to simulate Chinese beef consumption, which has the characteristics of a 'Gray Factor White Result'. Although the gray prediction model has made great progress in its modeling mechanism and performance optimization since the 1980s [3,32] and some new practical gray system models have been developed [33][34], there are still some problems with the structure and parameter optimization of the traditional gray prediction model [35][36][37][38][39][40]. Therefore, an improved gray model, termed EGM(1,1,r), is built, and its modeling conditions and error checking methods are studied. EGM(1,1,r) is then used to simulate and forecast China's beef consumption.

Paper structure and tables of notation
Following the sequence of proposing problems, improving the methods, solving the problems and analyzing the conclusions, the paper is organized as follows. In Section 2, we introduce the definition of the classic gray EGM(1,1) model (see Table 1, here and below) and then analyze its structural defects and the unreasonable design of its accumulation order. In Section 3, we propose an improved gray prediction model, EGM(1,1,r) (see Table 1), introduce the parameter estimation and optimization method, and deduce the time response function of EGM(1,1,r). In Section 4, we introduce the modeling conditions and error testing methods of the new EGM(1,1,r) model. In Section 5, we use the EGM(1,1,r) model to simulate and forecast the total Chinese beef consumption; the main contents include the modeling condition test, the parameter calculation and optimization of the EGM(1,1,r) model, the simulation performance comparisons between EGM(1,1,r) and other gray models, and the prediction analysis of the total Chinese beef consumption. In Section 6, we summarize our conclusions and outline future research.
To give a full understanding of the notation used in this paper, two tables of notation are provided (Table 1 and Table 2).

EGM(1,1) model and its imperfections
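Definition 1. Let $X^{(0)} = (x^{(0)}(1), x^{(0)}(2), \cdots, x^{(0)}(n))$ be the original sequence, and let $X^{(1)} = (x^{(1)}(1), x^{(1)}(2), \cdots, x^{(1)}(n))$ be its first-order accumulation generated (1-AGO) sequence, where

$$x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i), \quad k = 1, 2, \cdots, n.$$

In addition, $Z^{(1)}$ is the mean sequence generated by consecutive neighbors of $X^{(1)}$, as follows:

$$z^{(1)}(k) = 0.5\left(x^{(1)}(k) + x^{(1)}(k-1)\right), \quad k = 2, 3, \cdots, n.$$

Then, the gray differential equation of the EGM(1,1) model is

$$x^{(0)}(k) + a z^{(1)}(k) = b.$$

Definition 2. Let $X^{(0)}$, $X^{(1)}$ and $Z^{(1)}$ be the same as in Definition 1, and let $\hat{a} = (a, b)^{T}$ be the parameter sequence. Then, the least squares estimate of the gray differential equation satisfies

$$\hat{a} = (B^{T}B)^{-1}B^{T}Y, \quad B = \begin{bmatrix} -z^{(1)}(2) & 1 \\ -z^{(1)}(3) & 1 \\ \vdots & \vdots \\ -z^{(1)}(n) & 1 \end{bmatrix}, \quad Y = \begin{bmatrix} x^{(0)}(2) \\ x^{(0)}(3) \\ \vdots \\ x^{(0)}(n) \end{bmatrix}.$$

Let $X^{(0)}$, $X^{(1)}$, $Z^{(1)}$ and $\hat{a}$ be the same as in Definition 1 and Definition 2. Then, the following differential equation is referred to as the whitenization equation of the EGM(1,1) model:

$$\frac{dx^{(1)}}{dt} + a x^{(1)} = b. \tag{6}$$

Let $X^{(0)}$, $X^{(1)}$, $Z^{(1)}$ and $\hat{a}$ be the same as in Definition 1 and Definition 2. Then, the time response sequence of the EGM(1,1) model is as follows:

$$\hat{x}^{(1)}(k) = \left(x^{(0)}(1) - \frac{b}{a}\right)e^{-a(k-1)} + \frac{b}{a}, \quad k = 1, 2, \cdots, n.$$

In addition, the time response formula of $\hat{x}^{(0)}(k)$, the simulation sequence of $X^{(0)}$, can be given by the following:

$$\hat{x}^{(0)}(k) = \hat{x}^{(1)}(k) - \hat{x}^{(1)}(k-1) = (1 - e^{a})\left(x^{(0)}(1) - \frac{b}{a}\right)e^{-a(k-1)}, \quad k = 2, 3, \cdots, n.$$

The accumulation generation is a vital step when building a gray prediction model. The order of the accumulation generation is an important parameter that influences the simulation and prediction performances of gray prediction models. In the modeling process above, the order of the accumulation generation for the EGM(1,1) model is fixed at '1' (i.e., 1-AGO). Essentially, this is a simplification. The order of the accumulation generation should instead be chosen to minimize the mean relative simulation error of the EGM(1,1) model. The best order is therefore not always '1', and may be a fraction or another integer. In recent years, study findings about the order of gray prediction models have appeared. In this paper, particle swarm optimization (PSO) is used to optimize the order of the EGM(1,1) model and resolve the imperfection of the order being fixed at '1'.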
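The classic EGM(1,1) procedure described in this section (1-AGO, mean sequence, least squares estimate of a and b, exponential time response) can be illustrated with a minimal Python sketch. This is not the authors' MATLAB program: the function names `egm11_fit` and `egm11_simulate` are hypothetical, the two-parameter least squares problem is solved with the closed-form simple-regression formulas, and the sample series is synthetic rather than the beef consumption data.

```python
import math

def egm11_fit(x0):
    """Estimate a, b of x0(k) + a*z1(k) = b by ordinary least squares."""
    # 1-AGO: cumulative sums of the original sequence
    x1, total = [], 0.0
    for v in x0:
        total += v
        x1.append(total)
    # Mean sequence of consecutive neighbors, k = 2..n
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x0))]
    y = x0[1:]
    m = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(u * v for u, v in zip(z, y))
    # Regress y = slope*z + b; the gray equation gives slope = -a
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    b = (sy - slope * sz) / m
    return -slope, b

def egm11_simulate(x0, a, b, steps=0):
    """x0_hat(k) = (1 - e^a) * (x0(1) - b/a) * e^(-a(k-1)) for k >= 2."""
    n = len(x0) + steps
    c = (1.0 - math.exp(a)) * (x0[0] - b / a)
    # The first point is taken as the observed x0(1)
    return [x0[0]] + [c * math.exp(-a * (k - 1)) for k in range(2, n + 1)]

# Synthetic, illustrative series (10% growth per step), not the paper's data
series = [100.0 * 1.1 ** k for k in range(8)]
a, b = egm11_fit(series)
simulated = egm11_simulate(series, a, b, steps=2)  # 8 fitted + 2 forecast values
```

For this exactly geometric series the gray differential equation is satisfied exactly, yet the simulated values still deviate slightly from the data, because the parameters are then plugged into the exponential solution of the whitenization equation.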
The parameters a and b of the EGM(1,1) model are estimated using the gray differential equation. However, a and b then act as the parameters of the whitenization equation of the EGM(1,1) model (i.e., Equation (6)). This 'misplaced replacement' of the model parameters, caused by the conversion from the gray difference equation to the gray differential equation, is the root cause of the poor performance of the EGM(1,1) model. To ensure the consistency of the model parameters, we directly estimate parameters a and b according to the gray differential equation of the EGM(1,1) model.

Improved EGM(1,1) model, EGM(1,1,r)
In this section, the newly improved EGM(1,1) model with an optional order of gray accumulation generation is established, and the time response formula of $\hat{x}^{(0)}(k)$ is deduced directly from the gray differential equation. Together, these two operations resolve the two imperfections of the traditional EGM(1,1) model that were introduced in Section 2. The new EGM(1,1) model is abbreviated as EGM(1,1,r).

And when
Rewriting the equation above, we obtain the time response sequence of the EGM(1,1,r) model as follows: According to the property of the fractional-order accumulation generation operator, the simulation sequence of $\hat{x}^{(0)}(k)$ can be given by the following: That is, This completes the proof.
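The fractional-order accumulation generation operator mentioned above can be illustrated with a short Python sketch. It assumes the definition commonly used in the gray systems literature, $x^{(r)}(k) = \sum_{i=1}^{k} \binom{k-i+r-1}{k-i} x^{(0)}(i)$, which this text does not state explicitly; the function names `frac_coeffs`, `frac_ago` and `frac_iago` are hypothetical. Under that assumption, applying the operator with order $-r$ undoes an accumulation of order $r$, which is the property used to restore a simulation of the original sequence.

```python
def frac_coeffs(r, n):
    """Generalized binomial coefficients c_j = C(j + r - 1, j), j = 0..n-1.

    Built with the recurrence c_j = c_(j-1) * (r + j - 1) / j, which is
    valid for any real order r, including negative and integer values.
    """
    coeffs = [1.0]
    for j in range(1, n):
        coeffs.append(coeffs[-1] * (r + j - 1) / j)
    return coeffs

def frac_ago(x, r):
    """r-order accumulation: x_r(k) = sum over i <= k of c_(k-i) * x(i)."""
    c = frac_coeffs(r, len(x))
    return [sum(c[k - i] * x[i] for i in range(k + 1)) for k in range(len(x))]

def frac_iago(x_r, r):
    """Inverse accumulation, taken here as accumulation of order -r."""
    return frac_ago(x_r, -r)

# r = 1 reduces to ordinary cumulative sums (1-AGO)
first_order = frac_ago([1, 2, 3, 4], 1)
# A fractional order followed by its inverse recovers the input sequence
restored = frac_iago(frac_ago([1, 2, 3, 4], 0.5), 0.5)
```

The recurrence is used instead of Gamma functions so that orders such as $r = -1$, where the Gamma function has poles, need no special handling.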
In this paper, we use PSO to optimize the order of the EGM(1,1,r) model; the optimal order is the one that minimizes the mean relative simulation percentage error of the model. The detailed search process for the optimal order of the EGM(1,1,r) model can be found in [5].
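The idea of choosing the order r by minimizing the mean relative simulation error can be sketched in Python as follows. Several things here are assumptions rather than the authors' method: a plain grid search is substituted for PSO, the model equation is taken to be $x^{(r)}(k) - x^{(r)}(k-1) + a z^{(r)}(k) = b$ with $z^{(r)}(k) = 0.5(x^{(r)}(k) + x^{(r)}(k-1))$, the fractional accumulation uses the common binomial-coefficient definition, and all function names are hypothetical. The data are synthetic.

```python
def frac_ago(x, r):
    """r-order accumulation with coefficients c_j = c_(j-1) * (r + j - 1) / j."""
    c = [1.0]
    for j in range(1, len(x)):
        c.append(c[-1] * (r + j - 1) / j)
    return [sum(c[k - i] * x[i] for i in range(k + 1)) for k in range(len(x))]

def simulate(x0, r):
    """Fit the assumed gray equation on the r-order sequence, then restore x0_hat."""
    xr = frac_ago(x0, r)
    y = [xr[k] - xr[k - 1] for k in range(1, len(xr))]
    z = [0.5 * (xr[k] + xr[k - 1]) for k in range(1, len(xr))]
    m = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(u * v for u, v in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)  # slope = -a
    a, b = -slope, (sy - slope * sz) / m
    # Solve the difference equation for x_r(k) directly, with no
    # detour through the whitenization (differential) equation
    xr_hat = [xr[0]]
    for _ in range(1, len(x0)):
        xr_hat.append(((1 - 0.5 * a) * xr_hat[-1] + b) / (1 + 0.5 * a))
    return frac_ago(xr_hat, -r)  # inverse accumulation of order r

def mrspe(actual, simulated):
    """Mean relative simulation percentage error, in percent."""
    return 100.0 * sum(abs(s - t) / t for s, t in zip(simulated, actual)) / len(actual)

def search_order(x0, candidates):
    """Grid search (stand-in for PSO): return the order with the lowest MRSPE."""
    best = min(candidates, key=lambda r: mrspe(x0, simulate(x0, r)))
    return best, mrspe(x0, simulate(x0, best))

# Synthetic, illustrative series; candidate orders 0.1, 0.2, ..., 2.0
series = [100.0 * 1.1 ** k for k in range(8)]
best_r, best_err = search_order(series, [i / 10 for i in range(1, 21)])
```

A grid search is used only because it is short and deterministic; a PSO routine would replace `search_order` while keeping `simulate` and `mrspe` as the objective.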

Modeling conditions and error test method of EGM(1,1,r)
Definition 6 [30]. Assume that $X^{(0)} = (x^{(0)}(1), x^{(0)}(2), \cdots, x^{(0)}(n))$, where $x^{(0)}(k) \ge 0$ and $k = 1, 2, \cdots, n$. Then, the following is referred to as the smoothness ratio of sequence $X^{(0)}$:

$$\rho(k) = \frac{x^{(0)}(k)}{\sum_{i=1}^{k-1} x^{(0)}(i)}, \quad k = 2, 3, \cdots, n.$$

The concept of the smoothness ratio reflects the smoothness of a sequence from a special angle. In particular, it uses the ratio $\rho(k)$ of the k-th data value $x^{(0)}(k)$ to the sum $\sum_{i=1}^{k-1} x^{(0)}(i)$ of the previous values to check whether the changes in the data points of $X^{(0)}$ are stable. The more stable the changes of the data points in the sequence are, the smaller the ratio $\rho(k)$ is.

Definition 7. Let $\rho(k)$ be the same as in Definition 6. A sequence $X^{(0)} = (x^{(0)}(1), x^{(0)}(2), \cdots, x^{(0)}(n))$, where $x^{(0)}(k) \ge 0$ and $k = 1, 2, \cdots, n$, is referred to as a quasi-smooth sequence if it satisfies the following conditions:

Quasi-smooth conditions are very important criteria for determining whether a sequence can be used to build a gray model.
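As an illustration of how the smoothness ratio might be computed and checked, a minimal Python sketch follows. The quasi-smooth conditions coded here are an assumption: they follow the form commonly given in gray systems textbooks ($\rho(k+1)/\rho(k) < 1$, and $\rho(k) < \varepsilon$ with $\varepsilon = 0.5$ for the later points), which may differ from the conditions intended in Definition 7. The function names and the `start_k` parameter are hypothetical, and the inputs are assumed to be strictly positive.

```python
def smoothness_ratios(x):
    """Return rho(k) = x(k) / sum of x(1..k-1) for k = 2..n, in that order."""
    ratios = []
    running = x[0]
    for v in x[1:]:
        ratios.append(v / running)
        running += v
    return ratios

def is_quasi_smooth(x, eps=0.5, start_k=4):
    """Check the assumed textbook conditions on a strictly positive sequence.

    1) rho(k+1) / rho(k) < 1 for every consecutive pair of ratios
    2) 0 <= rho(k) < eps for k = start_k..n (k is 1-indexed)
    """
    rho = smoothness_ratios(x)  # rho[0] holds rho(2)
    decreasing = all(rho[j + 1] / rho[j] < 1 for j in range(len(rho) - 1))
    bounded = all(0 <= rho[k - 2] < eps for k in range(start_k, len(x) + 1))
    return decreasing and bounded

# Synthetic examples, not the beef consumption data
steady = [10, 11, 12, 13, 14]
erratic = [1, 1, 50, 1, 80]
steady_ok = is_quasi_smooth(steady)
erratic_ok = is_quasi_smooth(erratic)
```

The bound is applied from `start_k` onward because the earliest ratios are dominated by the first few values; both `eps` and `start_k` are left as parameters so that a different convention can be applied.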
After applying the accumulation operator a few times, a general nonnegative quasi-smooth sequence shows a pattern of exponential growth with decreased randomness. The smoother the original sequence is, the more obvious the exponential growth pattern in the first-order accumulation generated sequence.

Definition 8. Let $X^{(0)} = (x^{(0)}(1), x^{(0)}(2), \cdots, x^{(0)}(n))$ be an original data sequence, where $x^{(0)}(k) \ge 0$ and $k = 1, 2, \cdots, n$, and let $\hat{X}^{(0)} = (\hat{x}^{(0)}(1), \hat{x}^{(0)}(2), \cdots, \hat{x}^{(0)}(n))$ be the simulation data sequence of $X^{(0)}$ with the EGM(1,1,r) model. Then, the error sequence of $X^{(0)}$ is as follows:

$$\varepsilon = (\varepsilon(1), \varepsilon(2), \cdots, \varepsilon(n)), \quad \varepsilon(k) = x^{(0)}(k) - \hat{x}^{(0)}(k).$$

The relative simulation percentage error (RSPE) of the simulation sequence $\hat{X}^{(0)}$ is as follows:

$$\Delta_k = \left|\frac{\varepsilon(k)}{x^{(0)}(k)}\right| \times 100\%.$$

The mean relative simulation percentage error (MRSPE) of the simulation sequence, denoted $\bar{\Delta}$, is the arithmetic mean of the $\Delta_k$. For a given threshold value $\alpha$ (the threshold is set according to the specific situation of the system), when $\bar{\Delta} < \alpha$ holds true, the gray model is said to be error-satisfactory.
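The error measures in Definition 8 translate directly into a few lines of Python. The sketch below assumes that the MRSPE is the arithmetic mean of the RSPE values over all points passed in; the function names and the numbers in the example are illustrative only.

```python
def rspe(actual, simulated):
    """Relative simulation percentage error for each point, in percent."""
    return [abs(x - s) / x * 100.0 for x, s in zip(actual, simulated)]

def mrspe(actual, simulated):
    """Mean relative simulation percentage error, in percent."""
    errors = rspe(actual, simulated)
    return sum(errors) / len(errors)

def is_error_satisfactory(actual, simulated, alpha):
    """True when the MRSPE is below the chosen threshold alpha (in percent)."""
    return mrspe(actual, simulated) < alpha

# Illustrative values only
actual = [100.0, 200.0]
simulated = [100.0, 190.0]
mean_error = mrspe(actual, simulated)          # mean of 0% and 5%
passes = is_error_satisfactory(actual, simulated, alpha=10.0)
```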

Forecasting China's beef consumption during 2019-2025
In this section, the EGM(1,1,r) model is employed to study the total beef consumption in China. The detailed modeling process of EGM(1,1,r) is introduced in Subsection 5.1. The simulation and prediction performances of EGM(1,1,r) are analyzed and compared with those of other models in Subsection 5.2. Finally, we use the EGM(1,1,r) model to forecast the total beef consumption in China from 2019 to 2025, and the predicted data are given in Subsection 5.3. To give a full understanding of the research methods and the improved model in this paper, a flowchart is provided in Fig 1.
Fig 1. The flowchart of the EGM(1,1,r) model.

(1) Checking the quasi-smooth conditions before modeling
According to Definition 6, a sequence can be used to build the EGM(1,1,r) model only when it meets the quasi-smooth conditions. The total beef consumption in China from 1991 to 2015 is shown in Table 3.
The ratios ρ(k) and λ(k) are computed according to Definitions 6-7, and the results are shown in Table 4, as follows.

(2) Computing and optimizing parameters a, b and r
We use MATLAB to write the calculation program of the EGM (1,1,r)

Comparing and analyzing the performances of the models
In this subsection, we detail the use of the EGM(1,1,r) model to simulate the total beef consumption in China during the period from 1991 to 2015. To verify the performance of EGM(1,1,r), we compare the MRSPE of EGM(1,1,r) to that of the most commonly implemented gray models, including the GM(1,1), DGM(1,1) and SAIGM models. The simulated values $\hat{x}^{(0)}(k)$, the residual values $\varepsilon(k)$, the RSPE $\Delta_k$, and the MRSPE of the four models are presented in Table 5.
It can be seen from Table 5 that the MRSPE of the EGM(1,1,r) model is only 5.35%, which is the lowest among the four models, and the second lowest MRSPE is that of the SAIGM model, at 5.82%. The performance of GM(1,1) is similar to that of DGM(1,1), and the MRSPE of GM(1,1) (12.26%) is more than twice that of EGM(1,1,r). This shows that the simulation performance of EGM(1,1,r) is the best among the four models. To better show the performance differences among them, we draw four scatter-line figures of the actual and simulated data from Table 5 for the above four models (Figs 2-5).
From Figs 2-5, it can be seen that the simulation curve of the EGM(1,1,r) model is the closest to that of the actual data among the four models, which again illustrates the fact that EGM(1,1,r) has the best simulation performance. However, a prediction model cannot be judged as good or bad based only on the MRSPE, since a good simulation performance does not necessarily mean that its predictive performance is also good. Hence, it is necessary to verify the predictive precision of a model before applying it to forecasting future data.
From the above calculation results, we see that the prediction error of the EGM(1,1,r) model is the smallest among the four models. The second smallest is that of DGM(1,1), which is slightly better than that of GM(1,1), and the SAIGM model is the worst. From Table 5, the simulation performance of SAIGM is second only to EGM(1,1,r) and far superior to those of DGM(1,1) and GM(1,1). However, the prediction performance of SAIGM is the worst among the four models. This shows that a good simulation performance cannot guarantee that a model has a good prediction performance.

Forecasting the total consumption of beef in China
Synthesizing the above conclusions, we see that both the simulation and prediction errors of EGM(1,1,r) are the smallest among the four models. Therefore, we apply this new model to forecasting China's total beef consumption. The results are shown in Table 6. From Table 6, we see that the total beef consumption in China is predicted to increase to 857.04 ten thousand tons in 2025, which is more than six times the consumption in 1991, and the annual average growth rate will be approximately 15.82%. However, for the past ten years, the growth rate has been approximately 4% and has been smooth on the whole, although the total beef consumption in China is very large. Based on actual domestic beef production and the beef consumption predicted by the proposed model, China will import more than 1.0 million tons of beef next year. Beef imports account for about 15 percent of total beef consumption in China. Therefore, according to the above prediction results and analysis, the Chinese government can formulate relevant policies and measures in order to ensure the balance between supply and demand with respect to Chinese beef consumption.

Conclusions
Scientifically and effectively forecasting the total beef consumption in China is of great significance for promoting the effective supply of Chinese beef. However, there are many factors that affect beef consumption in China, and they show the typical characteristics of a 'Gray Factor White Result'. Hence, it is difficult for the traditional mathematical statistical model to effectively simulate and predict Chinese beef consumption. To this end, an improved gray system model, EGM(1,1,r), was used to simulate and predict Chinese beef consumption and was compared with the GM(1,1), DGM(1,1) and SAIGM models. The MRSPE of the EGM(1,1,r) model is only 5.35%, which is the lowest among the four models, and the second lowest MRSPE is that of the SAIGM model, at 5.82%. The performance of GM(1,1) is similar to that of DGM(1,1), and the MRSPE of GM(1,1) (12.26%) is more than twice that of EGM(1,1,r). Moreover, the prediction error of the EGM(1,1,r) model is only 4.21%, which is the smallest among the four models. The second smallest is that of DGM(1,1), which is slightly better than that of GM(1,1), and the SAIGM model is the worst. Thus, a good simulation performance does not necessarily mean that the predictive performance is also good, and it is necessary to verify the predictive precision of a model before applying it to forecasting future data. The results showed that the new model is superior to the other gray forecasting models in both simulation and prediction performance. Finally, the EGM(1,1,r) model was used to predict the total beef consumption in China for the period of 2019-2025. The results showed that the total beef consumption in China will keep growing for a long time. By 2025, the total beef consumption in China is predicted to reach 857.04 ten thousand tons, which is more than 6 times the total Chinese beef consumption in 1991. Exploring the influencing factors and trend prediction of beef prices in China is the next research objective of the project team.