Statistical Determination of Rainfall-Runoff Erosivity Indices for Single Storms in the Chinese Loess Plateau

Correlation analysis is popular in erosion- or earth-related studies, however, few studies compare correlations on a basis of statistical testing, which should be conducted to determine the statistical significance of the observed sample difference. This study aims to statistically determine the erosivity index of single storms, which requires comparison of a large number of dependent correlations between rainfall-runoff factors and soil loss, in the Chinese Loess Plateau. Data observed at four gauging stations and five runoff experimental plots were presented. Based on the Meng’s tests, which is widely used for comparing correlations between a dependent variable and a set of independent variables, two methods were proposed. The first method removes factors that are poorly correlated with soil loss from consideration in a stepwise way, while the second method performs pairwise comparisons that are adjusted using the Bonferroni correction. Among 12 rainfall factors, I 30 (the maximum 30-minute rainfall intensity) has been suggested for use as the rainfall erosivity index, although I 30 is equally correlated with soil loss as factors of I 20, EI 10 (the product of the rainfall kinetic energy, E, and I 10), EI 20 and EI 30 are. Runoff depth (total runoff volume normalized to drainage area) is more correlated with soil loss than all other examined rainfall-runoff factors, including I 30, peak discharge and many combined factors. Moreover, sediment concentrations of major sediment-producing events are independent of all examined rainfall-runoff factors. As a result, introducing additional factors adds little to the prediction accuracy of the single factor of runoff depth. Hence, runoff depth should be the best erosivity index at scales from plots to watersheds. Our findings can facilitate predictions of soil erosion in the Loess Plateau. Our methods provide a valuable tool while determining the predictor among a number of variables in terms of correlations.


Introduction
Rainfall erosivity indicates the potential of a storm to erode soil. A single index of rainfall erosivity that can measure the composite effect of various rainstorm characteristics on soil erosion is highly desirable for predicting soil loss [1,2]. It is well known that soil losses are frequently due to a few intense rainfall events [1,3]. The most common erosivity index for single storms is the EI 30 index (the product of the rainfall kinetic energy, E, and the maximum 30-min intensity, I 30 ), as is used in the Universal Soil Loss Equation (USLE) [4] and in the Revised Universal Soil Loss Equation (RUSLE) [5]. The calculation of EI 30 is of high data requirements and labor intensive [2,6]. For this reason, a large number of studies (e.g. [1,2,6,7]) were devoted to developing a proxy, using more readily available data such as daily, monthly and annual precipitations, for EI 30 and the R factor of the USLE (the mean annual total of EI 30 ).
Besides the EI 30 index, other forms of erosivity index for storm events primarily include the KE > 25 index (the total kinetic energy for rainfall duration with intensity exceeding 25 mm h -1 ) [8], the PI m index (the product of the rainfall amount, P, and the peak rainfall intensity, I m ) [9], the I x E A index (the product of the excess rainfall rate, I x , and the rainfall kinetic energy flux, E A ) [10], and the so-called A index [11]. Many local erosivity indices for single storms have also been used, such as I15 in Belgium [12], EI5 in NE Spain [13], E in Palestinian areas [14] and EI 60 and I60 in Malaysia [15]. In the Chinese Loess Plateau, both Wang [16] and Jia et al. [17] suggested EI 10 as the rainfall erosivity index; however, EI 10 has been shown to be of similar effectiveness as EI 30 [18]. Notably, Chen et al. [19] detected no significant difference among correlations between soil loss and a set of EI t variables (EI 10 , EI 20 , EI 30 , EI 40 , EI 50 and EI 60 ). Furthermore, it was found that EI t did not greatly improve soil loss predictions compared with PI t [19][20][21], which can thus serve as a surrogate for EI 30 .
Both raindrops and runoff are drivers of soil erosion [22]. Runoff factors, mainly runoff volume and peak discharge, have frequently been included into an erosivity index [23][24][25][26][27][28]. The addition of runoff terms can improve the ability of models to predict soil loss or sediment yield especially for small to medium events [25,27,29]. Foster et al. [25] found that the lumped erosivity factors including rainfall amount, rainfall intensity and runoff amount performs better than EI 30 , whereas erosivity factors with separate terms for rainfall and runoff erosivity performs best. However, Foster et al. [25] acknowledged their inability to determine whether the observed improvements were statistically significant or not. In the Modified Universal Soil Loss Equation (MUSLE), the EI 30 index is replaced by a power of the product of runoff volume and peak discharge [24], as is also used in the Agricultural Policy/Environmental eXtender (APEX) model [28]. In another modified version of the USLE called the USLE-M [27], the erosivity index of single events is the product of EI 30 and the runoff ratio. In the Loess Plateau, many studies included both terms of runoff volume and peak discharge, in a lumped or separated form, into models for predicting soil loss or sediment yield [30][31][32]. Nevertheless, it is well known that runoff volume alone can adequately predict sediment yield of the flood event in the Loess Plateau (r 2 > 0.9) [33]. A proportional model of event runoff volume and sediment yield applies well over a wide range of spatial scales from hill slopes to large-sized watersheds [34][35].
To determine the erosivity index, a large number of correlations between rainfall-runoff factors and soil loss often need to be compared (e.g. [14-17, 23, 25, 26, 36-37]). For example, the EI 30 index was established as the rainfall erosivity index of the USLE by comparing correlations of more than 40 factors with soil loss [38][39][40]. As a result of the indelible sampling error, sample correlation coefficients can never be identical to population ones. Because the sample difference does not fully represent the population difference, a statistical test is needed to determine the significance of the observed sample difference. To our knowledge, no studies have applied statistical tests while determining the erosivity index with the exceptions of [12] and [19], although a number of statistical tests for comparing dependent or independent correlations exist [41].
The object of this study is to determine erosivity indices for single storms on a statistical basis using data observed in the Chinese Loess Plateau. After describing the study area and data source, we present two methods that compare a large number of dependent correlations. The two methods build on the Meng's tests [42], which has been widely used to compare correlations in psychological research. We then determine the rainfall-runoff erosivity indices among a large number of factors using the two methods. We finally made some discussions about rainfall and runoff factors with an emphasis on the best erosivity index in the Loess Plateau.

Study Area and Data
The present study uses data observed at 5 runoff experimental plots and 4 gauging stations (Tables 1 and 2) within the Dalihe River watershed (See Fig. 1 in [35] for the location), a secondary-order river of the middle Yellow River. Typical of the Loess Plateau, the loess mantle of the Dalihe watershed is generally thicker than 100 m. The climate is typically semiarid with a mean annual precipitation of 440 mm . Soil erosion is primarily caused by localised short-duration, high-intensity convective rainstorms. A single storm can commonly cause a soil loss of greater than 10 000 t km -2 . Most of the lands were intensively cultivated with little soil conservation practices during the monitoring period (1959)(1960)(1961)(1962)(1963)(1964)(1965)(1966)(1967)(1968)(1969). The terrain is very precipitous and deeply dissected.
All examined plots are located within the Tuanshangou subwatershed (latitude 37°41 0 N, longitude 109°58 0 E; See Fig. 1(c) in [35] for the location), a headwater basin of the Dalihe watershed. The plots were all under arable with crops varying between years, generally including millet, potato, mung bean, clover, sorghum and wheat. The vegetation cover rarely exceeded The station numbers correspond to those given in Fig. 1(b) in [35]. b n is the number of recorded flood events.
doi:10.1371/journal.pone.0117989.t001 25% in the plots. The recorded maximum I 10 is 2.17 mm min -1 (1961)(1962)(1963)(1964)(1965)(1966)(1967)(1968)(1969). Rill erosion is dominant on upland slopes. During the 1960s, the annual erosion intensity, averaging 41 000 t km -2 at Plots 4, 2 and 3 ( Table 2), was maximized in 1966. Although rills only occurred on 5 out of 54 rainfall days, these five days contributed almost all of the annual soil loss (>96%) in 1966. Downslope, the valley side slope is generally very steep (>35°), allowing the emergence of permanent, incised gullies and mass wasting events. Unless stated otherwise, all data used in this study were obtained from the Yellow River Water Conservancy Commission (YRWCC). The YRWCC stream-gauging crews conducted all measurements. Hyetograph data were obtained using a rainfall gauge near Plot 3 (See Fig. 1-(c) in [35] for the location). The observation interval was generally smaller than 10 min, even 1 min in many cases of high rainfall intensity. The field monitoring programs of runoff and sediment have been described in detail in [43,44].
Based on the instantaneous measurement of water discharge and sediment concentration, the runoff depth, h (mm: total runoff volume normalized to drainage area), and the specific sediment yield, SSY (t km −2 ) of single events were calculated. Conventionally, the term "soil erosion" is used for hill slopes, and the term "sediment yield" is used for a river system or watershed. For simplicity, we use SSY to represent both cases hereafter. The event mean sediment concentration, SC e (kg m −3 ) was computed by dividing SSY by h. The maximum instantaneous sediment concentration (SC max , kg m -3 ) was also used to represent the level of the sediment concentration of a single event.
The data we used are on a single storm basis. The runoff factors we examined include 4 factors: h, q max (peak flow discharge normalised for drainage areas, m 3 s -1 km -2 ), hq max (the product of h and q max ) and h+q max (the sum of h and q max ) . The use of h+q max follows [23,30]. The rainfall factors we examined include 12 factors: P (mm), T (rainfall duration, min), I (mean rainfall intensity, mm min -1 ), I 10 (mm min -1 ) , I 20 (mm min -1 ), I 30 (mm min -1 ), EI 10, EI 20 , EI 30 , PI 10, PI 20 and PI 30 . The storm kinetic energy, E (J m -2 ), was calculated as follows: where e r is the rainfall kinetic energy per unit depth of rainfall per unit area (J m -2 mm -1 ), and p r is the depth of rainfall (mm) for the rth interval among m intervals of the storm hyetograph. where i r (mm min -1 ) represents the mean rainfall intensity for the rth interval. After unit conversion, Equation (3) is almost identical to the rainfall intensity-energy equation of the USLE [4]. The discrepancy between the two equations is less than 10% for rainfall intensities from 1 to 40 cm h -1 .

Methodology
Assumed that r 1 and r 2 represent the correlation coefficients of any two rainfall factors with SSY. To compare r 1 and r 2 , Sinzot et al. [12] and Chen et al. [19] used the following statistic: where z 1 ¼ 1 2 ln 1þr 1 1Àr 1 and z 2 ¼ 1 2 ln 1þr 2 1Àr 2 are the Fisher z-transformed values for r 1 and r 2 , N is the sample size. However, Equation (4) is applicable only to independent correlations [46] and cannot be used to compare r 1 and r 2 because they have a common dependent variable, SSY.
Meng's tests [42] are widely applied for comparing correlations between a dependent variable and a set of independent variables. This study used these tests because they take a rather simple and thus easy-to-use form and perform as well as other statistical tests in terms of controlling the Type I error and power [41]. To compare r 1 and r 2 , a Z (standard normal) test (termed "Meng's Z 1 test" in the following section for simplicity) is used [42]: where r x is the correlation between the two rainfall factors under examination, where r 2 ¼ ðr 2 1 þ r 2 2 Þ=2, and f should be set to 1 if the right term of Equation (7) is larger than 1.
If the comparison involves k rainfall factors (k > 2), the statistic (termed "Meng's χ 2 test" hereafter) used to test the heterogeneity of the correlations of the k factors with SSY is as follows [42]: where z i is the Fisher z-transformed correlation coefficient for the ith rainfall factor (i k), and z is the mean of the z i values. In the definition of H given by Equation (5), r 2 becomes the mean of the r 2 i , and r x becomes the median intercorrelation among the factors under testing. The resulting χ 2 statistic is χ 2 distributed on k-1 degrees of freedom.
By comparing a correlation with the average of the k-1 other correlations, Meng et al. [42] also designed a standard Z test (termed "Meng's Z 2 test" hereafter) to determine whether a contrast exists among the k factors under examination: where r λx represents the correlation coefficient between z i and λ i . The values of λ i are the contrast weights assigned to each z i . The sum of the λ i s must be zero. If we wish to determine whether the first factor differs among four factors in terms of their correlations with SSY, for instance, λ i s should be-3, 1, 1 and 1, respectively. Based on Meng's tests, we would use two methods to determine the rainfall-runoff erosivity indices. Method one repeatedly uses Meng's Z 2 test (Equation (8)) to remove factors that are poorly correlated with SSY in a stepwise way. Method two performs all paired comparisons using Meng's Z 1 test (Equation (5)). The Type I error, however, would increase when multiple comparisons are conducted simultaneously. We used Hochberg's Sharpened Bonferroni correction to counteract the problem of multiple comparisons [47,48]. Given a p value resulting from Meng's Z 1 test, the corrected value according to the Hochberg approach is p' = Rp, where R is the rank value in descending order of the given p value among all obtained p values. The p values given below are one-tailed for the Meng's χ 2 test, and two-tailed for all other tests.
Using the two methods above, we would determine the rainfall erosivity index among the 12 rainfall factors and the runoff erosivity index among the 4 runoff factors. We limited our analyses of rainfall factors to the six experimental sites in the Tuanshangou subwatershed (#3 in Table 1 and the five plots in Table 2) due to the lack of reliable rainfall data at larger scale. A total of 222 events were used. Events without detailed hyetograph data were excluded. The analyses of runoff factors involve 379 events observed at all nine experimental sites listed in Tables 1 and 2.

Results
Method One-a stepwise procedure using Meng's Z 2 test Table 3 presents the correlation coefficients between SSY and the 12 rainfall factors we examined. Factors other than T are generally well correlated with SSY (p < 0.01). The obtained correlation coefficients for T, I and P are much smaller than those for I t (I 10, I 20 and I 30 ), EI t (EI 10, EI 20 and EI 30 ) and PI t (PI 10, PI 20 and PI 30 ). The relationship between I 30 and SSY for four of the sites was plotted in Fig. 1.
To determine whether a rainfall factor represents a contrast, we performed 12 Meng's Z 2 tests at each of the six sites in the Tuanshangou subwatershed. The result (Test 1 in Table 4) shows that the correlation coefficients of T, I and P with SSY are significantly smaller (p < 0.005) than the average of factors other than itself at every site. Factors of PI 20 and PI 30 are not contrasts (p > 0.05) at any site. Hence, these five factors (T, I, P, PI 20 and PI 30 ) were excluded as candidates for the erosivity index. Meng's Z 2 tests of the seven remaining factors shows that The four runoff factors of h, q max , h+q max and hq max are all highly correlated with SSY (p < 0.01) at all nine sites, from plots to watersheds ( Table 5). The derived correlation coefficients between h and SSY are either at the maximum or simply slightly smaller than the maximum (generally < 0.01) at each site. Meng's Z 2 tests (Table 6) show that the correlation between h and SSY is significantly higher than the average of the three remaining factors at seven of nine examined sites (p < 0.0006). In contrast, the correlation between q max and SSY is significantly lower than the average of the three remaining factors at eight sites (p < 0.031). Hence, h should be preferred to q max as the predictive factor of SSY. The factor of h+q max is better correlated with SSY than the average of the three remaining factors at four sites (p < 0.04), whereas the factor of hq max is less correlated with SSY than the average of the three remaining factors at six sites (p < 0.02). This demonstrates that the combination of q max with h would impair rather than improve the ability of h to predict SSY.
Method Two-multiple Meng's Z 1 tests adjusted using the Hochberg approach Meng's Z 2 tests used above have clearly demonstrated that T, I and P are inferior to other factors for application as the erosivity index. To reduce complexity, these factors are not considered in this section.
To test the significance of the difference between correlations of the nine rainfall factors of I t , EI t and PI t with SSY, we performed 36 pairwise comparisons using Meng's Z 1 test at each of six sites within the Tuanshangou subwatershed. The resultant p values, together with the p' values after the Bonferroni correction, are presented in Table 7. Fifty-one among the 216 comparisons produced significant differences (p < 0.05) in the absence of the Bonferroni correction, with most (33) involving comparisons between PI t and EI t . When the Bonferroni correction was performed, significant differences remained for only 9 comparisons (p' < 0.04), all of which involved the comparison between PI t and EI t . The results adjusted using the Bonferroni correction demonstrate that EI t is better correlated with SSY than PI t in some cases, and the correlations with SSY was not significantly different among six factors of EI t and I t (p' > 0.11).
To compare the strength of correlations between SSY and the four runoff factors, we performed six Meng's Z 1 tests at each of nine sites within the Dalihe watershed. At seven sites, h is more correlated with SSY than q max (p < 0.001, p' < 0.01; Table 8), regardless of whether the Bonferroni correction was applied. Only at Plot 4 was the obtained correlation coefficient between h and SSY smaller than that between q max and SSY (Table 5). This difference, however, was not statistically significant (p = 0.26, p' = 0.52). A total of 18 comparisons at the nine sites were made between the correlations of h and the combined factors of h and q max (i.e. h+q max and hq max ) with SSY. The Meng's Z 1 test suggested a significant difference for 11 among the 18 comparisons, and almost all (10) remain significant after the Bonferroni correction (Table 8). Among the ten comparisons, h is more correlated with SSY for nine (p < 0.007, p' < 0.007) and less correlated for only one (p < 0.01, p' < 0.01; #12). This again indicates that both q max and its combination with h are inferior to the single factor of h for the SSY predictions.

Rainfall factors
For rainfall factors, the results of two methods slightly differ: Method one suggested that I 20 , I 30 , EI 10, EI 20 and EI 30 are superior to other factors as a predictor of SSY; Method two excessively accepted I 10 as an optimal predictor. This result may be related to the Bonferroni correction, which increases the likelihood of accepting the null hypothesis of identical correlations thereby increasing the risk of committing the type II errors [49].
The Loess Plateau is typically dominated by infiltration excess overland flows, and the runoff yield is determined by rainfall intensity rather than rainfall amount. In the Tuanshangou subwatershed, the median T is about 170 min. In contrast, the runoff duration at Plots 4, 2 and 3, with a median value of approximately 16 min, hardly exceeded 40 min. Rainfall during the low-intensity period is thus of little consequence to runoff yield and thus, to soil erosion. As a result, P is a poor indicator of SSY, as was also reported in [25,40].
The single factor of I 30 , although equally as effective in predicting SSY as factors of I 20 , EI 10, EI 20 and EI 30 , can be preferentially used as the rainfall erosivity index in practices because I 30 is in form simpler than EI t and can be measured somewhat more accurately than I 20 . Wang [16] also noted that the predictive ability of EI t is only marginally higher than that of I t in the Loess Plateau. The calculation of E involves data which are rarely available. Our finding shows that E Rainfall-Runoff Erosivity in the Loess Plateau is not necessarily included into the rainfall erosivity index thereby facilitating the obtainment of rainfall erosivity in the Loess Plateau. Our calculations at six sites within the Tuanshangou subwatershed indicate that I 30 summed over a year can explain 71 to 89% of the variation in yearly soil loss, an accuracy that is Rainfall-Runoff Erosivity in the Loess Plateau comparable to that of the use of the EI 30 index in the USA [40]. Considering the fact that our data are from cropped plots, I 30 can be directly applied to predicting soil loss in cropped areas, as opposed to the USLE, which first predicts erosion for the unit plot (bare fallow areas 22.1 m long on a 9% slope) and then predicts for the area of interest by introducing the topographic factors and the cover and management factors. Nevertheless, the mean of the annual cumulative I 30 promises to act as the R factor of the USLE due to the sparse vegetation cover in the plots.

Runoff factors
For runoff factors, the two methods present the same result: h is not only superior to q max but is also superior to the combined factors of h and q max as a predictor of SSY. Similar to the rainfall case, flow discharge during flood events is primarily concentrated during the high-flow period, especially during the peak-flow period. Consequently, q max correlates well with h (r > 0.77) and in turn, with SSY at every site (r > 0.83, Table 5). However, sediment concentrations at moderate discharges are more or less the same as those at high discharges in the Loess Plateau. Extremely high concentrations even primarily occur at low discharges from plots to watersheds (see Fig. 6 in [44] and Fig. 2 in [34]). This mismatch between sediment concentration and water discharge can be related to hyperconcentrated flows, which are well developed from upland slopes to river channels in the Loess Plateau [50,51]. It is known that no direct relationship exists between sediment concentration and water discharge for hyperconcentrated flows [52]. In stream channels of the Chabagou watershed (#9), SC max generally occur at flow discharges that are approximately 30-50% lower than q max [53]. Hence, contrary to the rainfall case, h is more correlated with SSY than q max .
Runoff factors can provide better SSY predictions than rainfall factors in the Loess Plateau in terms of correlations (See Tables 3 and 5). The obtained correlation coefficients between h and SSY is larger than those between I 30 and SSY at all six sites within the Tuanshangou subwatershed, and five of them being statistically valid (p < 0.02). Interrill erosion is closely related to rainfall factors, whereas rill erosion is primarily dependent on runoff factors [25,54]. The higher correlation of runoff factors with SSY relative to rainfall factors can thus be linked to the dominance of rill erosion and the mass wasting over the interrill erosion in our study area.

The best erosivity index
The rainfall-runoff factors we examined are generally inter-correlated. As a result, it can hardly be expected to improve the prediction accuracy by introducing more factors. SSY equals the product of h and SC e . There is no need to include factors that do not affect SC e into the erosivity index if h has been included. We hereafter examine the correlations between the rainfallrunoff factors and SC e .
Except for T, I and P, nine other rainfall factors correlate well with SC e at all sites (p < 0.01). However, almost all of these correlations become insignificant with only two exceptions when only major sediment-producing events are considered (see Table 3). As in [35], we defined major sediment-producing events as high-concentrated events that accumulatively contribute 90% to the total sediment yield of all examined events. Scatter plots of SC e and I 30 at all sites are generally parallel to the x-axis for major sediment-producing events (Fig. 2), a result contrary to that observed by Kinnel [29] and Chaplot et al. [55]. The same observation holds for scatter plots of SC max and I 30 (Fig. 3).
For major sediment-producing events, SC e and SC max also remain independent of the four runoff factors although there are five exceptions among the 36 derived correlations (Table 5). Fig. 4 and Fig. 5 depict the relationships between SC e and q max and between SC max and q max , respectively.
In terms of the correlation with SSY, the single factor of h is the best erosivity index among factors we examined at scales from plots to watersheds. It is not expected that the prediction accuracy would be further improved by introducing additional rainfall-runoff factors because these factors are ineffective in altering SC e for major sediment-producing events. As was the case of q max , little was added to the prediction accuracy by combining h with I 30 . Among the six sites within the Tuanshangou subwatershed, h+I 30 performs better than h only at one site (p = 0.03; Table 9), and hI 30 is not as good as h at three sits (p < 0.046). When the runoff coefficient (a, given by h divided by P) was introduced, as did in the USLE-M [27], neither aI 30 nor aEI 30 provide better predictions of SSY than h at any site in terms of correlations. Moreover, aI 30 and aEI 30 are less correlated with SSY than h at three and two sites, respectively (p < 0.006; Table 9).

Conclusions
Based on Meng's tests, this study presents two methods to determine the erosivity index among a number of rainfall-runoff factors by comparing their correlations with SSY. The first method involves a stepwise procedure to remove factors that are poorly correlated with SSY. The second method involves multiple comparisons that are adjusted using a Bonferroni correction. It appears that few studies have compared correlations on a statistical basis, not only within the soil erosion community but also within the entire geoscience community. Our methods therefore have wide significance, not only for determining the best predictor, but also in other respects, such as comparing model performance, which is often indexed by the correlation between observed and modeled values.
Using the methods described above, we determined the erosivity indices of rainfall and runoff in a typical Chinese Loess Plateau watershed. Among 12 rainfall factors under examination, Rainfall-Runoff Erosivity in the Loess Plateau I 20 , I 30 , EI 10, EI 20 and EI 30 were found to be the most correlated with SSY at scales from plots to subwatersheds (< 1 km 2 ). We suggested the use of I 30 as the rainfall erosivity index, although it is equally effective as the three remaining factors. The value of I 30 summed over one year is also a good predictor of annual soil loss (r 2 > 0.7).
Runoff factors are more correlated with SSY than rainfall factors almost at all examined sites. Among the four studied runoff factors, h is correlated best with SSY at scales from plots to watersheds. Moreover, the combination of h with other rainfall-runoff factors, including rainfall intensity and peak discharge (as used in the MUSLE [24]), does not show enhanced ability to predict SSY compared with the single factor of h because these factors are of little importance in determining sediment concentration for major sediment-producing events. Introducing the runoff coefficient (as used in the USLE-M [27]) also added little to the prediction accuracy. Hence, we considered the single factor of h as the best erosivity index, although I30 would be useful in many cases considering the difficulty of measuring runoff.