Combined Multivariate Statistical Techniques, Water Pollution Index (WPI) and Daniel Trend Test Methods to Evaluate Temporal and Spatial Variations and Trends of Water Quality at Shanchong River in the Northwest Basin of Lake Fuxian, China

Understanding spatial and temporal variations in river water quality and quantitatively evaluating their trends are important for studying and efficiently managing water resources. In this study, the Water Pollution Index (WPI), the Daniel Trend Test, Cluster Analysis and Discriminant Analysis are applied as an integrated approach to quantitatively explore the spatial and temporal variations and the latent sources of water pollution in the Shanchong River basin, in the northwest basin of Lake Fuxian, China. We group all field surveys into 2 clusters (dry season and rainy season). Moreover, the 14 sampling sites are grouped into 3 clusters for the rainy season (highly polluted, moderately polluted and less polluted sites) and 2 clusters for the dry season (highly polluted and less polluted sites), based on their similarities and the level of pollution during the two seasons. The results show that pollution was mainly aggravated during the transition from the dry to the rainy season. The WPI of Total Nitrogen is the highest of all pollution parameters, whereas that of Chemical Oxygen Demand (dichromate method) is the lowest. Our results also show that the main sources of pollution are farming activities alongside the Shanchong River, soil erosion and fish culture in the Shanchong River reservoir area, and domestic sewage from scattered rural residential areas. Our results suggest that strategies to prevent water pollution in the Shanchong River basin need to focus on non-point pollution control by employing appropriate fertilizer formulas in farming, taking soil and water conservation measures in the Shanchong reservoir area, and purifying sewage from scattered villages.


Introduction
The demand for freshwater and the deterioration of water quality have both increased rapidly in China [1-5]. As the largest deep freshwater lake and an important freshwater resource in China, Lake Fuxian is very important to Yunnan province and to China as a whole. Its water quality is better than Class II of the China National Water Quality Standard, the class that applies to drinking water sources (Table 1) [6,7]. The recharge water of Lake Fuxian mainly comes from the inflow rivers. However, most of the inflow rivers are seriously polluted. Their water quality is worse than Class V of the China National Water Quality Standard, the class that applies to landscape water, and the main parameters that exceed the standard are Total Nitrogen (TN) and Total Phosphorus (TP) [8,9], which are the main factors of eutrophication [10-13]. Improving the water quality of the inflow rivers is therefore essential for protecting the water quality of Lake Fuxian. About 1.4 billion RMB (US$225 million) has been spent on protecting Lake Fuxian from pollution during the past 17 years [14], but the water quality of the inflow rivers is still worse than Class V. The local government has started a three-year (2014-2016), 10.5-billion-RMB (US$1.686 billion) program for the environmental conservation of Lake Fuxian. The focus is on the comprehensive improvement of the water environment of the inflow rivers, constructed wetlands and non-point source pollution control. Information about the main pollution sources and the spatial and temporal patterns of contamination would help in choosing proper pollution control actions and avoiding wasted funds [1,3,15-19]. Few studies have investigated the pollution of the inflow rivers of Lake Fuxian [8,9,20]. Furthermore, the spatial-temporal variations and trends of river water quality have not been fully explored in the literature for the inflow rivers of Lake Fuxian [20].
The Daniel Trend Test provides a quantitative evaluation of a changing trend, which makes the change easy to understand [20-26]. One of the main contributions of this paper is to investigate both the temporal and the linear spatial variation of water pollution in the Shanchong river basin by using the Daniel Trend Test. The Water Pollution Index (WPI) is recommended by the Ministry of Environmental Protection of the People's Republic of China for evaluating water pollution [27]. In this study, we use the WPI to compare the pollution level between different pollution parameters and to reveal the main pollution parameters [20,25,27]. Cluster Analysis and Discriminant Analysis are multivariate techniques that are normally used to group objects into classes with similar features. These techniques have been widely used in river water quality assessment [15,28-36].
In this study, we apply different statistical techniques to a data matrix obtained from March to October 2013 to extract information about the spatio-temporal variations and trends in water quality parameters. The trends are quantitatively evaluated with the Daniel Trend Test. The main pollution parameter is identified with the WPI method. The pollution source is analyzed,

Ethics statement
No specific permits were required for the described field studies and our field studies did not involve endangered or protected species.

Study area
The

Data sources
Seven field surveys (March, April, May, June, July, September and October 2013) were carried out at 14 sampling sites. The first survey in March included only 12 sites because there was no water in the river at sampling sites 5# and 6#. For the same reason, 9, 13, 11, 10, 13 and 13 sampling sites were sampled in the April, May, June, July, September and October surveys, respectively. The sampling sites were set as follows. Eight sampling sites (5#-12#) are located along the Shanchong river from upstream to downstream, about 500 m apart, with the position of some sites adjusted for sampling convenience. One sampling site (1#) is located upstream of the Shanchong river reservoir, three sampling sites (2#-4#) are located around the reservoir, and another two (13# and 14#) are located on the tributary of the Shanchong river. Upstream of site 13# there are a village and farmland, and the site is very close to the village, about 20 m away. Upstream of site 14# there is only farmland. The water samples were collected at a depth of 10 cm, placed into plastic bottles (2.5 L), transported to the laboratory and stored at 0-4°C for subsequent chemical analysis. The chemical measurements were performed in the laboratory within 24 hours of sample collection. Four water quality parameters were monitored by using standard methods (Table 1) [39]: Total Nitrogen (TN), Total Phosphorus (TP), Chemical Oxygen Demand by the dichromate method (CODcr) and Ammonia Nitrogen (NH3-N). The China National Water Quality Standard (CNWQS) [7], the basic statistical information and the monitoring methods of the water quality parameters in the Shanchong river basin are summarized in Table 1.

Data analytical methods
Water Pollution Index. The calculation of the WPI is based on the water quality standard (Class I-V). For example, the Class V limit is 0.4 mg/L for TP and 2.0 mg/L for TN, so a TP concentration of 0.4 mg/L gives a WPI of 100 for TP, and a TN concentration of 2.0 mg/L gives a WPI of 100 for TN as well. The WPI can therefore be used to compare the pollution level between different pollution parameters and to reveal the main pollution parameters [20,25,27].
In this study, all raw concentration data of the pollution parameters from the monitoring are transformed to the Water Pollution Index (WPI), which highlights the main pollution factor and allows for easy comparison. This is done according to the Technical Guideline for Surface Water Environmental Quality Assessment (HJ422-2008) [27], and the index is defined as:

$$ WPI(i) = WPI_l(i) + \frac{WPI_h(i) - WPI_l(i)}{C_h(i) - C_l(i)} \times \left( C(i) - C_l(i) \right), \qquad C_l(i) < C(i) \le C_h(i) $$

where C(i) is the monitored concentration of water quality parameter i; C_l(i) is the prescribed minimum concentration of the category to which water quality parameter i belongs; C_h(i) is the prescribed maximum concentration of that category; WPI_l(i) is the index value at the prescribed minimum of that category; WPI_h(i) is the index value at the prescribed maximum of that category; and WPI(i) is the index of water quality parameter i.
When the monitored concentration of a water quality parameter is beyond Class V (GB3838-2002) [7], the WPI is computed as:

$$ WPI(i) = 100 + \frac{C(i) - C_5(i)}{C_5(i)} \times 40 $$

where C_5(i) is the upper limit of Class V (GB3838-2002) [7] for water quality parameter i. The WPI is equal to 20 when the monitored concentration of the water quality parameter belongs to Class I (GB3838-2002) [7].
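The WPI calculation described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function name `wpi`, the TN class limits other than Class V, the evenly spaced index values of 40, 60 and 80 at the Class II-IV limits, and the factor of 40 used beyond Class V are assumptions that should be checked against Table 1 and HJ422-2008. Only the anchors WPI = 20 for Class I and WPI = 100 at the Class V limit (2.0 mg/L for TN) are taken from the text.

```python
from bisect import bisect_left

# Upper concentration limits of Classes I-V for TN, in mg/L.
# ASSUMPTION: only the Class V value (2.0) is stated in the text; the others
# are illustrative and should be verified against Table 1 / GB3838-2002.
TN_LIMITS = [0.2, 0.5, 1.0, 1.5, 2.0]

# Index value assigned to each class upper limit.
# ASSUMPTION: evenly spaced between the stated anchors 20 (Class I) and 100 (Class V).
WPI_AT_LIMITS = [20.0, 40.0, 60.0, 80.0, 100.0]


def wpi(conc, limits, over_factor=40.0):
    """Return the Water Pollution Index for one concentration value.

    conc        -- monitored concentration, same unit as `limits`
    limits      -- upper limits of Classes I-V, in ascending order
    over_factor -- ASSUMED scaling used beyond Class V
    """
    if conc <= limits[0]:
        # Class I: the index is fixed at 20.
        return WPI_AT_LIMITS[0]
    if conc > limits[-1]:
        # Beyond Class V: extrapolate from the Class V upper limit.
        c5 = limits[-1]
        return 100.0 + (conc - c5) / c5 * over_factor
    # Classes II-V: linear interpolation inside the class that contains conc.
    k = bisect_left(limits, conc)  # index of the first limit >= conc
    c_low, c_high = limits[k - 1], limits[k]
    w_low, w_high = WPI_AT_LIMITS[k - 1], WPI_AT_LIMITS[k]
    return w_low + (w_high - w_low) / (c_high - c_low) * (conc - c_low)


if __name__ == "__main__":
    for c in (0.1, 0.75, 2.0, 3.0):
        print(f"TN = {c} mg/L -> WPI = {wpi(c, TN_LIMITS):.1f}")
```

Because the index is anchored to the class limits of each parameter, values computed this way for TN and TP are directly comparable even though their concentration scales differ.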
Daniel Trend Test. The Daniel Trend Test is usually used to analyse temporal variation trends [20-26]. A linear spatial series (such as sampling sites ordered from upstream to downstream along a river) resembles a time series, because both are linear sequences. However, only one study has used the Daniel Trend Test to explore a spatial variation trend [20].
According to the Technical Guideline for Surface Water Environmental Quality Assessment (HJ422-2008) [27], the temporal and spatial variation tendency of water quality is quantitatively evaluated by means of the Daniel Trend Test. This method belongs to the class of non-parametric tests and uses the Spearman rank correlation coefficient to test the significance of a trend. We use the raw concentration data in the Daniel Trend Test calculation. More specifically,

$$ R_s = 1 - \frac{6 \sum_{i=1}^{N} D_i^2}{N^3 - N}, \qquad D_i = X_i - Y_i $$

where $R_s$ is the rank correlation coefficient; $D_i$ is the difference between $X_i$ and $Y_i$; $X_i$ is the rank (from smallest to largest, 1 to N) of the raw concentration value of the water sample; $Y_i$ is the rank given by the time sequence or by the spatial order of the sampling sites; and N is the number of sampling sites or the number of periods. The trend is aggravated (increasing) if $R_s$ is greater than zero, and the opposite if $R_s$ is less than zero. Furthermore, the trend is significant if the absolute value of $R_s$ exceeds $W_p$, the critical value of $R_s$ (Table 2 and Table 3).
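The rank correlation used by the Daniel Trend Test can be sketched as follows. The function names, the sample concentrations and the critical value passed to `describe_trend` are hypothetical and only illustrate the calculation; the sketch also assumes that no two concentration values are tied, and the real critical value W_p must be read from the appropriate table for the given N.

```python
def daniel_rs(values):
    """Spearman rank correlation between a series and its own order.

    `values` are raw concentrations listed in time order (or in
    upstream-to-downstream order). Assumes there are no tied values.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    # X_i: rank of each concentration, 1 = smallest.
    order = sorted(range(n), key=lambda i: values[i])
    x_rank = [0] * n
    for rank, idx in enumerate(order, start=1):
        x_rank[idx] = rank
    # Y_i: position in the time or spatial sequence, 1..N.
    d_squared = sum((x_rank[i] - (i + 1)) ** 2 for i in range(n))
    return 1.0 - 6.0 * d_squared / (n ** 3 - n)


def describe_trend(rs, w_p):
    """Return (direction, significant) for a coefficient and critical value."""
    if rs > 0:
        direction = "increasing"
    elif rs < 0:
        direction = "decreasing"
    else:
        direction = "no trend"
    return direction, abs(rs) > w_p


if __name__ == "__main__":
    # HYPOTHETICAL concentrations (mg/L) from seven consecutive surveys.
    tn_series = [1.2, 1.5, 1.1, 2.0, 2.4, 2.2, 3.0]
    rs = daniel_rs(tn_series)
    # 0.7 is a PLACEHOLDER critical value, not taken from the paper.
    print(round(rs, 3), describe_trend(rs, w_p=0.7))
```

The same function applies to a spatial series: passing the concentrations of one survey ordered from the most upstream to the most downstream site yields the spatial coefficient.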
Multivariate Statistics Method. Hierarchical Cluster Analysis is a widely used cluster analysis method [30-33]. In the present study we apply it with Ward's method, using the squared Euclidean distance as the measure of similarity. Discriminant Analysis is performed on each data matrix, using the stepwise method to construct discriminant functions, in order to evaluate both the spatial and the temporal variations in river water quality in the basin. Both temporal and spatial cluster analyses were carried out in our study.
Usually, the raw data used for Cluster Analysis should be transformed in order to eliminate the influence of different variable dimensions. However, there is no need to transform the data if WPI data are used in the analysis instead of raw concentration data, because the WPI is dimensionless and allows for comparison among different water quality parameters. The data used for the temporal Hierarchical Cluster Analysis are the mean WPI values of all water samples in each field survey, and the data used for the spatial Hierarchical Cluster Analysis are the mean WPI values within each cluster generated by the temporal Hierarchical Cluster Analysis.
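The clustering step can be illustrated with a small pure-Python sketch of agglomerative clustering under Ward's criterion. The study itself used SPSS, so this is not the authors' procedure. The function name `ward_clusters` and the mean WPI values below are hypothetical, invented only to show how surveys with similar WPI profiles end up in the same cluster.

```python
from itertools import combinations


def ward_clusters(points, n_clusters):
    """Merge observations until `n_clusters` groups remain.

    At each step the pair of clusters whose merge gives the smallest
    increase in within-cluster sum of squares (Ward's criterion, based on
    squared Euclidean distance between centroids) is merged.
    Returns a list of clusters, each a sorted list of row indices.
    """
    clusters = [[i] for i in range(len(points))]
    dim = len(points[0])

    def centroid(members):
        return [sum(points[m][d] for m in members) / len(members)
                for d in range(dim)]

    def merge_cost(a, b):
        ca, cb = centroid(a), centroid(b)
        sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * sq_dist

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: merge_cost(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    months = ["Mar", "Apr", "May", "Jun", "Jul", "Sep", "Oct"]
    # HYPOTHETICAL mean WPI per survey, columns: TN, TP, CODcr, NH3-N.
    mean_wpi = [
        [60, 40, 25, 30], [62, 42, 24, 31], [120, 45, 26, 33],
        [125, 44, 27, 35], [118, 46, 25, 34], [122, 43, 26, 32],
        [65, 41, 25, 30],
    ]
    for group in ward_clusters(mean_wpi, n_clusters=2):
        print([months[i] for i in group])
```

Since the WPI is dimensionless, the rows are passed to the clustering without any standardisation, which mirrors the reasoning in the paragraph above.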
All statistical analyses were performed using SPSS 18.0 for Windows and Excel 2010 for Windows. Latitude and longitude were measured using a hand-held GPS (MAGELLAN eXplorist 500). The Shanchong river basin boundaries were delineated on a 30-m spatial resolution digital elevation model (DEM); the stream network was represented using the Soil and Water Assessment Tool (SWAT) model in the ArcGIS 9.3 Desktop GIS software.

Results and Discussion
Temporal variations trend
Fig. 2 shows the temporal variation of the WPI at each sampling site. A key finding is that the WPI of TN is almost always the highest at all sites in all field surveys. This indicates that nitrogen pollution is more serious than phosphorus pollution in the Shanchong river basin. The reasons may be the soil phosphorus content, excess fertilization, and the proportion of nitrogen fertilizer to phosphorus fertilizer used in our research area. The research of B. Lars et al. showed that the leaching of nitrate increased sharply when the use of nitrogen fertilizer exceeded 100 kg/ha [40]. About 190 kg/ha of nitrogen fertilizer is applied in Yuxi district, where the Shanchong river basin is located [41]. Therefore, the leaching of nitrate may increase substantially in our research area. Another long-term study showed that the leaching of phosphorus increased linearly when the available phosphorus content of the soil exceeded 60 mg/kg soil [42]. The leaching of phosphorus was therefore moderate in our research area, because the available phosphorus content of the soil in this area is between 1.6 and 40.7 mg/kg soil [37]. According to the Yunnan statistical yearbook 2013 [41], the proportion of nitrogen fertilizer to phosphorus fertilizer is 5:1 in Yuxi district [37]; this may be another reason why nitrogen pollution exceeds phosphorus pollution in our study area.
The Daniel Trend Test shows that the temporal variation at all sampling sites is clear. Table 2 shows that the WPI of most monitoring parameters increased (R_s > 0) from March to October. Only a few decreased (R_s < 0): CODcr (2# and 10#), NH3-N (8#), TN (4# and 8#) and TP (3#). This shows that pollution is mainly aggravated from the dry to the rainy season. The reason is that rainfall in the rainy season accounts for 94.8% of the rainfall throughout the year (according to the China daily grid precipitation dataset provided by the China Meteorological Data Sharing Service System, http://cdc.cma.gov.cn), and the rainfall erodes farmland soil and carries pollutants containing nitrogen and phosphorus into the river.
Hierarchical Cluster Analysis of the temporal variations yields a dendrogram (Fig. 3A) in which the 7 months are grouped into two clusters. Cluster 1 (the first period) includes May, June, July and September, which corresponds to the rainy season. Cluster 2 (the second period) includes March, April and October, which corresponds to the dry season, with the exception of October. Although October belongs to the rainy season, most of the pollutants have been washed away by the earlier runoff, which reduces contamination in October. Therefore, the 7 months are divided into two different clusters (rainy season and dry season). This temporal pattern of water quality is reasonable because of the obvious difference between the rainy season and the dry season, and because the water quality in the present study area is mainly affected by non-point source pollution.
Discriminant Analysis is used to further evaluate the temporal variations in water quality. Discriminant Analysis of the temporal variation is performed after the whole data set is divided into two seasonal groups (rainy season and dry season). The classification matrices (CMs) and discriminant functions (DFs) obtained from the stepwise method are shown in Table 4. The stepwise-mode DFs use 2 discriminant variables and yield CMs that assign 100% of the cases correctly. Thus, the Discriminant Analysis results for the temporal variation suggest that TN and TP are the most significant parameters for discriminating between the two periods, which means that these two parameters account for most of the expected temporal variations in the river water quality (Table 4). Box and whisker plots of all the parameters showing the temporal variation are given in Fig. 4A. The WPI of TN was clearly higher in the rainy season than in the dry season, whereas TP, NH3-N and CODcr differed little between the seasons in the present study area.

Spatial variations trend
The spatial variation of the WPI in each field survey from upstream to downstream along the river is shown in Fig. 5. The data used to analyse the spatial variation of the WPI come from sampling sites 1#-12#, which are all on the Shanchong river. A behavior similar to that described for the temporal variation was observed again: the WPI of TN was almost always the highest after May, whereas the WPI of CODcr was always the lowest compared to the other pollution parameters. It is not easy to see a clear increasing or decreasing trend in the WPI values, especially in March and April. The Daniel Trend Test provides a quantitative evaluation of the trend. According to Table 3, the spatial variation in each field survey is clear, in the sense that the WPI of most monitoring parameters increased (R_s > 0) from upstream to downstream. Only a few decreased (R_s < 0): TP (March, April and July) and CODcr (March, June, September and October). This indicates that pollution is mainly aggravated from upstream to downstream. The reason may be that rainfall erodes pollutants from farmland soil into the river while contaminants also leach into the ground water; both eventually affect the downstream reach and aggravate pollution from upstream to downstream.
Hierarchical Cluster Analysis is used on the spatial variations to detect similarities between the sampling sites. We apply the spatial Hierarchical Cluster Analysis to the rainy season and the dry season separately. This analysis yields dendrograms (Fig. 3B and Fig. 3C) that group all 14 sampling sites of the basin into three statistically significant clusters in the rainy season (Fig. 3C) and two statistically significant clusters in the dry season (Fig. 3B).
In the dry season, cluster 1 (1#-6# and 8#-11#) corresponds to relatively less polluted sites, of which five stations (1#-5#) are situated upstream and 6# and 8#-11# are situated downstream. This grouping suggests the assimilative capacity and self-purification of the river. Cluster 2 (7# and 12#-14#) corresponds to relatively highly polluted sites, which are situated in the middle of the river and receive pollution mostly from domestic wastewater.
In the rainy season, cluster 1 (1#-5#) corresponds to relatively less polluted sites situated upstream around the Shanchong river reservoir, where there are few people and little farmland. Cluster 2 (6#-12# and 14#) corresponds to relatively moderately polluted sites; these stations receive pollution from agricultural activities, which are a typical non-point source. Lastly, cluster 3 (only 13#) corresponds to a relatively highly polluted site; this station receives pollutants from domestic wastewater and non-point sources. Furthermore, we find that sites 10#, 11# and 12#, which are close to the outlet of the Shanchong river, generally have higher amounts of TN in all seasons. The reason may be the excess nitrogen fertilization and the non-point source pollution discussed above.
Spatial variations in water quality are further evaluated through Discriminant Analysis. Spatial Discriminant Analysis is performed after dividing the whole data set into the two major classes of less polluted and highly polluted sites obtained through Cluster Analysis of the dry season, and into the three major classes of less polluted, moderately polluted and highly polluted sites obtained through Cluster Analysis of the rainy season. The stepwise method provides discriminant functions (DFs) and classification matrices (CMs), which are shown in Table 4. The stepwise-mode DFs, using 2 discriminant variables (TN and NH3-N), yield CMs that assign 100% (dry season) and 92.9% (rainy season) of the cases correctly. Furthermore, the spatial Discriminant Analysis results suggest that TN and NH3-N are the most significant parameters for discriminating between the spatial groups in both the dry and the rainy season, which means that these two parameters account for most of the expected spatial variations in the river water quality (Table 4).
Box and whisker plots of all the parameters showing the spatial variations are given in Fig. 4B and Fig. 4C. The monitoring region is classified into highly polluted and less polluted sites in the dry season (Fig. 4B). The WPI values of TN, NH3-N and TP are clearly higher at highly polluted sites than at less polluted sites in the dry season. The WPI of CODcr is the lowest of all the parameters and differs little between the less polluted and the highly polluted sites.
Additionally, all sampling sites are classified into highly polluted, moderately polluted and less polluted sites in the rainy season. A decrease in the WPI of TN and NH3-N (Fig. 4C) from highly polluted to less polluted sites is observed in the rainy season. The WPI of TP at less polluted sites lies between that of the moderately polluted sites and that of the highly polluted sites. The WPI of CODcr differs only slightly among highly polluted, moderately polluted and less polluted sites. A possible reason for the TP pattern is the fish culture inside the Shanchong river reservoir, together with the fragile ecosystem around the reservoir basin, which is planted with many eucalyptus trees and has almost no shrubs or herbs, leaving the surface bare. Furthermore, the slope of the hills around the Shanchong river reservoir is greater than 45°. Together, these factors aggravate soil erosion. This may explain why phosphorus pollution is more serious at the less polluted sites than at the moderately polluted sites.

Conclusions
The primary results of this study can be summarized as follows: (1) The Hierarchical Cluster Analysis of the temporal variations generates two clusters according to their hydrological characteristics (rainy season and dry season). Moreover, TN and TP are the most significant parameters for discriminating between the two periods.
(2) The Hierarchical Cluster Analysis of the spatial variations groups all 14 sampling sites of the basin into three statistically significant clusters (highly polluted, moderately polluted and less polluted sites) in the rainy season and two statistically significant clusters (highly polluted and less polluted sites) in the dry season. Furthermore, TN and NH3-N are the most significant parameters for discriminating between the spatial groups in both the dry and the rainy season.
(3) Pollution is mainly aggravated from the dry to the rainy season, and the spatial trend from upstream to downstream is likewise one of aggravation.
(4) The WPI of TN is the highest of all pollution parameters, whereas that of CODcr is the lowest. The WPI of TN is clearly higher in the rainy season than in the dry season, whereas TP, NH3-N and CODcr differ little between the rainy season and the dry season. The possible reason is excess nitrogen fertilization and non-point source pollution.
(5) The WPI values of TN, NH3-N and TP are clearly higher at highly polluted sites (situated in the middle of the river and receiving pollution mostly from domestic wastewater) than at less polluted sites (situated upstream around the Shanchong river reservoir and downstream near the stream outlet) in the dry season. Moreover, a decrease in the WPI of TN and NH3-N from the highly polluted site (13#, situated in the middle of the river) to the less polluted sites (situated upstream around the Shanchong river reservoir) is observed in the rainy season. The WPI of TP at less polluted sites lies between that of the moderately polluted sites (situated downstream) and that of the highly polluted site. The possible reason is the fish culture and the aggravated soil erosion in the less polluted area.
(6) The main pollution factors are non-point sources from farming activities along the Shanchong river, soil erosion and fish culture in the Shanchong river reservoir area, and domestic sewage from rural residential areas.
(7) The Daniel Trend Test is usually used to analyse temporal variation trends. Because a linear spatial series (such as upstream to downstream along a river) has the same characteristics as a temporal series, in this study we use the Daniel Trend Test to evaluate both the temporal and the spatial variation trends of the water quality of the Shanchong river. This suggests that the method is a proper tool for evaluating linear spatial series data.
Our results suggest that water pollution prevention strategies for the Shanchong river basin should follow three directions. First, employ appropriate fertilizer formulas in farming to cut down non-point source pollution. Second, take soil and water conservation measures in the Shanchong reservoir area to prevent the deterioration of the water quality of the reservoir. Third, construct scattered sewage treatment systems in rural residential areas to purify sewage.