Water quality assessment and source identification of the Shuangji River (China) using multivariate statistical methods

Multivariate statistical techniques, including cluster analysis (CA), discriminant analysis (DA), principal component analysis (PCA) and factor analysis (FA), were used to evaluate temporal and spatial variations in water quality and to interpret a large and complex water quality dataset collected from the Shuangji River Basin. The dataset, which contained 19 parameters, was generated during a 2-year (2018–2020) monitoring programme at 14 sites (3192 observations) along the river. Hierarchical CA divided the twelve months into three periods and the fourteen sampling sites into three groups. In the temporal analysis, DA identified four parameters (CODMn, Cu, As and Se) that gave more than 68% correct assignations, while in the spatial analysis seven parameters (COD, TP, CODMn, F, LAS, Cu and Cd) gave 93% correct assignations. FA/PCA identified six factors that explained 68% of the total variance of the dataset, allowing the selected parameters to be grouped by common characteristics and the overall change in each group to be assessed. This study demonstrates the necessity and practicality of multivariate statistical techniques for evaluating and interpreting large and complex datasets, with a view to obtaining better information about water quality and designing monitoring networks for the effective management of water resources.


Introduction
Water is the material basis for life on Earth, and water resources are a primary condition for the sustainable development of the earth's ecological environment [1]. As the consumption of water resources increases, the imbalance between supply and demand has intensified, placing greater demands on the utilization and protection of surface water resources [2].
The surface water quality of a region depends to a large extent on environmental factors (temperature changes, precipitation and soil erosion) and human input (discharge of municipal and industrial wastewater and over-exploitation of water resources) [3]. Among them, the discharge of urban sewage and industrial wastewater is a continuous source of pollution, so effective control of sewage discharge is of great significance to the improvement of water quality [4,5]. Surface water runoff is a seasonal phenomenon that is mainly affected by the climate of the basin [6]. In addition, seasonal changes in precipitation, surface runoff, interflow, groundwater flow, and pumping in and pumping out have a strong influence on the river flow and the subsequent pollutant concentration in the river [7]. Therefore, correct identification of potential sources of surface water quality pollution is the basis and prerequisite for water quality management.
The Shuangji River is a polluted river whose water comes mainly from urban sewage treatment plants and paper-making sewage treatment plants. It not only plays an important role in assimilating or removing urban and industrial wastewater and farmland runoff but is also the main inland water resource used for household, industrial and irrigation purposes [8]. Therefore, it is necessary to prevent and control river pollution and to have reliable water quality information for effective management. Given the spatial and temporal changes in river water chemistry, regular monitoring programmes are needed to reliably estimate water quality [9]. This leads to large and complex data matrices composed of many physical and chemical parameters, which are often difficult to interpret, making it challenging to draw meaningful conclusions [10].
Multivariate statistical analysis is a branch developed from classical statistics and is a comprehensive analysis method [11,12]. It can analyze the statistical laws of multiple objects and multiple indicators when they are related to each other, including cluster analysis (CA) [13], discriminant analysis (DA) [14], principal component analysis (PCA) [15] and factor analysis (FA) [16]. Multivariate statistical analysis is a suitable tool for multi-component chemical and physical measurements for meaningful data reduction and interpretation [17]. It is a valuable tool for identifying factors and sources that may affect water systems and cause changes in water quality [18].
In this article, we took the Shuangji River as the research object for the first time, set up 14 main monitoring points along the river, and measured 19 physical and chemical parameters in water samples over a period of 2 years. Different multivariate statistical techniques were used to analyse the resulting datasets, to assess the similarity or dissimilarity between monitoring periods and between monitoring points, to identify the water quality variables responsible for the spatial and temporal changes in river water quality, and to determine the influence of the sources (natural and anthropogenic factors) on water quality.

Study area
The Shuangji River (34°22′–34°30′ N, 113°13′–113°37′ E), a tributary of the Huai River, originates on the eastern side of Wuzhiling in northwestern Xinmi County and flows through 57 administrative villages in 8 towns in Xinmi County, running 57 km and draining the Xinmi River Basin, which has an area of 868 km² (Fig 1). The Weishui River (T1), Zhaoyangshui River (T2), Liquan River (T3), Yang River (T4), Ze River (T5) and Wu River (T6) are the main tributaries of the Shuangji River (Fig 1). Xinmi County, located in the eastern part of Henan Province, is surrounded by mountains on three sides. The terrain is high in the west and low in the east. It is a closed watershed with no external water supply. The average annual rainfall is 660 mm. The Shuangji River receives essentially no external runoff recharge, and the main body of the river is affected by domestic sewage and industrial wastewater. (The Shuangji River is an open river that anyone may study and use without a permit; therefore, no permit was required for this study.)

Sampling collection and pretreatment
The sampling network was designed to cover a wide range of key locations, including the tributaries and water inputs that have the greatest impact on the river [19], so that it reasonably represents the water quality of the river system (Fig 1). Sites T1, T2, T3, T4, T5 and T6 are on the main tributaries entering the Shuangji River. Sites M1, M2, M3, M4, M5, M6, M7 and M8 are key points on the main stem of the Shuangji River; site M8 is under municipal control and represents the outflow from Xinmi County. Sites M1-M3 and T1-T3 are closer to the urban area, where the main source of pollution is urban sewage entering the river. Sites M4, M5, T4 and T5 are located in the industrial zone, where the paper industry is relatively developed; as a result, the pollution there is mainly due to paper industry wastewater entering the river. Sites M6-M8 are in the downstream reach and receive no inflow of external water.
The dataset included 19 water quality parameters that were monitored by sampling at 14 monitoring points for 2 years (2018-2020). The parameters monitored in this study were pH, dissolved oxygen (DO), chemical oxygen demand (COD), ammonia-nitrogen (NH3-N), total phosphorus (TP), permanganate index (CODMn), fluoride (F), petroleum hydrocarbons (oil), linear alkylbenzene sulfonates (LAS), copper (Cu), zinc (Zn), cadmium (Cd), arsenic (As), mercury (Hg), hexavalent chromium (Cr6+), total cyanides (CN), volatile phenols (VP), sulphide (S) and selenium (Se). Water samples were collected with a standard open barrel sampler (1.5 litre capacity), which can take samples from different water depths to ensure the representativeness of the data. Before sampling, each 2 litre polyethylene bottle was washed with metal-free soap, rinsed several times with distilled water, soaked in 10% nitric acid for 24 hours, and finally rinsed with ultrapure water. All water samples were stored in an insulated cooler, kept refrigerated at 4 °C, and delivered to the laboratory for analysis on the day of collection.

Analytical procedure
The water quality parameters, analytical units and analytical methods are summarized in Table 1. The pH and DO of each water sample were determined on site using a digital pH meter (JY-PH6.0) and a DO meter (YT-RJY). Water samples of approximately 1000 mL were taken at each sampling point in the field and filtered through a polycarbonate filter (0.45 μm pore size). Each sample was then divided into two parts. One part was used directly for the analysis of physical and chemical parameters and anions, while the other part was first treated with 2 mL of concentrated HNO3 before metal analysis. All samples were analysed within 48 hours. COD was measured by the dichromate reflux method (DH310C1COD) [20], and NH3-N was measured with Nessler's reagent (NH3N-1040) [21]. TP was measured by ammonium molybdate spectrophotometry (HM-812) [22], and CODMn was measured by the permanganate index method (Thermo Scientific 3131) [23]. Fluoride (F) was measured using an ion-selective electrode (BHF5300) [24], total cyanides were analysed using pyridine-barbituric acid spectrophotometry (TCN-508) [25], and sulphide (S) was determined using methylene blue spectrophotometry (ST201A) [26]. Petroleum hydrocarbons (oil) were measured using infrared spectrophotometry (GC1290) [27]. Linear alkylbenzene sulfonates (LAS) were measured using methylene blue spectrophotometry (UltiMate3000) [28], and volatile phenols (VP) were determined spectrophotometrically with 4-aminoantipyrine (BELL) [29].
The major cations were determined after a 20-fold dilution of the acid-treated water samples with ultrapure water. For the trace elements and toxic elements, the volume of the water samples was reduced by a factor of four at 60 °C on an electric hot plate. Cu and Zn were determined by flame atomic absorption spectrometry (FAAS) using an ethane-air flame (CAAM-2001N) [30], while Hg was determined by cold-vapor atomic absorption spectrometry (CVAAS) (Ultima Expert) [31]. Cd and Cr were measured by electrothermal atomic absorption spectrometry (ETAAS) (Avio 200) [32], while As and Se were analysed by hydride generation atomic absorption spectrometry (HGAAS) (AA-6033C) [33]. The accuracy of the analytical data was ensured by triplicate samples, blank controls and careful standardization. The ion balance of each sample was within ±5%.

Data treatment and multivariate statistical methods
Although water sampling was scheduled every month at all sites, some points could not be sampled because of the COVID-19 pandemic and bad weather, and the missing data were replaced by the average value. The basic statistics of the two-year water quality dataset (3192 observations) are shown in Table 2. Multivariate statistical analysis generally assumes that the data are normally distributed; therefore, before the multivariate statistical analysis, each variable was tested for conformity to the normal distribution by analysing its skewness and kurtosis statistics. The test results showed that all parameters followed or were close to the normal distribution. The ranges of skewness and kurtosis were -0.45 to 0.91 and -0.97 to 0.53, respectively. For CA and PCA, to account for the differences in magnitude and measurement units among the water quality indicators, all selected parameters were also z-scale standardized to mean = 0 and variance = 1.

PLOS ONE
In this study, all data were analysed with several multivariate statistical techniques to explore the parameters that caused changes in water quality at different temporal and spatial scales [34]. Effective pollution control and water management require that a large amount of water quality data be interpreted and that reliable water quality information be available. The river water quality datasets were analysed by CA, DA, PCA and FA. CA, PCA and FA were applied to data normalized by z-scale transformation to avoid misclassification due to large differences in data dimensionality, whereas DA was applied to the original data. All mathematical and statistical calculations were performed using Excel 2010, IBM SPSS Statistics 26.0 and Statistica 12 [35-37].
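The normality screening and z-scale standardization described above can be sketched in a few lines. This is a minimal illustration, not the authors' SPSS workflow: it uses simple moment-based estimators of skewness and excess kurtosis (statistical packages often apply small-sample corrections), and the COD readings are hypothetical values chosen only for the example.

```python
import math

def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)

def zscore(values):
    """Standardize a variable to mean 0 and variance 1."""
    mean = sum(values) / len(values)
    sd = math.sqrt(central_moment(values, 2))
    return [(v - mean) / sd for v in values]

def skewness(values):
    """Moment-based skewness; 0 for a symmetric distribution."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based excess kurtosis; 0 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

# Hypothetical COD readings (mg/L), for illustration only.
cod = [18.0, 22.0, 25.0, 21.0, 30.0, 27.0, 19.0, 24.0]
cod_z = zscore(cod)
print("skewness:", round(skewness(cod), 2))
print("excess kurtosis:", round(excess_kurtosis(cod), 2))
```

Standardizing each parameter this way puts quantities measured in different units (for example mg/L of COD and pH units) on a common scale before CA and PCA.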

Cluster analysis
Cluster analysis (CA) is a multivariate statistical method for classifying objects according to their distance or proximity [38]. Objects are grouped into categories, or clusters, on the basis of their similarities or differences [39]. The hierarchical CA adopted in this paper is the most widely used clustering approach. It joins the closest or most similar objects into clusters through successive aggregation and then groups these clusters into progressively larger clusters. The Euclidean distance usually indicates how similar two samples are, and this "distance" can be expressed as the "difference" between the analytical values of the two samples [40]. In this study, hierarchical agglomerative CA was performed on the normalized dataset using Ward's method with the squared Euclidean distance as the measure of similarity. Ward's method uses an analysis of variance approach to evaluate the distances between clusters, minimizing the sum of squares of any two clusters that can be formed at each step. CA was applied to the river water quality dataset to group the sampling periods and sampling sites by similarity, producing a temporal and a spatial dendrogram. The dendrogram provides a visual summary of the clustering process, showing each group and its proximity to the others, while greatly reducing the dimensionality of the original data. The linkage distance is reported as Dlink/Dmax, that is, the linkage distance for a particular case divided by the maximum distance and multiplied by 100, which standardizes the linkage distance shown on the y-axis [41].
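A minimal pure-Python sketch of hierarchical agglomerative clustering with Ward's criterion on squared Euclidean distances, followed by the (Dlink/Dmax) × 100 rescaling, is given below. It is not the Statistica or SPSS implementation used in the study; the Lance-Williams update is one standard way to compute Ward distances, and the six two-parameter points are hypothetical standardized values.

```python
def ward_linkage(points):
    """Return the merge history as (members_a, members_b, linkage_distance)."""
    clusters = {i: [i] for i in range(len(points))}
    # Start from squared Euclidean distances between individual points.
    dist = {(i, j): sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            for i in clusters for j in clusters if i < j}
    merges, next_id = [], len(points)
    while len(clusters) > 1:
        (i, j), d_ij = min(dist.items(), key=lambda kv: kv[1])
        ni, nj = len(clusters[i]), len(clusters[j])
        new_dist = {}
        for k in clusters:
            if k in (i, j):
                continue
            nk = len(clusters[k])
            d_ki = dist[(min(k, i), max(k, i))]
            d_kj = dist[(min(k, j), max(k, j))]
            # Lance-Williams update for Ward's minimum-variance criterion.
            new_dist[(k, next_id)] = ((ni + nk) * d_ki + (nj + nk) * d_kj
                                      - nk * d_ij) / (ni + nj + nk)
        merges.append((clusters[i], clusters[j], d_ij))
        dist = {p: d for p, d in dist.items() if i not in p and j not in p}
        dist.update(new_dist)
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        next_id += 1
    return merges

def scaled_heights(merges):
    """Linkage distances expressed as (Dlink/Dmax) * 100."""
    d_max = max(d for _, _, d in merges)
    return [100.0 * d / d_max for _, _, d in merges]

def cut_tree(merges, n_points, threshold):
    """Clusters obtained by keeping merges with (Dlink/Dmax) * 100 < threshold."""
    d_max = max(d for _, _, d in merges)
    groups = [{i} for i in range(n_points)]
    for a, b, d in merges:
        if 100.0 * d / d_max < threshold:
            merged = set(a) | set(b)
            groups = [g for g in groups if not g <= merged] + [merged]
    return sorted(sorted(g) for g in groups)

# Hypothetical z-scored values of two parameters at six sites.
points = [(-1.2, -1.0), (-1.0, -1.1), (0.1, 0.0),
          (0.2, 0.1), (1.0, 1.1), (1.1, 0.9)]
merges = ward_linkage(points)
print(cut_tree(merges, len(points), 20))
```

Because the heights are divided by the largest linkage distance, the cut-off is independent of the absolute distance scale, which is why thresholds such as 70, 50 or 30 can be compared across dendrograms.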

Discriminant analysis
Discriminant analysis (DA) is used to analyse the differences between two or more naturally occurring groups [42]. When group membership is known in advance, it establishes a discriminant function and assigns observations to the known groups. If DA is effective for a dataset, the classification table of correct and incorrect assignments shows a high percentage of correct classifications. DA discriminates between groups on the basis of quantitative attributes and provides a statistical classification of samples whose groups, in this study, were obtained by CA. The DA technique establishes a discriminant function for each group, which operates on the original data [43]:

$$f(G_i) = k_i + \sum_{j=1}^{n} w_{ij}\,p_{ij}$$

where $i$ is the number of groups ($G$), $k_i$ is the constant inherent to each group, $n$ is the number of parameters used to classify a set of data into a given group, and $w_{ij}$ is the weight coefficient assigned by DA to a given selected parameter ($p_{ij}$). In this study, three groups were used for both the temporal (three seasons) and the spatial (three sampling areas) evaluations, and $n$ was the number of analytical parameters used to assign a measurement from a monitoring point to a group (season or monitoring area). The DA of the original data was run in standard, forward stepwise and backward stepwise modes to construct discriminant functions for evaluating the temporal and spatial changes in river water quality. The site (spatial) and season (temporal) were the grouping (dependent) variables, and all measured parameters were the independent variables.
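To illustrate how discriminant functions of the form f(G_i) = k_i + Σ_j w_ij p_j are used for classification, the sketch below scores one sample against three groups and assigns it to the group with the highest score. The constants and weights are hypothetical placeholders, not the coefficients from the paper's tables, and only two parameters are used to keep the arithmetic easy to follow.

```python
def discriminant_scores(constants, weights, sample):
    """Evaluate f(G_i) = k_i + sum_j w_ij * p_j for every group i."""
    return {
        group: constants[group]
        + sum(w * sample[name] for name, w in weights[group].items())
        for group in constants
    }

def assign_group(constants, weights, sample):
    """Assign the sample to the group with the largest discriminant score."""
    scores = discriminant_scores(constants, weights, sample)
    return max(scores, key=scores.get)

# Hypothetical coefficients for three periods (illustration only).
constants = {"period 1": -10.0, "period 2": -14.0, "period 3": -20.0}
weights = {
    "period 1": {"CODMn": 2.0, "Cu": 100.0},
    "period 2": {"CODMn": 3.0, "Cu": 50.0},
    "period 3": {"CODMn": 3.5, "Cu": 80.0},
}
sample = {"CODMn": 6.0, "Cu": 0.02}  # hypothetical measurement (mg/L)

scores = discriminant_scores(constants, weights, sample)
print(scores, "->", assign_group(constants, weights, sample))
```

The classification matrix reported by DA is obtained by repeating this assignment for every observation and comparing the assigned group with the group given by CA.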


Principal component analysis (PCA)/factor analysis (FA)
Principal component analysis (PCA) is a dimensionality reduction method. Its main purpose is to explain most of the variation in the original data with fewer variables by converting many highly correlated variables into independent, uncorrelated variables [44]. These new variables, the principal components, are fewer in number than the original variables, explain most of the variation in the data, and serve as comprehensive indices of the data. The basic idea of PCA is first to draw a "best" fitting line through n points such that the sum of squares of the perpendicular distances from the n points to the line is smallest; this line is called the first principal component [45]. The second principal component is then found as the line that is independent of the first and has the smallest sum of squared perpendicular distances from the n points. The procedure continues in the same way until m principal components are obtained, where m is usually chosen such that the variance of the first m principal components accounts for more than 85% of the total variance [46]. Factor analysis (FA) is a multivariate statistical method that uses a few latent random variables, the factors, to describe the covariance relationships among many variables [47]. In this paper, the factors were obtained by varimax (maximum variance) rotation, and each is a linear combination of the original variables [48]. The aim is to describe the original data as accurately as possible with the least loss of information and thereby reduce the dimensionality of the multivariate data. In general, only factors with eigenvalues greater than 1 are retained.
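The description above can be written compactly in the notation commonly used for these methods. The source gives no formulas, so the symbols below are assumptions introduced for illustration: $x_{kj}$ is the standardized value of variable $k$ in sample $j$, $p$ is the number of variables, $a_{ik}$ is the loading of variable $k$ on component $i$, $\lambda_i$ is the eigenvalue of component $i$, $F_{fj}$ is the score of factor $f$, $l_{kf}$ is the factor loading and $e_{kj}$ is the residual term.

```latex
% Score of principal component i for sample j
z_{ij} = a_{i1}x_{1j} + a_{i2}x_{2j} + \cdots + a_{ip}x_{pj}

% Share of the total variance explained by component i
% (for a correlation-matrix PCA the eigenvalues sum to p)
\text{explained}_i = \frac{\lambda_i}{\sum_{l=1}^{p}\lambda_l} = \frac{\lambda_i}{p}

% Factor model with m < p common factors
x_{kj} = \sum_{f=1}^{m} l_{kf}F_{fj} + e_{kj}
```

Under this convention, an eigenvalue greater than 1 means that the component explains more variance than a single standardized variable, which is the rationale for the eigenvalue > 1 retention rule.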

Temporal/spatial similarities and grouping
The dendrogram generated by the temporal cluster analysis divided the 12 months into three clusters at (Dlink/Dmax) × 100 < 70, and there were significant differences between the clusters (Fig 2). The first cluster (1st period) included June and July, corresponding to the high water flow period; the second cluster (2nd period) included August, September, October and November, corresponding to the average water flow period; the third cluster (3rd period) contained all the remaining months (December and January to May), corresponding to the low water flow period. Therefore, the temporal change in river water quality depends largely on local climatic conditions (spring, summer, autumn, winter) and hydrological conditions (low flow, average flow and high flow periods). The Shuangji River is thus a typical seasonal river of North China. Because most of the water in the Shuangji River is effluent from the sewage treatment plants along its banks, changes in water quality reflect changes in the treatment performance of those plants. In summer, the sewage treatment plants perform better, rainfall is high and the river flow is large, so the river water quality in summer is better and the summer months form one category. In winter, the effluent quality of the sewage treatment plants is poorer because of temperature and operating conditions, rainfall is low and the river flow is small. Therefore, the river water quality in winter is poor and the winter months form another category.
The spatial CA also generated a dendrogram with three clusters at (Dlink/Dmax) × 100 < 50 (Fig 3). Group A comprised M1 and M2; group B comprised T6 and M3 to M8; group C comprised T1 to T5. Fig 3 shows that the sites separated broadly into main-stem sites (M1 to M8) and tributary sites (T1 to T6). The tributaries of the Shuangji River are fed mainly by drainage from upstream coal mines and reservoirs. Compared with the main river, the tributary water is very clean, and the tributaries were therefore classified as one cluster. The main-stem sites were divided into three categories at (Dlink/Dmax) × 100 < 30. The first category included M1 and M2, which were in the highly polluted area. The second category included M3, M4 and M5, which were in the moderately polluted area. The third category was the low-pollution area in the lower reaches of the Shuangji River in Xinmi County. In the high-pollution area, the main sources of pollution were rural domestic sewage discharged directly into the river and urban sewage discharged after treatment by sewage treatment plants. As urban living standards have improved, urban domestic water consumption has exceeded the capacity of the sewage treatment plants, and because a new sewage treatment plant is still under construction, water quality in the upper Shuangji River is poor. The medium-pollution area was polluted mainly by industrial wastewater discharged into the river (as is also evident from Table 2). The main discharger was the paper industry, and the main pollution factor was COD. However, owing to recent strict national requirements for wastewater discharge, paper mill wastewater is now treated at the factory to meet the discharge standard and has not caused much river pollution, so this area is only moderately polluted. The low-pollution area receives no external pollution input. There the river is further purified by constructed wetlands and by its own self-purification capacity, which results in better water quality.

Temporal/spatial variations in river water quality
The temporal variation was evaluated using DA, with the groups defined by the CA clusters. DA tests the significance of the discriminant functions and identifies the most important variables associated with the differences between clusters. As shown in Table 3, the Wilks' lambda and chi-square values of the discriminant functions ranged from 0.273 to 0.808 and from 34.834 to 202.505, respectively. The p-level was lower than 0.01, indicating that the temporal DA was reliable and effective.
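Wilks' lambda and the chi-square statistic are linked: a smaller lambda means better group separation and a larger chi-square. One widely used conversion is Bartlett's approximation, sketched below. Whether the software used in the study applies exactly this form is an assumption, and so are the inputs in the example: N = 168 cases (14 sites × 12 months), 19 variables and 3 groups are illustrative guesses, not values stated in the paper.

```python
import math

def bartlett_chi_square(wilks_lambda, n_cases, n_variables, n_groups):
    """Bartlett's approximation: chi2 = -(N - 1 - (p + g) / 2) * ln(lambda)."""
    factor = n_cases - 1 - (n_variables + n_groups) / 2
    return -factor * math.log(wilks_lambda)

# Assumed inputs (see the lead-in); only the lambda values come from the text.
chi_low_lambda = bartlett_chi_square(0.273, n_cases=168, n_variables=19, n_groups=3)
chi_high_lambda = bartlett_chi_square(0.808, n_cases=168, n_variables=19, n_groups=3)
print(round(chi_low_lambda, 1), round(chi_high_lambda, 1))
```

Under these assumptions, the smallest reported lambda (0.273) gives a chi-square of about 202.5, close to the largest reported value, which is a consistency check rather than a reconstruction of Table 3.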
The discriminant functions (DFs) and classification matrices (CMs) obtained by the standard, forward stepwise and backward stepwise modes of DA are shown in Tables 4 and 5. Both the standard mode and the forward stepwise mode were able to achieve discriminant accuracy rates of 80%, using 19 and 15 factors, respectively. However, the backward stepwise mode used only four factors (CODMn, Cu, As and Se) to achieve a discriminant accuracy rate close to 70%. The temporal DA showed that the four factors CODMn, Cu, As and Se were the most important parameters to distinguish the three periods obtained by clustering and accounted for most of the expected temporal changes in water quality.
The four important parameters obtained by the backward stepwise discriminant analysis are presented as box plots. The results of the spatial DA were similar to those of CA. The Wilks' lambda and chi-square values of the discriminant functions ranged from 0.063 to 0.448 and from 129.962 to 432.084 (p < 0.01), respectively, indicating that the spatial discriminant analysis was credible and valid (Table 6).
The discriminant functions and classification matrices of the spatial DA were obtained in the same way as for the temporal DA, using the standard, forward stepwise and backward stepwise modes. The results are shown in Tables 7 and 8. The standard mode and the forward stepwise mode used 19 and 17 discriminant variables, respectively, and both achieved a discriminant accuracy of 97.62%. In the backward stepwise mode, however, DA used only 7 discriminant parameters to produce a discriminant accuracy of 92.86%, which indicated that COD, TP, CODMn, F, LAS, Cu and Cd were the most important parameters for spatial discrimination. Box and whisker plots of the discriminant parameters identified by DA are given in Fig 5. In all seven plots, the lowest mean value occurred in group C, because group C consisted of tributaries of the Shuangji River that are fed mainly by coal mines and reservoirs and therefore had clean water. The highest mean values of COD, TP, CODMn and LAS all occurred in group A, because group A represents the upper Shuangji River, where the main sources of pollution were the direct discharge of rural domestic sewage into the river and the effluent of urban sewage plants; this group had poor water quality. The highest mean values of F and COD occurred in group B, where the main source of pollution was industrial wastewater.

Principal component analysis/factor analysis
Before factor analysis, the Kaiser-Meyer-Olkin (KMO) and Bartlett's sphericity tests were performed to check the correlations and partial correlations between variables and to judge whether the data were suitable for factor analysis [49]. The KMO statistic ranges between 0 and 1 [50]. In practice, a KMO statistic above 0.7 indicates that the data are well suited to factor analysis. The KMO result was 0.755, and Bartlett's sphericity result was 1342.53 (p < 0.05), showing that PCA can play an effective role in reducing dimensionality. FA/PCA was applied to the standardized data to compare the composition patterns between water samples and to determine the important factors affecting each water sample [51]. The PCA of the whole dataset yielded six principal components (PCs) with eigenvalues > 1, which explained 68% of the total variance (Table 1). The first PC (29.7% of the total variance) was correlated (loading > 0.7) with COD, TP, Cu and VP. The third PC (9.2% of the total variance) was correlated (loading > 0.7) with LAS. However, the second, fourth, fifth and sixth PCs, although they accounted for 10.4%, 7.2%, 5.9% and 5.5% of the total variance, respectively, were not correlated (loading > 0.7) with any of the parameters. Taken together with the local industrial structure and distribution, the PCA pattern characterizes emissions from industries such as handmade paper manufacturing, coking, chemical raw materials and chemical products manufacturing, and metal products, which is consistent with the current main industries in Xinmi (Henan Province). A scree plot was used to determine the number of PCs to retain in order to understand the underlying data structure [52]. In this study, the scree plot (Fig 6) showed a significant change in slope after the sixth eigenvalue. The projection of an original variable onto the PC subspace is called the loading, which corresponds to the correlation coefficient between the PC and the variable.
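The reported variance shares can be checked against the eigenvalue > 1 rule with a few lines of arithmetic. The sketch assumes that the PCA was run on the correlation matrix of the 19 standardized parameters, so that the eigenvalues sum to 19 and each eigenvalue equals its variance share times 19; the percentages are the ones quoted in the text.

```python
N_VARIABLES = 19  # parameters in the dataset

# Variance explained by PC1..PC6 as quoted in the text (percent).
pc_variance_percent = [29.7, 10.4, 9.2, 7.2, 5.9, 5.5]

def eigenvalue_from_percent(percent, n_variables=N_VARIABLES):
    """Eigenvalue implied by a variance share in a correlation-matrix PCA."""
    return percent / 100.0 * n_variables

eigenvalues = [eigenvalue_from_percent(p) for p in pc_variance_percent]
cumulative_percent = sum(pc_variance_percent)

for index, (share, eig) in enumerate(zip(pc_variance_percent, eigenvalues), 1):
    print(f"PC{index}: {share:.1f}% of variance, implied eigenvalue {eig:.2f}")
print(f"cumulative: {cumulative_percent:.1f}%")
```

Under this assumption all six implied eigenvalues exceed 1, and the shares add up to about 68% of the total variance, in line with the figures reported for the six retained PCs.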
The rotation of the axes defined by PCA produces a new set of factors, each of which involves mainly a subset of the original variables with as little overlap as possible, so that the original variables are divided into reasonably independent groups [53]. Therefore, FA of the Shuangji River dataset further reduces the contribution of the less significant variables obtained from the PCA. Varimax rotation of the PCs yielded six varifactors (VFs) with eigenvalues > 1, which together explained approximately 68% of the total variance. After varimax rotation, the meaning of the PCs became clearer, and the participation of the original variables in each VF was more evident (Table 9). Liu et al. (2003) [54] classified the factor loadings as 'strong', 'moderate' and 'weak', corresponding to absolute loading values of > 0.75, 0.75-0.50 and 0.50-0.30, respectively. VF1 (17.7% of the total variance) had strong positive loadings on Cu and S and moderate positive loadings on NH3-N and TP, indicating pollution from mineral components and domestic sewage. This is because the rivers in Xinmi City are fed mainly by effluent from domestic sewage plants, so domestic sources dominate the pollution, while the coal and metal industries located upstream on the tributaries cause some heavy metal pollution. VF2 (12.6% of the total variance) had strong positive loadings on Cd and moderate positive loadings on Se and COD, indicating that the source was industrial wastewater. This is related to the main paper, metal products and steel casting industries along the Shuangji River. VF3 (10.7% of the total variance) had strong positive loadings on F and oil and moderate positive loadings on Cr6+, reflecting mainly fluoride and heavy metal pollution. This grouping indicates that the source of pollution was wastewater discharged by the chemical and metal manufacturing industries along the river. VF4, which explained 9.9% of the total variance, had moderate positive loadings on CODMn, NH3-N and Zn, indicating that the pollution was from mineral-related hydrochemistry and domestic sewage, mainly wastewater from paper-making enterprises, domestic sewage and metal manufacturing wastewater discharged into the river along its banks. VF5 (8.9% of the total variance) had moderate positive loadings on pH and LAS, which can be interpreted as originating from detergents and personal care products in domestic sewage. VF6, which explained 7.9% of the total variance, had moderate negative loadings on DO. This finding suggests that the pollutants in the water consumed a large amount of oxygen. The FA/PCA results indicated that most of the variation was attributable to soluble salts (natural) and organic pollutants (anthropogenic). FA/PCA indicated that the main pollutants are COD, CODMn, NH3-N, TP, Cu, Cr6+, Zn, S, Se, Cd, F, oil and LAS. These pollutants come mainly from domestic sewage (COD, CODMn, NH3-N and TP); papermaking wastewater (COD and CODMn); wastewater from the textile industry and chemical product manufacturing (F, oil and LAS); and wastewater from metal product manufacturing, optoelectronic device manufacturing and other industries (Cu, Cr6+, Zn, S, Se and Cd). FA can thus identify the parameters that contribute most to changes in river water quality. The approach of assessing spatiotemporal changes in water quality with FA/PCA has also been applied in earlier water quality evaluations.
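The loading classes attributed to Liu et al. (2003) translate directly into a small helper. The thresholds follow the text (strong: absolute loading > 0.75; moderate: 0.75 to 0.50; weak: 0.50 to 0.30); the label "negligible" for loadings below 0.30 and the example loadings are assumptions added for illustration and are not values from Table 9.

```python
def loading_strength(loading):
    """Classify a factor loading by its absolute value."""
    magnitude = abs(loading)
    if magnitude > 0.75:
        return "strong"
    if magnitude >= 0.50:
        return "moderate"
    if magnitude >= 0.30:
        return "weak"
    return "negligible"  # label assumed here; not part of the cited scheme

# Hypothetical loadings on one varifactor (illustration only).
example_loadings = {"Cu": 0.81, "NH3-N": 0.62, "Zn": 0.35, "DO": -0.58, "pH": 0.12}
for parameter, value in example_loadings.items():
    sign = "positive" if value >= 0 else "negative"
    print(f"{parameter}: {loading_strength(value)} {sign} loading")
```

The sign is reported separately because the classes refer to absolute values, which is how a moderate negative loading such as that of DO on VF6 is described.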

Conclusions
Water quality monitoring programmes generate complex multidimensional data that require multivariate statistical processing for analysis and interpretation. In this study, different multivariate statistical techniques were used to evaluate the spatial and temporal variations in the surface water quality of the Shuangji River. Cluster analysis (CA) divided the 12 months and the 14 sampling points into three categories each according to the similarity of their water quality characteristics and pollution. It provided an effective basis for classifying surface water in the study area and makes it possible to reduce the number of sampling points with little loss of information. Discriminant analysis (DA) gave the best results for both the spatial and the temporal analysis. It used only four parameters (CODMn, Cu, As and Se) to discriminate between the seasons with 68% accuracy (a 79% reduction in parameters) and only seven parameters (COD, TP, CODMn, F, LAS, Cu and Cd) to discriminate between the three areas with 93% accuracy (a 63% reduction in parameters). FA/PCA did not achieve a comparable data reduction, because it indicated that 7 parameters (37% of the original 19) were required to explain 68% of the data variability. However, the six VFs obtained from the PCA indicated that the quality parameters of the river water were divided mainly into natural (soluble salts) and anthropogenic (organic pollution) components. Therefore, multivariate statistical techniques are an excellent exploratory tool for analysing and interpreting complex water quality datasets and for understanding their temporal and spatial changes.
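The parameter-reduction percentages quoted in the conclusions follow from simple arithmetic on the 19 monitored parameters, as the short check below shows. The function names are illustrative; the counts (4, 7 and 19) come from the text.

```python
TOTAL_PARAMETERS = 19

def reduction_percent(n_kept, n_total=TOTAL_PARAMETERS):
    """Percentage of parameters that no longer need to be measured."""
    return 100.0 * (n_total - n_kept) / n_total

def share_percent(n_kept, n_total=TOTAL_PARAMETERS):
    """Percentage of the original parameters that are retained."""
    return 100.0 * n_kept / n_total

temporal_reduction = reduction_percent(4)  # CODMn, Cu, As, Se
spatial_reduction = reduction_percent(7)   # COD, TP, CODMn, F, LAS, Cu, Cd
pca_share = share_percent(7)

print(f"temporal DA: {temporal_reduction:.1f}% fewer parameters")
print(f"spatial DA: {spatial_reduction:.1f}% fewer parameters")
print(f"FA/PCA: {pca_share:.1f}% of parameters retained")
```

Rounded to whole numbers, these give the 79%, 63% and 37% figures used above.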