Temporal and spatial correlation patterns of air pollutants in Chinese cities

As a huge threat to the public health, China’s air pollution has attracted extensive attention and continues to grow in tandem with the economy. Although the real-time air quality report can be utilized to update our knowledge on air quality, questions about how pollutants evolve across time and how pollutants are spatially correlated still remain a puzzle. In view of this point, we adopt the PMFG network method to analyze the six pollutants’ hourly data in 350 Chinese cities in an attempt to find out how these pollutants are correlated temporally and spatially. In terms of time dimension, the results indicate that, except for O3, the pollutants have a common feature of the strong intraday patterns of which the daily variations are composed of two contraction periods and two expansion periods. Besides, all the time series of the six pollutants possess strong long-term correlations, and this temporal memory effect helps to explain why smoggy days are always followed by one after another. In terms of space dimension, the correlation structure shows that O3 is characterized by the highest spatial connections. The PMFGs reveal the relationship between this spatial correlation and provincial administrative divisions by filtering the hierarchical structure in the correlation matrix and refining the cliques as the tinny spatial clusters. Finally, we check the stability of the correlation structure and conclude that, except for PM10 and O3, the other pollutants have an overall stable correlation, and all pollutants have a slight trend to become more divergent in space. These results not only enhance our understanding of the air pollutants’ evolutionary process, but also shed lights on the application of complex network methods into geographic issues.


Introduction
Since 2012, the Chinese government has invested a huge amount of resources in establishing more than 1500 air pollution monitoring centers to dynamically record and publish the air quality index [1]. However, it still remains a challenge in effectively quantifying how these pollutants evolve across time and cities [2], how the occurrence of environmental pollution is temporally and spatially correlated. Nevertheless, measuring the temporal and spatial correlation patterns of air pollutants has a profound significance in understanding cities' connections PLOS  as well as pollutants' shifting patterns, thus providing a marvelous channel to analyze geographical, meteorological conditions as well as social-economic spillover effect and most importantly, curbing the air pollution with correct remedy [1,3]. To achieve this end, we resort to complex network methodologies and try to quantify the correlations from two aspects. When it comes to the time side, we resort to the fractal analysis to examine city's self-similarity of the six pollutants. As for the space side, we try to view these cities as scattered nodes and cities' cross correlations of the pollutants time series as the edges in a graph. We then work on extracting the hierarchy structures and refining small correlated groups (known as cliques) in the constructed graph using planar maximally filtered method. Finally, we test the stability of distance and correlation relationship to consolidate our previous analysis. Both correlations are of particular importance in enhancing our understanding of each pollutant's temporal and spatial patterns. This is one of the few papers trying to understand pollution's correlated patterns from the complex network perspective. Most papers about air pollution have two focuses: Air pollution's causes [4,5] and effects [6,7]. However, our starting point is different in that it serves to deepen the knowledge of each pollutant's evolutionary patterns. In this regard, [3] do similar work, they also analyze the pollutants' temporal distribution properties at the city level. Our intraday pattern results are partially consistent with theirs. However, their work is like a basic statistical mechanism analysis, which inspires us to deeply mine the latent information. [1] study the spatial oscillation patterns of six air pollutants. Their research attaches great importance on meteorological conditions and satellite observations, therefore providing reasonable explanations for some of pollutants' intraday patterns and long-term correlations. However, their work differs from ours in several aspects. First, they only analyze air pollution in eastern cities of China in winter, while our analysis covers all the cities of China and four seasons. Second, they successfully explain the air pollution from both geographical and meteorological conditions, while due to data availability we only research the pollution's correlation from geographical distance. Third, they are trying to unveil the spatial oscillation patterns, while our targets are the temporal and spatial correlation structures. In spite of these differences, some of their spatial oscillation patterns are well-identified in our research. Regional-scale temporospatial correlations of air pollutants in China have also been investigated intensively in recent years [8][9][10][11]. It is found that the spatial and temporal correlation structure is based on regional variations or part of pollutants' variations [8,11]. To some extent, the methods used among these papers are quite conventional. Compared to these papers, we transform spatial correlation into spatial network to quantitatively measure the spatial agglomeration or separation. As noted before, our research is the first time to cover almost every medium-sized Chinese cities, making the findings more general and complete.
This paper, on the other hand, is also one of the few trying to apply complex network methods into spatial correlation issues. Although these methods have achieved great success in many areas such as stock market clustering [12] and gene decoding [13], they still remain relatively new in pollution's geographic issues. The physical distances directly or indirectly affect the spatially embedded intercity correlations, making network's architecture radically different from that of random networks [14,15]. In this sense, this paper casts a new light on the application of network methods into the pollution's spatial correlation issues.

Data sets
We obtained the hourly pollutant data from Shanghai Qingyue Open Environmental Protection Data Center (QOEPDC), and the hourly data contain about 400 observed mainland cities and from 2015-01-01 to 2015-12-31. The six pollutants are PM 2.5 , PM 10 , CO, O 3 , NO 2 and SO 2 .
For pollutant k (k = 1, Á Á Á, 6 are PM 2.5 , PM 10 , CO, O 3 , NO 2 and SO 2 respectively), city i (i = 1, Á Á Á, 350)'s observed hourly time series x ðkÞ i;t spans the entirety of 2015. Ideally, each time series should have an identical length T = 365 × 24 = 8760 hrs. However, due to monitor station recording errors, the actual time series have different lengths less than 8760. In order to preserve data completeness and improve analysis accuracy, we select 350 cities which have more than 8000 observations. Before proceeding to the data analysis, we first check the data quality and find that zeros comprise less than 1% of the time series. These zeros obviously result from the recording errors and we replace them with the average of corresponding previous and next hour's concentrations. Same alterations are also done to a few extremely high and impossible values.

The temporal correlation structure
Hurst exponent is a critical variable to quantify whether the trends of air pollutants revert to the mean (low long-term correlations) or to the cluster (high long-range correlations) [16][17][18][19][20][21]. It's defined in terms of the asymptotic behavior of the rescaled range as a function of the time span of a time series, where R(n) is the range of the first n values, S(n) is the deviation, E[Á] is the expected value and C is a constant. Theoretically, the Hurst exponent H lies between 0 and 1, and is cut off by 0.5: When 0 < H < 0.5, the time series is switching between high and low values alternatively (mean-reverting process); while when 0.5 < H < 1, it is a long-term dependent process featuring the trend that high value is followed by another higher value. Quantifying the long-range correlation has multiple implications. First, Hurst exponent is an indicator of autocorrelation, which enables us to explore the fractal structure of the evolutionary process. Second, from the policy-making perspective, when one pollution event occurs, more serious pollution events are more likely to happen afterwards, and policy makers thus have a hint to restrict outdoor activities and accordingly fight against the pollution. Last, our estimation of pollutant's Hurst exponent not only enhances our knowledge about air condition's autocorrelation structure but also draws a complete picture about how the strength of autocorrelation of each pollutant in each place differs from the others. Considering the influence of intraday patterns and possible seasonal variations, we prepare three data sets: The raw data, the normalized data by dividing the hourly average where x ðkÞ i ðd; hÞ is pollutant k's concentration level in city i at h on day d, N s is the number of days in season S (S = 1, 2, 3, 4). Because r ðkÞ i ðd; hÞ is computed on the basis of each city, it automatically eliminates our concerns of the trend issues. In the following part, detrended moving average (DMA) algorithms [22][23][24][25][26][27][28][29][30] is applied to compute H. The basic idea for DMA algorithms is to remove the trend by considering the second order difference between original time series and its moving average function (detailed procedures can be seen in Refs. [27,28,31]).
To consolidate how the Hurst exponents are spatially heterogenous, we follow the scheme in Ref. [32] to calculate the spatial stratified heterogeneity. All cities' Hurst exponents H i s are stratified into h = 1, 2, Á Á Á, L stratum based on socioeconomic factor or geographical factor H i,h , and the spatial stratified heterogeneity is defined as where N h is the number of cities in stratum h and " H h is the average Hurst exponent of stratum h. As proven in Ref. [32], a test statistic is constructed as q follows a non-centered F distribution. In this paper, we clarify two kinds of spatial stratified heterogeneities. The first one is stratified by socioeconomic status, where the studied cities are partitioned into 6 groups based on their social economic ability (more information can be retrieved at http://www.stats.gov. cn/english/). The second one is based on geographical locations using the 31 administrative partitions.

The spatial correlation structure
The Pearson cross correlation is used to quantify the similarity between city i and city j for pollutant k and is defined as where " x ðkÞ i is the average concentration of pollutant k in city i. The air quality correlation matrix typically serves as a connection form between these investigated cities which can be viewed as a complex system with interactions and entangles. The correlation matrix has provided crucial information about the system structure. In recent years, the network method and graph theory that incorporate the correlation matrix have increasingly been used to study the complex system from the perspective that observed individual is as the node and the correlation is as the edge linking these individuals [12,33,34]. The correlation based upon clustering procedure allows us to dig into the hierarchical structure of the system [12,[35][36][37]. Generally, clustering practically will reduce the dimensions of the researched multivariate time series, and enable us to group the individuals according to the similarity. In this section, we will scrutinize the spatial patterns and cities' intra-cluster structure of each pollutant using the network clustering algorithms based on the correlation matrix.
Tumminello et al. proposed the correlation filtering algorithms by maximizing the planar structure and named it planar maximally filtered graph (PMFG, hereafter) [12]. Tumminello, Lillo and Mantegna have compared the several clustering procedures and concluded the PMFG as an extension of minimum spanning tree (MST) that allows loops and cliques in the graph to provide richer information about the correlation structure [36]. The construction procedure for correlation based upon PMFG is rather direct: Starting from the descending sorted list of pair wise correlations c i,j , then adding each link between the two cities i and j if and only if the resulting graph can still be embedded on the surface of genus g k after such insertion. The generated simple, undirected, connected graph will have the same hierarchical structure of the minimum spanning tree, but admit loops to retain more relevant information. in the same province with same color. Same-colored cities tend to be close in geographic distance, and this spatial affinity property is pervasive in the six pollutants. The PMFGs refine many small loops and connected structures known as cliques [12]. These cliques are viewed as small clusters of cities that share high correlations in pollutant's evolutionary dynamics. In the following part, we work on identifying and analyzing these cliques that embedded in the PMFGs.
We also conduct a moving window scheme to analyze how these cliques evolve across the whole year motivated by the fact that the air pollution across the whole country features in dynamics and rebalance resulting from a bunch of geographic, climatic as well as human behavior factors. This study acts as a robustness test over the time stability of refined cliques. To reduce the uncertainty of data length and mitigate the influence of outliers on our final results, we set the window length w = 720hr and move forward as m = 24hr in each step. Within each window, we compute the Pearson correlation matrix as ½x ðkÞ i;l À hx ðkÞ i i½x ðkÞ j;l À hx ðkÞ j i ð6Þ where d ðkÞ i is the standard deviation of city i's time series of pollutant k. In each window we obtain the corresponding correlation-based PMFG networks. respectively. To construct the network, we use the ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi ð2 À 2c i;j Þ q as a transformed notation for the correlation. https://doi.org/10.1371/journal.pone.0182724.g001

Intraday patterns
Understanding the time trend of each pollutant will give us a general view of each pollutant's evolutionary process and help us effectively detrend the time series so as to draw the real correlations [38]. It's natural to start from the intraday patterns due to the fact that the pollutants may be significantly influenced by diurnally cyclical temperature and illumination changes [39,40]. These intraday patterns showed in a daily periodical phenomenon have been found pervasive in natural sciences such as temperature variability [41,42], rainfall perception [43] and social sciences such as market trading activities [44][45][46], human mobilities [47]. In this section, we begin with parsimonious models to display the intraday patterns of the six pollutants. Let x (k) (d, h) denote the 350 cities' averaged concentration of pollutant k at the h-hour on day d. The normal definition of intraday patterns is as follows: P 24 h¼1 x ðkÞ ðd; hÞ is the average value of pollutant k on day d. This definition also takes into account the seasonal variation of pollutant concentration, but rescales the concentration with respect to the average value on each day.
These three definitions commonly present the intraday patterns but differ in relative magnitude. Eq (7) retains the original unit and magnitude, while Eqs (8) and (9) scale the raw data by dividing that day's maximal or mean concentration. Fig 2 shows the three defined intraday patterns by averaging the 350 observed cities' concentrations in each hour. Generally, each pollutant's averaged concentration time series is featured in cyclical patterns within one day. Except for O 3 , the other five pollutants' intraday patterns are composed of two contraction periods (from 12 AM to 5 AM and from 10 AM to 15 PM) and two expansion periods (from 5 AM to 10 AM and from 15 PM to 23 PM). These five pollutants' concentrations simultaneously hike to the peak level around 10 AM and then reduce to the lowest level around 15 PM. These fluctuations imply the "periodic" daily human activities because NO 2 and SO 2 mainly come from vehicles and coal combustion. However, O 3 's concentration continues reducing until 9 AM and then bounds to the peak level around 15 PM due to the photochemical reaction [48]. The peak time for O 3 is a trough time for the other five pollutants. After 15 PM O 3 is on the way to decline until midnight. The three definitions share almost identical intraday patterns and differ in relative magnitudes. Another conspicuous discrepancy lies in the relative volatility within one day. In Fig 2(c), the concentration of O 3 has the highest volatility, and NO 2 ranks the second while other four pollutants have only ±0.1 relative change around the mean level.
There are still two concerns about the robustness of air pollutants' intraday patterns. One is whether each individual city shares the same intraday trends as the aggregated does in Fig 2, the other is whether the aggregated intraday patterns are persistent during the four seasons. Fig 3(a) displays four typical cities' intraday patterns: Shanghai, Chongqiong, Shijiazhuang and Urumchi (the four cities are in different geographic areas and economic zones, also represent the four development levels of Chinese cities) and (b) averages the hourly level within each season. Both figures show a consistent framework with the previous studies. They specifically differ in relative magnitudes. For example, Fig 3(a) shows that the six pollutants in Shanghai generally fluctuate more steadily than other cities do and Shanghai also has a relatively low pollutants level. To a large extent, this is determined by Shanghai's location and its service-oriented economy. Shijiazhuang and Urumchi are both highly polluted cities, but the sources of pollutants in the two cities are quite different. Shijiazhuang's intensive heavy industry is the leading cause and Urumchi's location and climatic causes outweigh others. Fig 3(b) shows the intraday patterns across the four seasons are almost identical. The subtle difference resides in the minimum NO 2 level, which is a bit lower in summer and autumn than that of spring and winter. These findings, both at the city level and the season level, are quite consistent to Ref. [3]'s summarized results regardless of adopting different data sets and sample cities. These roughly-constructed but well-identified intraday patterns inspire us to scrutinize each pollutant's time series periodicity in a detailed way.

Lomb power analysis
In this section, we introduce the normalized Lomb power [49,50] to confirm the cyclic patterns that have been captured in Figs 2 and 3. Similar to Fourier transformation, Lomb power analysis works on converting the cyclic time series into frequency domain so as to obtain the periodical parameters. For evenly sampled time series, Lomb power is equivalent to conventional Fourier transformation spectrum analysis. For unevenly sampled time series, Lomb power analysis performs better by effectively mitigating the long-periodic noise caused by long gapped records [49]. The Lomb power P T (f) is defined as where x ðkÞ s ðs ¼ 1; Á Á Á ; 8760Þ is the averaged time series of pollutant k with size T = 8760, " x ðkÞ and σ (k) are the mean and standard deviation of the time series, and the time offset τ is determined by Obviously, the six time series share an almost identical peak power around f = 11.58 μHz and P T (f) = 68.31 dB/HZ, which equals to a period of 23.99 hrs and is corresponding to the diurnal pattern of the pollutants [50]. Except for O 3 , 2f is also a peak level for the Lomb power, and even higher than the first peak, which is explained by the intraday cycles noted before: Within one day, the evolutionary characteristic of the pollutants is viewed as two cycles, and the second peak is corresponding to such an approximately half day period. Another straightforward feature shown in Fig 4 is the evenly spaced harmonic peaks, they serve to consolidate the intraday patterns and these patterns can be safely decomposed into two contraction and two expansion periods. Moreover, this decomposition is both statistically significant and intuitively reasonable.
In this section, we have satisfactorily uncovered each pollutant's intraday patterns: From both city level and season level, two regimes within one day are identified. However, O 3 is an exception because of its asynchronous changing time with other pollutants. The Lomb power analysis enhances our understanding of this periodicity from power spectrum peaks. Practically, these intraday patterns recognized as important air quality evolutionary clues may be of great value in scheduling outdoor activities [51] and controlling air pollution [3].

Temporal correlation structure
We adopt the DMA method to estimate the Hurst exponent of each pollutant time series in each place. Then, we project these Hurst exponents into the Chinese map and also plot the histograms to illustrate their distributions. Fig 5 shows the raw data's Hurst exponents distributions both in geographical style (left panel) and histogram style (right panel). Generally, most of researched time series have Hurst exponents significantly higher than 0.50, which signals strong long-term correlations for air pollution. This provides solid evidence for the  The high long-term correlations of the pollutants (except for O 3 ) are partly consistent with the successive random dilution (SRD) explanation [53,54]. An initially concentrated pollutant will experience a random dilution and mixing process in the air, of which the process is lognormally distributed. As stressed in Ref. [54], the extreme concentration variability in time, with intensity peaks many times higher than the average, may be viewed as a consequence of a multiplicative dilution process. On the other hand, the air pollution time series possess high long-term correlations like other natural phenomenon such as river overflow and rainfall perception, since the periodic impact originated from human activity is bound to contribute some long-term correlations. In view of this, it's quite important to restrict the emissions once a pollution event happens, and sufficient time for the random dilution process will change the long-term structure of air pollutants.
As observed from Fig 5, two fine particulate matters share similar spatial and probabilistic distribution properties of Hs. To find out how the pollutants' Hs are correlated, we tabulate the correlation coefficients of any two pollutants' Hurst exponents in Table 1. The table shows that one pollutant's long-term correlation properties have a positive link with that of another pollutant, especially for the two fine particulate matters. This high correlation between pollutants stems from two possibilities: First, all the observed cities are commonly influenced by the wave of pollution with another wave of pollution following after, which results in positive correlations between the pollutants. Secondly, even if the pollution doesn't occur simultaneously, the regional and asynchronous pollution periods are linked with each other through some methods or driven by some common causes. For example, PM 2.5 and PM 10 have the highest correlation of all the pairs due to the similar source of the two pollutants. O 3 's Hurst exponents generally correlates weakly with other pollutants, which is consistent to the previous finding that the peak time of O 3 is the trough time of other pollutants and O 3 's distribution is quite different from others. In other words, this asynchronism reduces correlation between the O 3 's Hurst exponents and that of other pollutants. However, there are still some overlapped contraction and expansion periods between O 3 and other pollutants, which ensures the correlations are still positive. Except for above-mentioned pollutants, the other three pollutants' Hurst exponents maintain the correlation level ranging from 0.4 to 0.5, a moderate strength of positive correlation.
Inspired by [32,55], we measure the heterogeneity stratified by the social economic indicator (labeling each city 1-6 based on its socioeconomic development level) and the geographic indicator (using administrative partition as the indicator). The results show that the Hurst exponents are more spatially homogenous in terms of the socioeconomic partitions than geographic partitions. In other words, cities belong to the same development level tend to share similar long-term correlation structure in pollutant time series. This conclusion has several implications. First, cities within the same development level are more likely to share similar energy-intensive heavy industry structure [56]. Second, the geographical closeness means a similar air pollution mixing and diluting ability. Therefore, to curb the emission, this heterogeneity will inspire us to choose a cooperative model in an effective way. As noted before, Hurst exponent is a critical statistic to measure the long-term memories of the air pollutants. As other commonly used basic statistics, Hurst exponent could reflect the trend of the time series. Many papers have documented the potential relationship between Hurst exponent and some summary statistics [57]. Here we assess the connections between Hurst exponents and four basic statistics (mean, standard deviation, skewness and kurtosis) so as to consolidate air pollutants' temporal correlations. Table 2 reports mixed results about the relationship between Hurst exponents and the four basic statistics. The Hurst exponent is strongly correlated with the first and second moments of raw data, but if the pollution concentration's daily pattern is detrended (in Panel B and C), the correlation reduces to a very low level (even anti-correlation). Another interesting finding is the correlation between Hurst exponents and skewness or kurtosis. Skewness is a measure of the asymmetry of distribution and kurtosis measures the tailedness of the distribution. Except O 3 , the other pollutants show a negative relationship between Hurst exponents and the two statistics, in terms of the skewness. It can be interpreted as for more negative skewed pollutants distributions (more high pollutant levels than low levels), the time series tend to be higher in H due to higher likelihood of one polluted day is followed by another more polluted day. However, O 3 is an exception due to the commonly low levels of Hurst exponents and its disordered spatial distribution. As for the kurtosis, it's similar to the skewness in that most of the variance results from infrequent extreme deviations, thus leading to a lower Hurst exponent in pollutant level.
So far, we have concluded the pollutants' temporal characteristic as long-term correlation with variations existing between cities and pollutants. This long-term correlation trend is a reasonable explanation for the fact that the polluted days always recurred in clusters. By Spatial correlation structure and refined cliques Obviously the correlation spectrum are featured by clusters. As we arrange our city label sequence according to provincial administrative divisions, cities in the correlation spectrum agglomerate based on their short geographic distance or identical administrative division. These clusters reflect a similar evolutionary process between cities and this similarity is a integrated result of natural conditions and human activities. Generally, neighbouring cities are more likely to share similar meteorological and terrain conditions, as well as development levels. Therefore, the spectrum shows very straightforward squared clustering patterns. Another point for the overall correlation is they are all appreciably positive which shows the general comovement of the air pollution across the major cities in China [58]. Even so, the six pollutants still possess specific different correlation patterns: The overall average correlation of O 3 (" c ¼ 0:45 in Table 3) is much higher than the other pollutants, which implies that O 3 's evolutionary patterns all over Chinese cities is much homogenous. Second to O 3 , the NO 2 's averaged correlation is about 0.32 and the other three pollutants share nearly similar average correlations from 0.20 to 0.26. The unimodal distribution for the six correlations presents us the scenario that for the correlation structure, both independent and perfect correlation are unlikely to happen. The mode correlations varies from 0.25 to 0.50, when checking the lowest correlations, we find these low correlated areas are usually in Yunnan Province and Ningxia Province, of which the cities in these regions enjoy advantageous natural conditions that free them from the outside pollution. In addition, these cities have a relative low percentage of industrialization, preventing them from releasing excessive pollutants. Therefore, they take on an isolated effect from other areas in terms of the air pollution. Table 4 reports the percentages of 3-clique and 4-clique structures that distribute within 1 identical province or among 2-4 different provinces. It measures the diffusion power of the small similar structures. In the six pollutants, except SO 2 , about half of three highly correlated cities are just limited to one province and another 40% spreads to another province, very few of which (only around 10%) have the far reaching power to expand to the third province. Cities' SO 2 cliques, however, are more likely to stay in adjacent two provinces. The 4-clique has a similar pattern to the 3-clique: The percentage of 4-cliques distributed in 4 distinguished provinces comprises less than 10%, and most of the 4-cliques stay in the same province or their adjacent province. Cross-sectionally, PM 2.5 and PM 10 have the most 1-province cliques and SO 2 the least, which is consistent with the previous correlation spectrum that SO 2 is the least localized pollutant due to its source of fossil fuel combustion at power plants. The clique distribution, to some extent, resulting from integrated results of local emissions and global transmission. In this sense, the six pollutants can be sorted into 3 groups. First, particulate matters (PM 2.5 and PM 10 ) have the lowest transmission power. The most correlated community for these matters to transfer is within 2 provinces, considering that these matters mainly come from traffic emission and dust [59]. It's vital to restrict traffic emission and improve city green land area [9]. O 3 and NO 2 come as the second group in terms of the dispersion power. As noted before, sunlight is tightly associated with the two pollutants [48,60]. Hence, controlling the concentration of these two pollutants should mainly focus on its heavy industry emission locally. As for the other two pollutants, regional control is far from enough, interregional cooperation would be more effective than the local's effort. Table 5 tabulates the strongest correlated 3-clique and 4-clique of each pollutant. In panel A, the highest correlated 3-cliques of PM 10 , NO 2 and O 3 are in the same province, and the three cities in 3-clique of PM 2.5 and CO actually belong to the city group in Yangtze River Delta. Interestingly, the highest correlated 3-clique for SO 2 is composed of the three capital cities in Northeast China, which are viewed as historical industrial bases, as evinced that SO 2 is not a local pollutant and its variations are highly connected because of industrial activities. The extended 4-clique results of Table 5 panel B differ from 3-clique in the newly added city except for O 3 . This additional city surrounds the existing 3-clique geographically, setting the PM 2.5 for an example, the PMFG filters the Taicang, Kunshan and Shanghai as the highest strong 3-clique, and the added Changshu in 4-clique is very close to the previous three cities. These cities are the manufacturing centers and energy-intensive centers of the Yangtz River Delta. Moreover, the amount of private cars ranks high in China, resulting in a strongly correlated clique in terms of the most localized pollutants. The 4-clique for O 3 has been switching from Hunan Province to Liaoning Province rather than append another city on the basis of 3-clique, showing the unstable structure of O 3 's 3-clique, One possibility is that Liaoning Province has more industrial companies to accommodate more individuals when the correlated community expands. In Table 5, we also tabulate each strongest clique's averaged correlations (measured by the average of 3 pair-wise correlations in 3-clique and 6 pair-wise correlations in 4-clique) and the extension from 3-clique to 4-clique reduces the average correlation by about Table 4. Statistical properties of 3-clique and 4-clique structure. This table reports the summary statistical properties of the 3-clique and 4-clique subgraphs extracted from the correlation matrix based upon Planar Maximally Filtered Graph (PMFG). For each pollutant we report the number of the 3-clique and 4-clique subgraphs belongs to 1-4 administrative provinces respectively. Temporal and spatial correlation patterns of air pollutants in Chinese cities 0.02 for each pollutant. The last column displays the averaged disparity measure y i :

Pollutant 1 province 2 provinces 3 provinces 4 provinces
where s i = ∑ j 6 ¼ i, j 2 clique c ij . If the correlation is uniform across each intercity pair within the clique, 3-clique's y i = 1/2 and 4-clique's y i = 1/3. The last column shows an overall uniform correlations within each strongest correlated clique.
In this section, we resort to the administrative divisions as a rough measure of the cities' geographic distance, although the results that most highest correlated cliques are centered within one or two provinces are pretty straightforward. Critics may point out that two cities in different provinces are even closer than two cities in the same province. In an ongoing research, we are quantitatively measuring the spatial correlated structures with their mutual distances.

Cliques' wan and wax
The moving window scheme from Eq (6) shows how the percentages of the nodes in a clique belonging to one province, two provinces, three and four provinces evolve all over the whole 2015. Fig 7(a) show the 3-clique dynamic patterns. Among all the cliques, except the O 3 , the other five air pollutants' one province 3-clique percentages commonly have a declining trend, which acts as a strong signal for the anti-localization and diffusion of the pollutants. However, the percentages of cliques that belong to two different provinces are rather stable varying from 40% to 50%. And the reduced part in one province percentages flows into the three provinces percentages. Especially for the SO 2 , the higher correlated 3-clique is more pervasive in three different provinces, and up to 60% of all 3-cliques at the end of 2015. On the other hand, the localization of O 3 is quite straightforward: 45% of 3-cliques are in one province and another 45% are in two different provinces, leaving only 10% dispersion in three different provinces. When we extend the 3-clique to 4-clique in Fig 7(b), the scenario is quite different. O 3 's localization is not stable any more and the mutation period occurs around September. Before September, the percentage of 4-clique whose cities are in the same provinces stabilizes at around 40%. However, after September the 4-clique structures become more diversified. Another Temporal and spatial correlation patterns of air pollutants in Chinese cities evident breakpoint happens to PM 10 around late May. Before May, most of the 4-cliques pertain to same province or two provinces, indicating the strong local effect of PM 10 . However, after May this tide has been reversed to the extent in which three provinces and four provinces individuals dominate the 4-clique. The rest four pollutants are slightly decreasing the local tendency and increasing the diverging correlations. How climatic conditions and human activities influence the divergent trend needs to be evaluated further. One thing is for sure that the two breakpoints are both climate change points and industrial activity peak time in China [61]. To sum up, we are fine to conclude that the correlation structure of the pollutants are in a course of slightly divergent dispersion in space. This finding, to some extent, consolidates the hypothesis that Chinese air pollution's diffusion power is further reaching. Consisting with the previous static analysis, we plot the evolutions of the most strongly correlated cliques across each sliding window and find that even though different strongest cliques will occur in different windows, only limit to the several individual cities labeled in Fig 8. By and large, both Shanghai-centered and Beijing-centered city groups are ubiquitous in each pollutant's cliques. Although the two cities are service-oriented economy, this result is not surprising because there are thousands of state-run and private-owned manufacturing industries distributed along the two economic belts and the density of population is among the highest as shown in Fig 9, resulting in highly correlated spatial community of air pollutants in these cities. On the other hand, the policy implication is straightforward, collaborative management and joint monitoring will be more effective in controlling the pollution in these two areas. The 4-clique generally has boarder coverage than 3-clique and the connections labeled as solid Temporal and spatial correlation patterns of air pollutants in Chinese cities lines are denser. Moreover, the length of these edges connecting cities signals the exterior extension degree. In this sense, O 3 's strongest cliques display visibly localized properties, while strongest correlated cliques of CO and SO 2 even connect extremely western and eastern cities of China. The strongest correlated cliques' occurrence probability (measured by the number of clique's lasting days and labeled with five colors) reveals that the above-mentioned two city groups are more likely to be encompassed into these highly correlated cliques.
To gross all the stable strongest correlated cliques using cities evolution tree [62], we find in Fig 9 that five parts of China are of great significance in understanding the dynamic spatial structure. The Beijing-centered Jing-Jin-Ji belt and the Shanghai-centered Yangtze-River Delta as well as the Guangzhou-centered Pearl-River Delta are strongly correlated because of distributed intensive manufacturing firms [58]. The cities in Northeast China and Northwest China are strongly correlated, which is mainly caused by the large amount of pollutants emitted due to the heating supply in winter [1]. Moreover, particulate pollution in these parts is under the common influence of regionally accumulated pollutants due to the lack of strong winds.
In brief, the dynamic analysis above allows us to check the stability of both these cliques themselves and their relationship with geographical distances. Except one or two particular pollutants, PMFG filters these correlated cliques which is characterized by an overall stable but slightly divergent trend in space. In addition, the most strongly correlated cliques apparently center around some developed areas for rather a long period. These findings will encourage us to think about a cooperative management model to curb the localized cliques and decentralized method to control divergent pollutants such as SO 2 . Temporal and spatial correlation patterns of air pollutants in Chinese cities Temporal and spatial correlation patterns of air pollutants in Chinese cities

Discussion
Chinese air pollution arises rapidly along with the economic development and is torturing the whole country [58]. It requires tenacious determination to fight against the air pollution. This paper serves to deepen our understanding of the air pollutants evolutionary process from temporal and spatial correlations.
To begin with, we depict the intraday patterns for five pollutants in terms of two contraction periods and two expansion periods. O 3 , as an exception, shows its asynchronism. Moreover, this trend is pervasive all over Chinese cities and across all four seasons. The Lomb spectrum analysis proves that these intraday patterns are daily periodical across the whole year. These results help us overview the general structure of air pollutants' time series and lay foundations for the understanding of temporal and spatial correlations.
From the temporal side, most cities' air pollutants' time series show strong long-term correlations, which is consistent with the trend that smoggy days are always followed by one another. This finding can be partly explained by the successive random dilution model [53], where the air pollutants undergo a random dilution and mixing process and accumulate again, resulting in a multiplicative dilution process [54]. To further explore the spatial heterogeneity of these Hurst exponents stratified by socioeconomic and geographical indicators [32,55], we find that the Hurst exponents are more heterogenous partitioned by the geographical indicator, which shows that the long-term structures of air conditions are closer in cities at similar development levels. This finding also partially shows that human factors outweigh natural factors in determining the long-term trend of air pollution [54]. We also find two particulate matters share the similar temporal trends; other three pollutants (CO, NO 2 and SO 2 ) also behave similarly in the long-term correlations. This particularity of O 3 is largely due to its asynchronous changing process with other pollutants. The relationship between Hurst exponents and several basic statistics is also displayed, although the results are mixed, we capture a negative correlation between H and skewness or kurtosis.
The policy implication of the long-term structure is twofold. First, except for O 3 , the other five pollutants' long-term correlations inform us that weakening the multiplicative dilution and accumulation process of air pollutants requires a comprehensive set of actions based on an integrated approach to make substantial improvements [63]. Second, assessing the spatial stratified heterogeneity, the relationship between Hurst exponents and other statistics makes regional variations of pollutants' long-term structure clear, providing an empirical support in the prediction of pollutants' evolution [64].
On the spatial side, starting from the Pearson correlation structure, we are the first to cover almost every medium-sized Chinese cities to unveil the spatial correlations compared to previous researches [65,66]. It's shown that O 3 tops the six pollutants in terms of overall correlations. All the six correlation spectrum are featured with clusters. We then successfully refine small cliques in the correlation structure aided by PMFG [12]. These cliques reflect how the spatial similarity of the pollutants' evolutionary process looks like and show how the six pollutants disperse spatially. We confirm that neighbouring cities are more likely to form clusters [58]. Based on the spatial correlations of each pollutant, we classify the pollutants into three groups with increasing dispersion power: Particulate matters (PM 2.5 and PM 10 ), O 3 and NO 2 which merely spread to the third or forth provinces, SO 2 and CO which are easy to form cliques with cities far away. The tabulated highest correlated cliques show that manufacturing centers are more likely to form strong correlation structure [1,36,58]. These well-identified small cliques are of great value in understanding the pollution's spatial correlations [12,67].
Finally, we test the correlation's dynamic stability across the year through a moving window scheme. It is found that O 3 has breakpoints in both 3-clique and 4-clique around September, and PM 10 also shows its breakpoint around late May, while other pollutant present a general stable divergent and diffusive trend in spatiality. These two breakpoints can be partly explained by climate change points and industrial activity peak times in China [61]. The finding that the correlation structure of pollutants is slightly divergent serves a piece of solid evidence that air pollution in China is reaching further away, making the environmental issue severer [1,58].
Although these conclusions are carefully drawn and cautiously presented, we still have huge potential to improve. First, the causes of a decaying correlation between two cities are rather complicated, with the distance being one of the many factors. Other meteorological conditions such as wind largely account for this spatial correlation patterns. Modelling this spatial correlation with distance, possibly leads to a unilateral conclusive result. In the future study, we should try to collect other meteorological data to enhance the causes of spatial connections. Second, although we unveil the pollutants' spatial patterns from both static and dynamic analyses, the shifting patterns remain a puzzle. To achieve this end, two difficulties are ahead. The first is to introduce a diffusion model among so many cities and the second is to identify the correlation directly coming from pollutants' shifting rather than data noise. Anyway, it provides a promising direction for future research.