The Molecular Composition of Dissolved Organic Matter in Forest Soils as a Function of pH and Temperature

We examined the molecular composition of forest soil water during three different seasons at three different sites, using electrospray ionization Fourier transform ion cyclotron resonance mass spectrometry (ESI-FT-ICR-MS). We examined oxic soils and tested the hypothesis that pH and season correlate with the molecular composition of dissolved organic matter (DOM). We used molecular formulae and their relative intensity from ESI-FT-ICR-MS for statistical analysis. Applying unconstrained and constrained ordination methods, we observed that pH, dissolved organic carbon (DOC) concentration and season were the main factors correlating with DOM molecular composition. This result is consistent with a previous study where pH was a main driver of the molecular differences between DOM from oxic rivers and anoxic bog systems in the Yenisei River catchment. At a higher pH, the molecular formulae had a lower degree of unsaturation and oxygenation, lower molecular size and a higher abundance of nitrogen-containing compounds. These characteristics suggest a higher abundance of tannin connected to lower pH that possibly inhibited biological decomposition. Higher biological activity at a higher pH might also be related to the higher abundance of nitrogen-containing compounds. Comparing the seasons, we observed a decrease in unsaturation, molecular diversity and the number of nitrogen-containing compounds in the course of the year from March to November. Temperature possibly inhibited biological degradation during winter, which could cause the accumulation of a more diverse compound spectrum until the temperature increased again. Our findings suggest that the molecular composition of DOM in soil pore waters is dynamic and a function of ecosystem activity, pH and temperature.


Introduction
Dissolved organic matter (DOM) is an important component of the global carbon cycle [1], [2]. The journey of terrestrial DOM from plants to the ocean starts when DOM is leached from N, 10°27' 08" E, 440 m above sea level (a.s.l.)], Thuringia, central Germany. The unmanaged and deciduous forest is composed mainly of European beech (Fagus sylvatica, 65%) and ash (Fraxinus excelsior, 25%). The A horizon is 5 to 15 cm deep followed by a clayish T horizon on 50 to 60 cm deep fertile cambisol (clay loam). The experiments in the National Park were conducted with the permission of the National Park Administration.
The third site is in a stand of maple trees (Acer pseudoplatanus), planted in 1990 within a pine and spruce forest (Pinus sylvestris and Picea abies) near the village of Thann in southeast Germany (49°19' 24" N, 12°27' 37" E). The granite parent material is overlain by cambisol with a high sand content and a 5 to 7 cm organic layer on top [47]. No specific permissions were necessary to work at the Wetzstein and Thann sites and no protected species were involved.
The temperature data were continuously logged at Wetzstein and Hainich (NTC resistance thermometer type 107, Campbell Scientific with radiation protection and data logger type CR23X, Campbell Scientific) and summarized to provide the daily average temperature. The length of the growing season was determined by summing the number of days with daily average temperature > 5°C. For the Thann site this was determined using data from Weiden, the nearest weather station available from 'Deutscher Wetterdienst' at http://www.dwd.de.
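The growing-season criterion described above can be sketched in a few lines. This is a minimal illustration, not the authors' processing script: the function name, the input layout (one daily mean temperature per day, in chronological order) and the optional cut-off at a sampling day are assumptions made for the example.

```python
from typing import Optional, Sequence


def growing_season_length(
    daily_mean_temps_c: Sequence[float],
    threshold_c: float = 5.0,
    up_to_day: Optional[int] = None,
) -> int:
    """Count days whose daily average temperature exceeds the threshold.

    daily_mean_temps_c: daily averages in degrees Celsius, in chronological order.
    up_to_day: if given, only the first `up_to_day` days are counted
               (hypothetical convenience for evaluating a sampling date).
    """
    temps = daily_mean_temps_c if up_to_day is None else daily_mean_temps_c[:up_to_day]
    # The criterion is strictly "> 5 degrees C", so a day at exactly 5.0 does not count.
    return sum(1 for t in temps if t > threshold_c)


if __name__ == "__main__":
    example = [2.0, 5.0, 5.1, 7.3, -1.0, 6.0]  # invented values, not study data
    print(growing_season_length(example))
    print(growing_season_length(example, up_to_day=3))
```

Evaluating the count up to each sampling date gives one value per sample that can serve as an environmental variable in a constrained ordination.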
The water samples were taken using permanently installed glass ceramic suction plates (1-1.6 μm pore size) at 5 cm depth from the A horizon. We examined a total of 16 samples from three main sites in March, May and November 2005 (Table 1 and S1 Fig.). If only one sample was available from one designated spot per main site, we examined replicate measurements or used samples from the next deeper depth (10 cm).

Sample preparation and FT-ICR-MS analysis
After each sampling event, the bottles connected to the glass suction plates were evacuated to 20 kPa. Soil water was sucked into the bottle until ambient pressure was reached. Because of the biweekly sampling period for soil water, the DOM resulted from soil water collected over a period of two weeks (Table 1 and S2 Fig.). In the case of low sample volume, time points were combined. The samples were immediately freeze-dried in the lab and stored at room temperature in the dark. Prior to FT-ICR-MS analysis, the samples were redissolved in ultrapure water and desalted via solid phase extraction (Varian PPL; 1 g; [48]). The average extraction efficiency was 63% on a carbon basis. The DOC concentrate was diluted for FT-ICR-MS measurements with ultrapure water and methanol to 1:1 methanol/water (v/v) and a final concentration of 20 mg DOC/l. Acidified ultrapure water was stored in the same type of bottles as redissolved DOM samples and was used as a procedural blank for extraction and FT-ICR-MS measurements.
FT-ICR-MS measurements were performed at the University of Oldenburg (Germany). The samples were continuously infused into the ESI source at 120 μl/h and an ESI needle voltage of -4 kV. The FT-ICR-MS (Bruker Solarix, 15 Tesla) was used in negative ionization mode over m/z 150-2000 with an ion accumulation time of 0.25 s, and 500 single scans were added to one spectrum. The internal calibration of the spectra was carried out with an in-house mass reference list. The procedural blanks were removed from the peak list of the samples. C, H, O, N and S (N ≤ 2, S ≤ 1; [24], [36]) were considered for molecular formula assignment with an in-house algorithm. Only singly charged ions occurred in our mass spectrum. In total, we assigned formulae to 5003 of the 7778 peaks. Most of the unassigned masses are isotopologues of the identified formulae. To compare samples and exclude the peaks that were not measured significantly in all samples, we defined a limit of detection (LOD) that was applied to all samples (LOD_Group). For this purpose, the signals $I_{j,n}$ of each measurement $j$ with $n$ peaks were normalized to the sum of the intensities of each measurement, $\sum_n I_{j,n}$, resulting in normalized peak intensities $I_{j,n,\mathrm{Norm}} = I_{j,n} / \sum_n I_{j,n}$. For each measurement the lowest $I_{j,n,\mathrm{Norm}}$ was set as an individual limit of detection (LOD_Ind). LOD_Group was defined as the maximum of every LOD_Ind [29]. All peaks with a relative intensity < 2 × LOD_Group (limit of quantification) were set to 1.5 × LOD_Group to exclude variability between samples based on peaks near the LOD_Group [49]. Introducing LOD_Group resulted in a list of 1918 formulae for statistical analysis (S1 Table). These were based on 1062 to 1334 individual peaks per sample, of which 607 formulae were detected in all measurements from all samples. Therefore, 46 to 57% of all formulae per measurement occurred in all other measurements.
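The group limit of detection can be illustrated with a short sketch. It follows only the steps stated above (normalization to the summed intensity, the individual LOD as the lowest normalized intensity, the group LOD as the maximum of the individual LODs, and replacement of values below twice the group LOD by 1.5 times the group LOD). The function names and the dictionary layout are assumptions for the example. How peaks absent from a measurement were treated is not specified in the text and is not modeled here.

```python
from typing import Dict, List, Tuple

Spectrum = Dict[str, float]  # peak label (e.g. a molecular formula) -> intensity


def normalize(spectrum: Spectrum) -> Spectrum:
    """Divide each peak intensity by the summed intensity of the measurement."""
    total = sum(spectrum.values())
    return {peak: value / total for peak, value in spectrum.items()}


def apply_group_lod(
    measurements: List[Spectrum],
    loq_factor: float = 2.0,
    fill_factor: float = 1.5,
) -> Tuple[float, List[Spectrum]]:
    """Return the group LOD and the normalized spectra after the LOQ rule."""
    normalized = [normalize(m) for m in measurements]
    # Individual LOD: lowest normalized intensity within each measurement.
    lod_individual = [min(n.values()) for n in normalized]
    # Group LOD: the largest of the individual LODs.
    lod_group = max(lod_individual)
    loq = loq_factor * lod_group
    fill = fill_factor * lod_group
    # Peaks below the limit of quantification are set to a common value.
    adjusted = [
        {peak: (fill if value < loq else value) for peak, value in n.items()}
        for n in normalized
    ]
    return lod_group, adjusted


if __name__ == "__main__":
    # Invented intensities, not study data.
    demo = [{"a": 1.0, "b": 9.0}, {"a": 2.0, "b": 2.0, "c": 6.0}]
    lod, spectra = apply_group_lod(demo)
    print(lod, spectra)
```

Setting all sub-LOQ peaks to one shared value removes the sample-to-sample scatter of peaks close to the detection limit, which would otherwise add noise to the ordination.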
Peak reproducibility was determined with measurements of a deep sea DOM reference sample [50], [51] performed at the beginning and end of each of the two measurement days, resulting in four replicate measurements. These measurements were handled as a separate data set with its individual LOD_Group, resulting in 2045 to 2075 individual peaks with assigned molecular formulae for each measurement (S2 Table). According to [52], we determined a) the number of peaks shared between the four replicate measurements (1703), b) the proportion of peaks shared between the four replicates (83%), and c) the median of the relative standard deviation for the peak intensities of peaks found in all injections (15%). For data interpretation, we only considered compounds with assigned molecular formulae because we wanted to extract information with respect to DOM chemical characteristics. Isotopologues do not provide additional molecular information and were also disregarded from further consideration. From the molecular formulae we calculated the following parameters: number of carbon (C#), hydrogen (H#), oxygen (O#) and nitrogen (N#) atoms, hydrogen to carbon ratio (H/C), oxygen to carbon ratio (O/C), double bond equivalents (DBE), molecular weight (MW), double bond equivalents to carbon ratio (DBE/C), double bond equivalents to oxygen ratio (DBE/O), the difference between DBE and number of oxygen atoms (DBE-O), the aromaticity index (AI) [53] and the nominal oxidation state of carbon (NOSC) [24]. Whereas H/C indicates the amount of hydrogen saturation, DBE indicates the number of π bonds and rings in a compound. Additionally, high DBE/C values indicate aromatic or condensed aromatic structures and the AI can be used to unambiguously identify aromatic (AI > 0.5) and condensed aromatic (AI ≥ 0.67) compounds [53]. DBE-O is a rough measure for C=C bonds because it omits any possible C=O bond [23]. O/C describes the degree of oxygenation [30]. 
Expressing the average oxidation state of all carbons in one formula, NOSC provides information on the biogeochemical reactivity of a compound [24]. Van Krevelen diagrams are a convenient and common method to display an overview over all H/C and O/C ratios of a data set [26], [54]. From the location in the van Krevelen diagram, a given formula can be roughly assigned to classes of compounds such as carbohydrates, lipid, peptides, lignin, and tannins [16], [30]. It must be emphasized, however, that the assignment of structural compound groups is ambiguous, with the exception of aromatic and condensed aromatic structures.
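The formula-derived parameters can be computed directly from elemental counts. The sketch below is illustrative and uses commonly cited definitions for neutral CHONS formulae. The expressions for DBE, AI and NOSC are assumptions about the exact variants applied in this study, which are defined in the cited references [24], [53]. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Formula:
    """Elemental counts of a neutral CHONS molecular formula."""
    c: int
    h: int
    o: int = 0
    n: int = 0
    s: int = 0


def dbe(f: Formula) -> float:
    """Double bond equivalents: rings plus pi bonds."""
    return 1 + f.c - f.h / 2 + f.n / 2


def aromaticity_index(f: Formula) -> float:
    """AI in a commonly cited form (assumed variant); non-positive terms give 0."""
    numerator = 1 + f.c - f.o - f.s - 0.5 * f.h
    denominator = f.c - f.o - f.s - f.n
    if numerator <= 0 or denominator <= 0:
        return 0.0
    return numerator / denominator


def nosc(f: Formula) -> float:
    """Nominal oxidation state of carbon for an uncharged formula."""
    return 4 - (4 * f.c + f.h - 3 * f.n - 2 * f.o - 2 * f.s) / f.c


def descriptors(f: Formula) -> Dict[str, Optional[float]]:
    """Collect the ratios used to compare groups of formulae."""
    d = dbe(f)
    return {
        "H/C": f.h / f.c,
        "O/C": f.o / f.c,
        "DBE": d,
        "DBE/C": d / f.c,
        "DBE/O": d / f.o if f.o > 0 else None,  # undefined without oxygen
        "DBE-O": d - f.o,
        "AI": aromaticity_index(f),
        "NOSC": nosc(f),
    }


if __name__ == "__main__":
    benzene = Formula(c=6, h=6)
    glucose = Formula(c=6, h=12, o=6)
    print(descriptors(benzene))
    print(descriptors(glucose))
```

Benzene and glucose serve as sanity checks: benzene has four double bond equivalents and a carbon oxidation state of -1, whereas glucose has one double bond equivalent and a carbon oxidation state of 0.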
The statistical analysis was based on the data matrix containing normalized signal intensities and the corresponding molecular formulae. To find similarities and differences in the molecular data, we applied unconstrained and constrained ordination methods. Principal component analysis (PCA), as an unconstrained indirect gradient analysis, detects assumed gradients in the data set. Redundancy analysis (RDA), as a direct constrained gradient analysis, uses additional environmental information (Table 1) to explain the variance in the data set. The applicability of RDA, assuming a linear ordination model, was tested using detrended canonical correspondence analysis (DCCA). The resulting length value of the longest gradient was < 1.3, rejecting the unimodal model and supporting suitability for the linear model [55]. The ordination methods were performed using Canoco 5 for Windows (Microcomputer Power, Ithaca, New York, United States). Matlab R2012a (The MathWorks, Inc., Natick, Massachusetts, United States) was used for further data preparation, data handling and statistical tests.
To test whether average values between different sites were significantly different, we applied Kruskal-Wallis followed by Mann-Whitney U tests. The Kruskal-Wallis test indicates whether at least one pair of parameters is significantly different from each other. The Mann-Whitney U test tests each pair separately. Both tests were performed using SPSS 16.0 for Windows (SPSS Inc., Chicago, Illinois, United States). The statistical significance was determined for all tests by applying a significance level of p < 0.05.

Environmental data, pH and DOC concentration
The pH at the three sites ranged between 4.0 (minimum at Wetzstein) and 5.5 (maximum at Hainich, Table 1) and the mean pH values at the respective sites were 4.1 (± 0.1) at Wetzstein, 4.4 (± 0.1) at Thann, and 5.1 (± 0.2) at Hainich. Mann-Whitney U tests confirmed significantly different pH conditions between the sites. The DOC concentration ranged between 9.2 mg/l (minimum at Hainich) and 152 mg/l (maximum at Wetzstein) and the mean DOC concentrations were 99 (± 30) mg/l at Wetzstein, 63 (± 24) mg/l at Thann, and 17 (± 7) mg/l at Hainich. Mann-Whitney U tests confirmed significantly different DOC concentrations between Hainich compared to Thann and Wetzstein. Comparing the DOC concentrations between Thann and Wetzstein resulted in a p-value (p = 0.053) slightly above our significance level. We found a negative correlation between the pH and DOC concentration of the individual samples across all sites with a Spearman's rank correlation coefficient, r_s, of -0.87.
All three sites had a similar seasonal trend in temperature and a similar irregular precipitation distribution across the year (S2 Fig.). The samples were collected at comparable time points of the growing season in March, May and November (Table 1). The annual precipitation in 2005 was 735 mm at Hainich, 600 mm at Thann, and 580 mm at Wetzstein.
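For readers unfamiliar with rank correlation, the following sketch shows how a Spearman coefficient such as the one reported for pH and DOC concentration can be obtained: both variables are converted to ranks (ties receive the average rank) and the Pearson correlation of the ranks is computed. The numbers in the example are invented and are not the study data. The analysis itself was run in statistical software, not with this code.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    ph = [4.0, 4.4, 5.1, 5.5]           # invented values
    doc_mg_l = [150.0, 60.0, 20.0, 9.0]  # invented values
    print(spearman(ph, doc_mg_l))
```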

Principal component and redundancy analysis (PCA and RDA)
PCA summarized 61% of the variability in the molecular data by the first three principal components (PC): PC1, 32.9%; PC2, 15.7%; PC3, 12.5%. PCA clearly separated the three sampling sites along axis 1 (Fig. 1). Both the pH and DOC concentration correlated with PC1. None of the known environmental parameters were correlated with PC2. PC3 appeared to be related to the sampling season. The majority of the samples collected in March had negative PC3 values, the May samples were more positive around the origin of the PC3 axis (Wetzstein and Thann) and most November samples had positive PC3 values. In summary, PCA revealed clear differences in DOM molecular composition. These differences were related to both the sampling site and season.
The forward selection function in combination with Monte Carlo permutation tests applied in the RDA demonstrated that DOC concentration, pH and the length of the growing season explained 42% of the variability in the molecular data. Consistent with the PCA (Fig. 1), the first RDA axis separated the sampling sites and axis 2 the seasons. The variability along RDA axis 1 is correlated with DOC concentration and pH, the variability along axis 2 with season (Fig. 2). The molecular diversity (number of molecular formulae) decreased with the progression of the growing season from March to May and November. This indicated that the total number of formulae per measurement was related to temperature-related seasonal differences.
We applied variation partitioning with conditional effects implemented in Canoco 5 to test if pH and DOC concentration together explained variability independent from the length of the growing season. We grouped the pH and DOC concentration together because they were negatively correlated and displayed opposing effects in PCA and RDA (Figs. 1 and 2). Variation partitioning with conditional effects was based on three different RDAs. The first tested the total explained variation by pH, DOC concentration and length of growing season together. Because these were all the significant environmental factors, their explained variation of 42% was set to 100% of total explained variation. The second RDA tested how much of the total explained variation was explained by DOC concentration and pH together (84.9%), while the covariate role was assigned to length of growing season. The third RDA tested how much of the total explained variation was explained by the length of the growing season (21.6%), while the covariate role was assigned to pH and DOC concentration. The intersection c represented the amount of shared variability of both tested groups. To obtain this intersection, the unique parts were subtracted from 100%: c = (100-84.9-21.6)% = -6.5%. This slightly negative intersection can be regarded as not different from zero [56] and demonstrates that the DOC concentration and pH explain the variability independent of the length of growing season. The F statistics in Canoco 5 confirmed the significance of the fractions 'pH and DOC concentration + length of the growing season + intersection', 'pH and DOC concentration' and 'length of growing season'.
To further examine the peaks correlating with the pH and DOC concentration, we applied variation partitioning to test the simple effects of each individual parameter by ignoring the other parameters. Again, this was based on three individual RDAs. The first RDA tested the total explained variation by pH and DOC concentration together (32.4%) and was set to 100% total explained variation. The second RDA tested how much of the total explained variation was explained by the DOC concentration (83.9%) without assigning a covariate role to pH. Resulting from that, 16.1% of the total explained variation (100%-83.9%) was explained by pH. The third RDA tested how much of the total explained variation was explained by pH (93.4%) without assigning a covariate role to DOC concentration. DOC concentration explained 6.6% of the total explained variation (100%-93.4%). Consequently, 77.3% (100%-16.1%-6.6%) of the total explained variation was explained by the shared effects of pH and DOC concentration. This high shared amount of explained variability was due to the negative correlation of pH and DOC concentration. Again, the F statistics in Canoco 5 confirmed the significance of the fraction 'DOC concentration + intersection', 'pH + intersection' and 'DOC concentration + pH + intersection'. Here, we applied variation partitioning with simple effects because we intended to extract the molecular formulae that correlated both with pH and DOC concentration.
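The arithmetic behind both partitioning schemes can be retraced with a few lines. The percentages are those reported in the text. The function names are illustrative. In the conditional scheme the two partial RDAs give the unique fractions directly. In the simple-effects scheme each RDA gives a marginal fraction (unique plus shared), so the unique part of one variable is what remains after subtracting the marginal fraction of the other.

```python
from typing import Tuple


def shared_from_unique(unique_a: float, unique_b: float, total: float = 100.0) -> float:
    """Conditional effects: shared fraction = total - unique A - unique B."""
    return total - unique_a - unique_b


def partition_from_marginal(
    marginal_a: float, marginal_b: float, total: float = 100.0
) -> Tuple[float, float, float]:
    """Simple effects: derive (unique A, unique B, shared) from marginal fractions."""
    unique_b = total - marginal_a  # what A alone leaves unexplained belongs to B only
    unique_a = total - marginal_b
    shared = total - unique_a - unique_b
    return unique_a, unique_b, shared


if __name__ == "__main__":
    # Conditional effects: 'pH and DOC' (84.9%) versus growing season (21.6%).
    print(round(shared_from_unique(84.9, 21.6), 1))

    # Simple effects: DOC alone (83.9%) versus pH alone (93.4%).
    unique_doc, unique_ph, shared = partition_from_marginal(83.9, 93.4)
    print(round(unique_doc, 1), round(unique_ph, 1), round(shared, 1))
```

The large shared fraction in the second case reflects the strong negative correlation between pH and DOC concentration, whereas the near-zero intersection in the first case indicates that the length of the growing season acts independently of both.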

Separating the effect of DOC concentration and pH
Because RDA is based on regression analysis, we could also compute the coefficients of determination (R²) of every molecular formula with each environmental parameter. We used R² from the partial RDAs executed in the variation partitioning to identify those molecular formulae that were significantly correlated with pH and DOC (R² > 0.5). This resulted in two data sets: the formulae correlating with the DOC concentration ('corr DOC') and the formulae correlating with pH ('corr pH'). Each of these data sets was separated into two further data sets. The formulae correlating with DOC concentration were separated into formulae correlating positively ('pos corr DOC') and negatively ('neg corr DOC') with DOC concentration. The same procedure was carried out for pH (Fig. 3). Because DOC concentration and pH were negatively correlated, they shared a certain amount of correlating formulae. These shared formulae were combined in the data set with formulae positively correlated with DOC and negatively with pH 'DOC ↑ + pH ↓' (S5 Table) and in the data set with those formulae negatively correlated with the DOC concentration and positively with pH 'DOC ↓ + pH ↑' (S6 Table). The remaining formulae were assigned to four other data sets of formulae: formulae that were correlated positively ('only DOC ↑') (S3 Table) or negatively ('only DOC ↓') (S4 Table) with DOC and those correlated positively ('only pH ↑') (S8 Table) or negatively ('only pH ↓') (S7 Table) with pH. Only 4 formulae were uniquely negatively correlated with DOC ('only DOC ↓'), while 33 to 137 molecular formulae represented the other types of unique correlations (Fig. 3).
An initial overview of the similarity and dissimilarity between the individual data sets of the extracted formulae is obtained by plotting their position in H/C versus O/C space (i.e., van Krevelen diagrams). To include the data for all six data sets in one figure and present it clearly, we reduced the information by plotting the centroids of each data set (Fig. 4). The trends found for the shared correlations ('DOC ↑ + pH ↓' and 'DOC ↓ + pH ↑') are similar to those for the unique correlation with pH ('only pH ↑' and 'only pH ↓') indicated by intersecting centroids. The formulae 'only pH ↑' and 'DOC ↓ + pH ↑' were characterized by higher H/C and lower O/C, whereas 'only pH ↓' and 'DOC ↑ + pH ↓' were characterized by lower H/C and higher O/C. Therefore, higher molecular saturation and less oxygenation were statistically related to higher pH and lower DOC concentration. The centroids of those formulae correlating uniquely with DOC concentration were only clearly separated from each other by H/C, i.e., higher unsaturation was related to higher DOC concentration. While 'only DOC ↑' was similar in H/C to 'only pH ↓' and 'DOC ↑ + pH ↓', it was separated by lower oxygenation. Consistently, 'only DOC ↓' was different from 'DOC ↓ + pH ↑' and 'only pH ↑' by H/C, while having a similar degree of oxygenation.
To compare the different data sets (Fig. 3) of the isolated molecular formulae in more detail, we used several parameters that are derived from the molecular formulae (C#, H#, O#, N#, H/C, O/C, DBE, MW, DBE/C, DBE/O, AI and DBE-O). We standardized the data for each of the parameters to the parameter maximum among all six data sets to display the boxplots for all parameters on one scale (Fig. 5, Table 2). Mann-Whitney U tests of the non-standardized data ('only DOC ↑' ↔ 'only DOC ↓', 'only pH ↑' ↔ 'only pH ↓', 'DOC ↑ + pH ↓' ↔ 'DOC ↓ + pH ↑') revealed statistically identical medians only for the data sets 'only DOC ↑' and 'only DOC ↓' concerning parameters O#, N#, O/C and MW. All other groups of molecules were significantly different from each other.
Consistent with the van Krevelen analysis (Fig. 4), the trends for 'only pH ↑' compared to 'only pH ↓' were very similar to the differences between 'DOC ↓ + pH ↑' and 'DOC ↑ + pH ↓'. The trends demonstrated higher saturation (high H/C and lower DBE), lower oxygenation (O/C), smaller molecules (MW) and the presence of nitrogen-containing compounds at a higher pH and lower DOC concentration. The unique influence of pH ('only pH ↑' ↔ 'only pH ↓') was similar to the shared effect. Approximately 50% of the formulae that were related to high pH contained at least one nitrogen atom, about half of which even contained two nitrogen atoms (Fig. 5). Following this trend with pH, these nitrogen-containing compounds occurred mainly in the samples from the Hainich site (62 to 78), and rarely in the samples from Thann (2 to 7) and Wetzstein (0 to 8).
Because the medians of H#, H/C and DBE were higher in 'only DOC ↓' we conclude that the main difference between the two data sets of 'only DOC ↑' and 'only DOC ↓' was saturation. The unique effect of DOC ('only DOC ↑' ↔ 'only DOC ↓') was most obvious in the medians of H/C and H#. This was also indicated by the centroid data in the van Krevelen diagram (Fig. 4). Our results demonstrate that high DOC concentrations have a unique effect by increasing molecular unsaturation. The main trends in the unique effects of pH are similar to the shared effects of pH and DOC concentration. Overall, with higher pH and lower DOC concentration we found higher saturation, lower oxygenation, smaller molecules and abundant nitrogen-containing compounds.
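The assignment of formulae to the six data sets follows a simple decision rule, sketched here under stated assumptions: each formula is described by its coefficient of determination and the sign of its correlation with DOC concentration and with pH, and a correlation counts only if the coefficient of determination exceeds 0.5. The function name and the text labels are hypothetical. Formulae correlating in the same direction with both parameters are not described in the text and are left unassigned in this sketch.

```python
from typing import Optional


def classify_formula(
    r2_doc: float,
    sign_doc: int,
    r2_ph: float,
    sign_ph: int,
    r2_threshold: float = 0.5,
) -> Optional[str]:
    """Assign one formula to a correlation group, or None if it fits none.

    sign_doc / sign_ph: +1 for a positive and -1 for a negative correlation.
    """
    # A correlation is only considered when its R^2 exceeds the threshold.
    doc = sign_doc if r2_doc > r2_threshold else 0
    ph = sign_ph if r2_ph > r2_threshold else 0

    if doc > 0 and ph < 0:
        return "DOC up + pH down"
    if doc < 0 and ph > 0:
        return "DOC down + pH up"
    if doc != 0 and ph == 0:
        return "only DOC up" if doc > 0 else "only DOC down"
    if ph != 0 and doc == 0:
        return "only pH up" if ph > 0 else "only pH down"
    # Uncorrelated, or same-sign correlation with both parameters.
    return None


if __name__ == "__main__":
    print(classify_formula(0.8, +1, 0.7, -1))
    print(classify_formula(0.8, -1, 0.2, +1))
    print(classify_formula(0.1, +1, 0.9, +1))
```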

Seasonal differences in molecular characteristics
There was a clear trend of decreasing molecular diversity during the growing season from March to May and November for all main sites (Fig. 6a), which is consistent with the RDA results (Fig. 2). Additionally, the number of nitrogen-containing compounds decreased in the course of the year at all three sites, although the total number of nitrogen-containing compounds varied among the sites, with the highest number appearing at the Hainich site (Fig. 6b). For a more detailed interpretation of these molecular trends, we identified those molecular formulae that were significantly correlated with the length of the growing season. This was based on the partial RDA within the variation partitioning between pH and DOC and the length of the growing season. Only four formulae fulfilled the criterion of R² > 0.5 and they were all negatively correlated with the length of the growing season. Because so few formulae were found, they are not the focus of the discussion.
As an alternative approach to analyze seasonal trends on a molecular level, we identified those molecular formulae that occurred uniquely in each month. The resulting numbers of unique formulae were 105 for March, 33 for May and 140 for November. The main differences between monthly unique peaks were in H/C (Fig. 7), both for nitrogen-containing and nitrogen-free compounds. This increase in saturation over the growing season was more pronounced for nitrogen-free compounds compared to nitrogen-containing compounds.
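Identifying formulae unique to one month amounts to a set difference. The sketch below assumes that the formulae detected in all samples of a month have been pooled into one set per month. Whether the study pooled samples in this way is not stated, so this is an assumption, and the formulae shown are invented placeholders.

```python
from typing import Dict, Set


def unique_per_group(formulae_by_group: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each group, keep the formulae that occur in no other group."""
    result: Dict[str, Set[str]] = {}
    for name, formulae in formulae_by_group.items():
        # Union of the formulae found in every other group.
        others: Set[str] = set().union(
            *(f for other, f in formulae_by_group.items() if other != name)
        )
        result[name] = formulae - others
    return result


if __name__ == "__main__":
    pooled = {
        "March": {"C10H12O5", "C9H10O4", "C8H9NO3"},
        "May": {"C9H10O4", "C7H8O3"},
        "November": {"C9H10O4", "C11H14O6"},
    }
    for month, unique in unique_per_group(pooled).items():
        print(month, sorted(unique))
```

The resulting sets can then be compared by H/C or by nitrogen content, as done for the monthly unique formulae above.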

Comparison with the Yenisei
We tested if the formulae that were significantly correlated with pH were the same in this study and the previous study in the Yenisei [29]. Both studies shared 23 molecular formulae that positively correlated with pH, and 57 that negatively correlated with pH. Consequently, 15% of the formulae that correlated positively with pH in this study and 22% of those in the Yenisei transect study were identical. In the case of formulae that were negatively correlated, 31% of those in this study and 26% of those in the Yenisei transect study were identical. Higher molecular saturation (H/C) was clearly related to higher pH (Fig. 8). In addition, we found in both studies that the molecular weight of the formulae was highest at high DOC concentration, low pH, and high temperature (Fig. 9). There was also a clear trend in NOSC, which was highest at high DOC concentration, low pH and low temperature (Fig. 9).

Discussion
Knowledge of the factors that control DOM molecular composition is scarce. Here we tested the hypothesis that there is a universal relationship between DOM molecular composition and pH and temperature across different aquatic systems. We studied the differences in the molecular characteristics of forest soil water DOM. The pH variation was achieved by selecting three different forest sites in Germany that differ in soil pH. The temperature effect was separated by analyzing samples from three different seasons. Similar to a previous study in the Yenisei tributaries, we observed a statistical correlation between pH and DOM molecular composition. We demonstrated that pH, DOC concentration and the length of the growing season together explained 42% of the variability in the DOM composition. Even within a small pH range between pH 4.0 and 5.5, the molecular composition of DOM significantly correlated with pH and DOC concentration. DOC concentration and pH had a shared effect whereas the influence of the length of the growing season was independent of them.
As with any analytical method, FT-ICR-MS does not have an unlimited analytical window. This analytical window is very broad for ESI-FT-ICR-MS in the case of polar water-soluble organic compounds; however, small ionic compounds and colloidal matter are not likely to be detectable via this technique. This restriction must be kept in mind when interpreting our data.

Trends driven by pH and DOC concentration
Correlations do not prove causal effects and in this study we cannot a priori exclude a correlating influence of pH and vegetation on the DOM molecular composition because different tree species grow at the three forest sites. The consistent pH trend observed in both the Yenisei tributaries and German forest soils, however, provides strong support for our hypothesis that pH is a main driver for DOM molecular composition. Consistently, a significant percentage of the formulae that correlated with pH were identical in both studies. Even if this does not prove that they originate from identical molecular structures, they at least have similar molecular properties, e.g., in terms of NOSC, saturation and molecular weight. This observation is consistent with findings from Strobel et al. [40]. They demonstrated that the chemical composition of the forest floor DOM of an equivalent DOC concentration does not depend on the tree species or soil type.
Higher DOC concentrations and lower pH were related to greater molecular unsaturation, greater oxygenation and larger molecular size. These characteristics indicate a higher abundance of unsaturated C = C and C = O bonds, an increased occurrence of oxygen functionalities and greater molecular size at a lower pH. These molecular characteristics and the corresponding distribution patterns in the van Krevelen diagram (Fig. 4) indicate that phenolic tannin compounds may be more abundant at a lower pH [30], [57]. Tannins are produced as secondary metabolites and are known to be produced at a higher rate when plants grow under environmental stress such as nutrient-poor soil, low pH or drought conditions [58][59][60].
In addition to the above characteristics, we observed a higher abundance of nitrogen-containing compounds at a high pH and low DOC concentration. The structure of these compounds remains unknown. One likely possibility is amino groups, which can be the predominant nitrogen-containing class in DOM and the soil organic matter of forest soils [61], [62]. Amino bonds occur, e.g., in amino sugars and peptides that are largely products of microheterotrophs and are assumed to be indicators of microbial activity [38]. The occurrence of tannins or similar polyphenols can reduce organic matter decomposition rates because tannins can precipitate and immobilize organic nitrogen compounds [14], [63], [64]. This mechanism may explain the lower diversity of nitrogen compounds at Wetzstein and Thann, the low pH sites. The greater diversity of nitrogen-containing compounds at the Hainich site may be the result of higher biological activity that could be related to lower tannin content and more neutral pH conditions. Additionally, the low MW at a higher pH may also be related to enhanced microbial activity. Processing the plant-derived material of high MW results in the release of compounds of lower MW and the buildup of the microbial biomass [65], [66]. Future studies at sites with different soil pH but the same vegetation and vice versa are needed to unambiguously separate the importance of the "nature" of DOM (the sources control DOM quality and reactivity) and the "nurture" that assumes that environmental variables drive DOM quality and reactivity [14], [26], [67], [68].

Seasonal differences in DOM composition
In addition to the correlation of pH and DOC concentration with DOM molecular composition, RDA and variation partitioning demonstrated the seasonal dependency of the DOM molecular composition. The alignment along the PC3 axis according to the sampling month also indicated a seasonal influence. Both unconstrained and constrained ordination demonstrated that the greatest seasonal difference in molecular composition occurred between March and November ( Figs. 1 and 2). The seasonal differences are related to the diversity of molecular compounds and the occurrence of nitrogen compounds (Figs. 2 and 6). There is a distinct trend of a greater number of compounds in March, whereas the number decreases into May and November. To explain this observation, we suggest the temperature dependence of microbial degradation coupled with little microbial degradation over winter, which may lead to the seasonal accumulation of compounds that are only slowly processed in winter. This supports recent findings suggesting that plant-derived compounds are only present when microbial activity and the decomposition rate are low [69]. During winter, the mineralization of nitrogen compounds is likely inhibited, which would explain the higher amount of these compounds present in March (Fig. 6). With higher temperatures, the biodegradation rate increases, possibly leading to a less diverse spectrum of compounds and to a greater extent of mineralization of nitrogen-containing compounds such as peptides or amino sugars [38], [62]. These conclusions are supported by our analysis of formulae unique to each month. Because the DOM of microbial origin is characterized by a low content of aromatic functionalities, the greater unsaturation in March indicates plant-derived (polyphenolic) molecules with a low degree of degradation [69], [70]. Due to rising temperatures over the course of year, microbial degradation increases, and may explain greater saturation. 
A similar trend was also observed for nitrogencontaining compounds but was less affected by increasing saturation compared to nitrogenfree compounds (Fig. 7). This may suggest a lower degree of degradation and higher rate of recycling, which supports other findings of preserved nitrogen-containing compounds during soil decomposition and humification, whereas organic compounds based only on C, H and O undergo increased degradation in their role as an energetic base [71], [72].
In summary, we discerned two aspects of seasonal differences across all sites: over the course of the growing season, the molecular diversity of DOM decreased and the molecular saturation increased. In the previous Yenisei study, the latitudinal gradient was correlated with the mean annual temperature and was considered as a space-for-time substitution to predict future climate change [29]. We suggested that the latitudinal variations were derived from temperature-dependent decomposition of DOM. Consequently, we concluded that increasing temperatures in the higher latitudes might increase DOM decomposition. In this study on soil DOM, we likewise suggest that the seasonal differences derive from temperature-dependent degradation processes. In both studies, the DOM characteristics indicate that substances with a lower degree of degradation occur in samples that are under the influence of lower temperature, i.e., samples taken in March at the beginning of the growing season and samples taken at higher latitudes. In addition, we demonstrated in both studies that the effect of pH on DOM composition was independent of the temperature influence, i.e., different compounds were affected by variations in pH and temperature.

Conclusions
DOC concentration and pH are strongly correlated with DOM molecular composition. Because the two parameters covary, their individual relationships to DOM composition could be discerned only for a limited number of molecular formulae. To fully separate the influence of DOC concentration, pH, vegetation and other parameters, future studies should consider a larger number of samples along independent pH and DOC concentration gradients. In addition, future studies should address a wider range of different environments. Our study design did not explicitly allow the identification of vegetation influences on DOM composition. However, the striking similarity between the results of a previous study in Yenisei tributaries and this study in German soil pore waters is strong evidence that the identified pH dependence is universal. Even over the small pH range in the soils, the relationship between pH and DOM molecular composition became evident. We propose reduced microbial activity at low pH as the reason for the observed trends. We also observed a higher abundance of polyphenols (such as tannin) at low pH that may be vascular plant-derived. These compounds may adversely affect microbial metabolism and lead to the immobilization of nitrogen-containing compounds, which in turn would explain the observed distribution pattern of nitrogen compounds.
We found a seasonal trend in DOM composition in the soils that was consistent with the latitudinal trend in the Yenisei study. In particular, the total number of DOM compounds and the molecular unsaturation decreased in the course of the growing season in the soils, and from North to South in the Yenisei tributaries. We propose that microbial activity was slowed down during colder periods, causing an accumulation of compounds during winter. This accumulation could explain the greater molecular diversity of DOM. On a structural level, this is expressed by a higher abundance of aromatic and unsaturated compounds from plant-derived material in March (and in the North of the Yenisei), changing to more saturated material of microbial origin in May and November (and in the South of the Yenisei). The oxidation state of DOM (NOSC) also varied systematically along DOC, pH and temperature gradients, indicating that DOM is potentially more reactive during colder periods and in regions of low pH. This supports our conclusion that low temperature and low pH cause reduced biological degradation of DOM.
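For readers unfamiliar with NOSC (nominal oxidation state of carbon), the following minimal sketch computes it from the elemental composition of a molecular formula. It uses a commonly applied definition for neutral formulae containing C, H, N, O and S; whether exactly this expression was used in the study is an assumption, and the function name `nosc` is hypothetical.

```python
def nosc(c, h, n=0, o=0, s=0):
    """Nominal oxidation state of carbon for a neutral C, H, N, O, S formula.

    Assumed definition: NOSC = 4 - (4C + H - 3N - 2O - 2S) / C
    """
    if c <= 0:
        raise ValueError("The formula must contain at least one carbon atom.")
    return 4.0 - (4 * c + h - 3 * n - 2 * o - 2 * s) / c


# Sanity checks on well-known compounds.
glucose = nosc(c=6, h=12, o=6)      # C6H12O6
methane = nosc(c=1, h=4)            # CH4, fully reduced carbon
carbon_dioxide = nosc(c=1, h=0, o=2)  # CO2, fully oxidized carbon
```

Under this definition the value ranges from -4 for methane to +4 for carbon dioxide, with carbohydrates such as glucose at 0.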
Overall, the findings support our hypothesis that abiotic factors such as pH and temperature dominate the molecular composition of DOM across biomes and different types of aquatic systems. Together with our recent Yenisei transect study, this indicates that basic environmental parameters, such as pH and temperature, might be key controlling factors in the carbon cycle.

Supporting Information
Table 1) are highlighted with grey bars. (EPS)
S1 Table. List of measured mass, exact mass, number of C, H, N, O and S atoms, and relative intensities for each measurement. (XLSX)
S2 Table. List of measured mass, exact mass, number of C, H, N, O and S atoms, and relative intensities for each measurement of deep sea DOM reference sample replicate measurements that were used to describe peak reproducibility. The determined limit of detection for the group of four samples (LOD_Group) was 7.90 × 10^-5, and the limit of quantification for the group of four samples was set to 2 × LOD_Group. All peaks with a relative intensity < 2 × LOD_Group were set to 1.5 × LOD_Group (1.19 × 10^-4). (XLSX)
S3 Table. Molecular formulae information for 'only DOC "': List of measured mass, exact mass, number of C, H, N, O and S atoms, double bond equivalents (DBE) and aromaticity index (AI, [53]).
Table. Molecular formulae information for 'only pH "': List of measured mass, exact mass, number of C, H, N, O and S atoms, double bond equivalents (DBE) and aromaticity index (AI, [53]). (XLSX)
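The intensity floor described for the S2 Table reference measurements (peaks below 2 × LOD set to 1.5 × LOD) can be sketched as follows. The function name `floor_low_peaks`, its parameter names and the three example intensities are hypothetical; only the LOD value and the two factors are taken from the text.

```python
def floor_low_peaks(intensities, lod, loq_factor=2.0, fill_factor=1.5):
    """Replace relative intensities below the quantification limit by a fixed value.

    Peaks with intensity < loq_factor * lod are set to fill_factor * lod;
    all other peaks are returned unchanged.
    """
    loq = loq_factor * lod    # limit of quantification
    fill = fill_factor * lod  # value assigned to peaks below the LOQ
    return [fill if x < loq else x for x in intensities]


LOD_GROUP = 7.90e-5  # limit of detection reported for the group of four samples

# Hypothetical relative intensities (illustrative only).
raw = [1.0e-5, 1.0e-4, 2.0e-4]
floored = floor_low_peaks(raw, LOD_GROUP)
```

With the reported LOD, the fill value is 1.5 × 7.90 × 10^-5 = 1.185 × 10^-4, which rounds to the 1.19 × 10^-4 given in the table description.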