Longitudinal wastewater sampling in buildings reveals temporal dynamics of metabolites

Direct sampling of building wastewater has the potential to enable “precision public health” observations and interventions. Temporal sampling offers additional dynamic information that can be used to increase the informational content of individual metabolic “features”, but few studies have focused on high-resolution sampling. Here, we sampled three spatially close buildings, revealing individual metabolomics features, retention time (rt) and mass-to-charge ratio (mz) pairs, that often possess similar stationary statistical properties, as expected from aggregate sampling. However, the temporal profiles of features—providing orthogonal information to physicochemical properties—illustrate that many possess different feature temporal dynamics (fTDs) across buildings, with large and unpredictable single day deviations from the mean. Internal to a building, numerous and seemingly unrelated features, with mz and rt differences up to hundreds of Daltons and seconds, display highly correlated fTDs, suggesting non-obvious feature relationships. Data-driven building classification achieves high sensitivity and specificity, and extracts building-identifying features found to possess unique dynamics. Analysis of fTDs from many short-duration samples allows for tailored community monitoring with applicability in public health studies.


Introduction
Wastewater sampling presents a means to monitor the general health [1], chemical exposure [2,3], and size [4] of a population in a rapid and noninvasive manner. Many studies have been performed at wastewater treatment plants (WWTPs), as these sites are relatively easy to sample, and yield aggregate information on large populations, e.g. from an entire city [2]. As an example of the correlations that can be captured by such studies, an increase in antipsychotics, antidepressants and other therapeutic drugs was observed in wastewater between 2010 and 2014 in Athens, Greece during a time of significant economic turmoil in the country [5]. Given the proper sampling and analysis methods, wastewater can provide meaningful, community-specific public health information.
Most wastewater epidemiology and metabolomics studies have focused on the aggregate chemical load on large populations, typically using targeted metabolomics to acquire highly sensitive, context-specific information about select small molecules present at WWTPs. Cityand country-wide studies have focused on monitoring licit [5,6] and illicit drugs [7][8][9], including sports doping agents [10,11], and have used these results to estimate public drug consumption [12]. Targeted applications have ranged from monitoring stress-related molecules [13], plasticizers [3,14], and pesticides [2] to metabolites associated with alcohol [15] and tobacco [16]; general population biomarkers [17]; and the environmental release of pharmaceuticals [18]. In addition to chemical identification, wastewater metabolomics can also be used to estimate population size [19,20]. However, aggregate analysis of large populations potentially misses public health-relevant information on temporal dynamics and sub-population characteristics.
There are multiple ways to incorporate temporal information in wastewater metabolomics that depend on sampling methods and location. One common route is to collect many composite samples, often over 24 hours, via extended continuous sampling and typically at sites of aggregated wastewater [4,21,22]. This route is logistically easier, as it only requires a single site when using WWTP-based sampling. However, the temporal component provides an averaged signal, even with multiple single day composite samples. An alternative, the approach we take here, is to perform close-to-source, short continuous sampling (or periodic grab sampling) without combination [23]. The second route often requires multiple locations and high sampling frequency (hourly to daily or near-daily), necessitating a large number of samples. Extensive sampling is required to alleviate the problem of signal noise and stochasticity associated with short sampling of small populations. However, a benefit of this approach is that it provides a precise temporal snapshot of the molecules present at a given time and, with longitudinal samples, temporal dynamics with minimal signal averaging. This sampling may provide information on individual contributions to community wastewater, and when compositional shifts or chemical exposure events occur.
We conducted a multi-month, untargeted metabolomics analysis of wastewater from three individual buildings to understand the information contained in longitudinal sampling of small populations. To study the temporal resolution needed to characterize a small population, we performed direct, intermittent sampling over three months from a multipurpose-use building (Building 1) and two residential buildings (Buildings 2 and 3). While through-time statistical values (mean intensities) of the features showed minimal differences between buildings, analysis of feature temporal dynamics (fTDs) uncovered extensive differences. Temporal feature clustering and modeling, internal to each building, revealed numerous groups of shared fTDs that often displayed random but large intensity fluctuations. These dynamics would likely be unobserved or averaged out using alternative sampling approaches. Similarity of fTDs suggested links from putative metabolites to unknowns as well as between features with drastically different mz and rt values, both within and across buildings. To extract additional building-distinguishing information, we trained interpretable machine learning (ML) models using daily feature profiles. Extending and generalizing our analysis methods, we found additional fTDs that correlated with those of select putative molecules, suggesting features for follow-up analysis.

Traditional statistical approaches do not capture the full temporal differences between buildings
Feature summary statistics (mean and standard deviation through time of feature ion intensities) provide a simple method to conceptualize and coarsely categorize feature stability. This stratification allows one to triage the features according to the research question of interest. Using this approach, we identified stable and unstable features that were generally similar between buildings. Further subcategorization and analysis of the unstable features revealed unique day-to-day dynamics, suggesting that summary statistics do not fully capture temporal dynamics that are essential components of a small population's wastewater metabolome.
Longitudinal multi-month sampling allows for temporal variation-based feature grouping. We observed two distinct groups of features during multi-month sampling of three spatially close buildings: temporally stable, and temporally unstable. Sampling occurred over three months, with the most dense sampling (multiple times per week) occurring in the first 3 weeks, followed by sampling approximately 1-2 times per week for the remainder of the period (Fig 1A). Liquid chromatography mass spectrometry (LCMS) produced 1425 features. 363 features were putatively identified at minimum reporting standard (MRS) [24] level 2 (primary mass matched to database and secondary mass spectrum matched to in silico fragmentation spectrum), 257 only at level 3 (primary mass matched to database), while most (805) were unannotated (level 4). The features were separated by through-time standard deviation; analyzing the distribution of standard deviations, two peaks were observed (one at~1 and the other > 2, S1 Fig) for which stability cutoff values were set to separate these cases; values < 2 were considered stable, and � 2, unstable. For both categories, the majority of features could not be annotated with a database chemical class, subclass or direct parent; apart from this, many different and generally low metabolite count subclasses were observed, with the most prevalent being: amino acids, peptides and analogues, carbohydrates (and conjugates) as well as fatty acids and conjugates (Methods, S2 Fig). Of the stable features, 53% were stable in all three buildings (Fig 1B). While only 34% of the unstable features were unstable in all buildings, with large fractions uniquely unstable in one or two buildings ( Fig 1B).  Untargeted metabolomics revealed numerous putative human and human-associated small molecules, both in the stable and unstable categories, that displayed a range of through-time mean intensities, standard deviations, and building-to-building variability. We observed several chemicals and common metabolic products of human activity, including metabolites from caffeine metabolism (xanthine-based metabolites [25]), dietary tryptophan processing (indoxyl sulfate [26]) as well as expected urinary metabolites (urate [27], phenylacetylglutamine [28], and 2-hydroxyethanesulfonate [29]- Fig 1C). The majority of these metabolites were stable in all three buildings, with the exception of indoxyl sulfate, which was unstable across all three buildings. Putative drug-related metabolites displayed substantial feature instability and primarily consisted of acetaminophen metabolites plus possible drugs for restless leg syndrome (ropinirole) and low blood pressure (midodrine). Putative artificial sweeteners (acesulfame and saccharin) appeared stable across the three buildings, but chemicals naturally found in humans as well as in many health and cosmetic products (pantothenate (vitamin B5) and its precursor pantothenol- Fig 1C) displayed high variability. A large number of the drug-and cosmetic-related features appeared unstable in many buildings, particularly in Building 1.
Through-time statistical analysis suggested that individual features often appeared at similar mean intensities for all three building-to-building comparisons-especially the stable features. Linear relationships were observed for feature intensity comparisons between all buildings (R = 0.95, 0.92 and 0.96 for the Building 1-to-2, 1-to-3 and 2-to-3 comparisons, respectively- Fig 1D). While these high correlation coefficients were calculated using all features, the unstable features were more dispersed, and minimally correlated between buildings. Several unstable features appeared at low intensities in Building 1 relative to the other two, including 4'-N-desmethylolanzapine (a benzodiazepine) and two acyl-glycines among others (Fig 1D). Outside a lack of specific prescription drug use, it is challenging to explain such differences in intensities. Aside from these few cases and assuming consistent ionization across samples, these data suggest that many features exist at comparable average concentrations in buildings with different populations.
Statistical analysis between weekends and weekdays only uncovered statistically significant features when all weekdays were included. Comparing features from only Saturdays to Wednesdays using false discovery rate (FDR) corrected Mann-Whitney U-test (MW) P-values (Q-value), no features with Q < 0.05 were observed in any buildings (S3 Fig). However, including the remaining weekdays, Building 1 displayed hundreds of significant features while the other two buildings possessed no significant features. For these significant features, the largest subclasses included: amino acids, peptides and analogues, fatty acids and conjugates along with carbohydrates and their conjugates. However, the majority of features were not annotated or possessed no chemical class annotation.
Few temporally stable features are statistically different between buildings using a stringent cutoff. Only 12 stable features were present at significantly different mean levels between the three buildings (FDR corrected Kruskal-Wallis, KW, P-value < 0.00001). This corrected P-value was chosen to analyze only the 'most significant' features between buildings; however, a range of values were explored (for all features, stable features, and unstable features, S4 Fig). This cutoff may be altered for studying different communities. Of the 12 significant features, 9 were observed at higher levels in Buildings 2 and 3 relative to Building 1 (Fig 2A).   Table). Unlike manually searching for specific metabolite types, statistical analysis found the urine metabolite 2-hydroxyethanesulfonate and sweetener saccharin among the significant features. Chemical classification proposed that the two level 3 identified features corresponded to a isoquinoline and a sulfonamide. Few commonalities were observed between these features as they appeared across a wide range of mz, rt, and intensity values.
Unstable features can be further split into three dynamics-based classes. Temporal analysis of the unstable features suggested three general temporal profiles, providing a more fine-grained classification, and a means for conceptualizing their dynamics (Fig 2B-2D, S3 Table). Class 2 unstable metabolites possessed through-time intensity standard deviation levels greater than 3.5; this cutoff was used to separate the unstable features in the second peak of the standard deviation distribution from those with even higher variability ( Fig 2B and S1  Fig). Representative examples of Class 2 include indoxyl sulfate and the sulfate-modified neurotransmitter-derived metabolite 3-methoxy-4-hydroxyphenylglycol ( Fig 2D). Class 1 contained features with a through-time median intensity less than 14 (after removal of the Class 2 features), this value separated the main group of unstable features from those with low overall intensities ( Fig 2C and S1 Fig). This class included metabolites that are typically observed at low levels in wastewater, but that are occasionally present at high levels. Class 3 consisted of the remaining features possessing mid-level log intensities with irregular intensity changes. Class 3 included both a ferulic acid sulfate and putative glycine conjugated cholic acids ( Fig  2D). Chemical classification of these three temporal profiles found the majority to be unannotated with the remainder, generally, binned into classes, subclasses and direct parent groups with single membership, suggesting no increased prevalence of any one chemical class (S5 Fig). This scheme of subclassification may be transferred to alternative data, for which the specific cutoff values should be re-tuned. Temporal analysis departs from using a single statistical parameter to describe a time-series; using only three simple groups highlighted different feature temporal dynamics (fTDs)-this prompted a more comprehensive dynamics analysis.

Characterizing and modeling the dynamics of individual buildings with clustered fTDs
Temporal analysis supplied a more nuanced view of the features, and thus buildings, than general stability types. To characterize the extent of temporal dynamics, we used unsupervised clustering which revealed a large diversity of feature dynamics in a community's wastewater metabolome. To understand whether one sampling period could be predictive of another for the same building, we fit clusters with a Gaussian process (GP), the results of which suggested that dynamics, particularly large deviations from the mean, cannot be predicted on days not sampled. Such modeling highlighted the importance of frequent temporal sampling. Differences between buildings were further highlighted by focusing on the temporal profiles of putative classes of molecules related to metabolism and lifestyle.
Clustering fTDs uncovers groups of features with highly similar temporal dynamics. K-means clustering of individual building features displayed several prominent families that differed between buildings. One hundred clusters were used to group z-normalized features, providing highly similar intracluster dynamics, mostly within the range of -1 to 1 (Fig 3A). The high temporal sampling revealed that many clusters exhibited sudden, single-day spikes or drops in intensity (Fig 3A, dark blue and red regions). Additionally, many days displayed similar intensity patterns across-clusters; for example in Building 1, on October 6 th the top 8 clusters showed similar z-normalized intensities (not a comparison of the absolute intensities), and on Saturday, December 16 th the majority of clusters demonstrated a general drop in normalized intensity for many features. When comparing across the three buildings, no obvious trends were observed in terms of mean cluster dynamics, which varied between buildings. The majority of clusters in all buildings were composed of features spanning a large range of mz and rt values ( Fig 3A purple and green columns). Several of the largest clusters were primarily made from features lacking annotation (Building 1: clusters 2 and 5; Building 2: clusters 1 and 2; Building 3: clusters 1 and 3, S6-S8 Figs). The remaining top clusters were composed of different chemical classes and subclasses, the majority of which possessed one or two counts with no clear (sub)class preferences.
GP cluster modeling displayed minimal day-to-day predictive power, and mean reversion, suggesting that high frequency (daily) sampling is important for capturing the observed unpredictable alterations. As a previous 24-hour metabolomics analysis [23] demonstrated strong diurnal patterns in wastewater, we used a small length scale parameter for the GP. Using a conservative 3-hour length scale, information decayed within hours following perturbation, providing no day-to-day predictive power (Fig 3B). Modeling the observed data mixed with additional theoretical buildings (i.e., sampling further downstream or with longer sampling periods, S1 Text for methods), demonstrated that for feature clusters, the probability of observing large deviations from the mean drops with only tens of additional buildings (S9 Fig). These results highlight the importance of short, high-frequency, and close-to-source sampling; this indicated that our own sampling scheme likely missed large wastewater dynamics.
Frequent sampling allows for tracking dynamics of human-related metabolite groups. The three buildings displayed different dynamics for specific human and lifestyle-related putative metabolite groups. The groups studied included glucuronide-modified compounds [30], caffeine-related metabolites [25], biologically modified acetaminophen [31], along with glucoside conjugated molecules (Fig 4, S10 Fig, and S4 Table for full names). These human-associated putative groups possessed diverse temporal dynamics in each group and building. For instance, many of the features showed large changes across buildings; however, the days on which the specific dynamics occurred differed for each given putative metabolite. Within each building, select putative metabolites from each group displayed similar temporal dynamics, perhaps due to similar biochemical processing (e.g. different metabolites of acetaminophen). However, not all temporal profiles in a group were always similar, for instance the October levels of many glucuronides and caffeine metabolites displayed pronounced differences. While the ability to identify additional features is required for larger, targeted chemical tracking, this analysis highlighted the potential of high-frequency wastewater sampling to monitoring groups of health and lifestyle-related compounds.

Temporal data draws new feature correlations
Analyzing individual fTDs within and between buildings allows comparison of building-tobuilding similarity or lack thereof. To do so requires calculating feature similarities using through-time distance measurements or correlations. These methods reveal features and clusters that are correlated and others that are anticorrelated, something not necessarily possible with single time point measurements.
Different buildings show few similar fTDs while many are correlated within a building. A large number of highly similar fTDs were observed internal to buildings but almost no similarities were observed between buildings. To study the similarity between z-normalized time series, we analysed feature pair time series at different Euclidean distance cutoffs (example distances shown in Fig 5A). We chose two similarity cutoff thresholds; 1.5 for high stringency and 2.82 for lower stringency similarity. These values can be tuned; here Building 1 primarily set the two thresholds, as a distance of 2.82 included a large 9.6% of pairs, while the 1.5 cutoff included 1.1% (S11 Fig). Buildings 2 and 3 both showed lower percentages for these cutoffs, with each generally possessing fewer similar time series and an increased mean distance for all pairs. A histogram of distances between all time series (an all-to-all comparison), displayed a greater fraction of similar time series within each building than between buildings, for which there were few similar time series even up to a Euclidean distance of 2.82 (0.70%, 0.36% and 0.25% for B1-B2, B1-B3 and B2-B3 comparisons respectively, Fig 5B, S12 Fig). Building 1 showed more intrabuilding similar time series than buildings 2 or 3, likely due to the decrease in intensity of hundreds of features on the final day, which would dampen the znormalized intensities for other days (Fig 2D). The high intrabuilding zero-distance bin primarily corresponded to distances calculated between a feature and itself, plus a small number of very similar features. For the different comparisons, the majority of similar time series belonged to pairs in which both features had high average intensities; few belonged to pairs in which one feature had low average intensity, possibly due to instrumental noise (S13 and S14 Figs). This observation suggested that high similarity between fTDs did not arise from comparisons to normalized background or noise features.  ���� 3-alpha-hydroxy-5-alpha-androstane-17-one 3-d-glucuronide / etiocholanolone glucuronide (B) caffeine-related, � 5-acetylamino-6-amino-3-methyluracil / 6-amino-5[n-methylformylamino]-1-methyluracil, �� 5-acetylamino-6-amino-3-methyluracil / 6-amino-5[nmethylformylamino]-1-methyluracil and (C) acetaminophen-related putative metabolites across the three buildings. Full metabolite details can be found in S4 Table. https://doi.org/10.1371/journal.pcbi.1008001.g004 Further analysis on the large number of similar fTDs within a building, revealed that many of the feature pairs (at the high stringency cutoff) possessed large differences in mz and rt ( Fig  5C, S15 Fig for complete range). A 2-D histogram of only those feature pairs that differed by at least 30 s in rt showed that many of the shared temporal dynamics differed by 30-100 s and 0-100 Da. A large number of feature pairs corresponded to mz and rt differences much greater than 100 s and 100 Da; overall, feature similarities were observed across the full mz and rt

PLOS COMPUTATIONAL BIOLOGY
Temporal wastewater analysis finds building-specific metabolite dynamics domains (S15A Fig). A comparable, but more populated, 2-D histogram was obtained with the low stringency similarity cutoff (S15B and S15C Fig).
The distance between a feature and its corresponding feature across buildings (a one-toone comparison) demonstrated that only a few features (36, 33 or 51) possessed similar temporal dynamics between buildings, even at the more inclusive, low stringency cutoff (Fig 5D). This analysis revealed that the majority of features displayed markedly different temporal dynamics, as the median distance was larger than five for each comparison.
Time series analysis revealed features and K-means cluster centers that are anticorrelated in time. Thousands of z-normalized feature pair time series (6750, 32453, and 7851 for Buildings 1, 2, and 3 respectively), some with large mass-to-charge ratio differences, possessed Pearson correlation coefficients less than -0.6 ( Fig 6A,   behavior with a coefficient < -0.6; the putatively identified reduced riboflavin was anticorrelated in buildings 1 and 3 as well as 2 and 3 (S18 Fig). In addition to individual features, feature cluster centers from the K-means clustering showed both positive and negative (< -0.6) correlation coefficients (Fig 6C). While a number of such correlations may be due to small fluctuations of stable features, this would suggest anticorrelated temporal profiles exist, representing a benefit of temporal measurements.

Machine learning classifies buildings and finds building-specific feature dynamics
Orthogonal to the temporal clustering approach, the complete feature profile of a single day provides a means for isolating community-specific feature information. Using an alternative objective-across-day building classification-with simple machine learning models, it is possible to extract features with unique building-to-building temporal patterns.
Single day feature profiles allow for building classification and find building-specific fTDs. L1-regularized logistic regression (L1-LR) and random forest (RF) models provided high classification performance and revealed building-differentiating features. We used a oneversus-the-rest approach for model training, for which each input corresponded to all feature values from a single day. Importantly, we used standardized log-transformed features intensities calculated across all three buildings combined, not individual temporal z-normalized values. Receiver operating characteristic (ROC) area under the curve (AUC) analysis demonstrated high building classification AUC for all three comparisons with L1-LR models (0.946 ± 0.083 mean and standard deviation of 50-fold repeated model training, Fig 7A). The RF model did not perform as well (AUC = 0.906 ± 0.065, S19 Fig), but provided additional insight from the set of features used. Additionally, in line with statistical analysis, we found it possible to differentiate weekdays from weekends in Building 1 using an L1-LR model (AUC = 0.760 ± 0.180) but not in Buildings 2 and 3 (AUC = 0.484 ± 0.179 and 0.410 ± 0.233 respectively).
Features that were used in at least 40 of the independent L1-LR models and that possessed an average importance value greater than 0.005 across all 50 RF models demonstrated unique building dynamics. Four such feature time series are depicted in Fig 7B, of all which have been highlighted by alternative methods. The urine-related metabolite 2-hydroxyethanesulfonate and pantothenol both displayed lower ion intensity levels in the multipurpose-use Building 1 than in the residential buildings; similarly 6-keto-decanoylcarnitine, a urine metabolite used in non-muscle invasive bladder cancer diagnostic models [32], was mostly absent in Building 1 but appeared at high levels in the other two ( Fig 7B). Such differences, for many features, in Building 1 relative to 2 and 3 may explain the ease of Building 1 classification. Beyond these four, many of the important metabolites were either stable and statistically significant between buildings, or possessed alternative unstable metabolite class labels (S5 Table). While carboxylic acids (their derivatives) and fatty acyls were the largest named classes, along with amino acids, peptides and analogues being the main subclass, the most prevalent group was that lacking any classification (S20 Fig). This minimally biased modeling, largely recapitulated the findings of traditional statistical and temporal analyses, and suggested metabolites that through subsequent temporal analysis were shown to possess unique and building-differentiating dynamics.

Grouping temporally similar features suggests targets for follow-up studies
We extracted additional features that were temporally related to the set of metabolites identified by our analyses. We grouped additional features temporally similar to each of the select metabolites for all three buildings, and analysed between-building, feature-pair cluster co-occurrences. This directly suggested features, and possibly hypotheses, for specific follow-up experiments, and may comment on shared controlling processes (chemical, biological, etc.) that govern feature dynamics.
Intrabuilding temporal similarity and interbuilding co-clustering link possibly unrelated features. Analyzing the 'important features' (IFs, S2-S8 Tables) highlighted by our methods, we found that there were numerous, likely unrelated features within buildings that were highly correlated to the IFs. The IFs included the ML model features, putatively named metabolite groups, select unstable features, statistically significant stable features, and the urine, drug, sweetener, and cosmetics-related features. Metric multidimensional scaling (MDS) of the groups of features sharing fTDs with the IFs showed varying levels of clustering and co-clustering (Fig 8). Because many features were shared among multiple clusters, but only assigned to the largest (see Methods), many groups displayed overlap in this two-dimensional space. Similar to other methods, MDS revealed that many features and putative metabolites with large differences in mz and rt values grouped with some of the most prominent IFs, many of which are believed to be human-related (Fig 8D). The clusters suggested unknown features that may originate from the same source, for which additional analysis is needed for identification.
Between buildings we observed many co-clustered sets of features, despite most co-clustered features possessing different dynamics in each building. Using the intrabuilding Kmeans clusters, we found intersecting sets of co-clustered features between pairs of buildings (S21 Fig). Again, this revealed groups of features with substantially different mz and rt values and that co-clustered in either two or all three buildings. The co-clustering of specific feature groups across buildings, despite building-specific temporal dynamics (Fig 5D), lends additional credence to the observed associations. Co-clustering across buildings, along with MDS analysis, highlighted numerous, seemingly unrelated features that displayed similar temporal dynamics, suggesting specific compounds or unidentified mz and rt pairs for follow-up analysis. Thus fTD analysis may further our understanding of specific chemicals (e.g. pesticides or

PLOS COMPUTATIONAL BIOLOGY
Temporal wastewater analysis finds building-specific metabolite dynamics drugs) by suggesting other small molecules that would otherwise appear unrelated, but that may have been introduced into the waste stream by similar controlling processes.

Discussion
Previous temporal studies of wastewater metabolomics have examined seasonal variation [33] or larger, aggregated populations at multiple time scales (often 24-hour aggregates), using downstream wastewater treatment plant (WWTP) sampling [16,[34][35][36]. Here, we demonstrated the utility of short duration, high frequency (daily) direct building sampling to understand through-time statistical properties, individual feature temporal dynamics (fTDs), and clusters of temporally related features of single-building wastewater. Adding this temporal component, without substantial wastewater aggregation, may benefit future wastewater metabolomics studies by revealing community-specific metabolite dynamics and the daily burden of environmental pollutants, drugs, or lifestyle-related small molecules. Longitudinal sampling uncovered highly dynamic and unique building profiles that would be lost with WWTP-based sampling [23] or by sampling over longer time periods. For instance, the observation of significantly different features on weekdays versus weekends in multipurpose building 1 are likely reflective of a five-day workweek with a lower weekend occupancy, something not observed for the residential buildings. Along with building-specific information, we presented a series of methodological techniques that provide orthogonal and overlapping information for the analysis of longitudinal untargeted wastewater metabolomics. We highlight several main findings: the importance of temporal data collection, the utility of untargeted metabolomics for community monitoring, data-driven methods for information extraction, and the importance of direct, building sampling.
Temporal data acquisition, in combination with clustered feature modeling and time series comparisons, provides information not available from stationary statistical properties alone. We note that our sampling was at times sparse and irregular, a fact that hinders a more in depth analysis of patterns at various time scales; nevertheless, with the time series available it was still possible to observe benefits of longitudinal sampling. fTDs show that features may possess similar through-time mean intensities in different buildings, but with different temporal dynamics. These building-specific fluctuations may provide information on health-related events, given additional chemical identification. In light of the high level of feature intensity changes and subsequent mean reversion, it is likely that this study was not sampled at a high enough frequency for much of its duration. Additionally, replicate instrumental measurements were taken for only a few days; however, to better uncover feature intensity changes, multiple replicates at all time points are critical, as this lessens spurious intensity values from an already minimal wastewater sample that could represent a few individuals. As expected under the assumption that different individuals are generating waste within and across days, autocorrelation drops to near zero, even with a time lag of one, for the five largest clusters in each building. This observation indicates minimal information transfer between time points and agrees with the results of the Gaussian process modeling (S22 Fig). Frequent sampling may help explain select feature dynamics. For example, recreational drugs may be used at higher frequency on weekends rather than weekdays; thus, one might expect higher intensities on weekends. Further, the lack of differentiating features for Buildings 2 and 3 between weekdays and the weekend highlights the importance of temporal measurements; because large changes may be observed on any day, a statistical or aggregate approach would not properly capture these processes. This suggests that future studies should sample daily and-in certain circumstances-multiple times per day, requiring additional device engineering for fully automated sampling, unlike the current manual process. Finally, the importance of short, high-frequency sampling can be seen in our theoretical waste stream mixing analysis, which suggested that much of the dynamics information is lost with the addition of only a few additional waste streams.
An unanticipated finding from this temporal analysis was that numerous features display highly similar temporal dynamics within buildings. While many of these feature pairs correspond to different adducts or isotopes, a large number appear to be attributable to different metabolites, as their masses and retention times can differ by hundreds of Daltons and seconds (Figs 5C and 8 and S15 Fig). Specifically, focusing on highly correlated fTDs offers a means to discover new molecules (or features) linked to other molecules, events, or processes with known sets of features. Whether or not all of these features correspond to real metabolites, this information is readily supplied by fTDs and suggests perhaps non-obvious connections. In short, this approach may act as a hypothesis-generating method while also providing information about daily metabolite usage or exposure.
Untargeted metabolomics represents an information-rich method to monitor a community's health and behavior. We putatively identified several human-related metabolites, most notably related to drugs, cosmetics, and food. For these, we observed that many features displayed similar intensities through time, but that their stability was frequently different across buildings. An unexpected observation was that most features possessed similar mean intensities between buildings, resulting in only a 12 being statistically significant with our stringent cutoff. Yet this small number appeared to provide important, building-specific information (Fig 2A). Such differentiating information may reflect the number of individuals using the toilet during the sampling period. While providing the potential for significant public health information, this method is limited by the ability to chemically identify each of the features observed. In addition, the use of only a single MS ionization mode and LC column type prevented a more complete report on the small molecule output of the buildings. These limitations warrant additional studies to expand feature-to-metabolite naming along with the use of select standards to validate putative metabolites.
A data-driven approach, based on classifying buildings using the features of a single day, recovers many of the features identified as important in other types of analysis, but also provides additional metabolites and features for follow up. Importantly, it extracts information relevant to each of the buildings in a minimally biased manner. The machine learning (ML) models we used found features in a manner complementary to the other presented methods, and demonstrate that it is possible to classify which building generated a specific, single-day waste profile. In addition to finding many of the 12 statistically significant stable features, the models also found features that belong to multiple classes of metabolite dynamics. For instance, 6-keto-decanoylcarnitine was important for building classification and was found to be unstable in Buildings 1 (Class 1) and 2 (Class 2), but stable in Building 3 (S5 Table). Our methods may prove useful for future temporal studies with the specific aim of public health monitoring. For instance, given data from well characterized control buildings and a new building of interest, using ML models and fTD clustering may help identify and track the dynamics of compounds in target communities.
Although direct building sampling was not the specific focus of this study, it was critical for this work, and our findings support the applicability of this technique for community-specific wastewater epidemiology. Most small-population studies have focused on targeted methods for measuring various drugs, with minimal temporal information [10,37]. Our recent 24-hour study likewise demonstrated the utility of upstream sampling, but of larger populations [23]. Single-building sampling minimizes the amount of time-and thus sample degradationbetween sample generation and collection. This may address a potential source of uncertainty and error in WWTP-based measurements of population size or monitoring of illicit drug consumption [38][39][40][41]. Likewise, direct building sampling bypasses the issue of wastewater mixing or of occasional septage pumping into WWTPs, which may obfuscate fTDs or bias monitoring [1]. Thus, applications of the presented sampling method may include estimating population sizes and per capita feature values, and monitoring sporadic features.
High frequency, close-to-source sampling may, however, pose an ethical quandary. As the size of the population decreases, so does the anonymity of the results. For this reason, community-specific research must be conducted in such a way that personal health information remains confidential, and minimal negative consequences are experienced by the community under study [42].

Sample collection
Samples were collected from street-level manholes located outside of three buildings: one multipurpose-use building (Building 1), and two residential buildings (Buildings 2 and 3). We used a commercial peristaltic pump (Boxer) to continuously collect wastewater samples for 3 hours starting from 9:00 AM for Building 1 and 8:00 AM for Buildings 2 and 3. The peristaltic pump was programmed to pump wastewater at a rate of 5.55 mL/min over a 3-hour period into a 1 L polycarbonate bottle (Thermo Scientific) stored on ice, for a total volume of 1 L of wastewater. 100 mL of each sample were then filtered separately through a 0.2 μM PTFE membrane filter (Millipore) using a glass filtration apparatus (Glassco) to remove bacteria and debris. All filtration glassware and polycarbonate bottles were acid washed with hydrochloric acid and autoclaved prior to filtration. The filtrate was collected in amber glass vials, the pH was adjusted to between 2 and 3, and stored at -80˚C, all in less than 2 hours post sampling. Additional days for which only select building data was obtained were included only for intensity, stability and statistical analysis as well as classification and include:

Liquid chromatography-mass spectrometry
10 μl of sample was analyzed via LCMS using a Vanquish ultra-performance liquid chromatography system coupled to an Orbitrap Fusion Lumos (Thermo Scientific) via a heated electrospray ionization (ESI) source. Data was collected in negative ionization mode with datadependent secondary mass spectra (MS/MS) obtained via high-energy collisional dissociation (HCD, mass resolution 15,000 and collision energy of 35 arbitrary units, automatic gain control, AGT, of 5.0e4 and max injection time, IT, of 22 ms). The full MS resolution was 120,000 at 200 mz with an AGT target of 4.0e5 and a maximum IT of 50 ms. The quadrupole isolation width was set at 1.0 m/z. ESI was carried out at a source voltage of 2600 kV for negative ion mode with a capillary temperature of 350˚C, vaporizer temperature of 400˚C, and sheath, auxiliary, and sweep gases at 55, 20, and 1 arbitrary units, respectively.
Chromatographic separation was performed on a Waters Acquity HSS T3 column (2.1 × 100 mm, 1.8 μm) equipped with a Vanguard pre-column and maintained at 40˚C. The column was eluted with (A) 0.1% formic acid in water and (B) 0.1% formic acid in acetonitrile at a flow rate of 0.5 mL min -1 . The gradient started at 1% B for 1 min, ramped to 15% B from 1-3 min, ramped to 50% from 3-6 min, ramped to 95% B from 6-9 min, held until 10 min, ramped to 1% from 10-10.2 min, and finally held at 1% B (total gradient time 12 min). Run order was randomized over two batches of samples with pooled quality control samples run intermittently (every 6 or 7 samples) along with MilliQ water blanks to account for the general background of solvent system and mass spectrometer.
All features were binary log transformed, after which the standard deviation was calculated. Features with pooled sample coefficient of variance in excess of 0.3 were removed from further analysis. The data also was corrected for run order using the local (two closest run order-flanking quality control, QC, samples) and global (all) QC feature values where the normalized feature intensity was calculated with the following formula: Here X 0 corresponds to the corrected value, X the input feature value, R the global average of the feature over all QC sample and C the local feature QC average. All samples were corrected in this way. All samples were blank subtracted (mean blank intensity for each feature) and resulting values less than 0 or missing values were filled with one half the minimum of the feature's intensity in a given building. Only features for which the cross building sum of log intensities was greater than 100 (S1 Text) were kept and days with replicates were averaged (B1: O4, O18, O21, N4, D2, D20; B2: O4, O18, O21, N4, D2, D20; B3: O4, O18, O20, O21, N4).

Putative metabolite identification
For ease of figure presentation when named features or lists thereof exceeded defined sizes, the name was replaced with its HMDB chemical subclass or direct parent for which each mz-rt tuple can be mapped to names in the corresponding supplementary table. Given only putative identification throughout, all names should be interpreted with caution.
Identification was automated using custom python scripts, outlined in the supporting information. It performed a primary mass-to-charge look up of the exact mass accounting for  [47], the Chemical Entities of Biological Interest (ChEBI) [48], and LIPID MAPS [49]. For this lookup, we report putative chemical names if the parts per million (ppm) error was � 5, this represented an identification level of 3. Following this matching, the named features with associated ppm error and adducts or isotopes were ranked according to a heuristic order of likelihood of being observed (see S1 Text for ordering). Given equally ranked adducts or isotopes, priority was given to the chemicals with the lowest ppm error. The second part of the naming was to perform in silico fragmentation matching using MetFrag [50,51] on all ions with secondary mass spectra (MS2) that were extracted during the initial XCMS feature extraction. For this, individual MetFrag programs were run in parallel on MS2 scans (on up to three different MS2 spectra for the precursor ion), for all of the following The MetFrag outputs for each parent ion were combined and ranked in order of MetFrag probability, these names were then matched to the primary mass-named metabolites with matches being given the highest ranking in terms of metabolite identification (level 2). Following naming, some putative labels were removed, specifically those with select elements, polymers or 'R-groups' (S1 Text and S1 Table).

Chemical class mapping
Each named feature was matched to its corresponding feature in one of the following parsed databases: HMDB, LIPID MAPS, ChEBI or MetaCyc. From this initial search, if not in HMDB, the name, Inchi string, InchiKey, HMDB ID and HMDB accession numbers were extracted if available for the compound. These were each used to look up the compound in the HMDB database from which class, subclass and direct parent values were obtained. If no label was available at the specific taxonomic level, the next available, higher class name was used. Chemical taxonomy from different databases was not mixed. If no class was found for any putative names, the feature was counted as having a putative annotation but no classification ('putative annotation no class' label in plots). The only exception is the isoquinoline and sulfonamide in Fig 2A for which the closest parent node in the ChEBI ontology was used as they could not be found in HMDB.

Statistical analysis
Multiple comparison tests of feature intensity differences between more than two buildings were done with the non-parametric Kruskal-Wallis test followed by Benjamini-Hochberg FDR correction (scipy.stats.kruskal, statsmodels.stats.multitest.multipletests). For stationary statistical significance an FDR corrected P-value of less than 0.00001 was used. Linear regression on mean feature values and the associated correlation constants were extracted using scipy.stats. linregress.

Feature time series summary statistics
Through-time mean log intensity and standard deviation values were calculated for all features in each building individually. Features with a standard deviation � 2 were labeled unstable, and stable otherwise. Given these labels, building overlap analysis was performed using venn3 and venn3_circles (matplotlib_venn) for which all building overlaps were retained. To determine feature color in Fig 1D, a feature was considered to be stable between the compared buildings if it was stable in either (colored blue).

Temporal feature profile analysis-building clustering
All features were z-normalized through time (temporal mean subtracted and divided by the standard deviation) and each building was clustered by K-means clustering (sklearn.cluster. KMeans) using 100 clusters with all other parameters kept to their default values. Each feature was mapped to its corresponding mz and rt value for plotting as well as being grouped by cluster size. Mean cluster values were extracted for each of the clusters and used to train a GP regression model (sklearn.gaussian_process.GaussianProcessRegressor, alpha = 0.0001) using a radial basis function kernel with a 0.125 length_scale parameter mixed linearly with a constant kernel with noise corresponding to the mean standard deviation of elements of the cluster (sklearn.gaussian_process.kernels.RBF and .ConstantKernel respectively).

Temporal feature profile analysis-inter and intra building feature analysis
Time series distances were calculated via a Euclidean distance metric (scipy.spatial.distance. euclidean). All-to-all feature distance matrix calculation within and between buildings was performed using scipy.spatial.distance_matrix while the one-to-one distance of each feature to the corresponding feature in the other building using the euclidean function. For the all-to-all, feature pairs that met the similarity cutoff of either 1.5 or 2.82 were kept and further analyzed for both rt and mz difference between features. Pearson correlations were calculated using the .corr() method of Pandas DataFrame objects. Ward hierarchical clustering was performed using the linkage function from scipy.cluster.hierarchy.

Machine learning model training and analysis
L1-LR and RF models were built to predict which building a single day-feature profile belonged to (sklearn.ensemble.RandomForestClassifier, and sklearn.linear_model.LogisticRe-gressionCV with the 'saga' solver). To gain statistics on model performance, each was trained 50 times on random, full data shuffles (sklearn.utils.shuffle). For each data shuffle, the days were randomly split 75:25 into train and test splits respectively. The log ion intensity data training split was standardized (sklearn.preprocessing.StandardScaler), and the test data transformed. Internal cross validation (3-fold) on the training split was performed internal to the LogisticRegressionCV class while the number of trees for the RF was set to 1000 requiring no cross validation. Following training, all models were evaluated on the fully held out test set. ROC-AUC analysis was performed for each separate model (using sklearn.metrics.roc_curve and .auc) during the testing phase. For the 50 models built, the feature coefficients of the L1-LR models or the feature importances from the RF were averaged. To isolate unique building features, averaged features that had non-zero feature coefficients in at least 40 of the L1-LR models and an averaged feature importance in the RF models of > 0.005 were extracted.

'Important' feature and cross building analysis
Temporal similarity values were calculated between all features to the IFs (all features from S2-S5 Tables) in each building separately using a Euclidean distance metric. Many features possessed a distance < 1.5 to multiple IFs and were counted for the size of each IF's group. After calculating the size of each group, all features were then only assigned to the largest group they belonged to and those with sizes greater than 20 for Building 1 and 2 or 5 for Building 3 were input to metric MDS (sklearn.manifold.MDS).
To find features that co-clustered between buildings, the cluster memberships for all buildings were fully compared, including between all three buildings. If the set of features in the intersection of either 2 or 3 clusters (each from different buildings) was > 5, the cluster pair overlap of features was kept. A cluster pairing was only further analyzed if the difference in minimum and maximum rt was greater than 30 s. Supporting information S1 Table. Name components that caused a label to be discarded. If any of the following showed up in the name or chemical formula for a putative name for a feature, it was removed from consideration as a plausible name. (XLSX)   Table for complete details. � hinokitiol glucoside / (3r,4r)-4,8-dihydroxy-3-((r)-2-hydroxypentyl)-6,7-dimethoxyisochroman-1-one, �� methyl (3x,10r)-dihydroxy-11-dodecene-6,8-diynoate 10-glucoside / methyl 3,4-dihydroxy-5-prenylbenzoate 3-glucoside, ��� 5,6,7-trimethoxy-3-