Modeling geogenic and atmospheric nitrogen through the East River Watershed, Colorado Rocky Mountains

There is a growing understanding of the role that bedrock weathering can play as a source of nitrogen (N) to soils, groundwater and river systems. The significance is particularly apparent in mountainous environments where weathering fluxes can be large. However, our understanding of the relative contributions of rock-derived, or geogenic, N to the total N supply of mountainous watersheds remains poorly understood. In this study, we develop the High-Altitude Nitrogen Suite of Models (HAN-SoMo), a watershed-scale ensemble of process-based models to quantify the relative sources, transformations, and sinks of geogenic and atmospheric N through a mountain watershed. Our study is based in the East River Watershed (ERW) in the Upper Colorado River Basin. The East River is a near-pristine headwater watershed underlain primarily by an N-rich Mancos Shale bedrock, enabling the timing and magnitude of geogenic and atmospheric contributions to watershed scale dissolved N-exports to be quantified. Several calibration scenarios were developed to explore equifinality using >1600 N concentration measurements from streams, groundwater, and vadose zone samples collected over the course of four years across the watershed. When accounting for recycling of N through plant litter turnover, rock weathering accounts for approximately 12% of the annual dissolved N sources to the watershed in the most probable calibration scenario (0–31% in other scenarios), and 21% (0–44% in other scenarios) when considering only “new” N sources (i.e. geogenic and atmospheric). On an annual scale, instream dissolved N elimination, plant turnover (including cattle grazing) and atmospheric deposition are the most important controls on N cycling.


Introduction
There is a growing understanding of the importance of bedrock weathering as a source of nitrogen (N) to groundwater and river systems, particularly in mountain environments where weathering fluxes can be large [1][2][3][4]. Houlton et al. [5] recently estimated that [92][93][94][95][96][97][98][99][100][101][102][103][104][105][106][107][108][109][110] Pg N are stored in the top 1m of rock worldwide, and that 19-31 Tg N yr -1 are mobilized (i.e. made available via weathering) from near-surface rocks annually, nearly tripling previous estimates of global rock-derived N fluxes. In N-limited mountainous watersheds, bedrock-derived N may represent a majority of N available for plant and microbial use [6,7], particularly given rock-derived N can be weathered to a bioavailable form such as dissolved organic N (DON) or ammonium (NH 4 + ), which is readily utilized by plants and microbes [4,7]. However, our understanding of the relative contributions of geogenic N to the total N supply of mountainous watersheds and its contribution to dissolved N loads supplied downstream remains unclear.
One of the main limitations restricting our ability to quantify the fate and transport of mountain N is an absence of watershed-scale biogeochemical models that directly focus on high altitude regions, specifically incorporating hydrological, geological, biogeochemical, and climatic drivers relevant to mountain environments. The majority (>80%) of watershed N models have been constructed for application to agricultural systems [8], where riverine N loads can usually be predicted based on fertilizer application regimes [9,10]. In near-pristine mountain environments where N concentrations are considerably lower, and often limiting to ecosystem productivity [11], N mass balance is driven by a complex series of interacting drivers, including bedrock weathering [5,6], plant uptake, including direct organic N uptake [12,13], plant storage and release [14], the timing of spring snowmelt [15], changes to atmospheric deposition [16], denitrification [17,18], and erosion of particulate N off steep hillslopes and mountainsides [19,20]. Though there are models (e.g. INCA) that have been selectively adapted to predict N fluxes in mountain watersheds [21,22], our ability to predict spatiotemporally dynamic changes to mountain N concentrations lags behind agricultural systems, particularly in systems with potentially large sources of geogenic N, where information on mineralogy and weathering rates are required.
Calibrating models using 'end-of-pipe' stream nutrient measurements results in the possibility of equifinality, i.e. the occurrence of multiple parameter combinations that predict the same stream nutrient concentrations over time. At present, there is no satisfactory solution for both identifying and quantifying all possible calibration parameter combinations, due in large part to the inability to constrain reaction and transport rates representative of entire subwatersheds. Auto-calibration approaches such as Markov Chain Monte Carlo (MCMC) have been proposed to identify equifinality [23][24][25], but are difficult to apply due to the model complexity, the high spatiotemporal frequency of measurements needed, and the fact that auto-calibration may actually result in less realistic predictions of nutrient dynamics than manual calibration [26][27][28]. Nevertheless, it is important for complex watershed nutrient models to recognize and identify the possibility of equifinality in order to minimize model uncertainty, and we can use manual calibrations to explore key regions of the parameter space.
In this study, we focus on the East River Watershed (ERW) north of Crested Butte in the Colorado Rocky Mountains. The ERW is a relatively pristine headwater tributary to the Gunnison River in the Upper Colorado River basin, which provides 10% of the flow to the Gunnison River, which, in turn, provides 40% of the flow to the Colorado River at the Colorado-Utah border [29]. The ERW is predominantly underlain by a Cretaceous age, N-rich Mancos shale bedrock, which offers a unique opportunity to quantify the fate and transport of both geogenic and atmospheric N at the watershed scale. Mancos shale weathering in the ERW occurs primarily due to abiotic processes controlled by the water table depth [30,31]. Leaching experiments of Mancos cores from the ERW indicate that shale-N is primarily organic N and ammonium [30]. Seasonal saturation of the shale, following the snowmelt-induced water table rise, results in the dissolution of organic N and the desorption of NH 4 + from clay minerals (primarily illite and smectite [32]. Following mobilization, experimental evidence indicates that DON and NH 4 + are both readily mineralized and nitrified by microflora, driving rapid NO 3 production [30]. Here we develop the High-Altitude Nitrogen Suite of Models (HAN-SoMo), a watershedscale, process-based N ensemble of box models representing the hydrologic and dissolved N cycles, which discretizes N cycling into sub-watersheds (i.e., is semi-distributed). Given available measurements, we focus specifically on dissolved N species, but due to the importance of particulate N in mountain environments, devote a portion of the discussion to hypotheses related to particulate N loads in stream. This is the first study to utilize all available ERW dissolved N species time series data for both surface and subsurface waters to quantify whole watershed N cycling. Transient hydrological input parameters are constrained via coupling to the three-dimensional groundwater-surface water model ParFlow [33], which is further coupled to the Community Land Model (CLM) [34] that has been specifically developed at high resolution for the ERW [35]. Herein, we quantify the relative contributions of atmospheric deposition and bedrock weathering to dissolved N exports downstream. We further quantify storage, loss, and N species transformation fluxes including plant uptake and release via litter decomposition, denitrification, nitrification, and mineralization. In addition to the coupling of these models, we also employ the high-resolution spatiotemporal monitoring scheme in place at the ERW since 2014 to calibrate the model. This includes hourly measurements of discharge and daily-tomonthly stream water N measurements at tributary confluences and three main reaches on the East River, and groundwater N measurements from several locations. The goal of our study is to determine the relative contribution of different N fluxes, including the magnitude and fate of N derived from shale weathering, to total N-export. from a pristine mountainous watershed.

Study watershed
The ERW is an intensely monitored observational watershed, and has been heavily instrumented [36], with over 1600 stream water dissolved N concentration measurements in five key sub-catchments, spanning a time period 2014-2020. This study is focused on an 85 km 2 region of the watershed around the Rocky Mountain Biological Laboratory (RMBL) and specifically the encompassing watershed area located north of a pumphouse (PH), used to extract water for use by the adjacent municipality of Mt. Crested Butte, Colorado (Fig 1). This region includes alpine, sub-alpine, and montane ecosystems ranging in elevation from 2760m to 4120m. Eight perennial tributaries drain into the East River (ER) upstream of PH: Rustlers, Copper, Gothic, Quigley, Rock, Marmot, Bradley, and Avery Creeks. The watershed receives 670-1200 mm of precipitation annually, depending on the monitoring location within the watershed, with about 70% as snow [37], and the majority of the remaining 30% during monsoonal rains in late summer and early fall. Land cover ranges from barren rock to quaking aspen (Populus tremuloides) stands, Engelman spruce (Picea engelmannii) and subalpine fir (Abies lasiocarpa) mixed forest, dry shrub/scrub, and meadows away from stream networks, and riparian areas predominately characterized by woody shrub vegetation, dominated by members of the Salix genus (willows), with interspersed herbaceous wetlands (Table 1). N-fixing plants, including members of the lupine genus (Lupinus argenteus, Lupinus bakeri), sweet pea (Lathyrus latifolius), and American vetch (Vicia americana), are unevenly distributed throughout the watershed's meadows.
The ERW is mainly underlain by Mancos shale [6], which outcrops throughout the watershed and has high N concentrations (solid concentrations~1150-1400 mg N kg -1 ). The watershed vadose zone largely developed atop of a region of weathered Mancos shale-derived saprolite, with additional area covered by a mixture of colluvium and glacial deposits. Less prominent bedrock formations include sandstone, conglomerate, and Oligocene quartz monzonites and granodiorites, including two large laccoliths that form the mountains on the southwestern edge of the watershed. Finally, a herd of cattle numbering ca. 500 head roam the watershed from late July to early October, grazing between PH and RMBL for approximately   using an AS-18 anion exclusion column; the method detection limit based on calibration standards for nitrate is 0.1μM. Stream water samples for analysis of total dissolved nitrogen (TDN) and ammonium (NH 4 + ) were collected at the same locations less frequently (weekly to monthly) and filtered (0.45 μM). Stream discharge was measured at hourly intervals at LT, Copper, EAQ, and Rustlers according to the methods described in Carroll et al. [37] (note that long-term discharge monitoring was not possible at ME). Groundwater samples were collected intermittently throughout the watershed using installed piezometers at PH, RMBL, Bradley and Rock Creeks (Fig 1). TDN was analyzed via chemiluminescence using a Shimadzu Total Nitrogen Module combined with a TOC-VCSH analyser (Shimadzu Corporation, Japan

3 Model overview
The suite of models consists principally of box models representing the hydrology and N dynamics in the river, soil water (vadose zone) and groundwater for each subwatershed i. The ERW is discretized into five sub-watersheds (Fig 1) . Here, we use both input and output from the 100m resolution model run. The subsurface is discretized into five layers across the entire watershed, the top three are soil layers of depths 0.1m, 0.3m and 0.6m, while the bottom two are geological layers of 8m and 21m, with the deepest 21m representing fractured bedrock.
Pressure head output from ParFlow-CLM was spatially integrated for each sub-watershed i (Fig 1) to determine the total volume of water stored in the groundwater, V g,i , vadose zone, V s,i , and surface water, V r,i , at hourly intervals. Movement into and out of the vadose zone is quantified with two fluxes from ParFlow-CLM, also at hourly intervals: infiltration into the vadose zone, Infilt i , and evapotranspiration from the soil layers, ET s,i . Infiltration includes snowmelt water, precipitation, and runoff from adjacent cells that enters the vadose zone, while exfiltration (i.e. negative infiltration) includes vadose zone water that exits to the surface via saturation excess. Surface water pressure head values were converted to discharge (flows) using Manning's equation for the outlet of each sub-watershed. Discharge is a function of the representative slope values of each outlet, and the Manning n value of that cell, as parameterized by land cover type. As discussed in Foster and Maxwell [35], Manning n and hydraulic conductivity were used constrain system-wide discharge by manual calibration with streamlevel observations to control the dynamics of the streams.
Soil and air temperature (T soil and T air ,˚C) were output as hourly spatial averages for each sub-watershed. Air temperatures are derived from PRISM datasets [46], interpolated to hourly resolution using phase two of North American Land Data Assimilation (NLDAS-2) forcing [47], which are available at 1/8 th -degree at hourly time steps, and were interpolated and downscaled to match the discretization of the ParFlow-CLM model. Soil temperature is solved in CLM using the heat diffusion equation and a subsurface heat flux with the Fourier law for heat conduction over the top 2 meter of the model for both soil and snow layers [48]. Stream water temperatures were calculated using the empirical relationship given in Lauerwald et al. [49]. We calculate a single groundwater temperature (T gw,˚C ) for the entire ERW by averaging subwatershed soil temperatures at each timestep.

Aggregated hydrology.
We spatially aggregate the ParFlow-CLM simulation outputs to produce a simplified mass balance model based on the conceptual model used by Jackson-Blake et al. [50]. Watershed hydrology in each sub-watershed i is broadly grouped into three pools of water storage: soil water, V s,i , groundwater, V g,i, and stream water, V r,i (Fig 2). Soil water represents the total unsaturated (vadose) zone within each sub-watershed, while groundwater represents the saturated zone, to a depth of 30m. Stream water includes all surface water in the main river channel, lakes/ponds, and tributaries. All storages volumes are in m 3 representing entire sub-watersheds (Fig 1). The key advantage to using ParFlow-CLM to constrain this box model is that it enables us to calculate water residence times for the stream, groundwater and vadose zone that vary with each timestep, rather than assuming they are constant. Through this approach we are further able to account for groundwater recharge and discharge to the stream, as well as bidirectional exchange of water between the vadose zone and groundwater.
Using the time series of fluxes and volumes generated via ParFlow-CLM, the remaining fluxes between hydrological pools were back-calculated analytically for each hourly timestep using mass balance equations: where Q s,i is the daily soil water flow exiting (if positive) or entering (if negative) the soil water reservoir (m 3 hr -1 ), V s,i (t+1) and V s,i (t) are the water volumes within the vadose zone at time t+1 and t, respectively (m 3 ), [Infilt−ET s ] i is the infiltration (or exfiltration) minus the ET from the vadose zone (m 3 hr -1 ), and Δt is the time step of one hour. When Q s,i is positive, the flow is partitioned between what is delivered directly to the stream (runoff) and what is delivered to groundwater. The proportion of Q s,i delivered to groundwater from soil water is determined by multiplying Q s,i by the base flow index, β i , a unitless values from 0-1 that varies for each sub-watershed i for the baseflow period (winter), rising hydrograph (early spring), falling hydrograph (late spring-early summer), and monsoon season (late summer to mid-fall) (constrained in Carroll et al. [37]). When Q s,i is positive, the mass balance for the groundwater pool is: where Q g,i is the flow of water from groundwater to the stream (if positive), or vice versa (if negative) (m 3 hr -1 ). If Q s,i is negative, the mass balance for V g,i is: The mass balance for the surface water is solved using: where (1−β i )Q s,i is interflow, i.e. the flow directly from the vadose zone to the stream (only relevant if Q s,i is positive), ∑Q r,i+1 is the sum of the flow exiting any upstream reaches or tributaries in sub-watershed i+1, and Q r,i is the flow leaving the reach. [Q q −ET r ] i is the overland flow delivered directly to the stream, minus direct ET from surface water (m 3 hr -1 ).

Fig 2. Hydrological box model used in HAN-SoMo.
Fluxes and volumes in red italics were extracted from ParFlow-CLM. V r,i , V s,i , and V g,i are the volumes of water stored in the river, vadose zone and groundwater in sub-watershed i, respectively (m 3 ), Q r,I , Q s,i and Q g,i are the flows of stream water, vadose zone water, and groundwater, respectively (m 3 hr -1 ), Infilt i is the infiltration (or exfiltration if negative) (m 3 hr -1 ), ET r,i is the direct evapotranspiration from surface water (m 3 hr -1 ), ET s,i is terrestrial evapotranspiration (m 3 hr -1 ), β i is the base flow index (unitless), and ∑Q r,i+1 is the sum of the flow exiting any upstream reaches or tributaries i+1.  (Fig 3). The majority of fluxes are represented using first order kinetics, with the exception of plant N uptake kinetics, which use Monod kinetics. Note that while the mechanistic N model is also solved using one-hour timesteps, the time units are in days as N reaction parameters are not constrained to resolve diel or hourly trends (e.g. daytime vs. nighttime differences in primary productivity). Hence, all the hydrological inputs described in section 2.4.2 are converted to units of days before insertion into the N model. In the vadose zone, N pools are solved numerically using: . As with wet deposition, we assume 25% of the total dry N deposition occurs as DON, i.e. 0.52 mol m -2 day -1 . These fluxes are multiplied by the sub-watershed surface areas in m 2 to yield total deposition fluxes in mol day -1 .
M o , M n and M a are the areal fluxes of DON, nitrate and ammonium mobilized from the Mancos shale to the sub-surface (mol m -2 day -1 ), via desorption, ion exchange, dissolution, or rapid mineralization and nitrification specifically associated with the weathered ions [30]. Given the uncertainty related to the relative magnitude of each specific weathering process, as well as the size of the available N stock in the bedrock for the entire watershed, we model the Mancos weathering and associated mineralization and nitrification as a constant combined input into the model, generating input fluxes for all three N species, which are all calibrated. Calibration of shale weathering fluxes is discussed in detail in Section 2.5. We assume 90% of the Mancos weathering flux takes place in the saprolite within the vadose zone, hence the 0.9 coefficient in the weathering term of Eqs 5, 6 and 7, while 10% is weathering and released to the groundwater from fractured bedrock, based on the findings of Wan et al.
[30] for a representative hillslope adjacent to PH. We multiply vadose zone weathering rates by σ i , the proportion of each sub-watershed that is underlain by Mancos shale down to 8m below surface ( , and DON. The inclusion of direct plant DON uptake reflects the recent acceptance that this uptake mechanism is ecologically relevant, particularly in N-poor systems like the ERW [13,56]. The generalized N uptake flux for all N species, F x,up,i , via plants is solved using Monod kinetics: where N x,s,i is N n,s,i , N a,s,i or N n,s,i , F x,max,up is the maximum uptake rate for each N species in each sub-watershed (mol day -1 ), constrained based on the proportion of four generalized land cover types in each sub-watershed i: deciduous and mixed forest (ρ df,i ), coniferous forest (ρ cf,i ), meadow (includes grassland, herbaceous wetlands, and dry shrub/scrub) (ρ ms,i ), and willowy wetland (ρ ww,i ), using the following: where land cover proportions are unitless values between 0-1. The land cover proportions are determined using the USGS National Land Cover Database (2016) ( Table 1). The sum of ρ values do not equal 1 in each sub-watershed as barren and developed areas (e.g. roads) are assumed to have reaction rate constants equal to 0. F x,max,df , F x,max,cf , F x,max,ms and F x,max,ww are maximum uptake fluxes for each N species by each land cover type (mol m -2 day -1 ) (S1 Table in S1 File), and α i is a unitless temperature correction factor: where ϑ s is a constant equal to 12, and T soil,i is the soil temperature at each timestep, output The rate constants for vadose zone net mineralization, k min,s,i , denitrification, k den,s,i , and nitrification, k nit,s,i (day -1 ) are similarly defined based on the proportion of each land cover type in sub-watershed i and rate constants for each land cover type ( Table 2): and k den; These fluxes represent totals for the entire sub-watershed vadose zone, hence are able to occur concurrently within the model (e.g. as in Wade et al. [10] and Whitehead et al. [60]. Denitrification, for example, which predominantly occurs under saturated, anoxic conditions, can occur in the vadose zone to reflect the fact that there are local areas of ponded water or a perched water table. These parameters are calibrated as described in section 2.5. F a,fix,i is the flux of NH 4 + added to the system via N-fixing plants, assuming 1% of the meadow plant coverage are N fixers [61]. For each timestep, the N fixation flux is chosen as a PLOS ONE random number between 10 −8 and 10 −4 mol m -2 day -1 when the soil temperature is above 0˚C. The annual areal fixation approximates rates observed in similar mountain meadow environments where L. argenteus and L. latifolius are the N fixing plants [62][63][64], while allowing for daily variability. F o,lit,i is the release of DON from decomposing plant litter (mol day -1 ) and is calibrated so that the annual plant uptake flux is within 10% of the litter plus cow deposition, based on the assumptions of Zhu and Zhuang [56] (see section 2.5). For meadows, aspens, and willows, litter rates are calculated by randomly selecting litter N release rates (mol m -2 day -1 ) from uniform distributions, grouped for different periods of time during the year (e.g. high litter in autumn, low in spring) (values are given in Table 2), enabling daily variability as well as seasonal N litter additions that align with the above assumption. Coniferous litter N release, F o,lit, Willowy wet 5x10 -4 , 5x10 -6 , 0.002 PLOS ONE spruce (in g N m -2 day -1 ) is constrained by fitting equations to model output described by Grant [65] and Mekonnen et al. [66] (using Grant [67])) to develop the following equations: and where GPP spruce is the gross primary productivity of spruce (mg C m -2 day -1 ), and υ is a unitless scaling parameter used in the calibration ( Table 2). (Note that the model converts units in g to mol before solving). The flux of DON delivered to soil water via cattle excretion, F o,cows,i (mol day -1 ) is determined according to: where n cows is the total number of cows in the watershed, equal to 500, G i is the proportion of the herd present in sub-watershed i on timestep t, equal to 0 if none are present and 1 if all 500 are present, and L dung and L urine are the loads of DON delivered to the soil per cow per day from dung and urine, respectively. L dung is equal to 8.6 mol cow -1 day -1 and L urine is equal to 15 mol cow -1 day -1 [68]. These values represent the flux of N that enters the soil; i.e. volatilization of NH 3 to the atmosphere is accounted for. Each year, G i is set to 1 in the LT region for July 15 -September 8, and for September 9 -October 15, we assume G i is equal to 0.6 in ME, 0.2 in Copper, and 0.1 in EAQ and Rustlers, based on the approximate annual grazing schedule of the local ranchers. G i is set to 0 in all other cases. In the stream, N pools are solved using: dN n;r;i dt ¼ X N n;r;iþ1 þ k nit;r N a;r;i À k den;r N n;r;i À k up;r N n;r;i À 1 t r N n;r;i þ N nP ½Q q À ET r � i þ F n;int;i þ F n;gw;i ð18Þ are the sums of ammonium, nitrate, and DON entering from upstream sub-watersheds i+1 (if applicable) (mol day -1 ), k m,r is the rate constant for in-stream net mineralization (day -1 ), k nit,r is the rate constant for instream nitrification (day -1 ), k up,r is the rate constant for instream N primary productivity (day -1 ), and k den,r is the instream denitrification rate constant (day -1 ). τ r,i is the timestep-specific in-stream water residence time (in days) for each sub-watershed i, equal to V r;i Q r;i , which gives the flux (mol day -1 ) of each nutrient exiting the sub-watershed via streamflow when its inverse is multiplied by the in-stream concentration. Wet atmospheric deposition directly to the stream is modeled by multiplying N aP , N nP , or N oP by [Q q −ET r ] i . F x,int,i is the flux of ammonium (F a,int,i ), nitrate (F n,int,i ), or DON (F o,int,i ) that enter the stream via interflow (mol day -1 ) (relevant if Q s,i is positive), calculated using: where N x,s,i is the amount of each N species in the vadose zone at each timestep (mol), τ s,i is the timestep-specific vadose zone water residence time (days), equal to V s;i Q s;i . If groundwater is discharging to the stream, i.e. Q g,i is positive, the flux of each species to the stream, F x,gw,i , is equal to: where N x,g,i is the amount of each N species in the groundwater (mol), τ g,i is the groundwater residence time for sub-watershed i (days), equal to If the stream is recharging to groundwater, i.e. Q g,i is negative, the N species recharge fluxes are calculated using: where N x,r,i is the amount of each N species in the stream (mol). All instream rate constants are temperature corrected based on water temperature: where k y,r represents k nit,r , k den,r , k min,r , or k up,r at time t, k 20,y,r is any of these rate constants at 20˚C, ϑ r is a constant equal to 1.07 [69], and T water,i is the water temperature (˚C) at time t. k 20, m,r is set to 1.5 day -1 , based on observations of Catalán et al. [70]) and Cheng and Basu [71] that fresh terrestrial organic material mineralizes quickly upon entering the water column (also supported with data in Bertilsson and Stefan [72]). k 20,nit,r , k 20,up,r , and k 20,den,r are calibration parameters and specific values are discussed and provided in section 2.5 and Table 2 dt are the rates of change of groundwater ammonium, nitrate and DON storage over time (mol day -1 ). As with Eqs 5-7, the 0.1 coefficient multiplying the Mancos shale term accounts for the assumption that 10% of the weathering takes place in groundwater. F a,vz,i, F n,vz,i and F o,vz,i represent the flow of each N species between groundwater and the vadose zone (mol day -1 ). When Q s,i is positive (i.e. flow is from the vadose zone to groundwater), the generalized expression for each N species, F x,vz,i , is solved using: Whereas if Q s,i is negative, F x,vz,i is solved using: k den,g,i the rate constant for denitrification in the groundwater, calculated according to: where k den,max,g,i is the maximum denitrification reactivity, in this case for the maximum groundwater temperature of 3.8˚C, calibrated as discussed in section 2.5, ϑ g is a constant equal to 0.3, and T gw is the groundwater temperature (˚C) at time t. Groundwater denitrification is assumed to happen primarily during periods of rapid groundwater recharge resulting in water table rise and near-surface anoxia, particularly in floodplains. This flux therefore only occurs when Q s,i exceed 1.0 x 10 5 m 3 day -1 for an entire sub-watershed, limiting groundwater denitrification to the spring snowmelt and high intensity monsoonal rainfall events in the late summer and early fall. Note that groundwater denitrification differs from vadose zone denitrification, which represents localized regions of saturation that exist throughout the growing season, such as in perched water tables.

Model calibration and uncertainty
The mechanistic N model is manually calibrated for Oct. 1, 2014 -Sept. 30, 2016 (731 days) for all 5 sub-watersheds concurrently to yield consistent reaction rate constants across the watershed, using nitrate concentrations measured at the outlets of each sub-watershed. These first two model years were used for calibration as they represent approximately average snowpack years relative to the 1981-2010 mean [73]. Due to exceptionally low in-stream concentrations, we calibrate using the measured concentrations as opposed to fluxes (i.e., concentrations multiplied by the river discharge). This is because the variability in discharge is very large relative to the stream N concentrations. As a result, the fluxes follow nearly identical temporal trends to discharge, and it becomes impossible to resolve differences in N behaviour when using fluxes to calibrate as they are dwarfed by differences in calibrating. The goodness of fit for the calibrations are checked using root mean square error (RMSE) comparisons of modeled vs. measured data (Table 3). Surface water NO 3 is initialized with concentrations from the chronologically closest measurements. Due to the scarcity of NH 4 + and DON surface water measurements, these pools are initialized based on the median measured concentrations (Fig 4). Given the short surface water residence times, any bias from the initial surface water conditions is eliminated within a few timesteps. Few groundwater and vadose zone measurements for any N species were available throughout the upper watershed due to a limited number of monitoring wells (none in EAQ, Copper and Rustlers) and an absence of vadose zone pore water samplers anywhere except PH. As a result, existing measurements were used to identify the order of magnitude for subsurface concentrations, and the initial conditions were manually adjusted iteratively to ensure that the concentrations during the non-growing season are at or close to steady state (refer to step 5 in calibration procedure below). Groundwater initial conditions are additionally constrained based on the size of baseflow concentrations instream; if initial conditions are set too high, baseflow stream concentrations, which originate overwhelmingly from groundwater discharge, will exceed measured values. To calibrate the model, we follow this iterative procedure: 1. We assume that atmospheric deposition fluxes, cow N deposition, and plant uptake fluxes are well constrained and cannot be adjusted in the calibration. N-fixation is small and not adjusted.
2. We adjust the magnitude of litter fluxes under the assumption that litter plus cow N release must be within ±10% of the plant uptake flux for the calibration period as done in Zhu and Zhuang [56].

PLOS ONE
3. The shale weathering flux remains the only source that can be adjusted. We initially use the median value of the total N flux range described in Houlton et al. [5] (9.8 x 10−5-15 x 10 −5 mol m -2 day -1 ) starting with the assumption that DON accounts for half of the total shale weathering flux and that NO 3 and NH 4 + are the remaining quarters, and adjust each species as necessary. We base these initial ratios on the in situ porewater N species concentrations determined at this site by Wan et al. [30] in five boreholes on a hillslope above PH This study showed that weathered N was primarily in the form of DON, followed by NH 4 + , but that these species were both quickly mineralized and/or nitrified. Because we do not explicitly model nitrification or mineralization in groundwater, we calibrate N species-specific weathering fluxes to account for these transformations in the subsurface. We assume the shale weathering rate is constant throughout the year and adjust based on the baseflow concentrations in the streams. 4. Denitrification and instream loss are the last sinks to be constrained. Based on our assumption that the total magnitude of N sinks must be within ±10% of the magnitude of sources for the calibration period, these remaining fluxes are adjusted to meet this criterion. We adjust the size of denitrification based on the concentrations in the vadose zone to ensure that the vadose zone NO 3 pool does not drain or fill. The instream concentrations can then be constrained based on the fluxes that discharge into the river. This process is done iteratively to ensure NH 4 + and DON stream concentrations fall within the approximate range of the measured distributions (Fig 4). 5. We adjust vadose zone and groundwater initial conditions to meet criteria described above. Return to step 1 and rerun until step 5 does not require changes to initial conditions. This approach leaves several key uncertainties, which we use to explore equifinality. We present three model calibrations representing different regions in the parameter space that fit the surface water NO 3 measurements approximately equally well (Table 2), and discuss drivers of differences in subsurface concentrations, surface water NH 4 + and DON concentrations, and the magnitude of fluxes constrained. To further quantify the relative importance of the Mancos shale as an N source, we develop two additional scenarios that assume the watershed is not underlain by a N-rich shale (Table 2), and investigate whether it is possible to calibrate the model to the measured stream concentrations. Given the high uncertainty already associated with the magnitude of the weathering fluxes, we have not attempted to resolve seasonal differences in the weathering rates as these differences likely already fall within the margin of uncertainty predicted by the various calibration scenarios. Future attempts to resolve the spatially integrated seasonal changes to weathering rates will also need to consider watershed-wide changes to water table height [31,74]. We also develop a scenario without cattle to quantify the model's sensitivity to N recycling by cows.

Model scenarios
Through the iterative calibration process described in Section 2.5, we identified six distinct N model cases that represent different regions within the parameter space that fit the data approximately equally well ( "Low" and "high" refer to relative representative regions within the parameter space where calibrations could be achieved; in other words, while somewhat lower or higher values than what are listed in Table 2 are possible, they exist within a continuum of parameters in the same general section of the parameter space and are thus not identified here because they yield roughly the same conclusions regarding relative importance of N processes. C3 is specifically calibrated to identify the largest shale weathering flux for which a calibration can be achieved.

Trends in measured stream water data
The model calibration years of Oct. 1, 2014 -Sept. 31, 2016 exhibited approximately average snowpack, relative to the 1981-2010 means, with monthly 2015 January-June snowpack at 98% of average, and 2016 January-June at 123% [73]. Winter 2017 had above average snowpack, with a monthly average at 146% of the mean, while winter 2018 was well below average at only 40% of the mean [73]. During the model calibration period, the two sub-watersheds with the most measured data, ME and LT, show relatively consistent repeating annual trends (Fig 5, S1 Fig in S1 File). Measured NO 3 concentrations are generally drawn down over the course of the growing season, following the end of the snowmelt (typically beginning in March and ending in June) and rebound throughout the winter, with instantaneous peaks in concentration throughout the year. Spring peaks in NO 3 concentrations at ME and LT likely correspond with the spring snowmelt, although flows were not measured at ME so this cannot be confirmed.
The following two years of data, Oct. 1, 2016 -Oct. 1, 2018, ME and LT exhibit extremely low concentrations barely above instrument detection limits (consistently below 1 μM from August 2017 through January 2018), with no repeating seasonal trends, and cannot be reasonably used to attempt a separate model calibration with this time period (Fig 5, S1 Fig in S1  File). Given that each of these years experiences substantially different extremes in snowpack, but that stream NO 3 concentrations in both years remain below those of the average snowpack years, we can only speculate on a mechanism driving these low concentrations. It is possible that during the extremely high snowpack in winter 2017, the vadose zone was sufficiently insulated throughout the winter by deep snowpack that microbial activity could continue to such an extent that dissolved N was lost via denitrification [75,76]. Additionally, the high snowpack would result in a prolonged period of soil saturation during and following the snowmelt period, further promoting enhanced denitrification. Hence, the dissolved N would not be measured exiting the watershed via the river throughout the growing season as the overall watershed stock was converted to N 2 or N 2 O gases. In the 2017-2018 winter, which had

PLOS ONE
unusually low snowpack, following this hypothesis we would expect low gaseous emissions due to a lack of snowpack insulation and frozen soils observed throughout the watershed. However, Wan et al. [30] found high winter N 2 O emissions from the subsurface on a hillslope above PH during this time, but since their study did not begin until May 2017, a comparison with the winter before is not possible. Further research is therefore needed to elucidate the driver of consistently low stream NO 3 concentrations in both the high snowpack and low snowpack years. We discuss the model's performance in these two years in section 3.6.

Scenario comparison
Across all sub-watersheds, and under all calibration scenarios, stream water NO 3 -, NH 4 + , and DON concentrations peak during the snowmelt period from March-May, decline during the growing season from May-September, and rebound throughout the fall (Fig 5, S1-S3 Figs in S1 File). When comparing all the calibration scenarios, C2 appears closest to representing the observed watershed dynamics. While the calibration RMSEs for streamwater NO 3 in each of the sub-watersheds are similar for each calibration (Table 3), C1 is not capable of achieving the low stream NH 4 + concentrations at the LT for which there are measurements available (Fig 5).
C2 and C3, which use the large instream reaction rate constants for uptake and denitrification (Table 2), do predict these low concentrations. C2 and C3 instream rate constants are consistent with observations from Boyer et al. [77]. Further support for C2 being the likeliest calibration scenario can be drawn from the magnitude of denitrification, and the size of the denitrification rate constants, which are both more realistic for a N-poor mountainous watershed. For example, McMahon et al. [78] constrained a first order bedrock denitrification rate constant of 0.001 day -1 , the same as that used in C2, for an analogous N-rich Colorado shale. Similarly, C2 predicts denitrification rates for the subsurface of 3.2 x 10 −4 mol m -2 yr -1 (0.044 kg N ha -1 yr -1 ), while C3 predicts rates of 0.022 mol m -2 yr -1 (3.1 kg N ha -1 yr -1 ). The latter value is much more characteristic of heavily impacted agricultural systems, while measurements from more pristine alpine and/or forested sites rarely exceed rates of 0.1 kg N ha -1 yr -1 [79][80][81].

Atmospheric vs. geogenic N sources
Over the course of a year, C1-C3 predict that wet atmospheric deposition accounts for an annual average of 1.83 x 10 7 mol yr -1 NO 3 -, 1.15 x 10 7 mol yr -1 NH 4 + , and 9.84 x 10 6 mol yr -1 DON added to the ERW, or 36-40% of the total sources and 52-74% of the total new sources to the watershed, i.e. the N that is not being liberated from plants or cattle (Fig 6). Dry deposition accounts for 1.73 x 10 4 mol yr -1 NO 3 -, 7.99 x 10 4 mol yr -1 NH 4 + , and 3.24 x 10 4 mol yr -1 DON, i.e. 1-2% of the sources and 2-3% of the new sources (Fig 6). Comparatively, C1-C3 predict that the Mancos shale weathering supplies 10-31% of the total N mobilized to the ERW's combined groundwater, vadose zone, and surface water annually (Fig 6), and 21-44% of all new N sources. In the most probable C2 scenario, weathering supplies 12% of the total N mobilized, and 21% of the new N. Annually, DON [1] estimated that more than 10 kg N ha -1 yr -1 originated from biotite schist and diorite saprolite in the Mokelumne River watershed in central California.
We note that these are warmer environments that receive more of their precipitation as rain than the ERW, driving higher weathering rates. Our estimates additionally fall below the montane chemical N weathering rates estimated by Houlton et al. [5], which exceed 5 kg N ha -1 yr -1 .
Approximately the same quality calibration can be achieved using the NM1 and NM2 as C1 and C2, respectively (Table 3). These results suggest that geogenic N sources are not the dominant N source to the stream. This finding aligns with that of Holloway and Smith [6], who showed that Mancos shale weathering in the Grand Valley on the western Colorado border

PLOS ONE
did not impact streamwater N concentrations. This conclusion is further supported by a lack of correlation between the amount of shale saprolite within each sub-watershed and instream NO 3 -concentrations. The vadose zone in the EAQ sub-watershed is composed of 70% Mancos saprolite (Table 1), the highest of the sub-watersheds, but has the lowest stream NO 3 concentrations (Table 3). Model results do predict the highest stream DON and NH 4 + concentrations in EAQ, suggesting that greater coverage of shale saprolite correlates with increased riverine and vadose zone total N. However, the model also predicts very high NO 3 concentrations in the stream at EAQ, so these trends may reflect the model structural inaccuracies discussed in Section 3.5. Measured low NO 3 and NH 4 + could arise in EAQ due to the short subsurface residence times that do not provide sufficient time for DON to undergo mineralization and/or nitrification before discharging to the stream.

Terrestrial N-limitation
Litter plus cattle recycling accounting for 46% of the N sources in C2 (with 27% of this value from cattle), and plant uptake accounting for 45% of sinks/fates (Fig 6). Efficient terrestrial plant uptake and recycling via litter deposition, plus low export of dissolved N downstream (<10% of TN sinks, Fig 6) indicates that the ERW is efficiently retaining N. Therefore, these data suggest that the ERW is N-limited with regard to terrestrial plus instream primary productivity [82,83]. This hypothesis is supported by dissolved inorganic N (DIN = NO 3 -+ NH 4 + ) to DON ratios in the stream that are drawn down to between 0.5-1 in all of the subwatersheds during the growing season in C2, which has been to shown to be a proxy for N-limited terrestrial systems [84,85]. This N-retention efficiency suggests that the presence of Mancos shale is not sufficient to liberate plant growth from N-limitation in the watershed. As further evidence of this hypothesis, C2 yields estimates of~1.0-1.15 x 10 7 mol for TN stored in the vadose zone and groundwater combined for the entire ERW. Thus, annual Mancos shale weathering fluxes represent 4-5% of the total available N stored in the subsurface. Given that the Parflow-CLM-predicted median groundwater residence times range from 17-50 years in each sub-watershed, nearly all N mobilized to groundwater from geogenic sources remain in groundwater storage for at least a decade before becoming available to the stream or vadose zone for plant uptake or denitrification. Indeed, only~1.2 x 10 5 mol NO 3 is discharged annually from the groundwater to the stream, equivalent to 3-4% of the groundwater NO 3 -. In the vadose zone, median residence times range from 20 to 212 days, indicating that geogenic N is more available for plant uptake, denitrification, or export to the stream. While we cannot explicitly track the fate of geogenic N in this type of model, the residence times indicate that, per year, all vadose zone N is entirely replaced at least once, and up to 18 times, with an average of 4.8 x 10 5 mol TN discharged via interflow to the stream annually.
It is worth emphasizing that in this relatively pristine watershed, the presence of even a small herd of 500 cattle plays a major role in annual N recycling, accounting for a larger flux of DON to the soil than litter in C2, C3 and NM2 (Fig 6). Using the NC scenario, we can equally achieve a calibration without the presence of cows merely by increasing the magnitude of the litter flux. Indeed, mechanistically, roaming cattle may merely serve as an alternative pathway for plant nitrogen to re-enter the soil, with key differences being that cows accelerate the decomposition process and alter the timing of when the N re-entry into the soil takes place. The timing of the cows' presence in the watershed may greatly impact the extent of their influence on the N cycle. However, given that the grazing period occurs during later summer and early autumn when litter fluxes are highest, their influence on the unperturbed timing of DON release to the soil may be minimal. The key difference between the NC scenario and the equivalent scenario with cows (C2), is that peak vadose zone and stream DON concentrations are ~2-8 μM higher in the scenario without cows (Fig 5 and S2 and S6 Figs in S1 File). There are only imperceptible differences in NH 4 + and NO 3 concentrations. Due to volatilization of ammonia from cow urine and dung (already accounted for in the fluxes used in Eq 16), the overall flux of N into the soil is lower per unit plant biomass than from litter decomposition. In C2, the average annual N sources to the watershed are 3.77 x 10 6 mol lower than in NC. The continuity of calibrations solutions that can be achieved merely by substituting cow DON release with litter release does however identify the need for DON surface and subsurface water time series and/or ERW-specific cow dung and urine N fluxes to the soil to better constrain this portion of the model.

Surface water N dynamics
We estimate that 2.57-3.97 x 10 5 mol yr -1 TN (0.42-0.65 kg N ha -1 yr -1 ) exits the ERW via the river at PH annually (Fig 6). In other words, only 6-9% of the total watershed TN sink exits via the stream in dissolved form. Instream loss via denitrification and primary productivity accounts for 22-46% of the total annual sinks, with C2 representing the upper bound of this range (Fig 6). Indeed, in the C2 calibration, 86% of the N that is delivered to surface water is denitrified or taken up via primary productivity. The magnitude of instream processing is particularly significant, given Parflow-CLM predicts instream water residence times on the order of only days for the ERW, compared with weeks to centuries for the subsurface. As a result, a majority of the N sent downstream may be in the form of particulate organic N (PON). The magnitude of this downstream PON load is likely increased by high physical erosion off steep hillslopes and mountainsides [19,20]. Fox et al. [86] showed that in the ERW, 23-34% of organic carbon (OC) deposited in the floodplain sediments originated from eroded shale, and speculated that the fraction of N in stream sediments that was derived from shale may been higher. It is therefore likely that our calibrated shale weathering fluxes are underestimates as they do not account for this possible stock within the watershed. Whether or not the magnitude of the unaccounted-for weathering flux results in an overall flux in the most probably C2 calibration scenario that is in excess of the magnitude predicted in C3 remains to be tested. The downstream flux in the form of PON is particularly important given the number of large dam reservoirs downstream, most notably the Blue Mesa Reservoir on the Gunnison River, approximately 60 km downstream of PH. The Blue Mesa Reservoir has a surface area of 37.15 km 2 and an annual water residence time of 1.3 years [87], and the TDN concentrations have historically been low (0.1-0.4 mg L -1 , 7-28 μM), with the majority in the form of DON [88]. Reservoirs, particularly those with long residence times such as the Blue Mesa, are known to eliminate significant N from the water column via burial in sediments or denitrification [89,90]. A load delivered from upstream in the form of PON would be more readily sedimented than dissolved loads. Indeed, Holloway and Smith [6] estimated that the flux of NO 3 in the Colorado River at the Colorado-Utah border is 0.049 kg N ha -1 yr -1 , indicating substantial reduction of N fluxes along the river after the ERW. It is therefore worth investigating both the PON fluxes out of the ERW as well as the Blue Mesa Reservoir's N elimination. The reservoir may act in such a way that "restarts" the upper Colorado River watershed's N cycle. The ERW's N cycles differs from the majority of modeled watersheds in that atmospheric precipitation is nearly always more concentrated in all N species than the porewater and stream water. Thus, during the spring snowmelt, rising instream N concentrations are mainly controlled by flushing of concentrated meltwater through the vadose zone, in addition to direct mixing of meltwater with stream water. This pattern is evident by the abrupt increase in vadose zone and surface water NO 3 -, NH 4 + and to some extent DON during the snowmelt period each year (Fig 5). The rising spring concentrations with increased streamflow align with concentration-discharge (C-Q) relationships for N species observed in the adjacent Coal Creek watershed, where positive C-Q trends are shown to be indicative of flushing of more concentrated vadose zone waters in wet seasons [91], while the stream N in dry seasons tends to be sourced from deeper, less concentrated groundwater. The impact that the ERW's highly concentrated unprocessed atmospheric N has on instream and vadose zone concentrations seems to support the findings of Sebestyen et al. [92], who showed that in northern North American forests, unprocessed atmospheric N can account for a high fraction (defined as > 20% of total N) of these surface water concentrations during wet periods such as snowmelt. While we cannot explicitly track the proportion of vadose zone N that is processed (i.e. its source) once it is inside the box model (Fig 3), the fact that stream and vadose zone N concentrations peak during the largest period of highly concentrated atmospheric N addition suggest that much of the N being delivered to the stream in the spring is unprocessed atmospheric N.

Model uncertainty
Our model development process shows that despite >1600 nutrient concentration and nearly continuous discharge measurements, equifinality is unavoidable for a system of this complexity, where more than 20 parameters are being concurrently calibrated. Indeed, existing biogeochemical modeling research that has managed to overcome equifinality have been limited to only three calibration parameters or fewer [93]. Thus, in order to ask broad questions about subsurface processes like the relative importance of the Mancos shale on watershed N cycling, watershed nutrient modelers need to be prepared to explore multiple calibration solutions. This process enabled us to identify the boundaries of probable parameter ranges and identify the likeliest scenarios based on the magnitudes of specific N processes. We can also use the current modeling experiments to identify a series of knowledge gaps that, once filled, could revise and narrow the HAN-SoMo parameter constraints and further approximate the "true" magnitude of N processes in the ERW. The implementation of six parameter cases is further useful because it can identify model uncertainties that are consistent across all scenarios. This can guide future exploration, through both measurement and modeling, of unusual N cycling behavior that typically do not emerge for more frequently modeled agricultural systems, where N dynamics are governed mainly by predictable fertilizer application schedules. For example, each of the modeled scenarios predict seasonal trends that repeat annually and do not resolve anomalous or noisy concentration deviations, such as the large peaks in LT NO 3 in late winter and early spring of 2016 (Fig 5, S1-S3 Figs in S1 File). Given the extremely low concentrations of all N species in each sub-watershed, the ability to identify and quantify repeating seasonal and annual drivers of change in a near-pristine alpine watershed represents an important advancement in watershed nutrient modeling. However, HAN-SoMo performs poorly when attempting to predict atypical biogeochemical circumstances, which seem to occur under unusually high or low snowpacks. For example, in the latter half of the model run period, LT and ME both experience periods where measured stream nitrate concentrations are below modeled output despite similar seasonal patterns and magnitudes of streamflow (S6 Fig), and N concentrations within the major tributaries that are comparable to previous years. The drivers of the extremely low concentrations fall outside the biogeochemical mechanisms described in this model, though may be related to snowpack insulation (see section 3.1), which in itself represents an important conclusion for alpine N modeling. In particular, these findings suggest that end-ofpipe model calibration, which is the norm in watershed nutrient modeling, may not be suitable at low concentrations characteristic of alpine environments. In other words, lumped, semi-distributed models such as HAN-SoMo cannot resolve local hotspots that likely play a disproportionately large role in determining downstream riverine N fluxes. They can, however, be used to identify the local measurements that would eliminate existing uncertainty.
One important example of a local discrepancy identified by the equifinality analysis is the inability to replicate surface water NO 3 concentrations as low as the measured values at EAQ in any of the calibration cases. The EAQ sub-watershed has the highest fraction of Mancos shale in the saprolite (70%, Table 1), and thus nutrient export here is high in the model. It is important to consider local differences in lithology as a potential source of this discrepancy. While the majority of the ERW is underlain by the upper member of the Mancos shale, an older, lower stratigraphic Mancos unit outcrops upstream of EAQ. This unit is harder and less fractured, partially due to the presence of igneous dikes and sills that cross-cut the shale [94], and thus weathering fluxes calibrated for the remainder of the watershed do not seem applicable. However, even under the 'no Mancos' scenarios (NM1 and NM2), modeled EAQ stream concentrations are too high (S4 and S5 Figs). It is possible the spatial and altitudinal heterogeneity in atmospheric deposition throughout the watershed could be an underlying issue. The CASTNET atmospheric deposition measurements collected at RMBL (Fig 1) may be larger than those at higher elevations or different aspects and slopes. Thus, additional measurements of wet and dry deposition would supplement existing measurements, and help identify the spatial variability of N deposition in mountainous systems. A number of more intensive field-based measurements could be considered to revise model estimates, particularly of the Mancos shale's weathering fluxes. In the absence of experimental measurements of the rate of N weathering from the Mancos shale cores, field estimates of denitrification time series around the watershed can help constrain gaseous loss to tighten up the overall N mass balance. Additionally, concentration depth profiles for N species in the vadose zone and groundwater at multiple locations and times can help further refine subsurface model constraints. However, given that this strategy would require a large number of depth profiles to constrain spatial variability, it is likely prohibitively expensive throughout much of the ERW. Along these lines, it is worth investigating the role of dissimilatory nitrate reduction to ammonium (DNRA), which, while typically a minor process, could be an important mechanism in local regions of the watershed given the low N concentrations.

Conclusions
Through exploration of six hypothesized watershed scenarios (with and without shale weathering), we have shown that the shale contributes perceptible concentrations of N to the East River watershed's N cycle. We note that due to the number of parameters being calibrated and issues of equifinality, the uncertainty associated with the magnitude of the Mancos weathering flux remains high. At most, the Mancos accounts for 44% of the "new" N delivered to the watershed annually. However, this value is likely lower, given the scenario that predicts it (C3) concurrently predicts denitrification fluxes more representative of heavily N-saturated agricultural systems than of a mostly-pristine mountainous wilderness area. Instead, the most plausible watershed scenario, C2, suggests that 21% of the new N delivered to the ERW annually is from geogenic sources, with the remainder deposited atmospherically, and that the ERW is N-limited with regard to terrestrial plus instream primary productivity. Scenario C2 predicts very high in-stream transformation of dissolved N forms to PON, which we note is particularly relevant given the large water storage reservoirs downstream capable of retaining this material in sediment form, potentially "restarting" the upper Colorado River basin's N cycle. Our modeling approach has further emphasized the uncertainty associated with extremely nutrient poor alpine environments, which are largely neglected in the field of watershed biogeochemical modeling. Through identification of multiple parameter combinations that fit observed data equally well, we have developed a watershed management tool that can be used to describe "typical" seasonal trends, which in turn can guide discussion of anomalous concentration trends and more precise local experimentation and data collection.