Mapping Atmospheric Moisture Climatologies across the Conterminous United States

Spatial climate datasets of 1981–2010 long-term mean monthly average dew point and minimum and maximum vapor pressure deficit were developed for the conterminous United States at 30-arcsec (~800m) resolution. Interpolation of long-term averages (twelve monthly values per variable) was performed using PRISM (Parameter-elevation Relationships on Independent Slopes Model). Surface stations available for analysis numbered only 4,000 for dew point and 3,500 for vapor pressure deficit, compared to 16,000 for previously-developed grids of 1981–2010 long-term mean monthly minimum and maximum temperature. Therefore, a form of Climatologically-Aided Interpolation (CAI) was used, in which the 1981–2010 temperature grids were used as predictor grids. For each grid cell, PRISM calculated a local regression function between the interpolated climate variable and the predictor grid. Nearby stations entering the regression were assigned weights based on the physiographic similarity of the station to the grid cell that included the effects of distance, elevation, coastal proximity, vertical atmospheric layer, and topographic position. Interpolation uncertainties were estimated using cross-validation exercises. Given that CAI interpolation was used, a new method was developed to allow uncertainties in predictor grids to be accounted for in estimating the total interpolation error. Local land use/land cover properties had noticeable effects on the spatial patterns of atmospheric moisture content and deficit. An example of this was relatively high dew points and low vapor pressure deficits at stations located in or near irrigated fields. The new grids, in combination with existing temperature grids, enable the user to derive a full suite of atmospheric moisture variables, such as minimum and maximum relative humidity, vapor pressure, and dew point depression, with accompanying assumptions. 
All of these grids are available online at http://prism.oregonstate.edu, and include 800-m and 4-km resolution data, images, metadata, pedigree information, and station inventory files.


Introduction
PRISM Climate Group [8]. The interpolation was performed separately in three overlapping regions: western, central and eastern US, and the resulting grids merged to form a complete conterminous US grid. The western region extends from the Pacific Coast to eastern Colorado, central from central Colorado to Lake Michigan, and eastern from eastern Minnesota to the eastern seaboard. Care was taken to include as many islands offshore the US mainland as possible, but undoubtedly some very small islands were missed. To accommodate GIS shoreline data sets of varying quality and resolution, the modeling region was extended offshore several km and generalized to include bays and inlets. However, the gridded climate estimates are valid over land areas only.
Overviews of the data processing and mapping work flows for T dmean and for VPD min and VPD max are diagrammed in Figs 2 and 3, respectively. The process began with hourly and daily data for T dmean , and hourly data for VPD min and VPD max , which were averaged to daily, and then monthly, time steps, with quality screening done at each stage (see Station Data and Processing section). At the monthly time step, all three variables were expressed as dew point depression (DPD) to take advantage of a spatial consistency quality control (QC) method using a version of PRISM called ASSAY (see Quality Control and Calculation of Monthly Values section). T dmean was kept in the form of DPD for interpolation by PRISM, and converted back to T dmean as a grid post-processing step. VPD min and VPD max were converted from DPD back to their original forms before interpolation by PRISM (see Mapping Methods section). A performance evaluation was conducted on the PRISM interpolation process (see Uncertainty Analysis section). As a final step, output grids from PRISM were checked for consistency with previously-created grids of 1981-2010 T max and T min .

Station Data and Processing

Data Sources
Data from surface stations, numbering about 4,000 for T dmean (44 km average station spacing) and 3,500 for VPD max and VPD min (47 km average station spacing), were obtained from a variety of sources (Table 1; Fig 1). Data from the National Weather Service (NWS) Automated Surface Observing System (ASOS) came via Unidata's Internet Data Distribution system and the National Climatic Data Center (NCDC) Integrated Surface Hourly/Integrated Surface Data archives (ftp://ftp.ncdc.noaa.gov/pub/data/noaa/), supplemented by the Solar And Meteorological Observation Network (SAMSON) and Integrated Surface Weather Observations CD-ROMs. NWS Cooperative Observer Program (COOP) and Weather Bureau Army Navy (WBAN) data were obtained from NCDC (http://www.ncdc.noaa.gov/cdo-web/). USDA Forest Service and Bureau of Land Management Remote Automatic Weather Station (RAWS) data came from archives at the Western Regional Climate Center (http://www.raws.dri.edu), the Real-  To improve marine representation, data were obtained from coastal stations and offshore buoys operated by the NOAA National Data Buoy Center (NDBC; http://www.ndbc.noaa.gov/data/historical/stdmet). To better define humidity profiles at high elevations, mean monthly upper-air temperature, geopotential height, and relative humidity grid points for the western and eastern United States were obtained at 2.5-degree resolution for the period 1981-2010 from the National Center for Environmental Prediction (NCEP) Global Reanalysis (ftp://ftp.cdc.noaa.gov/Datasets/ncep.reanalysis.derived/pressure). The 5650-m level (~500 hPa) was chosen for the western US, and the 3050-m level (~700 hPa) for the eastern US. (Upper-air data were not needed in the central US, because of a lack of elevated terrain.) These levels were sufficiently far above the highest terrain features to minimize potential errors involved in estimating surface humidity statistics from free air values.

Quality Control and Calculation of Monthly Values
As shown in Table 1, station data were available at either an hourly or daily time step. Daily data were sufficient for T dmean , but hourly data were required to calculate VPD min and VPD max . In some cases, hourly data were provided in the form of RH directly from the instrument. Range checks were performed to screen out values that were either impossible or could cause instabilities in the calculation of related statistics. If RH ≤ 0 or RH > 105, the value was set to missing. Otherwise, if RH < 0.5, it was set to 0.5, and if 100 < RH ≤ 105 (as sometimes occurs under saturated conditions), it was set to 100.
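The RH range check described above can be sketched in a few lines of Python. This is only an illustration: the function name `screen_rh` and the use of `None` for a missing value are assumptions, and the lower rejection bound is taken to be RH ≤ 0.

```python
from typing import Optional


def screen_rh(rh: Optional[float]) -> Optional[float]:
    """Apply the hourly RH (%) range check; None stands for a missing value."""
    if rh is None or rh <= 0 or rh > 105:
        return None    # impossible reading: set to missing
    if rh < 0.5:
        return 0.5     # floor tiny values that could destabilize later calculations
    if rh > 100:
        return 100.0   # readings of 100-105 under saturated conditions are capped
    return float(rh)
```

For example, `screen_rh(103)` returns `100.0`, while `screen_rh(106)` returns `None`.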
Dew point (T d ) was derived from hourly RH (%) and ambient temperature (T a , °C) by first calculating the saturation vapor pressure at T a (SATVP a ) ([26], Eq 21; Eq 1 here) and then solving for T d from SATVP a and RH (Eq 2). A consistency check was done to ensure that the calculated T d was less than T a ; if not, each was set to missing.
As a range check, T d values that fell below -68°C, or exceeded the statewide extreme maximum temperature record (http://www.ncdc.noaa.gov/extremes/scec/records) were set to missing. In addition, the hourly T d values were subjected to a step check in which each value could not differ from that of the previous hour by more than 10°C; otherwise the current hour's T d was set to missing.
Hourly VPDs were calculated from a combination of T a and T d or T a and RH, depending on which was available. If RH was available, VPD was calculated by: (1) finding SATVP a from Eq 1; (2) finding the saturation vapor pressure at the previously calculated T d (SATVP d ; Eq 3); and (3) calculating VPD as the difference between the two, VPD = SATVP a − SATVP d (Eq 4). If T d was available, VPD was obtained by calculating SATVP d from Eq 3 and VPD from Eq 4. Each hourly VPD was subjected to a range check, where if VPD > 200 hPa or < 0 hPa, or if VPD ≥ SATVP a , the value was set to missing.
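The chain from RH and T a to T d and VPD can be illustrated with a short Python sketch. It assumes a Magnus-type saturation vapor pressure formula with coefficients 6.1094 hPa, 17.625, and 243.04°C; those coefficients, the function names, and the use of `None` for missing values are assumptions made for illustration and are not taken from the equations of this paper.

```python
import math
from typing import Optional

# Assumed Magnus-type coefficients (hPa, dimensionless, deg C); illustrative only.
A_HPA, B, C_DEGC = 6.1094, 17.625, 243.04


def sat_vp(t_c: float) -> float:
    """Saturation vapor pressure (hPa) at temperature t_c (deg C)."""
    return A_HPA * math.exp(B * t_c / (C_DEGC + t_c))


def dew_point(t_a: float, rh: float) -> float:
    """Dew point (deg C) from air temperature (deg C) and RH (%)."""
    vapor_pressure = sat_vp(t_a) * rh / 100.0
    x = math.log(vapor_pressure / A_HPA)
    return C_DEGC * x / (B - x)      # inverse of sat_vp


def hourly_vpd(t_a: float, t_d: float) -> Optional[float]:
    """VPD (hPa) as SATVP_a - SATVP_d, with the range check applied."""
    sat_a = sat_vp(t_a)
    vpd = sat_a - sat_vp(t_d)
    if vpd < 0 or vpd > 200 or vpd >= sat_a:
        return None                  # fails the range check: set to missing
    return vpd
```

With these assumed coefficients, `dew_point(20.0, 50.0)` is a little above 9°C, and `hourly_vpd(10.0, 20.0)` returns `None` because a dew point above the air temperature yields a negative VPD.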
Three persistence tests were done on T a and T d hourly values for each 24-hour period. If the maximum difference between any pair of hourly observations or the difference between the maximum and minimum value over the 24-hour period was less than 0.1°C, or the standard deviation of all values in the 24-hour period was less than 0.1°C, the day was considered a "flat day" and all variables, including the VPDs, were set to missing.
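A minimal sketch of the "flat day" screen for one variable (T a or T d ) follows. The function name `is_flat_day`, the use of `None` for missing hours, and the choice of the population standard deviation are assumptions. Note that the largest difference between any pair of observations is the same quantity as the range (maximum minus minimum), so the first two tests collapse into one comparison here.

```python
import statistics
from typing import Optional, Sequence


def is_flat_day(hourly: Sequence[Optional[float]], tol: float = 0.1) -> bool:
    """Return True if one day's hourly values show too little variation."""
    obs = [v for v in hourly if v is not None]
    if len(obs) < 2:
        return False                      # assumption: left to the completeness rule
    value_range = max(obs) - min(obs)     # equals the largest pairwise difference
    spread = statistics.pstdev(obs)       # population standard deviation (assumed)
    return value_range < tol or spread < tol
```

A day with 23 identical readings and a single 0.3°C excursion has a range above the tolerance but a standard deviation below it, so it is still flagged as flat.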
For networks with hourly data, T dmean values were calculated by averaging available hourly observations, subject to the requirement that at least 18 of 24 observations be non-missing; fewer than 18 non-missing hourly observations resulted in a missing daily value. Daily VPD min and VPD max , as well as T min , and T max, were determined by finding the minimum and maximum hourly value, respectively, subject to the same data completeness requirement.
Daily maximum and minimum T d (T dmax , T dmin ) and mean T a (T mean ) values were calculated for diagnostic purposes. Consistency checks were performed on daily combinations of T dmean and temperature as follows: If T max < T dmax , T min < T dmin , or T mean < T dmean , all variables, including VPD min and VPD max , were set to missing.
The daily values were averaged to create monthly mean T dmean , VPD min , and VPD max for each year of record. A minimum of 85% of non-missing daily values were required for a monthly value to be non-missing [28], [29].
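The two completeness rules (at least 18 of 24 hourly observations per day, and at least 85% of daily values per month) can be sketched as follows. The function names and the use of `None` for missing values are assumptions for illustration.

```python
from typing import Optional, Sequence, Tuple


def daily_stats(hourly: Sequence[Optional[float]],
                min_obs: int = 18) -> Optional[Tuple[float, float, float]]:
    """Return (mean, minimum, maximum) for one day, or None if too few hours."""
    obs = [v for v in hourly if v is not None]
    if len(obs) < min_obs:
        return None
    return sum(obs) / len(obs), min(obs), max(obs)


def monthly_mean(daily: Sequence[Optional[float]],
                 min_fraction: float = 0.85) -> Optional[float]:
    """Return the mean of the daily values, or None if coverage is below 85%."""
    obs = [v for v in daily if v is not None]
    if not daily or len(obs) < min_fraction * len(daily):
        return None
    return sum(obs) / len(obs)
```

Under this reading, a 30-day month needs at least 26 non-missing daily values, because 25 of 30 is only about 83%.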
The monthly averages were tested for spatial consistency using the ASSAY QC (quality control) system. ASSAY is a version of PRISM that estimates station values in their absence and compares them to the observed values, a procedure termed cross-validation [8]. The interpolation procedures in ASSAY are exactly the same as in PRISM; the only difference is that ASSAY interpolates to point (station) locations, rather than grid cells. As a rule, a large discrepancy between an observed station value and the interpolated estimate from ASSAY suggests that the station value is unusual compared to nearby stations, and may therefore be erroneous. In previous work, an ASSAY QC analysis was performed for T dmean , with T dmean expressed in the form of dew point depression (DPD) with T min ; that is, DPD = T min −T dmean . (As will be discussed in the next section, the DPD form of T dmean was the favored method of expressing T dmean for interpolation in this study.) In the previous ASSAY analysis of DPD, QC was done manually by trained personnel, and compared with the ASSAY results. It was found that when the absolute difference between the observation and ASSAY estimate of DPD exceeded 4.5°C, the value was considered erroneous by the manual method. Based on these results, in this study ASSAY QC was applied to T dmean expressed as DPD, and station values producing absolute differences between the observation and estimate of more than 4.5°C were flagged as bad and set to missing. Overall, 0.8 percent of the monthly T dmean values were set to missing using this method.
To take advantage of the ASSAY QC screening method, VPD min and VPD max were also expressed as DPD with T min , termed DPD(VPD min ) and DPD(VPD max ), respectively. This involved: (1) calculating the saturation vapor pressure for the station's T min value (SATVP a ) using Eq 1; (2) subtracting the VPD from this value to get SATVP d ; (3) obtaining T d from Eq 2, substituting SATVP d for SATVP a and using 100 for the RH value; and (4) subtracting the resulting T d from T min to obtain the DPD. Manual QC checks found that the same threshold of 4.5°C used in QC'ing DPD was suitable for DPD(VPD min ) and DPD(VPD max ) as well, and those exceeding this threshold were flagged as bad and set to missing. Overall, 0.7 percent of the monthly VPD min values and 1.8 percent of the monthly VPD max values were set to missing using this method.
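Steps (1) to (4) above, together with the 4.5°C screening threshold, can be sketched in Python. As before, the Magnus-type coefficients are assumed for illustration, as are the function names; returning `None` when the VPD is at least as large as the saturation vapor pressure at T min is a guard added for this sketch and is not described in the text.

```python
import math
from typing import Optional

# Assumed Magnus-type coefficients (hPa, dimensionless, deg C); illustrative only.
A_HPA, B, C_DEGC = 6.1094, 17.625, 243.04


def sat_vp(t_c: float) -> float:
    """Saturation vapor pressure (hPa) at temperature t_c (deg C)."""
    return A_HPA * math.exp(B * t_c / (C_DEGC + t_c))


def temp_from_sat_vp(e_hpa: float) -> float:
    """Temperature (deg C) at which the saturation vapor pressure equals e_hpa."""
    x = math.log(e_hpa / A_HPA)
    return C_DEGC * x / (B - x)


def dpd_from_vpd(vpd: float, t_min: float) -> Optional[float]:
    """Express a monthly VPD (hPa) as a dew point depression from T min (deg C)."""
    sat_d = sat_vp(t_min) - vpd          # steps (1) and (2)
    if sat_d <= 0:
        return None                      # guard added for this sketch
    t_d = temp_from_sat_vp(sat_d)        # step (3), RH taken as 100%
    return t_min - t_d                   # step (4)


def fails_assay_qc(observed_dpd: float, estimated_dpd: float,
                   threshold: float = 4.5) -> bool:
    """Flag a value whose absolute difference from the estimate exceeds 4.5 deg C."""
    return abs(observed_dpd - estimated_dpd) > threshold
```

Here `estimated_dpd` stands in for the cross-validated ASSAY estimate at the station, which this sketch does not compute.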
Monthly values of T dmean , VPD min , and VPD max passing the ASSAY QC screening were averaged over their period of record (POR), or 1981-2010, if available. A 1981-2010 monthly mean calculated using data from at least 23 of these 30 years (75% data coverage) was considered to be sufficiently characteristic of the 1981-2010 period, and a station meeting this requirement was termed a "long-term" station. However, many stations had a POR of fewer than 23 years. Averages from stations with short PORs were subjected to adjustment to minimize temporal biases, as described in [8], Appendix A.

Mapping Methods
Mapping of 1981-2010 mean monthly T dmean , VPD min , and VPD max was performed using PRISM [6][7][8], [30]. For each grid cell, PRISM calculated a local linear regression function between the atmospheric moisture variable and a predictor grid (see Climatologically-Aided Interpolation below). Nearby stations entering the regression were assigned weights based primarily on the physiographic similarity of the station to the grid cell. Physiographic factors relevant to this study were distance, elevation, coastal proximity, vertical atmospheric layer, and topographic position (relative to surrounding terrain). Detailed descriptions of the PRISM model algorithms, structure, input grids, and operation are given in [7], [8], and [30]. Details on the specific modeling approach for this study are given below.

Derive or Interpolate VPDmin and VPDmax?
Given that T dmean is a basic atmospheric moisture variable from which other variables can be derived, early in this study the question was asked: do VPD min and VPD max need to be interpolated separately, or can they be derived from a combination of the T dmean grid and previously created grids of T max and T min ? More specifically, can T dmean be combined with T max to estimate VPD max and T dmean combined with T min to estimate VPD min ? To do this, we must assume that T d = T dmean and T a = T max at the time of VPD max , and T d = T dmean and T a = T min at the time of VPD min . To test these assumptions, monthly averages of VPD min and VPD max , as well as VPD min and VPD max estimated as described above, were calculated and averaged for each month over the ten-year period 2003-2012 at 100 randomly selected ASOS stations with hourly data. Results of this exercise are given in Table 2 for January and July, which bracket the range of monthly values observed during the year.
In Table 2, actual and estimated VPD min and VPD max are expressed as means and percentiles (5th percentile / mean / 95th percentile) so that the full distribution of values can be evaluated. Differences between actual and estimated VPD max were generally within about three hPa or five percent across the distributions in both winter and summer. However, a proportion of the estimated VPD min values fell below zero, which is not physically possible. This problem was most serious during winter; in January, a VPD min of less than zero was estimated at 47 of the 100 stations. In these cases, T dmean exceeded T min , resulting in a negative VPD. In reality, the opposite was true, resulting in a positive VPD min . Given this issue, deriving VPD min and VPD max from T dmean , T min , and T max was not considered a viable option, leading us to interpolate VPD min and VPD max directly from station data.

Climatologically-Aided Interpolation
In previous work, constructing the 1971-2000 and 1981-2010 monthly normals for T min and T max used a DEM as the predictor grid (see [8] for details on methods used for mapping T min and T max ). There were nearly 10,000 stations used in the mapping of the 1971-2000 T min and T max normals, and over 16,000 stations used in the 1981-2010 normals. In contrast, there were only 3,500-4,000 stations available for this study, with poor representation at high elevations. Faced with limited station data, we opted for the CAI method of interpolation. CAI uses an existing climate grid to improve the interpolation of another climate element for which data may be sparse or intermittent in time [31][32][33][34][35][36]. This method relies on the assumption that local spatial patterns of the climate element being interpolated closely resemble those of the existing climate grid (called the predictor grid). Uses of CAI fall into two main categories: (1) using a long-term mean grid of a climate element to aid the interpolation of the same element over different (usually shorter) averaging periods; and (2) using a grid of a climate variable to aid the interpolation of a different, but related, climate variable, such as interpolating annual extreme minimum temperature using January mean minimum temperature as the predictor grid [37]. A classic example of the first strategy involves mapping a long-term mean climatology carefully with sophisticated methods, then developing time series grids for shorter averaging periods (monthly or daily) using simpler and faster methods such as inverse-distance weighting to interpolate deviations from the mean climatology to a grid. These deviations can then be added to (e.g., temperature) or multiplied by (e.g., precipitation) the mean climatology to obtain the new grid. 
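The final step of the first CAI strategy, combining interpolated deviations with the mean climatology, can be sketched as below. Grids are represented as nested lists, and the function name and the `multiplicative` flag are assumptions; the interpolation of the deviations themselves (for example by inverse-distance weighting) is not shown.

```python
from typing import List

Grid = List[List[float]]


def apply_deviations(climatology: Grid, deviations: Grid,
                     multiplicative: bool = False) -> Grid:
    """Combine a mean climatology grid with a grid of interpolated deviations.

    Additive deviations suit variables such as temperature; multiplicative
    deviations (ratios to the mean) suit variables such as precipitation.
    """
    if multiplicative:
        return [[c * d for c, d in zip(c_row, d_row)]
                for c_row, d_row in zip(climatology, deviations)]
    return [[c + d for c, d in zip(c_row, d_row)]
            for c_row, d_row in zip(climatology, deviations)]
```

For instance, a temperature climatology of 10.0°C with a deviation of +1.5°C gives 11.5°C, and a precipitation climatology of 100 mm with a ratio of 1.5 gives 150 mm.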
Our use of CAI for this study falls into the second strategy, for which we use pre-existing grids of 1981-2010 mean monthly T min and T max as predictor grids in the interpolation process. A series of tests was conducted with ASSAY and PRISM to determine which of the 1981-2010 mean monthly T min and T max grids were the strongest predictors of the spatial patterns of T dmean . T min was found to be a good predictor, which is not surprising given that temperatures often reach the dew point at the time of T min over much of the US. However, inspection of the interpolated grids revealed that in some areas subject to locally low temperatures, such as cold air pools in mountain valleys, the interpolated T dmean exceeded the mean temperature, meaning that long-term mean RH exceeded 100 percent, which was not acceptable. In order to more closely tie the patterns of T dmean to the patterns of the existing monthly T min grids and their relatively large supporting station data sets, each T dmean station value was expressed as the deviation from T min , or DPD (T min − T dmean ). Thus, in the interpolation process, PRISM was run with the 1981-2010 mean monthly T min grid as the independent variable and DPD as the dependent variable. The local regression functions were generally not as strong as they were in the case of T dmean vs. T min , because DPD was essentially the residual from the T dmean vs. T min relationship. Once the mean monthly DPD values were mapped in this manner, the final T dmean grid was obtained by subtracting the DPD grid from the 1981-2010 T min grid. The result was an interpolated T dmean grid that was highly consistent in an absolute, as well as relative sense, with the associated T min grid. An example of the relationship between 1981-2010 January mean observed DPD and gridded T min for a location north of the Wind River Mountains of Wyoming (43.6N, 109.73W) is shown in Fig 4a.
This area is characterized by persistent wintertime cold air pools in mountain valleys, where the humidity is high and T min is less than T dmean (DPD<0). Above these cold pools, T min increases and rises above T dmean (DPD>0).
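The idea of regressing DPD on the T min predictor grid for one grid cell, then recovering T dmean , can be sketched with a weighted least-squares line. This is a simplified stand-in and not the PRISM algorithm: the station weights are simply passed in, the function names are hypothetical, and the convention DPD = T min − T dmean from the quality-control section is used.

```python
from typing import Sequence, Tuple


def weighted_linear_fit(x: Sequence[float], y: Sequence[float],
                        w: Sequence[float]) -> Tuple[float, float]:
    """Weighted least-squares fit of y = intercept + slope * x."""
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict_tdmean(cell_tmin: float, station_tmin: Sequence[float],
                   station_dpd: Sequence[float],
                   weights: Sequence[float]) -> float:
    """Estimate T dmean at a grid cell from its gridded T min value."""
    slope, intercept = weighted_linear_fit(station_tmin, station_dpd, weights)
    cell_dpd = intercept + slope * cell_tmin
    return cell_tmin - cell_dpd          # DPD = T min - T dmean
```

In a cold-pool setting like the Wyoming example, stations with the lowest T min have negative DPD, so the predicted T dmean at a valley-bottom cell lies above its T min .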
An additional series of tests was conducted with PRISM and ASSAY to determine whether either of the pre-existing 1981-2010 mean monthly T min and T max grids could be used as predictors of the spatial patterns of VPD min and VPD max . T max was found to be a good predictor of VPD max , and the same was true for T min and VPD min . However, once T dmean was mapped, a superior, second-generation CAI method using T dmean became available. Termed CAI2, this method involves using the result of a CAI interpolation as the predictor grid in another CAI interpolation. Specifically, a first-guess VPD min grid was created by calculating the VPD associated with the combination of the T dmean and T min grids using Eqs 1, 3 and 4. Similarly, a first-guess VPD max grid was found by calculating the VPD associated with the combination of the T dmean and T max grids. These "first-guess" predictor grids represented what the VPD min and VPD max spatial patterns would be like if T d was held constant throughout the day at T dmean , and VPD min occurred at the time of T min and VPD max occurred at the time of T max . While these assumptions are not always correct (see Table 2), they are sufficiently reasonable to produce predictor grids that closely match the relative spatial patterns of VPD min and VPD max .
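A first-guess VPD grid of the kind described above can be sketched cell by cell as the saturation vapor pressure at the air temperature grid minus that at the T dmean grid. The Magnus-type coefficients and the function names are assumptions for illustration, and grids are represented as nested lists.

```python
import math
from typing import List

Grid = List[List[float]]

# Assumed Magnus-type coefficients (hPa, dimensionless, deg C); illustrative only.
A_HPA, B, C_DEGC = 6.1094, 17.625, 243.04


def sat_vp(t_c: float) -> float:
    """Saturation vapor pressure (hPa) at temperature t_c (deg C)."""
    return A_HPA * math.exp(B * t_c / (C_DEGC + t_c))


def first_guess_vpd(t_air: Grid, tdmean: Grid) -> Grid:
    """First-guess VPD (hPa): pass T min for VPD min or T max for VPD max."""
    return [[sat_vp(ta) - sat_vp(td) for ta, td in zip(ta_row, td_row)]
            for ta_row, td_row in zip(t_air, tdmean)]
```

Because the first-guess grid serves only as a predictor of relative spatial pattern, cells where T dmean exceeds T min produce negative first-guess VPD min values; this is the same behavior that ruled out using the derived values directly.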
An example of the relationship between 1981-2010 June mean observed VPD min and first-guess predictor grid VPD min for a location near Las Vegas, Nevada (35.79N, 115.26W) is shown in Fig 4b. In this desert environment, VPD min is still relatively large (14-22 hPa), even at its minimum for an average day in June. The lowest VPD min values are found at higher elevations, where temperatures are cooler. An example of the relationship between 1981-2010 October mean observed VPD max and the first-guess predictor grid VPD max for a location in San Francisco, along the California coastline (37.76N, 122.45W), is shown in Fig 4c. In this case, the lowest VPD max values are found along the immediate coast where temperatures are cooler and moisture is greater, with higher values in warmer and drier inland areas.

PRISM Weighting Functions
During PRISM interpolation, upon entering the local linear regression function for a pixel, each station was assigned a weight based on several factors. The combined weight (W) of a station is computed from W c , W d , W z , W p , W l , and W t , which are the cluster, distance, elevation, coastal proximity, vertical layer, and topographic position weights, respectively, together with F d and F z , which are user-specified distance and elevation weighting importance scalars [8], [38]. All weights and importance factors, individually and combined, are normalized to sum to unity. PRISM weighting functions not enabled in this study were topographic facet weighting, which is used primarily to identify rain shadows in precipitation mapping, and effective terrain height weighting, which identifies orographic precipitation regimes based on terrain profiles [8]. Table 3 summarizes how the PRISM climate regression and station weighting functions accounted for physiographic climate forcing factors, and provides citations for further information. Cluster weighting was used to keep clusters of stations that represent similar local conditions from dominating the regression functions; both horizontal and vertical separations were considered [8]. Distance and elevation weighting were used to accommodate the spatial coherence of climatic regimes, both horizontally and vertically [8].
Coastal proximity weighting accounted for sharp gradients in temperature and atmospheric moisture from coastlines to interior regions [7], [8], [30]. Atmospheric layer weighting was useful where temperature inversions occurred, by delineating gradients in the relationship between temperature and atmospheric moisture along the transition from the relatively humid boundary layer near the earth's surface to the drier free atmosphere above [7], [30]. Topographic position weighting differentiated topographically sheltered locations where cold, moist air may accumulate, from more exposed locations not susceptible to cold air pooling, such as hill slopes and ridge tops [8], [38].
Relevant PRISM control parameters are listed in Table 4. A minimum of 25 stations were required for each pixel's regression function, and the radius of influence was expanded from a minimum of 20 km to as far as necessary to reach the 25-station threshold. T dmean slope bounds were expressed as the change in DPD per unit T min from the predictor grid, and VPD slopes expressed as the change in VPD per unit first-guess VPD from the predictor grid. The maximum and minimum allowable regression slopes were derived from test runs of PRISM where distributions of slope values were created and outliers examined to determine validity, combined with performance assessments using ASSAY. In the final PRISM interpolation runs, slopes falling outside the designated bounds were set to values that fell halfway between the default slope and the bound that the slope violated (either the maximum or minimum). In the western region, allowable slopes in the relationship between VPD max and the first-guess VPD max were constrained to fall between 0.99 and 1.01, so as to avoid a rare circumstance of predicting a VPD max that might approach the value of SATVP a at T max (hence producing a very low RH) in the warmest and driest areas. The exponential relationship between temperature and SATVP a steepens at higher temperatures, leaving less room for error in extremely low-humidity situations. Unconstrained, the average slope of the 5.5 million PRISM regression functions (one per pixel) for July (the warmest month) across the western region was 1.017, with the 10th percentile at 0.964 and the 90th at 1.15, so this constraint had a very slight effect.
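The slope-bounding rule described above (an out-of-range slope is replaced by the value halfway between the default slope and the violated bound) can be sketched as follows. The function and parameter names are hypothetical, and the default slope of 1.0 used in the example is an assumed value, not one taken from Table 4.

```python
def constrain_slope(slope: float, lower: float, upper: float,
                    default: float) -> float:
    """Return the regression slope after applying the allowable bounds."""
    if slope > upper:
        return 0.5 * (default + upper)   # halfway between default and maximum
    if slope < lower:
        return 0.5 * (default + lower)   # halfway between default and minimum
    return slope
```

With the western-region VPD max bounds of 0.99 and 1.01 and an assumed default of 1.0, a fitted slope of 1.15 would be reset to 1.005, while a slope of 1.0 would be left unchanged.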
Station weighting parameters were very similar among the three variables, and were generally set to the same values across the three regions. The exception was the elevation weighting exponent for VPD max , which was set to 0.5, compared to 1.5 for the other variables. T max and VPD max occur primarily during the day, when the atmosphere is more likely to be well-mixed, and vertical gradients in atmospheric moisture more slowly varying. As a rule, the weighting exponents were set to the lowest values possible to achieve the desired effects. Being parsimonious with the weighting exponents ensured that the station data entering the local regression functions were not down-weighted unnecessarily, which can weaken the statistical results by decreasing the effective number of stations in the regression.
A number of small but noticeable inconsistencies in DPD values between adjacent stations were observed during the summer months in some agricultural areas, most notably in eastern Colorado. Further investigation revealed that relatively low DPD values were coming from two networks: AGRIMET and COAGMET. Stations in these networks are typically sited in or near irrigated fields for use in water management calculations, resulting in more humid conditions than locations away from irrigated areas. Given that the data from these networks were of otherwise high quality, it was unreasonable to simply omit these two networks outright. This issue raised questions about whether the grids should, or could, represent conditions over irrigated land. Concerns over attempting to do so are summarized as follows:
1. Siting requirements for these station networks state that the station must be located in, or immediately adjacent to, an irrigated field to be representative of an irrigated environment. This suggests that the effects of irrigation on humidity are highly local, and may not extend beyond one 800-m pixel. In practice, however, these stations influenced the interpolated estimates many km away, well beyond the likely limits of irrigation in many instances.

2. A complete picture of the location and extent of irrigated lands across the country was unavailable; further, there was no interpolation mechanism in place to constrain the influence of stations on irrigated fields to just those lands. It was also unclear how to modify DPD values in the many irrigated areas not represented by station data.
3. We, and likely others, will be using these atmospheric moisture climatologies as the predictor grids for CAI mapping of monthly and daily time series of the same variables. Given that many of these time series will extend back a century or more, when crop patterns and irrigation practices were very different than today, patterns of humidity caused by today's irrigation patterns could be propagated to times when they are not applicable.
Given these concerns, a subjective middle ground was taken, where a few (<10) stations causing the most severe spatial discrepancies were omitted from the T dmean dataset, and the rest retained. For consistency, the same stations were also omitted from the VPD min and VPD max datasets.

Grid Post-processing
Once monthly grids of T dmean , VPD min , and VPD max were generated with PRISM, post-processing checks were made to ensure that the interpolated values did not exceed reasonable ranges, and that consistency was maintained among the three variables and with the pre-existing 1981-2010 mean monthly T min and T max grids. Given that the 1981-2010 monthly means are by definition made up of yearly values above and below these means, the acceptable ranges were defined to be more restrictive than what would be allowed on, say, a single day. If T dmean > (T mean − 0.5°C), T dmean was set equal to T mean − 0.5°C. If VPD min < 0.01 hPa, VPD min was set equal to 0.01 hPa. If VPD min < 0.001 SATVP a at T min (i.e., RH > 99.9%), VPD min was set equal to 0.001 SATVP a at T min . If VPD max > 0.95 SATVP a at T max (i.e., RH < 5%), VPD max was set equal to 0.95 SATVP a at T max . The number of grid cells affected by these restrictions was limited to a handful in remote mountainous regions of the western US.
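The post-processing limits above can be sketched for a single grid cell. The Magnus-type coefficients and the function names are assumptions for illustration; the thresholds are those stated in the text.

```python
import math
from typing import Tuple

# Assumed Magnus-type coefficients (hPa, dimensionless, deg C); illustrative only.
A_HPA, B, C_DEGC = 6.1094, 17.625, 243.04


def sat_vp(t_c: float) -> float:
    """Saturation vapor pressure (hPa) at temperature t_c (deg C)."""
    return A_HPA * math.exp(B * t_c / (C_DEGC + t_c))


def postprocess_cell(tdmean: float, tmean: float, vpd_min: float,
                     vpd_max: float, tmin: float,
                     tmax: float) -> Tuple[float, float, float]:
    """Apply the range limits to one cell; returns (tdmean, vpd_min, vpd_max)."""
    tdmean = min(tdmean, tmean - 0.5)                      # keep T dmean below T mean
    vpd_min = max(vpd_min, 0.01, 0.001 * sat_vp(tmin))     # RH no higher than 99.9%
    vpd_max = min(vpd_max, 0.95 * sat_vp(tmax))            # RH no lower than 5%
    return tdmean, vpd_min, vpd_max
```

Applying the two VPD min rules in sequence is equivalent to taking the largest of the interpolated value, 0.01 hPa, and 0.001 SATVP a at T min , which is what the `max` call does.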

Results and Discussion

Spatial Patterns
Dew Point. Maps of 1981-2010 mean T min and T dmean in January and July are shown in Figs 5 and 6, respectively. The general patterns of T dmean are similar to those of T min , especially in winter. In January, values are lowest in the northern tier of states, and in cold pools along valley bottoms in the West (Fig 5). T dmean is high in the eastern part of the country in July, exceeding 20°C in much of the Southeast, which is exposed to the transport of moist air from the Gulf of Mexico (Fig 6). Despite relatively warm temperatures, July T dmean is low in the dry Intermountain West. A notable exception is the Southwest, where the summer monsoon produces locally elevated T dmean values.
Interpolated DPDs between T min and T dmean (T min − T dmean ) for January and July are shown in Fig 7. DPD is the original variable interpolated with PRISM before being subtracted from the T min grid to obtain T dmean . In January, DPDs are mostly near zero or negative, meaning that T dmean is similar to, or greater than, T min over much of the country. In the northern US and in persistent cold pools in large inland valleys of the West, T dmean is as much as 3-5°C greater than T min (Fig 7a). For example, the 1981-2010 mean January T dmean at the International Falls, MN ASOS station is -18.1°C and the mean January T min is -21.3°C. In contrast, T dmean averages 6-8°C lower than T min in the southwestern US, which is a testament to the region's aridity, even in winter.
In July, T dmean and T min have similar values in the eastern half of the country. Interestingly, major metropolitan areas such as St. Louis, Kansas City, Minneapolis, and Detroit, and several eastern seaboard cities, are visible as small areas of relatively warm T min values, resulting in positive differences between T min and T dmean of 1-3°C (dark green spots in Fig 7b). The West is dominated by larger positive differences, where T min is much higher than T dmean . The changeover from near-zero to positive differences roughly follows the 100th meridian. Notable exceptions are the West Coast, where cool, marine air penetrates inland, and areas of the Southwest affected by the summer monsoon. The core area of maximum DPD is centered on southeastern California, Nevada, western Arizona, and lower elevations of Utah, where T dmean averages 10-25°C lower than T min . A combination of low moisture content (low T dmean ) and limited time for nighttime cooling (high T min ) during summer contributes to these high DPDs. Methods that estimate T dmean by assuming it is equal to T min would experience the largest errors in this region [22].
Vapor Pressure Deficit. Patterns of 1981-2010 mean VPD min in January and July are shown in Fig 8. In winter, VPD min is low (<1 hPa) over most of the country, due to a combination of low temperatures and relatively small differences between T a and T d during the morning hours (Fig 8a). Saturation vapor pressure varies approximately exponentially with temperature; for example, a difference between T a and T d of 1°C amounts to a VPD of about 1.4 hPa at 20°C, but less than 0.5 hPa at 0°C. In July, VPD min is also low in the eastern US, but exceeds 5 hPa over much of the West, reaching maxima of over 20 hPa in the desert southwest (Fig 8b).
Spatial patterns of 1981-2010 mean T max and VPD max in January and July are shown in Figs 9 and 10, respectively. Patterns of VPD max roughly follow those of T max in January, with the lowest values in the northern tier and western mountains, and the highest in the southern states (Fig 9). In July, the area of high VPD max expands considerably, exceeding 25 hPa over much of the western US, and reaching a maximum of over 60 hPa in the desert southwest (Fig 10). Coastal regions of the West exhibit relatively low VPD max values, as they did for DPD. Similarly low values are found at higher elevations, illustrating the strong spatial relationship between VPD max and T max . In the east, VPD max values in the Midwest and northeast are mostly less than 20 hPa, while those in the southeast can range above 25 hPa; a notable maximum occurs in the piedmont region of Georgia and the Carolinas.
Relative Humidity Derivation. RH can be derived from VPD and T a by calculating SATVP a at the desired temperature from Eq 1, and factoring in the appropriate VPD:
$$\mathrm{RH} = 100 \times \frac{\mathrm{SATVP}_a - \mathrm{VPD}}{\mathrm{SATVP}_a}$$
On a gridded basis, if only T max and T min are available to estimate T a , and only VPD min and VPD max are available to estimate VPD, grids of minimum RH (RH min ) can be approximated by substituting VPD max for VPD and T max for T a , and maximum RH (RH max ) can be similarly derived using VPD min and T min . In performing these derivations, we make the assumption that T a = T min and VPD = VPD min at the time of day when RH max occurs, and T a = T max and VPD = VPD max at the time of day when RH min occurs. To evaluate the error introduced by these assumptions, we calculated 2003-2012 mean monthly RH min and RH max for the same 100 ASOS stations used in Table 2 and compared them with the estimated RH min that would result from substituting VPD max for VPD and T max for T a , and RH max resulting from using VPD min and T min .
As expected, T max and VPD max were slightly higher than T a and VPD at the hour of RH min , and T min and VPD min were slightly lower than T a and VPD at the hour of RH max (Table 5). As a result, the substitutions resulted in an overestimation of RH min and an underestimation of RH max . The differences averaged less than five percent in all months, suggesting that the derived grids are reasonable, but not exact, measures of the true RH.
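The substitution described above can be sketched as follows, using RH = 100 × (SATVP − VPD) / SATVP. This is a minimal illustration, not the PRISM implementation: the Magnus coefficients are an assumption that may differ from Eq 1, the function names are hypothetical, and the input values are invented for illustration only.

```python
import math


def satvp_hpa(t_c):
    """Saturation vapor pressure (hPa); Magnus-type coefficients are an assumption."""
    return 6.1094 * math.exp(17.625 * t_c / (243.04 + t_c))


def rh_percent(t_a_c, vpd_hpa):
    """Relative humidity (%) from air temperature (deg C) and VPD (hPa)."""
    es = satvp_hpa(t_a_c)
    return 100.0 * (es - vpd_hpa) / es


def rh_extremes(t_min_c, t_max_c, vpd_min_hpa, vpd_max_hpa):
    """Approximate (RH min, RH max) by pairing T max with VPD max and T min with VPD min."""
    rh_min = rh_percent(t_max_c, vpd_max_hpa)
    rh_max = rh_percent(t_min_c, vpd_min_hpa)
    return rh_min, rh_max


# Invented July-like values for a single grid cell (illustration only).
rh_min, rh_max = rh_extremes(t_min_c=15.0, t_max_c=30.0,
                             vpd_min_hpa=1.0, vpd_max_hpa=25.0)
print(round(rh_min, 1), round(rh_max, 1))
```

Applied cell by cell to the T min , T max , VPD min , and VPD max grids, the same two calls would yield approximate RH min and RH max grids under the stated timing assumptions.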
Maps of estimated 1981-2010 mean January and July RH min , estimated from grid values of VPD max and T max , reveal some interesting features not easily seen in the maps of VPD max (Fig 11). In January, RH min values are as low as 10-15% on the lee side of the Rocky Mountains, associated with dry, Chinook (downslope) winds produced by a strong westerly jet stream. In contrast, RH min values exceed 70% in the winter-wet Pacific Northwest. RH min values are also above 70% in California's Central Valley and Idaho's Snake Plain, where extended periods of high pressure and calm winds promote temperature inversions that often result in persistent fog and low clouds (Fig 11a). Higher RH min values in the upper Midwest reflect frequent cloudy weather during winter in this region.
RH min patterns in July show a divided country, with a dry west and moist east (Fig 11b). The exception is the West Coast, where cool, moist onshore flow maintains relatively high humidity values throughout the day. The lowest RH min (<10%) values are centered in Nevada, and more generally in the Great Basin. In the east, the highest values are found in regions that receive substantial moisture from the Gulf of Mexico, such as the southern Appalachians and Mississippi Valley.

Table 5. Results of temperature and vapor pressure deficit substitution in the derivation of minimum and maximum relative humidity. 2003-2012 January and July averages from 100 randomly-selected ASOS stations. Actual and estimated RH values are expressed as a distribution: 5th percentile / mean / 95th percentile. Columns: Variable, January, July.

Uncertainty Analysis
Estimating the true errors associated with spatial climate data sets is difficult, and subject to its own set of errors [36]. This is because the true climate field is unknown, except at a relatively small number of observed points, and even these are subject to measurement and siting uncertainties (as has already been noted in the case of irrigated land, for example). The leave-one-out cross-validation (C-V) approach is the most common evaluation method: each station is omitted from the dataset one at a time, the station value is estimated in its absence, and the estimate and observation are compared. The mean absolute error (MAE) and bias are typically calculated once the process is complete. While this approach is commonly used to assess error in interpolation studies, and is reported here, there are several disadvantages. An obvious disadvantage is that no error information is provided for locations where there are no stations. The single-deletion method also favors interpolation models that heavily smooth the results, so that deletion of one station is relatively unimportant to the stability of the estimate. Randomly withholding a larger percentage of the station data at once can help to minimize this issue, as well as provide more robust error statistics. Withholding a stratified sample from the analysis is useful in detecting specific weaknesses or issues in the interpolation, as was done to investigate irrigated stations (also, see [36] for an example).
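The leave-one-out procedure can be sketched in a few lines. This is an illustration only: a simple inverse-distance-weighted estimator stands in for PRISM/ASSAY, the function names are hypothetical, and the sign convention for bias (prediction minus observation) is an assumption that appears consistent with how bias is described in this paper.

```python
def idw_estimate(x, y, stations, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (x, y, value) tuples.

    A stand-in interpolator for illustration; it is not PRISM or ASSAY.
    """
    num = 0.0
    den = 0.0
    for sx, sy, value in stations:
        d2 = (sx - x) ** 2 + (sy - y) ** 2
        if d2 == 0.0:
            return value  # co-located station: use its value directly
        weight = d2 ** (-power / 2.0)
        num += weight * value
        den += weight
    return num / den


def leave_one_out(stations):
    """Return (MAE, bias) from single-station deletion cross-validation."""
    errors = []
    for i, (x, y, observed) in enumerate(stations):
        others = stations[:i] + stations[i + 1:]   # omit station i
        predicted = idw_estimate(x, y, others)     # estimate in its absence
        errors.append(predicted - observed)
    mae = sum(abs(e) for e in errors) / len(errors)
    bias = sum(errors) / len(errors)
    return mae, bias


# Two invented stations: each is predicted from the other alone.
print(leave_one_out([(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]))
```

In the two-station case the errors are equal and opposite, so the MAE is nonzero while the bias cancels to zero, which is why both statistics are reported.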
In this study, uncertainties in the mapped estimates were initially estimated by performing a C-V exercise with ASSAY, and the results compiled for each month for each of the three modeling regions (west, central, and east).
Even accounting for weaknesses in the C-V methodology, these errors are underestimates of the actual C-V uncertainty. One overlooked aspect of CAI is that it relies on predictor grids which have their own interpolation errors. These errors accumulate from one CAI generation to the next. For T dmean , uncertainties in the interpolated T min predictor grids must be accounted for. In turn, VPD min relies on interpolated T min and T dmean , and VPD max relies on T max , as well as T min and T dmean .
To quantify the effects of error propagation on the CAI MAEs, the predictor grid interpolation error was introduced at each CAI step by replacing the predictor grid values at station locations, which already have those station values built in, with ASSAY estimates made in each station's absence. For T dmean , T min values for all stations used in the mapping of the T min predictor grid were estimated in their absence using ASSAY, and the estimates common to both the T min and T dmean station datasets were used as the values of T min in an ASSAY C-V exercise for T dmean . T min values for stations used in the interpolation of T dmean but not T min , such as those from the SCAN and COAGMET networks, could not be estimated by ASSAY, because they were not used to create the predictor grid. However, they could still be included in the T dmean C-V error estimation because, by definition, the estimates at their location on the T min predictor grid were made in their absence.
For VPD min , ASSAY estimates of station T dmean values, predicted using ASSAY estimates of T min as described above, were used in combination with ASSAY estimates of T min to form station values of first-guess VPD min , used as the predictor in the interpolation of VPD min . The result was station values of first-guess VPD min that accounted for errors in the interpolation of both T min and T dmean . These were used as the values of first-guess VPD min in a C-V assessment of VPD min with ASSAY. VPD max involved the same steps as VPD min , except that first-guess VPD max values were formed from a combination of ASSAY estimates of T max and T dmean . Again, a C-V exercise for VPD max was performed with ASSAY. Table 6 reports monthly ASSAY cross-validation MAEs for each of the modeling regions. MAEs that account for CAI interpolation error propagation are denoted with a CAI or CAI2. Figs 12, 13, and 14 show the distribution of cross-validation absolute errors at stations across the country for T dmean , VPD min , and VPD max , respectively.
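The idea of carrying predictor error into the cross-validation can be sketched with a toy example. This is not the ASSAY procedure: an ordinary least-squares line stands in for PRISM's local weighted regression, all names are hypothetical, and the station values are invented purely to show the mechanics. The same C-V routine is run twice, once with predictor values that already contain each station (as a grid would) and once with predictor values estimated in each station's absence.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept (stand-in for a local regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_regression_mae(predictor, target):
    """Leave-one-out MAE of target predicted from predictor at each station."""
    abs_errors = []
    for i in range(len(target)):
        xs = predictor[:i] + predictor[i + 1:]
        ys = target[:i] + target[i + 1:]
        slope, intercept = fit_line(xs, ys)
        estimate = slope * predictor[i] + intercept
        abs_errors.append(abs(estimate - target[i]))
    return sum(abs_errors) / len(abs_errors)


# Invented values (deg C), for illustration only.
tmin_from_grid = [1.0, 2.0, 3.0, 4.0, 5.0]   # grid values with stations built in
tmin_in_absence = [1.4, 1.7, 3.2, 3.6, 5.3]  # estimated with each station withheld
tdmean_observed = [-0.1, 0.8, 1.7, 2.6, 3.5]

mae_plain = loo_regression_mae(tmin_from_grid, tdmean_observed)
mae_cai = loo_regression_mae(tmin_in_absence, tdmean_observed)
print(mae_plain, mae_cai)
```

The second MAE is larger because it includes the uncertainty of the predictor itself, which is the effect the CAI and CAI2 entries in Table 6 are meant to capture.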
Dew Point. In general, T dmean interpolation errors in the west were greater than those in the central and eastern US, due primarily to terrain-induced complexities in the vertical distribution of moisture and temperature. This is evidenced by higher absolute errors in the Rocky Mountains, Cascades, and Sierra Nevada (Fig 12). Regionally, MAEs for T dmean , interpolated as DPD (T min −T dmean ), were less than 1°C in the west and less than 0.5°C in the central and east (Table 6). CAI MAEs that accounted for error propagation were 0.1-0.2°C larger than those that did not in the west, but less than 0.05°C larger in the central and east, owing to larger T min interpolation errors in the west.
The T dmean C-V analysis showed that the MAEs were inflated by systematic positive biases (lower observed DPD and higher T dmean than predicted) in the COAGMET and AGRIMET networks. In fact, bias accounted for nearly all of the MAE. As discussed previously, stations in these networks were typically sited in or near irrigated fields for use in water management calculations. Despite subjective omission of stations producing the largest spatial inconsistencies, network-wide biases were still noticeable. Fig 15a and 15b show monthly T dmean MAE and bias for each of the major station networks when each network is entirely eliminated from the dataset at one time. Plots are for the central region, to better control for the effects of complex physiographic features on interpolation bias; the exception is AGRIMET, which operates in the western region only, but stations are in flat agricultural areas, so interpolated predictions should be otherwise relatively unbiased. The peak MAE and bias in mid-summer correspond to the maximum difference in atmospheric moisture content one would expect between irrigated and non-irrigated land. At their summer peaks, the COAGMET MAE and bias were 1.7 and 1.6°C, respectively. Peak AGRIMET MAE and bias were 1.5 and 1.4°C, respectively. These are in contrast to RAWS, for which the peak MAE was 1.4°C but the bias was only 0.1°C in the central region. The MAEs for these networks were much larger than the overall MAE for the central region of less than 0.5°C (Table 6).
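Withholding an entire network at once can be sketched as below. This is a hedged illustration rather than the analysis behind Fig 15: inverse-distance weighting stands in for PRISM, the field names, network labels, and DPD values are invented, and bias is taken as prediction minus observation of the interpolated variable (DPD).

```python
def idw(x, y, stations, power=2.0):
    """Inverse-distance-weighted DPD estimate at (x, y); a stand-in interpolator."""
    num = 0.0
    den = 0.0
    for s in stations:
        d2 = (s["x"] - x) ** 2 + (s["y"] - y) ** 2
        if d2 == 0.0:
            return s["dpd"]
        weight = d2 ** (-power / 2.0)
        num += weight * s["dpd"]
        den += weight
    return num / den


def withhold_network(stations, network):
    """Return (MAE, bias) for one network predicted from all other networks."""
    kept = [s for s in stations if s["network"] != network]
    held = [s for s in stations if s["network"] == network]
    if not kept or not held:
        raise ValueError("need stations both inside and outside the network")
    errors = [idw(s["x"], s["y"], kept) - s["dpd"] for s in held]
    mae = sum(abs(e) for e in errors) / len(errors)
    bias = sum(errors) / len(errors)
    return mae, bias


# Invented stations: network "B" reports lower DPD (more humid) than its neighbors.
stations = [
    {"network": "A", "x": 0.0, "y": 0.0, "dpd": 5.0},
    {"network": "A", "x": 2.0, "y": 0.0, "dpd": 5.0},
    {"network": "B", "x": 1.0, "y": 0.0, "dpd": 3.0},
    {"network": "B", "x": 1.0, "y": 1.0, "dpd": 4.0},
]
print(withhold_network(stations, "B"))
```

When every withheld station errs in the same direction, the bias equals the MAE, which mirrors the pattern reported for the irrigated networks.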
A second source of systematic bias was found regionally in RAWS stations (Fig 15c). In the western region, biases were negative (higher observed DPD and lower T dmean than predicted). In this region, the culprits may have been both local land cover and location. RAWS stations in the West are used primarily for fire weather applications, and are typically sited in open, ventilated areas away from transpiring vegetation. In addition, many RAWS stations are located on exposed terrain above locally humid cold air pools, and in foothill locations, which can be above larger-scale valley inversions, where a dry, free atmosphere aloft overlies relatively moist air below. Spring is the time of year when good vertical mixing causes a reduction in the incidence of inversions and cold air pools, which may explain the minimal bias at this time of year. The case for siting and location as causes for bias appears to be supported in the central region, where RAWS biases are near zero (Fig 15c). It is not clear why biases become somewhat positive (lower observed DPD and higher T dmean than predicted) in the eastern region. One explanation may be that RAWS stations, often located in heavily forested areas in the east, are sampling more locally humid environments than other networks located at airports and in developed areas. The seasonal maximum bias in summer would correspond with the time of maximum transpiration from vegetation.
Vapor Pressure Deficit. MAEs for VPD min and VPD max were calculated regionally as percentages and in absolute hPa units (Table 6) and absolute errors mapped across the country (Figs 13 and 14). MAEs in absolute units were typically larger in summer than in winter, in keeping with characteristically larger vapor pressure deficits in summer due to higher temperatures (Table 6). Regionally, VPD min MAEs were less than 1 hPa in all months and regions, owing to relatively small absolute values. In January, the highest absolute VPD min errors were concentrated in the desert southwest, where moisture deficits are greatest (Fig 13). These higher errors spread to most of the west in July, when deficits are seasonally high. Regional MAEs ranged from roughly 30-40 percent in the west, 20-30 percent in central, and 25-50 percent in the east. CAI2 MAEs that accounted for error propagation were 0.1-0.5 hPa and 15-20% larger in the west, 0.05-0.15 hPa and 5-15% larger in the central, and 0.05-0.15 hPa and 10-25% larger in the east.
Regional VPD max MAEs were approximately double those for VPD min in absolute units. Percentage MAEs were accordingly much lower, ranging from roughly 6-11, 3-5, and 4-6 percent in the west, central, and east, respectively. CAI2 MAEs that accounted for error propagation were 0.1-0.5 hPa and 0.5-1.5% larger in the west, and 0.05-0.25 hPa and 0.5-1.5% larger in the central and east. MAEs were again inflated by systematic positive biases (lower observed VPD than predicted) from COAGMET and AGRIMET. The spatial and temporal distributions of absolute errors were similar to those of VPD min , with the highest January errors in the west, spreading to the east in July (Fig 14). July VPD max errors were higher in the eastern US than those of VPD min , because warm maximum temperatures in this region produce significant daytime vapor pressure deficits, despite relatively high RH values.

Summary and Conclusions
Long-term normal grids of 1981-2010 mean monthly average daily T dmean , VPD min and VPD max were developed for the conterminous United States. The T dmean grids update previous unpublished T dmean normals used as CAI predictor grids in PRISM monthly time series datasets, and to our knowledge, the VPD min and VPD max normal grids are the first of their kind. Interpolation of the long-term monthly averages was performed using PRISM. Nearby stations entering the PRISM local regression functions (one per pixel) were assigned weights based on the physiographic similarity of the station to the grid cell that included the effects of distance, elevation, coastal proximity, vertical atmospheric layer, and topographic position. Relatively few stations were available for these variables, prompting us to use CAI to improve interpolation accuracy. In the CAI process, 1981-2010 monthly T min grids served as predictor grids for the interpolation of T dmean , expressed as the dew point depression (DPD = T min − T dmean ). [...] southwest. VPD min was very low over most of the country, due to a combination of low temperatures and relatively small differences between ambient and dew point temperatures in the morning hours. Patterns of VPD max roughly followed those of T max in winter, with the lowest values in the northern tier and western mountains, and the highest in the southern states. In summer, the area of high VPD max expanded considerably, reaching a maximum of over 60 hPa in the desert southwest. Coastal regions of the West exhibited relatively low VPD max values, where cool, marine air penetrates inland.
A PRISM interpolation uncertainty analysis was performed using C-V exercises. Since CAI relies on predictor grids which have their own interpolation errors, these errors were included in the overall error estimates. To quantify the effects of error propagation on the CAI MAEs, the predictor grid interpolation error was introduced at each step by using ASSAY predictions at station locations instead of the interpolated predictor grid values. When accounting for error propagation, MAEs for T dmean were 0.8-1.1°C in the west modeling region and less than 0.5°C in the central and east, with the greatest errors typically occurring in summer. VPD min MAEs were generally less than 1 hPa in all months and regions, and VPD max MAEs were about double those values.
Overall, accounting for CAI error propagation increased C-V errors, but not as much as expected, especially when compared to the initial MAEs associated with the T max and T min predictor values. One reason for this appears to be that the errors were not systematically biased in one direction; that is, overestimates and underestimates partly canceled each other, producing lower net error increases.
Some stations in the AGRIMET and COAGMET networks were found to have consistently high T dmean and low VPD values during the summer months. Stations in these networks were typically sited in or near irrigated fields for use in water management calculations, resulting in more humid conditions than locations distant from irrigated areas. This raised questions about whether the grids should represent conditions over irrigated land. A middle ground was taken, where a few stations causing the most severe spatial discrepancies were omitted from the datasets. In addition, the siting of RAWS stations was posited as the reason for lower than expected humidity in the west and higher than expected in the east. These issues highlight the effect that local topographic position and land use/land cover properties can have on the spatial patterns of atmospheric moisture content and deficit.
In combination with existing PRISM grids of T min and T max , grids of T dmean , VPD min and VPD max allow the user to derive many other atmospheric moisture variables, such as minimum and maximum RH, vapor pressure, and DPD. Accompanying assumptions may need to be made, however, such as those outlined in the derivation of RH min and RH max. These new grids will serve as the predictor grids in second-generation CAI to produce an updated version of the PRISM T dmean monthly time series dataset that covers the years 1895-present [9], and initial versions of monthly VPD min and VPD max time series grids for the same period using third-generation CAI. Similar methods will be used to produce initial versions of daily T dmean , VPD min and VPD max time series datasets, spanning 1981-present. All of the PRISM normal grids discussed here are available online at http://prism.oregonstate.edu, and include 800-m and 4-km resolution data, images, metadata, pedigree information, and station inventory files. Links to the papers cited in Table 3 and elsewhere are also available from this website.
improved manuscript. We are grateful to all of the station data providers, without whom this mapping effort would not have been possible.