A high-resolution map of coastal vegetation for two Arctic Alaskan parklands: An object-oriented approach with point training data

Bering Land Bridge National Preserve and Cape Krusenstern National Monument in northwest Alaska have approximately 1600 km of predominantly soft-sediment coastlines along the Chukchi Sea, a shallow bay of the Arctic Ocean. Over the past decade, marine vessel traffic through the Bering Strait has grown exponentially to take advantage of new ice-free summer shipping routes, increasing the risk of oil spills in these fragile ecosystems. We present a high-resolution coastal vegetation map to serve as a baseline for potential spill response, restoration, and change detection. We segmented 663 km2 of high-resolution multispectral satellite images by the mean-shift method and collected 40 spectral, topographic and spatial variables per segment. The segments were classified using photo-interpreted points as training data, and verified with field based plots. Digitizing points, rather than polygons, and intersecting them with the segmentation allows for rapid collection of training data. We classified the map segments using Random Forest because of its high accuracy, computational speed, and ability to incorporate non-normal, high-dimensional data. We found creating separate classification models by each satellite scene gave highly similar results to models combining the entire study area, and that reducing the number of variables had little impact on accuracy. A unified, study area-wide Random Forest model for both parklands produced the highest accuracy of various models attempted. We mapped 18 distinct classes, with an out-of-bag error of 11.6%, resulting in an improvement to the past per-pixel classification of this coast, and in higher spatial and vegetation classification resolution. The resulting map demonstrates the utility of our point-based method and provides baseline data for incident preparedness and change detection. Elevation is highly correlated with the ordination of the vegetation types, and was the most important variable in all tested classification models. The vegetation classification brings together the largest amount of vegetation data for the Chukchi Sea coast yet documented.


Introduction
The dynamic, soft-sediment Arctic coastal ecosystems of northwest Alaska offer important ecological services and habitat for a wide variety of Arctic and migratory species [1]. These shorelines include vast and shallow lagoons with fractal-patterned interiors, large estuaries teeming with waterbirds, barrier islands, sandy capes, salt marshes, mudflats, brackish wetlands, and the world's northernmost eelgrass beds [2]. Like those of eastern North America before European contact, the northwest Arctic shorelines are wild, productive, and extensive. These areas are globally significant for a variety of life forms; in particular the lagoon systems and Nugnugaluktuk Estuary (Fig 1) of BELA are identified as Global Important Bird Areas by the Audubon Society [1]. These lagoons also serve as important habitat for a diversity of fish and bird species, including whitefishes and other salmonids, which are important subsistence resources in the region [3].
The enabling legislation [4] of both Bering Land Bridge National Preserve (BELA; Fig 1) and Cape Krusenstern National Monument (CAKR; Fig 1) mandates protection of subsistence resources, plant communities, coastal formations, migratory bird habitat, fish and wildlife, and additionally archaeological sites in CAKR. CAKR was established in large part to preserve the history of ancient human settlements along the coast which arose in concert with the rich coastal resources, and which provide a detailed archaeological chronosequence of human habitation spanning over 5000 years [5,6]. The Arctic coast continues to be home to Native Iñupiaq communities, for whom marine mammal and fish harvests are the center of their statutorily protected way of life.
Arctic sea ice volume, extent and duration have declined dramatically for decades [7], leaving the coast increasingly vulnerable to fall and winter storms. The tidal range along the northwest Alaskan coast is small, estimated at a 30 cm daily average range [8], but storm surge and atmospheric pressure can drive water levels several meters beyond mean sea level [9,10]. Reduction in sea ice means a winter storm now has greater potential to drive oil or other contaminants deep into the sensitive habitats described in this study.
In the summer months, the Arctic ice pack is now sufficiently far north to allow for passage of vessels via both the Northern Sea Route (above Siberia) and the Northwest Passage (through the Canadian Archipelago to Greenland). Over the past decade, vessel traffic has grown exponentially through the Bering Strait with the retreat of summer sea ice and now includes a significant number of tankers and cargo ships [11]. At the same time, planning and initial stages of development for oil and gas extraction have progressed in the National Petroleum Reserve-Alaska, as have plans for the development of deepwater ports at Nome and Cape Blossom (near Kotzebue). These developments now place BELA and CAKR at risk of a marine incident without adequate pre-disturbance vegetation mapping.
individually classified, frequently result in a 'salt-and-pepper' effect [17]. Grouping pixels into segments, or objects, allows more homogenous clusters to be classified, as well as the collection of object-level traits such as size and shape [17][18][19]. Geographic object-oriented landcover classification has been widely used in urban and agricultural areas, where the borders between types are clearly delimited and segment shape is often a highly useful parameter [20,21]. Its use is undeveloped landscapes is often limited to classifying a handful of broad landcover classes; e.g. forested vs unforested [22,23]. Mapping of higher resolution landcover classes (>8 classes) is also typically divided by functional group (e.g. [24]), or physiognomy (e.g. [25]), not plant associations.
To assist in park preparedness, we have produced a vegetation classification and detailed map of the coastal vegetation communities. The Jorgenson et al. [12] raster map includes only two brackish water vegetation classes-Coastal Brackish Sedge-Grass Meadow and Coastal Brackish Willow Shrub. In reality, coastal salt marsh is among the most productive habitats along the BELA and CAKR coasts and represents a complex mosaic of vegetation types, each of which hosts different bird species using these areas for nesting, foraging and pre-migration  BELA coast falls into a multi-species bird hotspot represented as a  global Important Bird Area, fine-scale delineation of habitat types helps discriminate habitats  into polygons used by dozens of species for different purposes at different times of the year  [26]. These types include lagoon interior marshes at different tidal elevations, estuarine salt marshes at different elevation and salinity, and several other halophytic sedge and shrub-dominated classes. By mapping vegetation at a higher resolution than Jorgenson et al. [12], we hope to enable far more insight aimed at potential uses for spill response, post-spill restoration, Natural Resource Damage Assessment (NRDA), and the scientific community at large for uses in vegetation monitoring, climate change, and sea level rise detection. In the event of an oil spill, a fine-scale habitat baseline is the single most valuable tool for response, restoration and NRDA [1]. We aimed to map at least 10 distinct coastal vegetation types, with a minimum polygon size of 200 m 2 and smooth polygon boundaries when displayed at 1:10,000 scale. To meet NPS standards for vegetation classification, we mapped USNVC vegetation types. NPS minimum accuracy for vegetation classification is 60% [26]; our target accuracy was 80% as is typical of similar object-oriented classifications [22,24,27].

Study area
Coasts of both parks lie along the Chukchi Sea, an embayment of the Arctic Ocean north of the Bering Strait (Fig 1). The Chukchi Sea is a shallow continental shelf, averaging 50 m deep. The primary coastal features of both parks' lands are gravelly or sandy barrier island complexes, backed by lagoons [28,29]. The beach ridges of both coasts initiated nearly 4000 years BP as the sea level stabilized [30,31]. Lagoons and estuaries are bordered by salt marshes (Fig  2A). Other parts of the coast are ice-rich permafrost bluffs with a narrow band of beach ( Fig  2B). These coastal bluffs are found where the lagoon and barrier complexes are absent: in CAKR most notably along much of the west-facing coast, and in BELA near Kitluk River and along the coast of Kotzebue Sound. Bedrock outcrops occur along the coast in Goodhope Bay ( Fig 2C) and Ugrurak Bluff, north of Tasaychek Lagoon in Cape Krusenstern. The mean annual temperature along the BELA and CAKR coasts is -5˚C [32].

Field methods
We sampled 362 vegetation plots in Bering Land Bridge National Preserve and Cape Krusenstern National Monument in the summers of 2018 and 2019. Due to the remote, roadless nature of these parklands, access was by helicopter and foot. Using satellite imagery, we selected sampling areas to maximize high spectral and landform diversity and to minimize flight time and expense. Plots were arrayed subjectively, attempting to capture as much of the ecological variation as possible in the imagery across a broad geographical area. On the ground, plots were located within areas of homogeneous vegetation, as well as mosaics (multiple types within 200 m 2 ) and gradual eco-tones, but plot locations that spanned distinct boundaries between types were avoided. Our intent was to obtain a representative sample of the vegetation while avoiding plot locations that were likely to mix data from plant assemblages that were likely to represent more than one type in the final classification. Plots were circular with an 8-meter radius (200 m 2 ). All vascular plants with at least 1% cover within the plot were identified to species in the field or collected for later identification. Plants below 1% cover were identified in the field to species or genus. Taxonomy follows the vascular plant inventory of the National Park Service (NPS) Arctic Inventory and Monitoring Network (ARCN) [34]. Mosses and lichens were identified to species or species group, following the ARCN Vegetation sampling protocol [35]. For all species in the plot, ocular estimates of cover were made using the following five classes: 0:<1%, 1:1-5%, 2:5-25%, 3:25-50%, 4:50-75%, 5:>75%, a modified Daubenmire scale [36]. GPS coordinates were collected at plot center on Trimble Geo XH 6000 or Geo 7X model receivers, and post-process corrected to the Kotzebue CORS reference station. The mean horizontal error was 30 cm, with 99% in the range of 10-50 cm. Electrical conductivity (EC) in units of μS/m, an estimate of salinity, was measured at plot center with a soil probe (Hanna Instruments Direct Soil EC Tester). Photos were taken in each cardinal direction. Cover of functional groups (e.g. shrubs, forbs, graminoids) was estimated visually. Post-fieldwork, a hydric index-a proxy for site wetness-was calculated by weighted averaging using wetland species weights from the Federal Interagency Committee for Wetland Delineation [37].
Along with the 362 plots collected in the summers of 2018 and 2019 (two of which were non-vegetated), we incorporated 37 coastal plots collected in 2003 by Jorgenson et al. [12] and 35 coastal plots established in 2013 as part of the ARCN Vegetation Node Sampling protocol [35]. These data, measured in percent cover, were converted to the above cover categories for comparison with our data, and synonymy was standardized to the NPS Arctic Network plant species list [34]. Non-vascular plant diversity was reduced to the species list collected for the ARCN vegetation protocols [35]. To be comparable to the EC units collected in this study EC values for Jorgenson et al. [12] plots were truncated to 4000 (the maximum value read on our instruments). EC was not collected for the 35 ARCN plots; we imputed these values through a k-nearest neighbor model using the R package VIM [38] (data available: [39]).

Classification methods
Our classification goal was to balance the recognition of as many discrete vegetation types as possible with the ability to distinguish these types via photo interpretation and automated classification on the map product. Our final goal of the classification methods was the description of vegetation types that can be keyed dichotomously, and we used both analytical and subjective tools to separate vegetation types.
We initially attempted to classified plots via previously existing vegetation classification schemes [12,40]. Some plots clearly fit type descriptions, others were lumped into overly broad categories, and others did not match any described vegetation types. We assigned plots to types which matched descriptions; these included Halophytic salt marsh, Dunegrass beach meadow, Crowberry tundra and Freshwater wet sedge meadow [12,40]. We used nonmetric multidimensional scaling (NMS) ordination [41] iteratively to test whether these pre-existing types were coherent in species space, whether further subdivisions were possible, and how unlabeled plots clustered.
We began our analyses with a matrix of 432 plots and 302 species. A Bray-Curtis dissimilarity matrix was used as the basis for ordinations. As ordinations are reductions in dimensionality via co-occurring species, plots with only one species do not ordinate. Thus, plots assigned to the predominately monoculture classes comprised of Arctophila fulva, Carex lyngbyei, Carex saxatilis and Hippuris tetraphylla were removed from ordination analyses, as were outliers, defined as plots more than two standard deviations above the mean Bray-Curtis dissimilarity [42] to all other plots [43]. These plots were statistical as well as ecological outliers. Thirteen outlier plots and 31 monoculture type plots were removed from the ordination analyses. All species occurrences with less than 1% cover, and species with less than three plot occurrences total, were removed from the dataset, leaving a matrix of 386 plots and 117 species. Preliminary ordinations were used to find whether further subdivisions of types were coherent in species space. Outliers and boundaries between types were distinguished using sorted table analyses [44,45].
Non-metric multidimensional scaling (NMS) ordinations were run via the R package 'vegan' using the function 'metaMDS' [46]. A Bray-Curtis dissimilarity matrix of the community data (reduced as described above) was input with the following parameters: 2 dimensions, with 250 runs of data, a maximum number of random starts 500 and maximum iterations 999. The ordination was constrained to 2 dimensions based on a step-down in dimensionality. Previous ordination positions were used as a starting point, and data were centered but not otherwise transformed.

Mapping methods
The map was based on WorldView-2 (WV2) satellite imagery (resolution 0.46 m panchromatic, 1.85 m multispectral) captured in July 2013 and July 2014. The 16.4-km wide images were orthorectified and clipped to the study area in ArcMap 10.6 [47]. The multi-spectral imagery was then segmented via the ArcMap 'Segment Mean Shift' tool with the following parameters: spectral detail 20, spatial detail 18, minimum segment size 54 pixels, which mimics the size of our 8-m radius ground based plots (200 m 2 ) [48]. The spatial and spectral detail parameters used by the proprietary ArcGIS software range from 1 to 20; they are not equivalent to the bandwidth parameter h used in the original formulation of the mean-shift method [49], and no additional information on their properties is provided. Lacking an a priori basis for choosing the spatial and spectral parameters, we chose values by trial and error to produce segments distinctly finer than our intended final map. This ensured that all potential objects of interest were differentiated. The unnecessary complexity in the segmentation was then reduced by the classification process, which merged adjacent segments assigned to the same class. The unnecessary complexity in the segmentation was then reduced by the classification process, which merged adjacent segments assigned to the same class.
After all plots were assigned to classes, the spectral, topographic and spatial variables of image segments containing vegetation plots were used as training data. Further training data were added whenever landcover class was clearly identifiable from the imagery, in order to increase the overall number of training points and to increase representation of rare classes. Sources of imagery for visual interpretation of landcover class included: 1) an orthomosaic of true-color aerial 1:24,000-scale photographs, taken in 2003 by AeroMetric Inc and digitized at 0.6 m resolution of the CAKR and BELA coasts; 2) the statewide 2012 SPOT imagery mosaic; 3) pan-sharpened WV2 imagery (available for the coasts of both CAKR and BELA); and 4) oblique aerial photographs of the coast available through ShoreZone [33]. The land cover class was identified visually at numerous locations using these image sources, and digitized as point locations. Individual points can be digitized much more rapidly than drawing polygons, which are typically used for supervised classifications, at a rate of ten or more per minute (Fig 3). Allowing the segmentation to define training area boundaries reduces much of the decisionmaking needed during training data collection, allowing for many more points to be collected. This method allows collection of training data which encompasses the full range of variation of each land cover type. A total of 11,647 visually interpreted points with class labels were inferred for BELA, and 6575 from CAKR. The labeled points were then intersected with the segmentation, the variables of the segments where each point fell were extracted, and these records formed the basis of the supervised classification.
A total of 40-46 variables were collected for each segment (Table 1), derived from the WV2 satellite imagery, Alaska 5-meter IFSAR digital terrain model, the National Hydrography Dataset [50] and the 2003 coastal orthomosaic described above. NDVI, the normalized difference vegetation index was calculated from the WV2 images as Near Infrared (NIR)-Red/ (NIR + Red). The normalized difference water index (NDWI) was calculated as Green-NIR / (Green + NIR) [51]. Spectral data was the level 1b radiometrically corrected, 16-bit pixel values, and not further processed to radiance or reflectance.
Only the training data derived from the visually interpreted point locations was used to assign map cover classes to the imagery. We used the plot data and locations to calibrate our visual interpretation of training points, and as a reserved test set for error analysis.
The "unknown" segments (those not identified by visually interpreted training points) were classified by the Random Forest classifier [52]. Random Forest (RF) is a powerful machine learning classifier, widely used in object-oriented mapping due to its high accuracy, and its ability to incorporate high-dimensional model variables and non-normally distributed data [27,53]. RF is an ensemble classifier, built from many classification-and-regression (CART) trees.
Using the RF decision tree modeling approach as implemented in the R package 'random-Forest' [54], we classified the segmented map in four ways (Table 2). First, the training data were divided for each satellite scene, and a total of 40 localized RF models were built. Secondly, all the training data were combined to create a single study-area wide model with the scene identifier as a variable. For both modeling approaches, the initial model measure of variable importance was used to reduce the variables to the 15 most important, uncorrelated variables, and the models were then re-run. Prior to the reduction in variables, Spearman rank correlations were used to test the relationship between variables using package 'Hmisc' in R [55]. Spearman rank correlations of all 40 variables found three variables that were highly correlated  (> 0.9) with other variables at an alpha level of 0.01. These were the segment green band, the mean of the 2003 Orthomosaic green and its standard deviation (S2 Table; p-values S3 Table). Those three variables were removed from consideration for the 15 variables used in the important variable models. For all approaches, we ran the data with 1000 trees and the number variables sampled per node (parameter 'mtry') as determined by the function 'rfTune'.

Error analysis
We used two approaches to assess map accuracy: the bootstrap method provided by RF analysis, and the field plots as an independent test. We computed errors using the field plots as verification data, because the classifications were performed solely with the photo-interpreted points as training data. Our field plots were placed intentionally, and they do not provide an unbiased estimate of the proportion of each type in the study area. However, within each type they provide a useful check on accuracy that can be compared with the OOB error rates. Because RF is based on classification trees that leave out a random subset of the data, RF can calculate out-of-bag (OOB) error, the percent of the training data left out of the bootstrapped sample that is misclassified. Breiman [52] found OOB error to be an accurate metric that did not necessitate leaving out an independent test set. However, Millard and Richardson [27] found that OOB error was inflated for high-dimensional models relative to an independent test set, and recommend reducing the variable set to uncorrelated, high-importance variables; these formed the basis of our decision to include only the top 15 uncorrelated variables for both modelling approaches. Note that OOB error for individual classes is an estimate of "producer's accuracy" [56], i.e., the percent of observations (segments in our case) from a known class that were correctly classified.

Vegetation classification
The ordination's final stress was 0.1271, with an instability (standard deviation of change in stress over the previous ten runs) of 0.00015. The observed non-metric fit for the ordination was 0.984.
The covariates elevation, hydric index, salinity (EC) and the log of distances to ocean or estuary bodies were used as correlation overlays in the ordination, as these were environmental factors most highly correlated with ordination axes and of greatest interpretive value ecologically. There was a strong negative correlation between elevation and EC (-0.76) and a moderate positive correlation between distance to the ocean and EC, but otherwise weak relationships between variables (S1 Table). The ordination was rotated to align Axis 1 maximally with the hydric index, the most strongly correlated factor, r 2 = 0.42 (Table 3; Fig 4). A higher hydric index indicates a drier site, thus this axis appears to represent a wetness gradient. The structure of the ordination is similar to the ecological sequence seen along beach ridges. Axis 2 separates plots closest to the ocean from those that are most isolated from salt water with transitional, mid-salinity plots in between. EC was not perpendicular to the ordination-presumably because EC registered as 0 in dry soils, regardless of their proximity to the ocean. Surprisingly, neither distance to ocean or distance to estuary was a highly correlated variable. This may be because linear distance from the ocean or an estuary does not accurately represent protection from storm surge. For example, a site 500 m from the ocean protected by two intervening beach ridges is not equivalent to a site 500 m from the ocean along contiguous low-lying salt marsh. Classes were further separated subjectively using the ordination. Several highly diverse, forb-dominated plots in CAKR clustered consistently from crowberry lichen tundra, in the now designated 'Herbaceous meadow' vegetation type (see Appendix 1 in S1 File). The brackish plots intermediary to salt marsh and freshwater sedge were dominated by Carex rariflora and Salix ovalifolia, and assigned to the 'Brackish sedge wet meadow' type.
Using the ordination and sorted table analyses, we describe 12 vegetation types as formal NCVS plant associations (Appendix 1 in S1 File). The map includes these 12 vegetation types, as well as six other broad classes, which we will hereafter collectively refer to as land cover classes ( Table 4). The six broad classes include the non-vegetated classes of water, sediment and built-up (human infrastructure), Salt Marsh-Water Mosaic, and two non-coastal vegetated classes, Tall Shrub Upland and Upland Tundra. Salt marsh-water mosaic is identical in terms of plant cover to salt marsh, but includes a high percent cover of water, and is spectrally distinct. The two non-coastal classes, Tall Shrub Upland (dense vegetation found along steep slopes and freshwater riverine systems) and Upland Tundra (consisting of low shrub-sedge tundra outside of tidal influence), are mapped to delimit the study area. These two classes were only incidentally sampled and we do not include them in the description of vegetation types. Full descriptions of the vegetation types, as well as a dichotomous key and data summaries of plots are included in Vegetation Types, Appendix 1 in S1 File.
Attempts to separate the large Brackish Sedge-Willow Marsh cluster of plots into willow-or sedge-dominated classes were not consistent in the ordinations. The spectrally distinct 'salt marsh-water mosaic' ecotype was added to the map to display the estuary and lagoon islands composed of a fine patchwork of mud/water pockets between salt marsh vegetation. In terms of species composition, these were identical to salt marsh plots. One undersampled type that did not ordinate consistently, Grayleaf Willow Shrub, was included as a type because it was very distinct on the imagery (see S1 File).
Several potential types were so rare on the landscape that we were unable to include them and designated them as 'unclassified'. These include two Sphagnum (peatmoss) dominated plots on the edge of lakes in Cape Krusenstern, CAKR, and Juncus arcticus (Arctic rush) dominated swales in sparsely vegetated beach ridge swales in BELA. Also included are three plots with the freshwater aquatic emergent species Hippuris vulgaris (common marestail). Due to the difficulty in mapping halophytic Hippuris tetraphylla (fourleaf marestail), which is more prevalent in the study area but still generally rare, we also designated these plots as unclassified. Ultimately, 5 plots were designated as outliers, 12 as unclassified, and 8 had low enough vegetation cover (<10%) that they were mapped as sediment.

Variable importance
Variable importance is measured two ways for random forest trees: mean decrease accuracy (MDA), how much accuracy decreases when a variable is excluded from the model, and mean decrease Gini (MDG), a measure of how homogenous nodes including the variable are. We tested the stability of both measures with 100 random forest models using jackknife runs leaving out 10% of the data (ala [58,59]) and found them both to be highly stable (r 2 between the rank of each variable from full dataset and the 100 resamples was >.99 for both MDG and MDA, see S4 Table). We report MDA for ease of interpretation, but MDG results are substantively similar (S4 Table). Fig 5 displays the ranking of the importance of the segment variables for land cover classes, as measured by MDA for the unified, all-variable model. Elevation is the single most important variable, especially for Upland Tundra and the two Salt Marsh classes. Distance to ocean is important generally, and particularly for distinguishing Dunegrass Beach Meadow, the vegetation type found directly at the beachfront. NDWI (a metric of wetness) and NDVI (a metric of vegetation density) are most important for recognizing the unvegetated classes of sediment and water. Latitude and longitude are important variables for most types; many of the vegetation types are structured sequentially along beach ridges. The least important factors are segment compactness and rectangularity, two metrics of segment shape.

Model comparisons
We compared 4 models, scene-specific models with all variables (Scene spef., all var.) and top 15 important variables (Scene spef., top var.), and a study-area wide unified model with all variables (Unified, all var.) and top 15 important variables (Unified, top var.). We found all 4 models to have generally similar accuracies, with the unified important variable model and scene-specific all variable model in particular having slightly higher overall accuracies, and different strengths in recognizing classes (Table 5).
For the independent field test set, accuracy is similar across models (Table 4). Accuracy was 0.7 or better for all types and models except GWS, which was not adequately sampled (n = 5). Several widespread types had accuracies of 0.9 or better. Extremely low (0.5) and high (1)

PLOS ONE
values were obtained only for types with very small sample sizes of 5 or less. Accuracy varies more between classes than it does between models. For OOB error, most classes have very similar accuracies across models (Fig 6; Table 4). Some of the rarer classes have a wider range in accuracy, with the unified models being notably worse for these classes. When out-of-bag errors are parsed by satellite scene (Fig 7), again, error rates between scenes vary more than between models, that is to say, some scenes consistently classify better than others, regardless of model used. There is no consistent relationship between training data sample size and accuracy in by-scene comparisons (data S5 Table, multiple linear regression results S6 Table, adjusted R 2 = 0.008).
Reducing the model to the fifteen most important variables increased accuracy only for the unified model. This can be seen slightly in the field test set results and more strongly for OOB error. For the field test set, the unified, top-variable model was more accurate than its counterpart all variable model in 7 out of 15 classes and tied in 5. For OOB error, the unified important-variable model is more accurate than the all-variable model for 12 out of 18 classes, with a 1.7% increase in overall accuracy (Table 4; Fig 6). In contrast, the scene-specific model sets the unified and important-variable models tie for 11 out of 15 classes of the field test data, and 8 of Table 5. The producer's error for the independent test set of excluded field plots for all four sets of models and out-of-bag producer's error from each RF model. The number of test plots is higher than the number collected, as some plots occur on overlapping satellite scenes, which were classified separately in all modeling approaches. Such plots were counted as testing each classified scene. In bold: the highest accuracy for a given type. Four-way ties are unbolded. Starred classes in the OOB section indicate land cover classes without field plot test data.

Field plot producer's accuracy RF OOB producer's accuracy
https://doi.org/10.1371/journal.pone.0273893.t005  (Table 4). This is in contrast to the results from Millard & Richardson [27], who found an independent test set necessary to assess the value of reducing variable dimensionality. The unified, important variable model was the most accurate for a plurality of land cover classes (9 out of 18), the majority of scenes (23 out of 40), and had the lowest overall OOB producer's error, 11.6%. There is a strong correlation with sample size-widely sampled classes are more accurate in the unified models, while 5 out of 6 of the rarest classes have lower error in the scene-specific models. Visual comparison of the maps produced by the two best models, the scene-specific all variable model and the unified top variable model, found minor differences. For the purposes of this map, our priorities were distinguishing vegetated from nonvegetated areas, and the boundaries of more abundant classes. We chose the unified important variable model to produce the final map. Table 6 shows the error matrix of the independent test set against this classification. Many of the errors are predictable-dunegrass beach meadow, an often sparsely vegetated type, is most commonly misclassified as sediment. Salt marsh and salt marsh mosaic, which have identical plant associations, are frequently confused. Brackish sedge-willow meadow is found in close proximity to and has gradual transitions from salt marsh and crowberry-lichen tundra, the two types with which it is most confused. Visual inspection of the map shows that inappropriate types are not being mapped on mismatched landforms (i.e. beach ridge types in estuaries), and much of the error is from difficulty in distinguishing transitions. However, the error is uncertain for many of the smaller land cover classes. As we do not have a sampling design proportionate to the area occupied by each class, we do not calculate user's accuracy. The OOB error matrix is shown in S7 Table.

Final map product
The final product is a 663.4 km 2 map of the vegetation of the BELA & CAKR coasts, available as a vector GIS layer [39]. Figs 8-10 show the maps, as well as close-ups of the detail. The map will also be made available through the Alaska Ocean Observing System's online Ocean Data Explorer [60]. This data network hosts coastal and oceanographic data from multiple partners, making it readily available to natural resource managers and stakeholders in spill preparation and response.

Uses of mapping product
We envision three primary uses for these high-resolution coastal land cover layers: spill response, restoration, and long-term ecological monitoring. First, as the Arctic sea ice has retreated, shipping through the Bering Strait has more than doubled since 2008. The multiagency U.S. Committee on the Marine Transportation System projects likely vessel transit increases of approximately 30% in the next decade [61]. With increased shipping comes an increased risk of spills from the cargo, towing, fishing, and tanker vessels that have constituted 70% of the several hundred vessels in the region each year in the second half of the last decade. Immediately following an oil spill, responders will need high-quality information about the distribution of coastal communities along the shoreline with important conservation value (e.g., salt marshes, brackish sedge-willow marshes) and a high likelihood of oil retention (i.e., areas of low wave energy, marine eddies, etc.) Low-lying, backwater vegetation communities generally have high importance as bird breeding areas [62] and oil releases have the potential to remain on site for decades [33,63]. Site-specific oil spill response strategies have been established by the State of Alaska for sensitive areas [64], but little is known about the vegetation and habitat composition within these sites. In the event that prioritization of response resources is needed due to logistical issues in a response, our data layers will allow area estimation of the most sensitive habitats, thus enabling data-based triage. Overlaying Shorezone's Oil Residency Index [33] onto land cover classes of interest could assist in developing a plan for prioritization of response resources.
Second, in the years following a spill, our land cover layers could prove valuable in guiding both restoration and Natural Resource Damage Assessment proceedings [65]. In a spill reaching U.S. lands, the responsible party is required to pay the cost of restoration, and the landowner and other regulatory parties determine the necessary restoration action. If the spill causes damage on NPS lands, NPS requires restoration of vegetation communities and physical site properties to their original, natural condition [66, 67]-in contrast to other landowners that may require only simple revegetation or restoration of ecological function. Our data layers can provide necessary information about composition, aerial extent, and location of affected communities. Using the characteristics of the unaffected areas of a given cover class in Appendix 1 in S1 File (vegetation type descriptions, S1 File) coupled with detailed vegetation work in the affected area and adjacent unaffected areas, restoration staff can create targets against which restoration efforts may be evaluated.
Lastly, these layers may serve as an excellent baseline against which to compare future vegetation changes from climate change and landscape changes that follow increasing coastal erosion [28]. NPS's Arctic Network has conducted long term monitoring of terrestrial vegetation  on NPS Arctic units since 2004 [35]. Because our data layers are high-resolution, they will permit an assessment of coastal landcover change in future decades using similarly high-resolution imagery that is becoming increasingly available from both public and commercial sources.

Use of mapping method
This is one of the first large-scale, high-resolution object-oriented classifications of any National Park unit, and demonstrates that such methods are functional for conservation efforts. This classification method has several advantages as well as limitations. In per-pixel classifications, isolated pixels can be misclassified in the midst of the true, homogenous type, resulting in 'salt-and-pepper' land cover maps [17]. Segmentation reduces this by aggregating pixels into more natural units, and improves the display quality of the final map. Additionally, 30 m 2 pixels are often mixtures of communities because they straddle boundaries or ecotones. Segments greatly reduce the number of boundary pixels, making more homogenous areas that are easier to classify. Segmentation also allows for object-level variables to be collected, such as shape parameters or texture, though we did not find these useful for our classification of a primarily undeveloped landscape. As is true of any classification, high quality ground-based data is essential. Our total of 443 plots is substantial and well-distributed, but there are still several kilometer-wide gaps between https://doi.org/10.1371/journal.pone.0273893.g009 some samples, due to the remote nature of the Chukchi coast and the existence of parcels of private land along the shore. The ability to recognize vegetation types from satellite imagery or aerial photography is often difficult, and the existence of continuous oblique photography along the coast available through ShoreZone [33] was a major asset to this project.
The method of identifying training points and then intersecting those identifications with the segmentation has yet to be widely used. It is in essence the same as the methods found internally for some object-oriented software workflows, where imagery is first segmented and the training data is collected via manually identifying segments [68]. The point-based method has the advantage that it can be adapted to any segmentation. Placing and categorizing training points is much faster than delineating polygons, allowing for a much larger set of training data to be collected. Collection of training points can be undertaken in a systematic way, via a grid or random sample. We did not use this method because of the large number of types we were attempting to recognize; assigning training points to unambiguous areas is more efficient.
Classifying each satellite scene by a separate model has certain advantages, allowing disparate variable information and potentially better inference for localized classes. For our map, we found minor differences between scene-specific and unified models. Including the scene identifier as a model variable allowed the decision tree forest to incorporate the local differences in classes. All but two of our satellite tiles were WV2 imagery taken within a year of each other, and we were able to collect 40 variables for all scenes, making the data consistent across the study area. Maps with wider variation in data availability and spectral information between scenes may perform better with scene-specific models.
We improved on the spatial resolution of the existing coastal land cover map, going from 900 m 2 Landsat pixels to minimum 200 m 2 polygons, and the ecological resolution, increasing from 5 coastal types to 12. Our accuracy rates, at 88.4% OOB producer's accuracy for the final model, surpassed our target of 80% accuracy.
Supporting information S1