What Data to Use for Forest Conservation Planning? A Comparison of Coarse Open and Detailed Proprietary Forest Inventory Data in Finland

The boreal region is facing intensifying resource extraction pressure, but the lack of comprehensive biodiversity data makes operative forest conservation planning difficult. Many countries have implemented forest inventory schemes and are making extensive and up-to-date forest databases increasingly available. Some of the more detailed inventory databases, however, remain proprietary and unavailable for conservation planning. Here, we investigate how well different open and proprietary forest inventory data sets suit the purpose of conservation prioritization in Finland. We also explore how much priorities are affected by using the less accurate but open data. First, we construct a set of indices for forest conservation value based on quantitative information commonly found in forest inventories. These include the maturity of the trees, tree species composition, and site fertility. Secondly, using these data and accounting for connectivity between forest types, we investigate the patterns in conservation priority. For prioritization, we use Zonation, a method and software for spatial conservation prioritization. We then validate the prioritizations by comparing them to known areas of high conservation value. We show that the overall priority patterns are relatively consistent across different data sources and analysis options. However, the coarse data cannot be used to accurately identify the high-priority areas as it misses much of the fine-scale variation in forest structures. We conclude that, while inventory data collected for forestry purposes may be useful for forest conservation purposes, it needs to be detailed enough to be able to account for more fine-scaled features of high conservation value. These results underline the importance of making detailed inventory data publicly available. Finally, we discuss how the prioritization methodology we used could be integrated into operative forest management, especially in countries in the boreal zone.


Introduction
Informative conservation decision-making depends on available data
Biodiversity conservation deals with multifaceted and complex problems [1] that call for inter- or transdisciplinary research and decision-making [1][2][3]. Several different kinds of data are typically required [4], such as spatial data on species distributions, habitats and ecosystem services [5,6], costs associated with conservation actions [7], the structure and representativeness of the existing reserve network [8], and increasingly information about the present and future state of dynamic environments [9,10] and anthropogenic threats [11,12]. Furthermore, on-ground conservation decisions are almost always tied to a relatively fine spatial scale, which implies that the data used for conservation prioritization should also have a resolution relevant for the prioritization problem at hand [6,13]. Spatial conservation prioritization [14,15] is a form of conservation assessment primarily interested in when, where, and how conservation action should be taken in order to achieve conservation goals [16,17]. It can be embedded within a broader context of conservation planning [18] that can be described as a complete operational model covering all the stages needed for successful conservation action, including assessment, planning and management. Conservation prioritization problems have been extensively studied conceptually and mathematically for many years [19] and consequently many software methods for solving a wide array of problems have also been published [20][21][22][23][24].
Most of the contemporary approaches to spatial conservation prioritization are based on the concept of complementarity, which can be defined as a property of a prioritization solution whereby high-priority sites complement each other in terms of the biodiversity features they contain. In other words, sites work together efficiently in achieving conservation objectives [25]. If a site has unique biodiversity features such as species, it is often considered irreplaceable, and sites with high irreplaceability are typically considered high conservation priorities [15]. Connectivity is another central concept for spatial conservation prioritization and planning [26]. The term connectivity is commonly used to refer to measures of spatial connectedness of a network of sites connected by species' dispersal, with higher connectivity usually implying increased species persistence [27]. In practice, however, operationalizing connectivity in spatial conservation prioritization has been difficult because of the multitude of definitions [28] and computational intractability [29]. Nevertheless, enhancing connectivity is often promoted as an important conservation strategy [28,30,31], although it may come at the expense of high-quality sites that are poorly connected [13,26]. Irrespective of the spatial conservation prioritization method employed, validating the results is an important, but often overlooked, part of the whole prioritization. Maps and other results of prioritization assessments are often produced assuming that the input data is of sensible quality and thus the priorities reflect on-ground reality adequately [32], but this assumption needs to be validated against independent validation data. Creating informative and accurate conservation prioritization results therefore hinges on the availability of reliable input and validation data.
A number of studies have been published regarding the pattern of distribution of biodiversity, and new technological advancements, such as relatively cheap and very accurate remote sensors, have led to an increase in the available biodiversity data [33]. However, in most regions of the world the primary biodiversity data for conservation decision-making still remain scarce [34,35] and are often biased in terms of species representation, areas sampled and spatiotemporal accuracy [36,37]. For most species, and most parts of the world, we simply do not have sufficient data [35,38] and even when we do, they are not necessarily accessible. Sharing data is especially desirable from the decision-making point of view because of the many benefits it entails, such as enabling integrative and synthesizing science [39], enabling exploration of new topics not envisioned by the data originators [40], and providing more verifiable research for policymakers [41]. Many public and private organizations collect and maintain research and monitoring databases that could be valuable for conservation decision-making, but remain unavailable because of political or technical barriers for data sharing. There can be good reasons for withholding data, for example when they contain detailed location data on endangered species or confidential information concerning the privacy of individuals [40,42,43]. Restricting access to such information is not only an ethical obligation, but also often a legal one. Thus, the availability of potentially useful biodiversity data remains restricted, despite the fact that conservation decisions still have to be made [35,44]. With great potential for better informed decision-making, open access to relevant data is crucial for addressing the increasingly complex conservation issues the world is facing [12,34,43,45,46].

dryad.d9p4v. Detailed forest inventory data owned by the Finnish Forest and Park Service's Natural Heritage Services are available from the organization (http://www.metsa.fi/sivustot/metsa/en/ContactUs/Sivut/ContactUs.aspx, P.O. Box 94 (Vernissakatu 4) FI-01301 Vantaa, +358 205 64 120). The data are not publicly available, but can be acquired for scientific research upon request. The data are not subject to legal restrictions. Coarse forest inventory data are publicly available from http://www.metla.fi/ohjelma/vmi/vmi-moni-en.htm.

Forest inventory data for spatial conservation prioritization in the boreal zone
The circumpolar boreal forest is the second largest biome in the world [47]. Countries in the boreal zone have traditionally utilized their forest-based natural resources extensively, which has led to changes in forest structure, species composition, habitat diversity, and large-scale disturbance dynamics [48][49][50][51][52]. While it is not the most species-rich or threatened biome on the planet [53,54], there are still many reasons for increasing conservation efforts in the boreal zone. First, boreal forests host a great number of highly specialized species that are dependent on resources such as dead wood [55][56][57]. Many of these species have become endangered because of intensive forestry practices. Second, because of their large extent and biomass, boreal forests have a major role in carbon sequestration and climate change adaptation [47,58]. Third, many parts of the boreal zone, especially in the Russian Federation and Canada, remain inaccessible, presenting an opportunity to protect large tracts of relatively intact forest [59,60].
In the boreal zone and elsewhere, effective conservation planning should ideally be done simultaneously with general land-use and natural resource use planning [6]. This further emphasizes the need to be able to synthesize and utilize data from various sources. Data for land-use and natural resource use planning may also be useful for conservation planning, assuming that they act as surrogates for biodiversity features of conservation interest. The upside is that resources allocated for collecting these types of data usually exceed those allocated for conservation-related data collection. For example, countries with an active forest sector typically have high-resolution national forest inventory systems (NFIs) in place [61,62]. In addition to NFIs, many other public and private operators collect detailed forest inventory data for their own operational planning, often at the national or regional scale. Recently, governments and public research institutions in particular have started opening up their databases. For example, the Finnish Forest Research Institute has opened up its multi-source national forest inventory database (http://www.metla.fi/ohjelma/vmi/vmi-moni-en.htm).
Forest inventory data has historically been collected to assess the productive functions of forests [61], but broad-scale NFIs are increasingly being used for monitoring forest biodiversity particularly in boreal forests [63]. Following the classification by Corona et al. [63], biodiversity indicators estimated from forest inventory data can be classified into two categories: (i) compositional indicators directly measuring biodiversity, and (ii) structural indicators based on key structural features (e.g. variability in tree size and the amount of dead wood) acting as correlates or surrogates for biodiversity [63]. The latter approach largely relies on the assumption that high structural and tree species diversity provides more habitats for different forest species [63][64][65]. It also requires that the features can be reliably estimated from forest inventory data [66,67]. The approach based on structural indicators also has many desirable qualities from the perspective of spatial conservation prioritization. First, as structural components are comparatively easy to measure, data about them are commonly included in forest inventories [61,63]. Second, the effects different forest management options have on structural features and thus on biodiversity can be easily assessed [68], enabling comparisons between different management scenarios. Third, forest inventories are typically repeated periodically, making it possible to monitor changing conditions [63]. Fourth, since forest inventory data still are primarily collected for forest management and planning purposes, conservation prioritization based on forest inventory data can be more easily understood by forestry practitioners. Finally, the data, and thus the results of conservation prioritization analyses, are produced at a resolution directly relevant for operative planning. However, validating the results is even more important when relying on surrogate data such as structural forest inventory data. 
At the structural level, one can use locations of known conservation value, such as existing protected areas, as crude validation data, because such locations should, on average, be more valuable than the surrounding (commercially managed) landscape.

Aims and scope
Here, we develop a set of conservation prioritization analyses based on freely available and proprietary forest inventory data with a varying degree of detail across the province of South Savonia, Finland. We use the conservation prioritization software Zonation to develop complementarity-based priority maps. Accounting for connectivity in spatial conservation prioritization can be ecologically justified [25] and is often called for in conservation implementation [69]. Therefore, we also include connectivity considerations in our Zonation analyses. With this approach, we are studying the following questions:
1. Can conservation prioritization analysis based on forest inventory data capture conservation value in boreal managed forest landscapes?
2. How well does freely available coarse forest inventory data perform compared to more detailed proprietary stand-based inventory data?
Furthermore, given the differences in the data reliability and prioritization results, we discuss the planning circumstances under which open but coarse inventory data are sufficient for informative conservation decision-making.
We limit our attention to the effects that different data sources have on the quality of spatial prioritization, and we acknowledge that the computational analysis described here is just one part of a full conservation planning process [20,70]. While the results will be case-specific to a certain degree, the procedure itself is applicable to other countries that have similar forest inventory data available. The results should be applicable to countries with a similar forest management history and current forest and conservation management needs.
To encourage other scientists and practitioners to build upon the work presented here, we also make available the full analysis implementation (see S1 Appendix) and the code necessary to produce the results from the prioritization analyses. While the proprietary data we have used cannot be shared because of privacy issues, we have made all the stages of the implementation openly available for examination and re-use.

Study area
The study area covers the region of Southern Savonia located in the southeastern part of Finland. South Savonia is one of 13 regional administrative units of the Finnish Forest Centre (FFC). The region is ca. 13,990 km² and characterized by a large number of lakes and fragmented waterways, which cover ca. 25% of the total area. Of the land area, approximately 88% is forestry land that can further be divided into mineral soils (79%) and mires (21%). Together, forests on mineral soils and mires form a gradually varying landscape mosaic in the study region. The south boreal vegetation zone covers the whole region and forests are mostly dominated by the Scots pine (Pinus sylvestris) and the Norway spruce (Picea abies), mixed with varying amounts of broadleaved trees. Land ownership is highly fragmented, with private forest owners being the largest group (77.3%) followed by private companies (11.5%) and the state (6.2%) [71]. Most of the forestry land is under silvicultural management and only 2.5% is strictly protected, which is the same as the average for forestry land in southern Finland. Whereas private forest land has several operators working on it (including the FFC), the state-owned land is managed by a single organization, Metsähallitus (the Finnish Forest and Park Service), which is further divided into two independent departments: the Forestry Department manages the Finnish state production forests and the Natural Heritage Services (NHS) manages forests outside of commercial operations, including protected areas. Table 1 lists the data sets used in the analysis.
To address the main objectives, we 1) acquired coarse and detailed forestry inventory data from Southern Savonia, 2) calculated comparable surrogate indices of conservation value out of these data, 3) carried out six different conservation prioritizations using three different input data sets and testing the influence of connectivity transformations, and 4) compared all prioritization results to each other and to areas with known high conservation value. We included both forests on mineral soils and mires in the analyses, because ecologically the forests and mires are often linked by partly overlapping species pools and connectivity. However, the surrogate indices reflect only the conservation value of forests on mineral soils and thus the results are uninformative for mires.

Data sets
Coarse data. The coarse data used in this study were based on the multi-source national forest inventory (MS-NFI) developed and maintained by the Finnish Forest Research Institute (FRI). The MS-NFI method employs satellite images, digital maps and field measurements to produce thematic digital maps of structural features of the forest across Finland at a spatial resolution of 20 m. MS-NFI data collection covers all land-use classes and ownership categories throughout the country [62,72,73] including the study area (Table 1 and Fig 2). The final data product contains over 40 forest variables in the form of thematic maps, including the volumes by tree species and timber assortments, stand mean variables, the biomass by tree species groups and tree compartments and forest site type characteristics (e.g. [62,73,74]). In Finland, the MS-NFI is being used mostly for regional level forestry planning, but it has also been used for large-scale conservation prioritization studies [13,74,75]. The MS-NFI data has been publicly available since late 2012; the thematic maps can be viewed through a web portal and the rasters can be downloaded through a file service [76].
The conservation value indices used for the prioritization require that information on both the average diameter and the volume are available for each tree species group. The standard MS-NFI rasters include only one estimate for average diameter over all tree species groups. In order to calculate estimates of average diameter for each tree species group, stand level variables were derived from the MS-NFI by way of automatic stand delineation. We did the stand delineation in the study area by using automatic segmentation (see also S1 Appendix) on the MS-NFI forest maps of the 10th iteration of the National Forest Inventory done between years 2004 and 2008. As the input data for the segmentation, we used the thematic map layers on stand mean height and volumes of the tree species groups: pine, spruce, birch and other broadleaved trees. The segmentation was carried out using a modified implementation of the "segmentation with directed trees" algorithm by Nagendra & Goldberg [77]. The algorithm is based on using the local edge gradient for linking individual pixels into larger spatially continuous units, i.e. segments. The automatic segmentation process is guided by parameters such as heterogeneity allowed within the segments and the desired minimum size of the segments [78]. The desired size of segments was approx. 1-2 ha. We calculated the stand level variables as average values of the individual pixels within each segment and the variables per tree species by weighting the pixel level variables by the volumes of individual tree species.

Fig 1. The figure is divided into three sections. The first describes how we combined the index rasters to produce the input data sets used in the prioritization analyses. The second summarizes the main characteristics of spatial conservation prioritization analyses done using Zonation. The third section summarizes how we analyzed the priority rank maps produced by the six different Zonation analyses. PA = protected areas, WKH = woodland key-habitats, METSO = protected areas from the METSO

Detailed data. Here, detailed data refers to data for stands or forestry compartments. The data are produced by a combination of direct field inventories and a representative plot-based sampling system. Nowadays, inventory data are also updated using remote sensing data (LiDAR). These data are collated to provide very fine scale information for forest management planning [72] by different authorities and forestry organizations depending on land tenure. For this study, we used data from two authorities operating in the study region: the FFC on private land and Metsähallitus NHS on public land.
The FFC inventories forest stands only on a need basis or when forestry operations take place. Therefore, some of the inventory data can be relatively old and may not represent the current state of the forest very well. To account for this, we only used data gathered in the year 2000 or later, covering ~44% of the land area (Table 1 and Fig 2). In addition, we used spatial data on the planned forestry operations, such as thinnings and clear-cuts. We used these data to discount the conservation value of forest areas that are planned to go through forest operations of varying degree. The forest inventory data gathered and managed by the FFC are not freely available as the Finnish Personal Data Act restricts the distribution of the data at a resolution that allows linkage of the data to properties of individual forest owners. It is possible, however, to get access to the data for research purposes [79].
Metsähallitus NHS has a similar inventory system in place on public land. Their information system updates the database annually to simulate the growth of forests, and consequently no filtering was needed. We received detailed stand-based data from NHS after signing a research collaboration agreement. We were unable to get any data from regions governed by Metsähallitus Forestry. Detailed data from Metsähallitus NHS cover ~2.4% of the land area in Southern Savonia.
Data for validation. We used three different data sets for validating the prioritization results: spatial delineations of 1) the established protected area network, 2) woodland key-habitats, and 3) recently acquired protected areas (Table 1).
Metsähallitus NHS maintains the data on established protected areas and a spatial database is publicly available. Protected areas also cover mires, but for validation, we used only protected areas on mineral soils (~1.9% of the whole landscape). Protected areas have often been established for reasons other than their high biodiversity value and this could be the case in Southern Savonia as well. However, many of the protected areas in the region (including two national parks) contain relatively old forests that have been outside commercial forestry management for decades and thus probably contain features important for forest biodiversity.
Woodland key-habitats (WKH) are a conservation instrument designed for maintaining landscape-level biodiversity in production forests by delineating and preserving small habitat patches of high conservation value [80]. The concept is in use in many Fennoscandian and Baltic countries and, while their effectiveness as a conservation measure varies depending on the country and definition [80][81][82], WKHs seem to be hotspots for dead wood dependent and red-listed species, and for species richness in general [80]. Because of potential privacy issues, the exact spatial locations of WKHs are not public information, but the data are available for research use.
Recently acquired protected areas are related to the forest biodiversity conservation programme METSO, an ongoing effort to halt the decline of forest biodiversity by 2016 [69]. Individual forest owners can offer their forest property for protection, and if the particular offer fulfills given scientific selection criteria, it is admitted into the programme through either a permanent or temporary (10 years) conservation contract. The forest owner then receives a tax-free compensation based on the economic value of the growing stock and timber [83]. The sites selected in METSO are ecologically more valuable than average Finnish forests, containing more dead wood as well as many red-listed species [84]. We used only areas with permanent conservation contracts for validation, as the conservation effectiveness of temporary or fixed-term contracts is questionable [84]. These data are not publicly available before they become integrated into the main protected areas database, but they can be accessed for research use.
The mean size of distinct spatial units in the protected area data set is larger (13.54 ha, Table 1) than in the WKH data set (0.61 ha) or the recently acquired protected areas data set (5.36 ha).

Calculating conservation value indices from original data
We reclassified the original forestry data (both coarse and detailed) into four tree species groups: pine, spruce, birch, or other broadleaved. We calculated an index of conservation value per pixel for each of the tree species groups in each of the input data sets separately (Fig 1). This index measures an expert-derived view on how the average diameter and the volume of the growing stock relate to ecological features desirable for conservation. We transformed the average diameter of the growing stock per tree species group by a sigmoidal benefit function (see S1 Appendix and S1 Fig) and then multiplied the transformed value by the volume of the growing stock. A similar approach has been used earlier in large-scale conservation prioritizations [13,74] and in species-oriented prioritization [75].
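As a rough illustration of the index described above, the following Python sketch transforms the average diameter with a sigmoidal curve and multiplies the result by the growing-stock volume. The logistic form, the parameter names `midpoint_cm` and `steepness`, their values, and the volume unit are assumptions made here for illustration only; the benefit functions actually used are specified in S1 Appendix and S1 Fig.

```python
import math


def sigmoid_benefit(diameter_cm, midpoint_cm=25.0, steepness=0.3):
    """Map average diameter to a 0-1 benefit using a logistic curve.

    midpoint_cm and steepness are hypothetical placeholders, not the
    parameters given in S1 Appendix.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (diameter_cm - midpoint_cm)))


def conservation_index(diameter_cm, volume, **params):
    """Transformed diameter multiplied by growing-stock volume.

    The volume unit (e.g. m3 per ha) is an assumption of this sketch.
    """
    return sigmoid_benefit(diameter_cm, **params) * volume


# One pixel and one tree species group: with equal volume, a stand with
# larger average diameter receives a higher index value.
young = conservation_index(diameter_cm=10.0, volume=80.0)
mature = conservation_index(diameter_cm=35.0, volume=80.0)
```

In this form the index stays between zero and the volume itself, so a high volume of small-diameter trees is discounted rather than ignored.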
All available data sets had information on site fertility class, which is often also associated with the formation of specific forest microhabitats. We hypothesized that using a classification scheme would emphasize rarer forest types that typically have higher biodiversity value. Therefore we created two input data sets based on the coarse data: one with just the four index rasters ("Coarse"), and one with the four index rasters each divided into five site fertility classes ("Coarse with classes", Figs 1 and 3). The division into site fertility classes produces 20 tree species group-site fertility type classes, which we refer to as forest types. We did the same classification operation to the detailed data, producing the input data set "Detailed with classes". We also hypothesized that prioritization based on the more detailed (and more precise) inventory data would outperform those based on the coarser data. Note that the input data set "Detailed with classes" is actually a combination of coarse and detailed data, as the detailed data only covered ~46% of the landscape (Table 1). Both "Coarse" and "Coarse with classes" are completely based on MS-NFI and thus publicly available data.
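The division of an index raster into site fertility classes can be sketched as follows. This is a minimal illustration on nested lists, not the scripts used in the study; the function name `split_by_fertility` and the class codes 1 to 5 are assumptions.

```python
def split_by_fertility(index_raster, fertility_raster, classes=(1, 2, 3, 4, 5)):
    """Return one layer per fertility class.

    Cells that do not belong to a class get the value 0.0 in that
    class's layer. The class codes 1-5 are hypothetical.
    """
    layers = {}
    for c in classes:
        layers[c] = [
            [value if fert == c else 0.0 for value, fert in zip(vrow, frow)]
            for vrow, frow in zip(index_raster, fertility_raster)
        ]
    return layers


# Toy 2 x 3 index raster for one tree species group and its fertility classes
pine_index = [[4.0, 0.0, 2.5], [1.0, 3.0, 0.0]]
fertility = [[1, 2, 2], [5, 1, 3]]
pine_layers = split_by_fertility(pine_index, fertility)
```

Applying the same split to each of the four tree species groups would yield the 20 forest type layers; each original index value ends up in exactly one of them.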
We converted all detailed vector data to rasters of the same resolution (20 m) and extent as the MS-NFI. For computational reasons, we aggregated the data to 60 x 60 m pixel size using ArcGIS (version 10.2.1) [85]. We wanted to retain as high a resolution as possible because conservation prioritization analyses should be carried out at a spatial scale that is informative about ecological components (e.g. connectivity and average size of habitat patches) and relevant at the scale of operative planning [13]. For calculating conservation value indices at this resolution, we used custom-made geospatial scripts based on Python [86] (version 2.7.2) bindings to GDAL [87] (version 1.10.1).
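The text does not state which statistic was used when aggregating from 20 m to 60 m pixels. Purely as an illustration, the sketch below assumes a mean over non-overlapping 3 x 3 blocks; the function name `aggregate_mean` is hypothetical, and the actual aggregation was done in ArcGIS.

```python
def aggregate_mean(raster, factor=3):
    """Average non-overlapping factor x factor blocks of a 2D list.

    Assumes (for simplicity) that both raster dimensions are exact
    multiples of `factor`; the choice of the mean is an assumption.
    """
    n_rows, n_cols = len(raster), len(raster[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("raster dimensions must be multiples of factor")
    out = []
    for r in range(0, n_rows, factor):
        row = []
        for c in range(0, n_cols, factor):
            block = [
                raster[r + i][c + j]
                for i in range(factor)
                for j in range(factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# A 3 x 6 raster of 20 m pixels becomes a 1 x 2 raster of 60 m pixels
fine = [[1.0] * 3 + [4.0] * 3 for _ in range(3)]
coarse = aggregate_mean(fine)
```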

Prioritizing locations for conservation
For the spatial prioritization, we used Zonation [21,88] (version 4.0 [89]). It is a complementarity-based software that operates on a set of input rasters describing the occurrence levels of biodiversity features across the landscape; in our case the features were the index rasters of forest conservation value. Initially Zonation considers the values of all input rasters in each pixel in the landscape. It then proceeds by iteratively removing the least valuable cells, simultaneously accounting for the occurrence of features in cells, the remaining occurrence of each feature across the landscape, and connectivity. At each iteration, the features are normalized by their remaining range-size, meaning that as a feature becomes rarer during the cell-removal process, its relative significance increases. The repeated range-size normalization [90] leads to maintenance of a balance between all features at all iterations. In the end, Zonation has heuristically ranked the whole landscape according to its conservation priority. If used for designing a protected area network, Zonation can account for complementarity and irreplaceability of high-priority sites. We encourage the reader to refer to a large body of existing literature for the conceptual background [88,90,91], the exact operational principles [20,89], and the different applications of Zonation (e.g. [74,75,92,93]).
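The iterative removal with range-size normalization can be illustrated with a deliberately simplified toy. The sketch below is not Zonation's implementation: it scores each cell as a weighted sum of its share of each feature's remaining distribution, ignores connectivity and edge-removal rules, and uses the hypothetical function name `rank_cells`.

```python
def rank_cells(cells, weights):
    """Toy iterative cell removal with range-size normalization.

    cells   : per-cell feature values, e.g. [[f0, f1], ...]
    weights : one weight per feature
    Returns (removal_order, ranks); ranks run from 0 (removed first)
    to 1 (removed last).
    """
    remaining = set(range(len(cells)))
    n_features = len(weights)
    removal_order = []
    while remaining:
        # Amount of each feature still present in the remaining landscape
        totals = [sum(cells[i][j] for i in remaining) for j in range(n_features)]

        def value(i):
            # Weighted sum of the cell's share of each remaining feature total
            return sum(
                weights[j] * cells[i][j] / totals[j]
                for j in range(n_features)
                if totals[j] > 0
            )

        # Remove the currently least valuable cell (ties go to the lowest index)
        worst = min(sorted(remaining), key=value)
        remaining.remove(worst)
        removal_order.append(worst)

    denom = max(len(cells) - 1, 1)
    ranks = [0.0] * len(cells)
    for position, cell in enumerate(removal_order):
        ranks[cell] = position / denom
    return removal_order, ranks


# Cell 2 is the only cell holding the second feature, so it is retained
# longer than cell 1 even though its raw value (0.2) is lower (0.5).
toy_cells = [[1.0, 0.0], [0.5, 0.0], [0.0, 0.2], [0.1, 0.0]]
toy_weights = [1.0, 1.0]
removal_order, ranks = rank_cells(toy_cells, toy_weights)
```

The toy shows why rare features gain importance during removal: the denominator `totals[j]` shrinks as cells holding feature `j` are removed, which raises the score of the cells that still contain it.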
Following best practices for constructing Zonation runs [20], we started from the simplest possible configurations, enabling more complex features one at a time. This way, it is possible to test for the exact effects each component of the analysis introduces and the sensitivity of the results to different parameter values. After testing with several different combinations, we set up two runs for each input data set ("Coarse", "Coarse with classes", and "Detailed with classes"): one with and one without connectivity. Thus, we completed six different analysis runs in total ("coarse", "coarseCon", "coarseClass", "coarseClassCon", "detailClass", "detailClassCon" in Fig 1).
All six runs shared certain Zonation configuration options. We used the additive benefit function mode in Zonation [94], because it is appropriate when dealing with habitat data that acts as surrogate for biodiversity at large [74]. It is possible to weight input biodiversity features differently if there is a reason to do so. For example, the threat level of a species can be used for weighting, but any scheme reflecting the (often subjective) valuation of features can be used [20]. For the current study, we based the weighting scheme (S1 Table) on expert opinion so that more weight was given to forests with deciduous tree species on fertile soil. These types of forests are also emphasized in the METSO programme [69].
Runs also had differences (Fig 1). The number of biodiversity features (the index rasters) varied from four ("coarse" and "coarseCon") to twenty ("coarseClass", "coarseClassCon", "detailClass" and "detailClassCon"). Runs based on the detailed data ("detailClass" and "detailClassCon") used additional information about planned forestry operations. Technically, we implemented this in Zonation using the data as a condition layer, where local quality (as measured by the values in the index rasters) was reduced at locations with forestry operations [89].
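The effect of a condition layer can be sketched as a cell-wise multiplication, assuming each planned operation maps to a retention multiplier between 0 and 1. The operation names and multiplier values below are hypothetical, and the function `apply_condition` only illustrates the idea rather than Zonation's internal handling.

```python
# Hypothetical retention multipliers per planned forestry operation
RETENTION = {"none": 1.0, "thinning": 0.5, "clear_cut": 0.0}


def apply_condition(index_raster, operation_raster, retention=RETENTION):
    """Reduce local index values according to the planned operation."""
    return [
        [value * retention[op] for value, op in zip(vrow, oprow)]
        for vrow, oprow in zip(index_raster, operation_raster)
    ]


index_values = [[2.0, 4.0, 6.0]]
operations = [["none", "thinning", "clear_cut"]]
discounted = apply_condition(index_values, operations)
```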
We gave each forestry operation a value between 0.0 (all conservation value lost) and 1.0 (all conservation value retained) reflecting the subjective view on the effect of that operation. Runs "coarse", "coarseClass", and "detailClass" do not include any connectivity transformations. Runs "coarseCon", "coarseClassCon", and "detailClassCon", on the other hand, account for connectivity between different forest types (S2 and S3 Tables). We used the matrix-connectivity feature of Zonation [74,89], in which partially similar forest types facilitate connectivity for each other. While the landscape is divided into different forest types, each forest type still contributes to the connectivity of every other forest type more than non-forest habitats do [89]. The spatial scale of the connectivity transformation effect in Zonation is controlled by a feature-specific parameter (α), which is derived from the scale of landscape use of each species or community occupying a habitat type [13,20,74]. We used α = 0.001, which corresponds to an average dispersal distance of 2.0 kilometers in a negative exponential dispersal kernel. See Lehtomäki et al. [74] and Sirkiä et al. [75] for further discussion and references about the distances chosen. We also tested the sensitivity of results, replicating the analysis with scales of 0.2 and 4.0 km, but these did not change the qualitative interpretation of results significantly. See Arponen et al. [13] for further discussion on the role of the spatial scale.
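The pairing of α = 0.001 with a 2.0 km distance is consistent with the relation α = 2 / d when distances are expressed in metres; the sketch below adopts that relation as an assumption. It also illustrates, in a simplified form, how similarity between forest types and a negative exponential kernel could be combined into a connectivity value. The similarity values, forest type names and function names are hypothetical (the values actually used are in S2 and S3 Tables), and this is not Zonation's implementation.

```python
import math


def alpha_from_distance(mean_distance_m):
    """alpha for the kernel exp(-alpha * d), assuming alpha = 2 / distance."""
    return 2.0 / mean_distance_m


def kernel_weight(distance_m, alpha):
    """Negative exponential weight for a cell at the given distance."""
    return math.exp(-alpha * distance_m)


def connectivity_value(target, cells, similarity, alpha):
    """Similarity- and distance-weighted sum of index values around a target.

    target     : (x_m, y_m, forest_type)
    cells      : list of (x_m, y_m, forest_type, index_value)
    similarity : dict mapping (type_a, type_b) to a 0-1 similarity
    """
    tx, ty, ttype = target
    total = 0.0
    for x, y, ftype, value in cells:
        distance = math.hypot(x - tx, y - ty)
        total += similarity[(ttype, ftype)] * kernel_weight(distance, alpha) * value
    return total


alpha = alpha_from_distance(2000.0)  # 2.0 km expressed in metres

# Hypothetical similarities between two forest types
similarity = {
    ("pine_rich", "pine_rich"): 1.0,
    ("pine_rich", "spruce_rich"): 0.5,
}
cells = [(0.0, 0.0, "pine_rich", 1.0), (1000.0, 0.0, "spruce_rich", 1.0)]
score = connectivity_value((0.0, 0.0, "pine_rich"), cells, similarity, alpha)
```

Under the same assumed relation, the sensitivity scales of 0.2 km and 4.0 km would correspond to α values of 0.01 and 0.0005.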

Comparison and validation of analysis runs
One of the main outputs of a Zonation analysis is a raster file, the so-called rank raster, which represents the ranking of the landscape in terms of conservation priority. Low values (minimum of 0) indicate low conservation priority and high values (maximum of 1) high conservation priority. We examined the spatial patterns of the resulting rank rasters at different spatial scales. We compared (i) runs based on the same input data set but different analysis settings (i.e. the effect of connectivity) and (ii) different input data sets analyzed using the same settings (i.e. the effect of the input data). We performed all comparisons using the standard Zonation outputs and data. We used the R statistical language (version 3.1.0) [95] and the zonator R package (version 0.3.10) [96].
Visual examination of the rank rasters gives an initial idea of how well the different runs (and hence the different input data sets) converge, especially in terms of high and low conservation priorities. We also compared the spatial overlap between analyses by calculating Jaccard coefficients (the intersection of two sets divided by the union of those sets) for different priority intervals. In other words, we divided each rank raster into 10 equal intervals and compared each interval of each rank raster to each interval of every other rank raster. This way we could compare, for example, the spatial overlap of the best 10% of the landscape in runs "coarse" and "coarseClass", but also the worst 10% in "coarse" with the best 10% in "coarseClass".
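The interval-wise Jaccard comparison can be sketched in a few lines of Python. This is a minimal illustration rather than the zonator implementation: rank rasters are represented as flat lists of cell ranks in the same cell order, and all function names are hypothetical.

```python
def interval_index(rank, n_intervals=10):
    """Map a rank in [0, 1] to an interval index 0..n_intervals-1."""
    return min(int(rank * n_intervals), n_intervals - 1)


def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_matrix(ranks_x, ranks_y, n_intervals=10):
    """Jaccard coefficient of every interval of x against every interval of y."""
    cells_x = [set() for _ in range(n_intervals)]
    cells_y = [set() for _ in range(n_intervals)]
    for cell, (rx, ry) in enumerate(zip(ranks_x, ranks_y)):
        cells_x[interval_index(rx, n_intervals)].add(cell)
        cells_y[interval_index(ry, n_intervals)].add(cell)
    return [[jaccard(cx, cy) for cy in cells_y] for cx in cells_x]


# Toy example with four cells (made-up ranks, not results from the study).
ranks_a = [0.05, 0.15, 0.95, 0.92]
ranks_b = [0.05, 0.95, 0.15, 0.97]
matrix = jaccard_matrix(ranks_a, ranks_b)
print(matrix[9][9])  # overlap of the best 10% in both rankings
print(matrix[1][9])  # second-lowest interval of a against the best 10% of b
```

Row i and column j of the matrix hold the overlap between interval i of the first ranking and interval j of the second, so the diagonal corresponds to comparisons of the same priority interval.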
We also examined how well different prioritization runs were able to identify forest regions with known high conservation value. We did this by examining the distribution of priorities within the areas covered by each of the validation data sets. Using Zonation, it is possible to load the rank order of a solution (i.e. the inverse removal order of the cells) while using different input features or Zonation options. This procedure allows the examination of how much performance is lost when the analysis criteria and evaluation criteria differ [20,89]. For each analysis run, Zonation produces information on how large a fraction of the distribution of each biodiversity feature can be represented by a particular fraction (e.g. top 10%) of the landscape. We evaluated how much these fractions, the so-called representation levels, differ when the input features are based on the detailed data ("detailClass") but the priority ranking is taken from analyses based on the coarser data ("coarse" and "coarseClass"). Assuming that the detailed data are also more accurate, we can then answer the question of how much feature representation we risk losing if relying only on the coarser data.
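The idea of evaluating one ranking against the features of another data set can be sketched as follows. This is a simplified illustration that ignores feature weights and connectivity; the function name and the toy values are hypothetical and do not come from the study.

```python
def representation_level(ranks, feature_values, top_fraction=0.10):
    """Share of a feature's total occurrence held by the top-ranked cells."""
    n_keep = max(1, round(top_fraction * len(ranks)))
    # Cell indices ordered from highest to lowest priority rank.
    order = sorted(range(len(ranks)), key=lambda i: ranks[i], reverse=True)
    total = sum(feature_values)
    if total == 0:
        return 0.0
    return sum(feature_values[i] for i in order[:n_keep]) / total


# Toy landscape of ten cells: one feature taken from the "detailed" data.
detailed_feature = [0, 0, 5, 0, 1, 0, 0, 4, 0, 0]
rank_detailed = [0.1, 0.2, 1.0, 0.3, 0.8, 0.4, 0.5, 0.9, 0.6, 0.7]
rank_coarse = [0.1, 0.2, 0.8, 0.3, 1.0, 0.4, 0.5, 0.9, 0.6, 0.7]

print(representation_level(rank_detailed, detailed_feature))  # ranking from detailed data
print(representation_level(rank_coarse, detailed_feature))    # ranking loaded from coarse data
```

The difference between the two printed values is the representation that would be lost in this toy case by relying on the coarse ranking.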

Spatial patterns in the rank rasters
Overall, the spatial patterns in the rank rasters were roughly consistent across runs not accounting for connectivity (Fig 4A, 4C and 4E). A major concentration of high-priority areas was identified in the southwestern corner of the study area. Classifying the coarse input data set according to the site fertility classification ("coarseClass", see Fig 1) had only a minor effect, distributing the high-priority areas more equally across the study area (Fig 4A and 4C). Zonation tries to retain a balanced representation of all features throughout the analysis, and therefore introducing more classes (i.e. features) will produce a more even distribution of high-priority areas unless the most valuable features are spatially aggregated. "detailClass", which is based on the more detailed data, produced a different priority pattern (Fig 4E). High-priority areas are distributed even more equally over the study area (marginal plots in Fig 4E). Concentrations of high-priority areas also partially shift towards the northeastern part of the study area. This shift is at least partly explained by the fact that the more detailed data give higher value to the two large national parks in the northeastern region.
Runs including connectivity between forest types ("coarseCon", "coarseClassCon", "detailClassCon") display priority patterns very similar to those of the runs without connectivity. Regions dense with high-priority areas (Fig 4B, 4D and 4F) receive higher overall priority because of the connectivity effect, which is evident from the marginal plots in Fig 4B, 4D and 4F.

Spatial overlap of priority intervals
Comparing priority intervals between solutions shows the effect of using the site fertility classification. Fig 5A displays an asymmetrical pattern of overlap between priority intervals in "coarse" and "coarseClass". Large areas in "coarseClass" receive slightly lower priorities than in "coarse", which is balanced by a small set of areas having significantly higher ranks in "coarseClass" than in "coarse". There is little overlap between high priorities in "coarse" and low priorities in "coarseClass" (upper-left part of panel 5A), whereas the reverse does not hold: there is some overlap between relatively high priorities in "coarseClass" and low priorities in "coarse" (lower-right part of panel 5A). The classification of the data causes these overlapping patterns in priority intervals. More specifically, some site fertility classes (most notably the herb-rich and xeric types) are rarer than others and consequently receive more emphasis in the Zonation analysis. This is because Zonation will give priority to features that are rare to begin with.
The best and worst 10% of the priorities have the largest spatial overlaps in all comparisons. Since data classification is the only difference between "coarse" and "coarseClass", their overall similarity is greater, which also explains the higher overlap of the best and worst 10% of priorities (Fig 5A). The overlap is smaller for the best and worst 10% of priorities between "coarse"/"detailClass" and "coarse"/"coarseClass", but those overlaps are still higher than for the rest of the priority intervals. In other words, the best and the worst areas are more similar between all the analyses even if the underlying input data sets are different. Fig 5D-5F show the spatial overlaps between runs that account for connectivity. The patterns are similar to those in the runs not accounting for connectivity (Fig 5A-5C), except that they are smoother and more aggregated in all comparisons (Fig 5D-5F). Comparisons between runs based on the same input data set with and without accounting for connectivity (Fig 5G-5I) show a strong overlap between the same priority intervals in different runs. The overlap tends to increase when moving towards the highest or lowest priority areas of the study area. This reaffirms that connectivity as defined in this study has an effect only on a local scale.

Comparison to spatial validation data
Protected areas have relatively high median priorities in all runs (Fig 6). "detailClass" and "detailClassCon" have the highest median priorities within the protected areas (~0.85 and ~0.90), followed by "coarse" and "coarseCon" (~0.71 and ~0.69). Woodland key-habitats (WKHs) also have quite high median priorities in solutions "detailClass" and "detailClassCon" (both ~0.69), but the distribution of priorities is not as skewed as with protected areas. For "coarseClass" and "coarseClassCon", WKHs have a median priority of 0.48, and the median values are even lower for "coarse" and "coarseCon" (~0.42 and ~0.41).
Locations admitted to the METSO programme receive the highest median priority values in "detailClass" and "detailClassCon" (both ~0.82). "coarse" and "coarseCon" have median priority values similar to those of protected areas (~0.72 and ~0.70), as do "coarseClass" and "coarseClassCon" (~0.68 and ~0.65). In all cases, the difference between runs with and without connectivity is small, except in the case of protected areas. Overall, solutions "detailClass" and "detailClassCon" perform better than the others, potentially indicating higher accuracy of the more detailed data and demonstrating the utility of using detailed data from on-the-ground forest inventories.
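The median priorities reported in this section summarize the rank values of the cells that fall inside each validation data set. A minimal Python sketch of that summary is given below; the flat-list representation of the rasters, the function name, and the toy values are assumptions made for illustration only.

```python
from statistics import median


def median_priority(ranks, in_validation_area):
    """Median rank of the cells flagged as belonging to a validation data set."""
    inside = [r for r, flag in zip(ranks, in_validation_area) if flag]
    if not inside:
        raise ValueError("no cells fall inside the validation area")
    return median(inside)


# Toy example: five cells, three of which lie inside a validation site.
ranks = [0.1, 0.9, 0.8, 0.3, 0.7]
mask = [False, True, True, False, True]
print(median_priority(ranks, mask))
```

A median well above 0.5 indicates that the validation sites tend to receive higher than average priority in the run being evaluated.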

Feature representation
Loading the priority rank order from the runs based on coarse input data ("coarse" and "coarseClass") revealed differences in performance. Fig 7 shows the overall performance, i.e. how much of the initial representation levels from the detailed data can be covered by protecting a given fraction of the landscape. Fig 7A shows that, on average, priority rankings "coarse" and "coarseClass" perform much worse than "detailClass". For example, protecting the best 10% of the landscape using the ranking from "detailClass" would cover on average approximately 54% of the original distributions of all features from the detailed input data set. In comparison, solutions "coarse" and "coarseClass" would cover on average only ~15% and ~16% of the features in the detailed data, respectively (Fig 7A). This difference is even more pronounced when examining the solutions that use additional site fertility classes. For example, the best 10% of the landscape under "detailClass" covers ~93% of features in herb-rich sites, whereas solutions "coarse" and "coarseClass" only achieve a coverage of ~15% and ~14%, respectively (Fig 7B). For all site fertility classes except mesic, the performance of "detailClass" is superior to that of "coarse" and "coarseClass". The performance levels of runs that account for connectivity ("coarseCon", "coarseClassCon", "detailClassCon") are omitted here, because they are very similar to those of "coarse", "coarseClass", and "detailClass".

Discussion
Can forest inventory data be used to identify valuable areas for conservation?
Our results demonstrate that 1) inventory data collected primarily for operational forest planning is informative for spatial conservation prioritization, and 2) openly available remote-sensing based data performs reasonably well for large mature forest areas, but fails to detect valuable sites of smaller size. Therefore, if the spatial prioritization includes objectives for detecting small-scale occurrences of biodiversity features such as the WKHs, more detailed input data are needed.
On the scale of the whole study area, priority patterns between runs based on the coarse and detailed data are relatively similar, but have at least three key differences. First, analyses based on the coarser data give higher priority to a large area in the southwestern part of the province. This is because the MS-NFI data has high estimated values for birch and other deciduous trees in the region, which also has a high incidence of fertile soils. Deciduous trees and fertile site types are less common than other tree species and site fertility types. They furthermore have higher weights assigned in the Zonation analysis due to relatively high associated biodiversity values (see also S1 Appendix). Second, analyses based on the more detailed data give existing large protected areas considerably higher priorities. This is most probably because, compared to the coarse data, the detailed data available from within protected areas describes the mature stands within the PAs more accurately. Third, since the detailed data also have information on the occurrence of small but valuable forest patches (e.g. herb-rich sites or mature deciduous trees) that are not correctly represented in the coarse data, the high-priority sites are more evenly distributed over the whole study area (see the marginal plots in Fig 4).
Of the three validation data sets, WKHs have the smallest average size per site and the most fine-grained structural features important for biodiversity. The coarse data is simply unable to pick up such features. This is not surprising as the coarse data we are using (MS-NFI) is known to have low statistical precision for small area estimates [63,97]. Of course, when available, information about WKHs can be included in the prioritization process itself. We did not do so here, because that would have excluded the use of WKH data as an external validation source.
Extent and resolution are important factors in analyses that account for connectivity. The small effect connectivity has on the priority rank distributions within the validation data sets may appear surprising, especially since the effect of connectivity is quite pronounced over larger areas (Fig 4). However, even when combined, the validation data sets cover only a small fraction of the total landscape (2.5%, Table 1), and the mean decay distance for dispersal we used (2 km) is relatively large compared to the average size of sites in the validation data. For these reasons, accounting for connectivity actually decreases the median priority for all validation data sets except the protected areas, which are larger and thus by definition better connected internally.
Fig 7. The performance of solutions based on coarser data measured by their ability to cover features in the detailed data. The performance curves show for each site fertility class the mean occurrence levels of biodiversity features in the detailed data. The solid curves are for "detailClass", which uses detailed data. The dotted ("coarse") and dashed ("coarseClass") curves represent coarse data solutions, and show how much representation of the detailed (and presumably more accurate) data would be lost if the prioritization was based on coarser data. The same comparison between "coarseCon", "coarseClassCon", and "detailClassCon" produced very similar results (not shown).
The validation procedure we have used relies on a few key assumptions. First, we assume that the indices we have constructed truly reflect conservation value. While we have not validated the indices against actual species occurrence data, the features we have emphasized in the construction of the indices are important for biodiversity in the Finnish boreal forest (see e.g. [74,75]). Second, we assume that the validation data sets actually describe locations of high conservation value, and that they should therefore receive higher than average priority in spatial prioritization analyses. Protected areas have traditionally been established on less productive soils [59,98] and they usually do not represent the full spectrum of species or habitats in any given region. However, being set aside from the prevailing forest management regimes will over time lead to a less even forest structure [99], thereby accumulating important resources such as dead-wood [100]. METSO-sites are on average smaller than many of the existing protected areas, but because of the stringent selection criteria and on-ground evaluation of each site, their ecological quality is high. Studies have shown that they do indeed have higher species richness and contain more rare and threatened species than their surrounding areas [84]. WKHs are scattered more evenly over the landscape and, according to a recent meta-analysis [80], they contain elevated amounts of critical resources (dead-wood, etc.) that support a comparatively large number of species. However, the average size of a WKH site is small (0.67 ha in Finland [101]), meaning that their capability to support populations long-term is questionable.

Trade-offs between different data and prioritization objectives
Conservation scientists, managers, and practitioners are often faced with tight schedules and limited budgets, and thus have to decide whether it is worth the time and money to try to collect more data [102,103]. Collecting more data often includes spending time and money on trying to gain access to more detailed data that may not be openly available. Conservation prioritization based on incomplete data runs the risk of commission and omission errors, that is, selecting sites that are not valuable in reality or missing sites that are [36]. According to our results, the analyses based on coarse and detailed data produce spatial priority patterns that are broadly similar but differ at a more local scale (Fig 5). The highest and lowest priorities overlap slightly more than the middle range. Importantly, however, the high priorities of any one analysis overlap little with the low priorities of any other run. If they did, using coarse data as the basis for prioritization would produce wildly different and often incorrect results.
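Commission and omission errors can be made concrete with a small Python sketch. It assumes that a set of cells known to be valuable is available as a reference (a hypothetical stand-in for, e.g., validation sites) and compares it with the top fraction of a ranking; the function names and toy values are illustrative only.

```python
def top_cells(ranks, top_fraction=0.10):
    """Indices of the cells in the top fraction of a priority ranking."""
    n_keep = max(1, round(top_fraction * len(ranks)))
    order = sorted(range(len(ranks)), key=lambda i: ranks[i], reverse=True)
    return set(order[:n_keep])


def commission_omission(selected, valuable):
    """Return (commission, omission) rates for a selection against a reference.

    commission: share of selected cells that are not valuable
    omission:   share of valuable cells that were not selected
    """
    commission = len(selected - valuable) / len(selected)
    omission = len(valuable - selected) / len(valuable)
    return commission, omission


# Toy example: five cells, top 40% selected, three cells assumed to be valuable.
ranks = [0.9, 0.8, 0.1, 0.2, 0.7]
selected = top_cells(ranks, top_fraction=0.4)
valuable = {1, 3, 4}
print(commission_omission(selected, valuable))
```

Both rates are zero only when the selected cells and the reference cells coincide exactly.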
While coarse data is able to describe broad priority patterns correctly, we found that the less abundant biodiversity features such as herb-rich and xeric forest types are not identified well (Fig 7). For example, if we are interested in the top 10% of the landscape, the prioritization based on coarse data with classes captures only half of the representation of biodiversity features that can be achieved when using detailed data. Even if the high-priority areas have a large spatial overlap, using the coarser data misses many of the occurrences of herb-rich sites and woodland key habitats.
The differences between the analyses based on the coarse input data with and without classes are particularly interesting, as it is temptingly practical to improve existing data with a simple classification scheme. The inclusion of the classification does slightly improve the performance for rarer classes (Fig 7), so, everything else being equal, an ecologically justified classification of the data can improve the results.
Including connectivity in the analysis raises the priority of regions that have high-quality sites at high densities, thus identifying regions where metapopulations might be able to persist. This is particularly important for many threatened forest species that suffer from habitat loss and fragmentation [104-106]. However, emphasizing connectivity comes at the expense of individual high-quality sites that are relatively isolated [13,26]. Increasing the priority of medium-quality and well-connected forests will lower the priority of other locally similar sites and possibly even poorly connected high-quality sites (Fig 4). Including connectivity will also emphasize large, overall high-quality areas such as protected areas (Fig 6).

Opening up forest inventory data is an opportunity for integrated forest and conservation planning in the boreal zone
Open forest inventory data has a major role in conservation planning and decision-making in the boreal region. It enables equal and inclusive access to the best available data [107], it makes the supporting scientific analysis more transparent, and it enhances the repeatability of the whole conservation planning process [40]. Repeatability is especially important for applied research supporting decision-making, because underlying objectives may change, old data are updated, and new information can accumulate rapidly. Transparency and repeatability are also important for the process of translating regional plans into local conservation action: whereas regional plans incorporate important factors such as connectivity and the representativeness of the protected area network as a whole, local action can be understood as individual management actions that are unfortunately sometimes poorly linked to regional planning [93]. Plugging into regional and local forest planning through the use of forest inventory data presents new opportunities for conservation prioritization, especially in countries of the boreal zone that already have sophisticated forest planning and inventory systems in place.
In summary, we have shown that coarse, NFI-derived data works reasonably well in the identification of broad spatial conservation priorities, but we also found that more detailed inventory data is needed to capture structural attributes at the local scale. While it is encouraging to see that inventory data is becoming more openly available, conservation research and decision-making would benefit from more open data policies, especially in government organizations. The approach we have taken in this work builds upon previously published work [13,74,75] and methodology [20]. Here we make all analysis implementations (see S1 Appendix) and data (where possible, see 2.3) available to enable others to adapt the approach for their own uses. The approach described here is being used in the implementation of the Finnish national forest conservation programme, and we continue our efforts to improve the approach.

Supporting Information
S1 Appendix. Supporting information on the data and data pre-processing.
S1 Table. Biodiversity feature weights used in the Zonation analyses. Features 1-4 are specific to runs "coarse" and "coarseCon", while features 5-24 are the forest types defined in Material & Methods. (DOCX)
S2 Table. Matrix connectivity multipliers used in Zonation runs "coarse" and "coarseCon". The numbers on the header row and column correspond to the feature IDs in S1 Table. Note that the matrix is asymmetrical, i.e. the direction of the connectivity effect matters. Columns represent the forest types causing the connectivity effect, rows the forest types receiving the connectivity effect. (DOCX)
S3 Table. Matrix connectivity multipliers used in Zonation runs "coarseClass", "coarseClassCon", "detailClass" and "detailClassCon". The numbers on the header row and column correspond to the feature IDs in S1 Table. Note that the matrix is asymmetrical, i.e. the direction of the connectivity effect matters. Columns represent the forest types causing the connectivity effect, rows the forest types receiving the connectivity effect. (DOCX)