Modeling the Distribution of Geodia Sponges and Sponge Grounds in the Northwest Atlantic

Deep-sea sponge grounds provide structurally complex habitat for fish and invertebrates and enhance local biodiversity. They are also vulnerable to bottom-contact fisheries and prime candidates for Vulnerable Marine Ecosystem designation and related conservation action. This study uses species distribution modeling, based on presence and absence observations of Geodia spp. and sponge grounds derived from research trawl catches, as well as spatially continuous data on the physical and biological ocean environment derived from satellite data and oceanographic models, to model the distribution of Geodia sponges and sponge grounds in the Northwest Atlantic. Most models produce excellent fits with validation data although fits are reduced when models are extrapolated to new areas, especially when oceanographic regimes differ between areas. Depth and minimum bottom salinity were important predictors in most models, and a Geodia spp. minimum bottom salinity tolerance threshold in the 34.3-34.8 psu range was hypothesized on the basis of model structure. The models indicated two currently unsampled regions within the study area, the deeper parts of Baffin Bay and the Newfoundland and Labrador slopes, where future sponge grounds are most likely to be found.


Rapport technique canadien des sciences halieutiques et aquatiques
Les rapports techniques contiennent des renseignements scientifiques et techniques qui constituent une contribution aux connaissances actuelles, mais qui ne sont pas normalement appropriés pour la publication dans un journal scientifique. Les rapports techniques sont destinés essentiellement à un public international et ils sont distribués à cet échelon. II n'y a aucune restriction quant au sujet; de fait, la série reflète la vaste gamme des intérêts et des politiques de Pêches et Océans Canada, c'est-à-dire les sciences halieutiques et aquatiques.
Species distribution models (SDM), trained using research trawl data and spatial information on depth, slope, chlorophyll-a concentration, shear, surface and bottom temperature, salinity and current speed, were used to make predictions of the distributions of Geodia spp. sponges and sponge grounds in the northwest Atlantic Ocean. We compared two model types often used for SDM, random forest (RF), a machine-learning method based on presence/absence data, and MaxEnt (ME), a machine-learning method based on presence-only data. This comparison included the fits obtained on independent data from the area each model was trained for as well as the fits obtained on independent data from a different area (extrapolation), the spatial predictions each model produced, and the ability of each model to produce sensible partial dependence plots and realistic measures of the importance of each predictor variable for model predictions. Of the two model types tested, RF generally out-performs ME when tested against independent validation data; the difference between the two model types is larger when models trained for the Flemish Cap are extrapolated to the full East Coast study area. Comparing the RF model with the kernel analyses previously employed, there is a general distribution match between the maps. However the RF model predictions have a more refined distribution as would be expected given its use of absence data in addition to the presence data. For management and conservation purposes, RF can be used to refine kernel models of the location of sponge grounds and to predict sponge ground areas where data are either sparse or non-existent.

INTRODUCTION
Sponges form an ancient group of sessile filter-feeders that occur as sparsely distributed solitary organisms or in dense single-or multi-species aggregations. They are characterized by a body plan built around a system of canals through which water is pumped, supplying food and oxygen and removing waste (Hooper and van Soest 2002).
Sponges are highly effective filter feeders and play a clear role in benthic-pelagic coupling (ICES 2009). They also enhance local nutrient and energy exchange (de Goeij and van Duyl 2007) and biodiversity (Klitgaard 1995;Hentschel et al. 2002, Beazley et al. 2013. They constitute an important component of benthic ecosystems, especially when they occur in dense aggregations known as sponge grounds or "ostur", which provide a structurally complex habitat for fish and invertebrates (Herrnkind et al. 1997;Amsler et al. 2009;Kenchington et al. 2013;Beazley et al. 2013). The foundation for these sponge grounds are often the large, erect, aggregating sponges. In the North Atlantic, these are species of Geodia, Stryphnus, Stelletta, Thenea, Tetilla and Polymastia (ICES 2009).
Due to the inaccessibility of their habitats relatively little is known about deep-sea sponges, and most research has focused on the anatomy and taxonomy of trawl-caught specimens (Hooper and van Soest 2002;Cárdenas et al. 2013), biology and associated fauna (Klitgaard 1995;van Soest et al. 2013), broad-scale geographic distribution (Klitgaard and Tendal 2004;Kenchington et al. 2010;Murillo et al. 2012), and the potential of their secondary metabolites for drug development (Hill 2004;Müller et al. 2004).
The threat posed by bottom-contact fisheries has sparked conservation concern, especially in light of deep-sea sponges being slow-growing and long-lived organisms vulnerable to disturbance (Klitgaard 1995, Hogg et al. 2010. The combination of their vulnerability and importance for deep-sea benthic ecosystems makes sponge grounds a prime candidate for Vulnerable Marine Ecosystem (VME) designation as outlined in United Nations General Assembly Resolution 61/105 (UNGA 2007), and hence subject to conservation action (Penney et al. 2009). Because deep-sea conservation is largely carried out through spatial fisheries management, this has generated a need to map the distributions of sponges and sponge grounds at the level of spatial detail needed for such management.

Sponge Grounds
The distributions of individual sponge species and genera can be described or predicted through presence/absence or presence-probability maps (Franklin 2009), as can the distribution of sponge grounds following a definition of what constitutes a sponge ground. Due to the general lack of data on sponge density and the spatial extent of sponge aggregations, published definitions are mostly qualitative in nature, including "a restricted area where large-sized sponges are strikingly common" (Klitgaard and Tendal 2004), and "aggregations of large sponges that develop under certain geological, hydrological and biological conditions to form structural habitat" (Hogg et al. 2010). Klitgaard et al. (1997) defined 'ostur' as areas where 90% of the wet weight of trawl catches is comprised of sponges, excluding fish. One quantitative approach to distinguishing the presence of sponge grounds from a background area with some sponge presence at lower density was developed by Kenchington et al. (2009), who used a kernel density approach to delineate areas covered by research trawl catches above a range of weight thresholds, and used an abrupt change in the area covered by catches above two neighbouring threshold values to indicate the difference between sponge grounds and lower density sponge presence for a local area. The application of this approach to individual trawl gear types and bioregions (DFO 2009) in the northwest Atlantic Ocean (Kenchington et al. 2010), is adopted in this study.

Mapping Sponge Grounds
Knowledge of the spatial distribution of deep-sea sponges and sponge grounds is currently limited to compilations of presence observations derived from a variety of sources, including research trawls (Fuller 2011;Murillo et al. 2012), the local knowledge of fishers (Klitgaard and Tendal 2004), or direct observations using SCUBA, remotely operated vehicles (ROVs) or crewed submersibles (Leys et al. 2004). Although such data compilations occasionally provide dense sampling coverage of an area (Kenchington et al. 2010), they are more often necessarily limited in both spatial extent and sampling density.
Sponge grounds in the northwest Atlantic are found along the continental slopes of Grand Bank and Flemish Cap and northward along the Labrador Slope (ICES 2009;Kenchington et al. 2009;Kenchington et al. 2010;Fuller 2011;Murillo et al. 2012). Murillo et al. (2012) described four areas with large aggregations of sponges in the high seas east of Newfoundland, Canada: the continental slopes of Grand Bank; the southeastern slope of the Flemish Cap; the eastern slope of the Flemish Cap; and the northern slope of the Flemish Cap and the Flemish Pass. There the Demosponges Geodia barretti, G. phlegraei, G. macandrewii (Geodiidae), Stryphnus ponderosus and Stelletta normani (Ancorinidae) are the main structure-forming sponges constituting more than 99 % of the total invertebrate biomass within the sponge grounds (Murillo et al. 2012).

Habitat Suitability Modeling
In the context of typically sparse point observations and a desire for complete and continuous maps of the distribution of a species or habitat type, species distribution modeling (SDM) is often used. We adopt the SDM term (Franklin 2009) for any model that predicts the presence, absence or abundance of a phenomenon (henceforth: the response), typically a species or habitat type, from environmental variables thought to influence it (henceforth: the predictors), when this model is applied to maps of the predictors to create spatially explicit and continuous predictions of the distribution of the response. SDM has been extensively used in both terrestrial and marine environments to make contemporary distribution maps, to predict species/habitat responses to climate change Lawler et al. 2009), to predict the future range of invasive species (Peterson and Robins 2003;Peterson 2003) and more. SDM, despite its utility in data sparse situations, has rarely been used in the deep-sea. Existing studies include several focusing on the cold-water coral Lophelia pertusa, whose local, regional or global distributions are modeled on the basis of statistical relationships with variables that quantify a range of aspects related to seafloor topography as well as physical and chemical oceanography (Dolan et al. 2008;Davies et al. 2008;Guinan et al. 2009;Tittensor et al. 2009;Davies and Guinotte 2011;Yesson et al. 2012). The distribution of Lophelia reefs, as opposed to the presence of individual corals, has also been modeled on the basis of topographic variables derived from multi-beam data (Ross and Howell 2012). However, despite the recognized importance of sponges and sponge grounds as deep-sea benthic habitats, their distributions are still only known from point observations and SDMs have not to our knowledge been applied to sponges in the deep-sea.

Objectives of Study
This study explores the use of SDM-based mapping of sponges and sponge grounds in the northwest Atlantic (NWA) Ocean. Specifically, 1) We estimate the fits achieved by SDM models on independent test data from the study area, and compare results before and after application of two different approaches to elimination of collinear predictor variables; 2) We compare two model types often used for SDM, random forest (RF), a machinelearning method based on presence/absence data, and MaxEnt (ME), a machine-learning method based on presence-only data. This comparison includes the fits obtained on independent data from the area each model was trained for as well as the fits obtained on independent data from a different area (extrapolation), the spatial predictions each model produces, and the ability of each model to produce sensible partial dependence plots and realistic measures of the importance of each predictor variable for model predictions; 3) Sponge grounds, as distinct from a low density background presence of sponges, are defined using a threshold based on the wet sponge weight caught per tow. We compare two approaches to thresholding, a single global threshold and a variable threshold defined by gear type and bioregion (Kenchington et al. 2010), to assess the success of each approach for SDM-based mapping of sponge grounds; 4) We compare the accuracy of SDM models used to map the presence of individual members of the Geodia genus (Geodia spp.) against models used to map the two most prevalent species (Geodia barretti and Geodia phlegraei); 5) We produce SDM-based maps of sponges and sponge grounds that constitute the best available information on their distribution in the study area.

STUDY AREA
Data for this study come from the Canadian continental shelf and slope in the NWA, from the area off Cape Breton in the south to southern Ellesmere Island in the north (henceforth East Coast, abbreviated EC). This area is bounded in the north by NAFO zone 0A and in the south by NAFO zones 4Vs and 3Ps. The area is further bounded at a distance of 20 km from the Canadian coast in the west, and by the 2500 m depth contour or the eastern boundary of NAFO zones 0A and 0B, whichever is shallower, in the east. The area does not extend into Lancaster Sound or the Hudson Strait beyond the limits of NAFO zone 0A and NAFO zones 0B/2G respectively. In addition a smaller area, densely covered by field data, was defined around the Flemish Cap and Grand Bank (henceforth Flemish Cap, abbreviated FC). This area is bounded by the Canadian EEZ in the west and by the 2500 m depth contour at the continental slope, and forms a subsection of the EC area. The extents of both areas are shown in Figure 1 with the NAFO divisions in the area.

DATA AND METHODS
The data used in this study form two major components, biological distribution data and environmental data. The biological data are derived from georeferenced samples collected from Canadian and European Union-Spanish (EU-Spain) research trawl surveys, via box-cores/dredges, or in situ camera/video. The environmental data are derived from satellite data and oceanographic model outputs.

Sponge Ground Locations
The data used to identify locations of sponge grounds were collected via research vessel trawl surveys by Fisheries and Oceans Canada (DFO) and the Spanish Institute of Oceanography (IEO) in collaboration with other European institutions during the period of 2007-2011. Research trawl catches were, as a minimum, separated and weighed by phylum, to yield a measure of the total wet weight of caught sponges and quantified as kg/standard tow. In order to separate sponge grounds from non-sponge ground locations, a threshold was applied to this wet weight and locations with catches above the threshold were classified as a sponge ground presence, while those with catches below the threshold were classified as a sponge ground absence. However, different gear and trawl durations were used by the different missions, leading to differences in the area covered by each research trawl (swept area) and differences in the proportion of the swept benthic fauna that was recovered and available for weighing on deck (catchability). In order to use the entire data set in our study, separate thresholds were developed for each gear type and biogeographic region using kernel density analyses (Kenchington et al. 2009;Kenchington et al. 2010), and applied to the respective data points; these thresholds are summarized in Table 1. A global sponge grounds presence/absence threshold of 70 kg sponge/tow was also used as a comparison to the local gear-specific thresholds.  (Cárdenas et al. 2013). However G. phlegraei and G. parva are very closely related and we could not distinguish them with confidence in our data. As G. parva was only recently resurrected as a valid species from a subspecies of G. phlegraei we refer to our G. parva specimens as G. phlegraei hereafter. Of these, only G. barretti and G. phlegraei were recorded in sufficient numbers to allow species-level distribution modeling. Geodia spp. absences were compiled from DFO and IEO fisheries survey data. These surveys do not identify sponges beyond phylum level, so only catches with no sponges of any kind were used as Geodia spp. absences ( Table 2). As noted by Kenchington et al. (2012), no sponges in a catch does not necessarily mean that there are no sponges on the sea floor, although any sponge presence is likely to be of low density. Data for the EC study area include those listed above for the FC study, as well as additional data collected by DFO and detailed in Kenchington et al. (2010). Data on Geodia spp. absences from the EC study area were added from all DFO survey locations where no sponge of any kind was caught (Table 3). Occasionally the DFO Arctic surveys identified the sponge catch to genus level; when Geodia spp. sponges were identified these are listed in Table 3 as "Geodia spp. presence", G. barretti and/or G. phlegraei may or may not have been present at these locations. Note that the data in Table 2 and Table 3 are treated as point observations, while in reality the research trawls covered an area as determined by the gear type and the direction and duration of the trawl. The geographic coordinate associated with each trawl was obtained at trawl start; for simplicity this location is used in the analysis to represent the entire trawl.
Based on the data outlined above we derived the presence/absence data sets listed in Table 4. It is clear from Table 4 that all data sets are unbalanced, with presences being relatively rare compared to absences. This may present problems for accurately mapping the rarer of the two values (Japkowicz and Stephen 2002;van Hulse and Khoshgoftaar 2009), in our case the presences. It is also clear that the two sponge species have a small number of presences, particularly for the FC study area, which may also be problematic in the model development (Stockwell and Peterson 2002). The distribution of these data points are illustrated in Figure 2 and Figure 3. A spatial bias towards dense sampling in the southern part of the EC study area is clear, with especially dense sampling (not clearly seen in the figure) in the FC study area.

Figure 2.
Spatial distribution of sponge ground presences as defined using local gearspecific thresholds (green circles) and absences (red crosses) as known from the field data.

Environmental Data Layers
While the point observations take the form of georeferenced presence/absence observations, the predictors in SDM take the form of spatial data sets (maps) that describe the spatially continuous distribution of factors thought to influence the spatial distribution of the response variable(s), at the appropriate spatial resolution (Franklin 2009). Given the limited knowledge of the habitat requirements and ecology of the sponge species in question, and the environmental conditions that may enable or limit the formation of sponge grounds, environmental data layers were selected on the basis of general notions of relevance and, crucially, availability. Data on a total of 50 predictors were compiled from different sources (Table 5), and transformed as necessary to geographic (lat/long) coordinates using the WGS 1984 datum and a 0.017° cell size; this cell size is approximately equal to a 1 km cell size in the southern part of the EC study area. The predictor data for the Flemish Cap area form a subset of the data for the East Coast area.

Depth and Slope
Bathymetric data were derived from the 30 arc-second General Bathymetric Chart of the Oceans (GEBCO) data (BODC 2009). Slope was derived in degrees from each depth layer using ArcGIS's Spatial Analyst tool. Depth is expected to act as a general proxy for unmeasured predictors including light availability and historical trawling intensity (negative correlation), in addition to being correlated with variables such as temperature and salinity ranges. Slope is expected to act primarily as a proxy for substrate type (e.g. rock, sand, silt, mud), which is thought to influence the ability of sponge larvae or fragments to settle in an area.

Chlorophyll-a Concentration
Data on surface chlorophyll-a concentration were derived from publicly available Level 3 SeaWiFS data for the period January 2001 -December 2010. These data are spatially composited to a 9 km cell size, and provided as monthly and annual mean values (NASA 2012). Annual minimum, maximum, mean and range statistics were calculated for each cell from the 10 annual data layers. Similarly, winter minimum, maximum, mean and range statistics were calculated from the December, January and February monthly data layers from each of the 10 years. This procedure was repeated to calculate seasonal statistics for spring, summer and fall. Kriging was performed using ArcGIS's Geostatistical Analyst to transform the 9 km cell size of the original data layers to the 0.017° cell size. Surface chlorophyll-a concentration is expected to be related to the export flux of particulate organic carbon (Lutz et al. 2007) and thus nutrient availability at the sea floor. Seasonal rather than annual measures of nutrient availability may be relevant due to the sponges' reproductive cycles, shown for Geodia barretti in the northeast Atlantic (Spetland et al. 2007).

Physical Ocean Variables
Data on shear stress at the seafloor, as well as temperature, salinity and current for the sea surface and seafloor were derived from the GLORYS2V1 ocean reanalysis at ¼ ° resolution. GLORYS2V1 provides 3-hourly estimates of these and other variables from 1993 to 2009 (Ferry et al. 2012). Minimum, maximum, mean and range statistics were calculated for each cell from these 3-hourly values. Kriging was subsequently performed using ArcGIS's Geostatistical Analyst to transform the ¼° cell size of the original data to the 0.017° cell size. Temperature and salinity can be used to identify water masses while all three may indicate physical tolerance limits of the sponges. This list of variables is by no means an exhaustive list of predictors that may influence the distribution of Geodia spp. sponges or sponge grounds in the study area, and other variables (e.g., sea ice, wind, chemistry) are available from online data bases (EU 2013;NODC 2013). SDM studies of Lophelia pertusa and their reefs include a wider range of topographic variables including aspect, curvature and a bathymetric position index (Guinan et al. 2009;Ross and Howell 2012), as well as data on alkalinity, aragonite saturation state, salinity and concentrations of dissolved inorganic carbon, dissolved oxygen, nitrate, phosphate and silicate (Dolan et al. 2008;Davies et al. 2008;Tittensor et al. 2009), all derived with different accuracies and at different spatial resolutions. It is important to note that not all these predictors proved important to predict Lophelia pertusa distributions, and it is at least conceivable that some variables would have gained or lost importance for prediction had they been quantified differently and/or measured at a different spatial resolution. Other potentially important predictors that we did not have data for include the substrate type and historical trawling intensity.

Model Types
Two machine learning techniques were used to model the distribution of each response variable: random forest (RF) (Breiman 2001) and MaxEnt (ME) . RF is a non-parametric ensemble classifier that predicts the value (here: probability of presence) of a single response variable from the values of multiple predictors. It is an effective modeling method in situations where the number of predictors is larger than the number of samples, and is widely used for distribution modeling (e.g., Lawler et al. 2009;). An RF model is composed of multiple regression trees (Breiman et al. 1984), where splits at each node in each tree are based on a random subset of the available predictors. For each response variable and study site, we trained a model using the presence/absence field data and the values of the predictors at the corresponding locations. RF model development was done in R (R Core Development Team 2012) using the "randomForest" package (Liaw and Wiener 2002). Default values were used for RF parameters. ME is a distribution modeling technique that determines the maximum entropy probability distribution given a set of presence locations and, instead of absence data, a set of randomly selected background locations, as well as the values of the predictors at those locations . ME has been shown to perform well in comparison to other methods (Elith et al. 2006). ME model development was done in R using the "dismo" package (Hijmans et al. 2011). Default values were used for ME parameters.

Variable Elimination
To reduce problems arising from a large number of collinear predictors (Graham 2003), two approaches were used to reduce the number of predictors in the models. The first approach was to remove predictors highly correlated with others (Legendre and Legendre 2012), and the second was to remove predictors that did not contribute to reduce a model's prediction error. Variable elimination was important in our study because of the high number of collinear variables that had initially been included (e.g., the various chlorophyll-a concentration data layers) and the desire to produce interpretable models from which ecological hypotheses could be generated.

Removal of Highly Correlated Variables
For each study area, the Pearson correlation coefficients between all predictors were calculated from a complete sample of all raster cells. The two predictors with the highest correlation were then considered, and one of them eliminated. This process was repeated until no remaining variables were highly correlated, here defined as R>0.7.
In most cases high correlations were found between obviously related variables (i.e., the annual range of chlorophyll-a concentrations and the annual maximum chlorophyll-a concentration). When the highly correlated variables represented two metrics (mean, maximum, minimum and range) for the same variable, we preferentially eliminated the mean, then the minimum or the maximum, and then the range. Elimination of the minimum or maximum metric when these two were correlated depended on the variable in question. This preferential elimination of some metrics over others was based on the reasoning that the range (max-min), while not as refined or specific as mean, maximum and minimum, represents the variability of conditions likely to be encountered by an organism, while the maximum and minimum represent tolerance limits. For different variables, regardless of metric, we preferentially eliminated annual chlorophyll-a concentrations over seasonal ones, and preferentially kept spring chlorophyll-a concentrations over those of other seasons, based on the seasonal reproduction of Geodia barretti in Scandinavia (Spetland et al. 2007). We preferentially kept variables describing conditions at the seafloor over those describing conditions at the sea surface. We acknowledge that our preferences are subjective, but given the limited data on the ecology of deep-sea sponges in general, and Geodia spp. in particular, no objective alternative was available. The predictors remaining after the elimination of highly correlated variables are shown in Table 6. For both the FC and EC study areas, 23 variables of the original 50 remained after highly correlated variables had been eliminated.

Removal of Variables That Did Not Contribute to Model Performance
For each study area, response variable, and model type (RF and ME), the area under the receiver operating characteristic curve (AUC) was calculated using 10-fold crossvalidation repeated 10 times for models developed using the 23 predictors (Table 6). The AUC value is a measure of model fit calculated from the combination of true-positive and false-positive rates. Its value ranges from 0 to 1 and equals the probability that the model will produce a higher presence-probability for a randomly chosen observed presence than for a randomly chosen observed absence (Fawcett 2006). AUC is a commonly used measure of model performance in SDM. Each individual variable was then temporarily removed from the model development and testing process, and AUC values were calculated again for the 23 resulting models, each model based on all but the removed variable. If the removal of any variable caused the AUC value to increase (indicating improved fit) or at least not decrease, the variable that, when temporarily removed, produced the highest AUC value, was eliminated. This process was repeated, starting with the 22 remaining variables, and so on until no variable could be removed without decreasing the resulting AUC value. Because the AUC value fluctuates between model runs as a consequence of the random splits used to define the cross-validation groups, AUC values were considered to "not decrease" if a reduction in AUC value was less than 0.005. The resulting variable combinations for each combination of study area, response variable and model type are shown in Table 7 and Table 8 for the FC and EC study areas, respectively.

Model Evaluation
Model Fit for Training Area RF and ME models were trained to predict each response variable separately for each study area, and AUC values were estimated for each combination of model type, response variable and study area using 10-fold cross-validation. Cross-validation attempts to address the problem of how a model will perform on a future data set by splitting the available data into a number of disjoint groups of equal size, train the model on a combination of all but one of them, and test the model performance by comparing predictions for the last group with their observed values. This is repeated using each group as the test group and calculating the mean of a measure of fit for each group, in our case using the AUC value. Because cross-validation estimates of model performance depend on the exact split of the full data set into disjoint groups, we repeated the calculations 10 times for each combination of model, response variable and study area, and calculated the mean of the results. We used the absence data to calculate AUC values for both the RF and ME models, making the two values directly comparable. Our ME AUC values thus differ from those that would have been calculated using the ME software GUI. Model predictions were also applied to the predictors to produce mapped predictions for each study area and response variable, and compared between the two model types.

Model Extrapolation
For each model type, response variable and set of predictors, AUC values were calculated for models trained for the FC study area and applied to the EC study area, and mapped predictions of presence probabilities were produced. These AUC values and maps were then compared with models both trained for and applied to the EC study areas, to assess the effect of extrapolating the models beyond the area they were trained for.

Partial Dependence Plots
Insight into the effect variation in each predictor has on predictions of the response variable can be gained from partial dependence plots, which can be created using two different but related approaches. Conceptually the simplest is to train a model using only a single environmental predictor, and then observe the predictions made by this model using the range of input values that exist for the predictor. We call this the single-variable approach. Another approach is to train a model using all the available predictors, and then observe the predictions made by this model when all predictors but one are kept at their mean value, while using the range of input values that exist for the predictor. We call this the multi-variable approach. Both these approaches are implemented in the ME GUI ; we have implemented them in R to allow their application to both the RF and ME models.

Variable Importance
Given a fitted model, we assess the relative importance for predictions of each predictor by permuting test set values of each individual variable. We then quantify the resulting reduction in model fit, calculated as the AUC value of the model when tested against unpermuted data minus the AUC value of the model when tested against permuted data (delta-AUC). The permutation is repeated 10 times for each of the 10 cross-validation partitions in each of the 10 repetitions, to obtain results that are independent of particular splits of the data set for cross-validation and particular variable permutations . The permutation approach to assessment of variable importance is also found in the RF implementation in R (Liaw and Wiener 2002); we have implemented it independently to allow its application to both the RF and ME models. Although both model types consider interactions between predictors, we did not specifically investigate such interactions through the partial dependence plots or the variable importance measure.

RESULTS AND DISCUSSION
Due to the combinations of two study areas, two model types, five response variables, and the two variable elimination approaches that yielded three different sets of predictors, the study has created a large set of results. The following section will provide all results when feasible and examples, focusing on Geodia spp. presence in the FC study area, when providing all results is not feasible. The complete set of results is available in the Appendix when not provided in this section.

Variable Elimination
The AUC values produced by all models are shown in Table 9 and Table 10, where the top row of each cell corresponds to a model developed using the original 50 variables (Table 5), the centre row corresponds to a model developed using the 23 variables remaining after elimination of variables that were highly correlated (Table 6), and the bottom row corresponds to a model developed using the variables remaining after the AUC-based variable elimination. Model performance in all cases was excellent (AUC > 0.9), showing that the distributions of all the response variables are highly structured by their environment as quantified by our predictors, and that the models are able to emulate that structure. Models developed using the variable combinations remaining after the AUC-based variable elimination have AUC values that are typically similar to those resulting from the models based on a larger number of variables, supporting our two-step variable elimination approach which retained the most influential predictors. Notable exceptions include the ME sponge ground models, where model relative performance deteriorates as variables are eliminated. This is possible when several eliminated variables individually reduce the AUC value by less than 0.005. Other notable exceptions include models of Geodia spp. and G. phlegraei, where models developed with the smallest number of variables produce the highest AUC values for both models types in the FC study area, and for RF models in the EC study area.

Model Fit for Training Area
The RF models perform similarly to or better than the ME models, with the largest differences between the two models seen for the EC study area. We suggest that the superior performance of the RF models is a result of the information content in the absence data, used to train the RF but not the ME models, the latter being specifically developed for use with presence-only data. If false absences in the field data were a substantial problem this would decrease the information content in the absence data points, and reduce the advantage of the RF models over the ME models. For both study areas, the smallest differences in AUC values between the two models are seen for the genus-and species-level models. Differences in performance between the two model types could also be related to the different algorithm employed by each model type. A stricter analysis of the information content in the absences could be done by training RF and/or ME models with and without absences, to compare AUC values within each model type instead of across the two models.
In addition to differences in AUC values, the two models also differ in their spatial predictions. Figure 4, shows Geodia spp. predictions made by ME and RF, respectively, along with the observed presence and absence locations used to train the two models. The ME model has a generous area of high probability which includes all of the known presences on the continental slopes, particularly in the northern part of the study area, but it also includes large areas where the sponges are absent in the field data. The RF model, on the other hand, predicts tighter distribution zones with less area of intermediate probability. This is supported by in situ video and camera observations which show sharp boundary transitions, at least on the Sackville Spur in the north of Flemish Cap. The RF model also indicates a high probability of Geodia spp. in the deep waters in this area. Eight of the ten deepest field data locations show Geodia spp. presence, including the six deepest (depths 1827-2201 m), which provides support to the RF model's extrapolation at depth. Further validation of which model produces the best predictions, especially in deeper areas, will require additional field data. Other differences between the two models include the predicted presence probability for Geodia spp. in the Flemish Pass between Flemish Cap and Grand Bank in the FC area with ME. Beazley et al. (2013) report that Geodia spp. are present but not abundant in this area which is dominated by the glass sponge Asconema sp. and the demosponges of the family Axinellidae. Murillo et al. (2012) report low sponge biomass in this area. Note that the AUC values for the two models are identical for this response variable and study area (Table 9), despite the differences seen in the predictions. Figure 4. Geodia spp. locations in the FC study area, observed (coloured circles) and predicted probability by ME (left) and RF (right).

Model Extrapolation
Models trained on data from the FC study area and tested with data from the EC study area (Table 11) all have poorer fits than the models trained and tested with data from a single area (Table 9 and Table 10), which indicates that the relationships between the response variables and the predictors differ between the FC and EC study areas. This is not a surprising result given the large geographical extent of the EC study area, which includes a range of biogeographic zones, as opposed to the small FC study area ( Figure  1). The RF models consistently outperform the ME models when extrapolated beyond the area they were trained for (Table 9 and Table 10). This may be related to ME's sampling of pseudo-absences from the background (the study area used for training), which changes when the model is extrapolated to a new study area. Based on these results the RF model is recommended over ME for extrapolation with our data.
For the RF models of Geodia spp., G. barretti and G. phlegraei, the complete set of (50) predictors (top rows in Table 11) obtain better fits when extrapolated than those trained with sets of predictor variables reduced through variable elimination, while for the RF models of sponge grounds the set of predictors remaining after removal of highly collinear predictors (centre rows in Table 11) obtained the best fits. A similar pattern is seen for the ME models, with the exception of the Geodia spp. model. Models trained on the smallest sets of variables (bottom rows in Table 11) provide the worst fit for most models, which suggests that the AUC-based variable elimination approach results in models over-fitting to the training data, and consequently to the training area.
The RF models of Geodia spp., G. barretti and G. phlegraei provide better fits than models of sponge grounds. This may be attributed to the nature of the response variables; the Geodia genus and species are well-defined biological entities whose composition and environmental niches can be expected to remain largely spatially constant, while sponge grounds as defined using a weight/tow threshold are composed of different species with potentially varying environmental niches. Using the complete set of predictors, the fits (AUC values) obtained from the extrapolated RF models of Geodia spp., G. barretti and G. phlegraei are close to those obtained from the models trained on the EC study area; this is less the case for models of sponge grounds (compare Table 10 and Table 11).
The difference between using training data from the FC and EC study areas, when making predictions for the EC study area, is illustrated by mapped predictions of Geodia spp. presence probability. Figure 5 shows mapped predictions of Geodia spp. presence probability from the RF model trained with the smallest set of predictors using data from the EC study area. Most of the observed presence locations are correctly identified as having high presence probability, and most observed absence locations have low presence probability. The unsampled Labrador slope shows high Geodia spp. presence probability (1 in Figure 5), as does the unsampled deep part of Baffin Bay (2 in Figure 5). Figure 6 is comparable to Figure 5 except predictions are made from the RF model trained on the FC study area using the complete set of predictors. The observed presence locations in the FC study area are correctly predicted as having high presence probability, which is not surprising given that the model was trained on data from the FC study area. However, the grouping of observed presences near Hudson Strait (3 in Figure 6) is predicted as having only intermediate presence probability. This may partly be due to the shallower depths at which Geodia spp. are found at these latitudes; the mean depths of Geodia spp. presence observations located above and below 60°N are 802 m and 1290 m respectively, with the shallowest Geodia spp. below 60°N found at a depth of 868 m. However, the partial dependence of Geodia spp. on depth for the two models, shown in Figure 7, illustrate that the modeled influence of depth is similar between the two models, and that predicted Geodia spp. presence probabilities are low for depths around 1290 m for both models. Depth alone can therefore not explain the difference between the predictions of the two models. However, the partial dependences of Geodia spp. on bottom salinity range and minimum surface current in the model trained on the EC study area both show a spike in predicted presence probability when using the single-variable approach (red lines in Figure 8) for values around 0.2 PSU bottom salinity range and around 0.015 m/s minimum bottom current, which correspond to the environmental conditions found in the area of the observed presences near Hudson Strait (3 in Figure 6) as well as those found in the northernmost grouping of observed presences (4 in Figure  6). These specific environmental conditions were incorporated in the model trained on the EC study area, and lead to high presence probability predictions as seen in Figure 5. The same spikes are not seen in the corresponding dependence plots of the model trained on the FC study area (red lines in Figure 9), which did not have the presence observations from these areas available for model training.
No predictions are available for the northernmost grouping of observed presences (4 in Figure 6) because predictions for Geodia spp. are restricted to the area for which data were available for winter chlorophyll concentrations, as this predictor is included in the model. The data on chlorophyll concentrations are derived directly from satellite observations of reflected sunlight on cloud-free days, and their availability for the northern part of the EC study area in the winter months is therefore limited.

Partial Dependence Plots
Two examples of the partial dependence plots for the predictors of Geodia spp., FC study area, are shown in Figure 10 for the RF model and in Figure 11 for the ME model, both based on the set of predictors available after both variable elimination approaches were applied. The dependence of Geodia spp. on depth is illustrative of the differences both between the two approaches and the two model types. For RF (Figure 10), with the multivariable approach we see that Geodia spp. presence probability is 0.0 for depths less than approximately 800 m, and then increases to above 0.5 at depths of more than 2000 m (note that depths have negative values, 0 being the sea surface). In other words, depth alone cannot generate very high presence probabilities when the other predictors are kept at their mean value. With the single-variable approach the Geodia spp. presence probability is also 0.0 for depths less than 800 m, but reaches almost 1.0 at depths of more than approximately 1800 m. The difference between the two approaches illustrate that in reality the other predictors do not have their mean values in deep waters, but rather they have values that, in the model, contribute to high Geodia spp. presence probability. The dependence plots are 'spiky' and less interpretable for the three remaining predictors, except for a trend of higher Geodia spp. presence probability for higher values of 'Shear, minimum'.

Figure 10.
Dependence plots including both the single-variable and multi-variable approaches for the four predictors of Geodia spp., using the RF model for the FC study area. Figure 11. Partial dependence plots including both the single-variable and multi-variable approaches for the five predictors of Geodia spp., using the ME model for the FC study area. For ME (Figure 11), the dependence of Geodia spp. on depth differs between the two approaches as it did for RF. At very shallow depths Geodia spp. presence probabilities are 0.0. They then increase until depths of 1200 m, after which they decrease again. As for RF, the single-variable approach reaches higher presence probabilities than the multivariable approach, although the two curves have the same overall shape. The remaining plots are smoother than those from the RF model, and suggest that optimal conditions for Geodia spp. presence can be found in areas of intermediate depth (1000-1500 m), areas with a low salinity range (1.5-2 PSU), high minimum bottom temperature (2-3 °C), surface current range above 0.2 m/s and high minimum shear (>0.005 Pa). However, the different variables used in the RF and ME models and the differences in the partial dependence plots, which exist despite the very good fit of both models (AUC for both is 0.976 (Table 9)), make it hard to draw conclusions about ecological relationships. The information presented in the partial dependence plots should therefore be interpreted cautiously.

Variable Importance
Examples of variable importance results for RF and ME models of Geodia spp. presence, FC study area, are shown in Table 12. As is the case for the partial dependence plots, the different results for each model type, despite the very good fit of both models, makes it hard to draw conclusions about which variables truly have the greatest influence on the suitability of an area for Geodia spp. presence. Depth and minimum shear have the highest delta-AUC values for the RF model, indicating that, of the four predictors in that model, they have the strongest influence on predictions. These two predictors are also important in the ME model, as is surface salinity range. Surface current range has low importance for both models despite being included in both.  Table 13. Here depth is of secondary importance in the RF model, while it is 7.5 times more important than any other variable in the ME model. In addition, minimum shear is not included in either model as it was eliminated during the AUC-based variable elimination. However, the surface temperature range, absent from the models for the FC study area, is now the most importance predictor in the RF model, and the second-most important predictor in the ME model.

Global vs. Local Gear-Specific Thresholds for Defining Sponge Grounds
Based on a comparison of the AUC values, both models perform marginally better when developed with, and applied to, sponge grounds defined using a single global threshold than when developed with the local gear-specific thresholds (Table 10). The single global threshold enables the models to predict environmental conditions that allow a specific density of sponges to develop and persist in a location (in our case the density that corresponds to 70 kg/tow), as opposed to predicting environmental conditions that allow locally significant sponge densities to develop and persist. The 70 kg/tow threshold will underestimate the distribution of sponge ground habitat as defined using the local gearspecific thresholds in areas of the Eastern Arctic sampled by the Cosmos and Campelen gears (Table 1) and on the Scotian Shelf, while overestimating the area on the Newfoundland and Labrador shelves and slopes (Table 1).
The difference between the two different approaches to defining sponge grounds is illustrated by their respective observed and predicted distributions. Sponge grounds defined using the local gear-specific thresholds are shown in Figure 12, while sponge grounds defined using the global threshold are shown in Figure 13. Results from using the local gear-specific thresholds are generally more consistent with known occurrences of the sponge grounds (as defined using these thresholds) but cannot predict the presence of sponge grounds on the portion of the Scotian Shelf (1 in Figure 12) which is at the southernmost limit of the range of Geodia, which are the major constituents of the northern sponge grounds. When the global threshold is applied the northernmost sponge grounds have low predicted presence probability (2 and 3 in Figure 13). This is partly because several of these observations have values below 70 kg/tow, and also due to the shallower depths at which sponge grounds are found in the Arctic bioregion; the mean depths of sponge grounds (as defined by the global threshold) located above and below 60°N are 616 m and 1179 m respectively. The partial dependence plots for sponge grounds (global threshold, RF model, EC study area) are shown in Figure 14, where it is clear that areas with depths around the mean value for sponge grounds above 60°N will be predicted to have very low presence probabilities. The inability of a single model to adequately describe the influence of depth on sponge ground presence probability for the whole EC study area may be attributed to depth not being a causal predictor, but rather having influence on predictions through collinearity with other predictors that influence sponge ground development. If the relationship between depth and these other predictors changes substantially through space, a single model for which depth is an important predictor will not produce good predictions across the study area.
Variable importance results for the models of sponge ground presence for the EC area are shown in Table 14 (local gear-specific thresholds) and Table 15 (global threshold). Both RF models have depth as the dominant predictor, with delta-AUC values 2-3 times that of the second-most important variable, which in both models is the range of surface temperatures. Depth is also not the most important variable for the one ME model (local gear-specific thresholds), while the range of bottom salinity is the most important variable in the other ME model (global threshold). The range of bottom salinity is largely uncorrelated with depth in the EC study area (R=-0.095) so the importance of this variable in the ME model is unlikely to be an artifact of variable selection with collinear variables. The range of surface temperatures is also included in both ME models.

Species-vs. Genus-Level Response Variables
Models of Geodia spp. presence have very similar AUC values to models of G. barretti and G. phlegraei presences, suggesting that the environmental conditions conducive to settlement and growth of the five different Geodia spp. are sufficiently similar for their distribution to be modeled as a single entity. This is contrary to what would be expected in the presence of inter-specific competition and species-specific niche development. One potential explanation for this is that there is a correspondence between the spatial resolution of the environmental data layers (0.017°) and the "thematic resolution" of niche detail for which they are able to describe a spatial distribution, i.e. the different Geodia species may have similar habitat requirements in terms of what can be resolved from the spatial data layers used as predictors in our study, and the spatial expression of niche differentiation between the species may only show at a higher spatial resolution. If so, the similar model fits produced for genus-and species-level response variables suggest that we have only managed to resolve their common environmental niche. Given the limited knowledge of the biology and ecology of the species in question, this must currently remain a hypothesis. Another factor that can influence model fit is the larger number of presence observations at the genus-level than at the species-level, which may improve the ability of both model types to fit to the training data.

SDM-Based Distribution Maps
Distribution maps produced using the RF models trained on the predictor sets remaining after variable elimination are considered the best available description of the spatial distribution of each response variable for the EC study area. The maps of sponge ground distribution are shown in Figure 12 and Figure 13, the map of Geodia spp. distribution is shown in Figure 5, and the maps of G. barretti and G. phlegraei distributions are shown in Figure 15 and Figure 16 below.

CONCLUSION
Our study has demonstrated that species distribution models, trained using research trawl data and spatial information on depth, slope, chlorophyll-a concentration, shear, and surface and bottom temperature, salinity and current speed, are able to make excellent predictions of the distributions of Geodia spp. sponges and sponge grounds in the northwest Atlantic. The mapped predictions can thus be considered a good estimation of the actual distribution of sponge grounds, as defined by the global and local gear-specific thresholds used in this paper, and Geodia spp. presences in the study area.
Of the two model types tested, Random Forest generally out-performs MaxEnt when tested against independent validation data; the difference between the two model types is larger when models trained for the Flemish Cap are extrapolated to the East Coast study area. Unsampled areas predicted to have high presence probabilities can serve as target areas for future sampling to further validate the models. Comparing both with the kernel analyses previously employed, there is a general distribution match between the RF and kernel maps (Figure 38). However the RF model predictions have a more refined distribution as would be expected given its use of absence data in addition to the presence data. For management and conservation purposes, RF can be used to refine kernel models of the location of sponge grounds and to predict sponge ground areas where data are either sparse or non-existent.
The two approaches used for variable elimination, based on removal of collinear variables and variables not contributing to model fit, respectively, both succeeded in reducing the number of predictors from the original 50 and making the fitted models more interpretable without substantial reduction in model fit when tested on independent validation data from the area for which the models were trained. However, when training models for the Flemish Cap and applying them to the East Coast, the models trained on reduced sets of predictors had poorer fits for Geodia spp., G. barretti and G. phlegraei, while the models trained on the set of predictors remaining after removal of highly collinear predictors obtained the best fits for sponge grounds. If models are intended for extrapolation, we conclude that AUC-based variable elimination should not be used, while the benefit of removal of collinear predictors will depend on the response variable.
Model fits for sponge grounds are marginally better when sponge grounds are defined using a global threshold (we used 70 kg/tow) than when they are defined using local gearspecific thresholds; this also holds true for models extrapolated from the Flemish Cap to the East Coast study area. We conclude that the single global threshold enables the models to predict environmental conditions that allow a specific density of sponges to develop and persist in a location (in our case the density that corresponds to 70 kg/tow), as opposed to predicting environmental conditions that allow locally significant sponge densities to develop and persist.
Fits are similar for models of Geodia spp., G. barretti and G. phlegraei. This suggests that the environmental requirements of the different Geodia species are similar, at least when quantified at the spatial resolution and with the predictors used in our study.
Future research should be directed specifically at establishing best practice guidelines for development of distribution models for extrapolation to unsampled areas, as this is one of the primary strengths of the models. In addition, investigation of partial dependence plots should be used to develop and test hypotheses to determine predictor values (or combinations of values for multiple predictors when interactions are significant) that represent physiological tolerance limits for each species. These can be validated against global data sets to provide improved insight into the biology and ecology of these sponges, which are otherwise difficult and costly to study.

APPENDIX
This appendix contains all 1) partial dependence plots, 2) variable importance results, and 3) mapped predictions of distributions for the FC study area.

FC Study Area
Geodia spp.

EC Study Area
Sponge Grounds Figure 24. Partial dependence plots for Geodia spp., RF model, EC study area.

FC Study Area
Geodia spp.

Mapped Predictions, FC Study Area
Geodia spp. Figure 34. Geodia spp. locations in the FC study area, observed (coloured circles) and predicted probability by the RF model (greyscale). Figure 35. G. barretti locations in the FC study area, observed (blue circles) and predicted probability by the RF model (greyscale). Figure 36. G. phlegraei locations in the FC study area, observed (red circles) and predicted probability by the RF model (greyscale). Figure 37. Sponge ground locations in the FC study area, observed (green circles) and predicted probability by the RF model (greyscale).

Figure 38.
Comparison of predicted sponge grounds with kernel biomass. Upper left panel. Sponge ground locations in the FC study area, observed (green circles) and predicted probability by the RF model (greyscale). Upper right panel. Sponge kernel biomass (displayed using a geometric distribution in kg/km 2 ) in the FC study area estimated from Spanish/EU research vessel catches . Lower panel. Sponge grounds as identified in the FC study area by NAFO based on a kernel biomass of 75 kg/km in the FC study area estimated from Canadian (DFO) and EU-Spanish research vessel catches. Areas closed by NAFO to protect sponges and other vulnerable marine ecosystems are indicated in outline in relation to the regulatory area.