Multiscale socio-ecological networks in the age of information

Interactions between people and ecological systems, through leisure or tourism activities, form a complex socio-ecological spatial network. The analysis of the benefits people derive from their interactions with nature—also referred to as cultural ecosystem services (CES)—enables a better understanding of these socio-ecological systems. In the age of information, the increasing availability of large social media databases enables a better understanding of complex socio-ecological interactions at an unprecedented spatio-temporal resolution. Within this context, we model and analyze these interactions based on information extracted from geotagged photographs embedded into a multiscale socio-ecological network. We apply this approach to 16 case study sites in Europe using a social media database (Flickr) containing more than 150,000 validated and classified photographs. After evaluating the representativeness of the network, we investigate the impact of visitors’ origin on the distribution of socio-ecological interactions at different scales. First at a global scale, we develop a spatial measure of attractiveness and use this to identify four groups of sites. Then, at a local scale, we explore how the distance traveled by the users to reach a site affects the way they interact with this site in space and time. The approach developed here, integrating social media data into a network-based framework, offers a new way of visualizing and modeling interactions between humans and landscapes. Results provide valuable insights for understanding relationships between social demands for CES and the places of their realization, thus allowing for the development of more efficient conservation and planning strategies.


Introduction
As visitors' priorities and consumption patterns evolve, people are travelling more frequently, further away from home, and in greater numbers [1]. People interact with the destination sites, affecting landscapes, societies and quality of life. Hence, these recent changing mobility patterns open up new challenges in understanding threats and constraints to the environment. Leisure or tourism activities affect cities and their surroundings, as well as remote natural areas, through the impact of travel movements and the presence of people [2,3]. Socio-ecological interactions generate, in turn, cultural ecosystem services (CES) and relational values, linking people and ecosystems via tangible and intangible relationships [4]. Visitors move according to personal preferences, often influenced by the attractiveness of an area. To gain an understanding of visitor patterns and how humans interact with their environment, it is essential to undertake a holistic approach to socio-ecological systems, by focusing on the different components of the system and the way they interact with each other. Models of spatial relations between CES realization areas and beneficiaries based on empirical data are needed to disentangle interdependencies between social and ecological systems at a high spatio-temporal resolution.
A promising approach is to consider socio-ecological systems as networks [5]. Indeed, nature-based interactions can be represented as a spatial network [6] that offers a way of visualizing and analyzing multiscale spatio-temporal CES demands linked to a particular site. However, the lack of data represents an important limitation for the modeling of CES emerging from socio-ecological interactions, particularly at a global scale. Traditional data sources such as census or surveys usually fail at mapping human population dynamics during situations in which detailed spatio-temporal information is required [7], as in the analysis of individual human spatio-temporal trajectories. ICT devices such as mobile phones are now widely accessible and generate a large quantity of high-resolution spatio-temporal information on individual human mobility patterns [8-12]. The reliability and the accuracy of these new data sources have been intensively evaluated in recent years, notably by comparing mobility information extracted from ICT data and more traditional data sources [7,13-15]. Among these new data sources, of particular interest is geotagged information produced via social media that has been increasingly used in many scientific fields to study human mobility patterns [12]. Among the most popular, Twitter data has been widely used in understanding social networks [16-18] and how people interact with the built environment [11,14,19,20]. Data retrieved from the Flickr photo-sharing platform have been notably used for the identification of users' home locations [21] and the modelling of individual human mobility patterns [22]. Nevertheless, these studies usually focus on the way people interact with each other and with their environment in urban systems.
More recently, the digital traces that we leave while visiting touristic and natural spaces have also contributed to the assessment of cultural ecosystem services [3,23-26], the measurement of landscape values [27,28], the attractiveness of tourist sites [29-31] and the monitoring of visitors in protected areas [2,32,33]. These studies represent a crucial step towards a better understanding of interactions between people and ecological systems, through leisure or tourism activities, but they usually focus only on the presence of individuals on a site, and do not explicitly take into account the spatial relation between humans and nature that underlies beneficial socio-ecological interactions in situ, nor information about the individuals that visit a site.
The aim of this paper is to explore the potential of Flickr data for the study of socio-ecological interactions. The guiding idea is that interactions between individuals and ecological systems can be visualized and modeled using geotagged photographs from the Flickr photo-sharing platform embedded in a multiscale socio-ecological network. Based on more than 150,000 photos taken in 16 study sites across Europe, this study examines the potential of the digital traces that we leave while visiting natural sites to efficiently represent socio-ecological interactions at different scales.

Extracting a multiscale socio-ecological network from social media data
Socio-ecological interactions have been extracted from a database containing more than 150,000 photographs taken between 2000 and 2017 in 16 sites in Europe (Figs 1 and 2) and posted on the Flickr social media platform. Each photo is geo-localized (latitude/longitude coordinates), time-stamped and associated with a unique Flickr user ID. In order to ensure that only photographs representing an interaction between an individual and a natural site are considered, each photo has been manually validated and classified according to the landscape and the activity identified on the picture. These validated photographs have been taken and posted on Flickr by 2,193 reliable users whose place of residence has been identified based on their Flickr timeline using 100 × 100 km² world grid cells. See the Materials and methods section for more details. We define a socio-ecological interaction as the presence of a Flickr user in one of the 16 sites during a given time window. The individuals are characterized by their place of residence. The ecological systems are represented by a geographical location at different scales. Two scales are considered: a global scale (16 European sites) and a local one where every site has been divided into zones using 500 × 500 m² grid cells. In practice, an interaction is represented by one or several photos taken by a user in a grid cell during a given hour. Note that if several photos are taken during an interaction, the different types of interactions (landscapes and activities) identified on the photos are aggregated. The resulting network is composed of 7,354 socio-ecological interactions linking 365 distinct places of residence all over the world to 3,418 grid cells located in 16 study sites. A spatial representation of the network at a global scale is displayed in Fig 2.
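The aggregation rule described above (one interaction per user, grid cell and hour, with the labels of its photos merged) can be illustrated with a minimal sketch. The record layout (keys `user`, `x`, `y`, `taken`, `labels`), the use of projected coordinates in metres and the reading of "a given hour" as a clock hour are assumptions made for illustration, not details taken from the study.

```python
import math
from collections import defaultdict
from datetime import datetime

CELL_SIZE_M = 500.0  # local grid resolution stated in the text


def build_interactions(photos):
    """Group photos into interactions keyed by (user, cell, hour).

    `photos` is an iterable of dicts with hypothetical keys: 'user',
    'x' and 'y' (projected coordinates in metres), 'taken' (datetime)
    and 'labels' (landscape/activity labels seen on the photo).
    """
    interactions = defaultdict(set)
    for p in photos:
        cell = (math.floor(p["x"] / CELL_SIZE_M), math.floor(p["y"] / CELL_SIZE_M))
        # Assumption: "a given hour" is read as the clock hour of the timestamp.
        hour = p["taken"].replace(minute=0, second=0, microsecond=0)
        # Labels of all photos belonging to the same interaction are merged.
        interactions[(p["user"], cell, hour)] |= set(p["labels"])
    return dict(interactions)


photos = [
    {"user": "u1", "x": 120.0, "y": 40.0, "taken": datetime(2015, 7, 4, 10, 5), "labels": {"mountain"}},
    {"user": "u1", "x": 480.0, "y": 10.0, "taken": datetime(2015, 7, 4, 10, 50), "labels": {"forested"}},
    {"user": "u1", "x": 620.0, "y": 10.0, "taken": datetime(2015, 7, 4, 10, 55), "labels": {"mountain"}},
]
interactions = build_interactions(photos)
```

In this toy example the first two photos fall in the same cell and hour and collapse into a single interaction carrying both labels, while the third photo, taken in the neighbouring cell, forms a second interaction.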

Evaluation of the network's representativeness
New data sources such as Flickr data have the great advantage of being global, in contrast with surveys and census data, which usually involve only one country or at most a few countries. In return, they come with several biases associated with the lack of information regarding the users' sociodemographic characteristics. In order to collect more information about our sample, we automatically sent a questionnaire to the 2,193 reliable users of our cleaned database through the creation of a Flickr group. We obtained a response rate of 11%. Fig 3 shows some descriptive statistics about the respondents according to their socio-demographic characteristics. We note that men represent about two thirds of the respondents. There are also very few young people, and the respondents were predominantly professionals. By asking the respondents to provide us with their zipcode and country of residence, this survey supported the identification of the user's place of residence based on the Flickr timeline (more details available in the Materials and methods section and in the S1 File). The overall agreement is good: in 90% of cases, the location entered in the questionnaire is located within the 100 × 100 km² world grid cell detected with our algorithm.

Sites' attractiveness
Being able to measure quantitatively the interactions between a particular site and the rest of the world allows for the development of attractiveness indicators that have already been successfully applied to cities [11] or touristic sites [30,31] in the past. Among these metrics, of particular interest is the average distance traveled by the visitors to reach the site. Fig 4 shows the cumulative distribution function (CDF) of the normalized distance between users' place of residence and case study sites in our network. Note that to take into account the sites' accessibility, the distance was normalized beforehand (see the Materials and methods section for more details). The global attractiveness of a site can be inferred from the area above the curve, while the shape of the curve informs us on the type of attractiveness. We observe that some case study sites are more attractive than others, highlighting different levels of attractiveness from local to global influence. The hierarchical cluster analysis using the Ward distance identified similarities between CDFs. Four well-separated clusters are identified; the corresponding average CDFs are represented by the colored curves in Fig 4. The yellow cluster is composed of case study sites having a local influence, while the other case studies tend to attract people coming from further away. Key examples in mountain regions are the Vercors in the French Alps, which is mainly visited by locals and by people from nearby cities (yellow cluster), while the Sierra Nevada in Spain and the Carpathians have an international reputation and share the blue cluster. Sites composing the blue and green clusters have a high level of attractiveness, at a more global level for the blue one than for the green one.
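The "area above the curve" reading of attractiveness can be sketched as follows. This is a minimal illustration, assuming normalized distances rescaled to the interval [0, 1] and a trapezoidal integration on a fixed grid; the function names and the toy values are hypothetical, and the Ward clustering step is not reproduced here.

```python
def empirical_cdf(distances, grid):
    """Fraction of users whose normalized distance is <= each grid value."""
    n = len(distances)
    return [sum(d <= g for d in distances) / n for g in grid]


def area_above_cdf(distances, grid):
    """Trapezoidal area between the empirical CDF and 1 over the grid.

    A larger area means that users tend to come from further away.
    """
    cdf = empirical_cdf(distances, grid)
    area = 0.0
    for k in range(1, len(grid)):
        width = grid[k] - grid[k - 1]
        area += width * ((1 - cdf[k]) + (1 - cdf[k - 1])) / 2
    return area


# Toy normalized distances (assumed to lie in [0, 1]) for two hypothetical sites.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
local_site = [0.05, 0.1, 0.1, 0.2, 0.3]
global_site = [0.2, 0.5, 0.7, 0.9, 1.0]

local_area = area_above_cdf(local_site, grid)
global_area = area_above_cdf(global_site, grid)
```

Under these assumptions the site whose visitors live further away yields the larger area, which matches the interpretation given in the text.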

Effect of the distance traveled on the socio-ecological interactions
To evaluate how the distance to a site influences the way people explore and interact with this site, we apply five metrics. These metrics summarize the distribution of interactions from a spatial and a temporal dimension, but also from the point of view of landscape diversity. We analyze basic characteristics of the spatial distribution of interactions taking into account the spatial coverage (number of cells with at least one interaction), the spatial dispersion of interactions in the cells measured with an entropy index, and a spatial dilatation index measured as the average distance between interactions. We also measure the temporal dispersion of visits throughout the year at a monthly granularity. Finally, we assess landscape diversity using six landscape categories. In order to assess the effect of the distance traveled, we compare the results obtained considering only interactions made by individuals living further than a certain normalized distance to the ones obtained under the null hypothesis that does not take into account the distance, considering therefore all the interactions. However, most of the metrics used are affected by the sample size (i.e. number of interactions). To side-step this difficulty, we introduce a random null model accounting for the distribution with different sample sizes. The five metrics computed as a function of the normalized distance are plotted in Fig 5. Each point on the curve represents the value of a metric $X_d$, taking into account the normalized distance traveled $d$, divided by $X_0$, the value obtained with a random null model assuming that the distance has no influence on the metric (more details in Materials and methods). We observe that the area covered by the interactions and the dilatation index decrease with the distance traveled, while the interactions tend to be spatially distributed in a similar way whatever the distance traveled (as measured with the spatial dispersion index).
Hence, as the distance traveled increases, the visitors tend to explore the area less, though the pattern of dispersal within the space explored is similar. In contrast, regarding the temporal aspect of the distribution, we observe that the interactions tend to be more concentrated in a certain period of the year as the distance traveled increases. Finally, it is interesting to note that the complexity of interactions in terms of landscape diversity increases relatively little with the traveled distance. However, it is important to keep in mind that these observations represent a median behavior across the 16 case study sites and are not always representative of all case studies, particularly regarding the landscape diversity metric (see Fig G in S1 File).

Locals and visitors' interactions overlap
So far, we have focused on the influence of the distance traveled by the users on the distribution of their socio-ecological interactions. However, an important question remains: do the pattern and intensity of the interactions depend on the origin of the user? To answer this question, in this section we analyze the overlap between locals' and visitors' interactions. The interactions are first separated into two groups according to the users' place of residence. A user is considered as local if the normalized distance between her/his place of residence and the site is lower than a predetermined threshold; otherwise they are considered as a visitor. The overlap is defined as the fraction of interactions in common between locals' and visitors' distributions depending on the dimension being considered (spatial, temporal or based on landscape diversity). Here again, to make the locals' and visitors' distributions comparable, we use a random model taking into account the difference in sample size (see the Materials and methods section for more details). To assess the impact of the threshold used to separate the two groups of users, the results have been aggregated over different threshold values ranging between 100 and 1,000 km. Fig 6 shows the average and standard deviation obtained for each dimension. The spatial overlap between visitors' and locals' interactions is relatively low, with values fluctuating around 25% of overlap between the two spatial distributions. Fig 6b shows the temporal overlap between locals' and visitors' interactions: although the results are more heterogeneous, the overlap is globally higher but still quite low with an average overlap of 50%. These results tend to demonstrate that locals and visitors interact with natural spaces differently in space and time. This is less true for the type of landscapes observed during the interactions, with an 80% overlap between locals and visitors (green points in Fig 6c). It must be noted that focusing only on the type of landscapes observed in cells frequented only by locals or only by visitors (i.e. without spatial overlap) does not significantly change the results except for the site of Trnava (yellow points in Fig 6c).

Fig 4. Cumulative distribution function (CDF) of the normalized distance between users' places of residence and case study sites. Each grey curve represents a case study. Four common profiles were found using ascending hierarchical clustering (AHC). Each colored curve represents one of these profiles (average CDF in each cluster). The dendrogram resulting from the hierarchical clustering is shown in the inset.
https://doi.org/10.1371/journal.pone.0206672.g004

Fig 6. Overlap between locals' and visitors' interactions. Spatial (a), temporal (b) and landscape (c) overlap between locals' and visitors' interactions. In panel (c), the green points represent the landscape overlap between locals and visitors considering all the cells, while the yellow points represent the landscape overlap between locals and visitors in cells frequented exclusively by locals on the one hand and by visitors on the other hand (without spatial overlap). Locals and visitors are identified according to the normalized distance. In order to assess the impact of the threshold on the results, we averaged the metrics obtained with threshold values ranging between 100 and 1,000 km. The error bars represent one standard deviation. The effect of the spatial resolution on the spatial overlap is presented in Fig I in S1 File.

Discussion
Central to the measure of human perception and interest in natural environments is the concept of CES, but it is challenging to relate the supply of these non-material services to specific spatial units. Moreover, the attractiveness of a site and the way we explore it may be influenced by our origins [31,34]. Indeed, beyond the analysis of people's activities in natural sites, an important question remains: how does our origin impact the nature of our relationships with natural ecosystems? Taking advantage of a social media database, we proposed in this work a methodological approach to extract and analyze multiscale socio-ecological networks from volunteered, publicly available data generated from social media. We extracted and analyzed from a Flickr database 7,354 socio-ecological interactions made in 16 case study sites in Europe by individuals living all around the world. Two scales have been considered: first a global scale, focusing on the sites' attractiveness based on the distance traveled by the users, and then a local scale, by analyzing how the way Flickr users explore a site varies with the distance traveled to reach this site.
Our results demonstrate that while different levels of attractiveness exist among sites (local, regional and global), the existence of differences in the patterns of socio-ecological interactions according to visitors' origins is remarkably consistent across sites. Indeed, the distance traveled has a significant effect on the way Flickr users interact with natural ecosystems in both the spatial and temporal dimensions. Although further research in this direction is needed, it would appear that the desire for landscape diversity in socio-ecological interactions does not vary significantly with the distance traveled to reach a site. Of particular interest is the concept of overlaps between locals and visitors that could be used within the framework of planning strategies oriented towards conservation and sustainable tourism, for example to improve management of visitor activities in protected areas in order to reduce human impacts [35].

Limitations of the study
In this work, we explore the possibility of making use of social media data to provide information about the way people interact with ecological systems. In particular, we developed a methodology to connect Flickr users' place of residence to places where the interaction took place in the 16 study sites. This allowed us to study how the distribution of interactions varies among sites and according to the distance traveled. Although it would not have been possible to conduct this research at global scale using conventional data sources such as surveys, it is necessary to recognize the potential limits and biases of our approach.
First, we cannot ensure the reliability of the data both in terms of space and time resolution. In order to limit the potential biases we considered the photos with the most precise spatiotemporal Flickr accuracy level (according to the Flickr API which gives access to the accuracy with which Flickr knows the date and location to be registered). It is also worth noting that most of our analysis is based on data aggregated both in space (100 × 100 km² world grid cells and 500 × 500 m² case study site grid cells) and time (month granularity). The spatial aggregation allowed us to overcome the spatial accuracy error linked to the GPS-enabled devices used, or to the map scale used to specify the photo location. By using a manual photograph validation and classification process, we were able to avoid potential errors in classification that arise when using automatic image processing tools. However, even though the interpretation of the photographs was performed by between 1 and 6 local experts following a rigorous protocol, interpretation of the images may still be subjective to some extent.
Another important limitation lies in the lack of information regarding the characteristics of individuals using Flickr. The process of identifying the user's place of residence allows us to discard non-reliable Flickr users (those with a collective account or who are not regular Flickr users). A first coarse filter was applied to exclude collective accounts from the data. Then, we applied several filters to ensure that a user shows enough regularity and that the assigned place of residence is the region of the world where he/she is really living (see the S1 File for more details). Nevertheless it was also important for us to be able to evaluate the performance of our place of residence detection algorithm. This is why we decided to integrate an online survey in our analysis. Although the response rate was quite low (11%), this survey permitted us to get a better understanding of the sociodemographic characteristics of Flickr users, which is usually an important limitation of this kind of study. We believe that the integration of online survey approaches combined with crowdsourced data might overcome some of the limitations of using geotagged public photos to analyze the way people interact with nature.
All these filters tended to reduce the size of our initial sample. This severely limited the possibility of performing multi-dimensional analysis (considering space, time and landscape diversity at the same time). Nevertheless, we rigorously studied these dimensions separately and to limit sample size effects we have introduced null models taking into account the sample size and its variability.
Finally, since the distance between a user's origin and the site visited can be biased by the geography, we took this heterogeneity into account by measuring the case study sites' accessibility. In this process, all the distances have been computed with the Haversine formula based on longitude and latitude coordinates. However, distances as the crow flies are rarely a direct proxy for travel time particularly at local scale. In future studies, flight distances, transport APIs and road network data could be used instead to calculate more realistic travel distances between different points on the globe.

Concluding remarks
Within the framework of this research we also developed a visualization application to provide stakeholders with a tool based on the analysis that could be used for planning (more details in S1 File). This web application is also oriented towards Flickr users who participated actively (providing input via the survey) or passively in the experiment, and could become a platform in the future to share experiences from the photos and the visual content. Such a platform could limit the biases mentioned above, allowing the users to classify their own photos supported with image processing tools and to fill in an anonymous online survey improving knowledge about their origin and motivations.
Hence, following the approach proposed in this paper, further studies could consider the sociodemographic characteristics as well as psycho-cultural aspects which could reveal significant correlation with the knowledge or appreciation of specific ecosystems. Indeed this approach opens the door to future analysis and applications; further investigation is certainly needed to understand complex human-ecosystem interactions.

Photograph classification process
In order to ensure that only photographs representing an interaction between an individual and an ecosystem are considered, the subject of each photo was manually validated and classified according to the landscape identified in the picture and the different types of cultural services that people derive from ecosystems. In this study, we focused on six landscape categories: agricultural and open landscape, sparse forest landscape, forested landscape, mountain landscape, manmade infrastructure, and water landscapes and wetlands. At the end of the process, 16,716 photos taken by 2,967 users between January 2000 and 2017 were classified. Note that 98% of the photos were taken after 2007. More details about the photograph classification process are available in S1 File.

Identification of the user's place of residence
To identify the place of residence of the 2,967 Flickr users, we retrieved through the Flickr API information related to all the geo-located photos taken by these users worldwide. Then, we divided the world using a grid composed of 100 × 100 km² cells in a cylindrical equal-area projection. We define a user's place of residence as the cell from which she or he has taken most of her/his photos [15]. After discarding users whose place of residence could not be identified, we obtained 12,850 classified photos taken by 2,193 users between January 2000 and January 2017 in the 16 case studies. More details about the method are available in S1 File.
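The core rule (place of residence taken as the most photographed grid cell) can be sketched as follows. This is only an illustration: it assumes the Lambert variant of the cylindrical equal-area projection and a mean Earth radius of 6,371 km, uses hypothetical function names, and omits the regularity filters described in S1 File.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius
CELL_KM = 100.0           # world grid resolution stated in the text


def grid_cell(lat_deg, lon_deg):
    """Index of the 100 x 100 km cell containing a point.

    Assumption: Lambert cylindrical equal-area projection,
    x = R * lon (radians), y = R * sin(lat).
    """
    x = EARTH_RADIUS_KM * math.radians(lon_deg)
    y = EARTH_RADIUS_KM * math.sin(math.radians(lat_deg))
    return (math.floor(x / CELL_KM), math.floor(y / CELL_KM))


def place_of_residence(photo_coords):
    """Cell from which the user has taken most of her/his photos."""
    counts = Counter(grid_cell(lat, lon) for lat, lon in photo_coords)
    cell, _ = counts.most_common(1)[0]
    return cell


# Hypothetical timeline: three photos around Paris, one in Rome.
timeline = [(48.85, 2.35), (48.851, 2.351), (48.849, 2.349), (41.9, 12.5)]
home = place_of_residence(timeline)
```

Ties between cells are resolved arbitrarily by `Counter.most_common` in this sketch; a production version would need an explicit rule and the reliability filters mentioned above.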

Accessibility and attractiveness
For each user, we computed the distance between their place of residence and the centroid of the study site he or she has visited. Since the distance between the user's origin and the visited site can be biased by the geography, we computed a normalized distance taking into account the origin of the user and the accessibility of the site. All the distances have been computed with the Haversine formula based on longitude and latitude coordinates. More details about the method used to compute the sites' accessibility are available in S1 File.
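For reference, the Haversine great-circle distance mentioned above can be written as the short function below. The Earth radius value and the function name are assumptions, and the accessibility-based normalization (described in S1 File) is not reproduced.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1 to guard against rounding errors for antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Example: distance between two city centres (approximate coordinates).
paris_london = haversine_km(48.8566, 2.3522, 51.5074, -0.1278)
```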

Description of the metrics
Spatial dimension. In order to investigate the impact of the distance traveled on socio-ecological interactions, we defined a set of metrics to characterize them. We focus here on the spatial and temporal dimensions, but also on the diversity of landscapes identified in the photos. Three indicators are used to characterize the spatial distribution of interactions. The spatial coverage is defined as the area covered by the socio-ecological interactions, estimated as the number of 500 × 500 m² cells in which at least one interaction occurred. This metric does not take into account the density of interactions in each cell nor the morphology of the spatial distribution. To compensate for these limitations we also introduced a spatial dilatation index, defined as the average distance between all the interactions, and a metric of spatial dispersion to evaluate whether the interactions are concentrated in a few cells or evenly distributed within the surface covered by the interactions. We measure the spatial dispersion with a spatial entropy index. If we define the probability $p_i$ that an interaction occurs in a cell $i$, then the entropy $E$ is given by:

$$E = -\frac{1}{A} \sum_{i} p_i \log(p_i)$$

where the normalizing factor $A$ is equal to the logarithm of the number of cells with at least one interaction. A value close to 0 means that the majority of the interactions are clustered in a few cells, and a value close to 1 indicates that the interactions are uniformly distributed among the cells.

Temporal dimension. To get a better understanding of the way socio-ecological interactions are distributed within the year, and whether or not the distance traveled affects this distribution, we also rely on the entropy index to compute the temporal dispersion of interactions. In this case, $p_i$ is the probability that an interaction occurs during month $i$ and the normalizing factor $A$ is equal to $\log(12)$.
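A minimal sketch of the normalized entropy index is given below. It assumes that the normalizing factor is the logarithm of the number of categories (occupied cells for the spatial index, 12 months for the temporal one), which is the choice that makes the index range from 0 to 1; the function name and the toy counts are hypothetical.

```python
import math


def normalized_entropy(counts, n_categories=None):
    """Shannon entropy of a count vector, divided by log(number of categories).

    If `n_categories` is None, the number of non-empty categories is used
    (spatial case); pass 12 for months (temporal case).
    """
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    k = n_categories if n_categories is not None else len(probs)
    if k <= 1:
        return 0.0  # a single category carries no dispersion
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(k)


# Spatial example: interactions per occupied cell.
even_cells = normalized_entropy([5, 5, 5, 5])
clustered_cells = normalized_entropy([17, 1, 1, 1])

# Temporal example: all interactions in July, none in the other months.
july_only = normalized_entropy([0, 0, 0, 0, 0, 0, 20, 0, 0, 0, 0, 0], n_categories=12)
```

Evenly spread interactions give a value close to 1, whereas interactions concentrated in one cell or one month pull the index towards 0.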
Landscape diversity. Another important dimension to consider is the diversity of landscapes present in the photographs. Here again, the landscape diversity is based on an entropy metric considering the probability $p_i$ to interact with particular landscape categories (agricultural and open landscape, sparse forest landscape, forested landscape, mountain landscape, manmade infrastructure, or water landscapes and wetlands). Each interaction is characterized by a vector representing the probability to interact with the six landscape categories. The entropy is computed as an average over all the considered interactions. In this case, the normalizing factor $A$ is equal to $\log(6)$.
Overlap. The way socio-ecological interactions are distributed in space, time or type of landscapes may depend on the origin of the user. To investigate this, we analyze the overlap between locals' and visitors' interactions. We define the overlap between two distributions of probability $p$ and $q$ on the same finite support as the fraction they have in common:

$$O(p, q) = \sum_{i} \min(p_i, q_i)$$

The distributions of probability $p$ and $q$ can be based on the fraction of locals' and visitors' interactions per cell (spatial dimension), month (temporal dimension) or type of landscapes (landscape diversity).
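The overlap computation can be illustrated with the sketch below. It assumes that the overlap is the probability mass shared by the two distributions, i.e. the sum over the support of min(p_i, q_i), which matches the description of the overlap as the fraction of interactions in common; this form and the function names are assumptions made for illustration.

```python
from collections import Counter


def to_distribution(labels):
    """Turn a list of category labels (cells, months, landscapes) into probabilities."""
    counts = Counter(labels)
    n = len(labels)
    return {key: c / n for key, c in counts.items()}


def overlap(p, q):
    """Shared probability mass between two distributions (assumed definition)."""
    support = set(p) | set(q)
    return sum(min(p.get(k, 0.0), q.get(k, 0.0)) for k in support)


# Hypothetical monthly distributions of locals' and visitors' interactions.
locals_months = to_distribution(["Jun", "Jul", "Jul", "Dec"])
visitors_months = to_distribution(["Jul", "Jul", "Aug", "Aug"])
shared = overlap(locals_months, visitors_months)
```

With this definition the overlap is 1 for identical distributions and 0 when the two groups never share a cell, month or landscape type.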

Null models
In order to assess the effect of the distance traveled on the metrics described above, we need to compare their values to the ones returned by a random null model that does not take into account the distance. Each indicator $X$ described above can be calculated from a distribution of interactions considering only users living at a normalized distance higher than $d$ from the sites. To be meaningful, this new indicator value $X_d$ that takes into account the distance needs to be normalized by $X_0$, the value obtained with a random null model that does not take into account the distance and based on the same number of interactions. More specifically, $X_0$ is computed with the same number of interactions as $X_d$, drawn at random among all the interactions without taking into account the distance. The value of $X_0$ is averaged over 100 replications.
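The normalization by the null model can be sketched as follows. The sketch assumes sampling without replacement and a fixed random seed for reproducibility; the data layout (tuples of cell and normalized distance), the function names and the toy data are hypothetical, with spatial coverage used as the example metric.

```python
import random


def null_ratio(interactions, distance_of, metric, d, n_rep=100, seed=0):
    """Ratio X_d / X_0 for one metric and one distance threshold d.

    X_d uses only interactions made by users living further than d;
    X_0 is the metric averaged over `n_rep` random samples of the same
    size drawn from all interactions (assumed: without replacement).
    """
    rng = random.Random(seed)
    far = [x for x in interactions if distance_of(x) > d]
    if not far:
        raise ValueError("no interaction beyond the distance threshold")
    x_d = metric(far)
    x_0 = sum(metric(rng.sample(interactions, len(far))) for _ in range(n_rep)) / n_rep
    return x_d / x_0


def coverage(sample):
    """Spatial coverage: number of distinct cells with at least one interaction."""
    return len({cell for cell, _ in sample})


# Toy data: nearby users spread over 30 cells, distant users concentrated in 2 cells.
near = [(cell, 0.1) for cell in range(30)]
far = [(cell % 2, 0.9) for cell in range(10)]
ratio = null_ratio(near + far, distance_of=lambda x: x[1], metric=coverage, d=0.5)
```

A ratio below 1 indicates, as in this toy example, that distant users cover less area than expected for a random sample of the same size.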
Regarding the comparison between locals' and visitors' interactions, the two distributions are made comparable by taking the distribution with the lowest number of interactions as a reference, and drawing at random the same number of interactions in the second distribution to obtain a distribution of the same size. The overlap between these two distributions is then computed and averaged over 100 replications.
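The size-matched comparison described above can be sketched as follows. The overlap is again assumed to be the shared probability mass between the two distributions, sampling is assumed to be without replacement, and all names are hypothetical.

```python
import random
from collections import Counter


def _distribution(labels):
    """Probability of each category in a list of labels."""
    counts = Counter(labels)
    n = len(labels)
    return {key: c / n for key, c in counts.items()}


def _overlap(p, q):
    """Shared probability mass (assumed overlap definition)."""
    return sum(min(p[k], q[k]) for k in set(p) & set(q))


def size_matched_overlap(locals_labels, visitors_labels, n_rep=100, seed=0):
    """Average overlap after subsampling the larger group to the smaller size."""
    rng = random.Random(seed)
    small, large = sorted((list(locals_labels), list(visitors_labels)), key=len)
    p = _distribution(small)
    total = 0.0
    for _ in range(n_rep):
        # Draw as many interactions from the larger group as the smaller one has.
        q = _distribution(rng.sample(large, len(small)))
        total += _overlap(p, q)
    return total / n_rep


# Toy example: cells visited by 5 local and 20 visitor interactions.
locals_cells = ["c1", "c1", "c2", "c3", "c3"]
visitors_cells = ["c3"] * 10 + ["c4"] * 10
avg_overlap = size_matched_overlap(locals_cells, visitors_cells)
```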

Ethics statement
Flickr data were collected through the Flickr API complying with Flickr's terms of service. The survey was conducted through a Flickr group. All survey participants were informed that the survey was anonymous and voluntary, and that all data would be kept confidential and evaluated anonymously. Participants were informed that the study results and underlying data would be published. To secure privacy, all data were collected via a web survey and analyzed anonymously. In particular, aggregation was already built into the questionnaire (age groups, world regions, etc.) and no IP addresses were collected.

Supporting information
S1 File. File containing the manuscript supporting information. This file includes one table (Table A: