Abstract
The complex network framework has been successfully used to model interactions between entities in Complex Systems in the Biological Sciences such as Proteomics, Genomics, Neuroscience, and Ecology. Networks of organisms at different spatial scales and in different ecosystems have provided insights into community assembly patterns and emergent properties of ecological systems. In the present work, we investigate two questions pertaining to fish species assembly rules in US river basins: a) do morphologically similar fish species also tend to be phylogenetically closer, and b) to what extent are co-occurring species that are phylogenetically close also morphologically similar? For the first question, we construct a network of Hydrologic Unit Code 8 (HUC8) regions as nodes with interaction strengths (edges) governed by the number of common species. For each of the modules of this network, which are found to be geographically separated, there is differential yet significant evidence that phylogenetic distance predicts morphological distance. For the second question, we construct and analyze nearest neighbor directed networks of species based on their morphological distances and phylogenetic distances. Through module detection on these networks and comparing the module-level mean phylogenetic distance and mean morphological distance with the number of basins of common occurrence of species in modules, we find that both phylogeny and morphology of species have significant roles in governing species co-occurrence, i.e. phylogenetically and morphologically distant species tend to co-exist more. In addition, between the two quantities (morphological distance and phylogenetic distance), we find that morphological distance is a stronger determinant of species co-occurrences.
Citation: Tripathi R, Reza A, Mertel A, Su G, Calabrese JM (2023) A network-based approach to identifying correlations between phylogeny, morphological traits and occurrence of fish species in US river basins. PLoS ONE 18(6): e0287482. https://doi.org/10.1371/journal.pone.0287482
Editor: Benigno Elvira, Complutense University of Madrid: Universidad Complutense de Madrid, SPAIN
Received: January 13, 2023; Accepted: June 6, 2023; Published: June 23, 2023
Copyright: © 2023 Tripathi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All the data used in this work is available on HZDR data repository Rodare and can be accessed using the link https://rodare.hzdr.de/record/2086.
Funding: This work was partially funded by the Center of Advanced Systems Understanding (CASUS), which is financed by Germany’s Federal Ministry of Education and Research (BMBF) and by the Saxon Ministry for Science, Culture, and Tourism (SMWK) with tax funds on the basis of the budget approved by the Saxon State Parliament. A.R is supported by the research program of the Netherlands Organisation for Scientific Research (NWO). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Functional traits and phylogenetic relatedness are important attributes of species in a community for assessing the two critical facets of biodiversity, i.e., functional and phylogenetic diversity. It is often assumed that the functional diversity of animal and plant species is concordant with their phylogenetic diversity [1–3]. Furthermore, for optimizing conservation strategies, it is crucial to understand whether conserving phylogenetic diversity will help in conserving functional diversity of species [4–6]. The relationship between functional traits and phylogenetic closeness has been widely explored in various taxa in the past decades. Recently, there have been many studies to better understand whether ecologically relevant functional traits are conserved along the phylogeny [7–10]. To do this, phylogenetic signal is typically assessed by testing if closely related species share more similar traits than expected by chance [11–13].
The functional and phylogenetic similarity between co-occurring species also plays a key role in the processes governing community assembly [14, 15]. Understanding the determinants of species co-occurrence in communities is a fundamental question in ecology [16]; therefore a few studies [15, 17] have focused on the structure of communities in species interaction networks as a way of gaining insights on coexistence mechanisms. There are two mutually exclusive mechanisms to illustrate the similarity of co-occurring species in a community. The competition mechanism [18–20] predicts that if a community is structured by competition, then species within the community should be more different functionally and phylogenetically than expected from assemblages that are comprised of random samples drawn from the potential species pool. In contrast, the environmental filtering mechanism predicts that if a particular type of habitat requires possession of certain adaptive traits, species within the communities should have more similar traits or be more closely related than expected by chance [21]. Ultimately, patterns of species co-occurrence may depend on how both biotic interactions and environmental filtering act over ecological and evolutionary time scales [22, 23]. For instance, Winston (1995) [14] reported that functionally similar species co-occurred less frequently than more dissimilar species for 219 fish communities from the Red River basin in the US, supporting the biotic interactions mechanism. In contrast, Peres-Neto (2004) [21] reported that co-occurring species had greater functional and phylogenetic similarity than expected by chance for stream fish species in the Macau River basin in Brazil, and concluded that the co-occurrence patterns were mainly driven by the environmental filtering. 
As the organization of species into communities is a non-random process constrained by their interactions and system dynamics, it is important to identify such community assembly rules against the backdrop of species’ evolutionary history. Hence, a framework is needed that incorporates phylogenetic and morphological relatedness of species to reveal their groups (communities). Beyond the group-level averages of morphological relatedness and phylogenetic relatedness of species, and number of basins of their co-occurrence, one can study the correlations between these three quantities. Hence, such a framework can help understand if morphological and phylogenetic closeness have differential impact on community assembly rules.
In this work, we compiled occurrence data of native freshwater fishes from 2073 watersheds in the US. We measured 10 morphological traits that relate to fish functions [24] and subsequently compiled phylogenetic information on these fish species [25]. We test the relatedness of species phylogeny with species morphology, and their relation with species co-occurrences. For this, we used the concept of complex networks. The complex network framework not only allows the modeling of one-to-one interactions (relatedness) between entities of a complex system [26, 27], but can also explain the emergent structural patterns that have implications for its function [28]. Ecological Networks, in general, are finding increasing applicability in the study of species interactions, spatial ecology, community assembly, and its relation with phylogeny [29–32]. Studies have used fish co-occurrence networks for identifying a spatial cluster of fish-subgroups [33], and phylogenetic networks to retrace species dispersal history [34] in freshwater fishes in Ontario, Canada. Layeghifard et al. (2015) [35] presented a multi-species spatial network framework to describe the relationship between the spatial positioning of sites and the patterns of patch occupation by species across multiple ecological communities. Their approach is useful in inferring dispersal effects on spatial heterogeneity within metacommunities. Li et al. (2021) [29] developed an optimal information flow model to infer causality of interactions in complex ecosystems. Their approach of inferring predictive ecosystem networks based on transfer entropy and functional covariance offers a method to predict biodiversity changes under climate and anthropogenic stressors, and hence has potential in fish preservation.
Based on our continental-scale fish dataset, we seek to answer the following questions: 1) Is morphological closeness of species related to phylogenetic closeness (for HUC8 level river basins), and 2) does trait similarity of fish species together with phylogenetic closeness predict species co-occurrence in HUC8 basins, and to what degree. To answer these questions, it is essential to look at all the three attributes of species (phylogeny, morphology and co-occurrence) simultaneously. This is because the correlation between two of these factors, say phylogenetic closeness and co-occurrence, might be influenced by their mutual correlation to a third factor, in this case morphological closeness. Our work therefore builds on and extends earlier network-based studies that explored similar questions [36–38], but only focused on one or two of these attributes. In the present work we attempt to identify possible interdependence among all three attributes using the complex network framework and continental-scale fish species phylogenetic, morphological and co-occurrence data. The goal of these analyses is to more carefully analyze the interrelationships among these three quantities, and to identify which pairwise relationships are strongest. The two classes of networks we constructed from the datasets are 1) co-occurrence based basin network of HUC8 regions, and 2) nearest-neighbour directed networks of species based on the metrics of phylogenetic distances and morphological distances to other species in the system. Through module-extraction on these networks, which identifies groups of densely connected entities in the network, we quantify relationships among these quantities. In summary, our goal is to understand correlations between fish phylogeny, fish morphology and their extent of co-occurrence via a multi-attribute network-based analysis, wherein all the three quantities are simultaneously studied.
About the dataset
- Species occurrence data: We compiled native species occurrence records at the watershed scale (i.e., Hydrologic Unit Code 8; HUC8) from NatureServe (https://www.natureserve.org) and included both extant and extinct species to account for species historically present in a given watershed. Records of non-native species occurrence were excluded from the entire study. Overall, the occurrence dataset has presence and absence information of 804 fish species in 2073 HUC8 regions. The NatureServe dataset provides comprehensive species lists per watershed, and is commonly used as fish community data in the scientific literature [39–42]. Furthermore, as our aim is to disentangle the morphological and phylogenetic relationship between the species in the native communities under natural processes, non-native species, which were mostly introduced by humans, were removed from our analysis. For showing HUC8 subbasins on the US map, we use their centroid locations, obtained from the Watershed Boundary Dataset, United States Geological Survey (USGS, https://www.usgs.gov/national-hydrography/watershed-boundary-dataset).
- Functional traits: We compiled ten morphological traits related to fish locomotion and food acquisition from the FISHMORPH database [43, 44], and FishBase [45]. The ten morphological traits include maximum body length (Length), body elongation (BlBd), relative eye size (EdHd), oral gape position (MoBd), relative maxillary length (JlHd), vertical eye position (EhBd), body lateral shape (HdBd), pectoral fin vertical position (PFiBd), pectoral fin size (PFlBl), and caudal peduncle throttling (CFdCP). Due to insufficient information on some species, some values were missing in the raw functional trait data. We statistically imputed these missing values [24] with a machine learning algorithm called ‘missForest’ [46, 47]. This method uses a random forest trained on the observed values of a data matrix to predict the missing values and automatically calibrates the filled-in values over a set of iterations [48]. The missForest algorithm is one of the best performing methods to statistically impute missing trait values [46, 47] and has been used in many studies [24, 49–51]. Here, we used this method with all parameters set to default values to impute the missing values of the 10 traits, while accounting for taxonomic information. To improve the accuracy of the results, we log-transformed the values before putting them into the model, and then back-transformed them after imputation.
- Phylogenetic relatedness: We obtained the phylogenetic information of these fish species from Rabosky et al. (2018) [25], which includes 31,526 marine and freshwater ray-finned fishes. This dataset is based on 11,638 species whose position was estimated from genetic data; the remaining 19,888 species were placed in the tree using stochastic polytomy resolution. We pruned the tree and kept only the 804 native freshwater fish species used in our study. Then we computed the pairwise distances between these species from the pruned phylogenetic tree by using the ‘cophenetic.phylo’ function from the R package “ape” [52] (version 5.6–2) with all parameters set to default values.
Methodology
We use the phylogenetic distance (PD) dataset to measure how phylogenetically close or distant the fish species are. In the trait dataset, the 10 morphological traits of all species are normalized (each trait is divided by its maximum value) so that differences in trait magnitude do not dominate the analysis. We then treat the normalized traits of each species as a vector in a 10-dimensional trait space, so that each species is a point in this space. To measure the trait similarity between species, we borrow the concept of cosine similarity (CS) [53–55]. The CS between two trait vectors is the cosine of the angle (θ) between them in the 10-dimensional trait space. For any two species (say S1 and S2), the morphological similarity between them is defined as:
$$\mathrm{CS}(S_1, S_2) = \cos\theta = \frac{\sum_{i=1}^{n} T_{1i}\, T_{2i}}{\sqrt{\sum_{i=1}^{n} T_{1i}^{2}}\;\sqrt{\sum_{i=1}^{n} T_{2i}^{2}}}$$
where T1 and T2 are the trait vectors of species S1 and S2, and n = 10, as there are 10 traits in total.
Hence, unlike the Euclidean distance, it is independent of the vector magnitudes, with CS = 1 implying that the vectors are co-linear or that the species are similar and CS = 0 that the vectors are orthogonal or that the species are not similar in their traits. However, because we intend to correlate trait similarity with phylogeny, which is conventionally expressed as phylogenetic distance (PD), we use cosine-distance (CD) defined as CD = 1 − CS, instead of CS, and call it morphological distance between species throughout our analyses for consistency.
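The normalization and cosine-distance steps described above can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' code: the function names and the three-trait toy data are hypothetical, and it assumes all trait values are positive so that each trait maximum is non-zero.

```python
import math

def normalize_by_max(trait_rows):
    """Divide every trait (column) by its maximum value across species."""
    n_traits = len(trait_rows[0])
    col_max = [max(row[j] for row in trait_rows) for j in range(n_traits)]
    return [[row[j] / col_max[j] for j in range(n_traits)] for row in trait_rows]

def cosine_distance(t1, t2):
    """CD = 1 - CS, where CS is the cosine of the angle between t1 and t2."""
    dot = sum(a * b for a, b in zip(t1, t2))
    norm1 = math.sqrt(sum(a * a for a in t1))
    norm2 = math.sqrt(sum(b * b for b in t2))
    return 1.0 - dot / (norm1 * norm2)

# Hypothetical toy data: 3 species x 3 traits (the study uses 10 traits).
traits = [
    [10.0, 2.0, 0.5],
    [20.0, 4.0, 1.0],  # same proportions as the first species
    [5.0, 4.0, 0.1],
]
scaled = normalize_by_max(traits)
cd_01 = cosine_distance(scaled[0], scaled[1])  # proportional vectors: CD close to 0
cd_02 = cosine_distance(scaled[0], scaled[2])  # different shape: CD > 0
```

Because the first two toy species differ only in overall magnitude, their cosine distance is numerically zero, which illustrates the magnitude independence noted in the text.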
Complex network framework
The first category of networks that we study is the basin network. In the basin network, the nodes are the 2073 HUC8 regions and the edges depict the number of fish species that two basins share. Hence, the absence of an edge indicates that the two basins have no species in common. We demonstrate the construction of the basin network in Fig 1(a) for eight basins, each having some species. This network is weighted and undirected, with each edge-weight being the number of species shared by the two basins in question. We perform module-detection on the basin network to identify groups of basins with higher co-occurrences and their geographical distribution. For correlating co-occurrence information with phylogenetic closeness and trait similarity, we obtain basin-wise means of PD (〈PD〉) and CD (〈CD〉) of species within each basin. Further, to infer dependence between morphological and phylogenetic distances, we look for an association between these quantities by performing linear regression on the data of each cluster and obtaining R² and the corresponding p-value to ascertain the quality of the fit. To explore characteristic ranges of 〈PD〉 and 〈CD〉 in these clusters, we obtain clusters of randomly selected basins (from all the 2073 HUC8 regions), of equal size to the actual clusters. The ranges of 〈PD〉 and 〈CD〉 in the random clusters are then compared with the actual cluster ranges.
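As an illustration of the basin-network construction and the basin-wise means, the following Python sketch builds the weighted edge list from species sets and averages a pairwise distance within one basin. The data structures (a dictionary of species sets per basin, and distances keyed by unordered species pairs) and all names are assumptions made for this example, not the authors' implementation.

```python
from itertools import combinations

def basin_network(basin_species):
    """Return {(basin_a, basin_b): n_shared} for basin pairs sharing >= 1 species."""
    edges = {}
    for a, b in combinations(sorted(basin_species), 2):
        shared = len(basin_species[a] & basin_species[b])
        if shared > 0:  # no edge when the basins have no species in common
            edges[(a, b)] = shared
    return edges

def mean_pairwise(species, dist):
    """Mean of dist over all unordered species pairs; None for fewer than 2 species."""
    pairs = list(combinations(sorted(species), 2))
    if not pairs:
        return None
    return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

# Hypothetical toy data: three basins and pairwise phylogenetic distances.
basins = {"B1": {"s1", "s2", "s3"}, "B2": {"s2", "s3"}, "B3": {"s4"}}
pd_dist = {
    frozenset(("s1", "s2")): 100.0,
    frozenset(("s1", "s3")): 300.0,
    frozenset(("s2", "s3")): 200.0,
}
edges = basin_network(basins)
mean_pd_b1 = mean_pairwise(basins["B1"], pd_dist)  # basin-wise <PD> for B1
```

The same `mean_pairwise` helper would give 〈CD〉 if it were passed cosine distances instead of phylogenetic distances.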
Once we have explored the relationship between phylogenetic closeness and trait similarity from the basin network analyses, the next two questions we ask are: do these quantities correlate with, or explain variance in, the number of basins of co-occurrence? And which of these quantities is a more fundamental or stronger determinant of species co-occurrence? To explore these questions we look at the species networks, which are the second category of networks we construct and analyse.
Studying species interactions in a network framework allows us to gauge the overall connection structure in a single snapshot. To this end, we construct directed species networks wherein an edge from a node (species) to another node means that the latter is a nearest neighbor of the former. There are two networks like this, one of nearest neighbors based on phylogenetic distance, and one of nearest neighbors based on morphological distance. For the first network, for example, if two species are phylogenetically close to each other, i.e. they have a small PD between them, they are connected via an edge, and similarly, other node pairs in the network are assigned their connections. The number of outgoing connections is fixed beforehand and is the same for every source node. This results in directed networks wherein edges capture species-specific interactions or the asymmetry of interactions. For example, if species P is among the top nearest phylogenetic neighbours of species Q, it does not imply that species Q is among the top nearest phylogenetic neighbours of species P. Hence, if a directed edge exists from Q to P, it may not exist from P to Q. In this manner, we identify a fixed number of nearest neighbor (NN) nodes for each node in the network and obtain directed networks of fish species. The procedure for constructing species networks based on a pre-defined number of NN is explained in Fig 1(b). The matrix defining the connections in the network is called the adjacency matrix (A), where Aij = 1 denotes the presence and Aij = 0 the absence of a connection from node i to node j. As described, these are directed networks (identified by the presence of directed links between the nodes as shown in Fig 1(b)), as each of the species selects its NNs depending on the value of its PD or CD to target species. The two kinds of species networks constructed and analyzed in this work are:
- PD based Network: Each species in the network connects to NN other species that are phylogenetically closest to itself.
- CD based Network: Each species connects to NN other species that are trait-wise most similar to itself.
Fig 1. (a) The diagram shows the procedure for construction of the species co-occurrence based basin network of eight representative basins. Each basin is assumed to have a fixed number of species. The number of common species between two given basins determines the edge-weight between the basins, which is here represented by the thickness of the edge. On the right is the corresponding adjacency matrix for the network on the left, with numbers indicating the number of common species between the corresponding basins. (b) The schematic diagram demonstrates the procedure for construction of the network of species based on their nearest neighbours (NNs). The panel on the right shows an example network of five species, with each node connecting to two of its NNs. This network (main) is formed by aggregation of different sub-networks. The panel on the left shows these sub-networks; the first sub-network shows species S1 connecting to S2 and S3, the second shows S2 connecting to S1 and S4, and so on. Alongside each of these sub-networks and the main network is shown an adjacency matrix, where 1 (0) in a cell indicates the presence (absence) of a connection between the nodes. The network on the right shows all the connections in the sub-networks on the left, and its adjacency matrix is the sum of all the sub-network adjacency matrices. The solid edges are the ones explicitly shown in the sub-networks, and the dashed edges are edges from sub-networks whose display has been skipped on the left.
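The nearest-neighbour construction can be sketched as follows in Python. This is a minimal illustration under stated assumptions: the input is a full square distance matrix (PD or CD), the function name is hypothetical, and ties in distance are broken by node index, a choice the text does not specify.

```python
def knn_adjacency(dist, nn):
    """Directed adjacency matrix: A[i][j] = 1 if j is among the nn nearest
    neighbours of i according to the distance matrix dist, else 0."""
    n = len(dist)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Sort candidate neighbours by distance; ties broken by index (assumption).
        others.sort(key=lambda j: (dist[i][j], j))
        for j in others[:nn]:
            adj[i][j] = 1
    return adj

# Hypothetical symmetric distance matrix for four species.
dist = [
    [0.0, 1.0, 2.0, 9.0],
    [1.0, 0.0, 1.5, 8.0],
    [2.0, 1.5, 0.0, 3.0],
    [9.0, 8.0, 3.0, 0.0],
]
adj = knn_adjacency(dist, nn=1)
```

In this toy example the fourth species points to the third as its nearest neighbour, but the third points to the second, so the edge is not reciprocated. This is the asymmetry that makes the species networks directed even though the underlying distances are symmetric.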
Hence, for PD based species network, we connect all the 804 species to their NN phylogenetic nearest neighbours, and for CD based species network, we connect them to their NN trait-wise nearest neighbours. Further, for both categories of these species networks we identify modules (or groups) of species using a network community-detection algorithm and then explore co-occurrence of the species within modules, in the HUC8 regions. For species within each of these modules, we obtain mean over pair-wise PD (〈PD〉) and mean over pair-wise CD (〈CD〉). Following this, we obtain the number of HUC8 regions in which at least two of the species within the modules occur together. Hence, each of the species modules is assigned three quantities: 〈CD〉, 〈PD〉 and the number of basins of co-occurrence of at least two species. We denote the last quantity as NBS in the upcoming text.
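The quantity NBS can be computed directly from the occurrence data. The short Python sketch below counts the basins that contain at least two species of a module; the function name, the dictionary-of-sets layout, and the toy data are assumptions made for illustration.

```python
def count_cooccurrence_basins(module_species, basin_species):
    """NBS: number of basins containing at least two species of the module."""
    module = set(module_species)
    return sum(1 for present in basin_species.values() if len(module & present) >= 2)

# Hypothetical occurrence data: basin -> set of species present.
basins = {
    "B1": {"s1", "s2", "s3"},
    "B2": {"s2", "s3"},
    "B3": {"s1", "s4"},
    "B4": {"s4"},
}
nbs_a = count_cooccurrence_basins({"s1", "s2"}, basins)
nbs_b = count_cooccurrence_basins({"s2", "s3", "s4"}, basins)
```

Here only B1 holds both s1 and s2, while B1 and B2 each hold two members of the second module.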
As stated earlier, our goal is to ascertain which of these metrics plays a significant role in community assembly rules. Hence, we define two null hypotheses, H1 and H2, as follows.
- H1: The number of basins of co-occurrence of species is not related to their trait similarity.
- H2: The number of basins of co-occurrence of species is not related to their phylogenetic closeness.
Notice that the averages (denoted by 〈x〉) in the species network depict the mean of pair-wise x over species in a given module, whereas in the basin network they are the mean of pair-wise x over species in a given basin (HUC8). To summarize, we construct and analyze both of these kinds of networks (basin network and species networks) to assess the co-dependence between the three quantities, i.e. phylogenetic distance, morphological distance and probability of co-occurrence of species. While the basin network is helpful in understanding the geographical distributions of phylogenetic and morphological diversity, the species networks help us find meaningful species communities and their collective chances of occurrence vis-à-vis their phylogenetic and morphological relation. The following section explains the significance of and method for identifying modules in the networks.
Communities in the network
An important concept in a network study is that of a module [56]. A module is a set of nodes that are more densely connected among themselves than to the rest of the network. For example, in a collaboration network, this would mean a group of authors that write papers with each other more frequently than with other authors in the network. For our basin network, a module is a subset of basins that on average share more species within the subset than with the remaining basins. The identification and visualization of these modules can help in understanding which basins tend to cluster together and whether the modules are restricted to specific geographical locations. Similarly, a module in a network of species is a group of species that are phylogenetically (if the network is constructed based on PD) or trait-wise (if the network is constructed based on CD) closer to each other than to the rest of the species. To identify modules in the undirected basin network, we use the Louvain algorithm [57], which partitions the network into modules by optimizing network modularity (Q) [58]. For the species networks, we use the Leiden algorithm [59], which also uses modularity optimization and can efficiently identify modules in directed networks. A high value of Q (close to 1) indicates high divisibility of the network into clearly defined modules, and vice versa. Therefore, module compositions of the network are robust representations of the clustering of network nodes when the modularity is high, i.e., close to 1.
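To make the modularity score concrete, the sketch below evaluates Q for a given partition of an undirected weighted network using the standard Newman definition, written per module as Q = Σ_c [L_c/m − (d_c/2m)²], where m is the total edge weight, L_c the edge weight inside module c, and d_c the summed strength of its nodes. It only scores a partition; it does not perform the Louvain or Leiden optimization, and it does not cover the directed variant needed for the species networks. The function name and toy network are hypothetical, and self-loops are assumed absent.

```python
def modularity(edges, membership):
    """Newman modularity Q of a partition of an undirected weighted network.

    edges: {(u, v): weight}, each undirected edge listed once, no self-loops.
    membership: {node: module_label}.
    """
    two_m = 2.0 * sum(edges.values())
    strength = {node: 0.0 for node in membership}
    internal = {}  # total edge weight inside each module (L_c)
    for (u, v), w in edges.items():
        strength[u] += w
        strength[v] += w
        if membership[u] == membership[v]:
            label = membership[u]
            internal[label] = internal.get(label, 0.0) + w
    total = {}  # summed node strength of each module (d_c)
    for node, label in membership.items():
        total[label] = total.get(label, 0.0) + strength[node]
    return sum(
        2.0 * internal.get(c, 0.0) / two_m - (total[c] / two_m) ** 2
        for c in total
    )

# Hypothetical toy network: two triangles joined by a single edge.
edges = {
    ("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
    ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
    ("c", "d"): 1,
}
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
single = {node: 0 for node in split}
q_split = modularity(edges, split)    # the two triangles as separate modules
q_single = modularity(edges, single)  # everything in one module
```

Placing all nodes in one module gives Q = 0, while separating the two triangles gives a clearly positive Q, which is the sense in which higher modularity indicates better-defined modules.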
Results
Here we present our results from the analysis of basin network first and then from the species networks.
Relation between species phylogenetic distances and trait dissimilarity at the watershed level: Analysis of basin network
Using the basin network, we explore the relation between mean phylogenetic distances and mean trait dissimilarity of species within basins in the context of their geographical occurrence. Due to a higher species richness in the eastern part of the US than the western part (see S1c Fig), the network is denser in the eastern part, signifying a larger number of co-occurrences of species in those basins. On performing the module detection on the basin network, geographically clustered groups of basins are obtained. By definition, basins within a cluster share more species among them than they share with basins in other clusters. The obtained clusters are shown in different colours on the US map in Fig 2(a) (the network edges are lightened for visualization purposes). The clusters on the map reveal geographical basin groups that have larger co-occurrences of species. Additionally, the clusters on the West end (red) and East end (orange) of the map seem to be broadly demarcated from their adjacent clusters by geographical divides that separate watersheds flowing into different oceans (shown by black bold lines on the map).
Fig 2. (a) The figure shows a network of the US watersheds with nodes plotted at the centroids of 2073 HUC8s and colored according to the cluster they belong to; the edges are lightened for visualization. Only basins in the coterminous United States are shown here. The clusters are the modules identified using a community detection algorithm on the network of HUC8s, where the network is constructed based on the number of common species between HUC8s. The map was made with Natural Earth base maps data (public domain) and the USGS WBD (Watershed Boundary Dataset; U.S. public domain). The map also contains information about the major drainage basin boundaries (USGS), represented by the black bold lines that separate watersheds flowing into different oceans. (b) The figure shows morphological traits based mean cosine distance (〈CD〉) versus mean phylogenetic distance (〈PD〉) between the species in the basins in the HUC8 clusters. The colors are indicative of the module index. (c) The figure shows 〈CD〉 versus 〈PD〉 of randomly selected basins from across the network. The number of randomly selected basins is the same as in the actual modules shown in part (b). The blue lines in parts (b) and (c) are the best linear fits to the data.
For each of these basin clusters with highly co-occurring species, we obtain basin-wise means of PD and CD of species within each basin. On plotting mean CD (〈CD〉) versus mean PD (〈PD〉) for each of these five basin clusters, we make two interesting observations. The first one supports the correlation between 〈CD〉 and 〈PD〉 of the species, as can be seen from the subplots in Fig 2(b) for all five clusters. The R² scores from the regression analysis for the five clusters are 0.70, 0.48, 0.60, 0.70 and 0.60, with p-value < 0.05 for all. The regression lines for all the clusters are shown with blue solid lines in the plots.
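For reference, the R² of a simple linear regression such as 〈CD〉 on 〈PD〉 can be computed as in the Python sketch below. It is an illustrative ordinary least squares fit on made-up numbers, not the authors' analysis code, and it leaves out the p-value, which requires the t distribution and is usually taken from a statistics library (for example `scipy.stats.linregress`).

```python
def linear_fit_r2(x, y):
    """Ordinary least squares fit y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Made-up values, for illustration only.
x_demo = [1.0, 2.0, 3.0, 4.0]
y_demo = [1.0, 3.0, 2.0, 4.0]
slope, intercept, r2 = linear_fit_r2(x_demo, y_demo)
```

R² is the fraction of variance in the dependent variable accounted for by the fitted line, so a value of 0.70 for a cluster means that 〈PD〉 accounts for 70% of the variance in 〈CD〉 across its basins.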
The second observation is related to the ranges of 〈CD〉 and 〈PD〉 for basins within the clusters. These ranges can be read from Fig 2(b) for all five clusters. We observe that both the 〈CD〉 and 〈PD〉 ranges are towards higher values ([〈CD〉 ≥ 0.02] and [〈PD〉 ≥ 200]) for three clusters (1, 2, 4) out of five. This observation becomes clearer on comparison with random clusters. For all five random clusters, shown in Fig 2(c), although we again obtain a significant correlation between 〈CD〉 and 〈PD〉, the ranges of the quantities also cover the lowest values of the respective means, i.e. 〈CD〉 ≤ 0.02 and 〈PD〉 ≤ 200, apart from higher values. This is also true for other randomizations (not shown) over basin clusters.
On our dataset, we calculated the phylogenetic signal, a traditionally used measure of the statistical dependency between trait values of species as a consequence of their phylogenetic relation. In Table 1, we present the values of Blomberg’s K and Pagel’s λ, which suggest the existence of a strong phylogenetic signal for native fish species in the US.
Table 1. For each morphological trait, the table presents the phylogenetic signal in terms of two metrics, Blomberg’s K [11] and Pagel’s λ [60], along with their p-values. The ten morphological traits are relative eye size (EdHd), oral gape position (MoBd), relative maxillary length (JlHd), vertical eye position (EhBd), body elongation (BlBd), body lateral shape (HdBd), pectoral fin size (PFlBl), pectoral fin vertical position (PFiBd), caudal peduncle throttling (CFdCP) and maximum body length (Length).
Primary determinant of species co-occurrence: Phylogeny or morphology?
In the previous sub-section, we established that basin-level 〈PD〉 and 〈CD〉 are correlated quantities for US freshwater fish species, irrespective of the geographical location of the basin under consideration. This robust correlation between 〈PD〉 and 〈CD〉 suggests that either both or neither of these quantities should explain variance in the chances of species co-occurrence. In this sub-section our focus is to analyze PD and CD based species networks to ascertain which of these two quantities is a stronger determinant of species co-occurrence. The CD based networks for two nearest neighbour choices, NN = 10 and NN = 50, are shown in S3 Fig, with clusters obtained from the community detection procedure shown in different colours. For the higher number of NN, the networks are organized into a smaller number of modules and the clustering is of lower quality. This can be understood from the plots in S2a and S2b Fig, which show a decreasing value of modularity (Q) as a function of NN choice for both CD based and PD based networks.
To understand if the species co-occurrences are dictated by their trait dissimilarity, i.e., to test hypothesis H1 as defined in the methods section, we obtain the correlation of 〈CD〉 and NBS for the CD-based networks for a range of NN values (NN = 1 to NN = 100). These plots are shown in Fig 3 for NN = 10 (a) and NN = 50 (b), where data is sorted in ascending order of NBS. The horizontal dashed blue line is the mean over 〈CD〉 of all clusters. For each cluster in the CD-based network, we also obtain the 〈PD〉 of species within them. The red dots and the dashed red horizontal line in the same plots show the cluster-wise 〈PD〉 and the mean over 〈PD〉 of all the clusters, respectively. From these two plots, it appears that NBS increases with 〈CD〉, with some exceptions. Additionally, we observe that 〈CD〉 and the corresponding 〈PD〉 fluctuate similarly around their mean values (red and blue horizontal lines). To statistically test the dependence of NBS on 〈CD〉, i.e. to test our hypothesis H1, we obtain the R² coefficient and p-value with 〈CD〉 as the independent variable and NBS as the dependent variable (see values in Table 2). From this analysis, we find that the dependence is significant (p-value < 0.05) for the NN = 10 network but not for the NN = 50 network. On performing a similar test with 〈CD〉 as the independent variable and 〈PD〉 as the dependent variable, we again find that the dependence is significant (p-value < 0.05) for the NN = 10 network but not for the NN = 50 network.
For the network constructed using cosine distance between species, the figure shows the relationship between the number of basins having at least two of the species of the module (x-axis) and the corresponding means 〈CD〉 (right y-axis) and 〈PD〉 (left y-axis) of species within modules, for the NN = 10 (a) and NN = 50 (b) networks. Similarly, the bottom row shows the corresponding results for the species network constructed using PD between species, for NN = 10 (c) and NN = 50 (d).
x and y represent the independent and dependent variables, respectively, in the regression analysis. Results are shown for four settings, defined by the metric used for species network construction and the number of nearest neighbours. The R2 coefficients and p-values shown here are for a single experiment.
To check hypothesis H2, we perform a similar analysis with the PD-based network, i.e., we investigate whether 〈PD〉 can explain the variance in NBS. The results are shown in Fig 3(c) and 3(d). In this case, we see a clear trend in how the quantities vary with one another for the NN = 10 network (c) and for the NN = 50 network (d). To ascertain the significance of the observed relations between the three quantities, we obtain linear regression statistics for these networks in terms of the R2 coefficient and p-value, as shown in Table 2 for these particular nearest-neighbour choices. For the NN = 10 PD-based network, we find that 21.7% of the variance in NBS (R2 = 0.217, p < 0.05) is significantly explained by 〈PD〉. For the NN = 10 CD-based network, we observe a significant correlation between NBS and 〈CD〉 (R2 = 0.412, p < 0.05). The R2 values for the NN = 50 networks, presented alongside, show that the correlation between NBS and 〈CD〉 (for the CD-based network) and the correlation between NBS and 〈PD〉 (for the PD-based network) are not significant. This motivated us to explore the statistics for a range of NN values for both network types. Restricting ourselves to NN values that lead to higher modularity of the resulting networks, and hence to meaningful modules (see S2 Fig), we chose the range 2 < NN < 100 for both network types. The statistics for the full range of nearest-neighbour choices are presented in Fig 4(a), which shows R2 values, with p-value < 0.05 indicated by filled circles and p-value ≥ 0.05 by hollow circles. All the results in this figure are averages over 100 runs of the module identification algorithm for each NN choice, to account for any randomness in module assignment.
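The regression statistics described above can be sketched in a few lines of Python. The R2 of a simple linear regression equals the squared Pearson correlation between the module-level means and NBS. As a stand-in for the parametric regression p-value reported in Table 2, the sketch uses a permutation test, which is a different technique chosen here only to keep the example free of external libraries. The data values and function names are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared(x, y):
    """R2 of the simple linear regression of y on x."""
    return pearson_r(x, y) ** 2


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Share of label shuffles whose R2 is at least the observed R2.

    Assumption: a permutation test replaces the parametric p-value here.
    """
    rng = random.Random(seed)
    observed = r_squared(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if r_squared(x, y_perm) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Made-up module-level values: mean CD per module and the corresponding NBS.
mean_cd_per_module = [0.10, 0.12, 0.15, 0.18, 0.20, 0.25, 0.27, 0.30]
nbs_per_module = [40, 55, 52, 80, 95, 90, 120, 130]

r2 = r_squared(mean_cd_per_module, nbs_per_module)
p_value = permutation_p_value(mean_cd_per_module, nbs_per_module)
```

Repeating such a computation for every NN choice, and averaging over repeated runs of module detection, gives curves of the kind summarized in Fig 4.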
(a-b) Variation of R2 with the number of nearest neighbours (NN) for cosine distance (CD) and phylogenetic distance (PD) based networks, with y as the dependent variable and x as the independent variable. For the CD-based network, mean CD (〈CD〉) is the independent variable, and for the PD-based network, mean PD (〈PD〉) is the independent variable. The dependent variables in the former case (CD-based network) are the number of basins of co-occurrence of at least two species (NBS) and 〈PD〉, and in the latter case (PD-based network) they are NBS and 〈CD〉. Filled circles indicate that the corresponding p-value < 0.05, and hollow circles indicate p-value ≥ 0.05. (c-d) The corresponding plots for the random module analysis.
From the full-range plots (Fig 4(a)), we observe that there exist (almost) distinct ranges of NN values for which the hypotheses H1 and H2 can be rejected. For the PD-based networks, this range is 1 ≤ NN ≤ 10, but for the CD-based networks, the range is 10 < NN ≤ 23. Additionally, we observe that the significant R2 values for the CD-based network (〈CD〉 explaining variance in NBS) are higher (0.4 < R2 < 0.6) than those for the PD-based network (〈PD〉 explaining variance in NBS), where 0.0 < R2 ≤ 0.2. From the latter observation, we can infer that NBS depends more strongly on 〈CD〉 than on 〈PD〉. Next, in our full-range analysis, we return to the correlations between 〈CD〉 and 〈PD〉. For the CD-based network, we explore whether 〈CD〉 explains variance in 〈PD〉, and for the PD-based network, we explore whether 〈PD〉 explains variance in 〈CD〉. We observe, as shown in Fig 4(b), that the range of NN values over which 〈PD〉 explains variance in 〈CD〉 is much longer than the range over which 〈CD〉 explains variance in 〈PD〉. This observation indicates that species clusters obtained on the basis of phylogenetic closeness tend to incorporate morphologically similar species over a larger range of nearest neighbours.
So far, we have not given any biological meaning to the modules we identify, nor do we claim that these are unique representations of phylogenetically close or morphologically similar species. However, to understand whether (indirect) evidence points to the relevance of these modules, and to check the validity of our results, we examine the statistics obtained from random modules of species. To this end, we randomly select species and treat them as modules of the same sizes as the actual modules, for the whole NN range of interest. We observe that neither 〈CD〉 nor 〈PD〉 explains variance in NBS for any of the NN values. On the other hand, 〈CD〉 and 〈PD〉 are correlated with each other, but for a smaller range of NN than in the analysis using the modules of the actual networks. The plots for the random module analysis are shown in Fig 4(c) and 4(d). Note that here, too, the range of NN over which 〈PD〉 significantly explains variance in 〈CD〉 is longer than in the reverse case. Both these observations argue in favour of the modules we identify and the conclusions we draw from their analysis.
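The random-module control described above can be sketched as follows. The sketch assumes that random modules are obtained by shuffling the species pool once and cutting it into groups whose sizes match the detected modules, i.e., sampling without replacement; the study may have drawn its random modules differently. The function name, species labels and seed are hypothetical.

```python
import random


def random_modules_like(modules, species_pool, seed=0):
    """Build random modules with the same sizes as `modules`.

    Assumption: species are assigned without replacement, so the random
    modules partition the pool just as the detected modules do.
    """
    rng = random.Random(seed)
    shuffled = sorted(species_pool)  # sort first so a fixed seed is reproducible
    rng.shuffle(shuffled)
    randomized, start = [], 0
    for module in modules:
        randomized.append(set(shuffled[start:start + len(module)]))
        start += len(module)
    return randomized


# Made-up detected modules covering six species.
detected = [{"sp1", "sp2", "sp3"}, {"sp4", "sp5"}, {"sp6"}]
pool = set().union(*detected)
random_mods = random_modules_like(detected, pool, seed=1)
```

Module-level 〈CD〉, 〈PD〉 and NBS can then be recomputed on `random_mods` and passed through the same regression as the detected modules.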
Discussion
In this work, we set out to understand how phylogenetic diversity is related to (or explains) the morphological diversity of fish species, and whether morphology and phylogeny govern the species content of ecological fish communities. We use the phylogenetic distances, morphological traits, and occurrence information of native fish species in the HUC8 regions of the US. The dataset we use in our experiments covers a large geographical expanse in terms of species occurrence and also accounts for all native fishes in the US. To the best of our knowledge, our study is one of the first to utilize continental-level fish data for this domain of research.
Traditionally, ecologists have relied on the phylogenetic signal of morphological traits to understand the statistical dependency between trait values of species as a consequence of their phylogenetic relation. Although some recent studies have pointed out that the use of phylogenetic signal is not always reliable [12, 13], it is still the state-of-the-art method used in most ecological studies. Blomberg’s K and Pagel’s λ calculated on the data suggest the existence of a strong phylogenetic signal for native fish species in the US. This implies that morphological distance is statistically explained by phylogenetic distance. Our modular analysis of the basin network yielded significant R2 values (0.70, 0.48, 0.60, 0.70 and 0.60), implying that the corresponding fractions of the variance in 〈CD〉 are explained by 〈PD〉 in each basin cluster. This result not only (broadly) confirms the existence of a strong phylogenetic signal, but also shows, by design, how this relation varies across different geographical regions of the US. Through a comparison with random basin clusters, we find that basins (HUC8 regions) in the species-rich eastern end of the map (clusters 1, 4) show stronger statistical dependence between these quantities than the rest of the basins. These basins also happen to be those that have only higher values of mean phylogenetic distance and mean morphological distance, given the full range these quantities take over all the basins. In other words, species in basins with higher co-occurrences (in the eastern part of the US) are confined to larger 〈CD〉 and 〈PD〉 ranges, whereas this cannot be strictly said for the western part of the US (clusters 3, 5). Cluster 5 also has a uniquely high number of basins with high mean morphological distance, which means that a few basins in the west have the most morphologically distinct species. The species spatial network proposed by Layeghifard et al. (2015) [35] incorporates a spatial network of sites and species co-occurrence information into a common framework, which is used to infer the impact of dispersal on species assemblages. The modular structure of our basin network (Fig 2(a)) captures the impact of species dispersal, which is an important factor governing the meta-community structure of species [61]. Specifically, nearby basins, which have higher similarity in species composition, are spatially clustered together (modules); and the red and the orange clusters are separated from those in the middle (as explained by the geographical divides shown in black).
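The basin network discussed here links HUC8 regions according to the number of species they share. A minimal Python sketch of that weighting is given below; the basin labels, species lists, function name and the `min_shared` threshold are hypothetical, and any further normalization or thresholding applied in the study is not reproduced.

```python
from itertools import combinations


def shared_species_edges(basin_species, min_shared=1):
    """Weight each basin pair by the number of species present in both basins.

    `basin_species` maps a basin label to the set of species recorded there.
    Pairs sharing fewer than `min_shared` species receive no edge
    (an illustrative threshold).
    """
    edges = {}
    for a, b in combinations(sorted(basin_species), 2):
        shared = len(basin_species[a] & basin_species[b])
        if shared >= min_shared:
            edges[(a, b)] = shared
    return edges


# Made-up occurrence data for three basins.
occurrences = {
    "huc_a": {"sp1", "sp2", "sp3"},
    "huc_b": {"sp2", "sp3"},
    "huc_c": {"sp4"},
}
basin_edges = shared_species_edges(occurrences)
```

Module detection on such a weighted network groups basins with similar species composition, which is the structure interpreted geographically in Fig 2(a).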
Some studies have used co-occurrence as a means to infer trait matching between species that have antagonistic and mutualistic interactions [62]. Similarly, another study explored how the co-occurrence of plant species in the Cape Floristic Region is limited by the phylogenetic relationship between them [63]. In the present study, we also use the co-occurrence of freshwater fish species as a marker to infer the key determinant of their community structure among two well-known factors: morphology and phylogeny. Our CD-based and PD-based species network analysis reveals that both phylogenetic distance and morphological dissimilarity determine the chances of species co-occurrence, but that, of the two, morphological dissimilarity more strongly determines whether species co-occur. The latter observation aligns with the results of the study by Winston (1995) [14] on cyprinid fish species in the Red River basin in the US, which reports that morphologically more similar species pairs co-occurred to a lower degree than morphologically less similar species pairs. For our null-hypothesis tests (see Methods) pertaining to the primary determinant of species co-occurrence, we constructed networks of species in which interactions between species pairs are assigned if their phylogenetic (PD-based network) or morphological (CD-based network) distances are small. As a result, the modules in these networks are determined by, and have, small phylogenetic or morphological distances between species. For CD-based networks, the overall 〈CD〉 values are small, as expected, and similarly for PD-based networks, the 〈PD〉 values of different clusters are small when compared to the overall ranges CD and PD can take (as shown in the histograms for CD and PD in S1a and S1b Fig).
Although the 〈CD〉 and 〈PD〉 values of the clusters are low in the respective networks, they vary, and this allows us to investigate how the number of basins of co-occurrence (NBS) of species within the clusters varies with the 〈CD〉 and 〈PD〉 of the clusters. Hence, by studying the variation of NBS with 〈CD〉 and 〈PD〉, we could infer the statistical dependence between them. Moreover, the analysis of modules identified from the species networks allows for a more complete understanding of the relationship between morphology, phylogeny and the chances of species co-occurrence than looking at differences in mean values among these quantities. A recent study highlights the usefulness of modules of fish co-occurrence networks for characterizing interspecific relationships [33]. In addition, our observations on CD-based and PD-based networks are made for specific values of NN, and there are NN values for which the null hypotheses could not be rejected. The larger NN values for which this is true result in networks with a smaller number of large modules, which are also not very discrete, i.e., there is an increased number of inter-modular edges. Although these large modules show similar ranges of variation in 〈CD〉 and 〈PD〉 (as for low NN), their NBS values are restricted to a very small range of very high values. The smaller number of data points (modules) for higher NN values might have led to the correlations not being captured with statistical significance. Nevertheless, since meaningful modules correspond to good modularity scores (low NN values), our results stand firmly where this is true. Because the modularity score decreases (poorer modular structure) with increasing NN, we restrict ourselves to NN values between NN = 1 and NN = 100, which result in good modularity scores. For both the CD-based and the PD-based networks, Q declines as NN increases.
See S2 Fig, where Q(NN = 10) ≈ 0.7 and Q(NN = 50) ≈ 0.55 for CD-based networks, and Q(NN = 10) ≈ 0.9 and Q(NN = 50) ≈ 0.75 for PD-based networks. Apart from the modularity score, the number of clusters also decreases with increasing NN, due to the merging of modules obtained at smaller NN. For example, for CD-based networks, the number of clusters was 11 for NN = 10 but just 6 for NN = 50 (see S3 Fig). A similar situation occurs for the PD-based networks.
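For readers who wish to see how a modularity score of the kind quoted above is obtained, the following Python sketch evaluates Q for a directed, unweighted network and a given module assignment. It uses the commonly used directed form of modularity, Q = (1/m) Σ_c [L_c − K_c^out K_c^in / m], where m is the number of edges, L_c the number of edges inside module c, and K_c^out and K_c^in the total out- and in-degree of module c. That this is exactly the definition evaluated by the module detection software used in the study is an assumption. The edge list, labels and function name are hypothetical.

```python
from collections import Counter


def directed_modularity(edges, membership):
    """Modularity Q of a directed, unweighted graph under `membership`.

    `edges` is a list of (source, target) pairs; `membership` maps each node
    to its module label.
    """
    m = len(edges)
    intra = sum(1 for u, v in edges if membership[u] == membership[v])
    out_by_module, in_by_module = Counter(), Counter()
    for u, v in edges:
        out_by_module[membership[u]] += 1  # out-degree accumulated per module
        in_by_module[membership[v]] += 1   # in-degree accumulated per module
    expected = sum(out_by_module[c] * in_by_module[c]
                   for c in set(membership.values())) / m
    return (intra - expected) / m


# Toy directed network: two reciprocal pairs with no links between them.
toy_edges = [(0, 1), (1, 0), (2, 3), (3, 2)]
two_modules = {0: "a", 1: "a", 2: "b", 3: "b"}
one_module = {0: "a", 1: "a", 2: "a", 3: "a"}

q_two = directed_modularity(toy_edges, two_modules)
q_one = directed_modularity(toy_edges, one_module)
```

In this toy case, splitting the network into its two disconnected pairs gives Q = 0.5, whereas placing all nodes in one module gives Q = 0, which illustrates why merged, less discrete modules at larger NN go along with lower Q.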
To estimate the extent of co-occurrence (in the species network analyses), we used the number of “basins with at least two of the species in a module” (NBS) as the measure. Thus, for a basin to qualify as a co-occurrence basin for a given module of species, it should contain at least two of the species in the module, and NBS is the total number of such basins. We did not require a higher percentage of the species in the module, since there are basins with a very small number of species, and this can result in NBS being zero, especially for smaller modules. Therefore, to avoid trivial zero values of NBS while still retaining the definition of co-occurrence, we keep the bare minimum condition of at least two of the species in the module. We also checked that the results do not differ qualitatively when co-occurrence of a higher percentage of the species in the module is used as the criterion.
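The definition of NBS given above translates directly into code. The following Python sketch counts, for one module, the basins that contain at least two of its species; the basin labels, species names and function name are hypothetical, and the `min_shared` parameter is included only to show how a stricter criterion could be substituted.

```python
def count_cooccurrence_basins(module_species, basin_species, min_shared=2):
    """Number of basins containing at least `min_shared` species of the module (NBS)."""
    module = set(module_species)
    return sum(
        1
        for present in basin_species.values()
        if len(module & set(present)) >= min_shared
    )


# Made-up occurrence data: basin label -> species recorded in that basin.
basins = {
    "huc_a": {"sp1", "sp2", "sp3"},
    "huc_b": {"sp1"},
    "huc_c": {"sp2", "sp4"},
    "huc_d": {"sp3", "sp4", "sp5"},
}
module = {"sp1", "sp2", "sp4"}

nbs = count_cooccurrence_basins(module, basins)
nbs_strict = count_cooccurrence_basins(module, basins, min_shared=3)
```

Here only `huc_a` and `huc_c` hold two species of the module, so NBS is 2, while requiring all three species gives zero, the trivial outcome that the two-species criterion is meant to avoid.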
From our first main observation, we reiterate that although phylogenetic distances explain the morphological dissimilarity between fish species (basins lying close to regression line in Fig 2(b)), which also supports the biodiversity conservation strategies prioritizing phylogenetic diversity as a proxy, there are basins that have higher morphological diversity for a smaller phylogenetic diversity (basins lying far from the regression line in Fig 2(b)), for example in basin cluster 2 in eastern US. Our second observation, of morphology being a stronger determinant of species co-occurrence than phylogeny, also points to phylogeny being an imperfect proxy for biodiversity conservation. A few recent studies [5, 6, 64] also raise similar arguments.
Conclusion
This work proposes a novel framework for identifying correlations between phylogeny, morphology, and the co-occurrence of fish species. The complex networks framework, wherein connections are determined through nearest neighbours, not only allows local node-specific interactions to be modeled as edges; these local interactions can also cause the overall structure of the network to comprise clustered groups of species (revealed through module identification) that are functionally meaningful. Through our module-level analyses of species-species networks based on phylogenetic and morphological distances, and of the basin network based on the number of co-occurring species, we uncover the statistical dependence between fish phylogeny and morphology, which is also confirmed by traditional measures such as the phylogenetic signal of morphological traits. Additionally, using this framework, we determine which of the two quantities, fish phylogeny or fish morphology, is the stronger determinant of co-occurrence. We can extract a few take-home messages from our analysis. Firstly, phylogenetic distance (closeness) in fish species explains morphological distance (closeness) among species. Secondly, basins with higher species richness have higher mean phylogenetic distance and higher mean morphological distance than basins with lower richness. Thirdly, although both phylogenetic distance and trait dissimilarity are significant determinants of the number of basins of co-occurrence, morphological distance explains a greater degree of variance in the number of basins of co-occurrence than phylogenetic distance. In summary, our observations point to the important role that the evolutionary history of species plays in their form and function, and to the role of species morphology in their probability of co-occurrence.
Supporting information
S1 Fig.
(a-b) Distribution of phylogenetic distances and cosine distances between fish species. (c) Map of the number of species in each basin: the dots indicate the centroids of HUC8 regions (obtained from USGS) and the colours indicate the number of species present in the region. The range of species numbers is shown in the colour map alongside.
https://doi.org/10.1371/journal.pone.0287482.s001
(TIF)
S2 Fig.
(a-b) Variation of network modularity with the number of nearest neighbours (NN) in cosine distance (CD) and phylogenetic distance (PD) based networks. The vertical lines mark NN = 10 and NN = 50, and the horizontal line marks Q = 0.5 for (a) the CD-based network and (b) the PD-based network.
https://doi.org/10.1371/journal.pone.0287482.s002
(TIF)
S3 Fig. Nearest neighbor networks of species constructed based on Cosine distances between them.
(a) The network in which each node is connected to its 10 nearest neighbors shows 11 different modules, or network communities. (b) The network in which each node is connected to its 50 nearest neighbors shows 6 modules. The colours identify different network modules. In both networks, the directed edges within modules are shown in black and inter-modular edges are shown in grey.
https://doi.org/10.1371/journal.pone.0287482.s003
(TIF)
References
- 1. Kelly S, Grenyer R, Scotland RW. Phylogenetic trees do not reliably predict feature diversity. Diversity and Distributions. 2014;20(5):600–612.
- 2. Mouillot D, Bellwood DR, Baraloto C, Chave J, Galzin R, Harmelin-Vivien M, et al. Rare species support vulnerable functions in high-diversity ecosystems. PLoS biology. 2013;11(5):e1001569. pmid:23723735
- 3. Mouillot D, Villéger S, Parravicini V, Kulbicki M, Arias-González JE, Bender M, et al. Functional over-redundancy and high functional vulnerability in global fish faunas on tropical reefs. Proceedings of the National Academy of Sciences. 2014;111(38):13757–13762. pmid:25225388
- 4. Faith DP. Conservation evaluation and phylogenetic diversity. Biological conservation. 1992;61(1):1–10.
- 5. Winter M, Devictor V, Schweiger O. Phylogenetic diversity and nature conservation: where are we? Trends in ecology & evolution. 2013;28(4):199–204. pmid:23218499
- 6. Mazel F, Mooers AO, Riva GVD, Pennell MW. Conserving phylogenetic diversity can be a poor strategy for conserving functional diversity. Systematic Biology. 2017;66(6):1019–1027. pmid:28595366
- 7. Díaz S, Purvis A, Cornelissen JH, Mace GM, Donoghue MJ, Ewers RM, et al. Functional traits, the phylogeny of function, and ecosystem service vulnerability. Ecology and evolution. 2013;3(9):2958–2975. pmid:24101986
- 8. Xu J, Chen Y, Zhang L, Chai Y, Wang M, Guo Y, et al. Using phylogeny and functional traits for assessing community assembly along environmental gradients: A deterministic process driven by elevation. Ecology and Evolution. 2017;7(14):5056–5069. pmid:28770046
- 9. de Bello F, Berg MP, Dias AT, Diniz-Filho JAF, Götzenberger L, Hortal J, et al. On the need for phylogenetic ‘corrections’ in functional trait-based approaches. Folia Geobotanica. 2015;50:349–357.
- 10. Martiny AC, Treseder K, Pusch G. Phylogenetic conservatism of functional traits in microorganisms. The ISME journal. 2013;7(4):830–838. pmid:23235290
- 11. Blomberg SP, Garland T Jr, Ives AR. Testing for phylogenetic signal in comparative data: behavioral traits are more labile. Evolution. 2003;57(4):717–745. pmid:12778543
- 12. Mouquet N, Devictor V, Meynard CN, Munoz F, Bersier LF, Chave J, et al. Ecophylogenetics: advances and perspectives. Biological reviews. 2012;87(4):769–785. pmid:22432924
- 13. Revell LJ, Harmon LJ, Collar DC. Phylogenetic signal, evolutionary process, and rate. Systematic biology. 2008;57(4):591–601. pmid:18709597
- 14. Winston MR. Co-occurrence of morphologically similar species of stream fishes. The American Naturalist. 1995;145(4):527–545.
- 15. Krasnov BR, Fortuna MA, Mouillot D, Khokhlova IS, Shenbrot GI, Poulin R. Phylogenetic signal in module composition and species connectivity in compartmentalized host-parasite networks. The American Naturalist. 2012;179(4):501–511. pmid:22437179
- 16. Baraloto C, Hardy OJ, Paine CT, Dexter KG, Cruaud C, Dunning LT, et al. Using functional traits and phylogenetic trees to examine the assembly of tropical tree communities. Journal of ecology. 2012;100(3):690–701.
- 17. Bascompte J. Networks in ecology. Basic and Applied Ecology. 2007;8(6):485–490.
- 18. Götzenberger L, de Bello F, Bråthen KA, Davison J, Dubuis A, Guisan A, et al. Ecological assembly rules in plant communities—approaches, patterns and prospects. Biological reviews. 2012;87(1):111–127. pmid:21692965
- 19. Levine JM, Bascompte J, Adler PB, Allesina S. Beyond pairwise mechanisms of species coexistence in complex communities. Nature. 2017;546(7656):56–64. pmid:28569813
- 20. Spasojevic MJ, Suding KN. Inferring community assembly mechanisms from functional diversity patterns: the importance of multiple assembly processes. Journal of Ecology. 2012;100(3):652–661.
- 21. Peres-Neto PR. Patterns in the co-occurrence of fish species in streams: the role of site suitability, morphology and phylogeny versus species interactions. Oecologia. 2004;140(2):352–360. pmid:15138880
- 22. Webb CO, Ackerly DD, McPeek MA, Donoghue MJ. Phylogenies and community ecology. Annual review of ecology and systematics. 2002;33(1):475–505.
- 23. Helmus MR, Savage K, Diebel MW, Maxted JT, Ives AR. Separating the determinants of phylogenetic community structure. Ecology letters. 2007;10(10):917–925. pmid:17845292
- 24. Su G, Villéger S, Brosse S. Morphological diversity of freshwater fishes differs between realms, but morphologically extreme species are widespread. Global Ecology and Biogeography. 2019;28(2):211–221.
- 25. Rabosky DL, Chang J, Cowman PF, Sallan L, Friedman M, Kaschner K, et al. An inverse latitudinal gradient in speciation rate for marine fishes. Nature. 2018;559(7714):392–395. pmid:29973726
- 26. Onnela JP, Saramäki J, Hyvönen J, Szabó G, De Menezes MA, Kaski K, et al. Analysis of a large-scale weighted network of one-to-one human communication. New journal of physics. 2007;9(6):179.
- 27. Szell M, Lambiotte R, Thurner S. Multirelational organization of large-scale social networks in an online world. Proceedings of the National Academy of Sciences. 2010;107(31):13636–13641.
- 28. Newman ME. The structure and function of complex networks. SIAM review. 2003;45(2):167–256.
- 29. Li J, Convertino M. Inferring ecosystem networks as information flows. Scientific reports. 2021;11(1):1–22.
- 30. Bastos RC, Brasil LS, Oliveira-Junior JMB, Carvalho FG, Lennox GD, Barlow J, et al. Morphological and phylogenetic factors structure the distribution of damselfly and dragonfly species (Odonata) along an environmental gradient in Amazonian streams. Ecological Indicators. 2021;122:107257.
- 31. Rezende EL, Jordano P, Bascompte J. Effects of phenotypic complementarity and phylogeny on the nested structure of mutualistic networks. Oikos. 2007;116(11):1919–1929.
- 32. Fontaine C, Thébault E. Comparing the conservatism of ecological interactions in plant–pollinator and plant–herbivore networks. Population Ecology. 2015;57(1):29–36.
- 33. McGarvey DJ, Veech JA. Modular structure in fish co-occurrence networks: A comparison across spatial scales and grouping methodologies. Plos one. 2018;13(12):e0208720. pmid:30550572
- 34. Layeghifard M, Peres-Neto PR, Makarenkov V. Using directed phylogenetic networks to retrace species dispersal history. Molecular phylogenetics and evolution. 2012;64(1):190–197. pmid:22491069
- 35. Layeghifard M, Makarenkov V, Peres-Neto PR. Spatial and species compositional networks for inferring connectivity patterns in ecological communities. Global Ecology and Biogeography. 2015;24(6):718–727.
- 36. Shukla R, Bhat A. Patterns and drivers of species co-occurrence networks in a tropical stream fish metacommunity. Hydrobiologia. 2022;849(12):2797–2811.
- 37. Echevarría G, Rodríguez J. Co-occurrence patterns of fish species in two aquatic habitats of the Arauca River floodplain, Venezuela. Community Ecology. 2017;18(2):137–148.
- 38. Rezende EL, Albert EM, Fortuna MA, Bascompte J. Compartments in a marine food web associated with phylogeny, body mass, and habitat structure. Ecology Letters. 2009;12(8):779–788. pmid:19490028
- 39. Muneepeerakul R, Bertuzzo E, Lynch HJ, Fagan WF, Rinaldo A, Rodriguez-Iturbe I. Neutral metacommunity models predict fish diversity patterns in Mississippi–Missouri basin. Nature. 2008;453(7192):220–222. pmid:18464742
- 40. Anas MM, Mandrak NE. Drivers of native and non-native freshwater fish richness across North America: Disentangling the roles of environmental, historical and anthropogenic factors. Global Ecology and Biogeography. 2021;30(6):1232–1244.
- 41. Comte L, Grantham T, Ruhi A. Human stabilization of river flows is linked with fish invasions across the USA. Global Ecology and Biogeography. 2021;30(3):725–737.
- 42. Qian H, Cao Y, Chu C, Li D, Sandel B, Wang X, et al. Taxonomic and phylogenetic β-diversity of freshwater fish assemblages in relationship to geographical and climatic determinants in North America. Global Ecology and Biogeography. 2021;30(10):1965–1977.
- 43. Brosse S, Charpin N, Su G, Toussaint A, Herrera-R GA, Tedesco PA, et al. FISHMORPH: A global database on morphological traits of freshwater fishes. Global Ecology and Biogeography. 2021;30(12):2330–2336.
- 44. Su G, Logez M, Xu J, Tao S, Villéger S, Brosse S. Human impacts on global freshwater fish biodiversity. Science. 2021;371(6531):835–838. pmid:33602854
- 45. Froese R, Winker H, Coro G, Demirel N, Tsikliras AC, Dimarchopoulou D, et al. A new approach for estimating stock status from length frequency data. ICES Journal of Marine Science. 2018;75(6):2004–2015.
- 46. Penone C, Davidson AD, Shoemaker KT, Di Marco M, Rondinini C, Brooks TM, et al. Imputation of missing data in life-history trait datasets: Which approach performs the best? Methods in Ecology and Evolution. 2014;5(9):961–970.
- 47. Stekhoven DJ, Bühlmann P. MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics. 2012;28(1):112–118. pmid:22039212
- 48. Su G, Mertel A, Brosse S, Calabrese JM. Species invasiveness and community invasibility of US freshwater fish fauna revealed via trait-based analysis. bioRxiv. 2022. https://doi.org/10.1101/2022.03.04.481515.
- 49. Su G, Villéger S, Brosse S. Morphological sorting of introduced freshwater fish species within and between donor realms. Global Ecology and Biogeography. 2020;29(5):803–813.
- 50. Toussaint A, Brosse S, Bueno CG, Pärtel M, Tamme R, Carmona CP. Extinction of threatened vertebrates will lead to idiosyncratic changes in functional diversity across the world. Nature communications. 2021;12(1):5162. pmid:34453040
- 51. Carmona CP, Tamme R, Pärtel M, de Bello F, Brosse S, Capdevila P, et al. Erosion of global functional diversity across the tree of life. Science Advances. 2021;7(13):eabf2675. pmid:33771870
- 52. Paradis E, Blomberg S, Bolker B, Brown J, Claude J, Cuong HS, et al. Package ‘ape’. Analyses of phylogenetics and evolution, version. 2019;2(4):47.
- 53. Sidorov G, Gelbukh A, Gómez-Adorno H, Pinto D. Soft similarity and soft cosine measure: Similarity of features in vector space model. Computación y Sistemas. 2014;18(3):491–504.
- 54. Reyes N, Connor R, Kriege N, Kazempour D, Bartolini I, Schubert E, et al. Similarity Search and Applications. vol. 13058. Springer Nature; 2021.
- 55. Xia P, Zhang L, Li F. Learning similarity with cosine similarity ensemble. Information Sciences. 2015;307:39–52.
- 56. Newman ME, Reinert G. Estimating the number of communities in a network. Physical review letters. 2016;117(7):078301. pmid:27564002
- 57. Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment. 2008;2008(10):P10008.
- 58. Newman ME. Equivalence between modularity optimization and maximum likelihood methods for community detection. Physical Review E. 2016;94(5):052315. pmid:27967199
- 59. Traag VA, Waltman L, Van Eck NJ. From Louvain to Leiden: guaranteeing well-connected communities. Scientific reports. 2019;9(1):1–12. pmid:30914743
- 60. Pagel M. Inferring the historical patterns of biological evolution. Nature. 1999;401(6756):877–884. pmid:10553904
- 61. Heino J, Alahuhta J, Ala-Hulkko T, Antikainen H, Bini LM, Bonada N, et al. Integrating dispersal proxies in ecological and environmental research in the freshwater realm. Environmental Reviews. 2017;25(3):334–349.
- 62. Bartomeus I, Gravel D, Tylianakis JM, Aizen MA, Dickie IA, Bernard-Verdier M. A common framework for identifying linkage rules across different types of interactions. Functional Ecology. 2016;30(12):1894–1903.
- 63. Slingsby JA, Verboom GA. Phylogenetic relatedness limits co-occurrence at fine spatial scales: evidence from the schoenoid sedges (Cyperaceae: Schoeneae) of the Cape Floristic Region, South Africa. The American Naturalist. 2006;168(1):14–27. pmid:16874612
- 64. Mazel F, Pennell MW, Cadotte MW, Diaz S, Dalla Riva GV, Grenyer R, et al. Prioritizing phylogenetic diversity captures functional diversity unreliably. Nature Communications. 2018;9(1):1–9. pmid:30038259