Abstract
We propose a novel similarity-based clustering approach to venture capital investors that takes as input the bipartite graph of funding interactions between investors and startups and returns clusterings of investors built upon 5 characteristic dimensions. We first validate that investors are clustered in a meaningful manner and present methods of visualizing cluster characteristics. We further analyze the temporal dynamics at the cluster level and observe a meaningful second-order evolution of the sectoral investment trends. Finally, and surprisingly, we report that clusters appear stable even when running the clustering algorithm with all but one of the 5 characteristic dimensions, for instance observing geography-focused clusters without taking into account the geographical dimension or sector-focused clusters without taking into account the sectoral dimension, suggesting the presence of significant underlying complex investment patterns.
Citation: Carniel T, Halloy J, Dalle J-M (2023) A novel clustering approach to bipartite investor-startup networks. PLoS ONE 18(1): e0279780. https://doi.org/10.1371/journal.pone.0279780
Editor: Zilin Gao, Chongqing Three Gorges University, CHINA
Received: August 23, 2022; Accepted: December 14, 2022; Published: January 5, 2023
Copyright: © 2023 Carniel et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Data cannot be shared publicly because of third party ownership. The data underlying the results presented in the study are available through the Crunchbase Academic Research Access Program (https://about.crunchbase.com/partners/academic-research-access/) or by subscribing to the Crunchbase API (https://data.crunchbase.com/docs/using-the-api).
Funding: The authors received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Within the active field of entrepreneurship research [1], quantitative analyses of the structural properties of investor-startup interactions have so far been conducted on a simplified version of the investor-startup network, namely the network of investor-investor relationships, through the construction of syndication networks where two investors are linked if they either invested jointly in a startup or have a common startup in their portfolios [2–4]. The venture capital network, however, actually consists of investor nodes interacting with startup nodes through funding events that occur relatively sparsely and according to a sequence of so-called stages (Seed, Series A, Series B, Series C, etc.). As a consequence, investor-startup interactions could and should be associated with a temporal bipartite network structure, of which the previously mentioned syndication networks are, in reality, one-mode projections, with valuable structural information being lost in this folding process [5]. These limitations are typically manifest when trying to account for the important structural heterogeneity between investors: startup investors differ markedly with respect to sectoral specialization, to the average amounts invested (from hundreds of thousands of dollars to hundreds of millions), or to their geographical focus, to name but a few relevant dimensions. Ignoring this heterogeneity or failing to address it appropriately results in biased, if not misleading, conclusions, and certainly makes the observation and characterization of larger-scale collective phenomena in entrepreneurial ecosystems, and of their temporal dynamics, an impossible task.
Community detection algorithms [6, 7] have been applied to traditional syndication networks but have either failed to incorporate explicit information about investment stages [8], which typically results in overestimating actors who invest early in startups and are therefore linked to numerous subsequent investors through syndication links, or have relied on a semi-supervised approach [9] that depends on an ex ante, partly subjective and/or largely unavailable segmentation of investors, or else have been structurally limited by the definition of the networks studied: [10], using a modularity-based community detection algorithm, identifies communities of investors based on their interactions, but cannot do so based on their similarity and is therefore unable to address the structural heterogeneity of investors. Syndication networks, as one-mode projections, cannot capture the complex and multi-layered interactions characteristic of bipartite venture networks, and therefore relevant aspects of entrepreneurial ecosystems are lost.
More recent methods such as multi-view data clustering [11–13] are promising, but are not able to deal with our specific constraints: our data is fundamentally bipartite, with each of the views containing different types of data (numerical vs. categorical vs. logarithmic) that are either node-based or edge-based. Specific clustering algorithms incorporating domain-specific knowledge to cluster similar investors through their position and representation along the various axes of the complex bipartite multilayer multigraph are thus necessary in order to study investment dynamics in the investor-startup network.
New analytical tools are required to take advantage of the distinctive structure of these networks and to extract more information, together with more complete datasets that allow building both sides of the bipartite networks and the interactions between them. Fortunately, the use of databases giving both large-scope and in-depth data on investor and startup companies and on their interactions is now rapidly becoming standard [14] while, following notably the ecological literature, methods for bipartite graph analysis have recently become more developed and accessible [15]. In this context where both tools and materials have become available, we initiate in this article an enriched analysis of interactions in entrepreneurial networks and ecosystems, with a direct look at the funding events rather than at the syndication shadow they project.
Objectives
We propose a novel, unsupervised clustering approach for entrepreneurial investors that mitigates some of the difficulties described earlier. It was developed both as a direct tool to probe and characterize the typology of actors in venture capital ecosystems and as a methodological building block for the quantitative analysis of the dynamics of entrepreneurial ecosystems. Our method is based on an unsupervised community detection algorithm using a Hellinger-based similarity measure, computed over all pairs of investors, and accounting for 5 well-defined characteristic dimensions to describe investors. As a consequence, the similarity between investors is easily quantifiable and interpretable compared to traditional clustering methods based on machine learning techniques, although significant progress has been made there in terms of interpretability [16]. The similarity graph pruning threshold is the only parameter, and the number of output classes is freely determined by the clustering algorithm and is not constrained. As it happens, this method also allows for a controlled modification of the clustering parameters and features, which results in the identification of unexpected community-level patterns that help better understand the dynamics of the different classes of investors.
Materials and methods
Dataset
The dataset used for this study was extracted through the Crunchbase API on October 7th, 2020. It contains information on 1 156 085 startups (name, creation date, headquarters location, sectors of activity), 348 020 funding events (target startup, date, investors involved, amount, investment stage), 159 585 investors (name, creation date, investor type, investor location) and 1 067 089 individuals (name, past and current professional experiences, level and sectors of education, company board memberships and advisory roles). We removed the Software sector from all startups’ sectors of activity as this tag is overrepresented (it occurs in roughly 25% of startups, almost twice as often as the second most frequent tag) and is relatively non-descriptive.
Investor-startup network
We create a temporal bipartite multigraph where top nodes are the investors, bottom nodes are the startups and edges correspond to funding events between the investor and the startup (see Fig 1 for a schematic representation of the graph). As an investor can fund a startup at several points in time, two nodes can be linked through several temporal edges. We removed nodes for which the geographical information was not available and edges where the financing event was not an investment event (grants, debt financing, etc.), and afterwards removed isolated nodes as they do not take part in the network interactions studied. This process resulted in a network with 65 653 top nodes, 95 329 bottom nodes and 392 204 edges linking these two sets.
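As an illustration of the data structure described above, the following minimal sketch stores the temporal bipartite multigraph as a mapping from (investor, startup) pairs to the list of their funding dates. The function name `build_bipartite_multigraph` and the `(investor, startup, date)` record layout are hypothetical choices made for this example and do not reflect the Crunchbase schema.

```python
from collections import defaultdict

def build_bipartite_multigraph(funding_events):
    """Build a temporal bipartite multigraph from funding events.

    funding_events: iterable of (investor, startup, date) tuples
    (hypothetical record layout chosen for this sketch).
    Returns the investor set (top nodes), the startup set (bottom nodes)
    and a dict mapping (investor, startup) to the list of funding dates,
    so that repeated investments appear as parallel temporal edges.
    """
    edges = defaultdict(list)
    for investor, startup, date in funding_events:
        edges[(investor, startup)].append(date)
    # Nodes are derived from the edges, so isolated nodes never appear.
    investors = {investor for investor, _ in edges}
    startups = {startup for _, startup in edges}
    return investors, startups, dict(edges)
```

Because nodes are collected from the edge list, filtering out non-investment events before calling the function removes the resulting isolated nodes automatically.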
Fig 1. The red nodes on the left represent investor nodes, the blue nodes on the right represent startup nodes. The edges between investor node i and startup node s represent a funding interaction where investor i invested in startup s at a given time. As an investor can invest in a startup several times, multiple edges can connect two given nodes as shown on the figure.
Hellinger distance and investor similarity
The Hellinger distance h [17] and the associated similarity θ between two normalized discrete probability distributions P and Q are defined as:
$$h(P, Q) = \frac{1}{\sqrt{2}} \left\lVert \sqrt{P} - \sqrt{Q} \right\rVert_2 \tag{1}$$
$$\theta(P, Q) = 1 - h(P, Q) \tag{2}$$
where ‖.‖2 is the Euclidean (or L2) norm [18] and √P is the vector whose elements are the square roots of the elements of P. By definition, 0 ≤ h(P, Q) ≤ 1 and thus 0 ≤ θ(P, Q) ≤ 1, with θ = 0 corresponding to minimal similarity (maximal distance) and θ = 1 to maximal similarity (minimal distance) between two distributions. The Hellinger distance is used as the probability distributions are low-dimensional and it has been shown to be more suitable than Minkowski distances for probability vector comparisons [19–21].
The similarity Θ between two investors a and b is then defined as follows:
$$\Theta(a, b) = \frac{1}{n} \sum_{k=1}^{n} \theta\left(P_a^k, P_b^k\right) \tag{3}$$
where $P_a^k$ is the distribution characterizing investor a along the k-th dimension and n the total number of dimensions characterizing an investor.
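The definitions above can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard 1/√2 normalization of the Hellinger distance (which keeps h within [0, 1]) and assuming that the investor similarity Θ is the plain average of the per-dimension similarities θ. The function names are introduced here for illustration only.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions on the same support."""
    if len(p) != len(q):
        raise ValueError("distributions must be defined on the same support")
    squared = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(squared / 2.0)

def similarity(p, q):
    """Similarity theta = 1 - h, equal to 1 for identical distributions."""
    return 1.0 - hellinger(p, q)

def investor_similarity(a, b):
    """Average per-dimension similarity between two investors.

    a, b: lists of n distributions, one per characteristic dimension
    (assumption: the n dimensions are weighted equally).
    """
    if len(a) != len(b):
        raise ValueError("investors must be described by the same dimensions")
    return sum(similarity(pa, pb) for pa, pb in zip(a, b)) / len(a)
```

For instance, two investors that coincide on one dimension and have disjoint support on another obtain a similarity of 0.5 under these assumptions.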
Investor characterization
We characterize investors along n = 5 dimensions related to their investments in startups, each associated with a frequency distribution. The dimensions were chosen to collectively describe investment portfolios as exhaustively as possible and therefore to characterize investors accurately. Within the bipartite graph, these dimensions depend either on the edges linking an investor to startups (for instance the date of the investment, as several different temporal edges can link an investor and a startup) or on the startup nodes (e.g. the geographical location of an investment made by investor i targeting startup s is the geographical location of startup s). These characteristic dimensions can be measured for all investors, are public enough that the information is available for most transactions, and are linked to common descriptors used by practitioners of the domain to characterize investors (for instance early-stage vs. late-stage [22], domestic vs. international [23], specialized vs. generalist [24], historical vs. emergent [25]).
- Temporal investment distribution: the frequency of investments per year of the investor (Fig 2). This is an edge attribute.
- Geographical investment distribution: the frequency of investments of the investor in each country (an investor invests in a country if the target startup’s headquarters are located in the country) (Fig 3). This is a startup node attribute.
- Sectoral investment distribution: the frequency of investments of the investor in each sector of activity (an investor counts as investing in a sector if the target startup of the investment is labeled in this sector) (Fig 4). This is a startup node attribute.
- Stage investment distribution: the frequency of investments of the investor in each stage of the venture capital cycle (Fig 5). This is an edge attribute.
- Amount investment distribution: log-binned distribution of the funding amounts of all investments of the investor in USD (Fig 6). Logarithmic binning was used because the amounts of start-up financing rounds follow a power-law type distribution [26]. This is an edge attribute.
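The five distributions listed above can be computed with two small helpers, sketched below. The categorical helper covers the temporal, geographical, sectoral and stage dimensions (for sectors, the caller would pass one entry per sector tag of each funded startup). The log-binning helper covers the amount dimension. The bin boundaries (`log_min`, `log_max`, `bins_per_decade`) are illustrative assumptions, as the exact binning used in the study is not specified here.

```python
import math
from collections import Counter

def frequency_distribution(values, support):
    """Normalized frequency of each category of `support` among `values`."""
    counts = Counter(values)
    total = sum(counts[category] for category in support)
    if total == 0:
        raise ValueError("no observation falls within the support")
    return [counts[category] / total for category in support]

def log_binned_distribution(amounts, log_min=4, log_max=9, bins_per_decade=1):
    """Normalized histogram of amounts over logarithmic (base-10) bins.

    Bins span 10**log_min to 10**log_max USD (assumed range); amounts
    outside this range or non-positive are ignored.
    """
    n_bins = (log_max - log_min) * bins_per_decade
    counts = [0] * n_bins
    for amount in amounts:
        if amount <= 0:
            continue
        index = math.floor((math.log10(amount) - log_min) * bins_per_decade)
        if 0 <= index < n_bins:
            counts[index] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no amount falls within the binning range")
    return [count / total for count in counts]
```

Using a fixed, shared support for every investor keeps the resulting vectors comparable, which the Hellinger-based similarity requires.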
Fig 2. Temporal investment distribution of Softbank Capital (A), a telecom-focused US-based venture capitalist that stopped its activity in 2017, and of Y Combinator (B), a US-based startup accelerator founded in 2005. The two temporal patterns of activity are quite different between the two structures, as Softbank Capital stops investing near the end of the period whereas Y Combinator’s activity steadily grows throughout the whole period.
Fig 3. Geographical investment distribution of Softbank Capital (A), and Y Combinator (B). Only the top 4 target countries in terms of frequency of investment are labeled. Both structures heavily target US-based ventures.
Fig 4. Sectoral investment distribution of Softbank Capital (A) and Y Combinator (B). Only the top 8 sectors of investment are labeled. Softbank Capital shows a strong focus on IT-related ventures whereas Y Combinator shows a wider sectoral breadth.
Fig 5. Stage investment distribution of Softbank Capital (A) and Y Combinator (B). Softbank Capital shows a strong focus on late-stage investment (most of its investments are in Series B or later) whereas Y Combinator shows a very strong early-stage specialization (over 80% of its investments in Seed stage).
Fig 6. Amount investment distribution of Softbank Capital (A) and Y Combinator (B). In line with Fig 5, we see that Softbank Capital invests relatively high amounts (peak frequency of investment between 6 million USD and 10 million USD) whereas Y Combinator invests smaller amounts in a very systematic manner (peak frequency of investment between 80 000 USD and 200 000 USD). This is in line with the accelerator model where accelerators invest a set amount in all ventures they decide to support. Furthermore, Y Combinator has also developed funds such as Y Combinator Continuity dedicated to investing in its alumni companies after their initial investment. This can be seen in the small bump in the funding amount distribution between 700 000 USD and 10 million USD.
Self-difference index
For each community g and each year t in the period of study, the set $S_g(t)$ of the top p sectors in terms of number of investments is computed. The self-difference index d ∈ [0, 1] between years t1 and t2 for community g is defined as follows:
$$d_g(t_1, t_2) = \frac{\left\lvert S_g(t_1) \,\Delta\, S_g(t_2) \right\rvert}{2P} \tag{4}$$
where Δ is the symmetric difference between both sets and P is the total number of sectors in each set (P = p). This self-difference index ranges from 0 (identical sets) to 1 (no overlap between the top p sectors of investment at year t1 and the top p sectors of investment at year t2). As there is a natural inflation in the number of investment rounds due to an increase in venture capital activity during the latter part of the period of study, the index takes into account the ordering of the sectors in terms of number of investments rather than the raw number of investments.
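A minimal sketch of this index is given below. It assumes that the size of the symmetric difference is normalized by 2p, which is the choice that yields 0 for identical sets and 1 for two non-overlapping sets of p sectors each. The helper names are hypothetical.

```python
from collections import Counter

def top_sectors(sector_tags, p):
    """Set of the p most frequent sector tags among a year's investments.

    Ties are broken by order of first appearance (behavior of
    Counter.most_common), which is an arbitrary choice for this sketch.
    """
    return {tag for tag, _ in Counter(sector_tags).most_common(p)}

def self_difference(top_t1, top_t2, p):
    """Self-difference index between two top-p sector sets, in [0, 1]."""
    # The ^ operator on sets is the symmetric difference.
    return len(top_t1 ^ top_t2) / (2 * p)
```

Since only set membership matters, the index is insensitive to the overall growth in the number of funding rounds over the period.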
Results
Investor communities
Clustering.
We reduce the set of top nodes (investors) worldwide to top nodes with degree d ≥ 60 investments throughout the 1998–2019 period (a low number for a professional investor over this time frame) to ensure a sufficient number of observations for each dimension characterizing an investor. Note that the same clustering results hold for a graph reduced to investors with d ≥ 100 investments. This procedure results in 1014 investor nodes in the final graph with 159 353 edges connecting them to startup nodes, isolated nodes being removed (see previous section). We compute the pairwise similarity Θ as defined in Eq 3 between all investors in our sample and then define a complete weighted similarity graph with investors as nodes and the similarity between two investors as edge weights. We prune the graph by retaining for each investor the 1% of edges with the highest similarity. We then run the best_partition community detection algorithm from the Python community package [27], resulting in an investor clustering with 11 different communities.
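The pruning step can be sketched as follows. This is one possible reading of the procedure, under two assumptions that are not stated in the text: an edge is kept as soon as it belongs to the top fraction of either of its endpoints, and the number of edges kept per node is rounded up, with at least one edge kept. With 1014 investors, 1% corresponds to about ten edges per investor. The function name and input layout are hypothetical.

```python
import math

def prune_top_fraction(sim, fraction=0.01):
    """Keep, for each node, its highest-similarity edges.

    sim: dict mapping each node to a dict {neighbor: similarity},
    assumed symmetric (complete weighted similarity graph).
    Returns a dict mapping frozenset({u, v}) to the edge weight.
    """
    kept = {}
    for node, neighbors in sim.items():
        # Number of edges retained for this node (assumption: ceiling, at least 1).
        k = max(1, math.ceil(fraction * len(neighbors)))
        strongest = sorted(neighbors.items(), key=lambda item: item[1], reverse=True)[:k]
        for neighbor, weight in strongest:
            kept[frozenset((node, neighbor))] = weight
    return kept
```

The retained edges can then be loaded into a weighted graph and passed to the community detection step.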
For each of the communities, a theoretical representative investor, defined as the barycenter of the community’s investors in the 5-dimensional probability space, is computed: in each dimension, the distribution of the representative investor of a given community is the average of the distributions of all investors in the community. This representative investor allows for a compact visualization of each community and gives insight into how the communities are formed. Fig 7 for instance shows the representative investor for community A6 and shows that investors in community A6 have an obvious China-focused geographical bias, since over 84% of the cluster’s investments target China-based startups. As another example, Fig S11 in the S1 File shows a similar sectoral focus on Health Care-related investments in community A7, with around 27%, 30% and 26% of investments in Science and Engineering, Health Care and Biotechnology respectively.
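The representative investor can be computed by averaging, dimension by dimension, the distributions of the community members. The sketch below assumes that every investor is stored as a list of per-dimension distributions defined on shared supports. The function name is hypothetical.

```python
def representative_investor(investors):
    """Barycenter of a community: per-dimension average of the distributions.

    investors: non-empty list of investors, each a list of distributions
    (one list of probabilities per characteristic dimension).
    """
    if not investors:
        raise ValueError("a community must contain at least one investor")
    n_investors = len(investors)
    representative = []
    for k in range(len(investors[0])):
        size = len(investors[0][k])
        representative.append(
            [sum(inv[k][j] for inv in investors) / n_investors for j in range(size)]
        )
    return representative
```

Since each member distribution sums to 1, the averaged distribution in each dimension also sums to 1 and can be compared or plotted like that of an individual investor.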
Fig 7. Community A6 appears to be composed of investors targeting China-based ventures during the second half of the 2010s with no clear sectoral specialization. Panel A shows the representative geographical investment distribution of community A6, panel B the distribution of the series of investment, panel C the temporal distribution of investments, panel D the distribution of the amounts of investment and panel E the sectoral distribution of investment.
Fig 8 shows the similarity graph pruned as described previously without (left) and with (right) the results of the clustering superimposed on the individual nodes. In light of these observations, we further characterize each of the resulting communities as described in column A of Table 1 by analyzing the representative investors of each of the 11 communities, which can be found in the S1 File (Figs S4-S14), and referring also to the identity of individual investors in the clusters (see Table 2 for a sample of individuals from each cluster). We observe that each community corresponds to a strong and specific pattern: a specific geographical area of investment, a specific sector of investment, investing at specific startup development stages, or displaying a specific temporal pattern notably in relation to the 2008 financial crisis, i.e. grouping investors that were either active throughout the whole period, or that belonged to older or newer generations of investors typically active either before or after the 2008 crisis.
Fig 8. Pruned similarity graph without (left) and with (right) community assignment of the nodes as characterized in column A of Table 1. The neon yellow community corresponds to China-focused venture capital firms (A6), the dark red community to India and Japan-focused venture capital firms (A10), the gold community to Health Care specialists (A7), the blue community (far left) to accelerators (A2).
Temporal evolution patterns.
Based on this investor clustering, Figs 9 and 10 reveal the temporal evolution of two communities in terms of target sectors of investment over the 2010–2019 period. Community A0, composed of general investors active over the whole period studied, typically shows a relatively slow evolution in terms of sectoral trends, with a gradual shift (Fig 9) in preferred sectors of investment towards so-called deeptech sectors (shift from sectors such as Media and Entertainment, Mobile towards sectors such as Science and Engineering, Health Care). Community A7, composed of Health Care-focused investors, shows a very strong dominance of Health Care-related sectors throughout the whole period (Fig 10A), but the top 10 sectors have significantly evolved over the 10-year period of study (Fig 10B). A closer look at the non-health related sectors reveals a clear shift from Manufacturing and Hardware-related investments towards Data Science and Analytics and Artificial Intelligence-related investments, in line with the widespread adoption of these technologies in Health Care-related sectors during recent years [30].
Fig 9. Temporal community investment patterns of the target startups’ sectoral tags for each year aggregated at the community level. Community A0 is composed of large, historical, rather late-stage focused venture capital firms. Panel A shows for each year the ten tags that received the most investments, panel B shows the community self-difference index described in Eq 4. We see a gradual but substantial shift in the target industries of community A0 throughout the period of study as evidenced in panel B, notably with the disappearance of relatively low-tech sectors such as the Mobile, Apps and Advertising sectors.
Fig 10. Temporal community investment patterns of the target startups’ sectoral tags for each year aggregated at the community level. Community A7 is composed of Health Care-specialized venture capitalists. Panel A shows for each year the ten tags that received the most investments, panel B shows the community self-difference index described in Eq 4, with two markedly different areas of coherence, before and after 2014–2015.
Clustering factor analysis highlights underlying investment patterns
Since the 5 characteristic dimensions are based on domain knowledge, we ran the clustering algorithm 5 additional times, each time using only 4 of the 5 dimensions previously defined, computing the representative investors of all communities for each of these alternative clusterings in order to understand the characteristics of the new communities. Fig S21 in S1 File shows the representative investor of community B6 resulting from a clustering without the geographical investment dimension. Surprisingly, the community shows a strong focus on the Chinese startup market, with around 80% of all investments targeting China-based startups although the geographical dimension was not taken into account, therefore suggesting the existence of an underlying structure: the existence of an investment pattern according to the 4 other investment dimensions that is actually characteristic of investors investing mostly in China. Similarly, Fig S32 in S1 File shows the representative investor of community C7 resulting from a clustering without the sectoral dimension, but shows a community strongly focused on Health Care startups (around 17%, 18% and 15% of investments in Science and Engineering, Health Care and Biotechnology respectively) not unlike the community shown in Fig S11 in S1 File, even though sectors were not taken into account in this clustering.
Following these observations, we systematically investigate the bivariate distributions for all pairwise combinations of dimensions for each alternative clustering, with the discrete bivariate distribution f of group g at coordinates (m, n) defined as:
$$f_g^{k_1, k_2}(m, n) = \frac{1}{T} \sum_{i=1}^{T} P_i^{k_1}(m)\, P_i^{k_2}(n), \quad 1 \le m \le V,\ 1 \le n \le W \tag{5}$$
where investor distribution k1 has dimension V and k2 has dimension W, with group g being comprised of T investors.
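One plausible way to compute such a group-level bivariate distribution is sketched below. It rests on a strong assumption that should be checked against the definition used in the study: the value at coordinates (m, n) is taken to be the average, over the T investors of the group, of the product of each investor's two marginal distributions. The function name and data layout are hypothetical.

```python
def bivariate_distribution(group, k1, k2):
    """Group-level bivariate distribution over dimensions k1 and k2.

    group: list of T investors, each a list of per-dimension distributions.
    Assumption: f(m, n) is the mean over investors of P_i^{k1}(m) * P_i^{k2}(n).
    Returns a V x W nested list, V and W being the sizes of the two supports.
    """
    if not group:
        raise ValueError("the group must contain at least one investor")
    n_investors = len(group)
    n_rows = len(group[0][k1])
    n_cols = len(group[0][k2])
    f = [[0.0] * n_cols for _ in range(n_rows)]
    for investor in group:
        for m in range(n_rows):
            for n in range(n_cols):
                f[m][n] += investor[k1][m] * investor[k2][n] / n_investors
    return f
```

Under this assumption the entries of the resulting matrix sum to 1, so it can be rendered directly as a heatmap.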
Geographical.
Fig 11 shows the resulting bivariate distribution for all pairs of dimensions for community B6, here presented as heatmaps. It shows that B6 investors take part mostly in series A investments between $10M and $20M after 2015, which could correspond to a pattern characteristic of China-focused investors in our sample. For all bivariate distributions shown in Fig 11 (community B6) and Fig 12 (community A6), both communities display virtually identical behaviors: most likely due to this underlying investment pattern, taking into account the geographical dimension is not necessary to characterize this cluster despite its very strong geographical footprint.
Fig 11 (community B6). This community corresponds to China-focused investors. Only the top 8 sectors and the top 4 countries in terms of frequency of investments are labeled for readability purposes.
Fig 12 (community A6). This community corresponds to China-focused investors.
Sectoral.
Similarly, Fig 13 shows the resulting bivariate distribution for all pairs of dimensions for community C7. It shows that C7 investors invest mainly in series B rounds between $20M and $50M in North American ventures, which appears to be an investment pattern for investors specialized in Health Care in our sample. Fig 14 shows community A7 resulting from the complete clustering. Figs 13 and 14 show a strong agreement in terms of Series and Amounts of investments but still display slight differences as community A7 has been active for a longer time than community C7. We therefore observe different generations of Health Care-focused investors with the newer generations associated with a wider scope of investment in terms of sectors. These new investors tend to invest in Health Care-oriented companies with a stronger IT component in the latter part of the 2010s (see Fig 13), a pattern not found in Fig 14. This suggests that the current shift in Health Care venture funding (linked notably to the use of Artificial Intelligence solutions) could on a global level not be the result of a shift of focus of traditional Health Care-focused investors but rather the outcome of the emergence of a new group of investors in the domain.
Fig 13 (community C7). This community corresponds to a Health Care-focused community of investors. Only the top 8 sectors in terms of total number of investments and the top 4 countries of investment are labeled for readability purposes.
Fig 14 (community A7). These distributions correspond to Health Care specialists.
Temporal.
Again in a similar manner, and analyzing this time the clustering computed without the temporal dimension, Fig S43 in S1 File shows the representative investor of community D7, associated with a very specific temporal pattern of investment that appears markedly similar to community A1 from the complete clustering (see Fig S5 in S1 File), even though the temporal dimension was excluded in the case of D7. This observation therefore again suggests the existence of underlying investment patterns associated with investors. Here, historical, older generation investors appear to have been clustered together independently of their temporal activity, and rather on the basis of a qualitatively specific investment pattern that differs from those of newer generation venture capital firms.
Conclusion
In this article, we approached investors through clustering methods in order to help us and fellow researchers make better sense of the “venture capital community”, and to argue against analyzing it as a homogeneous community. We thus described a novel approach to quantitatively group startup investors based only on the characteristics of their investments, as gathered from a bipartite investor-startup network. This clustering approach results in interpretable and homogeneous subgroups of investors with markedly different profiles, which we hope could prove helpful for the community of researchers interested in studying venture capital communities and networks by allowing them to differentiate among venture capitalists. In that sense, “the” venture capital community, as often referred to, might actually be composed of several venture capital communities whose investment behaviors, and in particular whose co-investment behaviors, might differ considerably. As a consequence, we would plead for some of the literature on venture networks to be assessed again on each of the venture communities separately, for instance with respect to the relationship between network position and centrality and the profitability of venture investments.
In addition, by allowing the conditions under which investors are clustered according to our approach to vary, notably by reducing the number of characteristic dimensions taken into account, we were able to observe the presence of relatively surprising underlying and robust investment patterns characteristic of certain clusters of startup investors. For instance, the fact that some investors specialize as Health Care specialists seems to have consequences with respect to their other investment patterns, notably in terms of funding amounts or funding rounds: we did observe a cluster of Health Care-focused investors even when the sectoral dimension was not accounted for in the clustering. Similarly, the fact that some investors focus on investments in China also results in the existence of patterns with respect to their investment behaviors, once again in terms of funding amounts and funding rounds in particular: we indeed observed a cluster of investors focused on China even when the geography of investments was not taken into account. From a research point of view, these observations raise the issue of whether they are the result of behavioral phenomena or rather market outcomes. More broadly, the existence of such underlying patterns could also modify how financial actors directly interpret and evaluate opportunities, by comparison with such benchmarks.
Furthermore, similar underlying investment patterns were also observed to characterize different generations of investors, notably in relation to the 2008 financial crisis. We notably observed a cluster of investors mostly active before the 2008 crisis even when the temporal distribution of their investments was not taken into account. In our sample, this observation is particularly striking with respect to the aforementioned crisis, but we also observed preliminary evidence of a similar phenomenon in the case of Health Care focused investors with 2014 as a breaking point, which we can relate to the significant increase in startup investment activity that occurred around that date. Altogether, and adding also that the cluster of so-called accelerators (A2) also corresponds to a completely new “species” of investors that appeared in the late 2000s, these preliminary observations might suggest a mechanism that would evoke the notion of speciation in ecology: whenever the “financial environment” would change, newer “species” of investors could appear in an evolutionary way, by seizing the newer opportunities offered by the new environment, while existing investors might either adapt or stay locked in their previous patterns even though these patterns might eventually not represent an adaptive advantage in a new financial environment. Rather than simply suggesting an evolutionary perspective, these observations could also shed more light on the determinants of success for so-called “Limited Partners” [31], i.e. investors in venture capital funds, by potentially providing a supplementary explanation of why returns would differ systematically across limited partners [32]. They could also provide limited partners and other actors in the finance community themselves with a new understanding of the dynamics of innovation in the venture capital market.
References
- 1. Chandra Y. Mapping the evolution of entrepreneurship as a field of research (1990–2013): A scientometric analysis. PloS one. 2018;13(1):e0190228. pmid:29300735
- 2. Gu W, der Luo J, Liu J. Exploring small-world network with an elite-clique: Bringing embeddedness theory into the dynamic evolution of a venture capital network. Social Networks. 2019;57:70–81.
- 3. Sorenson O, Stuart TE. Syndication networks and the spatial distribution of venture capital investments. American journal of sociology. 2001;106(6):1546–1588.
- 4. Hochberg YV, Ljungqvist A, Lu Y. Whom you know matters: Venture capital networks and investment performance. The Journal of Finance. 2007;62(1):251–301.
- 5. Zhou T, Ren J, Medo M, Zhang YC. Bipartite network projection and personal recommendation. Physical review E. 2007;76(4):046115. pmid:17995068
- 6. Fortunato S. Community detection in graphs. Physics reports. 2010;486(3-5):75–174.
- 7. Rodriguez MZ, Comin CH, Casanova D, Bruno OM, Amancio DR, Costa LdF, et al. Clustering algorithms: A comparative approach. PloS one. 2019;14(1):e0210236. pmid:30645617
- 8. Jin Y, Zhang Q, Li SP. Topological properties and community detection of venture capital network: Evidence from China. Physica A: Statistical Mechanics and Its Applications. 2016;442:300–311.
- 9. Xiong H, Fan Y. How to Better Identify Venture Capital Network Communities: Exploration of A Semi-Supervised Community Detection Method. Journal of Social Computing. 2021;2(1):27–42.
- 10. Bubna A, Das SR, Prabhala N. Venture capital communities. Journal of Financial and Quantitative Analysis. 2020;55(2):621–651.
- 11. Wang H, Yang Y, Li B, Fujita H. A study of graph-based system for multi-view clustering. Knowledge-Based Systems. 2019;163:1009–1019.
- 12. Li L, He H. Bipartite graph based multi-view clustering. IEEE Transactions on Knowledge and Data Engineering. 2020.
- 13. Yang Y, Wang H. Multi-view clustering: A survey. Big Data Mining and Analytics. 2018;1(2):83–107.
- 14. Dalle JM, Den Besten M, Menon C. Using Crunchbase for economic and managerial research; 2017.
- 15. Coscia M. The Atlas for the Aspiring Network Scientist; 2021.
- 16. Molnar C. Interpretable Machine Learning; 2019.
- 17. Deza MM, Deza E. Dictionary of distances. Elsevier; 2006.
- 18. Knapp AW. Basic real analysis. Springer Science & Business Media; 2005.
- 19. Legendre P, Legendre L. Numerical ecology. Elsevier; 2012.
- 20. Sohangir S, Wang D. Improved sqrt-cosine similarity measurement. Journal of Big Data, 4(1), 1–13.
- 21. Zhu S, Lizhao L, Wang Y. Information retrieval using Hellinger distance and sqrt-cos similarity. In: 7th International Conference on Computer Science & Education (ICCSE). IEEE; 2012. p. 925–929.
- 22. Elango B, Fried VH, Hisrich RD, Polonchek A. How venture capital firms differ. Journal of business venturing. 1995;10(2):157–179.
- 23. Devigne D, Vanacker T, Manigart S, Paeleman I. The role of domestic and cross-border venture capital investors in the growth of portfolio companies. Small Business Economics. 2013;40(3):553–573.
- 24. Hochberg YV, Mazzeo MJ, McDevitt RC. Specialization and competition in the venture capital industry. Review of Industrial Organization. 2015;46(4):323–347.
- 25. Drover W, Busenitz L, Matusik S, Townsend D, Anglin A, Dushnitsky G. A review and road map of entrepreneurial equity financing research: venture capital, corporate venture capital, angel investment, crowdfunding, and accelerators. Journal of management. 2017;43(6):1820–1853.
- 26. Crawford GC, Aguinis H, Lichtenstein B, Davidsson P, McKelvey B. Power law distributions in entrepreneurship: Implications for theory and research. Journal of Business Venturing. 2015;30(5):696–713.
- 27. Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment. 2008;2008(10):P10008.
- 28. Cohen S, Hochberg YV. Accelerating startups: The seed accelerator phenomenon. 2014.
- 29. Cohen S, Fehder DC, Hochberg YV, Murray F. The design of startup accelerators. Research Policy. 2019;48(7):1781–1797.
- 30. Waymel Q, Badr S, Demondion X, Cotten A, Jacques T. Impact of the rise of artificial intelligence in radiology: what do radiologists think? Diagnostic and interventional imaging. 2019;100(6):327–336. pmid:31072803
- 31. Lerner J, Schoar A, Wongsunwai W. Smart institutions, foolish choices: The limited partner performance puzzle. The Journal of Finance, 62(2), 731–764.
- 32. Cavagnaro D, Sensoy B, Wang Y, Weisbach M. Measuring institutional investors’ skill at making private equity investments. The Journal of Finance, 74(6), 3089–3134.