
Comparative analysis of technological fitness and coherence at different geographical scales

  • Matteo Straccamore,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Validation, Visualization, Writing – original draft

    matteo.straccamore@cref.it

    Affiliations Centro Ricerche Enrico Fermi (CREF), Rome, Italy, Sony Computer Science Laboratories Rome, Joint Initiative CREF-Sony, Centro Ricerche Enrico Fermi, Rome, Italy

  • Matteo Bruno,

    Roles Conceptualization, Formal analysis, Investigation, Writing – review & editing

    Affiliation Sony Computer Science Laboratories Rome, Joint Initiative CREF-Sony, Centro Ricerche Enrico Fermi, Rome, Italy

  • Andrea Tacchella

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Supervision, Writing – review & editing

    Affiliation Centro Ricerche Enrico Fermi (CREF), Rome, Italy

Abstract

Debates over the trade-offs between specialization and diversification have long intrigued scholars and policymakers. Specialization can amplify an economy by concentrating on core strengths, while diversification reduces vulnerability by distributing investments across multiple sectors. In this paper, we use patent data and the framework of Economic Complexity to investigate how the degree of technological specialization and diversification affects economic development at different scales: metropolitan areas, regions and countries. We examine two Economic Complexity indicators. Technological Fitness assesses an economic player’s ability to diversify and generate sophisticated technologies, while Technological Coherence quantifies the degree of specialization by measuring the similarity among technologies within an economic player’s portfolio. Our results indicate that a high degree of Technological Coherence is associated with increased economic growth only at the metropolitan area level, while its impact turns negative at larger scales. In contrast, Technological Fitness shows a U-shaped relationship with a positive effect in metropolitan areas, a negative influence at the regional level, and again a positive effect at the national level. These findings underscore the complex interplay between technological specialization and diversification across geographical scales. Understanding these distinctions can inform policymakers and stakeholders in developing tailored strategies for technological advancement and economic growth.

1 Introduction

Economic Complexity (EC) is a concept that has gained significant attention in economics in recent years [1–5]. It refers to the idea that the productive capabilities of countries play a crucial role in determining their economic development and competitiveness in the global market. The concept was initially proposed to argue that traditional measures of economic performance, such as GDP or trade balances, fail to capture the complexity and sophistication of an economy. At its core, Economic Complexity is based on the notion that the productive structure of an economy is determined not solely by the availability of resources, but also by the diversity and interconnectedness of its industries and technological capabilities [4]. Geographical entities such as countries, regions and metropolitan areas that possess a diverse range of industries and a high level of technological know-how are found to have a higher level of Economic Complexity. The importance of EC lies in its potential to explain long-term economic growth and the development of nations [2,3], which has led important institutions such as the World Bank [6] and the European Commission [7,8] to adopt EC methods and tools.

One of the notable tools in the field of EC is the Economic Fitness and Complexity (EFC) algorithm [3]. This iterative algorithm can be applied to bipartite networks, i.e., systems with two different types of actors in which interactions occur only between actors of different types, and was originally developed to assess a country's export capacity (Fitness) and the complexity of exported products (Complexity). Further tools based on networks [9,10] or machine learning [11–14] have been developed to leverage the concept of similarity and predict a country's future range of exportable products. The idea underlying these studies is that the capabilities possessed by individual countries are reflected in their ability to export products that are similar to each other: products that share a relatively large number of capabilities are likely to be exported together [15]. Fitness is a generalized measure of the diversification of an economic actor, in which diversification is weighted by the complexity of the actor's economic activities. In turn, complex activities are defined as those performed only by actors with high Fitness. Fitness is considered a proxy for the Economic Complexity of an economic actor, namely its ability to combine a large set of endowments to produce complex outputs, and it has proven to be an extremely effective predictor of economic performance [6].

On the other hand, Coherence is a second key measure in Economic Complexity. Introduced by Pugliese et al. [16], it quantifies the average distance between the economic activities in which an actor is involved. This distance is measured on a similarity network that captures the relationships between economic activities in terms of their underlying capability requirements [10,16,17]. Whereas Fitness is more closely related to the diversification of an economic actor, Coherence measures its degree of specialization, i.e., how closely related the activities in its portfolio are.

EC is not solely based on production: patent data can also be employed to measure technological innovation. The literature extensively acknowledges the use of patent data for tracking technological advancements [18–20]. Pugliese et al. [21] further demonstrated that technological metrics are superior predictors of industrial and scientific output in the following decades. In addition, the increased availability of geolocalized patent data has made patents a crucial resource for studying technological evolution [22,23]. Finally, a key feature of patent documents for economic analysis is the inclusion of classification codes, which categorize inventions into specific technological sectors [24–26]. These classification systems provide detailed descriptions of technological fields, facilitating targeted analyses within particular domains. However, using patents as a proxy for innovation has notable limitations [27]: for instance, only a subset of all patents has significant market value [28], and aggregated patent statistics can sometimes misrepresent economic and inventive activities [29]. Moreover, patents do not capture all facets of knowledge production within the economy [19], do not encompass all knowledge generated [30] and do not represent all economic sectors [31,32].

The EC approach has been applied to technology codes at different geographical scales, analyzing firms [16,33–36], metropolitan areas [17,37–41], and regions [42–49]. Despite this wide application, a comparison of the insights provided by EC instruments at different geographical scales is still lacking. This study contributes to the literature on Economic Complexity by comparing, for the first time to the best of our knowledge, the heterogeneous relations between Economic Complexity indicators and growth at different geographical scales.

We use the EFC algorithm to compare technological outputs at different geographical scales: metropolitan areas (MAs), regions, and countries (hereafter, unless otherwise specified, the term entity refers to any of MAs, regions, or countries). We aim to compare the correlation of Technological Fitness (F) and Technological Coherence (Γ) [16,17] with economic growth, measured through the GDP per capita (GDPpc) of these entities. In our results, Fitness and Coherence display a rich and heterogeneous correlation structure which, to the best of our knowledge, has not been studied extensively in the literature. Here we present a first attempt to assess their joint impact on economic growth, and how this relationship changes with the scale of the entities considered. We find that they play different roles, with different scaling relations. Previous research [17] has shown that Γ plays a more significant role in predicting GDP per capita growth for MAs. In this paper, we reinforce this finding by examining the importance of different features. Additionally, as we extend our analysis to regions and countries, we observe that Γ has a different impact on economic growth at each of the three geographical scales. In particular, Coherence has a positive effect only at the MA level, but its impact turns negative at the regional scale, a finding that becomes even more pronounced at the national level. The results for Fitness are less clear-cut: it exhibits a U-shaped relationship, with a positive association with growth at the MA level, a negative correlation at the regional scale, and again a positive correlation at the national level. By understanding the distinct roles played by Technological Fitness and Coherence in various contexts, policymakers and stakeholders can develop customized strategies to promote technological progress and regional growth.

This paper is organized as follows. In Sect 2, we describe the data sources: patents and technology codes, GDP per capita information, and metropolitan area and region boundaries. Sect 3 outlines the methodological framework, including the construction of bipartite networks, the implementation of the Fitness and Complexity algorithm, and the computation of Technological Coherence. In Sect 4, we present our empirical results, highlighting the distinct roles of Technological Fitness and Coherence across different geographical scales. Finally, Sect 5 concludes the paper and summarizes the key contributions of the research.

2 Data

Technology codes

In this research, we employed the PATSTAT database (available at www.epo.org/searching-for-patents/business/patstat) as our main source for patent and technology code data. This database contains about 100 million patents registered across 100 patent offices worldwide. Each patent is uniquely identified and associated with one or more Cooperative Patent Classification (CPC) codes [50]. The CPC system, jointly developed by the European Patent Office (EPO) and the United States Patent and Trademark Office (USPTO), is more detailed than the International Patent Classification (IPC) system administered by the World Intellectual Property Organization (WIPO) [24]. The CPC provides a hierarchical classification scheme that includes sections, classes, subclasses, and groups, facilitating fine-grained categorization of patents. The first level of CPC codes indicates broad technology categories, such as "Chemistry; Metallurgy" under the code C or "Electricity" under the code H. Subsequent levels further specify the technology, for instance, "Inorganic Chemistry" under C01 or "Organic Chemistry" under C07. Each patent is also assigned a filing date based on its initial submission.

For the geolocation of patents, we used the De Rassenfosse et al. database [23], which includes data on 18 million patents spanning from 1980 to 2014 and provides precise geographical coordinates for each patent, enabling accurate geolocation. In our analysis, each patent is linked with a unique identifier, a set of CPC codes, and geographic coordinates that identify the corresponding MA, region, and country. The De Rassenfosse et al. database also includes the country ISO code, which aids in constructing the country-technology bipartite graph. Additional details about the importance and features of the De Rassenfosse et al. database are discussed in the Supplementary Information.

GDP per capita

To retrieve data on the Gross Domestic Product (GDP) per capita for countries, we utilized the World Bank’s comprehensive database, World Development Indicators (WDI), accessible at data.worldbank.org. The WDI provides extensive GDP information for countries worldwide.

Regarding the GDP per capita of metropolitan areas (MAs) and regions, we referred to the work of Kummu et al. [51]. Their study aimed to create high-resolution global datasets for GDP and Human Development Index (HDI) spanning the period from 1990 to 2015. By integrating various data sources such as national accounts, satellite imagery, and geospatial datasets, the authors developed gridded datasets that accurately portray the spatial distribution of GDP and HDI indicators at a fine resolution. Statistical techniques and modelling approaches were employed to estimate GDP and HDI values for areas lacking direct measurements. All values in the Kummu et al. database are expressed in 2011 international U.S. dollars at purchasing power parity (PPP).

Metropolitan areas and regions boundaries

In order to calculate the Gross Domestic Product per capita (GDPpc) and link patents to specific metropolitan areas (MAs) and regions, it was necessary to have access to the geographical boundaries of these entities. Regarding MAs, we downloaded the boundaries from the Global Human Settlement Layer (GHSL) [52]. The GHSL provides global spatial information on human settlements, including the boundaries of metropolitan areas. The GHSL dataset incorporates satellite imagery, census data, and other sources to map urban areas and classify them into different settlement types. It offers valuable information for various research and planning applications, including urban studies, infrastructure development, and environmental analysis.

Regarding the boundaries of the regions, we downloaded them from the Global Administrative Areas (GADM) [53] website. GADM provides administrative boundary data for regions, countries, and other administrative divisions worldwide. It is a reliable and widely used resource for accessing shapefiles or spatial data of administrative boundaries. The GADM database contains detailed and up-to-date boundary information for regions across the globe. It is a valuable tool for various applications, including geographic analysis, research, and mapping. The database offers different administrative levels, allowing users to access region-level boundaries as well as higher or lower administrative divisions depending on their needs. We selected the administrative level 1 of the GADM database, the first subdivision below the country level.

After obtaining the boundaries, we associate patents with each MA and region and compute its GDPpc. Starting from the geographical coordinates of the patents, we select all those that fall within a given boundary and, consequently, their technology codes. To compute the GDP per capita of an MA or region in a given year, we take the GDP grid for that year and average the values of all grid points within its boundaries.
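A minimal sketch of this assignment step, in Python and using only the standard library, is shown below. It assumes each boundary is a single simple polygon given as (longitude, latitude) vertices and treats coordinates as planar; the function names and toy data are hypothetical, and real GHSL or GADM shapes (multipolygons, holes, projections) would normally be handled with a dedicated GIS library.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside a simple polygon.

    `polygon` is a list of (lon, lat) vertices; the last vertex connects back
    to the first. Points lying exactly on an edge may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def patents_in_area(patents, polygon):
    """IDs of patents whose (lon, lat) coordinates fall inside `polygon`."""
    return [pid for pid, (lon, lat) in patents.items()
            if point_in_polygon(lon, lat, polygon)]


def area_gdppc(grid, polygon):
    """Mean of the grid values whose cell centres fall inside `polygon`.

    Returns None when no grid point falls inside (a hypothetical convention
    for areas too small to contain any grid cell).
    """
    values = [v for (lon, lat), v in grid.items()
              if point_in_polygon(lon, lat, polygon)]
    return sum(values) / len(values) if values else None


# Toy example: a 2x2 square area, three patents and three grid points.
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
patents = {"P1": (1.0, 1.0), "P2": (3.0, 1.0), "P3": (0.5, 1.5)}
grid = {(0.5, 0.5): 10.0, (1.5, 1.5): 20.0, (2.5, 2.5): 99.0}
print(patents_in_area(patents, square))  # ['P1', 'P3']
print(area_gdppc(grid, square))          # 15.0
```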

3 Methods

Bipartite networks construction

After collecting the data, we construct bipartite networks linking metropolitan areas (MAs), regions, or countries with technology codes. These networks are represented by rectangular bi-adjacency matrices $M^{(y)}$, whose elements $M^{(y)}_{e,t}$ are set to 1 or 0 to indicate whether the presence of technology code t in the patents filed by entity e in year y is statistically significant.

To build the matrices, we employed an innovative methodology involving statistical tests to validate the number of technologies present in an entity's patents. We start with two preliminary matrices: $A^{(y)}$, linking entities e to their patents p, and $B$, linking patents to their corresponding technologies t. In these matrices, $A^{(y)}_{e,p} = 1$ (or 0) indicates whether patent p is filed by entity e, and $B_{p,t} = 1$ (or 0) if patent p includes technology code t. Given all patents belonging to an entity e, we count the number of occurrences of each technology t, i.e., $\sum_p A^{(y)}_{e,p} B_{p,t}$. We validate these counts by comparing them against an ensemble of 1000 matrices generated using the Bipartite Configuration Model (BiCM) [54,55], thus obtaining a matrix of p-values, $P^{(y)}$. We compute the BiCM using the NEMtropy Python package (https://github.com/nicoloval/NEMtropy) [56]. Each element $P^{(y)}_{e,t}$ represents the p-value associated with the link between entity e and technology t for year y. The construction of these matrices is discussed in more detail in the Supplementary Information.
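The validation step can be illustrated with the following self-contained sketch. It is not the NEMtropy implementation used in the paper: it fits the BiCM to the patent-technology matrix B with a plain fixed-point iteration, samples randomized versions of B, keeps the entity-patent matrix (called A here) fixed, and reports one-sided empirical p-values for the counts of each technology in each entity's patents. Which matrix is randomized is our assumption, and all function names are hypothetical.

```python
import random


def fit_bicm(B, n_iter=10_000, tol=1e-12):
    """Fit a Bipartite Configuration Model (BiCM) to the binary matrix B.

    Finds row fitnesses x and column fitnesses y such that the expected degrees
    under p[i][j] = x_i * y_j / (1 + x_i * y_j) match the observed degrees,
    using a plain fixed-point iteration. Assumes B has at least one link and
    that no row or column is completely full (otherwise some x or y diverge).
    Returns the matrix of link probabilities.
    """
    n_rows, n_cols = len(B), len(B[0])
    k = [sum(row) for row in B]                                       # row degrees
    h = [sum(B[i][j] for i in range(n_rows)) for j in range(n_cols)]  # column degrees
    scale = sum(k) ** 0.5
    x = [ki / scale for ki in k]
    y = [hj / scale for hj in h]
    for _ in range(n_iter):
        x_new = [k[i] / sum(y[j] / (1 + x[i] * y[j]) for j in range(n_cols)) if k[i] else 0.0
                 for i in range(n_rows)]
        y_new = [h[j] / sum(x_new[i] / (1 + x_new[i] * y[j]) for i in range(n_rows)) if h[j] else 0.0
                 for j in range(n_cols)]
        delta = max(abs(a - b) for a, b in zip(x + y, x_new + y_new))
        x, y = x_new, y_new
        if delta < tol:
            break
    return [[xi * yj / (1 + xi * yj) for yj in y] for xi in x]


def validate_counts(A, B, n_samples=1000, seed=0):
    """Empirical p-values of entity-technology counts against BiCM samples of B.

    A: entities x patents (kept fixed); B: patents x technologies (randomized).
    p-value = fraction of sampled counts >= observed, with an add-one
    correction so that no p-value is exactly zero.
    """
    prob = fit_bicm(B)
    n_e, n_p, n_t = len(A), len(B), len(B[0])
    observed = [[sum(A[e][p] * B[p][t] for p in range(n_p)) for t in range(n_t)]
                for e in range(n_e)]
    exceed = [[0] * n_t for _ in range(n_e)]
    rng = random.Random(seed)
    for _ in range(n_samples):
        # Draw one randomized patent-technology matrix from the fitted BiCM.
        sample = [[1 if rng.random() < prob[p][t] else 0 for t in range(n_t)]
                  for p in range(n_p)]
        for e in range(n_e):
            for t in range(n_t):
                count = sum(A[e][p] * sample[p][t] for p in range(n_p))
                if count >= observed[e][t]:
                    exceed[e][t] += 1
    return [[(exceed[e][t] + 1) / (n_samples + 1) for t in range(n_t)]
            for e in range(n_e)]


# Toy data: 8 patents x 4 technologies, and 3 entities owning those patents.
B = [[1, 1, 0, 0], [0, 1, 1, 0], [1, 0, 1, 0], [0, 0, 1, 1],
     [1, 0, 0, 1], [0, 1, 0, 1], [1, 0, 0, 0], [1, 1, 1, 0]]
A = [[1, 1, 1, 0, 0, 0, 0, 0],
     [0, 0, 0, 1, 1, 1, 0, 0],
     [0, 0, 0, 0, 0, 0, 1, 1]]
pvals = validate_counts(A, B)
```

Sampling keeps the sketch close to the ensemble description in the text; with realistic matrix sizes a vectorized or analytical computation would be preferable.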

Initially, there were 8641 MAs listed in the Global Human Settlement Layer [52], covering the entire globe. To minimize statistical fluctuations, we keep the top 80% of entities by number of patents. Specifically, for each year we computed the mean diversification corresponding to the top 80% of entities, and we used the average of these values across years as the diversification threshold for each entity. Moreover, we excluded technologies appearing in fewer than 100 patents. These filters reduced the number of patent-producing MAs to 1847. Additionally, for some MAs and regions the GDPpc cannot be computed because of their small size; this is the case for 277 MAs and 8 regions. Consequently, the final networks comprise 1570 MAs, 738 regions, and 46 countries, linked to 621 distinct technology codes.

To account for the volatility of year-to-year data, we construct all matrices over rolling five-year windows. Thus, in this study, the index y refers to five-year periods ranging from 1980–1984 to 2010–2014, for a total of 31 five-year-window matrices. This choice is also intended to mitigate the influence of short-term fluctuations and transitory shocks (e.g., economic downturns or external crises), allowing for a more robust estimation of long-term structural relationships.
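As a quick arithmetic check on the window count, the hypothetical helpers below enumerate the rolling five-year windows between 1980 and 2014 and list the windows a given filing year contributes to. They assume that windows advance by one year, which is what yields 31 windows.

```python
def five_year_windows(first_year=1980, last_year=2014, width=5):
    """Rolling windows (start, end), inclusive, advancing by one year."""
    return [(start, start + width - 1)
            for start in range(first_year, last_year - width + 2)]


def windows_for_year(year, windows):
    """Windows to which a patent filed in `year` contributes."""
    return [w for w in windows if w[0] <= year <= w[1]]


windows = five_year_windows()
print(len(windows), windows[0], windows[-1])  # 31 (1980, 1984) (2010, 2014)
print(len(windows_for_year(1995, windows)))   # 5
```

A patent filed in the middle of the period thus enters five consecutive windows, while patents filed in 1980 or 2014 enter only one.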

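Lastly, we binarize the matrix of p-values $P^{(y)}$ using a p-value threshold of 0.05 as follows:

$$M^{(y)}_{e,t} = \begin{cases} 1 & \text{if } P^{(y)}_{e,t} < 0.05, \\ 0 & \text{otherwise.} \end{cases}$$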

Fitness and complexity algorithms

The Economic Fitness and Complexity (EFC) framework [3], introduced in 2012, provides a method for quantifying the competitiveness (Fitness) of a country's economy. In this study, we apply it to quantify the Technological Fitness of metropolitan areas (MAs), regions, or countries, using patent data exclusively. The iterative process for determining these quantities is as follows:

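$$\tilde{F}^{(n)}_e = \sum_t M_{e,t}\, Q^{(n-1)}_t, \qquad \tilde{Q}^{(n)}_t = \frac{1}{\sum_e M_{e,t}\, \frac{1}{F^{(n-1)}_e}} \tag{1}$$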

Here, in each iteration step n, the quantities are normalized:

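$$F^{(n)}_e = \frac{\tilde{F}^{(n)}_e}{\langle \tilde{F}^{(n)}_e \rangle_e}, \qquad Q^{(n)}_t = \frac{\tilde{Q}^{(n)}_t}{\langle \tilde{Q}^{(n)}_t \rangle_t} \tag{2}$$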

The initial conditions are set as $F^{(0)}_e = 1\ \forall e$ and $Q^{(0)}_t = 1\ \forall t$. Extensive studies have been conducted on the convergence of this algorithm [57]. In our case, we compute $F^{(y)}_e$ and $Q^{(y)}_t$ for each 5-year window y using the bi-adjacency matrices $M^{(y)}$. The iteration is stopped when the Fitness ranking of entities no longer changes. The rationale behind this approach is as follows. A technology developed by an already advanced entity provides limited information about the complexity of the technology itself, since advanced entities produce a large proportion of technologies. In contrast, a technology developed by an underdeveloped entity is likely to have a lower level of sophistication. Hence, an entity's technological competitiveness can be measured by the complexity of its technologies: Fitness $F_e$ is proportional to the sum of the entity's technologies, weighted by their complexity $Q_t$. Assessing the complexity of a technology, however, requires a different, non-linear approach. Intuitively, the Complexity of a technology is inversely proportional to the number of entities that develop it, but these entities do not all count equally: an entity with high Fitness should contribute little to limiting the complexity of a technology, while entities with low Fitness should strongly lower $Q_t$.
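A minimal Python sketch of the iteration in Eqs (1) and (2), with the ranking-based stopping rule described above, follows. It assumes a binary entity-technology matrix M given as a list of lists in which every technology is present in at least one entity; the function name and the max_iter safeguard are our own additions, not part of the original method.

```python
def fitness_complexity(M, max_iter=1000):
    """Iterate Eqs (1)-(2) on a binary entity x technology matrix M.

    Stops when the Fitness ranking of entities no longer changes between two
    consecutive iterations (or after max_iter iterations as a safeguard).
    Assumes every technology (column) is present in at least one entity.
    In strongly nested matrices low Fitness values shrink toward zero, so a
    very large max_iter may eventually underflow.
    """
    n_e, n_t = len(M), len(M[0])
    F = [1.0] * n_e  # initial conditions F^(0) = 1
    Q = [1.0] * n_t  # and Q^(0) = 1
    prev_rank = None
    for _ in range(max_iter):
        # Eq (1): both updates use the quantities from the previous step.
        F_tilde = [sum(Q[t] for t in range(n_t) if M[e][t]) for e in range(n_e)]
        Q_tilde = [1.0 / sum(1.0 / F[e] for e in range(n_e) if M[e][t])
                   for t in range(n_t)]
        # Eq (2): normalize by the mean over entities and over technologies.
        mean_F = sum(F_tilde) / n_e
        mean_Q = sum(Q_tilde) / n_t
        F = [f / mean_F for f in F_tilde]
        Q = [q / mean_Q for q in Q_tilde]
        rank = sorted(range(n_e), key=lambda e: -F[e])
        if rank == prev_rank:
            break
        prev_rank = rank
    return F, Q


# Nested toy matrix: entity 0 does everything, entity 3 only technology 0.
M = [[1, 1, 1, 1],
     [1, 1, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 0]]
F, Q = fitness_complexity(M)
# Fitness decreases from the most to the least diversified entity, and
# technology 3 (done only by the top entity) gets the highest Complexity.
```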

Coherent diversification

Previous studies have highlighted the significance of coherence in production and innovation diversification as a key driver of productivity [58,59]. Thus, to better understand the performance of our entities based on their technology portfolios, we examine their coherent diversification [16]. The central question is whether the accumulation of knowledge and capabilities associated with a coherent set of technologies leads metropolitan areas (MAs), regions, or countries to experience greater benefits in terms of GDP per capita. Coherent diversification is defined as the coherence of a technology field t with respect to the technology basket of entity e:

$$\gamma_{e,t} = \sum_{t'} S_{t,t'}\, M_{e,t'} \tag{3}$$

where M is the bipartite adjacency matrix between entities (MAs, regions, or countries) and technologies, and S represents the similarity matrix between pairs of technologies. S is computed with the same statistical method used to build M, with the same p-value threshold of 0.05; for this computation, however, we use only the patent-technology matrices of each 5-year window. Specifically, we statistically validate the number of patents in which two technologies t and t′ co-occur.
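The text does not spell out the test applied to technology pairs. One common way to validate co-occurrences against a BiCM, sketched here under that assumption, is to note that the number of patents containing both t and t′ is a sum of independent Bernoulli variables with probabilities $p_{p,t}\,p_{p,t'}$, and to compute its exact Poisson-binomial tail instead of sampling 1000 matrices. The probability matrix `prob` is taken as given (for example, from a fitted BiCM), the function names are hypothetical, and the diagonal of S is set to zero.

```python
def poisson_binomial_sf(probs, k):
    """P(X >= k) for X a sum of independent Bernoulli(p) variables (exact DP)."""
    if k <= 0:
        return 1.0
    dist = [1.0]  # dist[j] = P(X == j) over the variables processed so far
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for j, mass in enumerate(dist):
            new[j] += mass * (1.0 - p)
            new[j + 1] += mass * p
        dist = new
    return sum(dist[k:])


def similarity_matrix(B, prob, alpha=0.05):
    """Binary technology-technology matrix S from validated co-occurrences.

    B: binary patents x technologies matrix; prob: link probabilities of a
    null model (e.g. a fitted BiCM) with the same shape. Under that model the
    number of patents containing both t and u is Poisson-binomial.
    """
    n_p, n_t = len(B), len(B[0])
    S = [[0] * n_t for _ in range(n_t)]  # diagonal stays 0 by construction
    for t in range(n_t):
        for u in range(t + 1, n_t):
            observed = sum(B[p][t] * B[p][u] for p in range(n_p))
            motif_probs = [prob[p][t] * prob[p][u] for p in range(n_p)]
            if poisson_binomial_sf(motif_probs, observed) < alpha:
                S[t][u] = S[u][t] = 1
    return S


# Toy data: technologies 0 and 1 always appear together; technology 2 never
# appears. The uniform probabilities are illustrative, not a fitted BiCM.
B = [[1, 1, 0] for _ in range(6)]
prob = [[0.5, 0.5, 0.5] for _ in range(6)]
S = similarity_matrix(B, prob)
print(S)  # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```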

For each technological field t and each entity e, we count the number of technologies adopted by e that are connected to t in S. If the technological portfolio of entity e consists of numerous strongly connected technologies surrounding t, then t exhibits high coherence with respect to e, resulting in a high value of $\gamma_{e,t}$. Conversely, if t belongs to a portion of the technology network that is distant from the patenting activity of entity e, $\gamma_{e,t}$ will be low. Note that the matrix $\gamma$ has the same dimensions as M, with its elements quantifying the coherence of a technology t with respect to the technology basket of entity e. Finally, we calculate the Technological Coherence of entity e [16] as follows:

$$\Gamma_e = \frac{1}{d_e^2} \sum_t M_{e,t}\, \gamma_{e,t} \tag{4}$$

where $d_e = \sum_t M_{e,t}$ is the diversification of entity e. The Technological Coherence $\Gamma_e$ measures the average coherence γ of the technologies in which entity e is engaged in patenting activities. Compared with the original definition [16], we divide the sum by $d_e$ twice, i.e., by $d_e^2$. The reason is that the sum depends quadratically on diversification: one factor comes from M in the definition of $\gamma_{e,t}$ (Eq 3), the other from M in Eq (4). By dividing by $d_e^2$, we remove this dependence and obtain a better measure of the coherence of entity e.
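Eqs (3) and (4) translate directly into code. The sketch below assumes binary lists of lists for M and S, with zeros on the diagonal of S (so that a technology does not count itself), and returns 0 for entities with no technologies; that last convention is ours, not the paper's.

```python
def coherence(M, S):
    """Coherent diversification (Eq 3) and Technological Coherence (Eq 4).

    M: binary entities x technologies matrix; S: binary technology similarity
    matrix with zeros on the diagonal. Returns (gamma, Gamma).
    """
    n_e, n_t = len(M), len(M[0])
    # Eq (3): gamma[e][t] = number of e's technologies connected to t in S.
    gamma = [[sum(S[t][u] * M[e][u] for u in range(n_t)) for t in range(n_t)]
             for e in range(n_e)]
    Gamma = []
    for e in range(n_e):
        d_e = sum(M[e])  # diversification of entity e
        total = sum(M[e][t] * gamma[e][t] for t in range(n_t))
        # Eq (4): divide by d_e**2 to remove the quadratic dependence on d_e.
        Gamma.append(total / d_e ** 2 if d_e else 0.0)
    return gamma, Gamma


# Toy network: t0-t1 and t2-t3 are similar pairs.
S = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
M = [[1, 1, 0, 0],   # entity 0: two connected technologies
     [1, 0, 0, 1]]   # entity 1: two unconnected technologies
gamma, Gamma = coherence(M, S)
print(Gamma)  # [0.5, 0.0]
```

This mirrors the lower panel of Fig 1: the entity whose technologies are linked in S scores higher than the one with disconnected technologies, even though both have the same diversification.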

In Fig 1, we present a pictorial representation of how Technological Fitness and Coherence work. The upper panel illustrates the Fitness and Complexity algorithm. The Fitness of entity e1 represents the cumulative technological Complexity of all technologies developed by e1. As for Complexity, technology t4 is considered highly complex because it is pursued only by entities with high Fitness. Conversely, technology t1 is deemed to have low Complexity since it is also undertaken by e4, which has low Fitness. Thus, entities e1, e2, and e3 can be viewed as technologically sophisticated, and the fact that only they develop t4 underscores its complexity. On the other hand, e4 lacks comparable technological advancement; therefore, its ability to produce t1 alongside the other entities indicates that t1 is less complex. The lower panel illustrates the concept of Technological Coherence. Both diagrams offer a streamlined depiction of the technological network S. Each node corresponds to a technology, and links indicate significant similarity between technologies, reflecting whether their development requires similar capabilities. The technologies attributed to entities e1 and e2 are shown as coloured nodes. Entity e1 exhibits a high Coherence value for t1 due to the closely related technologies within its portfolio. In contrast, the portfolio of e2 features unconnected technologies, resulting in a lower Coherence value.

Fig 1. Technological Fitness and Coherence explanation. Schematic representation of the workings of the Fitness and Complexity algorithm and of coherent diversification.

In the top panel, we show the computation of the Fitness and Complexity algorithm. The Fitness of entity e1 is the sum of the Complexity of all technologies developed by e1. Technology t4 has high Complexity because it is developed by entities with high Fitness, whereas t1 has low Complexity because it is also developed by e4, which has low Fitness. In other words, e1, e2 and e3 can be seen as technologically advanced entities, so if t4 is made only by them, it is indeed a complex technology. Unlike these, e4 does not have the same technological development, and since it can produce t1 together with the other entities, t1 has low Complexity. At the bottom, we explain the idea of coherent diversification. Both figures offer a simplified representation of the network of technologies S. Each node is a technology, and links indicate whether technologies are connected, i.e., whether they are similar and require approximately the same capabilities to develop. The coloured nodes are the technologies made by entities e1 and e2, respectively. Entity e1 has a high Coherence value with respect to t1 since its technology portfolio contains closely related elements. In contrast, e2 has a portfolio of disconnected technologies and consequently a low Coherence value.

https://doi.org/10.1371/journal.pone.0329746.g001

In Fig 2, we plot world and European maps, colouring each country in our database and each European region according to its mean Technological Fitness and Coherence from y0 = 2005 to y1 = 2010. The two measures are not directly related. For example, China has high Coherence and low Fitness, whereas at the European level, regions in the West have high Fitness while those in the East have high Coherence. To illustrate, we report some of the domains in which China and a selection of European regions display high Coherence. China's technological production shows high Coherence particularly in the F21 (Lighting) and H04 (Electric Communication Technique) categories. The regional analysis also highlights several areas of noteworthy Coherence. For instance, the Karlovarský region in the Czech Republic shows high coherence in technologies within the E01 (Construction of Roads, Railways, or Bridges) category, as well as in the Physics and H04 categories. Similarly, Gävleborg in Sweden exhibits coherence across diverse sectors, including Personal or Domestic Articles, Health; Life-Saving; Amusement, Shaping, Metallurgy, and Earth or Rock Drilling; Mining. In Finland, the Oulu region shows pronounced coherence in H03 (Basic Electronic Circuitry) and H04, as well as strong production capabilities in A01 (Agriculture; Forestry; Animal Husbandry; Hunting; Trapping; Fishing), A23 (Foods or Foodstuffs), A61 (Medical or Veterinary Science; Hygiene), and in the Chemistry and Physics categories.

Fig 2. Mean Technological Fitness and Coherence in the world and European regions.

We compute the mean Technological Fitness and Coherence from y0 = 2005 to y1 = 2010 for countries ((a) Fitness, (b) Coherence) and for European regions ((c) Fitness, (d) Coherence). We highlight how the two measures are not directly related. For example, China has high Coherence and low Fitness. In contrast, at the European level, regions in the West have high Fitness while those in the East have high Coherence.

https://doi.org/10.1371/journal.pone.0329746.g002

4 Results

In this section, we present our findings on the impact of technological Fitness and Coherence on GDP growth at different scales. We quantify economic growth as the Compound annual growth rate (CAGR) of GDPpc from y1 to . CAGR is defined as:

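$$\mathrm{CAGR}(y_a, y_b) = \left( \frac{\mathrm{GDPpc}(y_b)}{\mathrm{GDPpc}(y_a)} \right)^{1/(y_b - y_a)} - 1 \tag{5}$$

where $y_a < y_b$ are the initial and final years of the period.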
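Eq (5) can be written as a one-line helper (the name is hypothetical). As a quick check, GDP per capita growing from 100 to 121 over two years corresponds to a 10% compound annual rate, since 1.1² = 1.21.

```python
def cagr(gdppc_start, gdppc_end, year_start, year_end):
    """Compound annual growth rate of GDP per capita between two years (Eq 5)."""
    return (gdppc_end / gdppc_start) ** (1.0 / (year_end - year_start)) - 1.0


print(round(cagr(100.0, 121.0, 2005, 2007), 6))  # 0.1
```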

In this Section, we present the results following this scheme:

  • Initial Overview: We begin by presenting a pairwise correlation analysis between Fitness, Coherence and CAGR. This analysis provides a qualitative description of these interactions and reveals non-trivial and heterogeneous correlation patterns. These findings are presented in Fig 3.
  • Feature Importance Analysis: In this second step, we consider the impact of Fitness and Coherence on the prediction of future CAGR in a non-linear multivariate setting with multiple controls. The results of this analysis are presented in Figs 4 and 5.
Fig 3. Technological Fitness VS Technological Coherence to evaluate GDPpc growth.

We compute the mean Technological Fitness and Coherence between y0 = 2005 and y1 = 2010. The left panels show scatter plots of these values, with the colour bar indicating CAGR(y0,y1). In the right panels, we compute the mean of CAGR(y0,y1) and its standard error, weighting the points with a Gaussian kernel centred at 200 evenly spaced values between the minimum and maximum of Fitness and of Coherence in the left panels. In all three cases, we observe the same trend: the Fitness curve tends to decrease (less evidently for regions, where it is concave), while the Coherence curve is convex, more markedly for MAs and less so for countries. In addition, for regions and MAs, we highlight two distinctive trends that could suggest two types of economic development.

https://doi.org/10.1371/journal.pone.0329746.g003

Fig 4. SHAP values feature importance.

The feature importance of Fitness, Coherence, Diversification, starting GDPpc, and geographic and time information is depicted by each pair of bars. GDPpc(y1) is the most important feature, followed by year and country-level geographic information; we include these features solely as controls for the behaviour of Fitness and Coherence. A focused comparison of these two metrics is shown in the main figure. The relative importance of Fitness and Coherence in explaining CAGR trends switches between the MA and country scales. However, feature importance alone does not tell us whether the effect is positive or negative.

https://doi.org/10.1371/journal.pone.0329746.g004

Fig 5. SHAP values VS F and Γ feature value.

We plot the SHAP values of the Fitness (F) and Coherence (Γ) features against their respective feature values across the three geographical scales. Each bar in the three pairs of panels represents a decile of the dataset, so that every bin contains the same number of samples. SHAP values below (above) zero, marked by the black dotted line, indicate a negative (positive) contribution to the model prediction. In addition, each plot includes a weighted linear fit, with the corresponding $R^2$ and p-value. MA case: we performed an additional fit after removing the first two deciles to better highlight the increasing trend and the growing importance of Fitness; this was not done for the other plots because they do not show similar behaviour. While the Γ trend is not statistically significant, high Coherence has a strong positive impact on the economic growth of MAs. Region case: both features exhibit a statistically significant downward trend. Notably, the last decile in the F plot has a substantial negative impact on economic growth. Country case: F shows an increasing but not statistically significant trend. However, our analysis focuses exclusively on highly developed countries, which may reduce the observed role of Fitness in driving economic growth at the national level, as also highlighted in the literature [6]. Conversely, Γ displays a strong and statistically significant negative trend, underscoring the importance of economic diversification rather than specialization in a narrow range of activities. Another key aspect of these findings is that these measures are highly heterogeneous; for this reason, regressions are not the most appropriate strategy for such scenarios [74].

https://doi.org/10.1371/journal.pone.0329746.g005

4.1 Initial overview

As an initial step, we highlight the relation between Fitness, Coherence and CAGR through a snapshot of the period 2005–2010. In Fig 3, we plot the mean Fitness and Coherence for y0 = 2005 and y1 = 2010 as a function of CAGR(2005,2010); panels a, b and c refer to MAs, regions and countries, respectively. The three cases show a similar pattern: the Fitness curve tends to decrease, although less markedly for regions, where it is concave, while the Coherence curve is convex, more evidently for MAs and less so for countries. The decreasing relation between Fitness and GDP per capita growth (CAGR) suggests that entities with higher Fitness in a given period may, on average, experience lower relative growth in the subsequent years. A plausible interpretation is that entities with very high Fitness are already technologically advanced and economically mature, so their growth slows down because of saturation effects, whereas entities with lower Fitness have more room to grow, especially if they are diversifying or undergoing structural transitions. A similar relation between Technological Fitness and GDP has already been documented in the literature [60].
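The two quantities related in Fig 3 can be computed as in the minimal Python sketch below. It assumes the standard compound-growth definition, CAGR(ya, yb) = (GDPpc(yb) / GDPpc(ya))^(1 / (yb − ya)) − 1, and reads "mean Fitness for y0 = 2005 and y1 = 2010" as a simple average of the yearly values in that window; the function names `cagr` and `window_mean` and the toy numbers are hypothetical.

```python
def cagr(gdp_start: float, gdp_end: float, years: int) -> float:
    """Compound annual growth rate between two GDP per capita values (standard definition)."""
    return (gdp_end / gdp_start) ** (1.0 / years) - 1.0


def window_mean(series: dict[int, float], start: int, end: int) -> float:
    """Average of a yearly series over the inclusive window [start, end] (assumed averaging rule)."""
    values = [series[year] for year in range(start, end + 1)]
    return sum(values) / len(values)


# Hypothetical toy entity: yearly Fitness values and GDP per capita in 2005 and 2010
fitness = {2005: 1.2, 2006: 1.3, 2007: 1.1, 2008: 1.4, 2009: 1.5, 2010: 1.3}
gdppc = {2005: 30000.0, 2010: 36000.0}

growth = cagr(gdppc[2005], gdppc[2010], 2010 - 2005)  # about 0.037, i.e. 3.7% per year
mean_f = window_mean(fitness, 2005, 2010)
```

Repeating the same computation for every MA, region or country gives the per-entity values of mean Fitness (and, analogously, mean Coherence) and CAGR that the figure relates to each other.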

Finally, for regions and MAs, we highlight two distinctive trends that could indicate two types of economic development, likely corresponding to different innovation strategies or structural profiles. One trajectory tends to group highly diversified entities, often large metropolitan hubs with broad technological capabilities and high GDP growth. The other appears to include more specialized entities, whose technological portfolio is coherent but narrower and sometimes tied to dominant sectors. This divergence suggests that both paths, diversified innovation and focused specialization, may coexist and drive growth in different ways, depending on the local context. To further illustrate these patterns, we provide two additional plots in the Supplementary Information. The first colors each MA according to its continent (with Western and Eastern Europe distinguished), showing that different geographical clusters align with the two trajectories. The second highlights entities belonging to specific countries, such as Brazil, Russia, India, China, South Africa, Turkey, Mexico and Poland, which predominantly populate the lower branch of the plot and illustrate a growth regime distinct from that of more developed entities. This pattern is consistent with previous findings on technological innovation dynamics in economic complexity [61]. It is also worth noting that the top-left corner of the scatter plots, corresponding to areas with very high Coherence but low Fitness, is predominantly populated by Chinese cities. This pattern reflects China’s strongly coherent diversification strategies, consistent with previous findings on technology spillovers within Chinese regions [62]. Exploring these regimes in a formal model remains an important direction for future work.

4.2 Feature importance with SHAP analysis

To determine the impact of Technological Fitness (F) and Technological Coherence (Γ), we treat our problem as a regression problem. Our independent variables, or features, are the averages of Technological Fitness and Technological Coherence over the years from y0 to y0 + δ, with y1 = y0 + δ; the dependent variable is the compound annual growth rate (CAGR) of GDPpc from y1 to y1 + δ. We employ a Random Forest regression model and conduct a feature importance analysis, which allows us to quantify the respective roles of the features. Random Forest [63] is a tree-based machine learning algorithm that is well suited to capturing non-linear relationships between variables. The Economic Complexity literature has demonstrated its usefulness both for making accurate predictions and for capturing the non-linear relationships between the variables used in Economic Complexity [11–14,33,64].
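The alignment between the feature window and the growth window can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it assumes that y1 = y0 + δ, that the averaging window [y0, y0 + δ] is inclusive, and that yearly values are stored in plain dictionaries; `build_samples` and the toy data are hypothetical.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate (standard definition)."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def build_samples(fitness, coherence, gdppc, delta):
    """One sample (y0, mean F, mean Γ, CAGR) per start year y0 with complete data."""
    rows = []
    for y0 in sorted(fitness):
        y1 = y0 + delta  # assumption: the growth window starts where the feature window ends
        window = range(y0, y1 + 1)
        if not all(y in fitness and y in coherence for y in window):
            continue  # incomplete feature window
        if y1 not in gdppc or y1 + delta not in gdppc:
            continue  # growth window not observed
        f_mean = sum(fitness[y] for y in window) / len(window)
        g_mean = sum(coherence[y] for y in window) / len(window)
        target = cagr(gdppc[y1], gdppc[y1 + delta], delta)
        rows.append((y0, f_mean, g_mean, target))
    return rows


# Hypothetical yearly series for a single entity
fitness = {2000: 1.0, 2001: 2.0, 2002: 3.0, 2003: 4.0}
coherence = {2000: 0.5, 2001: 0.5, 2002: 0.5, 2003: 0.5}
gdppc = {2002: 100.0, 2004: 121.0}
rows = build_samples(fitness, coherence, gdppc, delta=2)
```

Only y0 = 2000 yields a sample here: later start years lack either a complete feature window or the GDPpc values needed for the growth window, so the sketch skips them instead of imputing.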

Given the high correlation between Fitness and diversification, and the strong dependence of growth on the starting GDPpc, we also include these quantities in the analysis as controls for both F and Γ. For the same purpose, we include geographical fixed effects (MAfe, regionfe and countryfe) and the time fixed effect y0.

First, we use the “RandomForestRegressor” class from the “sklearn” (scikit-learn) Python library [65] to train our regression model.
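The training step can be sketched with scikit-learn as follows. The exact command used in the study is not reproduced here, so the hyperparameters (`n_estimators=200`, `random_state=0`), the column layout and the synthetic data are assumptions for illustration only; in particular, encoding the fixed effects as integer-coded columns is one possible choice for a tree-based model, not necessarily the one adopted by the authors.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500

# Hypothetical synthetic features standing in for the paper's variables:
# columns = [F, Γ, diversification, log GDPpc(y1), country id, y0]
X = np.column_stack([
    rng.lognormal(size=n),              # Fitness
    rng.uniform(0.0, 1.0, size=n),      # Coherence
    rng.integers(50, 500, size=n),      # diversification
    rng.normal(10.0, 0.5, size=n),      # log GDP per capita at y1
    rng.integers(0, 20, size=n),        # country fixed effect (integer-coded)
    rng.integers(2000, 2010, size=n),   # start year y0 (time fixed effect)
])

# Synthetic target: growth falls with initial income, rises mildly with Coherence, plus noise
y = -0.02 * (X[:, 3] - 10.0) + 0.01 * X[:, 1] + rng.normal(0.0, 0.005, size=n)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X, y)
```

Because the forest is non-parametric, no functional form has to be specified for the relation between the features and CAGR, which is the property the text relies on to capture non-linear effects.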

Because we are interested not only in comparing the importance of different features in model training but also in how they affect the output, we use SHapley Additive exPlanations (SHAP) analysis [66]. SHAP is a widely used approach to machine learning interpretability, designed to explain the output of any machine learning model. It is based on Shapley values, a concept from cooperative game theory [67] that distributes the gains and costs of a coalition fairly among its participants. In machine learning, SHAP values measure the contribution of each feature to a model's prediction: each feature value of a data instance pushes the prediction up or down relative to the average prediction across the dataset, so that SHAP decomposes a prediction into the contributions of the individual features. Two key properties of SHAP are local accuracy, meaning that the feature contributions sum to the difference between the prediction and the average prediction, and consistency, meaning that a feature's attribution does not decrease when the model changes so that the feature contributes more. These properties make SHAP particularly useful for interpreting complex models, such as random forests [68–70] or neural networks [71–73], where traditional feature importance measures may fall short.
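To make the Shapley-value idea concrete, the pure-Python sketch below computes exact Shapley values for a toy model by enumerating all feature coalitions. As a simplifying assumption, features outside a coalition are set to a single reference point, whereas SHAP marginalizes absent features over a data distribution and, for tree ensembles, uses a dedicated polynomial-time algorithm (Tree SHAP) instead of this exponential enumeration. The function `shapley_values` and the toy model are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, reference):
    """Exact Shapley values of the features of x for model f.

    The value of a coalition S is f evaluated with the features in S taken
    from x and all other features taken from the reference point.
    """
    n = len(x)

    def coalition_value(coalition):
        z = [x[i] if i in coalition else reference[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # marginal contribution of feature i when added to coalition S
                phi[i] += weight * (coalition_value(s | {i}) - coalition_value(s))
    return phi


def toy_model(z):
    # hypothetical model with one additive term and one interaction term
    return 2 * z[0] + z[1] * z[2]


phi = shapley_values(toy_model, [1, 2, 3], [0, 0, 0])  # approximately [2.0, 3.0, 3.0]
```

The additive term is credited entirely to the first feature (φ0 = 2), the interaction z1·z2 = 6 is split equally between the other two (φ1 = φ2 = 3), and the values sum to f(x) − f(reference) = 8, which is the local accuracy property described above.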

In our work, we use SHAP analysis to understand how Fitness and Coherence affect the CAGR measure. We use the implementation of SHAP in the SHAP Python package (https://shap.readthedocs.io/en/latest/#) to compute the SHAP values of all features.

By doing this, we obtain for all three models (one for each geographical scale) the SHAP values for each feature relative to each sample. We perform this computation considering all the y0 and y1 with for all three scales MA, region and country.

First, we show the feature importance comparison in Fig 4. Each pair of bars in this figure represents the feature importance of Fitness, Coherence, diversification, starting GDPpc, and geographic and time information, computed with SHAP; the y-axis reports the average of the absolute SHAP values of each feature. As expected, GDPpc(y1) is the most important feature, followed by the year and the country geographic information (countryfe). However, we include these variables only to control for the behavior of Fitness and Coherence, and the main panel of Fig 4 compares only these two quantities. We find that the relative importance of Fitness and Coherence in explaining the trends in CAGR switches from the MA scale to the country scale. However, we do not yet know whether this importance translates into a positive or a negative effect.
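The bar heights in Fig 4 are mean absolute SHAP values, which can be obtained from the per-sample SHAP values as in this minimal sketch. The function name `mean_abs_shap` and the toy numbers are hypothetical; with the SHAP package, the same quantity corresponds to averaging the absolute SHAP values over the sample axis.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Mean absolute SHAP value per feature, sorted from most to least important."""
    n = len(shap_rows)
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        for j, value in enumerate(row):
            totals[j] += abs(value)  # the sign is discarded: only magnitude counts
    importance = {name: total / n for name, total in zip(feature_names, totals)}
    return dict(sorted(importance.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical SHAP values for two samples and two features
importance = mean_abs_shap([[0.1, -0.3], [-0.1, 0.1]], ["F", "Gamma"])
```

Taking the absolute value is what makes this measure silent about direction: a feature that always lowers growth and one that always raises it by the same amount get the same bar, which is why the sign has to be examined separately.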

Finally, in Fig 5 we plot the SHAP values of the Fitness and Coherence features against the feature values, for all three scales. Negative SHAP values indicate a negative impact on the model output, and positive values a positive one. Each bar in all three figures represents a decile of the data, so that every bin contains the same number of samples. Furthermore, we apply a weighted linear fit to each plot and report the corresponding and p-value.

  • MA case: We add a second fit after removing the first two deciles to better emphasize the increasing trend and the growing relevance of Fitness. Although the Γ trend is not statistically significant, we observe that high Coherence has a strong positive effect on the economic growth of MAs.
  • Region case: Both features exhibit a statistically significant downward trend. In particular, the last decile in the F plot has a pronounced negative impact on economic growth.
  • Country case: F displays an increasing trend, although not statistically significant. It is important to emphasize that our analysis focuses exclusively on highly developed countries, which may attenuate the observed role of Fitness in driving national economic growth, as previously noted in the literature [6]. Conversely, Γ shows a strong and statistically significant negative trend, reinforcing the idea that economic diversification is more beneficial than specialization in a limited set of activities at this level.
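The decile profile and weighted linear fit behind Fig 5 can be sketched as below. The text does not state which weights are used, so this sketch assumes inverse-variance weights (one over the squared standard error of the mean SHAP value in each decile); the p-value computation is omitted, and the function names and synthetic data are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev


def decile_profile(feature, shap, n_bins=10):
    """Equal-count bins of (feature, SHAP) pairs: (mean feature, mean SHAP, std. error).

    Requires at least 2 * n_bins samples so every bin has a defined standard deviation.
    """
    pairs = sorted(zip(feature, shap))  # order samples by feature value
    n = len(pairs)
    bins = []
    for k in range(n_bins):
        chunk = pairs[k * n // n_bins:(k + 1) * n // n_bins]
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        bins.append((fmean(xs), fmean(ys), stdev(ys) / sqrt(len(ys))))
    return bins


def weighted_linear_fit(bins):
    """Weighted least-squares line through the bin means (assumed inverse-variance weights)."""
    w = [1.0 / max(se, 1e-12) ** 2 for _, _, se in bins]  # floor avoids division by zero
    sw = sum(w)
    xbar = sum(wi * x for wi, (x, _, _) in zip(w, bins)) / sw
    ybar = sum(wi * y for wi, (_, y, _) in zip(w, bins)) / sw
    sxy = sum(wi * (x - xbar) * (y - ybar) for wi, (x, y, _) in zip(w, bins))
    sxx = sum(wi * (x - xbar) ** 2 for wi, (x, _, _) in zip(w, bins))
    slope = sxy / sxx
    return slope, ybar - slope * xbar


# Synthetic example: SHAP values rising linearly with the feature, plus alternating noise
feature = list(range(100))
shap_vals = [0.5 * v + (1.0 if v % 2 == 0 else -1.0) for v in feature]
profile = decile_profile(feature, shap_vals)
slope, intercept = weighted_linear_fit(profile)
```

Fitting the bin means rather than the raw samples keeps every decile equally represented, while the weights let noisier deciles count less in the estimated trend.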

Finally, we emphasize the high heterogeneity of these measures. As a result, regression techniques may not be the most suitable approach for addressing such scenarios, as discussed in the work of Cristelli et al. [74].

5 Discussion

The present study examines the impact of two technological measures—Technological Fitness (F) and Technological Coherence (Γ)—on economic growth across different geographical scales (metropolitan areas, regions, and countries). We assess their distinct contributions to growth, as measured by the Compound Annual Growth Rate (CAGR), and elucidate the interplay between technology composition and economic development. The findings of this study reveal intriguing patterns in the significance of Technological Fitness and Coherence across varying scales.

First, we perform a pairwise correlation analysis of Fitness, Coherence and CAGR, as shown in Fig 3. This analysis offers a first qualitative perspective on these interactions, revealing complex and heterogeneous correlation patterns. In particular, we observe bifurcated trends for MAs and regions that could suggest the coexistence of two development strategies: one based on specialized high-Coherence technological portfolios with moderate growth, and one based on broader diversification with higher growth. This divergence warrants further theoretical investigation.

Subsequently, the SHapley Additive exPlanations (SHAP) feature importance analysis shows that the relative importance of Technological Fitness and Coherence in explaining CAGR switches across scales. However, this result does not clarify whether this importance corresponds to a positive or a negative impact. To understand this, we explore how the Fitness and Coherence features influence growth.

We find that high Coherence is associated with faster growth only at the MA scale, in line with previous research emphasizing the importance of Technological Coherence in explaining the GDPpc of MAs [17]. At the regional and national levels, Coherence is systematically correlated with lower growth.

This finding suggests that at broader scales, excessive specialization, captured by high Coherence, might reduce a system’s adaptive capacity, long-term resilience and ability to further innovate. In other words, while coherent technological portfolios may boost short-term efficiency and innovation in urban contexts, at regional or national level such coherence could signal over-concentration in mature or path-dependent domains, which may hinder diversification, adaptability and potential for recombination.

The results for Fitness are less sharp: it presents a U-shaped impact on growth for MAs, a decreasing impact on regions, and an upward trend at the national level. At the MA level, both very low and very high Fitness values are associated with positive growth, possibly reflecting the benefits of either flexible opportunistic innovation or highly structured technological leadership. In contrast, the negative correlation in regions may reflect intermediate diversification not backed by strong capabilities, while the national level seems to reward high Fitness, likely due to the focus on highly developed economies. In this sense, our work echoes findings in [6], but also highlights that Fitness effects are more subtle and depend on the heterogeneity of the dataset.

Finally, the heterogeneity of these effects deserves special attention. As highlighted by Cristelli et al. [74], the presence of multiple regimes or divergent behaviors within a single dataset suggests that linear models may fail to capture the true dynamics at play. Our SHAP analysis confirms this, showing that the impact of both Fitness and Coherence varies non-monotonically across deciles and scales.

The findings of this study contribute to our understanding of the intricate relationship between technology composition and economic development at different geographical levels. The results imply that the factors driving economic prosperity in metropolitan areas differ from those at regional and national scales.

However, it is important to acknowledge some limitations of this study. The analysis focused solely on the relationship between Technological Fitness, Coherence, and GDP per capita, without considering other potentially influential factors such as institutional frameworks, human capital, or infrastructure. Future research should aim to integrate these additional dimensions to provide a more comprehensive understanding of the drivers of economic development. Moreover, the study relies exclusively on patent data as a proxy for technological capabilities and innovation. While informative, patents may not capture the entire spectrum of technological complexity and knowledge production. Incorporating complementary indicators, such as R&D expenditure, collaboration networks, or firm-level innovation surveys, would strengthen the robustness of the results.

In conclusion, this study highlights the distinct and scale-dependent roles played by Technological Fitness and Coherence in shaping economic performance. The findings underscore the delicate balance between technological concentration and diversification in fostering growth, depending on the geographical context. By clarifying these patterns, we hope to inform policymakers and stakeholders designing innovation strategies tailored to local conditions. Finally, as this work raises several open questions about the mechanisms behind the observed scale effects, we envisage future studies developing a micro-founded theoretical model to explain the interplay between coherence, diversification, and economic resilience across scales.

References

  1. Hausmann R, Hwang J, Rodrik D. What you export matters. J Econ Growth. 2006;12(1):1–25.
  2. Hidalgo CA, Hausmann R. The building blocks of economic complexity. Proc Natl Acad Sci U S A. 2009;106(26):10570–5. pmid:19549871
  3. Tacchella A, Cristelli M, Caldarelli G, Gabrielli A, Pietronero L. A new metrics for countries’ fitness and products’ complexity. Scientific Reports. 2012;2(1):1–7.
  4. Hausmann R, Hidalgo CA, Bustos S, Coscia M, Simoes A. The atlas of economic complexity: Mapping paths to prosperity. MIT Press. 2014.
  5. Caldarelli G, Cristelli M, Gabrielli A, Pietronero L, Scala A, Tacchella A. A network analysis of countries’ export flows: firm grounds for the building blocks of the economy. PLoS One. 2012;7(10):e47278. pmid:23094044
  6. Cristelli MCA, Tacchella A, Cader MZ, Roster KI, Pietronero L. On the predictability of growth. World Bank Policy Research Working Paper. 2017;(8117).
  7. Pugliese E, Tacchella A. Economic complexity for competitiveness and innovation: a novel bottom-up strategy linking global and regional capacities. Joint Research Centre (Seville Site). 2020.
  8. Alves Dias P, Amoroso S, Bauer P, Bessagnet B, Cabrera Giraldez M, Cardona M. China 2.0-Status and Foresight of EU-China Trade, Investment and Technological Race. Joint Research Centre (Seville Site). 2022.
  9. Hidalgo CA, Klinger B, Barabási A-L, Hausmann R. The product space conditions the development of nations. Science. 2007;317(5837):482–7. pmid:17656717
  10. Zaccaria A, Cristelli M, Tacchella A, Pietronero L. How the taxonomy of products drives the economic development of countries. PLoS One. 2014;9(12):e113770. pmid:25486526
  11. Albora G, Pietronero L, Tacchella A, Zaccaria A. Product progression: a machine learning approach to forecasting industrial upgrading. Sci Rep. 2023;13(1):1481. pmid:36707529
  12. Fessina M, Albora G, Tacchella A, Zaccaria A. Identifying key products to trigger new exports: An explainable machine learning approach. Journal of Physics: Complexity. 2024.
  13. Albora G, Zaccaria A. Machine learning to assess relatedness: the advantage of using firm-level data. Complexity. 2022;2022(1).
  14. Tacchella A, Zaccaria A, Miccheli M, Pietronero L. Relatedness in the era of machine learning. Chaos, Solitons & Fractals. 2023;176:114071.
  15. Saracco F, Di Clemente R, Gabrielli A, Pietronero L. From innovation to diversification: a simple competitive model. PLoS One. 2015;10(11):e0140420. pmid:26544685
  16. Pugliese E, Napolitano L, Zaccaria A, Pietronero L. Coherent diversification in corporate technological portfolios. PLoS One. 2019;14(10):e0223403. pmid:31600259
  17. Straccamore M, Bruno M, Monechi B, Loreto V. Urban economic fitness and complexity from patent data. Sci Rep. 2023;13(1):3655. pmid:36871046
  18. Frietsch R, Schmoch U, Van Looy B, Walsh JP, Devroede R, Du Plessis M. The value and indicator function of patents. Studien zum deutschen Innovationssystem. 2010.
  19. Griliches Z. Patent statistics as economic indicators: A survey. R&D and productivity: The econometric evidence. National Bureau of Economic Research, Inc.; 1998. p. 287–343.
  20. Leydesdorff L, Alkemade F, Heimeriks G, Hoekstra R. Patents as instruments for exploring innovation dynamics: geographic and technological perspectives on “photovoltaic cells”. Scientometrics. 2014;102(1):629–51.
  21. Pugliese E, Cimini G, Patelli A, Zaccaria A, Pietronero L, Gabrielli A. Unfolding the innovation system for the development of countries: coevolution of science, technology and production. Scientific Reports. 2019;9(1):1–12.
  22. Youn H, Strumsky D, Bettencourt LMA, Lobo J. Invention as a combinatorial process: evidence from US patents. J R Soc Interface. 2015;12(106):20150272. pmid:25904530
  23. de Rassenfosse G, Kozak J, Seliger F. Geocoding of worldwide patent data. Sci Data. 2019;6(1):260. pmid:31695047
  24. Fall CJ, Törcsvári A, Benzineb K, Karetka G. Automated categorization in the international patent classification. SIGIR Forum. 2003;37(1):10–25.
  25. Falasco L. Bases of the United States Patent Classification. World Patent Information. 2002;24(1):31–3.
  26. Falasco L. United States patent classification: system organization. World Patent Information. 2002;24(2):111–7.
  27. Hall B, Helmers C, Rogers M, Sena V. The choice between formal and informal intellectual property: a review. Journal of Economic Literature. 2014;52(2):375–423.
  28. Hall BH, Jaffe A, Trajtenberg M. Market value and patent citations. RAND Journal of Economics. 2005;:16–38.
  29. Pavitt K. Patent statistics as indicators of innovative activities: possibilities and problems. Scientometrics. 1985;7(1–2):77–99.
  30. Arts S, Appio FP, Van Looy B. Inventions shaping technological trajectories: do existing patent indicators provide a comprehensive picture?. Scientometrics. 2013;97(2):397–419.
  31. Kogler D. Intellectual property and patents in manufacturing industries. The handbook of manufacturing industries in the world economy. 2015. p. 163–88.
  32. Lanjouw JO, Mody A. Innovation and the international diffusion of environmentally responsive technology. Research Policy. 1996;25(4):549–71.
  33. Straccamore M, Pietronero L, Zaccaria A. Which will be your firm’s next technology? Comparison between machine learning and network-based algorithms. J Phys Complex. 2022;3(3):035002.
  34. Arsini L, Straccamore M, Zaccaria A. Prediction and visualization of mergers and acquisitions using economic complexity. PLoS One. 2023;18(4):e0283217. pmid:37011046
  35. Albora G, Straccamore M, Zaccaria A. Machine learning-based similarity measure to forecast M&A from patent data. arXiv preprint 2024. https://arxiv.org/abs/240407179
  36. Di Clemente R, Chiarotti GL, Cristelli M, Tacchella A, Pietronero L. Diversification versus specialization in complex ecosystems. PLoS One. 2014;9(11):e112525. pmid:25384059
  37. Balland PA, Rigby D. The geography of complex knowledge. Economic Geography. 2017;93(1):1–23.
  38. Boschma R, Balland P-A, Kogler DF. Relatedness, technological change in cities: the rise and fall of technological knowledge in US metropolitan areas from 1981 to 2010. Industrial and Corporate Change. 2014;24(1):223–50.
  39. Kogler DF, Heimeriks G, Leydesdorff L. Patent portfolio analysis of cities: statistics and maps of technological inventiveness. European Planning Studies. 2018;26(11):2256–78.
  40. Kogler DF, Rigby DL, Tucker I. Mapping knowledge space and technological relatedness in US cities. European Planning Studies. 2013;21(9):1374–91.
  41. Balland P-A, Rigby D, Boschma R. The technological resilience of US cities. Cambridge Journal of Regions, Economy and Society. 2015;8(2):167–84.
  42. O’Neale DRJ, Hendy SC, Vasques Filho D. Structure of the region-technology network as a driver for technological innovation. Front Big Data. 2021;4:689310. pmid:34337398
  43. Napolitano L, Evangelou E, Pugliese E, Zeppini P, Room G. Technology networks: the autocatalytic origins of innovation. R Soc Open Sci. 2018;5(6):172445. pmid:30110482
  44. Dettmann E, Dominguez LI, Günther J, Jindra B. Determinants of foreign technological activity in German regions-A count model analysis of transnational patents 1996 -2009). Higher School of Economics Research Paper No WP BRP. 2013;17.
  45. Tavassoli S, Carbonara N. The role of knowledge variety and intensity for regional innovation. Small Bus Econ. 2014;43(2):493–509.
  46. Colombelli A, Krafft J, Quatraro F. The emergence of new technology-based sectors in European regions: a proximity-based analysis of nanotechnology. Research Policy. 2014;43(10):1681–96.
  47. Sbardella A, Zaccaria A, Pietronero L, Scaramozzino P. Behind the Italian regional divide: An economic fitness and complexity perspective. 2021 /30. Pisa, Italy: Laboratory of Economics and Management (LEM), Sant’Anna School of Advanced Studies; 2021. https://ideas.repec.org/p/ssa/lemwps/2021-30.html
  48. Sbardella A, Perruchas F, Napolitano L, Barbieri N, Consoli D. Green technology fitness. Entropy (Basel). 2018;20(10):776. pmid:33265864
  49. Sbardella A, Pugliese E, Pietronero L. Economic development and wage inequality: a complex system analysis. PLoS One. 2017;12(9):e0182774. pmid:28926577
  50. Montecchi T, Russo D, Liu Y. Searching in cooperative patent classification: comparison between keyword and concept-based search. Advanced Engineering Informatics. 2013;27(3):335–45.
  51. Kummu M, Taka M, Guillaume JH. Gridded global datasets for gross domestic product and Human Development Index over 1990 –2015. Scientific Data. 2018;5(1):1–15.
  52. Schiavina M, Moreno-Monroy A, Maffenini L, Veneri P. GHS-FUA R2019A - GHS functional urban areas, derived from GHS-UCDB R2019A 2015 . European Commission, Joint Research Centre (JRC); 2019. http://data.europa.eu/89h/347f0337-f2da-4592-87b3-e25975ec2c95
  53. Areas GA. GADM database of global administrative areas. Global Administrative Areas; 2012.
  54. Saracco F, Straka MJ, Clemente RD, Gabrielli A, Caldarelli G, Squartini T. Inferring monopartite projections of bipartite networks: an entropy-based approach. New J Phys. 2017;19(5):053022.
  55. Saracco F, Di Clemente R, Gabrielli A, Squartini T. Randomizing bipartite networks: the case of the World Trade Web. Sci Rep. 2015;5:10595. pmid:26029820
  56. Vallarano N, Bruno M, Marchese E, Trapani G, Saracco F, Cimini G, et al. Fast and scalable likelihood maximization for Exponential Random Graph Models with local constraints. Sci Rep. 2021;11(1):15227. pmid:34315920
  57. Pugliese E, Zaccaria A, Pietronero L. On the convergence of the fitness-complexity algorithm. Eur Phys J Spec Top. 2016;225(10):1893–911.
  58. Quatraro F. Knowledge coherence, variety and economic growth: manufacturing evidence from Italian regions. Research Policy. 2010;39(10):1289–302.
  59. Kalapouti K, Varsakelis NC. Intra and inter: regional knowledge spillovers in European Union. J Technol Transf. 2014;40(5):760–81.
  60. Angelini O, Gabrielli A, Tacchella A, Zaccaria A, Pietronero L, Di Matteo T. Forecasting the countries’ gross domestic product growth: the case of Technological Fitness. Chaos, Solitons & Fractals. 2024;184:115006.
  61. Straccamore M, Loreto V, Gravino P. The geography of technological innovation dynamics. Sci Rep. 2023;13(1):21043. pmid:38030886
  62. Gao J, Jun B, Pentland A ‘Sandy,’ Zhou T, Hidalgo CA. Spillovers across industries and regions in China’s regional economic diversification. Regional Studies. 2021;55(7):1311–26.
  63. Breiman L. Random forests. Machine Learning. 2001;45(1):5–32.
  64. Napoletano A, Tacchella A, Pietronero L. A context similarity-based analysis of countries’ technological performance. Entropy (Basel). 2018;20(11):833. pmid:33266558
  65. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O. Scikit-learn: machine learning in Python. The Journal of Machine Learning Research. 2011;12:2825–30.
  66. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems. 2017;30.
  67. Shapley LS. 17. A value for n-person games. Contributions to the Theory of Games (AM-28), Volume II. Princeton University Press; 1953. p. 307–18. https://doi.org/10.1515/9781400881970-018
  68. Deb D, Smith RM. Application of random forest and SHAP tree explainer in exploring spatial (In)Justice to aid urban planning. IJGI. 2021;10(9):629.
  69. Hatami F, Rahman MdM, Nikparvar B, Thill J-C. Non-linear associations between the urban built environment and commuting modal split: a random forest approach and SHAP evaluation. IEEE Access. 2023;11:12649–62.
  70. Wang H, Liang Q, Hancock JT, Khoshgoftaar TM. Feature selection strategies: a comparative analysis of SHAP-value and importance-based methods. J Big Data. 2024;11(1).
  71. Younisse R, Ahmad A, Abu Al-Haija Q. Explaining intrusion detection-based convolutional neural networks using shapley additive explanations (shap). Big Data and Cognitive Computing. 2022;6(4):126.
  72. Zheng Q, Wang Z, Zhou J, Lu J. Shap-CAM: visual explanations for convolutional neural networks based on Shapley value. In: European Conference on Computer Vision. 2022. p. 459–74.
  73. Chen J, Koju W, Xu S, Liu Z. Sales forecasting using deep neural network and SHAP techniques. In: 2021 IEEE 2nd International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE). 2021. p. 135–8.
  74. Cristelli M, Tacchella A, Pietronero L. The heterogeneous dynamics of economic complexity. PLoS One. 2015;10(2):e0117174. pmid:25671312