From innovation to diversification: a simple competitive model

Few attempts have been made to describe the statistical features and the historical evolution of the bipartite country-product export matrix. An important step is the introduction of a products network, namely a hierarchical forest of products that models the formation and the evolution of commodities. In the present article, we propose a simple dynamical model in which countries compete with each other to acquire the ability to produce and export new products. Countries have two ways to expand their exports: innovating, i.e. introducing new goods, namely new nodes in the products network, or copying the productive process of others, i.e. occupying a node already present in the same network. In this way, the topology of the products network and the country-product matrix evolve simultaneously, driven by the countries' push toward innovation.


Appendix Fitness and Complexity
In order to extract the information contained in the M-matrix, the authors of [1,2,3,4], overcoming flaws present in the seminal works [5,6], proposed a metric for countries and products, the Fitness and Complexity algorithm. This recursive and non-linear algorithm is a sort of PageRank applied to bipartite networks, where Fitness is the quantity for countries and Complexity the one for products. The idea at the basis of the algorithm is that the countries with the highest fitness are those able to export the largest number of the most exclusive products, i.e. those with the highest complexity. In particular, the Fitness F_c of the generic country c and the Quality Q_p of the generic product p at the n-th iteration step are defined by Eqs. (1), where the symbols ⟨ · ⟩ indicate the average taken over the proper set. The initial conditions are F_c^(0) = Q_p^(0) = 1 for every c = 1, ..., N_c and every p = 1, ..., N_p, where N_c and N_p are the numbers of countries and products, respectively. The convergence of the algorithm described by Eqs. (1) depends on the shape of the matrix M, as discussed in [7].
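The iteration described above can be sketched as follows. This is a minimal illustration, assuming the standard form of the map given in [1,2,3,4]: the intermediate fitness of a country is the sum of the qualities of its products, the intermediate quality of a product is the inverse of the sum of the inverse fitnesses of its exporters, and both are divided by their mean at every step. The function name `fitness_complexity` and the fixed number of iterations are illustrative choices, and the sketch assumes that every country exports at least one product and every product has at least one exporter.

```python
import numpy as np

def fitness_complexity(M, n_iter=50):
    """Iterate the Fitness-Complexity map on a binary country x product matrix M.

    Assumes no empty rows or columns in M (otherwise a division by zero occurs).
    """
    M = np.asarray(M, dtype=float)
    F = np.ones(M.shape[0])  # initial condition F_c^(0) = 1
    Q = np.ones(M.shape[1])  # initial condition Q_p^(0) = 1
    for _ in range(n_iter):
        F_tilde = M @ Q                    # sum_p M_cp Q_p^(n-1)
        Q_tilde = 1.0 / (M.T @ (1.0 / F))  # 1 / sum_c M_cp / F_c^(n-1)
        F = F_tilde / F_tilde.mean()       # normalise by the average over countries
        Q = Q_tilde / Q_tilde.mean()       # normalise by the average over products
    return F, Q

# Toy, perfectly triangular matrix: country 0 exports everything,
# country 2 exports only the most ubiquitous product.
M_toy = np.array([[1, 1, 1],
                  [1, 1, 0],
                  [1, 0, 0]])
F_toy, Q_toy = fitness_complexity(M_toy, n_iter=20)
```

On this toy matrix the most diversified country obtains the highest fitness and the least ubiquitous product the highest quality, which is the ranking behaviour described in the text.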

Non-trivial benchmarks

Fitness and Quality distributions
Using Fitness and Complexity it is possible to reveal several non-trivial properties of the M-matrix. The very first observation is that, once rows and columns are reordered by fitness and complexity respectively, the M-matrix shows a peculiar triangular form, already observed in biological systems [8,9,10,11]. The triangular shape of M shows that even the most diversified countries do not export only the most exclusive products, but also the common ones [1,2,3,4,5,6,7,12,13]. The form of the distribution of the Fitness (Quality) ranking against country diversification (product ubiquity) depends strongly on the shape of M. These distributions are sparse because of the non-linearity of the Fitness/Complexity algorithm, which is also responsible for their peculiar shape, as shown in Fig. 3(a, b) of the main paper: real data are the blue points, while the cloud represents the frequency of the simulated data.

Nestedness
As already mentioned, the "triangularity" of the M-matrix is a typical feature of biological mutualistic networks. Traditionally, in the biology literature the triangular form of the biadjacency matrix is measured by the nestedness, i.e. by how much a row (or a column) is a subset of the others. The literature offers many different definitions of nestedness [8,9,11,10,14]; here we use the NODF (Nestedness metric based on Overlap and Decreasing Fill) presented in [8] because, in our opinion, it is the most intuitive. Using the definitions of Eqs. (3), the total nestedness measure NODF_t is given by Eq. (4), where N_c and N_p are the numbers of countries and products, respectively. Note that a pair of rows (or columns) contributes to the final value of the nestedness only if the two have a different number of non-zero elements (hence the "decreasing fill" in the NODF acronym). The final formula for the nestedness is the weighted sum of the contributions from rows and from columns, so it is possible to isolate the two separate nestedness values for rows (i.e. countries, NODF_c) and for columns (i.e. products, NODF_p), as in Eqs. (5). Since typical matrices M are quite "rectangular", i.e. the number of products is much greater than the number of countries, the total NODF of Eq. (4) is biased towards the contribution from the products. In fact, combining Eqs. (4) and (5) yields the approximation of Eq. (6), NODF_t ≈ NODF_p. Because of this relation, for the analysis of the initial conditions we compared only the results for NODF_c and NODF_p, while the effect of the approximation is shown by the comparison between Table S1 and Table S3.
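The nestedness measure discussed above can be illustrated with a short sketch. It assumes the standard pairwise form of NODF from [8]: a pair of rows with different degrees contributes the fraction of the non-zero entries of the less filled row that are shared with the other, and pairs with equal degrees contribute zero. It also assumes that the total is the average over all row pairs and column pairs, which amounts to weights N_c(N_c - 1) and N_p(N_p - 1). Values are normalised to [0, 1] rather than expressed as a percentage. The function names are illustrative, and at least two rows and two columns are assumed.

```python
import numpy as np

def nodf_rows(M):
    """NODF among the rows of a binary matrix, normalised to [0, 1]."""
    M = np.asarray(M, dtype=float)
    k = M.sum(axis=1)   # number of non-zero entries of every row
    overlap = M @ M.T   # shared non-zero entries for every pair of rows
    n = M.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # "decreasing fill": only pairs with different degrees contribute
            if k[i] != k[j] and min(k[i], k[j]) > 0:
                total += overlap[i, j] / min(k[i], k[j])
    return 2.0 * total / (n * (n - 1))

def nodf(M):
    """Return (NODF_c, NODF_p, NODF_t) for a country x product matrix M."""
    M = np.asarray(M)
    n_c, n_p = M.shape
    nodf_c = nodf_rows(M)     # rows: countries
    nodf_p = nodf_rows(M.T)   # columns: products
    w_c, w_p = n_c * (n_c - 1), n_p * (n_p - 1)
    nodf_t = (w_c * nodf_c + w_p * nodf_p) / (w_c + w_p)
    return nodf_c, nodf_p, nodf_t

# A "rectangular" toy matrix: 2 countries, 50 products.
wide = np.zeros((2, 50))
wide[0, :] = 1
wide[1, :25] = 1
nodf_c, nodf_p, nodf_t = nodf(wide)
```

With far more products than countries the product weight dominates, so in this toy case NODF_t is almost equal to NODF_p even though NODF_c is very different, which is the bias described in the text.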

Assortativity
The assortativity parameter r [15] has been introduced in order to measure how much nodes tend to be linked to nodes with a similar degree. In more detail, r takes values from -1 to 1, where -1 denotes a perfectly disassortative network, i.e. the lowest-degree nodes are linked to the highest-degree ones, while 1 denotes a perfectly assortative network, i.e. high-degree vertices are linked to other high-degree ones; r can be expressed in terms of the adjacency matrix and the node degrees [15]. Given the previous discussion of triangularity, one may expect the value of the assortativity to be negative (since poorly diversified countries export the most ubiquitous products), but with a low absolute value, say much less than 0.5, since high-degree countries have low-complexity products in their export basket as well as more complex ones. This expectation is only partly satisfied: the value of r for the observed matrices is indeed negative, but its absolute value is quite large, of the order of 0.6. A second look at Fig. 3(c) (main paper) shows, in effect, that the density of the export basket of the most diversified countries moves toward the low-degree products, i.e. the most exclusive ones.
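As an illustration of the measure, the sketch below computes r for a bipartite network directly from its biadjacency matrix. It assumes the usual definition of [15] for undirected networks, i.e. the Pearson correlation between the degrees found at the two ends of a link, with every link counted in both directions. The function name `bipartite_assortativity` is illustrative, and the result is undefined if all degrees are equal.

```python
import numpy as np

def bipartite_assortativity(M):
    """Degree assortativity r of the bipartite network with biadjacency matrix M."""
    M = np.asarray(M)
    k_c = M.sum(axis=1)            # country degrees (diversification)
    k_p = M.sum(axis=0)            # product degrees (ubiquity)
    c_idx, p_idx = np.nonzero(M)   # one entry per country-product link
    # Every link is counted in both directions, as for an undirected network.
    x = np.concatenate([k_c[c_idx], k_p[p_idx]]).astype(float)
    y = np.concatenate([k_p[p_idx], k_c[c_idx]]).astype(float)
    return np.corrcoef(x, y)[0, 1]

# Perfectly triangular toy matrix: low-degree countries export
# only the most ubiquitous products.
M_toy = np.array([[1, 1, 1],
                  [1, 1, 0],
                  [1, 0, 0]])
r_toy = bipartite_assortativity(M_toy)
```

For this triangular toy matrix r is negative, in line with the qualitative expectation discussed in the text; a single country exporting products that nobody else exports (a star) gives r = -1.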

Motifs for bipartite networks
Some motifs for bipartite networks have been defined in the context of biological mutualistic networks. One of the most used is the checkerboard score [16], i.e. the number of 2 × 2 submatrices of the biadjacency matrix that show the mutually exclusive patterns (■□ / □■) and (□■ / ■□), where each pair lists the two rows of the submatrix (see footnote 2). In other words, the checkerboard score measures how mutually exclusive the choices made by different countries about the composition of their export baskets are. The total number of checkerboard patterns can be expressed in terms of the entries of M. Several other motifs for bipartite networks have been proposed in [17,18] in order to uncover the structural properties of the system at hand. Among these, we focus on V- and Λ-motifs (see footnote 3). The total number of V-motifs counts the co-occurrences of products in two different export baskets, while the total number of Λ-motifs counts the co-occurrences of countries in the sets of producers of two different goods; both can be expressed in terms of the entries of the biadjacency matrix M.
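The three motif counts can be sketched from the verbal definitions above. The code assumes that a V-motif is a pair of countries sharing one product, so that N_V is the sum over products of k_p(k_p - 1)/2. It assumes that a Λ-motif is a pair of products sharing one country, so that N_Λ is the sum over countries of k_c(k_c - 1)/2. It also assumes that a checkerboard is a pair of countries together with a pair of products such that each country exports exactly one of the two products and they export different ones, both orientations being counted. The function name is illustrative.

```python
import numpy as np

def bipartite_motifs(M):
    """Return (N_checkerboards, N_V, N_Lambda) for a binary biadjacency matrix M."""
    M = np.asarray(M, dtype=np.int64)
    k_c = M.sum(axis=1)   # country degrees (diversification)
    k_p = M.sum(axis=0)   # product degrees (ubiquity)

    # V-motifs: pairs of countries exporting the same product.
    n_v = int((k_p * (k_p - 1) // 2).sum())
    # Lambda-motifs: pairs of products in the same export basket.
    n_lambda = int((k_c * (k_c - 1) // 2).sum())

    # Checkerboards: for every pair of countries (c, c'), the number of products
    # exported only by c times the number of products exported only by c'.
    shared = M @ M.T                      # products exported by both c and c'
    only_first = k_c[:, None] - shared    # exported by c but not by c'
    units = only_first * only_first.T
    n_checker = int(np.triu(units, k=1).sum())
    return n_checker, n_v, n_lambda

# Two countries with one exclusive product each and one product in common.
M_toy = np.array([[1, 0, 1],
                  [0, 1, 1]])
counts = bipartite_motifs(M_toy)
```

A perfectly nested matrix contains no checkerboards, since the basket of the less diversified country of every pair is contained in the other.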

Tuning parameters
The model, described in detail in the section "The model", has a four-dimensional parameter space. In order to determine which values of (α, β, γ, k_0) are most compatible with observations, we generated matrices for all the values of the parameters analysed and compared the results with the observed data. In particular, we focus on initial conditions with N_roots ≤ 25, according to the conclusions of the following section. In Tables S1, S2, S3, S4, S5, S6 and S7, simulated data have been generated for initial conditions with N_roots = 20 and P_0 = 0.3.
Footnote 2: Since we prefer absolute measures, i.e. measures that do not depend on the order of rows and columns, in the total number of checkerboard patterns we count the occurrences of both (■□ / □■) and (□■ / ■□).
Footnote 3: The names V (Λ) come from the fact that, once the two layers of the bipartite network are arranged so that the countries layer lies above the products layer, the former (the latter) motifs look like Vs (Λs) between the layers.
We observed that the parameter k_0 does not appear to influence the measures for k_0 ≥ 4. It is worth noticing that 4 is also the average value of the minimum ubiquity for real matrices; effectively, k_p = 4 is the first value of the ubiquity for which the probability of a novelty exceeds the probability of innovating. In the following we present results with k_0 = 4; for every (α, β, γ) configuration we generated 56 matrices.
As Tables S1, S2, S3, S4, S5 and S7 show, once β and γ (i.e. the parameters of the second and third choices) are fixed, the values of α able to replicate real data lie in a narrow range around α = 1.6. For a fixed value of α, instead, the best results are found in an area around the "anti-diagonal" of the tables shown, i.e. for decreasing β as γ grows and vice versa.
The results for the number of V-motifs in Table S6 deserve a separate discussion: while the Λ-motifs are well reproduced, the model is only rarely able to replicate the observed number of V-motifs. The explanation lies in the definition of V- and Λ-motifs: the number of Vs (Λs) is the number of co-occurrences of countries (products) in the set of producers of every product (in the export basket of every country). In effect, our algorithm is driven by a hierarchy imposed on the products, while no structure is imposed on the set of countries, so the constraints drive the total number of Λs, while the number of Vs is too "free".

Tuning Initial Conditions
In Table S8 we compare different values of N_roots, from 10 to 40 (in steps of 5 roots), and different values of P_0, from 0.2 to 0.6 (in steps of 0.05). We fix the parameters of the evolution dynamics to values among the best performing ones according to the previous analyses, i.e. α = 1.6, β = 1, γ = 0.6 and k_0 = 4; for every configuration of the initial conditions we produced 50 simulated M-matrices. The results can be observed in Table S8: for low values of N_roots, the discrepancy among the distributions obtained for different values of P_0 is limited, and they very often cross the red line representing real data. For this reason we mostly used low values, N_roots ≤ 25, since they need less fine tuning of P_0. Moreover, higher values of P_0 are less precise, especially for a higher number of roots: it thus seems that the mean number of roots per country with which the algorithm starts should be quite small, say ∼ 6.
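The grid of initial conditions and the resulting mean number of roots per country can be worked out explicitly. The sketch below assumes that P_0 is the probability that a given country initially exports a given root product, so that the expected number of roots per country is N_roots × P_0; this reading is consistent with the value ∼ 6 obtained for N_roots = 20 and P_0 = 0.3. The function `initial_matrix`, the number of countries and the random seed are hypothetical choices made only for illustration.

```python
import numpy as np

n_roots_values = np.arange(10, 45, 5)   # 10, 15, ..., 40
p0_values = np.linspace(0.2, 0.6, 9)    # 0.20, 0.25, ..., 0.60

# Expected number of roots per country for every (N_roots, P_0) pair,
# under the assumption that each root is assigned independently with probability P_0.
mean_roots = np.outer(n_roots_values, p0_values)

def initial_matrix(n_countries, n_roots, p0, seed=0):
    """Hypothetical initial country x root matrix with independent entries."""
    rng = np.random.default_rng(seed)
    return (rng.random((n_countries, n_roots)) < p0).astype(int)

# Example with an arbitrary number of countries (100 is an assumption).
M0 = initial_matrix(n_countries=100, n_roots=20, p0=0.3)
print(M0.sum(axis=1).mean())   # fluctuates around 20 * 0.3 = 6
```

Under this assumption, the configurations with N_roots ≤ 25 and moderate P_0 are those for which the expected number of roots per country stays in the single digits.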
We also tried, imposing some offsets for every choice, to make the algorithm start from no products, but the results are not satisfactory, since the usual measures on the matrix are not replicated.

Table S1: Parameter space analysis: NODF_t. The table shows the variation of NODF_t as the parameters α, β and γ change; the parameter k_0 has been kept fixed at 4, since no variation in any of the measures analysed has been observed for greater values. The NODF_t measured on the original matrix is represented as a red line. The best values of the parameter α are the lowest analysed, i.e. α ≤ 1.65. The widest area of acceptance for the γ parameter is the central area of the table, i.e. 0.4 ≤ γ ≤ 0.6, while β has little or no influence on the acceptance of the measure.

Table S2: Parameter space analysis: NODF_p. The table shows the variation of NODF_p as the parameters α, β and γ change; the parameter k_0 has been kept fixed at 4, since no variation in any of the measures analysed has been observed for greater values. The NODF_p measured on the original matrix is represented as a red line. The best values of the parameter α for reproducing NODF_p are the lowest analysed, i.e. α ≤ 1.65. The widest area of acceptance for the γ parameter is the central area of the table, i.e. 0.4 ≤ γ ≤ 0.6, while β has little or no influence on the acceptance of the measure. Note that all the graphs presented here completely overlap with those of Table S1, because for the analysed network the approximation of Eq. (6) holds.

Table S3: Parameter space analysis: NODF_c. The table shows the variation of NODF_c as the parameters α, β and γ change; the parameter k_0 has been kept fixed at 4, since no variation in any of the measures analysed has been observed for greater values. The NODF_c measured on the original matrix is represented as a red line. The best values of the parameter α are the lowest analysed, i.e. α ≤ 1.65, as in Table S2. In replicating NODF_c, the values of β and γ are even less influential than in Table S2, and more or less all configurations (α, β, γ) are able to replicate the real value correctly.

Table S4: Parameter space analysis: r. The table shows the variation of r as the parameters α, β and γ change; the parameter k_0 has been kept fixed at 4, since no variation in any of the measures analysed has been observed for greater values. The r measured on the original matrix is represented as a red line. The widest area of acceptance for the β and γ parameters lies around the "anti-diagonal" of the table, i.e. high values of γ for low β and vice versa, while α is always centred on 1.65.

Table S5: Parameter space analysis: N_Checkerboards. The table shows the variation of N_Checkerboards as the parameters α, β and γ change; the parameter k_0 has been kept fixed at 4, since no variation in any of the measures analysed has been observed for greater values. The N_Checkerboards measured on the original matrix is represented as a red line. The widest area of acceptance for the β and γ parameters is similar to the one of the [...]

Table S6: Parameter space analysis: N_V. The table shows the variation of N_V as the parameters α, β and γ change; the parameter k_0 has been kept fixed at 4, since no variation in any of the measures analysed has been observed for greater values. As can be seen in the present table, our model fails to capture the number of V-motifs in the network for almost all of the parameter values analysed. This is because the model evolution is based on a hierarchical structure for products (the products network) that has no counterpart for countries; indeed, as Table S7 shows, there is much better agreement with the original data for the Λ-motifs.

Table S7: Parameter space analysis: N_Λ. The table shows the variation of N_Λ as the parameters α, β and γ change; the parameter k_0 has been kept fixed at 4, since no variation in any of the measures analysed has been observed for greater values. Compared with Table S6, we find better agreement in reproducing Λ-motifs: this is probably due to basing the evolution of the model on an (evolving) structure for products, which leaves a trace in the total number of Λ-motifs, i.e. the number of co-occurrences of two different products in the export baskets. As in the previous tables, the best results are obtained for low values of α and in the area along the "anti-diagonal" of the table.