Economic complexity of prefectures in Japan

Every nation prioritizes the inclusive economic growth and development of all regions. However, we observe that economic activities are clustered in space, which results in a disparity in per-capita income among different regions. A complexity-based method was proposed by Hidalgo and Hausmann [PNAS 106, 10570-10575 (2009)] to explain the large gaps in per-capita income across countries. Although there have been extensive studies on countries’ economic complexity using international export data, studies on economic complexity at the regional level are relatively less studied. Here, we study the industrial sector complexity of prefectures in Japan based on the basic information of more than one million firms. We aggregate the data as a bipartite network of prefectures and industrial sectors. We decompose the bipartite network as a prefecture-prefecture network and sector-sector network, which reveals the relationships among them. Similarities among the prefectures and among the sectors are measured using a metric. From these similarity matrices, we cluster the prefectures and sectors using the minimal spanning tree technique. The computed economic complexity index from the structure of the bipartite network shows a high correlation with macroeconomic indicators, such as per-capita gross prefectural product and prefectural income per person. We argue that this index reflects the present economic performance and hidden potential of the prefectures for future growth.

prefectures that are present in Japan, firstly considering how similar are the industrial sectors that can be found in each of the prefectures. Then, they apply a series of published methods that are ad-hoc adapted for their case study to evaluate the economic health of each of the prefectures. The analysis is deeply done, and it can be followed even from readers, like me, that are not expert in this specific research field (a thing that is valuable for a broad impact journal like PlosOne).
We are pleased that the reviewer found that "The analysis is deeply done, and it can be followed even from readers, like me, that are not expert in this specific research field (a thing that is valuable for a broad impact journal like PlosOne)."

.). A step in this direction would be very interesting also for the other figures, even if less obvious.
We thank the reviewer for this excellent idea. We have added a map of Japan in the Appendix with the same color code as used for the regions in Fig 1.
Changes made in the manuscript: At Fig 1 caption: "The eight regions of Japan are shown in a map using the same color code in Fig. S1 of Appendix S1."
Changes made in Appendix S1: We have added a map of Japan in Fig S1 to show the 8 regions and the 47 prefectures in Japan.

2. While the study is based on the geographical unit of the prefecture, it would be interesting to give an insight on the impact of the regional division. How the obtained results relate with the divisions in regions of the prefectures?
To relate the obtained results to the regional divisions, we have added a section "The average prefectural economic complexity of regions in Japan" to Appendix S1.
Changes made in the manuscript: At line 237-239: "An interpretation of the above results for the regions in Japan is given in the section "the average prefectural economic complexity of regions in Japan" of Appendix S1."
Changes made in Appendix S1: We have added a section "The average prefectural economic complexity of regions in Japan". We have added Fig S2.

3. Finally, I think that the work the authors presented at the Complex Network conference that has exactly the same title should be properly referenced in the paper.
We presented some of the very preliminary results with the same title at the Complex Network conference 2019. A short abstract was only published in the "Book of Abstracts". We feel this citation may not be necessary here.

Reviewer #2:
The authors apply the economic complexity methodology to the bipartite network of economic sectors and Japanese prefectures. Even if I am sympathetic with respect to the methodology, I do not think that the manuscript can be published in the present form. The authors could try to deeply revise it, trying to add economical and or theoretical investigations. The main problem is the scientific contribution of this manuscript. It is stated in the abstract that "studies on economic complexity at the regional level are lacking", quite in contradiction with lines 32-33 "economic complexity has been studied at the regional level for China [12], Brazil [10], the US and the UK [5]", and I could add Mexico [1], Italy [2], Spain [3], and even Australia [4]. All these studies follow more or less the same route: first, a database is obtained with the export structure (i.e., different products) of each subnational entity; then some normalization is performed (RCA); then the ECI or the Fitness or the Product Space algorithm is applied. I believe that publishing the same exercise with different data is not very interesting from a scientific point of view. This analysis can be of some utility for Japanese policymakers, but does not add much to these known methodologies and also the economic meaning is not much discussed, for instance, with respect to more standard economical approaches. In conclusion, I find the data interesting but I think the authors should make a large effort to provide some better contribution to the literature.
Most of these regional complexity studies are done at a very coarse-grained level. In the case of China, the analysis is performed for 31 provinces with 2690 firms, which is a tiny fraction of all Chinese firms. The complexity analysis is performed at the state level for Brazil, Mexico and Australia. Moreover, the aims of some of these studies are very different from ours. For example, the regional complexity analysis of the US and the UK has been used as an example for the interpretation of ECI. What is very different in our study is that it concerns the supply chain across regions and industrial sectors: we look at the process of value added, starting from a giant network of firms and aggregating it into a binary bipartite network. Here we performed a detailed complexity analysis of the prefectures in Japan using ECI and the fitness-complexity algorithm. Furthermore, our analysis reveals the similarity among prefectures and among industrial sectors in Japan. The data of our study cover a large fraction of firms in Japan. Japan has been one of the most diversified countries in terms of products. Therefore, it is important to reveal whether such diversity comes from regional structures.
We thank the referee for alerting us to these papers. We have now incorporated these references in the manuscript.
Changes made in the manuscript:
At the abstract: "Although there have been extensive studies on countries' economic complexity using international export data, studies on economic complexity at the regional level are relatively less studied."
At line 32-40: "Recently, economic complexity has been studied at the regional level for China [ [5]. Most of these regional complexity studies are done at very coarse grain level. In case of China, the analysis is performed for 31 provinces with 2690 firms, which is a tiny fraction of all Chinese firms. The complexity analysis is performed at states level for Brazil, Mexico and Australia. The difference in our study is that it concerns supply-chain in which prefectures and industrial sectors are studied. We are looking at process of value added starting from a giant network of firms and by aggregating as a binary bipartite network of prefectures and industrial sectors."
At line 46-48: "Japan has been one of the most diversified country in the sense of the products. Therefore, it is important to reveal that whether such diversity comes from regional structures."
At line 78-79: "These firms constitute a giant weakly connected component in the Japanese production network [21]."

Moreover, other important methodological issues are present:
-Some sectors are excluded from the analysis in a quite arbitrary way (as stated, because they are only linked to Tokyo, and this results in zero fitness for the other prefectures). Whether this is a data or a methodological problem, it should be clearly stated and discussed.
In the manuscript, at line 88-89, we have already mentioned that "the inclusion of these sectors in our analysis results in the largest value for the fitness of Tokyo, and the fitness of other prefectures become zero."

-The economic complexity methodology is usually applied to export data. Here RCA is computed from annual sales, which include internal and external production in a highly biased way. Both the use of sales instead of export and RCA should be motivated, possibly with some references.
Now we have included our motivation for the use of sales instead of export with references.
Changes made in the manuscript: At line 92-100: "The Revealed Comparative Advantage (RCA) [Balassa 1965] is frequently used as a quantitative criterion to evaluate the relative dominance of a country in the export of certain products by comparing it with the average export of those products. Recently, RCA has been measured from the ratio between the actual number of firms from an industry in a province and the average number of firms from that industry in that province [Gao 2018]. Mealy et al. constructed a binary region-industry matrix based on the number of people employed in an industry in a region [Mealy 2019]. Here, we use annual sales of industrial sector s in prefecture p to measure the RCA."
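For illustration only, the sales-based RCA described above can be sketched as follows. This is a minimal sketch, assuming the standard Balassa ratio (a sector's share of a prefecture's sales divided by that sector's share of total sales) and the common convention of setting a link when RCA >= 1; the threshold, the function names and the toy sales figures are assumptions and are not taken from the manuscript.

```python
def rca_matrix(sales):
    """Balassa-style RCA from a sales table.

    sales[p][s] is the annual sales of sector s in prefecture p
    (hypothetical toy input, not the TSR data).
    """
    total = sum(sum(row) for row in sales)
    pref_totals = [sum(row) for row in sales]          # sales per prefecture
    sector_totals = [sum(col) for col in zip(*sales)]  # sales per sector
    rca = []
    for p, row in enumerate(sales):
        rca.append([
            (x / pref_totals[p]) / (sector_totals[s] / total)
            if pref_totals[p] > 0 and sector_totals[s] > 0 else 0.0
            for s, x in enumerate(row)
        ])
    return rca


def binarize(rca, threshold=1.0):
    """Binary prefecture-sector matrix: 1 where RCA >= threshold (assumed convention)."""
    return [[1 if v >= threshold else 0 for v in row] for row in rca]


# Invented sales for 3 prefectures (rows) and 3 sectors (columns).
sales = [
    [80.0, 20.0, 0.0],
    [10.0, 30.0, 60.0],
    [10.0, 50.0, 40.0],
]
M = binarize(rca_matrix(sales))
```

In this toy table every prefecture and every sector has the same total, so a sector is linked to a prefecture exactly when it accounts for more than one third of that prefecture's sales.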

-From Figures 4 and 6 it emerges that all the correlations are driven by Tokyo. What happens if it is removed?
Our analysis shows that Tokyo has the highest economic complexity or fitness. Therefore, it plays an important role. If we remove the data for Tokyo from our correlation measures, we find the following results: The Pearson product-moment correlation between ECI and GPP per capita is r = 0.454 with p-value = 0.001. The Pearson product-moment correlation between ECI and prefectural income per person is r = 0.500 with p-value = 0.0004.
The Pearson product-moment correlation between fitness and GPP per capita is r = 0.241 with p-value = 0.107. The Pearson product-moment correlation between fitness and prefectural income per person is r = 0.243 with p-value = 0.104.
The above results show that the correlation measures are not solely driven by Tokyo (at least in the case of ECI and the macroeconomic indicators).
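The robustness check above amounts to recomputing a Pearson coefficient after dropping one observation. A minimal sketch of that procedure, using invented numbers rather than the prefectural data, might look like this; the helper names are hypothetical, and the p-value is not computed here.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_without(x, y, drop):
    """Correlation recomputed after removing the observation at index `drop`."""
    keep = [i for i in range(len(x)) if i != drop]
    return pearson_r([x[i] for i in keep], [y[i] for i in keep])


# Invented values; the last point plays the role of a dominant outlier.
eci = [-1.0, -0.5, 0.0, 0.5, 1.0, 4.0]
income = [2.0, 2.6, 2.4, 2.9, 3.1, 5.0]

r_all = pearson_r(eci, income)
r_no_outlier = r_without(eci, income, drop=len(eci) - 1)
```

Comparing `r_all` with `r_no_outlier` shows how much of the correlation depends on the single extreme point.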
Minor issues: -Lines 101-102 are inaccurate. k_{p,N} is not the diversity of prefecture, as incorrectly stated, but the ECI. The diversity is k0. The same applies to sectors.
We thank the reviewer for pointing it out. We have now modified the line in the manuscript.
Changes made in the manuscript: At line 122-124: "The economic complexity index (ECI) of prefectures and product complexity index (PCI) of industrial sectors can be calculated using the following iterative equation:"

-The mathematical derivation in lines 104-125 is well known in the literature (ref. [7] and [24] of the paper), so it is quite useless.
-Also, the (quite elementary) similarity measures for both prefectures (pag.5) and sectors (page 6) is from [7].
We agree with the reviewer that the mathematical derivation in lines 104-125 is well known in the literature and also the similarity measures are very basic. However, we believe it will be helpful for the readers to follow the article in a compact way. So we keep it as it is.
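For readers unfamiliar with the iteration, a minimal sketch of the method of reflections of Hidalgo and Hausmann (2009) on a binary prefecture-sector matrix is given here. It assumes the standard form, in which k_{p,N} is the average of k_{s,N-1} over the sectors linked to prefecture p and k_{s,N} is the average of k_{p,N-1} over the prefectures linked to sector s; the function name and the toy matrix are illustrative and are not taken from the manuscript.

```python
def reflections(M, n_iter):
    """Method-of-reflections iteration on a binary matrix M[p][s].

    Returns a list of (k_p, k_s) pairs for iterations 0..n_iter.
    Assumes every row and every column of M has at least one link.
    """
    n_p, n_s = len(M), len(M[0])
    kp0 = [sum(row) for row in M]          # k_{p,0}: diversity of each prefecture
    ks0 = [sum(col) for col in zip(*M)]    # k_{s,0}: ubiquity of each sector
    kp = [float(v) for v in kp0]
    ks = [float(v) for v in ks0]
    history = [(kp[:], ks[:])]
    for _ in range(n_iter):
        # Each update uses the values of the previous iteration only.
        new_kp = [sum(M[p][s] * ks[s] for s in range(n_s)) / kp0[p]
                  for p in range(n_p)]
        new_ks = [sum(M[p][s] * kp[p] for p in range(n_p)) / ks0[s]
                  for s in range(n_s)]
        kp, ks = new_kp, new_ks
        history.append((kp[:], ks[:]))
    return history


# Toy nested matrix: 3 prefectures (rows) by 3 sectors (columns).
M_toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
]
history = reflections(M_toy, n_iter=2)
kp0_toy, kp1_toy = history[0][0], history[1][0]
```

In this nested toy example the most diversified prefecture has the lowest k_{p,1}, because k_{p,1} is the average ubiquity of its sectors, so k_{p,0} and k_{p,1} need not be positively correlated.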
-On lines 188-189 it is stated that "k_{p,0} and k_{p,1} are slightly negatively correlated (Pearson correlation coefficient r = −0.230 and p-value = 0.119)", which I find rather surprising. These iterative methods are supposed to converge to some "real value" of the economic complexity, providing better and better assessments as the iteration procedure goes on, so I would expect them to be highly correlated.
We found the correlation low because these are initial iterations; however, we have observed that the correlation becomes strong at subsequent iterations. The Pearson correlation coefficient between k_{p,1} and k_{p,2} is r = −0.630 with p-value = 2.05 × 10^-6.

The paper is technically sound and the database is fantastic. The statistical analysis is also adequate but more could be done with such a detailed database. The paper is also well written in English. The authors have made clear where the data was got from but I am not sure if they can share the detailed information on the firms. Could you please clarify this issue?
We cannot share the detailed information on firms as the data are not in the public domain but are commercially available.
The data for the bipartite network are based on a survey done by Tokyo Shoko Research (TSR), one of the leading credit research agencies in Tokyo, supplied to us through the Research Institute of Economy, Trade and Industry (RIETI). The data are not in the public domain, but are commercially available. Information on data access is given below.
Provider: Tokyo Shoko Research, Ltd. JA Bldg., 1-3-1 Otemachi, Chiyoda-ku, Tokyo 100-6810, JAPAN
Tel: +81 (0)3-6910-3142
Fax: +81 (0)3-5221-0712
Web: http://www.tsr-net.co.jp/
Database: TSR Company Profile Data File
Description: TSR Company Profile Data File is standard data sets that have been provided for many years in local market. The data is based on TSR's reporters' site visit interviews which is our most frequent data source. TSR offers various services/ data in accordance with customers' demand, such as TSR Report, TSR Company Profile Data File, financial statement data, Internet Service "tsrvan2", viability scores, etc.
We collected the gross prefectural product data, prefectural population data and prefectural income per person data for the year 2015 from the Japanese government statistical portal site (https://www.e-stat.go.jp), which are in the public domain.
Thank you for allowing me to review this interesting paper, I recommend its publication but only after some comments which I enumerate below are dealt with:
1) The authors should justify their clustering method as there are other methods like stochastic block models or other recent applications like for bank-firms bipartite networks which can be used for bipartite networks.
We agree with the reviewer that there are many methods for clustering analysis. However, most of them give very similar results. Moreover, our network is similar to a correlation-based network. The minimal spanning tree is a widely used technique for clustering analysis for this type of network. Methods like stochastic block models are mostly used for sparse networks.
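As an illustration of the technique, a minimal spanning tree can be built from a dense similarity matrix with Prim's algorithm. This is only a sketch under stated assumptions: the conversion from similarity to distance (distance = 1 - similarity), the function name and the toy matrix are all hypothetical, and the manuscript's actual similarity metric is not reproduced here.

```python
def minimal_spanning_tree(dist):
    """Prim's algorithm on a dense symmetric distance matrix.

    Returns a list of (parent, child, distance) edges, in the order added.
    """
    n = len(dist)
    in_tree = [False] * n
    in_tree[0] = True          # start the tree from node 0
    best = dist[0][:]          # cheapest known distance from the tree to each node
    parent = [0] * n
    edges = []
    for _ in range(n - 1):
        # Pick the node outside the tree that is closest to the tree.
        j = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        edges.append((parent[j], j, best[j]))
        in_tree[j] = True
        # Update the cheapest connections through the newly added node.
        for i in range(n):
            if not in_tree[i] and dist[j][i] < best[i]:
                best[i] = dist[j][i]
                parent[i] = j
    return edges


# Invented similarity matrix for 4 nodes with two obvious groups: {0, 1} and {2, 3}.
similarity = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
dist = [[1.0 - s for s in row] for row in similarity]   # assumed conversion
tree = minimal_spanning_tree(dist)
```

Removing the longest edge of the resulting tree separates the two groups in this toy example, which is one simple way of reading clusters off a minimal spanning tree.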
2) The data set is fantastic and the analysis interesting. However, a study using more granularity would be very welcomed. The authors thought about a prefecture-firm study, for example. In this way the complexity of prefectures would be based on individual firms, as well as the complexity of firms (maybe less interesting).
A prefecture-firm network may not be suitable for this study, because each firm belongs to a single prefecture only. The degree of every firm would then be one, so the firms would be completely homogeneous in degree.
3) Is there a specific reason for the authors to use the unweighted version of the bipartite network? I think that the weighted version can be quite useful in order to put monetary value to the metrics, even for the projections the monetary value of the similarity can be quite useful as well as a possible centrality study.
The algorithm that measures economic complexity index and product complexity index is applied on a binary bipartite matrix [PNAS 106(26),10570 (2009)]. This is the reason we have used the unweighted version of the bipartite network.
To conclude, we again thank all the reviewers for providing us with the opportunity to clarify these issues. We have updated our manuscript (as detailed in the attached "List of Changes to the Manuscript") according to the points discussed above. We hope that the changes we have made to the manuscript address the reviewers' concerns.
List of Changes to the Manuscript