Bank-firm credit network in Japan. An analysis of a bipartite network

We present an analysis of the credit market of Japan. The analysis is performed by investigating the bipartite network of banks and firms, which is obtained by setting a link between a bank and a firm when a credit relationship is present in a given time window. In our investigation we focus on a community detection algorithm that identifies communities composed of both banks and firms. We show that the clusters obtained by working directly on the bipartite network carry information about the networked nature of the Japanese credit market. Our analysis is performed for each calendar year during the time period from 1980 to 2011. Specifically, we obtain communities of banks and firms for each of the 32 investigated years, and we introduce a method to track the time evolution of these communities on a statistical basis. We then characterize communities by detecting the simultaneous over-expression of attributes of firms and banks. Specifically, we consider as attributes the economic sector and the geographical location of firms and the type of banks. In our 32-year-long analysis we detect a persistence of the over-expression of attributes of clusters of banks and firms, together with a slow dynamics of changes from some specific attributes to new ones. Our empirical observations show that the credit market in Japan is a networked market where the type of banks, the geographical location of firms and banks, and the economic sector of the firm play a role in shaping the credit relationships between banks and firms.


I. INTRODUCTION
Bipartite networks are quite common in complex systems. Classic examples are networks of actors and movies, board members and companies, authors and scientific papers, etc. The customary investigation of bipartite networks is done by performing a one-mode projection for one or both of the two sets of vertices. This approach has been quite successful in the investigation of many bipartite complex systems. However, one-mode projection implies a certain degree of information loss that might prevent, for example, a characterization involving information about direct relationships between nodes of the two sets.
In this paper, we investigate the bipartite network of credit relationships established between banks and firms traded at the stock exchanges and over-the-counter markets of Japan. Specifically, we aim to detect and characterize communities of banks and firms that were present in the Japanese credit market during the period from 1980 to 2011. Our working hypothesis is that the credit market is a networked market [1], i.e., a market where the credit relationships that are present between banks and firms are affected by attributes characterizing both banks and firms.
Community detection in large and dense bipartite networks has been considered in the past years by several authors [2][3][4][5] and it is still a topic of current research [6,7]. As for unipartite networks, community detection in bipartite networks is performed by using different approaches and different fitness measures. One widely used fitness measure is the modularity [8], i.e., the fraction of links in the network connecting vertices of the same community minus the expected value of the same quantity in the corresponding configuration model. The modularity was introduced for unipartite networks in [8] and it was generalized and adapted to bipartite networks in [2][3][4][5]. The algorithms based on the generalization of the modularity to the bipartite case [2][3][4][5] differ from one another with respect to the type of generalization. They also differ with respect to the type of communities obtained. Specifically, in Guimera et al. [2] only communities with nodes of the same type are obtained. This is also the case for the algorithms of Murata [4] and Suzuki and Wakita [5], although in their case a one-to-many correspondence of each community of a specific type of nodes can be obtained.
The algorithm of Barber [3] is the only one providing communities that are composed of nodes of both types and that provide a one-to-one correspondence between a group of nodes of one set and a group of nodes of the other set. In the present study, we are explicitly interested in investigating the one-to-one correspondence of groups of banks with related groups of firms. For this reason we have decided to use Barber's algorithm [9].
Several complex systems can be monitored over long periods of time. The analysis and modeling of these systems can be done by considering the network connections observed for the whole time period and/or by analyzing the network in successive time intervals such as daily, weekly, monthly or yearly intervals. Here we investigate the bipartite network of credit relationships between banks and firms yearly from 1980 to 2011 by investigating 32 distinct credit networks. For each year we obtain the credit network and its community structure by using Barber's BRIM (bipartite recursively induced modules) algorithm. When the time-evolving nature of networks is investigated, it is important to devise methods and procedures that are able to track the time evolution of specific communities of the networks also in the presence of uncertainty related to the statistical nature of the community detection process. Here we propose a method that is able to track the time evolution of communities detected in networks obtained at successive periods of time. The method uses a statistical test that is robust with respect to the heterogeneity of the size of communities and therefore works both for large and small communities.
Finally, we characterize the communities obtained for different years in terms of the over-expression of attributes of banks and firms concerning (i) the regional location of firms, (ii) the economic sectors of firms, and (iii) the types of banks. The statistical validation of the over-expressed attributes is done by using a method [10] based on a multiple hypothesis test correction procedure. Our statistical validation procedure of the time evolution of communities allows us to track efficiently the evolution of the communities over time. With our approach we detect layers of networked credit relationships [1] that have been present in Japan for many years. These layers of credit relationships are characterized by specific types of banks, by firms located in the same or closely related geographical regions and by firms preferentially involved in specific economic sectors.
The paper is organized as follows. In Sect. II we briefly discuss our dataset. Sect. III discusses community detection in the bipartite network of the Japanese credit market. Sect. IV introduces a method used to track the time evolution of communities detected in networks obtained for successive time periods. Sect. V presents the empirical results obtained in the characterization of the over-expression of attributes of banks and firms in each community over the years, and in Sect. VI we draw our conclusions.

II. DATASET
Our dataset is based on a survey of firms quoted in the Japanese stock-exchange markets (Tokyo, Osaka, Nagoya, in the order of market size) and in Japanese over-the-counter (OTC) markets. The data were compiled from the firms' financial statements and surveys by Nikkei Media Marketing, Inc. in Tokyo, and are commercially available [11]. They include the information about each firm's borrowing obtained from financial institutions. Specifically, the dataset reports the amounts of borrowing and their classification into short-term and long-term borrowings. All contracts exceeding 1 year are considered long-term borrowing. We examined the period 1980 to 2011, which is a time period of more than three decades. The analysis is performed yearly, and each yearly network is constructed from the dataset by using the financial statements of the considered calendar year. Since 1996 the dataset also includes OTC markets and/or JASDAQ (the present OTC market). In other studies firms of the over-the-counter market have been excluded to focus on publicly quoted firms. In the present study we investigate all firms that are present in the database.
The number of banks in the database changes year by year. It was 225 in 1980, remained approximately constant until 2001 and then decreased to 166 in 2011. The number of firms first increased from 1414 in 1980 to 3034 in 2006 and then decreased to 2706 in 2011. The number of firms increased from 1802 in 1995 to 2602 in 1996, when OTC firms were first included in the database. During the same years the number of banks increased from 219 to 226. The density of links in the bipartite network, defined as the number of observed links over the number of potential links, decreased on average from 0.0867 in 1980 to 0.0398 in 2011. The variation of the density of links was not too large at the first inclusion of the OTC firms. In fact the density of links decreased from 0.0721 to 0.0601 from 1995 to 1996.
The Japanese credit market has been previously analyzed by considering one-mode projected networks [12], an eigenvalue problem determined by the weight of the credit network [13], and, as in the present paper, in terms of communities detected directly on the bipartite network [14].
Concerning financial institutions, commercial banks are long-term, city, regional (primary and secondary), trust banks, insurance banks and government-related financial institutions including credit associations but excluding the Bank of Japan. We remark that failed banks are included until the year of failure, and that mergers and acquisitions of banks are processed consistently to identify surviving banks. Quoted firms that are active in each investigated calendar year are all included even if they failed later during the considered years.

III. COMMUNITY DETECTION IN BIPARTITE NETWORKS
In our bipartite network a link is present between bank i and firm j when a credit relation (short-term and/or long-term) is present between i and j. Links are described by a binary variable (just indicating the presence or absence of a credit relationship), i.e., in the present investigation the bipartite network is an unweighted network.
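The construction of one yearly network, together with the density of links defined in Sect. II, can be sketched as follows. This is a minimal illustration and not the code used in the study: the record layout `(bank, firm, short_term, long_term)` and the toy values are hypothetical, and the sketch counts only nodes that have at least one link when computing the number of potential links.

```python
def build_bipartite_network(records):
    """Return the set of (bank, firm) links for one calendar year.

    `records` is an iterable of (bank, firm, short_term, long_term) tuples.
    This layout is an assumption made for illustration; it is not the
    schema of the commercial dataset.
    """
    links = set()
    for bank, firm, short_term, long_term in records:
        # A link is present when short-term and/or long-term credit exists.
        if short_term > 0 or long_term > 0:
            links.add((bank, firm))
    return links


def link_density(links):
    """Observed links divided by potential links (banks times firms)."""
    banks = {bank for bank, _ in links}
    firms = {firm for _, firm in links}
    if not banks or not firms:
        return 0.0
    return len(links) / (len(banks) * len(firms))


# Toy records with invented values.
records = [
    ("bank_A", "firm_1", 10.0, 0.0),
    ("bank_A", "firm_2", 0.0, 5.0),
    ("bank_B", "firm_2", 3.0, 2.0),
    ("bank_B", "firm_3", 0.0, 0.0),  # no borrowing, hence no link
    ("bank_A", "firm_1", 0.0, 4.0),  # repeated pair, still a single link
]
links = build_bipartite_network(records)
density = link_density(links)
print(sorted(links), density)
```

Storing links as a set makes the network unweighted by construction, since repeated or larger borrowings between the same pair do not change the link.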
Community (cluster) detection is a widely used approach to discover empirical regularities present in a network that might be informative with respect to important aspects of the system such as its internal structure, robustness, resilience, etc. Community detection can be performed with a number of algorithms that use different approaches and fitness measures [15].
The community detection algorithm used here is the bipartite, recursively induced modules (BRIM) algorithm, introduced in [3]. It is a stochastic algorithm directly applied to the bipartite network. It uses the modularity of the bipartite network [8] as a fitness measure of the partitioning procedure.
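To make the fitness measure concrete, the sketch below evaluates a bipartite modularity of the form Q = (1/m) Σ_ij (A_ij − k_i d_j / m) δ(c_i, c_j), where m is the number of links, k_i the degree of bank i and d_j the degree of firm j. It only scores a given partition and does not implement the BRIM optimization itself; the function name and the toy network are hypothetical, and bank and firm identifiers are assumed to be distinct.

```python
def bipartite_modularity(links, community):
    """Bipartite modularity of a partition.

    links: set of (bank, firm) pairs.
    community: dict mapping every bank and firm to a community label.
    """
    m = len(links)
    bank_degree, firm_degree = {}, {}
    for bank, firm in links:
        bank_degree[bank] = bank_degree.get(bank, 0) + 1
        firm_degree[firm] = firm_degree.get(firm, 0) + 1

    # Fraction of links whose bank and firm share a community.
    inside = sum(1 for b, f in links if community[b] == community[f]) / m

    # Expected fraction in the bipartite configuration model: for each
    # community, (sum of bank degrees) * (sum of firm degrees) / m^2.
    bank_sum, firm_sum = {}, {}
    for b, k in bank_degree.items():
        bank_sum[community[b]] = bank_sum.get(community[b], 0) + k
    for f, d in firm_degree.items():
        firm_sum[community[f]] = firm_sum.get(community[f], 0) + d
    expected = sum(bank_sum[c] * firm_sum.get(c, 0) for c in bank_sum) / m**2

    return inside - expected


# Toy network: two banks, each lending to its own two firms.
links = {("A", "f1"), ("A", "f2"), ("B", "f3"), ("B", "f4")}
split = {"A": 0, "f1": 0, "f2": 0, "B": 1, "f3": 1, "f4": 1}
single = {node: 0 for node in split}
q_split = bipartite_modularity(links, split)
q_single = bipartite_modularity(links, single)
print(q_split, q_single)
```

In this toy case the partition that separates the two banks with their own firms scores higher than the trivial single-community partition, which scores zero.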
In our analysis we have repeated the application of the BRIM community detection algorithm a number of times for each year we investigate. Specifically, for each investigated year we perform 20 independent runs, and in each run we apply the algorithm 100 times.
To quantitatively evaluate the differences among the partitions obtained in the 20 independent runs performed for each calendar year, we evaluate the adjusted Rand index (ARI) [16] among all the pairs of partitions of the 20 runs. In Fig. 1 we show the mean value of the adjusted Rand index as a function of the calendar year. The mean value is computed for the set of 190 distinct pairs of partitions obtained from the 20 independent runs of BRIM computed for each calendar year. The error bars are one standard deviation. The adjusted Rand index is close to 0.55 from 1980 to 1995 and increases to approximately 0.8 in the time interval from 2000 to 2011. A value of the adjusted Rand index equal to one would indicate a perfect overlap of the two compared partitions, whereas a value close to zero would indicate a random distribution of the nodes into the partitions. Therefore mean values ranging from 0.5 to 0.8 indicate that the different runs provide different partitions. However, the different partitions obtained retain a significant number of nodes within the same clusters. Moreover, the degree of overlap of the partitions obtained by independent runs increases in the second half of the investigated time period.
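The comparison of independent runs can be sketched as follows, using only the standard definition of the adjusted Rand index. The partitions are represented as label lists over the same ordered set of nodes; the three toy partitions are invented, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used for the error bars.

```python
from collections import Counter
from itertools import combinations
from math import comb
from statistics import mean, stdev


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same nodes."""
    n = len(labels_a)
    contingency = Counter(zip(labels_a, labels_b))
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case, e.g. both partitions put all nodes together.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def pairwise_ari(partitions):
    """Mean and sample standard deviation of the ARI over all pairs."""
    scores = [adjusted_rand_index(a, b) for a, b in combinations(partitions, 2)]
    return mean(scores), stdev(scores)


# Twenty runs give C(20, 2) = 190 distinct pairs, as stated in the text.
n_pairs = comb(20, 2)

# Toy example with three runs over four nodes.
runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
mean_ari, std_ari = pairwise_ari(runs)
print(n_pairs, mean_ari, std_ari)
```

The first two toy runs differ only by a relabeling of the clusters, so their index is one; the index is insensitive to the names given to communities, which is why it suits a stochastic algorithm whose labels change from run to run.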
To provide an indication of the differences observed among the partitions obtained in independent runs, in Fig. 2 we show the time evolution of the average number of communities (red symbol) and its standard deviation obtained for each investigated year. In the figure, we also show the number of communities (blue square symbol) of the partition with the highest modularity for each year. The figure presents an overall gradual increase of the number of communities over time. The figure also shows an abrupt change of the average number of clusters between 1995 and 1996. The reason for this abrupt change is that, starting from 1996, the database includes OTC firms and therefore comprises a larger set of firms. It is worth noting that, in spite of this, the mean value of the adjusted Rand index (see Fig. 1) is not affected by the change of the size of the investigated system.

IV. TIME EVOLUTION OF COMMUNITIES
The communities detected by using the BRIM algorithm and discussed in Section III are obtained year by year. It is therefore of interest to relate the communities detected in a given year to the communities detected in the following year. A time evolution of the communities can be detected by identifying the communities of year t+1 in which the elements of a given community of year t are over-expressed. The community detection procedure has a certain degree of stochasticity and degeneracy with respect to small differences of the fitness measure, and therefore the membership of an element in a certain community might also be due to chance. We therefore need a method for detecting over-expression of the same composition in communities of two successive years that is based on a carefully devised statistical procedure that is robust to the size heterogeneity of the different communities.
Hereafter, we propose such a method. Suppose that in period $t$ there are $N_t$ communities $C^t_i$, $i = 1, \dots, N_t$, and in period $t+1$ there are $N_{t+1}$ other communities $C^{t+1}_j$, $j = 1, \dots, N_{t+1}$. For each community $C^t_i$ of period $t$ we search among all $N_{t+1}$ communities of period $t+1$ for the communities $C^{t+1}_j$ that have an over-represented composition of elements also present in $C^t_i$. Let $n^t_i$ and $n^{t+1}_j$ be the numbers of elements of $C^t_i$ and $C^{t+1}_j$, respectively, and let $n^{t,t+1}_{ij}$ be the number of elements they have in common. Let us call $N_{t,t+1}$ the number of distinct vertices in the two consecutive periods $t$ and $t+1$. The probability that $n^{t,t+1}_{ij}$ is observed by chance is given by the hypergeometric distribution $H(n^{t,t+1}_{ij} \mid N_{t,t+1}, n^t_i, n^{t+1}_j)$, where:

$$H(X \mid N_{t,t+1}, n^t_i, n^{t+1}_j) = \frac{\binom{n^t_i}{X} \binom{N_{t,t+1} - n^t_i}{n^{t+1}_j - X}}{\binom{N_{t,t+1}}{n^{t+1}_j}}$$

Therefore for each pair of clusters we can compute a p-value

$$p(n^{t,t+1}_{ij}) = 1 - \sum_{X=0}^{n^{t,t+1}_{ij} - 1} H(X \mid N_{t,t+1}, n^t_i, n^{t+1}_j)$$

After setting the appropriate p-value threshold $p_t$, the above methodology gives us a way to select the communities in year $t+1$ that are linked to a given community in year $t$ in a statistically robust way.
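The test for a single pair of communities can be sketched as below. The sketch assumes that the p-value is the one-sided upper tail of the hypergeometric distribution (the probability of observing at least the measured overlap) and that the population is the union of the vertices of the two years; the function names and the toy communities are hypothetical.

```python
from math import comb


def overlap_p_value(n_shared, population, size_t, size_t1):
    """Probability of observing at least `n_shared` common elements by chance.

    population: number of distinct vertices in the two consecutive years.
    size_t, size_t1: sizes of the community at year t and at year t+1.
    """
    upper = min(size_t, size_t1)
    # Exact integer arithmetic for the tail, one division at the end.
    tail = sum(
        comb(size_t, k) * comb(population - size_t, size_t1 - k)
        for k in range(n_shared, upper + 1)
    )
    return tail / comb(population, size_t1)


def community_link_p_value(community_t, community_t1, all_vertices):
    """p-value for a community of year t and a community of year t+1."""
    n_shared = len(community_t & community_t1)
    return overlap_p_value(
        n_shared, len(all_vertices), len(community_t), len(community_t1)
    )


# Toy example: ten vertices, two communities of four elements each.
vertices = set(range(10))
p_full = community_link_p_value({0, 1, 2, 3}, {0, 1, 2, 3}, vertices)
p_none = community_link_p_value({0, 1, 2, 3}, {6, 7, 8, 9}, vertices)
print(p_full, p_none)
```

Because the null distribution depends explicitly on both community sizes, a large overlap between two large communities is not automatically significant, which is how the test remains usable for communities of very different sizes.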
To avoid the presence of false positives, the p-value threshold must be corrected to take into account that we are performing a multiple hypothesis test comparison. Indeed, for each pair of consecutive periods we perform the test $N_t \cdot N_{t+1}$ times against the null hypothesis of random distribution of elements among two partitions of communities of consecutive periods. Moreover, we perform these tests for all pairs of consecutive years in our dataset, i.e., from 1980 to 2011. The most restrictive multiple hypothesis test correction is the Bonferroni correction, which prescribes that the modified p-value threshold $p_B$ is:

$$p_B = \frac{p_t}{\sum_{t=1980}^{2010} N_t N_{t+1}}$$

In the present investigation we have set $p_t = 0.01$.
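As a small worked example of the correction, the sketch below divides the univariate threshold by the total number of tests, taken here as the sum over consecutive years of the product of the numbers of communities. The community counts, the labels and the p-values are invented for illustration.

```python
def bonferroni_threshold(p_t, communities_per_year):
    """Return (corrected threshold, total number of tests).

    communities_per_year: number of communities in each successive year.
    """
    n_tests = sum(
        current * following
        for current, following in zip(communities_per_year, communities_per_year[1:])
    )
    return p_t / n_tests, n_tests


# Hypothetical community counts for three consecutive years.
counts = [10, 12, 11]
threshold, n_tests = bonferroni_threshold(0.01, counts)

# Hypothetical p-values for two candidate links between communities.
p_values = {
    ("community_a_year_t", "community_b_year_t1"): 1e-12,
    ("community_c_year_t", "community_b_year_t1"): 3e-3,
}
validated = [pair for pair, p in p_values.items() if p < threshold]
print(n_tests, threshold, validated)
```

With these invented numbers the second link would pass the uncorrected threshold of 0.01 but fails the corrected one, which illustrates how the correction trades false positives for false negatives.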
In Fig. 3 we show a graphical representation of the interrelationships of communities that are statistically validated in successive years. The graphical representation is the time evolution of the biggest community of 1980 (labeled as 9 80). The size of each vertex is proportional to the logarithm of the size of the community. The statistical validation procedure shows that the largest community of year t evolves into the largest community of year t + 1 for all the investigated years. In addition to this primary channel of community evolution, we also detect that in some years parts of other smaller communities merge into the largest one (this process is more pronounced during the years 2000, 2001 and 2002). For the sake of clarity, among the communities merging into the largest community, only communities at one year distance from the largest community of each year are shown in the figure. In the following section we will investigate the over-expression of the attributes characterizing the elements of the largest community observed in each calendar year.
In Fig. 4 we track the evolution of the second and the third largest communities of 1980 (labeled as 6 80 and 8 80). In this case the evolution of these communities presents three main branches, shown in the figure as parallel evolving branches. However, splitting and coalescence of the branches are observed over time. In this figure we show only "forward" community evolution, i.e., we show all the validated relationships between communities shown at year t and communities at year t + 1 but, differently from Fig. 3, we do not show validated relationships between communities at year t + 1 and communities at year t other than the ones already shown in the figure. The additional incoming validated connections from other communities of the previous year are not shown to make the figure readable. As in Fig. 3, the size of each vertex symbol is proportional to the logarithm of the size of the community. The over-expression of attributes of elements belonging to the communities of the three main branches will be discussed in the following Section.

V. OVER-EXPRESSION OF ATTRIBUTES
The identification of bank-firm partitions and the statistical validation of their time evolution provide the basis for the understanding of the networked structure of the credit system and its time evolution. A further step is to look for information characterizing the obtained clusters and their time evolution. In other words, it is important to characterize the clusters in terms of attributes over-expressed by the elements belonging to the same clusters with respect to an appropriate random null hypothesis. The method used is illustrated in [10]. It should be noted that the null hypothesis takes into account the heterogeneity of the tested attributes, and therefore the over-expressed attributes are not necessarily the most common ones in each cluster, but rather are those whose frequency in the cluster exceeds what the null hypothesis predicts. In our analysis we account for multiple hypothesis test correction by using the Bonferroni correction.
The metadata available for the characterization of firms and banks allows us to identify the economic sector and the prefecture of the main office of firms and the type of bank. In Table I we summarize the over-expressed attributes observed for the time evolution of the largest community (see Fig. 3). In the Table we provide the calendar year, the number of banks in the cluster, the number of firms in the cluster, and the over-expressed (i) prefectures where firms are located, (ii) economic sectors of the firms and (iii) types of banks. We notice that the type of bank over-expressed in this cluster is the type labeled as "City banks" for the majority of the investigated years. These banks are large commercial banks operating in the entire country. The fact that the over-expression of "City banks" is not observed after 2005 does not mean that City banks no longer play a role in those years. In fact, also for those years we detect a significant number of City banks in the considered cluster. The reason why this bank category ceases to be over-expressed lies in the fact that the number of "City banks" is declining over time (due to mergers) and the validation procedure is conducted at the most severe level of multiple hypothesis test correction. In fact, the Bonferroni threshold used to validate the over-expression is set to $0.01/R_t$, where 0.01 is the univariate threshold and $R_t = (N_S + N_P + N_B) \cdot N_t$ is the total number of tests done in the statistical validation of communities of the year $t$. More specifically, $N_S$ is the number of distinct economic sectors, $N_P$ is the number of distinct Japanese prefectures, $N_B$ is the number of types of banks, and $N_t$ is the number of communities detected at year $t$. In this way, we minimize the number of false positives but unavoidably increase the number of false negatives.
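The effect of a shrinking bank category on the validation can be illustrated numerically. The sketch below assumes a one-sided hypergeometric test for attribute over-expression, which is our reading of the method of [10] and is not spelled out in the text; apart from the 47 Japanese prefectures, all numbers (sectors, bank types, communities, banks, cluster size) are invented.

```python
from math import comb


def overexpression_p_value(k_in_cluster, cluster_size, k_total, population):
    """Chance probability of at least `k_in_cluster` elements with an attribute.

    k_total elements out of `population` carry the attribute, and the
    cluster contains `cluster_size` elements.
    """
    upper = min(cluster_size, k_total)
    tail = sum(
        comb(k_total, k) * comb(population - k_total, cluster_size - k)
        for k in range(k_in_cluster, upper + 1)
    )
    return tail / comb(population, cluster_size)


def attribute_threshold(n_sectors, n_prefectures, n_bank_types, n_communities,
                        alpha=0.01):
    """Bonferroni threshold alpha / R_t with R_t = (N_S + N_P + N_B) * N_t."""
    r_t = (n_sectors + n_prefectures + n_bank_types) * n_communities
    return alpha / r_t


# Hypothetical setting: 30 sectors, 47 prefectures, 8 bank types, 25 communities.
threshold = attribute_threshold(30, 47, 8, 25)

# Hypothetical cluster of 40 banks out of 200, containing every bank of a type.
p_many = overexpression_p_value(13, 40, 13, 200)  # 13 banks of the type exist
p_few = overexpression_p_value(4, 40, 4, 200)     # only 4 remain after mergers
print(threshold, p_many, p_few)
```

In this invented setting, a category of 13 banks that all belong to the cluster is validated, while a category reduced to 4 banks is not, even though all 4 are still in the cluster. This mirrors the argument above: with few members, no configuration can reach a threshold this strict.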
The over-expressed prefectures are the prefectures of Kanagawa (14) and Tokyo (13), i.e., two prefectures of the so-called greater Tokyo area. The Table also shows the over-expression of the main economic sectors of the firms belonging to the community. The over-expressed economic sectors are Electric and electronic equipment (EEE) for the time period 1980-1993, and Services (S) and Wholesale trade (WT) for the time period 1996-2011.
Table I shows two pronounced changes in the number of firms belonging to the main clusters. The main change (also observable in terms of the average number of clusters detected by the BRIM algorithm in Fig. 2) occurs in 1996, which is the first year of inclusion in the database of firms traded in the OTC markets. The second change is observed for the period 1999-2001. In fact, starting from 2000 the database reports a fraction of the credit, approximately 20-30% of the total, as coming from "unknown - other financial institutions". This form of credit involves approximately 300 firms, both publicly quoted and traded in the OTC markets. In other words, starting from 2000 a large number of firms of the database receive their credit from "unknown - other financial institutions". This set of firms makes a rather stable star-like cluster. Such a community is detected by the BRIM algorithm systematically since its first formation in 2000. Most probably the presence of this community and its stability are the main source of the increased mean value of the adjusted Rand index observed after 2000 in Fig. 1.
In Fig. 4 we have shown the time evolution of the second and third largest communities of 1980. In this case we observe an evolution of the communities that on average presents three main branches characterized by the over-expression of

TABLE I: Summary of information about the largest cluster detected by the BRIM algorithm in each calendar year. In the table for each cluster we report the year, the number of banks, the number of firms, the over-expressed Japanese prefecture of firms (the information is provided in terms of the standard 2 digit code), the over-expressed economic sector, and the over-expressed bank type. According to the 2 digit prefecture code we have: 13 Tokyo and 14 Kanagawa. The over-expressed economic sectors are Electric and Electronic Equipment (EEE), Services (S), and Wholesale trade (WT). The over-expressed type of bank is "city banks" (CB).

II). This second branch presents over-expression of firms of the Construction (C) economic sector and of the Regional banks (RB) and occasionally of the Second regional banks (SR). The geographical over-expression points to the Japanese prefectures of Hiroshima and Fukuoka, and to Tokyo in a few cases.
The third branch starts in 1985 and ends in 2011 (see the third column of Table II). In this last case the branch presents persistent over-expression of firms of the Railroad Transportation (RT) and Chemicals (Ch) sectors. An over-expression of banks classified as Life-insurance banks (LI) is observed after 1997. The geographical over-expression mainly involves the prefecture of Tokyo, especially during the most recent years.
In summary, we observe three distinct branches well characterized over time by a rather stable over-expression of economic sector and type of banks. Also the over-expression of the regional location of firms, although less stable than those of the economic sector and of the type of bank, shows a high degree of persistence over time. The clusters of banks and firms detected by BRIM reveal a networked nature of the Japanese credit market, with a time scale of the dynamics of the communities covering several years.

VI. CONCLUSIONS
In our study we analyze the time evolution of the bank-firm credit relationships in Japan over a period of time longer than 30 years. The analysis is performed on the bank-firm credit network observed yearly. The bipartite network is analyzed yearly and the communities of banks and firms are characterized with respect to the over-expression of firms' economic sectors, firms' Japanese prefectures and types of banks.

TABLE II: Summary of information about the evolution of a few large clusters detected by the BRIM algorithm in each calendar year. The evolution follows the scheme shown in Fig. 4. In the table for each community we report the code of the community (id year), the number of banks, the number of firms, the over-expressed Japanese prefecture of firms (the information is provided in terms of the standard 2 digit code), the over-expressed economic sector of firms, and the over-expressed bank type. The 2 digit prefecture code is the one defined for Japan by the International Organization for Standardization (ISO 3166-2:JP), and it can be found online at the web page Prefectures of Japan in Wikipedia. The over-expressed economic sectors are Construction (C), Credit Leasing (CL), Chemicals (Ch), Electric and Electronic Equipment (EEE), Motor parts (MV), Railroad Transportation (RT), Sea Transportation (ST), Services (S), Utilities (U) and Wholesale trade (WT). The over-expressed types of bank are "city banks" (CB), Life-insurance banks (LI), Regional banks (RB), Insurance banks (IB), and Second regional banks (SR).
In our study it was crucial to select a community detection algorithm that works directly on the bipartite network and provides communities composed of both types of vertices (banks and firms). The choice of a one-to-one correspondence between banks' partitions and firms' partitions also simplifies our analysis of the simultaneous over-expression of attributes of both banks and firms. With this approach we have been able to show the existence of layers of the credit market involving groups of firms characterized by specific economic sectors and regional locations (prefectures) and specific types of banks. These empirical observations show that the credit market in Japan is a networked market.
The robustness of our results is shown by the ability of our approach to detect both the long-term stability and the slow dynamics of the detected communities. The time-evolving communities have been tracked from each year to the next one by using a newly introduced statistical method able to track the time evolution of communities detected in successive periods of time also in the presence of size heterogeneity of the communities. It is worth noting that our method presently used to track time evolution of communities can also be easily adapted to

FIG. 1: Mean value of the Adjusted Rand Index (ARI) computed between all pairs of partitions obtained in the 20 independent runs of the BRIM algorithm for each calendar year. Error bars indicate one standard deviation.

FIG. 2: Mean value (red circle symbols) of the number of clusters obtained by applying the BRIM algorithm to the bipartite bank-firm credit system for each calendar year of the time interval 1980-2011. The mean is computed over the numbers of clusters observed in the best-modularity partitions obtained in 20 independent runs of the algorithm using random initial conditions. Error bars indicate one standard deviation. The blue symbols indicate the number of clusters obtained in the partition of the best modularity among the 20 independent runs performed.

FIG. 3: Graphical representation of the interrelationship of clusters detected in successive years. The figure shows only the statistical validations observed starting from the largest community of 1980 (labeled as 9 80) and considering validation between all pairs of clusters observed for each pair of successive years. The size of the vertex symbol is proportional to the logarithm of the size of the cluster. The statistical validation of the cluster composition clearly shows that the largest cluster always evolves into the largest cluster of the successive year.

FIG. 4: Graphical representation of the interrelationship of clusters detected in successive years. The figure shows only the statistical validations observed starting from the second and the third largest communities of 1980 (labeled as 6 80 and 8 80) and considering validation between all pairs of clusters observed for each pair of successive years. Only statistically validated directed connections from each cluster to the ones of the successive year are shown. The incoming validated connections from other clusters of the previous year are not shown to make the figure readable. The size of the vertex symbol is proportional to the logarithm of the size of the cluster. The figure shows that the statistical validation of the clusters presents the evolution of three main branches. One of these branches merges into the evolution of the largest cluster in 2005 (see the evolution of 23 04 in 24 05).