The authors have declared that no competing interests exist.
Analyzed the data: DK MP. Wrote the paper: DK MP IC GV. Conceived the study: DK IC GV MP.
The ability to analyze everyday monetary transactions is limited by the scarcity of available data, as this kind of information is usually considered highly sensitive. Present econophysics models are usually applied to presumed random networks of interacting agents, and only some macroscopic properties (e.g. the resulting wealth distribution) are compared to real-world data. In this paper, we analyze Bitcoin, a novel digital currency system in which the complete list of transactions is publicly available. Using this dataset, we reconstruct the network of transactions and extract the time and amount of each payment. We analyze the structure of the transaction network by measuring network characteristics over time, such as the degree distribution, degree correlations and clustering. We find that linear preferential attachment drives the growth of the network. We also study the dynamics taking place on the transaction network, i.e. the flow of money. We measure temporal patterns and the wealth accumulation. Investigating the microscopic statistics of money movement, we find that sublinear preferential attachment governs the evolution of the wealth distribution. We report a scaling law between the degree and wealth associated with individual nodes.
In the past two decades, network science has successfully contributed to many diverse scientific fields. Indeed, many complex systems can be represented as networks, ranging from biochemical systems, through the Internet and the World Wide Web, to various social systems
Bitcoin is a decentralized digital cash system; there is no single overseeing authority
The Bitcoin system was proposed in 2008 by Satoshi Nakamoto, and the system went online in January 2009
Number of addresses with nonzero balance (green), addresses participating in at least one transaction in one-week intervals (red) and the exchange price of bitcoins in US dollars according to MtGox, the largest Bitcoin exchange site (blue). The black lines are exponential functions bounding the growth of the network size; the characteristic times are
We download the complete list of transactions, and reconstruct the transaction network: each node represents a Bitcoin address, and we draw a directed link between two nodes if there was at least one transaction between the corresponding addresses. In addition to the topology, we also obtain the time and amount of every payment. Therefore, we are able to analyze both the evolution of the network and the dynamical process taking place on it, i.e. the flow and accumulation of bitcoins. To characterize the underlying network, we investigate the evolution of basic network characteristics over time, such as the degree distribution, degree correlations and clustering. Concerning the dynamics, we measure the wealth statistics and the temporal patterns of transactions. To explain the observed degree and wealth distribution, we measure the microscopic growth statistics of the system. We provide evidence that preferential attachment is an important factor shaping these distributions. Preferential attachment is often referred to as the “rich get richer” scheme, meaning that hubs grow faster than low-degree nodes. In the case of Bitcoin, this is more than an analogy: we find that the wealth of already rich nodes increases faster than the wealth of nodes with low balance; furthermore, we find positive correlation between the wealth and the degree of a node.
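A minimal sketch of the network reconstruction described above, assuming each payment has already been reduced to a (sender, receiver, time, amount) record. Real Bitcoin transactions can have several inputs and outputs, and the mapping of such transactions to address pairs is not reproduced here; the record layout and the function name `build_transaction_network` are hypothetical.

```python
from collections import defaultdict


def build_transaction_network(transactions):
    """Build the directed transaction network from payment records.

    Each record is (sender, receiver, time, amount). A link sender -> receiver
    exists if at least one payment was made between the two addresses, so
    repeated payments between the same pair add no parallel links.
    """
    edges = set()
    for sender, receiver, _time, _amount in transactions:
        edges.add((sender, receiver))

    in_degree = defaultdict(int)   # number of distinct senders paying a node
    out_degree = defaultdict(int)  # number of distinct receivers paid by a node
    for sender, receiver in edges:
        out_degree[sender] += 1
        in_degree[receiver] += 1
    return edges, dict(in_degree), dict(out_degree)


# Toy records with hypothetical address labels; times and amounts are kept
# in the input so that the same records can feed the money-flow analysis.
txs = [
    ("A", "B", 1, 5.0),
    ("A", "B", 2, 1.0),  # same pair again: no new link
    ("B", "C", 3, 2.0),
    ("A", "C", 4, 0.5),
]
edges, k_in, k_out = build_transaction_network(txs)
```

On the toy data the network has three links, with indegrees `{"B": 1, "C": 2}` and outdegrees `{"A": 2, "B": 1}`; the repeated A to B payment does not change the topology.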
Bitcoin is an evolving network: new nodes are added by creating new Bitcoin addresses, and links are created if there is a transaction between two previously unconnected addresses. The number of nodes steadily grows over time with some fluctuations; especially noticeable is the large peak which coincides with the first boom in the exchange rate in 2011 (
We first measure the degree distribution of the network. We find that both the in- and the outdegree distributions are highly heterogeneous, and they can be modeled with power-laws
Since the beginning of 2011, the shape of the distribution does not change significantly. The black line shows a fitted power-law for the final network; the exponent is
The black line shows a fitted power-law for the final network; the exponent is
To further characterize the evolution of the degree distributions, we calculate the corresponding Gini coefficients as a function of time (
We observe a distinct initial phase lasting until mid-2011. The trading phase is characterized by approximately constant coefficients.
In the Bitcoin network we find that in the initial phase the Gini coefficient of the indegree distribution is close to 1 and for the outdegree distribution it is much lower. We speculate that in this phase a few users collected bitcoins, and without the possibility to trade, they stored them on a single address. In the second phase the coefficients quickly converge to
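The Gini coefficient can be computed from the sorted values x_(1) ≤ ... ≤ x_(n) with the standard closed form G = Σ_i (2i - n - 1) x_(i) / (n Σ_i x_i). The sketch below is one such implementation, not necessarily the one used for the figures; whether nodes with zero degree are included changes the value, so that choice is left to the caller.

```python
def gini(values):
    """Gini coefficient of non-negative values.

    0 means perfect equality; the maximum for n values is (n - 1) / n,
    reached when a single value holds the entire total.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Closed form for sorted data, with ranks i = 1..n.
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)
```

For example, `gini([1, 1, 1, 1])` returns `0.0`, while `gini([0, 0, 0, 1])` returns `0.75`, the maximum for four values; applied to the indegree or outdegree sequence at successive times, it traces curves of the kind discussed above.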
To characterize the degree correlations we measure the Pearson correlation coefficient of the out- and indegrees of connected node pairs:
Here
We find that the correlation coefficient is negative, except for only a brief period in the initial phase. After mid-2010, the degree correlation coefficient stays between
After the initial phase, both measures reach a stationary value.
In networks without degree correlations, the degrees of connected nodes do not depend on each other, so for such networks we expect that
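The defining equation for the correlation coefficient is not preserved in this text. A common definition for directed networks, assumed here, is the Pearson correlation between the outdegree of the source and the indegree of the target, taken over all links; negative values indicate disassortative mixing. The function name is hypothetical.

```python
import math


def degree_correlation(edges):
    """Pearson correlation between the out-degree of the source and the
    in-degree of the target, taken over all directed links (u, v)."""
    edges = set(edges)  # one link per connected ordered pair
    out_deg, in_deg = {}, {}
    for u, v in edges:
        out_deg[u] = out_deg.get(u, 0) + 1
        in_deg[v] = in_deg.get(v, 0) + 1

    pairs = [(out_deg[u], in_deg[v]) for u, v in edges]
    if not pairs:
        raise ValueError("correlation undefined: no links")
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined: a degree sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Hub H pays three addresses, and two other addresses pay H.
r = degree_correlation([("H", "a"), ("H", "b"), ("H", "c"), ("x", "H"), ("y", "H")])
```

In this toy example every link from the high-outdegree hub ends at a low-indegree node, so `r` equals -1 up to rounding: the extreme disassortative case.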
We also measure the average clustering coefficient.
In the initial phase
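The clustering coefficient of a directed network can be defined in several ways. A simple choice, assumed here and not confirmed by the text, is to ignore link directions and self-loops and average the local coefficient C_i = 2 T_i / (k_i (k_i - 1)) over all nodes, where T_i is the number of links among the k_i neighbours of node i and nodes with fewer than two neighbours count as zero.

```python
from collections import defaultdict
from itertools import combinations


def average_clustering(edges):
    """Average local clustering coefficient of the undirected projection.

    Link directions and self-loops are ignored; nodes with fewer than two
    neighbours contribute C_i = 0 but are still counted in the average."""
    neighbours = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbours[u].add(v)
            neighbours[v].add(u)
    if not neighbours:
        return 0.0
    total = 0.0
    for node, nbrs in neighbours.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Number of links among the neighbours of this node.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in neighbours[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(neighbours)


# A directed triangle A -> B -> C -> A with a pendant payment C -> D.
c_avg = average_clustering([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])
```

Here A and B have C_i = 1, C has C_i = 1/3 and D has C_i = 0, so the average is 7/12.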
To explain the observed broad degree distribution, we turn to the microscopic statistics of link formation. Most real complex networks exhibit distributions that can be approximated by power-laws. Preferential attachment was introduced as a possible mechanism to explain the prevalence of this property
Evaluating our method for the indegree distribution of the Bitcoin network, we find good correspondence between the empirical data and the presumed conditional probability function; the exponent giving the best fit is
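The description of the fitting method itself is not preserved here. The sketch below shows one way to test a hypothesis of the form π(k) ∝ (k + k0)^α against a sequence of link-formation events: for each new link, compute the probability mass, under the hypothesis, of all nodes whose degree is lower than that of the node receiving the link, splitting ties with a uniform random draw. If the hypothesis is correct, these values are independent and uniform on [0, 1], so the Kolmogorov-Smirnov distance to the uniform distribution can be scanned over α. The offset k0 (which lets nodes with zero indegree receive links), the toy growth model and all function names are assumptions made for illustration.

```python
import random


def attachment_quantile(chosen_degree, degrees, alpha, rng, k0=1.0):
    """Randomized cumulative value of the chosen node's degree under the
    hypothesis pi(k) ~ (k + k0) ** alpha.

    Ties are split with a uniform draw, so the returned values are exactly
    U(0, 1) when the hypothesis generated the data."""
    below = tie = total = 0.0
    for k in degrees:
        w = (k + k0) ** alpha
        total += w
        if k < chosen_degree:
            below += w
        elif k == chosen_degree:
            tie += w
    return (below + rng.random() * tie) / total


def ks_distance_to_uniform(samples):
    """Kolmogorov-Smirnov distance between the samples and U(0, 1)."""
    xs = sorted(samples)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def simulate_linear_pa(n_events, k0=1.0, seed=0):
    """Toy growth process: each step adds a node with degree 0, then one node
    gains a link with probability proportional to k + k0 (linear attachment).
    Returns (degree snapshot before the event, degree of the chosen node)."""
    rng = random.Random(seed)
    degrees = [0]
    events = []
    for _ in range(n_events):
        degrees.append(0)
        weights = [k + k0 for k in degrees]
        i = rng.choices(range(len(degrees)), weights=weights)[0]
        events.append((list(degrees), degrees[i]))
        degrees[i] += 1
    return events


events = simulate_linear_pa(800, seed=1)
rng = random.Random(2)
ks = {}
for alpha in (0.0, 0.5, 1.0, 1.5):
    values = [attachment_quantile(k, snap, alpha, rng) for snap, k in events]
    ks[alpha] = ks_distance_to_uniform(values)
```

On the simulated linear process, the distance at α = 1 stays at the level expected from sampling noise, while α = 0 (no preference) is clearly rejected. The randomized tie-breaking matters: with integer degrees, many events share the same degree, and without it the values would not be uniform even under the correct hypothesis.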
The cumulative distribution function of the
In this section, we analyze the detailed dynamics of money flow on the transaction network. The increasing availability of digital traces of human behavior has revealed that various human activities, e.g. mobility patterns, phone calls or email communication, are often characterized by heterogeneity
The state of node
We first investigate the temporal patterns of the system by measuring the distribution of inactivity times
We observe a power-law distribution close to the widely observed
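The exponent reported for this distribution is not preserved in this text. A minimal sketch of the measurement, assuming one (address, timestamp) record per transaction in which the address takes part: collect the waiting times between consecutive transactions of each address, then estimate a power-law exponent for the tail with the standard continuous maximum-likelihood estimator γ = 1 + n / Σ ln(x_i / x_min). The estimator and the choice of x_min are swapped-in assumptions, not necessarily the authors' fitting procedure; the function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def inactivity_times(events):
    """Waiting times between consecutive transactions of the same address.

    events: iterable of (address, timestamp) pairs, one per transaction in
    which the address takes part."""
    times = defaultdict(list)
    for address, t in events:
        times[address].append(t)
    gaps = []
    for ts in times.values():
        ts.sort()
        gaps.extend(later - earlier for earlier, later in zip(ts, ts[1:]))
    return gaps


def powerlaw_exponent_mle(samples, x_min):
    """Continuous maximum-likelihood estimate of gamma in
    p(x) ~ x ** (-gamma) for x >= x_min."""
    tail = [x for x in samples if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


gaps = inactivity_times([("A", 1), ("B", 2), ("A", 4), ("A", 10), ("B", 3)])
# Address A contributes gaps 3 and 6, address B contributes a gap of 1.

# Sanity check of the estimator on synthetic Pareto samples with gamma = 2.5,
# drawn by inverse-transform sampling.
rng = random.Random(0)
synthetic = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(20000)]
gamma_hat = powerlaw_exponent_mle(synthetic, x_min=1.0)
```

With 20,000 synthetic samples the estimate lands close to the true value of 2.5; on real data, the result depends noticeably on the chosen x_min.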
It is well known that the wealth distribution of society is heterogeneous; the often cited (and quantitatively imprecise) 80–20 rule of Pareto states that the top 20% of the population controls 80% of the total wealth. In line with this, we find that the wealth distribution in the Bitcoin system is also highly heterogeneous. The corresponding Pareto-like statement for the Bitcoin system is that 6.28% of the addresses possess 93.72% of the total wealth. We measure the distribution of balances at different points in time and find a stable distribution. The tail of the wealth distribution is generally modeled with a power-law
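A Pareto-like statement of this kind picks the fraction p of the richest addresses that holds a share 1 - p of the total wealth, i.e. the point where the two percentages add up to 100%. A minimal sketch on a list of balances, assuming only addresses with nonzero balance are counted (the text does not state which addresses were included); the function name is hypothetical, and on discrete data the crossing is approximated by the first rank at which the share reaches 1 - p.

```python
def pareto_split(balances):
    """Return (p, share): the richest fraction p of addresses holds the
    fraction share of the total wealth, with share >= 1 - p at the first
    rank where this holds."""
    xs = sorted((b for b in balances if b > 0), reverse=True)
    n = len(xs)
    total = sum(xs)
    if n == 0:
        raise ValueError("no address with positive balance")
    cumulative = 0.0
    for i, x in enumerate(xs, start=1):
        cumulative += x
        p = i / n
        share = cumulative / total
        if share >= 1.0 - p:  # the Lorenz-type curve crosses share = 1 - p
            return p, share
```

For instance, with one address holding 100 units and nine holding 1 unit each, the function returns p = 0.1 with a share of 100/109 ≈ 0.92; for perfectly equal balances it returns (0.5, 0.5).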
The distributions are shifted by arbitrary factors along the vertical axis for better visibility of the separate lines. The black lines are stretched exponential and power-law fits of the last empirical distribution. The tail can be approximated by a power-law with exponent
To further investigate the evolution of the wealth distribution we measure the Gini coefficient over time. We find that the distribution is characterized by high values throughout the whole lifetime of the network, reaching a stationary value around
To understand the origin of this heterogeneity, we turn to the microscopic statistics of acquiring bitcoins. Similarly to the case of degree distributions, the observed heterogeneous wealth distributions are often explained by preferential attachment. Moreover, preferential attachment was proposed significantly earlier in the context of wealth distributions than in the context of complex networks
To find evidence supporting this hypothesis, we first investigate the change of balances in fixed time windows. We calculate the difference between the balance of each address at the end and at the start of each month. We plot the differences as a function of the starting balances (
Increase (top) and decrease (bottom, vertical axis is inverted) of node balances in one-month windows as a function of their balance at the beginning of each month. We show the raw data (red), the average (green), median (blue) and logarithmic average (magenta). The latter three are calculated for logarithmically sized bins. We find a clear positive correlation: addresses with high balance typically increase their wealth more than addresses with low balance. The median and the logarithmic average values almost coincide, which suggests multiplicative fluctuations. The median and the logarithmic average increase approximately as power-laws for several orders of magnitude. The black line is a power-law fit for the double logarithmic data; the exponent is
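The binned statistics in this figure can be sketched as follows, assuming the monthly (start balance, change) pairs have already been computed from balance snapshots. The number of bins per decade is a hypothetical parameter, and decreases can be handled by passing the absolute values of negative changes. The logarithmic average is the geometric mean; it coincides with the median when, for example, the increments within a bin are log-normally distributed, which is why close agreement between the two points to multiplicative fluctuations.

```python
import math
from collections import defaultdict
from statistics import median


def log_binned_stats(pairs, bins_per_decade=4):
    """Group (start_balance, increase) pairs into logarithmic bins of the
    start balance and return {bin_center: (mean, median, log_average)}.

    Only pairs with positive start balance and positive increase are used;
    decreases can be analysed by passing their absolute values."""
    bins = defaultdict(list)
    for start, change in pairs:
        if start > 0 and change > 0:
            idx = math.floor(math.log10(start) * bins_per_decade)
            bins[idx].append(change)

    stats = {}
    for idx in sorted(bins):
        values = bins[idx]
        center = 10 ** ((idx + 0.5) / bins_per_decade)  # geometric bin center
        log_avg = math.exp(sum(math.log(v) for v in values) / len(values))
        stats[center] = (sum(values) / len(values), median(values), log_avg)
    return stats


# Three hypothetical monthly increases for addresses starting near 2 units.
example = log_binned_stats([(2.0, 1.0), (2.0, 4.0), (2.5, 16.0)])
```

All three pairs fall into the same bin; there the mean is 7, pulled up by the single large increase, while the median and the logarithmic average both equal 4.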
To better quantify the preferential attachment, we carry out an analysis similar to that of the previous section. However, there is a technical difference: in the case of the evolution of the transaction network, the degree of a node increases by exactly one in each event. In the case of the wealth distribution there is no such constraint. To overcome this difficulty we consider the increment of a node’s balance by one unit as an event, e.g. if after a transaction
The cumulative distribution function of the
We have investigated the evolution of both the transaction network and the wealth distribution separately. However, it is clear that the two processes are not independent. To study the connection between the two, we measure the correlation between the indegree and balance associated with individual nodes. We plot the average balance of addresses as a function of their degrees on
We calculate the averages for logarithmically sized bins. We find strong correlation between the balance and the indegree of individual nodes. The main plot shows indegree values up to
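A sketch of this measurement: average the balance inside logarithmic bins of the indegree, then fit a power law by least squares on the double logarithmic data. Averaging x inside each bin (rather than using the geometric bin center) avoids empty or misplaced bins at small integer degrees; this choice, the bin width and the function names are assumptions, not the authors' stated procedure.

```python
import math
from collections import defaultdict


def binned_average(x_values, y_values, bins_per_decade=5):
    """Average x and y inside logarithmic bins of x (x > 0 only)."""
    bins = defaultdict(list)
    for x, y in zip(x_values, y_values):
        if x > 0:
            bins[math.floor(math.log10(x) * bins_per_decade)].append((x, y))
    xs, ys = [], []
    for idx in sorted(bins):
        points = bins[idx]
        xs.append(sum(p[0] for p in points) / len(points))
        ys.append(sum(p[1] for p in points) / len(points))
    return xs, ys


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x): the exponent beta of a
    power-law fit y ~ x ** beta on double logarithmic axes."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


# Synthetic check: balances growing exactly as 3 * k ** 1.5 with indegree k.
degrees = list(range(1, 1001))
balances = [3.0 * k ** 1.5 for k in degrees]
k_avg, b_avg = binned_average(degrees, balances)
beta = loglog_slope(k_avg, b_avg)
```

On the synthetic data the fitted exponent is close to 1.5; the small deviation comes from averaging a convex function inside each bin.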
Bitcoin is based on a peer-to-peer network of users connected through the Internet, where each node stores the list of previous transactions and validates new transactions based on a proof-of-work system. Users announce new transactions on this network; these transactions are formed into
Here we have four input (I
An important aspect of Bitcoin is how new bitcoins are created, and how new users can acquire bitcoins. New bitcoins are generated when a new block is formed as a reward to the users participating in block generation. The generation of a valid new block involves solving a reverse hash problem, whose difficulty can be set in a wide range. Participating in block generation is referred to as
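The following toy sketch illustrates a tunable hash puzzle: find a nonce such that the double SHA-256 hash of the payload and the nonce, read as an integer, falls below a target. Lowering the target by one bit doubles the expected number of attempts (about 2**difficulty_bits in total), which is how the difficulty can be adjusted over a wide range. The real protocol hashes a specific block-header format and encodes the target compactly in the header; neither detail is reproduced here, and the function names are hypothetical.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def mine(payload: bytes, difficulty_bits: int, max_nonce: int = 2 ** 32):
    """Search for a 4-byte nonce whose double hash lies below the target.

    The target is 2 ** (256 - difficulty_bits), so each additional bit of
    difficulty halves the fraction of acceptable hashes. Returns
    (nonce, hash) or None if no nonce below max_nonce works."""
    target = 2 ** (256 - difficulty_bits)
    for nonce in range(max_nonce):
        digest = double_sha256(payload + nonce.to_bytes(4, "little"))
        if int.from_bytes(digest, "big") < target:
            return nonce, digest
    return None


result = mine(b"toy block", difficulty_bits=12)
```

Finding the nonce takes thousands of hash evaluations, but checking it takes only one, which is what allows every node to validate new blocks cheaply.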
Due to the nature of the system, the record of all previous transactions since its beginning is publicly available to anyone participating in the Bitcoin network. From these records, one can recover the sending and receiving addresses, the sum involved and the approximate time of the transaction. Such detailed information is rarely available in financial systems, making the Bitcoin network a valuable source of empirical data on monetary transactions. Of course, there are shortcomings: only the addresses involved in the transactions are revealed, not the users themselves. While providing complete anonymity is not among the stated goals of the Bitcoin project
Another issue arises not only for Bitcoin, but for most online social datasets: It is hard to determine which observed phenomena are specific to the system, and which results are general. We do not know to what extent the group of people using the system can be considered as a representative sample of the society. In the case of Bitcoin for example, due to the perceived anonymity of the system, it is widely used for commerce of illegal items and substances
We installed the open-source bitcoind client and downloaded the blockchain from the peer-to-peer network on May 7th, 2013. We modified the client to extract the list of all transactions in a human-readable format. We downloaded more precise timestamps of transactions from the blockchain.info website’s archive. The data and the source code of the modified client program are available at the project’s website
The data includes 235,000 blocks, which contain a total of 17,354,797 transactions. This dataset includes 13,086,528 addresses (i.e. addresses appearing in at least one transaction); of these, 1,616,317 addresses were active in the last month. The Bitcoin network itself does not store balances associated with addresses; these can be calculated from the sum of received and sent bitcoins for each address, and overspending is prevented by requiring that the input of a transaction corresponds to the output of a previous transaction. Using this method, we found that approximately one million addresses had nonzero balance at the time of our analysis.
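Balances can be accumulated from the transaction list as the total received minus the total sent by each address. The sketch below assumes the transactions have already been flattened into (sender, receiver, amount) transfers, with amounts as integers in satoshi (1 BTC = 10^8 satoshi) to avoid floating-point rounding, and uses `None` as the sender of newly generated coins. Transaction fees and the mapping of multi-input, multi-output transactions are ignored; these conventions and the function name are assumptions for illustration.

```python
from collections import defaultdict

SATOSHI_PER_BTC = 100_000_000


def balances_from_transfers(transfers):
    """Balance of every address: sum received minus sum sent.

    transfers: iterable of (sender, receiver, amount_in_satoshi); a sender of
    None marks newly generated coins (block reward)."""
    balance = defaultdict(int)
    for sender, receiver, amount in transfers:
        if sender is not None:
            balance[sender] -= amount
        balance[receiver] += amount
    # In the real system each input must spend an earlier output, so a
    # negative final balance here points to an error in the extracted data.
    if any(b < 0 for b in balance.values()):
        raise ValueError("negative balance: inconsistent transfer list")
    return dict(balance)


transfers = [
    (None, "A", 50 * SATOSHI_PER_BTC),  # block reward paid to A
    ("A", "B", 10 * SATOSHI_PER_BTC),
    ("A", "C", 40 * SATOSHI_PER_BTC),   # A spends everything it received
]
balances = balances_from_transfers(transfers)
nonzero = sum(1 for b in balances.values() if b != 0)
```

In this example all three addresses appear in the data, but only B and C end with a nonzero balance, mirroring the distinction above between addresses that ever appeared in a transaction and addresses that currently hold bitcoins.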
We have performed a detailed analysis of Bitcoin, a novel digital currency system. A key difference from traditional currencies handled by banks is the open nature of Bitcoin: each transaction is publicly announced, providing an unprecedented opportunity to study monetary transactions of individuals. We have downloaded and compiled the complete list of transactions, and we have extracted the time and amount of each payment. We have studied the structure and evolution of the transaction network, and we have investigated the dynamics taking place on the network, i.e. the flow of bitcoins.
Measuring basic network characteristics as a function of time, we have identified two distinct phases in the lifetime of the system: (i) When the system was new, no businesses accepted bitcoins as a form of payment, so Bitcoin was more of an experiment than a real currency. This initial phase is characterized by large fluctuations in network characteristics, a heterogeneous indegree distribution and a homogeneous outdegree distribution. (ii) Later, Bitcoin received wider public attention, the increasing number of users attracted services, and the system started to function as a real currency. This trading phase is characterized by stable network measures, disassortative degree correlations and power-law in- and outdegree distributions. We have measured the microscopic link formation statistics, finding that linear preferential attachment drives the growth of the network.
To study the accumulation of bitcoins, we have measured the wealth distribution at different points in time. We have found that this distribution is highly heterogeneous throughout the lifetime of the system, and it converges to a stable stretched exponential distribution in the trading phase. We have found that sublinear preferential attachment drives the accumulation of wealth. Investigating the correlation between the wealth distribution and network topology, we have identified a scaling relation between the degree and wealth associated with individual nodes, implying that the ability to attract new connections and the ability to gain wealth are fundamentally related.
We believe that the data presented in this paper has great potential to be used for evaluating and refining econophysics models, as not only the bulk properties, but also the microscopic statistics can be readily tested. To this end, we make all the data used in this paper available online to the scientific community in easily accessible formats
The authors thank András Bodor and Philipp Hövel for many useful discussions and suggestions.