Table 1.
An example of the bitcoin dataset obtained using the BitIodine software [21].
Table 2.
An example of the ethereum dataset obtained from [23].
Table 3.
Periods when bitcoin and ethereum systems show significant and non-significant price changes.
Table 4.
Network properties calculated for bitcoin (BTC) and ethereum (ETH) networks for each period under analysis.
Fig 1.
Proportion of “noisy” users in transaction networks.
More than 50% of users in each ETH period have a degree of less than 2, a small transaction value, and appear only once. In bitcoin, the vast majority of users are “noisy”; surprisingly, however, during the Crypto Winter and after the Crypto Bubble the proportion of “noisy” users is only 36-38%. “Noisy” users are removed before further analysis.
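The filtering rule in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the `UserStats` record, the function names, and the `value_threshold` cut-off are all hypothetical, since the caption does not say what counts as a "small" transaction value.

```python
from dataclasses import dataclass


@dataclass
class UserStats:
    # Hypothetical per-user summary for one period.
    user_id: str
    degree: int          # number of distinct counterparties
    total_value: float   # total value transacted in the period
    appearances: int     # number of transactions the user takes part in


def is_noisy(user, value_threshold):
    """A user is "noisy" if all three conditions from the caption hold."""
    return (
        user.degree < 2
        and user.total_value < value_threshold  # assumed meaning of "small"
        and user.appearances == 1
    )


def filter_noisy(users, value_threshold):
    """Return the users that are kept and the share that was removed."""
    kept = [u for u in users if not is_noisy(u, value_threshold)]
    noisy_share = 1 - len(kept) / len(users) if users else 0.0
    return kept, noisy_share


users = [
    UserStats("a", degree=1, total_value=0.001, appearances=1),  # noisy
    UserStats("b", degree=5, total_value=3.2, appearances=7),    # kept
    UserStats("c", degree=1, total_value=10.0, appearances=1),   # kept: large value
    UserStats("d", degree=1, total_value=0.002, appearances=1),  # noisy
]
kept, noisy_share = filter_noisy(users, value_threshold=0.01)
print([u.user_id for u in kept], noisy_share)
```

Requiring all three conditions at once keeps one-off users who moved a large amount, which seems closer to the caption's wording than filtering on degree alone.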
Fig 2.
Correlations between all features in ETH. Total degree and total value are redundant features, as they are always highly correlated with the others.
In later periods, value in and value out also show very high correlation.
Fig 3.
Correlations between all features in BTC. In, out, and total degrees are always highly correlated, as are in, out, and total values.
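One way to flag redundant features of the kind shown in these correlation figures is to list every feature pair whose Pearson correlation exceeds a cut-off. The sketch below uses NumPy; the 0.9 threshold, the function name, and the toy data are assumptions made for illustration and do not come from the paper.

```python
import numpy as np


def highly_correlated_pairs(X, names, threshold=0.9):
    """List feature pairs whose absolute Pearson correlation is >= threshold.

    X has one row per user and one column per feature.
    """
    corr = np.corrcoef(X, rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j]))
    return pairs


# Toy data: total_degree is an exact linear function of in_degree,
# while value_in is only loosely related to it.
in_degree = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
total_degree = 2.0 * in_degree + 1.0
value_in = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])

X = np.column_stack([in_degree, total_degree, value_in])
names = ["in_degree", "total_degree", "value_in"]
redundant = highly_correlated_pairs(X, names)
print(redundant)
```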
Fig 4.
The algorithmic steps to classify users into behavioral groups.
Due to the complexity of the feature sets, we found that applying an unsupervised method alone is insufficient. Our approach is to first cluster the feature set Ac, on which k-means clustering performs well, to obtain the set of labels Vc. Subsequently, we use the clustered output as training data for a more advanced supervised algorithm, an SVM. The final step is to use the trained SVM model as a classifier for the remaining feature sets A1, A2, …, Am to obtain their corresponding labels V1, V2, …, Vm.
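The steps in this caption can be sketched with scikit-learn as below. This is a hedged illustration under stated assumptions, not the authors' implementation: it assumes every feature set has the same columns as Ac, standardizes features with a scaler fitted on Ac, uses an RBF kernel, and takes 4 clusters. The function name and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def label_with_kmeans_then_svm(A_c, other_sets, n_clusters=4, seed=0):
    """Cluster A_c with k-means, train an SVM on the result, label the rest."""
    scaler = StandardScaler().fit(A_c)          # assumption: scale on A_c only
    A_c_scaled = scaler.transform(A_c)

    # Step 1: unsupervised labels V_c for the set where k-means works well.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    V_c = kmeans.fit_predict(A_c_scaled)

    # Step 2: use (A_c, V_c) as training data for the SVM.
    svm = SVC(kernel="rbf", gamma="scale").fit(A_c_scaled, V_c)

    # Step 3: classify the remaining feature sets A_1, ..., A_m.
    V_rest = [svm.predict(scaler.transform(A_i)) for A_i in other_sets]
    return V_c, V_rest


def make_groups(rng, per_group):
    """Synthetic stand-in data: four well-separated groups in two features."""
    centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10]], dtype=float)
    return np.vstack(
        [c + rng.normal(scale=0.5, size=(per_group, 2)) for c in centers]
    )


rng = np.random.default_rng(0)
A_c = make_groups(rng, per_group=50)
A_1 = make_groups(rng, per_group=20)
V_c, V_rest = label_with_kmeans_then_svm(A_c, [A_1])
print(np.bincount(V_c), np.bincount(V_rest[0]))
```

The cluster indices returned by k-means are arbitrary, so the labels only become behavioral groups once each cluster is inspected and named.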
Fig 5.
Number of clusters in ETH data found using the elbow method. For all periods, 4 appeared to be optimal.
Fig 6.
Number of clusters in BTC data found using the elbow method. For local events, the optimal number is 4.
However, for global events there is no “sharp” elbow.
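For reference, an elbow curve of this kind is usually built by running k-means for a range of cluster counts and recording the within-cluster sum of squares. The sketch below does this with scikit-learn's `KMeans.inertia_` on synthetic data; the function name, the range of k, and the data are assumptions for illustration. The elbow itself is still read off the curve by eye, which is why a curve without a sharp bend gives no clear answer.

```python
import numpy as np
from sklearn.cluster import KMeans


def elbow_curve(X, k_values, seed=0):
    """Map each k to the within-cluster sum of squares (k-means inertia)."""
    return {
        k: KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_
        for k in k_values
    }


# Synthetic stand-in data with four well-separated groups.
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10]], dtype=float)
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

inertias = elbow_curve(X, range(1, 8))
for k, w in inertias.items():
    # The curve drops steeply up to k = 4 and flattens afterwards.
    print(k, round(w, 1))
```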
Table 5.
Distinct properties of the majority of nodes in each cluster.
Fig 7.
Silhouette coefficient calculated for the filtered ethereum data set.
For local periods, all data points have positive scores, while a small number of data points for global events are not well matched to their cluster (they have negative scores).
Fig 8.
Silhouette coefficient calculated for the filtered bitcoin data set.
In all periods, some data points have negative scores, and the average silhouette coefficient for all periods is 0.4.
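The two quantities discussed in these silhouette captions, the average coefficient and the share of points with a negative score, can be computed as in the sketch below. It uses scikit-learn's `silhouette_samples`; the helper name and the synthetic data are assumptions, and the numbers it prints are unrelated to the values reported in the paper.

```python
import numpy as np
from sklearn.metrics import silhouette_samples


def silhouette_summary(X, labels):
    """Return the mean silhouette coefficient and the share of negative scores."""
    scores = silhouette_samples(X, labels)  # one score in [-1, 1] per point
    return float(scores.mean()), float((scores < 0).mean())


# Synthetic stand-in data: two well-separated groups with known labels.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=10.0, scale=0.5, size=(50, 2)),
])
labels = np.array([0] * 50 + [1] * 50)

mean_score, negative_share = silhouette_summary(X, labels)
print(round(mean_score, 2), negative_share)
```

A negative score means a point is, on average, closer to another cluster than to its own, which is the sense in which such points are "not well matched".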
Table 6.
The percentage of misclassified points for the various feature sets used.
Fig 9.
The percentage of users in different groups in bitcoin and ethereum during different periods of the systems’ evolution.