Fig 1.
Distribution of objects in the space spanned by the eigenvectors of the combinatorial Laplacian corresponding to some of the lowest eigenvalues (no. 1490 and 1488), TWT.PL dataset. In two corners there are two objects, while the rest are located in the third corner (mass concentration).
The positions of datapoints are slightly blurred so that the mass concentration is visible.
Fig 2.
Spectral normalization in the Combinatorial Laplacian Relative Lambda Method.
The TWT.PL dataset.
Fig 3.
Spectral normalization in the Combinatorial Laplacian Sample Size Adjusted Lambda Method and the Combinatorial Laplacian Sample Size Adjusted Maximum Lambda Method.
The TWT.PL dataset.
Fig 4.
Spectral normalization in the Normalized Laplacian Method.
The TWT.PL dataset.
Fig 5.
The artificial data set BLK.4_0.2_0.5—adjacency matrix for documents.
Table 1.
Error percentage for natural classification.
Column names: datasets, row names: GSC methods considered.
Table 2.
F1 score for natural classification.
Column names: datasets, row names: GSC methods considered.
Table 3.
Error percentage for natural classification.
Column names: datasets, row names: GSC methods considered.
Table 4.
Error percentage for natural classification.
Column names: datasets, row names: GSC methods considered.
Table 5.
F1 score for natural classification.
Column names: datasets, row names: GSC methods considered.
Table 6.
F1 score for natural classification.
Column names: datasets, row names: GSC methods considered.
Table 7.
Error percentage for eigenvector-based classification.
Column names: datasets, row names: GSC methods considered.
Table 8.
F1 measure for eigenvector-based classification.
Column names: datasets, row names: GSC methods considered.
Table 9.
Error percentage for eigenvector-based classification.
Column names: datasets, row names: GSC methods considered.
Table 10.
F1 measure for eigenvector-based classification.
Column names: datasets, row names: GSC methods considered.
Table 11.
Error percentage for eigenvector-based classification.
Column names: datasets, row names: GSC methods considered.
Table 12.
F1 measure for eigenvector-based classification.
Column names: datasets, row names: GSC methods considered.
Fig 6.
Mass of the element with the largest mass in the eigenvector.
Eigenvectors are ordered by decreasing eigenvalue. Top figure: the entire spectrum. Bottom figure: only the 50 eigenvectors corresponding to the 50 lowest eigenvalues. English Twitter data TWT.EN.
Fig 7.
Number of highest-mass elements of an eigenvector that together constitute half of its mass.
English Twitter data TWT.EN. Top: all eigenvectors. Bottom: the 50 eigenvectors with the lowest eigenvalues.
Fig 8.
Relative error among the elements of an eigenvector constituting its half mass (standard error divided by the mean).
English Twitter data TWT.EN. Top: all eigenvectors. Bottom: the 50 eigenvectors with the lowest eigenvalues.
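The three per-eigenvector statistics shown in Figs 6 to 8 can be sketched as follows. This is only a minimal sketch under assumptions the captions do not state: the "mass" of an element is taken to be its squared component divided by the squared norm of the eigenvector, and the relative error is taken to be the sample standard deviation of the absolute values of the half-mass elements divided by their mean. The function name `half_mass_stats` is hypothetical.

```python
import numpy as np

def half_mass_stats(v):
    """Return (largest mass, half-mass count, relative spread) for one eigenvector.

    Assumption: the mass of element i is v[i]**2 / sum(v**2).
    """
    v = np.asarray(v, dtype=float)
    mass = v**2 / np.sum(v**2)

    # Indices of the elements sorted by decreasing mass.
    idx = np.argsort(mass)[::-1]
    cum = np.cumsum(mass[idx])

    # Smallest k such that the k heaviest elements hold at least half the mass
    # (small tolerance guards against rounding in the cumulative sum).
    k = int(np.searchsorted(cum, 0.5 - 1e-12) + 1)

    # Assumed "relative error": sample standard deviation over mean of the
    # absolute values of the half-mass elements; defined as 0 for one element.
    top = np.abs(v[idx[:k]])
    rel = float(np.std(top, ddof=1) / np.mean(top)) if k > 1 else 0.0

    return float(mass[idx[0]]), k, rel
```

Applied to every column of an eigenvector matrix, this would give one point per eigenvector for each of the three plots.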
Table 13.
Original cardinalities of groups in the dataset ANO.8.
Table 14.
Cluster membership confusion matrix for ANO.8 after removing small groups.
Rows represent TRUTH, columns represent PREDICTION.
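The row and column convention used in these confusion matrices can be illustrated with a short sketch. It assumes that predicted cluster labels have already been matched to the true group labels and that labels are integers starting at 0; the helper names `confusion_matrix` and `error_percentage` are hypothetical and the data are toy values, not taken from ANO.8.

```python
import numpy as np

def confusion_matrix(truth, pred, n_classes):
    """Count matrix with rows indexed by TRUTH and columns by PREDICTION."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(truth, pred):
        cm[t, p] += 1
    return cm

def error_percentage(cm):
    """Percentage of objects lying off the diagonal (misassigned objects)."""
    total = cm.sum()
    correct = np.trace(cm)
    return 100.0 * (total - correct) / total

# Toy labels for illustration only.
truth = [0, 0, 1, 1, 1]
pred = [0, 1, 1, 1, 0]
cm = confusion_matrix(truth, pred, n_classes=2)
```

Here `cm[i, j]` counts the objects whose true group is `i` and whose predicted cluster is `j`, so the diagonal holds the correctly assigned objects.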
Table 15.
Cluster membership confusion matrix for ANO.8 after removing small groups, but retaining the 3rd group.
Rows represent TRUTH, columns represent PREDICTION.
Table 16.
Original cardinalities of selected groups in the dataset ANO.26.
Table 17.
Cluster membership confusion matrix for ANO.26 after removing small groups.
Rows represent TRUTH, columns represent PREDICTION.
Table 18.
Cluster membership confusion matrix for ANO.26 after removing small groups and group #15.
Rows represent TRUTH, columns represent PREDICTION.
Fig 9.
Eigenvalue distributions for the entire dataset (black dots) and for the classes (lines in different colors) for the combinatorial (top) and normalized (bottom) graph Laplacians.
English Twitter data TWT.EN. In the bottom panel, the ten lowest eigenvalues are omitted for better readability.
Fig 10.
Combinatorial Laplacian spectrum of the entire TWT.PL data set (thick line) and of each of the classes.
Fig 11.
Combinatorial Laplacian spectrum of the class #pizgaczhell of the TWT.PL data set and of samples of size 25%, 50% and 75%.
Fig 12.
Combinatorial Laplacian spectrum of the class #pizgaczhell of the TWT.PL data set and of several samples of size 50%.
Fig 13.
Combinatorial Laplacian spectrum of the class #pizgaczhell of the TWT.PL data set and of several samples of the same size (the size of the smallest class) from different classes.
Fig 14.
Effects of similarity perturbation within a class of data.
The upper line represents the maximal quotient and the lower line the minimal quotient of eigenvalues after and before perturbing the similarity matrix, as described in the text, as a function of the maximal similarity decrease factor. ANO.8 dataset.
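The quotients plotted in this figure can be sketched as below. The perturbation scheme itself is described in the text and is not reproduced in this caption, so the one used here is purely an assumption: every off-diagonal similarity is multiplied by an independent factor drawn uniformly from [1 - d, 1], where d stands in for the maximal similarity decrease factor. The function names and the random toy matrix are hypothetical.

```python
import numpy as np

def combinatorial_laplacian(S):
    """L = D - S, with D the diagonal matrix of row sums of S."""
    return np.diag(S.sum(axis=1)) - S

def eigenvalue_quotients(S, d, rng):
    """Min and max of lambda_after / lambda_before over nonzero eigenvalues."""
    n = S.shape[0]
    # ASSUMED perturbation (not taken from the source): symmetric random
    # factors in [1 - d, 1] applied to the off-diagonal similarities.
    F = np.triu(rng.uniform(1.0 - d, 1.0, size=(n, n)), 1)
    F = F + F.T
    S_pert = S * F

    before = np.linalg.eigvalsh(combinatorial_laplacian(S))
    after = np.linalg.eigvalsh(combinatorial_laplacian(S_pert))

    # Skip the smallest eigenvalue, which is zero for any graph Laplacian.
    q = after[1:] / before[1:]
    return float(q.min()), float(q.max())

# Toy symmetric similarity matrix with zero diagonal (illustration only).
rng = np.random.default_rng(0)
A = np.triu(rng.uniform(0.1, 1.0, size=(6, 6)), 1)
S = A + A.T
q_min, q_max = eigenvalue_quotients(S, d=0.2, rng=rng)
```

Under this assumed scheme the quotients always stay within [1 - d, 1], because both the removed and the retained similarity mass form graph Laplacians that are positive semidefinite.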
Table 19.
Error percentage and F1 for eigenvalue-based classification.
Column names: datasets, row names: GSC methods considered.