Clustering-independent analysis of genomic data using spectral simplicial theory

doi:10.1371/journal.pcbi.1007509


Fig 5

Feature selection on the MNIST dataset using the combinatorial Laplacian score.

Each sample consists of a grey-scale image of a hand-written digit from 0 to 9, and each pixel represents a feature. The degree of localization of each feature in the simplicial complex is assessed using the 0- and 1-dimensional combinatorial Laplacian scores. (a) Number of rejected null hypotheses at an FDR of 0.05 as a function of the radius ε of the balls in the Vietoris-Rips complex. In this example, the statistical power of the 0- and 1-dimensional scores is maximized for ε ≈ 0.7. The significance of the scores was determined through a permutation test in which the pixels were randomized 5,000 times. (b) Vietoris-Rips complex colored by the intensity of a pixel that is significant under the 0-dimensional score (q-value < 0.005) but not under the 1-dimensional score (q-value = 0.5). The intensity of the pixel is high in a densely connected, topologically trivial region of the complex containing images of the digit `4`. Images associated with several nodes are shown for reference, with the pixel highlighted in red. (c) Vietoris-Rips complex colored by the intensity of a pixel that is significant under both the 0-dimensional score (q-value < 0.005) and the 1-dimensional score (q-value < 0.005). The intensity of the pixel is high in a densely connected region that surrounds a large non-contractible cycle of the simplicial complex. The cycle is generated by images that belong to the sequence of digits `7`, `3`, `1`, `9`. Images associated with several nodes along the cycle are shown for reference, with the pixel highlighted in red.

doi: https://doi.org/10.1371/journal.pcbi.1007509.g005
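
The permutation-based significance test described in the caption can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the classical graph form of the Laplacian score (a degree-weighted Rayleigh quotient on the 1-skeleton of the Vietoris-Rips complex) as a stand-in for the 0-dimensional score, and it does not cover the 1-dimensional score. The function names, the convention that two points are joined when their distance is at most `eps`, the toy data, and the use of Benjamini-Hochberg q-values are all assumptions made for illustration.

```python
import numpy as np

def rips_graph_laplacian(points, eps):
    """Graph Laplacian of the 1-skeleton of a Vietoris-Rips complex.
    Assumed convention: an edge joins two points at distance <= eps."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adj = (dist <= eps).astype(float)
    np.fill_diagonal(adj, 0.0)
    deg = adj.sum(axis=1)
    return np.diag(deg) - adj, deg

def laplacian_score(f, lap, deg):
    """Degree-weighted Rayleigh quotient; low values mean the feature
    varies little across edges, i.e. it is localized on the graph."""
    f_c = f - (f @ deg) / deg.sum()      # remove the degree-weighted mean
    denom = (f_c ** 2) @ deg
    if denom <= 0:                        # constant feature: no signal
        return np.inf
    return (f_c @ lap @ f_c) / denom

def permutation_pvalue(f, lap, deg, n_perm=1000, rng=None):
    """One-sided p-value: fraction of shuffled features scoring as low
    as the observed one (with the usual +1 correction)."""
    rng = np.random.default_rng(rng)
    observed = laplacian_score(f, lap, deg)
    null = np.array([laplacian_score(rng.permutation(f), lap, deg)
                     for _ in range(n_perm)])
    return (1 + np.sum(null <= observed)) / (1 + n_perm)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]  # enforce monotonicity
    q = np.empty(n)
    q[order] = np.clip(q_sorted, 0.0, 1.0)
    return q

# Toy usage (hypothetical data): two well-separated groups of points on a line.
pts = np.concatenate([np.linspace(0, 0.9, 10), np.linspace(5, 5.9, 10)])[:, None]
lap, deg = rips_graph_laplacian(pts, eps=0.5)
localized = (pts[:, 0] < 1).astype(float)          # high on one group only
scattered = np.random.default_rng(0).random(20)    # no spatial structure
pvals = [permutation_pvalue(f, lap, deg, n_perm=200, rng=1)
         for f in (localized, scattered)]
qvals = benjamini_hochberg(pvals)
```

Repeating this computation over a range of `eps` values and counting the features with q-value below 0.05 would give a curve of the kind shown in panel (a), under the assumptions stated above.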