Classifying sex and strain from mouse ultrasonic vocalizations using deep learning

doi:10.1371/journal.pcbi.1007918

Fig 4

Features alone are insufficient to explain the DNN classification performance.

A Features of individual vocalizations can also be measured using dedicated convolutional DNNs, one per feature, with the same architecture as for sex classification (see Fig 3A). B-E Classification performance for the different properties was robust, ranging between 57.0 and 82.0% on average (maroon), and depended on the individual value of each property (red). We trained networks for direction ({-1,0,1}, B), the number of breaks ({0–3}, C), the number of peaks ({0–3}, D) and the degree of broadband activation ([0,1], E). For the other 2 properties (complex and tremolo), most values were close to 0, so the networks did not have sufficient training data. The light gray lines indicate chance performance, which depends on the number of choices for each property. The light blue bars indicate the distributions of values, also in %. F Using a non-convolutional DNN, we investigated how well the features alone predict sex, i.e. without any information about the precise spectral structure of each vocalization. G Prediction performance was above chance (maroon, 59.6±3.0%) but lower than the prediction of sex on the basis of the raw spectrograms (see Fig 3). The gray line indicates chance performance. H Feature-based prediction of sex with DNNs performed similarly to ridge regression (blue) and SVM (red, see main text for statistics). I Duration, volume and the level of broadband activation were the most significant linear predictors of sex when using ridge regression. J Using a semi-convolutional DNN, we investigated the combined predictability of the same features as above, plus 3 statistics of the stimulus (each a vector), i.e. the marginals of the spectrogram in time and in frequency, as well as the spectral line, i.e. the sequence of frequencies of maximal amplitude per time-bin. K The average performance of the semi-convolutional DNN (64.5%) remains substantially lower than that of the 2D cDNN (see Fig 3D). USVs of both sexes were predicted with similar accuracy. L The average performance of the semi-convolutional DNN is not significantly higher than that of ridge regression (61.9%) or SVM (62.7%) on the same data, due to the large variability across the sexes (see Panel K).
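The three vector-valued statistics named in Panel J can be illustrated with a minimal sketch. It is not the authors' code: the function name `spectrogram_statistics`, the array layout (frequency bins by time bins), the use of sums rather than means for the marginals, and the toy values are all assumptions made for illustration.

```python
import numpy as np


def spectrogram_statistics(spec, freqs):
    """Return (time marginal, frequency marginal, spectral line).

    spec  : 2D array, shape (n_freq, n_time), non-negative amplitudes (assumed layout)
    freqs : 1D array, shape (n_freq,), center frequency of each frequency bin
    """
    spec = np.asarray(spec, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    if spec.ndim != 2 or freqs.shape != (spec.shape[0],):
        raise ValueError("spec must be (n_freq, n_time) and freqs must be (n_freq,)")

    # Marginal in time: collapse the frequency axis, one value per time bin.
    # Summing is an assumption; a mean would differ only by a constant factor.
    time_marginal = spec.sum(axis=0)

    # Marginal in frequency: collapse the time axis, one value per frequency bin.
    freq_marginal = spec.sum(axis=1)

    # Spectral line: frequency of maximal amplitude in each time bin.
    spectral_line = freqs[spec.argmax(axis=0)]

    return time_marginal, freq_marginal, spectral_line


# Toy spectrogram with 3 frequency bins and 4 time bins (hypothetical values).
freqs = np.array([40.0, 60.0, 80.0])  # kHz, chosen only for illustration
spec = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 1.0],
    [0.0, 0.0, 3.0, 0.0],
])

time_marginal, freq_marginal, spectral_line = spectrogram_statistics(spec, freqs)
```

Each returned vector could then be passed to a 1D convolutional branch alongside the scalar features. How the semi-convolutional DNN actually combines them is described in the main text, not in this sketch.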

doi: https://doi.org/10.1371/journal.pcbi.1007918.g004