Deep learning for behaviour classification in a preclinical brain injury model

doi:10.1371/journal.pone.0268962

Fig 1.

Representative scheme of the experimental behaviour data acquired for the study.

Briefly, animals underwent either a controlled cortical injury delivered through a craniotomy that exposed the cortex (CCI-TBI injured group) or a craniotomy alone without cortical injury (sham craniotomy group). Both procedures were performed under aseptic conditions and anaesthesia, with analgesic support. All animals were implanted with an RFID chip. Naive animals received the RFID implant under brief anaesthesia, with no other intervention. Animals were group-housed in their home cages, which were placed on top of the RFID reader baseplate and recorded by the side-view video camera of the Home Cage Analyser (ActualHCA™) system. Undisturbed recordings were collected in 15 min time bins over 3-day periods each week, for 5 weeks post-intervention. Naive data were recorded for a single week only, as a baseline. Automated data acquired by the HCA system included body temperature per individual animal (°C), distance travelled per individual animal (mm), number of transitions between RFID detectors (Nm) and average separation between animals (mm). These raw time-series data were then pre-processed and used to train our machine learning models.

Fig 2.

Representative graphs showing the data acquired from the Home Cage Analyser (ActualHCA™) system, for the TBI-Injured, Sham-craniotomy and Naive animals.

Parameters analysed include the distance travelled per animal (mm), the number of transitions (spatial transitions between the different RFID sensors), the distance between animals group-housed in the same cage (mm) and the body temperature of each individual animal (°C). Data are shown as mean ± SEM across the animals in each cage, calculated from the individual RFID recordings of each animal. Fig 2A–2C show data for the TBI-injured and sham-craniotomy groups recorded during Week 1, Week 3 and Week 5 post-intervention, respectively. Fig 2D shows the recordings acquired for naive animals as a one-week baseline reference. The data shown cover a larger study cohort of 35 animals (n = 16 naive, n = 11 TBI and n = 8 craniotomy), which includes the cohorts used for deep-learning modelling and forms part of a larger ongoing recording project. All raw data used for ML analysis are provided as S1 File.
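
As a minimal sketch of how a mean ± SEM value for one cage and one time bin could be computed, assuming SEM is the sample standard deviation divided by the square root of the number of animals: the helper name `mean_sem` and the distance values are hypothetical and are not taken from S1 File.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, SEM), with SEM = sample standard deviation / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("SEM needs at least two observations")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


# Hypothetical distances travelled (mm) by four cage-mates in one time bin.
distances_mm = [1200.0, 1350.0, 980.0, 1110.0]
mean, sem = mean_sem(distances_mm)
print(f"{mean:.1f} ± {sem:.1f} mm")
```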

Table 1.

Outline of class distribution in the dataset.

Fig 3.

Architecture of the proposed deep neural network, with the input shape and number of channels shown below the input, and the number of neurons shown below the dense layer.

The features of the time-dependent behavioural signals are captured by two 1D convolutional layers and an LSTM layer. CW-transformed frequency-domain signals are processed in parallel by two 2D convolutional layers. The features extracted by both branches are then concatenated and passed through a final fully connected layer and a softmax function, which returns the probability of each class. The number of filters in each convolutional layer and the number of units in the LSTM and dense layers were tuned heuristically. The model is trained to minimize the sparse categorical cross-entropy loss between the target class and the predicted class.
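
The final softmax step and the sparse categorical cross-entropy loss can be illustrated with a small standard-library sketch. It covers only the output stage, not the convolutional or LSTM branches. The logit values and the class order (TBI, sham, naive) are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(target_index, probs, eps=1e-12):
    """Negative log-probability assigned to the integer-encoded true class."""
    return -math.log(max(probs[target_index], eps))


# Hypothetical network outputs for one sample; assumed class order: TBI, sham, naive.
logits = [2.0, 0.5, -1.0]
probs = softmax(logits)
loss = sparse_categorical_crossentropy(0, probs)  # true class index 0 (TBI)
print([round(p, 3) for p in probs], round(loss, 3))
```

"Sparse" here means that the target is given as an integer class index rather than a one-hot vector; the loss value is the same in both encodings.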

Fig 4.

Performance metrics of the different machine learning techniques under leave-one-out cross-validation (LOOCV).

Accuracy indicates the percentage of correct predictions. Precision is a measure of quality that indicates the percentage of selected samples that are relevant. Recall is a measure of quantity that reflects the percentage of relevant samples that were actually selected. The F1-score combines precision and recall (as their harmonic mean) and is considered a more sensible single measure of model performance. For all metrics, higher is better.
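
The four metrics can be made concrete with a short one-vs-rest sketch. The labels and predictions below are invented for illustration and are not results from the study. How the per-class values are averaged into the reported figures is not stated here, so the sketch stops at per-class values.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive):
    """One-vs-rest precision, recall and F1 for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical labels, for illustration only.
y_true = ["TBI", "TBI", "TBI", "sham", "sham", "naive"]
y_pred = ["TBI", "TBI", "sham", "sham", "TBI", "sham"]

print("accuracy:", accuracy(y_true, y_pred))
for label in ("TBI", "sham", "naive"):
    print(label, precision_recall_f1(y_true, y_pred, label))
```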

Table 2.

Performance of the machine learning models under leave-one-out cross-validation.

Fig 5.

Confusion matrices obtained from the labels predicted under leave-one-out cross-validation by the deep learning and machine learning models.

Each confusion matrix shows the performance of one ML algorithm on the same test set and reflects how accurately that algorithm predicts each class. Samples on the main diagonal are correct predictions, while off-diagonal samples are incorrect predictions. The proposed deep neural network distinguishes the TBI and sham classes with acceptable accuracy, whereas the naive class is difficult to capture.
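
A confusion matrix of this kind can be built with a few lines of code. The sketch below assumes rows are true classes and columns are predicted classes, which may differ from the orientation used in the figure. The labels and predictions are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count samples per (true class, predicted class) pair.

    Rows index the true class and columns the predicted class.
    """
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


labels = ["TBI", "sham", "naive"]
# Hypothetical labels, for illustration only.
y_true = ["TBI", "TBI", "TBI", "sham", "sham", "naive"]
y_pred = ["TBI", "TBI", "sham", "sham", "TBI", "sham"]

cm = confusion_matrix(y_true, y_pred, labels)
correct = sum(cm[i][i] for i in range(len(labels)))  # main diagonal
for label, row in zip(labels, cm):
    print(f"{label:>5}: {row}")
print("correct predictions:", correct, "of", len(y_true))
```

In this toy example the single naive sample lands off the diagonal, so the naive row has no correct predictions.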

Fig 6.

Confusion matrices of machine learning models trained on data oversampled with SMOTE and of models trained with class weights applied to the loss function.

Because the naive class is oversampled using SMOTE, new data points are generated by interpolating between existing observations. As the confusion matrices show, this causes the ML models to overfit to the oversampled class and to exhibit inflated accuracy on that class. This oversampling approach has only a small effect on performance for the other two classes.
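
The interpolation step at the core of SMOTE can be sketched as follows. This is a simplified illustration, not the implementation used in the study: the function name `smote_like_oversample`, the neighbour count `k`, the seed and the feature values are all assumptions.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create synthetic points on segments between minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority-class neighbours of the chosen sample.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature observations from the under-represented class.
naive_samples = [[36.5, 120.0], [36.8, 150.0], [37.0, 135.0], [36.6, 160.0]]
new_points = smote_like_oversample(naive_samples, n_new=5)
for point in new_points:
    print([round(x, 2) for x in point])
```

Every synthetic point lies on a line segment between two real observations, so no information from outside the existing naive samples is added. This is consistent with the overfitting to the oversampled class described above.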

Fig 7.

Principal component analysis plot of the dataset with two components.

Two clusters, injured and sham, have started to form, although there is some overlap between the two groups. The naive data currently overlap with the other two major classes; however, more data may give a clearer picture of where the naive cluster lies. Overall, the three experimental groups are not straightforward to distinguish on the PCA map, indicating that the classification task is challenging.
