Fig 1.
Each frame is first passed through a ConvNet, called Inception V3 (“Incept. V3”), that reduces dimensionality by extracting high-level image features [36]. The features from 125 successive video frames are then given as input to an RNN. The RNN is composed of LSTM units with the capacity to analyze temporal information across frames. The RNN outputs the movement deficit score for each video. After the network is trained, information is extracted from the network weights in order to identify the image features and the parts of each video frame that were most predictive of the network score (red arrows). Network code is available at github.com/hardeepsryait/behaviour_net, and the weights of the trained model are available at http://people.uleth.ca/~luczak/BehavNet/g04-features.hdf5. See Methods for details. ConvNet, convolutional network; LSTM, long short-term memory; RNN, recurrent neural network.
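The data flow described in this caption (per-frame features, then a recurrent pass over 125 frames, then one score per video) can be sketched in a few lines. This is a minimal illustration in NumPy, not the authors' Keras implementation: the weights are random, the hidden size of 16 and the single linear readout are assumptions, and the random `clip_features` array stands in for the 2,048 ConvNet features per frame.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_score(features, params):
    """Run one LSTM layer over per-frame features and return a scalar score."""
    W, U, b, w_out, b_out = params
    hidden = U.shape[0]
    h = np.zeros(hidden)  # hidden state
    c = np.zeros(hidden)  # cell state
    for x_t in features:  # one step per video frame
        z = W.T @ x_t + U.T @ h + b  # stacked gate pre-activations, shape (4 * hidden,)
        i, f, o, g = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input, forget, output gates
        g = np.tanh(g)                                # candidate cell update
        c = f * c + i * g
        h = o * np.tanh(c)
    return float(w_out @ h + b_out)  # linear readout of the last hidden state

rng = np.random.default_rng(0)
n_frames, n_features, hidden = 125, 2048, 16  # hidden size is an assumption
params = (
    rng.normal(0, 0.01, (n_features, 4 * hidden)),  # input weights W
    rng.normal(0, 0.01, (hidden, 4 * hidden)),      # recurrent weights U
    np.zeros(4 * hidden),                           # gate biases b
    rng.normal(0, 0.1, hidden),                     # readout weights
    0.0,                                            # readout bias
)
clip_features = rng.normal(size=(n_frames, n_features))  # stand-in for ConvNet output
score = lstm_score(clip_features, params)
```

With trained weights in place of the random ones, `score` would play the role of the movement deficit score for one video clip.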
Fig 2.
Automated scoring of movement deficits in the SPRT.
(A) Video frames showing selected movement elements in the task. (B) Scatterplot of corresponding network and expert scores. Each circle denotes the average score for a single rat. Note that stroke (red) and control (black) rats could be separated along the network score (y-axis) but not along the x-axis corresponding to the expert scores. (C) Scatterplot of stroke volume and corresponding scores by the network (blue) and human expert (yellow). Inset shows a representative histological image from a rat with a lesion (infarct area outlined; extent of M1 and M2 is denoted by lines in the intact hemisphere). Lesion volume and movement scores were normalized between 0 and 1 in order to directly compare both scores. Each dot represents the average score for one rat, and solid lines show linear regressions (blue: network score; yellow: expert score). The distribution of blue dots closer to the identity line (dashed) indicates that, in this dataset, network scores are more strongly correlated with stroke lesion volume than the expert scores. The sample network and data on which this figure is based are available at github.com/hardeepsryait/behaviour_net. M1, primary motor area; M2, secondary motor area; SPRT, single-pellet reaching task.
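The normalization and comparison described for panel C can be sketched as follows. This is only an illustration under assumptions: the per-rat values are hypothetical, and min-max scaling and Pearson correlation are assumed as the normalization and the correlation measure; the caption does not specify either.

```python
import numpy as np

def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)  # constant input: no spread to rescale
    return (values - values.min()) / span

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical per-rat averages (not data from the paper).
lesion_volume = [2.0, 5.0, 9.0, 14.0, 20.0]
network_score = [0.5, 1.1, 2.0, 2.9, 4.2]
expert_score = [3.0, 1.0, 4.0, 2.0, 5.0]

norm_volume = min_max_normalize(lesion_volume)
norm_network = min_max_normalize(network_score)
norm_expert = min_max_normalize(expert_score)

r_network = pearson_r(norm_volume, norm_network)
r_expert = pearson_r(norm_volume, norm_expert)
```

Min-max scaling does not change the correlation coefficient. Its purpose here is to place both scores on the same 0 to 1 axis as lesion volume, so that distance from the identity line is meaningful.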
Fig 3.
The network can learn to detect individual movement components with human-level accuracy.
(A) The relation between network scores and expert scores for individual movement components. Each dot represents the average score for one rat, and dashed lines show linear regression. Network scores were significantly correlated with expert scores for almost all movement elements. (B, C) To directly test whether the network could learn to discriminate movement components in action clips, we retrained the network on video segments with labeled movement elements. Panels B and C show the probability (“prob.”) of detecting a particular movement element in a video clip. For visualization, video segments are aligned with respect to the beginning of a reaching movement. The high similarity between timing of movements defined by the expert (B) and the network (C) shows that the network can be used for automated segmentation of behavioral videos to identify specific movements. The sample network and data on which this figure is based are available at github.com/hardeepsryait/behaviour_net. Pron., pronation; Sup., supination.
Fig 4.
Extracting knowledge from the network to identify the movement elements most predictive of stroke severity.
(A) Representation of video frames transformed into the internal feature space of the network (see Methods). Each point represents a single video frame. Blue points represent video frames from a single rat during trials obtained on the day before stroke. Red points represent video frames from trials obtained after stroke for the same rat. Blue and red ellipses outline distributions of points before and after the stroke, respectively. Note the disparity between distributions. For example, eating with both hands (Aa) was only observed before the stroke, whereas reaching for the food pellet with the mouth (Ab) was only observed after the stroke. Panels Ac and Ad illustrate the parts of frames in Aa and Ab that the network evaluated as being most important for its scoring decision. (B) Ellipses outline the distribution of points before the stroke (blue) and on the day after the stroke (red) for each rat. Close overlap of the red ellipses indicates that features predictive of stroke found by the network were consistent across rats. The sample network and data on which this figure is based are available at github.com/hardeepsryait/behaviour_net. PC, principal component.
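A projection of per-frame features onto the first two principal components, as plotted in panel A, can be sketched as follows. This is a minimal illustration assuming plain PCA via singular value decomposition; the synthetic 32-dimensional features and the shift applied to the "post" frames are invented for the example, and the actual feature extraction is described in Methods.

```python
import numpy as np

def pca_project(features, n_components=2):
    """Project rows of `features` onto their first principal components."""
    X = np.asarray(features, dtype=float)
    X = X - X.mean(axis=0)                       # center each feature
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T               # coordinates along the PCs

rng = np.random.default_rng(0)
pre = rng.normal(0.0, 1.0, size=(200, 32))       # synthetic "before stroke" frames
post = rng.normal(0.0, 1.0, size=(200, 32))      # synthetic "after stroke" frames
post[:, 0] += 4.0                                # invented shift along one feature

projected = pca_project(np.vstack([pre, post]), n_components=2)
pre_xy, post_xy = projected[:200], projected[200:]
```

Plotting `pre_xy` and `post_xy` in different colors gives a scatter of the kind shown in panel A, with the two groups separated mainly along the first PC in this synthetic case.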
Fig 5.
The clustering of the network feature space revealed movement elements specific only to the stroke or the control condition.
(Top insert) Blue and red ellipses outline the distribution of points in the feature space of the network before and after the stroke, respectively (the same as in Fig 4A). Black ellipses outline subclusters corresponding to individual movement subcomponents. For visual clarity, only 10 subclusters out of 40 are shown. Dashed ellipses indicate the clusters most selective for the stroke and the control categories, and arrows point to sample frames from those clusters. Note that clustering was done using the first seven PCs of the network features; thus, subclusters appear to overlap in this 2D projection. (Main panel) Each point represents cluster selectivity, expressed as the fraction of frames from stroke versus control rats in each subcluster (see Results). Labels below denote the movement category assigned to subclusters, and images above show representative frames from corresponding subclusters. Points in black denote the “not clear” cluster category. The bottom insert shows the average cluster selectivity index (“avr clust select. index”) for each movement category. Error bars denote standard deviation. The sample network and data on which this figure is based are available at github.com/hardeepsryait/behaviour_net. adv, advance; PC, principal component; pron, pronation; sup, supination.
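One way to compute a per-cluster selectivity value from cluster assignments is sketched below. The exact definition used for the figure is given in Results; the formula here, (stroke frames minus control frames) divided by all frames in the cluster, is an assumption chosen for illustration, as are the toy labels.

```python
import numpy as np

def cluster_selectivity(cluster_ids, is_stroke, n_clusters):
    """Return one index per cluster: +1 if only stroke frames, -1 if only control."""
    cluster_ids = np.asarray(cluster_ids)
    is_stroke = np.asarray(is_stroke, dtype=bool)
    index = np.full(n_clusters, np.nan)          # NaN marks empty clusters
    for k in range(n_clusters):
        in_cluster = cluster_ids == k
        total = in_cluster.sum()
        if total == 0:
            continue
        n_stroke = (in_cluster & is_stroke).sum()
        n_control = total - n_stroke
        index[k] = (n_stroke - n_control) / total
    return index

# Toy example: cluster label and group membership for nine frames.
cluster_ids = [0, 0, 0, 1, 1, 2, 2, 2, 2]
is_stroke = [1, 1, 1, 0, 0, 1, 0, 1, 0]
selectivity = cluster_selectivity(cluster_ids, is_stroke, n_clusters=3)
# Cluster 0 holds only stroke frames, cluster 1 only control frames,
# and cluster 2 an even mix.
```

Under this assumed definition, clusters with values near +1 or -1 correspond to movement elements seen almost exclusively in one condition.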
Fig 6.
Quantifying changes in individual movement components during stroke recovery using the internal network representation.
(A) Representation of video frames in the internal feature space of the network, as in Fig 4A, but with added points from day 15 after stroke (light blue). Note that points on day 15 shift toward prestroke (dark blue) values, indicating movement recovery. (B) Ellipses outlining the distribution of points before stroke and for all filming days after stroke. Note the gradual shift of the poststroke distributions toward prestroke space. Dashed ellipses illustrate sample subclusters representing single movement components. (Ba and Bb) Probability of points falling within a given subcluster across days. For example, the high red bar in Ba shows that this movement component was mostly present on day 1 poststroke. The sample network and data on which this figure is based are available at github.com/hardeepsryait/behaviour_net. Movement comp. prob., movement component probability; PC, principal component.
Fig 7.
Movement trajectories encoded by the internal network representation are more variable after stroke.
(A) Movement trajectories for the three most similar trials before stroke (blue shades) and the three most similar trials after stroke (red shades) for the same rat. Coordinates correspond to the first three PCs of the internal network representation. (B) Distribution of correlation coefficients (“corr coef”) between pairs of trajectories for the day before stroke (blue) and the day after stroke (red). The sample network and data on which this figure is based are available at github.com/hardeepsryait/behaviour_net. PC, principal component.
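The pairwise comparison in panel B can be sketched as follows. This is an illustration under assumptions: each trajectory is taken to be an array of shape (frames, 3) holding the first three PCs, trajectories are flattened before computing a Pearson correlation, and the synthetic trajectories and noise levels are invented. The caption does not state how the correlation between two trajectories was computed.

```python
import numpy as np
from itertools import combinations

def pairwise_trajectory_correlations(trajectories):
    """Pearson correlation for every pair of equally shaped trajectories."""
    flat = [np.asarray(t, dtype=float).ravel() for t in trajectories]
    return np.array([np.corrcoef(a, b)[0, 1] for a, b in combinations(flat, 2)])

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 125)
base = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), t], axis=1)

# Synthetic trials: the same underlying path with small or large added noise.
consistent_trials = [base + rng.normal(0, 0.05, base.shape) for _ in range(5)]
variable_trials = [base + rng.normal(0, 1.0, base.shape) for _ in range(5)]

corr_consistent = pairwise_trajectory_correlations(consistent_trials)
corr_variable = pairwise_trajectory_correlations(variable_trials)
```

Histograms of `corr_consistent` and `corr_variable` would then correspond to the two distributions compared in panel B, with more variable trajectories producing lower pairwise correlations.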
Fig 8.
A network trained only to classify videos as stroke versus control derived a continuous expert-like score.
(A) Neural network scores versus the group category used for training. Each circle denotes the average score for a single rat (stroke [“Str”] = red, control [“Contr”] = black). (B) Relation between the network scores and the expert scores. The regression line is shown in yellow. (C) Network scores were also predictive of stroke volume, even though this information and human-based scores were not made available to the network. Stroke volume and movement scores were normalized between 0 and 1 in order to directly compare both scores. The sample network and data on which this figure is based are available at github.com/hardeepsryait/behaviour_net.
Fig 9.
A network trained to predict stroke size discovered the same most informative movement features as the network trained to predict expert scores.
(A) Network predictions of stroke lesion volume (normalized [“Norm.”] between 0 and 1). The line shows linear regression. (B) Importance of movement features as determined by the network trained on stroke size (y-axis) and the network trained on expert scores (x-axis). Each point represents one of 2,048 features from the output of the ConvNet (Fig 1). (C) Representation of video frames in the internal feature space of the network trained to predict stroke volume (see Fig 4A for description). Green and black points correspond to frames identified in previous analyses (see Fig 5) as belonging to reaching with the mouth and eating with both hands (outlined with dashed ellipses). The similar location of those clusters to the corresponding ones in Fig 5 illustrates that both networks discovered similar feature importance. The sample network and data on which this figure is based are available at github.com/hardeepsryait/behaviour_net. ConvNet, convolutional network; PC, principal component.
Fig 10.
Network validation on different behavioral tasks.
(A) Sample frame showing a rat on the parallel-beam-walking task. Note that the mirror below the rat shows an additional view of paw placement. (B) Relationship between expert and network scoring. Each dot represents the average score for a single rat. (C) Representation of video frames transformed into the internal feature space of the network. Each point represents a single video frame. Blue points represent video frames from control rats and red points from stroke rats. For clarity, only 20% of randomly selected points are shown. Long arrows point to sample video frames from control and stroke groups. Short arrows point to “attention” maps superimposed on frames: the parts of frames most informative for the network decision (marked in lighter colors). These maps indicate that, similar to experts, the network uses foot slips to score stroke deficits (micro symptoms), but it also discovered that body posture and/or speed (macro symptoms) improves scoring. The sample network on which this figure is based is available at github.com/hardeepsryait/behaviour_net. a.u., arbitrary units; PC, principal component.