Fig 1.
Automated cell and cluster detection in MADM brains.
(A) Confocal micrograph of a sagittal section from a month-old Nestin-cre, MADM-11 (MADM) mouse forebrain (left) and the corresponding annotated map from the Allen Brain Atlas (right; Image credit: Allen Institute). Scale bar, 1500 μm. GFP, green fluorescent protein; RFP, red fluorescent protein; DAPI, 4’,6-diamidino-2-phenylindole. (B) Three isolated neurons captured in the MADM brain, where green (enhanced GFP), red (tdTomato) and yellow (both reporters expressed) cells are derived from distinct clones of progenitors earlier during development [8,10,12,13]. Scale bars, 40 μm. (C) A representative coronal MADM section with only a single clone of cells labeled, imaged by a slide scanner (left; scale bar, 1000 μm). Sections were obtained from MADM brains in which red, green, and yellow clones were labeled in the late-stage embryo at very low densities using a Nestin-creER transgene [11]. The boxed area demarcates the zoomed image on the right (scale bar, 50 μm). (D) A representative confocal image of another MADM brain (left; scale bar, 1250 μm) and the corresponding zoomed image (right; scale bar, 50 μm). Two main types of cells can be seen in C and D: neurons (arrowheads) and glia (arrows). The white dashed frame in D indicates a glia cluster. (E) The cell detection workflow. An object detection network (RetinaNet) was used to localize and classify each cell. To handle dense and saturated glia clusters, two RetinaNet models were trained: one to detect individual cells of different colors, and the other to detect only glia clusters. In the inference stage, predictions of individual cells and glia clusters were merged to obtain the final output.
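The inference-stage merge described in (E) can be sketched as follows. The caption does not specify the merge rule, so this minimal sketch assumes the simplest one: pool the outputs of both models and keep detections above a confidence threshold. The `Detection` type, the function name `merge_predictions`, and the default threshold of 0.5 are hypothetical choices made for illustration, not the authors' implementation.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    box: tuple    # (x_min, y_min, x_max, y_max) in pixels
    label: str    # e.g. "red neuron", "green glia", "glia cluster"
    score: float  # model confidence in [0, 1]


def merge_predictions(cell_dets, cluster_dets, min_score=0.5):
    """Pool detections from both models and drop low-confidence ones.

    Assumption: merging is a plain union of the two prediction sets.
    Any further rule (e.g. suppressing cells inside a cluster box)
    is not stated in the caption and is not modeled here.
    """
    pooled = list(cell_dets) + list(cluster_dets)
    kept = [d for d in pooled if d.score >= min_score]
    # Highest-confidence detections first, for easier inspection.
    return sorted(kept, key=lambda d: d.score, reverse=True)
```

For example, passing one confident neuron, one low-confidence glia cell, and one confident glia cluster returns only the neuron and the cluster.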
Fig 2.
Training configurations that influence the performance of a single RetinaNet model in individual cell detection.
(A, B) Representative images from the slide scanner and the confocal fluorescence microscope (CFM), respectively, with the corresponding ground truth (GT) annotations. The GT annotations are shown on grayscale images for clearer display of the bounding boxes. Scale bars, 50 μm. (C) Average precision (mean ± SD, n = 3; *, p < 0.05; ***, p < 0.005; unpaired t-test) of models trained with different configurations. Configurations include training (i) with and without the DAPI channel; (ii) with and without pure-background (BKG) image patches (i.e., patches containing no target cells); and (iii) on six classes (red/green/yellow neuron and red/green/yellow glia) versus two color-independent classes (neuron, glia). Training with the DAPI channel or with BKG patches failed to improve performance and instead degraded average precision. Two-class detection, which omits color classification, performed better on neurons. When a significance line ends in an inverted T, it indicates significance between the averages of the two classes.
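The unpaired t-test used in (C) compares the average precision of two training configurations across n = 3 runs each. The sketch below computes the test statistic under the assumption of equal variances (Student's form); the caption does not say whether Student's or Welch's variant was used, and the function name `unpaired_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t(a, b):
    """Return (t statistic, degrees of freedom) for two independent samples.

    Assumption: equal population variances (Student's t-test).
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled sample variance, weighted by each group's degrees of freedom.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df
```

The p-value then follows from the t distribution with `df` degrees of freedom; `scipy.stats.ttest_ind` performs the whole test in a single call.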
Fig 3.
Data augmentation by color swap and saturation improves individual cell detection in a single RetinaNet model.
(A, B) Slide scanner and confocal images, respectively, with data augmentation including color swap and/or saturation. Scale bars, 50 μm. (C) Average precision results (mean ± SD, n = 3; ***, p < 0.005; unpaired t-test) for the different augmentation conditions. Each type of data augmentation on its own provided similar results. Combining all three augmentations led to the best performance in both neuron and glia detection. (D) Average precision results with respect to the proportion of training images used. Performance reached a plateau as the amount of training data increased.
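A minimal sketch of the two augmentations named in this caption, under stated assumptions: images are RGB `uint8` arrays with red in channel 0 and green in channel 1, "color swap" exchanges those two channels together with the matching class labels, and "saturation" is simulated by scaling intensities and clipping at 255. The function names, label strings, and gain value are hypothetical, and the authors' actual augmentation may differ.

```python
import numpy as np

# Red and green classes trade places under a channel swap; yellow is unchanged.
SWAPPED_LABEL = {
    "red neuron": "green neuron", "green neuron": "red neuron",
    "red glia": "green glia", "green glia": "red glia",
}


def color_swap(image, labels):
    """Swap the red and green channels and relabel the boxes to match."""
    swapped = image.copy()
    swapped[..., [0, 1]] = image[..., [1, 0]]
    return swapped, [SWAPPED_LABEL.get(label, label) for label in labels]


def saturate(image, gain=2.0):
    """Brighten the image and clip at 255 to mimic saturated signal."""
    scaled = image.astype(np.float32) * gain
    return np.clip(scaled, 0, 255).astype(np.uint8)
```

Bounding boxes are unaffected by either operation, so only the class labels need updating after a color swap.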
Table 1.
Average precision results (mean ± SD, n = 3) across six classes in individual cell detection using a single RetinaNet model.
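For readers unfamiliar with the metric, average precision for one class can be sketched as the area under the precision-recall curve obtained by sweeping the confidence threshold. Several conventions exist; this sketch assumes all-point interpolation, which may not be the convention used for the table. The function name and argument layout are hypothetical.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the interpolated precision-recall curve for one class.

    scores: confidence of each detection
    is_true_positive: whether each detection matched a ground-truth box
    num_ground_truth: number of ground-truth boxes of this class
    """
    if num_ground_truth == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)
    # Interpolate: precision at each recall is the best precision to its right.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```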
Table 2.
Average precision results (mean ± SD, n = 5) of 5-fold cross-validation across six classes in individual cell detection using a single RetinaNet model.
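The 5-fold procedure can be sketched as below: images are shuffled once, divided into five folds, and each fold serves in turn as the held-out set while the other four are used for training, giving the n = 5 values reported. The function name, the fixed seed, and splitting at the image level are assumptions made for illustration.

```python
import random


def k_fold_splits(num_images, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(num_images))
    random.Random(seed).shuffle(indices)
    # Deal indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation
```

Each image appears in exactly one validation fold, so every image contributes to the evaluation exactly once.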
Fig 4.
Representative model-detected neurons and glia.
(A) Examples of correct predictions on images from both the slide scanner (first and second images) and the confocal microscope (third and fourth images). (B) Examples of incorrect predictions from the trained model. The three main error types are marked in the images: missed detection (arrow), redundant detection (arrowhead) and false detection (asterisk). Glial clusters and their saturation were the main factors that caused false predictions and missed detections. The confidence threshold was 0.5 in all images. Scale bars, 50 μm.
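The three error types in (B) can be counted automatically by matching predictions to ground-truth boxes. The sketch below is one possible way to do this, not the authors' procedure: it assumes a greedy match in order of confidence, an intersection-over-union (IoU) threshold of 0.5, and class-agnostic matching. The function names and the IoU threshold are hypothetical; only the 0.5 confidence threshold comes from the caption.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def categorize(predictions, gt_boxes, min_score=0.5, min_iou=0.5):
    """Count correct, redundant, false and missed detections.

    predictions: list of (box, score) pairs; class labels are ignored here.
    """
    kept = sorted((p for p in predictions if p[1] >= min_score),
                  key=lambda p: p[1], reverse=True)
    matched = set()
    counts = {"correct": 0, "redundant": 0, "false": 0, "missed": 0}
    for box, _ in kept:
        overlaps = [iou(box, g) for g in gt_boxes]
        best = max(range(len(gt_boxes)), key=overlaps.__getitem__, default=None)
        if best is None or overlaps[best] < min_iou:
            counts["false"] += 1       # no ground-truth box underneath
        elif best in matched:
            counts["redundant"] += 1   # same cell detected twice
        else:
            matched.add(best)
            counts["correct"] += 1
    counts["missed"] = len(gt_boxes) - len(matched)
    return counts
```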
Fig 5.
Combining predictions from two RetinaNet models enhances performance.
(A) An example of merged results compared with the ground truth (GT) annotations. As shown in Fig 1E, two RetinaNet models were trained separately: one for individual cell detection and one for glia cluster detection. Predictions from the trained individual cell-detection model and the trained glia cluster-detection model were then merged to assess performance. For comparison, an additional RetinaNet model was trained to detect seven classes simultaneously (glia cluster, red/green/yellow glia, and red/green/yellow neuron). Predictions on the same image patch with confidence scores above 0.5 are shown. Scale bars, 100 μm. (B) F-score distributions for merged detection and seven-class detection on the test dataset. The F-score is a single measure of accuracy that combines precision and recall. The F-score for the glia cluster class improved significantly when predictions from the two RetinaNet models were merged.
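As a reminder of how the F-score combines precision and recall, the sketch below computes it from detection counts. It assumes the balanced form (F1, the harmonic mean of precision and recall); the caption does not state which weighting was used, and the function name is hypothetical.

```python
def f_score(true_positives, false_positives, false_negatives):
    """Balanced F-score (F1) from detection counts; 0.0 when undefined."""
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because the harmonic mean is dominated by the smaller of its two inputs, a model that misses most glia clusters scores poorly even if the few clusters it does report are all correct.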