Fig 1.
Schematic representation of selected data from the Allen Institute dataset.
Note that the original dataset contains additional channels and information not depicted here. Panel (A) shows: (i) a field of view (FoV) from a representative RNA FISH well displaying multiple cells with segmentation overlays, (ii) a single cell segmented from the FoV image (highlighted in yellow), and (iii) detected contours of potential z-discs within the selected cell, obtained by applying SarcGraph to the alpha-actinin-2 channel image. (B) Representative cells from each expert-assigned organization score group (scores 1–5). Score 1 represents poorly organized, sparse cells with predominantly z-bodies, while score 5 indicates highly organized cells with z-discs aligned along a single axis. Scale bars in individual cell images represent . (C) Confusion matrix showing the agreement between two expert annotators in scoring cell organization.
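The inter-annotator agreement in panel (C) is an ordinary confusion matrix over the 1–5 score scale. A minimal sketch, with hypothetical scores standing in for the real annotations (the function name and example values are illustrative, not from the paper):

```python
import numpy as np

def annotator_confusion(scores_a, scores_b, labels=(1, 2, 3, 4, 5)):
    """Count how often annotator A's score i co-occurs with annotator B's score j."""
    idx = {s: k for k, s in enumerate(labels)}
    m = np.zeros((len(labels), len(labels)), dtype=int)
    for a, b in zip(scores_a, scores_b):
        m[idx[a], idx[b]] += 1
    return m

# Hypothetical organization scores for six cells from two annotators.
expert1 = [1, 2, 3, 3, 5, 4]
expert2 = [1, 2, 3, 4, 5, 4]
cm = annotator_confusion(expert1, expert2)
# Diagonal entries count agreements; off-diagonal entries count disagreements.
```

Perfect agreement would put every cell on the diagonal; in practice adjacent-score disagreements dominate the off-diagonal mass.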
Fig 2.
Enhancements to SarcGraph’s z-disc segmentation pipeline.
(A) Starting from a raw cell image with all detected contours shown in blue, each contour is processed individually to crop a pixel region centered on it (the contour currently being processed is highlighted in yellow). This region serves as the image representation of that contour and is transformed into two distinct input types: Type 1, which preserves the original pixel intensities, and Type 2, which removes pixel intensity dependence. (B) Depicts our ensemble classification architecture, where the two input types are processed through separate classifiers: a SimCLR-based EfficientNetv2 trained on both labeled and unlabeled data (Classifier 1), and a DINO-v2-based feature extractor followed by an MLP trained only on labeled data (Classifier 2). The final z-disc probability is computed as the average of both classifiers' predictions. (C) Shows the predicted z-disc probabilities for all detected contours in the sample cell, where each contour is colored according to its predicted probability of being a z-disc (warmer colors indicate higher probability). (D) Illustrates the application of our z-disc location correction method to three contours selected from different samples. The yellow line highlights the contour being processed, while blue lines outline all detected neighboring contours. The blue dot marks the z-disc location identified by the original SarcGraph, and the red dots indicate the corrected positions. Note that this approach can both refine z-disc locations and identify multiple z-discs that were previously merged into a single contour; see Sect E and Fig A8 in S1 Appendix.
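The ensemble step in panel (B) combines the two branches by simple probability averaging. A minimal sketch with made-up per-contour probabilities (the function name is an assumption; only the averaging rule comes from the caption):

```python
import numpy as np

def ensemble_zdisc_probability(p_type1, p_type2):
    """Average per-contour z-disc probabilities from the two classifier branches.

    p_type1: predictions from the intensity-preserving (Type 1) branch.
    p_type2: predictions from the intensity-independent (Type 2) branch.
    """
    p1 = np.asarray(p_type1, dtype=float)
    p2 = np.asarray(p_type2, dtype=float)
    return 0.5 * (p1 + p2)

# Hypothetical probabilities for three contours from each classifier.
probs = ensemble_zdisc_probability([0.9, 0.2, 0.6], [0.7, 0.4, 0.6])
```

Averaging lets a confident prediction from either branch pull an ambiguous one toward the correct side, which is the usual motivation for probabilistic ensembling.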
Fig 3.
The modified SarcGraph pipeline.
(A) Schematically illustrates SarcGraph taking a raw image of a cell as input and, in two steps (z-disc segmentation and sarcomere detection), outputting a list of detected sarcomeres, visualized as red lines with light blue dots indicating z-discs. (B) Visualizes the modified z-disc segmentation step in SarcGraph, where (1) shows the original Otsu-thresholding-based contour detection, (2) demonstrates the integration of our deep-learning-based z-disc classifier (Sect 3.3.1), and (3) depicts the effect of the procedural approach used to correct the location of detected z-discs (Sect 3.3.2). (C) Illustrates the modified sarcomere detection phase in SarcGraph, where (4) schematically shows the construction of a graph by connecting detected z-discs to their N nearest neighbors, followed by ensemble graph scoring with probabilistic ensemble averaging (Sect 3.4.1), (5) presents the results of graph pruning, and (6) highlights the effect of applying post-processing myofibril extension to obtain the final list of detected sarcomeres (Sect 3.4.2); see Sect F and Fig A9 in S1 Appendix.
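The graph-construction step (4) connects each detected z-disc to its N nearest neighbors. A minimal sketch using a k-d tree, with hypothetical centroids and N = 2; this is an illustration of the idea, not SarcGraph's actual implementation or parameter choice:

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_edges(zdisc_xy, n_neighbors=2):
    """Build undirected edges linking each z-disc centroid to its n nearest neighbors."""
    pts = np.asarray(zdisc_xy, dtype=float)
    tree = cKDTree(pts)
    # Each query point is its own nearest neighbor, so request n_neighbors + 1.
    _, idx = tree.query(pts, k=n_neighbors + 1)
    edges = set()
    for i, nbrs in enumerate(idx):
        for j in nbrs[1:]:
            edges.add(tuple(sorted((i, int(j)))))  # deduplicate (i, j) vs (j, i)
    return sorted(edges)

# Four hypothetical z-disc centroids along a line; each links to its 2 nearest.
edges = knn_edges([(0, 0), (1, 0), (2, 0), (3, 0)], n_neighbors=2)
```

Edges in this graph are candidate sarcomeres; subsequent scoring and pruning decide which candidates survive.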
Fig 4.
Comparison of the performance of the original and modified SarcGraph software for detecting sarcomeres.
(A) Violin plots comparing sarcomere count and myofibril length (sarcomeres per myofibril) across samples from the Train set, grouped by cell organization score (low: score < 3, medium: score = 3, high: score > 3). With the modified version of SarcGraph, low-score cells have near-zero sarcomere counts. Notably, across all score categories, average myofibril length is higher with the modified SarcGraph. (B) Visual comparison of sarcomere detection in one representative cell from each expert score group (1–5), showing original SarcGraph (top) versus modified SarcGraph (bottom). In low-score cells, the original pipeline yields numerous false-positive sarcomeres, whereas the modified version correctly suppresses most of these spurious detections, producing near-zero counts. In medium- and high-score cells, the modified pipeline reveals longer, more continuous myofibril chains without substantially altering the overall sarcomere count. Note that while the modified SarcGraph pipeline significantly reduces false positives and improves detection, false positives and false negatives can still occur, indicating room for further improvement.
Fig 5.
Limitations of manual scoring.
(A) Cells within each score group (1–5), ranked by sarcomere count normalized by cell area, highlighting variation within score groups and visual similarity between cells from adjacent groups. (B) Confusion matrices comparing Expert 1 (KG) and Expert 2 (MH) scores across the Train, Test FISH, and Test Live datasets, emphasizing scoring differences. This panel closely corresponds to Figure S2 panel J from [32]. (C) Average PCA projections of feature vectors from detected sarcomeres across the Train, Test FISH, and Test Live datasets, grouped by expert-assigned cell organization levels (Low: score < 3, Medium: score = 3, High: score > 3), showing distribution differences across datasets.
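The group-averaged PCA projections in panel (C) can be sketched as follows, with random stand-in feature vectors; in the paper the features come from the detected sarcomeres, and the grouping uses the expert-assigned organization levels:

```python
import numpy as np
from sklearn.decomposition import PCA

def group_mean_pca(features, groups, n_components=2):
    """Project feature vectors to 2D with PCA, then average the projection per group."""
    X = np.asarray(features, dtype=float)
    proj = PCA(n_components=n_components).fit_transform(X)
    labels = np.asarray(groups)
    return {g: proj[labels == g].mean(axis=0) for g in np.unique(labels)}

# Hypothetical 4-D feature vectors for six cells in two organization groups.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
groups = ["Low", "Low", "Low", "High", "High", "High"]
means = group_mean_pca(X, groups)  # one 2-D mean point per group
```

Plotting the per-group mean points for each dataset makes distribution shifts between Train, Test FISH, and Test Live visible at a glance.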
Fig 6.
Performance of a Support Vector Regression (SVR) model in predicting cellular structural organization scores.
The model was trained on the Train dataset and evaluated on two test datasets: Test FISH and Test Live. Strip plots and box plots compare expert scores with model predictions across the Train, Test FISH, and Test Live datasets. The figure also highlights visual examples of cells with the highest predicted scores and of cases with significant discrepancy between expert and model scores.
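A minimal sketch of the SVR setup with scikit-learn; the per-cell features, their values, and the hyperparameters below are assumptions for illustration, not the paper's configuration:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical per-cell features (e.g. sarcomere count, mean myofibril length)
# paired with continuous expert organization scores for six training cells.
X_train = np.array([[5, 1.2], [40, 2.5], [120, 4.0], [200, 5.5], [15, 1.8], [90, 3.6]])
y_train = np.array([1.0, 2.5, 4.0, 5.0, 2.0, 3.5])

# Scaling matters for RBF kernels, so the scaler is part of the pipeline.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
model.fit(X_train, y_train)

# Predict the organization score of an unseen cell.
pred = model.predict([[150, 4.5]])[0]
```

Because SVR outputs a continuous value, its predictions can fall between the integer expert scores, which is what the strip plots in this figure visualize.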
Fig 7.
Application of explainable clustering on cell features for unsupervised organization scoring.
(A-i) Decision tree derived from the Train dataset classifying cells into Low, Medium, and High organization clusters. (A-ii) Distribution of cells in 2D PCA-reduced feature space. (B) Histograms showing the distribution of average expert scores ((Expert 1 + Expert 2)/2) within each cluster for Train, Test FISH, and Test Live datasets. (C) Cell count distribution across organization levels (Low, Medium, High) for Train, Test FISH, and Test Live datasets. To compare with clustering results, expert scores are transformed into three categories (scores 1,2: Low, score 3: Medium, scores 4,5: High), and SVR predicted scores are similarly categorized (score < 2.33: Low, 2.33 ≤ score < 3.67: Medium, score ≥ 3.67: High). (D) Representative cell images from each cluster selected from the Train dataset, shown with their corresponding expert and SVR-predicted scores. Scale bars: 20 μm.
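The score-to-category mappings in panel (C) are stated explicitly in the caption and can be written directly as:

```python
def categorize_expert(score):
    """Map a 1-5 expert score to the three-level scheme used in panel (C)."""
    if score <= 2:
        return "Low"
    if score == 3:
        return "Medium"
    return "High"  # scores 4 and 5

def categorize_svr(score):
    """Map a continuous SVR prediction using the thresholds from panel (C)."""
    if score < 2.33:
        return "Low"
    if score < 3.67:
        return "Medium"
    return "High"

# Example: an SVR prediction of 3.7 falls in the High band.
```

The 2.33 and 3.67 cut points split the continuous 1–5 range into three equal-width bands, mirroring the Low/Medium/High grouping of the discrete expert scores.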
Table 1.
Pearson correlation coefficients between different scoring methods: manual expert scoring (MH, KG), supervised learning predictions (SVR), and decision tree clustering (DT) across datasets.
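The table's entries are ordinary Pearson correlation coefficients between paired score vectors. A sketch with hypothetical scores for the same set of cells from two scoring methods:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two per-cell score vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical expert scores vs. model predictions for five cells.
r = pearson_r([1, 2, 3, 4, 5], [1.2, 1.9, 3.4, 3.8, 5.1])
```

Values near 1 indicate that two methods rank and scale cells similarly; comparing the method-vs-method entries against the expert-vs-expert entry shows whether automated scoring reaches inter-annotator agreement.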