Sarc-Graph: Automated segmentation, tracking, and analysis of sarcomeres in hiPSC-derived cardiomyocytes

A better fundamental understanding of human induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs) has the potential to advance applications ranging from drug discovery to cardiac repair. Automated quantitative analysis of beating hiPSC-CMs is an important and fast developing component of the hiPSC-CM research pipeline. Here we introduce “Sarc-Graph,” a computational framework to segment, track, and analyze sarcomeres in fluorescently tagged hiPSC-CMs. Our framework includes functions to segment z-discs and sarcomeres, track z-discs and sarcomeres in beating cells, and perform automated spatiotemporal analysis and data visualization. In addition to reporting good performance for sarcomere segmentation and tracking with little to no parameter tuning and a short runtime, we introduce two novel analysis approaches. First, we construct spatial graphs where z-discs correspond to nodes and sarcomeres correspond to edges. This makes measuring the network distance between each sarcomere (i.e., the number of connecting sarcomeres separating each sarcomere pair) straightforward. Second, we treat tracked and segmented components as fiducial markers and use them to compute the approximate deformation gradient of the entire tracked population. This represents a new quantitative descriptor of hiPSC-CM function. We showcase and validate our approach with both synthetic and experimental movies of beating hiPSC-CMs. By publishing Sarc-Graph, we aim to make automated quantitative analysis of hiPSC-CM behavior more accessible to the broader research community.


Introduction
Quantitative analysis of movies of beating cardiomyocytes is a compelling approach for connecting cell morphology to dynamic cell function [1,2]. In particular, connecting structure and function is a crucial step towards a better fundamental understanding of human induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs) [3,4]. In stark contrast to sarcomeres in mature cardiomyocytes which have a highly ordered regular structure [5], sarcomeres in hiPSC-CMs are typically immature and disordered [6]. This irregular structure, combined with large variation between cells, makes quantitative analysis both more difficult and more pressing [7]. Robust quantitative analysis frameworks for analyzing beating hiPSC-CMs will help enable technological advances in drug discovery, genetic cardiac disease, and cardiac repair [8,9,10]. In this work, we aim to make automated quantitative analysis of hiPSC-CM contractile behavior more accessible to the broader research community. We note that the definitions of F avg , λ 1 , and λ 2 are introduced in this text. Fig. 1a shows a hiPSC-CM with flourescently labeled z-discs. Movies of these flourescently labeled hiPSC-CMs contracting are incredibly information rich [10]. Critically, with dynamic data it is possible to not only measure structure but also measure aspects of how the structural components functionally interact [11,12,13,14,4]. The focus of this work is on extracting structural and functional information from movies of beating hiPSC-CMs with flourescently labeled z-discs. To extract biologically relevant quantitative information from these movies, we focus primarily on segmenting, tracking, and analyzing individual sarcomeres. For context, each movie will typically have over 100 frames, and each frame will contain 100s of sarcomeres. Therefore, it is infeasible to perform segmentation and tracking by hand at scale. In the first component of our computational framework, we introduce a procedure for automated sarcomere segmentation and tracking. Example results from segmentation and tracking are show in Fig.  1b.
Building on effective segmentation and tracking, the second contribution of our computational framework is tools for automated analysis of hiPSC-CM contractile behavior. To accomplish this, we first draw inspiration from recent literature on defining summary metrics for describing cardiomyocyte structure and cytoskeletal structure [15,16] and mechanical deformation in multiple cell types [17,18]. Specifically, we treat the tracked sarcomeres as fiducial markers and quantify the principal directions and magnitudes of average cell contraction for each movie frame. An example of applying this approach is shown in Fig. 1c, with Fig. 1d for comparison. Then, we move beyond average metrics and extend our framework to readily perform spatiotemporal analysis of sub-cellular sarcomere contraction. Specifically, we treat the structurally disordered and complex hiPSC-CMs as spatial graphs where z-discs are represented as nodes and sarcomeres are represented as edges. With a spatial graph defined, performing spatial statistics based analysis is straightforward. For example, it is possible to examine the correlation between individual sarcomere contraction time series using both euclidean and network distances.
In conjunction with segmentation, tracking, and analysis, our computational framework -Sarc-Graphalso contains multiple visualization tools. The remainder of this paper is focused on describing key components of the framework.
We note that all code is available free and open source on GitHub: https://github.com/elejeune11/Sarc-Graph. The code is designed in a modular way such that making major alterations to one component or performing only a subset of the segmentation, tracking, and analysis steps should be straightforward when appropriate. Looking forward, we anticipate that this work is a starting point for a major effort in quantification and statistical analysis of hiPSC-CM contractile behavior.

Data
In general, it is infeasible to perform segmentation and tracking by hand at scale for movies of hiPSC-CMs with flourescently labeled z-discs. To address the lack of "ground truth" data we invest significant effort in realistic synthetic data generation to validate the segmentation and tracking components of our computational framework. In addition, we show the results of applying our framework to a selection of real experimental movies.

Synthetic data for software validation
In Fig. 2, we show the critical components of the synthetic data generation pipeline. The first step is illustrated in Fig.  2a where we show an example of generating the three-dimensional (3D) skeleton of z-disc and sarcomere geometry for each movie frame. First, a baseline geometry is defined. Then, the baseline geometry is deformed in space such that each sarcomere is associated with a ground truth function for contraction (shown here as fractional change in length) with respect to time. By starting with this 3D skeleton geometry, it is straightforward to include examples with inhomogeneous contraction, out of plane orientation and deformation, and sarcomere chains that appear to intersect when projected into two-dimensional (2D) space.
Given the 3D skeleton geometry in Fig. 2a, we then render each frame as illustrated in Fig. 2b-d. First, each zdisc is treated as an oriented cylinder with an associated height and radius. We note that with this approach it is straightforward to include multiple z-disc sizes in the same image. Then, the domain is converted into a voxel array such that the resolution is representative of the experimental images of interest. The voxel array, a 3D matrix, is then sliced around the simulated focal plane and projected into 2D. Both image noise and blur can be added at the voxel array step, 2D image step, or both. In the representative examples shown here, we add Gaussian blur to the voxel array [19] and Perlin noise to the 2D image [20]. We note that the entire synthetic data generation pipeline is available on GitHub. All code is written in Python with heavy use of the numpy and matplotlib packages [21,22].

Experimental data for software demonstration
The first two examples of experimental data included with this paper (E1 and E2) were previously made available in a publication by Toepfer et al. [10]. The protocol to introduce green fluorescent protein (GFP) onto z-disc protein titin (TTN-GFP) in hiPSC-CMs is reported in [24]. Video imaging was conducted on small clusters of hiPSC-CMs using a 100X objective of a fluorescent microscope with a minimum acquisition rate of 30 frames per second. The three additional examples included with this publication were selected to showcase the general applicability of Sarc-Graph. Cells in E3, E4, and E5 were fluorescently labeled using lentiviral constructs lenti-GFP-actinin-2 or lenti-mAppleactinin-2 as described in [23], and were electrically stimulated at 1Hz using the IonOptix C-Pace EP Culture Pacer (IonOptix) during imaging. Time-lapse videos of the cells contraction were acquired at 30 frames per second using a 40x objective on a Nikon Eclipse Ti (Nikon Instruments, Inc.) with an Evolve EMCCD Camera (Photometrics) or on a Zeiss Axiovert 200M inverted spinning disk microscope with an ORCA-100 Camera (Hamamatsu), equipped with a temperature and CO2 equilibrated environmental chamber. Key details are summarized in Table 1. We note that, though not included as examples with this publication, our computational framework has also been tested on additional Figure 2: Creating synthetic data to test the image analysis code: (a) Define the geometry of z-discs (points) and sarcomeres (line segments) in three-dimensional (3D) space and prescribe the ground truth deformation for each sarcomere; (b) Approximate each z-disc as an oriented cylinder in 3D space; (c) Convert the 3D geometry into a voxel array with higher resolution in the x and y directions than the z direction; (d) Slice and project the voxel array to obtain a two-dimensional (2D) image. Image noise and blur can be added to the voxel array, 2D image, or both. low spatial resolution, small deformation beating hiPSC-CM movies obtained by different researchers with subtly different experimental set ups. We also note that for all examples shown here, both experimental and synthetic, our framework does not require any parameter tuning. All input parameters are identical across all cases shown in this paper.

Code
Sarc-Graph is divided into seven modular components: file_pre_processing, segmentation, tracking, time_series, spatial_graph, analysis_tools, and functional_metrics. The first, file_pre_processing, is simply used to convert each movie into a series of 2D numpy arrays [21], one for each frame of the movie converted to grayscale. The code is designed such that the only new step required to support a previously unsupported movie file format is a function to convert each frame into a 2D numpy array. The remainder of the code can loosely be grouped as segmentation which is necessary for extracting spatial data, tracking and time_series which are necessary for extracting temporal data, and spatial_graph, analysis_tools, and functional_metrics which are necessary for synthesizing and interpreting the spatial and temporal data.  The Laplacian of the input image is used to detect the high gradients present at the edge of every z-disc [25]; (c) z-discs are identified as closed contours where the value of the Laplacian exceeds the threshold computed with Otsu's method [26]; (d) Sarcomeres are procedurally identified from the segmented z-discs; (e) Sarcomeres and z-discs are tracked independently between frames with the Python trackpy package [27]; (f) The algorithm to segment sarcomeres is based on linking the approximately parallel z-discs to their closest neighbors in the direction perpendicular to the z-disc; (g) Sarcomere properties are computed from the pair of associated z-discs; (h) Tracking each sarcomere leads to multiple spatially resolved time series.

Segmentation
Within the segmentation script, the function segmentation_all(folder_name, gaussian_filter_size) will segment all z-discs and all sarcomeres from each frame. The first parameter, "folder_name" specifies the folder that the data is in. The second parameter, "gaussian_filter_size" specifies the standard deviation for the Gaussian kernel applied to the image during z-disc segmentation [28]. The default value for gaussian_filter_size = 1 and that is the value for all data shown in this paper. In our experience, a value of 2 may be more appropriate for movies of hiPSC-CMs that have high levels of irregular fluctuatig background signal, and thus it is left as a potential tunable parameter with a default value of 1. The key steps of segmentation are illustrated in Fig. 3. z-disc segmentation is conducted by computing the Laplacian of the image [25], applying a Gaussian filter to the Laplacian [28], and then using the scikit-image "measure.find_contours()" function [29] to identify contours with a level matching a scalar threshold computed via Otsu's method [26]. We note briefly that a global threshold determined with Otsu's method is sufficient (i.e., an adaptive threshold is not required) because thresholding is performed on the Laplacian rather than the original image. As illustrated in Fig. 3c, each contour represents a segmented z-disc and the properties of each z-disc are computed algorithmically from the contours. Of note, contour position is computed as the mean position of each pixel in the contour, contour length is computed as the maximum possible distance between two pixels in the contour, and contour endpoints are the locations of the pixels that correspond to the maximum distance.
Once z-discs are segmented, sarcomeres are procedurally identified from the segmented z-discs with a custom algorithm schematically illustrated in Fig. 3f. In brief, the steps of the algorithm are as follows: 1. Compute the distance between the center of each z-discs and the center of it's nearest neighbor. 2. Define median_neigh as the median distance between a z-disc and its nearest neighbor. 3. For each z-disc, define a line that goes through the center of the z-disc and is perpendicular to the axis defined by the contour endpoints. Then, define the two points on the line that are median_neigh/2 distance from the z-disc center. These points are referred to as "ghost_points." This is illustrated in Fig. 3f. 4. Find the nearest neighbor for each point in ghost_points. 5. If there is a two-way nearest neighbor match between a pair in ghost_points, the pair of z-discs will correspond to a segmented sarcomere.
As illustrated in Fig. 3f, this algorithm links approximately parallel z-discs. In order to flexibly accommodate potential image artifacts, the criteria for "approximately parallel" is not strictly defined. However, in most cases, there will not be a persistent two way nearest neighbor match unless z-discs bound a convex quadrilateral, ideally a trapezoid. If necessary, additional match rejection criteria could be integrated into the algorithm. Once a sarcomere is identified, its properties are computed from the properties of the corresponding paired z-discs. This is illustrated in Fig. 3g. Of note, sarcomere width is computed as the mean length of the two z-discs, sarcomere length is computed as the distance between the two z-disc centers, and sarcomere angle is computed as the angle of the vector connecting the two z-disc centers.
For the examples shown in this paper, the segmentation step takes order of 10s of seconds to a few minutes to run on a single laptop. We also note that users working with still images rather than movies can still segment z-discs and sarcomeres from a single frame, and can still create a spatial graph with their data and perform several aspects of the spatial analysis described in this paper. For generality, all length data is reported in units of pixels.

Tracking and processing time series data
The tracking component of Sarc-Graph is for independently tracking all z-discs and all sarcomeres. Tracking the sarcomeres is necessary for reporting the relative length change from frame to frame. Without tracking, it is only possible to report population average relative length change which is substantially less information rich because the sarcomeres do not necessarily contract in perfect synchrony, and are not necessarily of identical lengths. We track the z-discs and sarcomeres independently because we use this information to construct a spatial graph, described later in the text. Tracking is performed by calling the "run_all_tracking(folder_name,tp_depth)" function. As stated previously, "folder_name" specifies the folder that the data is in. The second parameter, "tp_depth" specifies the farthest distance in pixels that a particle can travel between frames [27]. The default value for tp_depth = 4 and that is the value for all data shown in this paper. In some cases, particularly for movies with lower spatial resolution, it is necessary to change to tp_depth = 3. Particle tracking is performed with the Python package trackpy which is based on the Crocker-Grier algorithm [27,30]. We note that we do not use trackpy for feature detection, instead we convert each segmented z-disc and sarcomere to a pandas DataFrame for compatibility with the trackpy framework [31]. The outcome of running run_all_tracking() is a unique ID for each particle that is tracked for over 10% of the movie. With this information, it is possible to locate each tracked particle in space across multiple frames, as illustrated in Fig.  3e where the position of each sarcomere across all frames is plotted.
The time_series component of our computational framework processes the results from tracking so that there is time series data describing the length, normalized length (L −L)/L, width, angle, and position for each tracked sarcomere. This is accomplished by running the function "timeseries_all(folder_name, keep_thresh)" where keep_thresh represents the fraction of movie frames that a tracked sarcomere must be present in to get processed. We recommend keep_thresh= 0.75 for quantitative analysis of time series data and keep_thresh> 1/k for visualization, where k is the approximate number of times the cell contracts during one movie. To process the results from tracking, Gaussian process regression is used to interpolate data across frames which temporarily lose signal [32]. Gaussian process regression is implemented through python scikit-learn with a radial basis function (RBF) and white noise kernel [33]. In addition to interpolating between lost frames, Gaussian process regression has the added benefit of adaptively smoothing the data without a need for additional dataset-specific input parameters. The main result of running the timeseries_all function is illustrated in Fig. 3h where normalized sarcomere length is plotted with respect to frame number. Even with this synthetic data example, the individual recovered time series deviate from the ground truth. This is for two main reasons. First, for the ground truth geometry that we specified, some sarcomeres are angled out of the image z-plane and are thus not perfectly captured by a 2D image. Second, the limited resolution of the images introduces artifacts where lengths less than 1 pixel cannot be perfectly captured. However, the mean of all time series curves illustrated in Fig. 3h is a near perfect fit to the ground truth.
For the examples shown in this paper, the tracking step takes order of 10s of seconds to a minute to run on a single laptop. The time_series step takes on the order of 10s of seconds to a few minutes to run. The time_series step will output several user friendly text files describing sarcomere properties with respect to time where each row corresponds to a tracked sarcomere and each column corresponds to a movie frame. The normalized length data can be automatically plotted with the function plot_normalized_tracked_timeseries() from the analysis_tools file. For generality, all time data is reported in units of frames.
b) construct a spatial graph Figure 4: Data analysis from segmentation and tracking: (a) Tracked elements are treated as fiducial markers and used to construct an average deformation gradient F (see eqns. 1-4); (b) z-discs and sarcomeres are tracked independently and used to construct a spatial graph (z-discs are "nodes" and sarcomeres are "edges", edge color corresponds to sarcomere orientation, node color corresponds to correlation in sarcomere orientation) to measure correlations based on network distance; (c) Automated analysis of time series data is used to extract constants such as mean contraction time and number of peaks.

Computing average deformation (F avg )
Developing methods to describe data rich images and movies of cells with a tractable set of output parameters is an active area of research [34,16]. In this paper specifically, we draw inspiration from recent work on summarizing data from traction force microscopy experiments [18] and agent based model simulations [35,36] and propose a method to compute the mean deformation gradient of each domain with respect to time. First, we define the standard continuum mechanics deformation gradient F as follows: where dX is a vector in the initial configuration, dx is the vector in the deformed configuration, and F is the mapping between them [37]. In the context of analyzing movies of contracting hiPSC-CMs, we first define a set of n vectors v that connect each potential pair of tracked fiducial markers, as illustrated in Fig. 4a. The direction and magnitude of each vector v will change as the fiducial markers move between movie frames. With this definition, we can set up the over-determined system of equations: where F avg is a 2 × 2 matrix, Λ 0 is a 2 × n matrix of vectors in the initial (reference) configuration movie frame, and Λ is a 2 × n matrix of vectors in the current (deformed) configuration movie frame. Note that when the initial frame and the current frame are identical, Λ 0 = Λ and F avg = I. The initial configuration can be defined as either the first frame of the movie, or a selected movie frame where the cell is known to be in a relaxed state. Then, we can use the normal equation to solve for the best fit average deformation gradient as: Once computed, the components of F avg can be plotted directly, or F avg can be manipulated to provide information that is more convenient to compare and interpret. For example, we can write where R avg is a rotation tensor and U avg is a symmetric positive definite tensor that represents stretch. In the numerical setting, R avg and U avg are computed with the scipy polar decomposition function [28]. We can also compute the eigenvalues and eigenvectors of U avg , λ avg 1 , λ avg 2 , u avg 1 , and u avg 2 respectively. In the numerical setting, this is done with the numpy linalg package, and we always define λ avg 1 < λ avg 2 [21]. From a mechanical data interpretation perspective, the average stretch values λ avg 1 and λ avg 2 and the associated eigevectors are particularly convenient because they contain information about both the magnitude and direction of contraction. This is illustrated in Fig. 1b and 1c.
Because sarcomeres data is already processed through the time_series component of our framework, we use the sarcomeres as our fiducial markers to define Λ and Λ 0 . We note that this approach would be just as effective with the zdiscs chosen as fiducial markers. In addition, if 3D data was available, this approach would still work with F avg as a 3× 3 matrix and Λ and Λ 0 as 3×n matrices. Within the analysis_tools file, the function compute_F_whole_movie() will compute F avg for each movie frame. The function visualize_F_full_movie() will create a visualization of λ avg 1 and λ avg 2 as a function of the movie frame and schematically illustrate F avg overlaid on the original image data, as shown in Fig. 1. The computational cost of computing and processing F avg for all frames is on the order of seconds to 10s of seconds on a single laptop. The computational cost of data visualization is on the order of 10s of seconds to minutes.

2.2.4
Comparing average deformation (F avg ) to average sarcomere shortening (s, s avg ) and Orientational Order Parameter (OOP) In addition to defining tensor F avg as a dynamic metric that evolves over the course of the entire beating hiPSC-CM movie, it is possible to formulate scalar metrics from F avg that are more straightforward to directly compare to well established metrics for describing hiPSC-CMs [7]. Here we focus on defining scalar metrics from F avg that are directly comparable to average sarcomere shortening (s, s avg ) [10], and the Orientational Order Parameter (OOP) [11,16,38].
We compute average sarcomere shortening in two different ways. In both cases, we work with the normalized sarcomere length obtained from time_series y = (L −L)/L where L is sarcomere length in pixels. Given normalized sarcomere length y for each movie frame, we define shortening as: where y max is the maximum normalized length and y min is the minimum normalized length. Then, we can defines as the median value of s for all sarcomeres tracked in the movie. We choose to use median rather than mean in this case because the mean is more susceptible to outliers and thuss is a better reflection of the ground truth sarcomere deformation. In addition, we can compute the mean normalized length time series y avg and then compute s avg based on the single mean time series. If all sarcomeres are contracting identically,s and s avg will be identical. However, if this is not the case, the two values will likely differ ands will reflect average individual sarcomere behavior while s avg will reflect both individual sarcomeres and the (lack of) synchrony between them. Functions to computes and s avg are given in the functional_metrics component of Sarc-Graph.
In addition to computing mean sarcomere shortening, we specify the method for computing OOP from hiPSC-CM movies that we implement in the functional_metrics component of Sarc-Graph. When hiPSC-CMs are fully relaxed, individual sarcomere chains are no longer under systolic tension and can thus become wavy and lose local alignment. Therefore, the first step to computing OOP is to define a frame of the movie where the lowest fraction of sarcomere chains will be in their relaxed and potentially wavy configuration. To select this frame, we compute det(F avg ) for every movie frame with the first frame of the movie used as the reference configuration. Then, we identify the frame where det(F avg ) is the greatest (i.e. the sarcomeres are most relaxed) and select that frame as the reference frame and recompute F avg . With the appropriately selected reference frame, we re-compute det(F avg ) and define the most contracted frame as the frame with the lowest value of det(F avg ).
We compute OOP for the static image corresponding to the most contracted frame of the movie. To do this, we define structural tensor T following the literature: where r i = [r i x , r i y ] is the unit vector describing the orientation of the i th sarcomere [11]. Given T, with eigenvalues u max and u min and corresponding unit eigenvectors v max and v min , OOP= u max . When OOP= 0.0, sarcomeres are oriented randomly. When OOP= 1.0, sarcomeres are perfectly aligned. Corresponding eigenvector v max corresponds to the dominant direction of sarcomere orientation.
In order to best relate F avg to OOP, we define two scalar metrics C iso and C || that describe average radial contraction, and contraction in the direction of dominant sarcomere orientation v max respectively. Unlike individual sarcomere contraction or the average of individual sarcomere contraction, C iso and C || represent deformation throughout the whole domain and not necessarily along the sarcomere axis. Metric C iso is defined as where F avg is computed in the most contracted frame. Practically, C iso is the line shortening during contraction of an equivalent system where contraction is not directionally dependent. When C iso = 0.0, no shortening occurs.
As an example, a value C iso = 0.10 corresponds to an equivalent isotropic line shortening of 10% in all directions.
Essentially, higher C iso indicates higher levels of hiPSC-CM contraction.
To quantify shortening in the dominant direction specified by v max computed from T, we consider the relation F avg v 0 = v max where v 0 is the corresponding vector defined in the reference frame. We can manipulate this equation to compute v 0 = F −1 avg v max and then define C || as With this definition, C || is the fractional shortening of a line in the direction of v max where direction is defined in the contracted configuration. Conveniently, lowers and s avg , C iso , and C || all correspond to lower levels of contraction, and highers and s avg , C iso , and C || all correspond to higher levels of contraction.
Functions to compute and visualizes, s avg , OOP, C iso , and C || are all defined in the functional_metrics script. The function "compute_metrics" will computes, s avg , OOP, C iso , and C || and create a visualization of OOP and F avg . The function "visualize_lambda_as_functional_metric" will create a movie of λ avg 1 and λ avg 2 changing over the course of the movie. Examples of this are provided as Supplementary Information. The computational cost of computing F avg (including computing C iso , and C || ) ands, s avg , and OOP for all frames is on the order of seconds to 10s of seconds on a single laptop.

Preparing spatial graphs
In addition to analyzing average cell and sarcomere behavior (ex: average z-disc and sarcomere morphological properties, F avg ), we are interested in analyzing the spatiotemporal patterns that arise on the sub-cellular level. We implemented the spatial_graph component of Sarc-Graph to conveniently synthesize the outputs of the segmentation, tracking, and time_series steps. Specifically, running the function create_spatial_graph() will create a spatial graph where z-discs are represented as nodes and sarcomeres are represented as edges. With a spatial graph structure, it is then possible to better quantify spatiotemporal patterns in hiPSC-CM behavior. For example, access to a spatial graph can help delineate sarcomere synchrony that depends on Euclidean distance, and sarcomere synchrony that depends on network distance. In addition, access to a spatial graph can help us better quantify both global and local sarcomere alignment. This functionality is illustrated in Fig. 4b.
In the run_all_tracking() step, z-discs and sarcomeres are tracked independently. Every z-disc that is at least partially tracked (defined as present in over 10% of the movie frames) will have a unique local (frame-specific) ID and a unique global ID. Each node of the spatial graph corresponds to a unique global ID, and the local IDs specific to each frame are linked to the corresponding global ID node. Each tracked sarcomere is associated with two local z-disc IDs in each frame. To add sarcomeres to the spatial graph, the global ID of the associated z-disc is first determined. Then, either a new edge is created linking the two global IDs, or, if the edge already exists, the weight of the edge is increased. The spatial graph is created with the NetworkX python package, and all operations, such as computing the shortest path between two nodes, rely on built-in NetworkX functionality [39]. The function create_spatial_graph() runs in seconds on a single laptop. Running create_spatial_graph() will also produce a schematic drawing of the spatial graph.

Notable analysis and visualization tools
In addition to the functions to compute and visualize F avg and other functional metrics (compute_F_whole_movie(), visualize_F_full_movie(), compute_metrics, visualize_lambda_as_functional_metric) and plot normalized sarcomere length with respect to time (plot_normalized_tracked_timeseries()), the analysis_tools component of Sarc-Graph contains multiple functions to further visualize and analyze the data acquired from the hiPSC-CM movies. Example outputs from these functions are included onn the Sarc-Graph GitHub page. Additional key analysis and visualization functions are as follows: • visualize_segmentation(folder_name, gaussian_filter_size, frame_num): Visualization of the segmented z-discs and sarcomeres overlaid on the original image.
The parameter gaussian_filter_size should match the value chosen in segmentation.
The default for gaussian_filter_size = 1 and the default for frame_num = 0.
• visualize_contract_anim_movie(folder_name, re_run_timeseries, use_re_run_timeseries, keep_thresh=0.75): Visualization of the results of tracking overlaid on the original movie of the contracting hiPSC-CMs. If re_run_timeseries = True and use_re_run_timeseries = True this function will re-run the timeseries_all() function with the new specified keep_thresh. The saved tracking data will be designated for visualization purposes only.
• cluster_timeseries_plot_dendrogram(folder_name,compute_dist_DTW, compute_dist_euclidean): Visualize hierarchical clustering of all normalized sarcomere length time series data and plot a dendrogram that illustrates the clustering [28]. If compute_dist_DTW =True the similarity between each sarcomere time series will be measured with a Dynamic Time Warping algorithm [40]. Because this can be time consuming (order of minutes to 10s of minutes), we also provide an option to set compute_dist_euclidean =True and instead base clustering on the Euclidean distance between time series.
• plot_normalized_tracked_timeseries(folder_name): This function will create a plot of the normalized length of the tracked sarcomeres with respect to frame number.
• plot_untracked_absolute_timeseries(folder_name): This function will create a plot of the absolute sarcomere lengths in pixels with respect to frame number for all segmented sarcomeres. The number of segmented sarcomeres will exceed the number of tracked sarcomeres.
• compute_timeseries_individual_parameters(folder_name): This function will compute and save scalar values describing the normalized length time series data for the tracked sarcomeres (contraction time, relaxation time, flat time, period, offset, etc.) The results of this analysis will be saved in a spreadsheet titled "timeseries_parameters_info.xlsx." • analyze_J_full_movie(folder_name): This function will compute and save scalar values describing the F avg time series data. Specifically, it will compute time constants describing the plot of the scalar Jacobian (J = |F avg |) with respect to frame number. This is illustrated in Fig. 3c.
• compare_tracked_untracked(folder_name): This function will compare the tracked and untracked populations through random sampling of the untracked population. These plots are important for understanding if the tracked data is significantly biased compared to the segmented data.
• preliminary_spatial_temporal_correlation_info(folder_name): This function will perform a preliminary analysis of spatial/temporal correlation within the hiPSC-CMs. The main outcome is a plot of normalized cross-correlation score between normalized sarcomere length time series data with respect to both Euclidean and network distances.

Synthetic data examples
Our first investigation into the performance of Sarc-Graph on synthetic data is illustrated in Fig. 5. In this investigation, we define a sinusoidal chain of 20 sarcomeres that deform homogeneously following the time series illustrated in Fig.  2a. Under baseline conditions (modest amplitude, little angling out of plane, and low to moderate noise), Sarc-Graph is able to successfully track all 20 out of 20 sarcomeres. This is illustrated in 5a. Then, we adversely alter these baseline conditions to demonstrate the limitations of our approach. In the first study, illustrated in Fig. 5b, we alter the underlying 3D geometry of the sarcomere chain. As anticipated, performance degrades when we increase the amplitude of the sine wave, and when we increase the out of plane tilt of the sarcomere chain. That being said, we are able to segment and track at least half of the 20 sarcomeres in all but four of the examples shown. In the second study, illustrated in Fig. 5c, we adversely alter baseline conditions by adding noise to the rendered image. Specifically, we add Perlin noise at varying levels of magnitude and with a varying number of octaves [20]. From these results, it is clear that our computational framework is quite robust to all tested types of low magnitude noise, and Perlin noise with a low number of octaves (low frequency). However, performance degrades in the presence of Perlin noise that has high gradients and is similar in size and brightness to the simulated z-discs themselves. We note that if a specific experimental dataset has a characteristic noise pattern that can be removed with a known operation, implementing additional steps in the segmentation component of our computational framework to address it would be straightforward. In some cases, simply increasing the parameter gaussian_filter_size will resolve the issue. In the experimental data that we have tested the computational framework on thus far, our current segmentation process is effective.
Our second investigation into the performance of Sarc-Graph on synthetic data is illustrated in Fig. 6. In these examples, our aim is to design synthetic data that contains many of challenges associated with analyzing real hiPSC-CMs: out of plane deformation, sarcomere chain curvature, sarcomere chain overlap, inhomogeneous deformation, variable z-disc size, and irregular contraction. From S1 to S5 the examples are loosely organized from "least" to "most" challenging for the computational framework to capture. The corresponding movies are included in our GitHub repository. Briefly, example S1 shows four sarcomere chains of variable curvature (decreasing from left to right) and variable out of plane deformation (increasing from left to right) deforming homogeneously. Our framework is able to Figure 5: For each image, the number of sarcomeres successfully segmented and tracked (out of 20 possible) is reported. Running the algorithm on synthetic data shows that the code is robust to geometry and noise up to a limit: (a) Illustration of successful z-disc and sarcomere segmentation and tracking for a curved geometry in the presence of noise; (b) Performance degrades when the baseline geometry moves partially out of plane (here slanted in the zdirection), and when the sarcomere chain is too distorted (here high amplitude to period ratio for sinusoidal geometry); (c) Performance degrades in the presence of Perlin noise, in particular noise that is similar in size and brightness to the z-discs themselves.
recover 77 out of 80 sarcomeres, match the ground truth mean time series data, and match the ground truth on λ avg 1 and λ avg 2 . This is consistent with the results shown in Fig. 5. Example S2 shows two families of sarcomere chains, an external ellipse and four internal chains that overlap in the center. Our framework is able to recover 90 out of 99 sarcomeres (error occurs primarily near the overlap region), match the ground truth mean time series data, and match the ground truth on λ avg 1 and λ avg 2 . Example S3 shows three closely positioned elliptical sarcomere chains that deform inhomogeneously. Our framework is able to recover 34 out of 40 sarcomeres, however, because there is bias in which sarcomeres are recovered (recovery is worse towards the bottom of the image frame), both the mean time series data and the λ avg 1 and λ avg 2 data are unable to perfectly capture the ground truth. We note that the results for λ avg 1 and λ avg 2 are a better reflection of the ground truth than the mean time series data. We also note that the amount of sarcomere contraction simulated in S3 exceeds what is typically observed in the experimental setting (over 30% contraction).
Example S4 shows two families of sarcomere chains: an external ellipse, and multiple partially overlapping and curved internal chains in different z-planes. In Example S4, the sarcomeres deform inhomogeneously with a brief abrupt jump in sarcomere deformation around frame 70. Our framework is able to recover 75 out of 90 sarcomeres, and is able to nearly recover the mean time series data, and the λ avg 1 and λ avg 2 data. We note that for both approaches, the abrupt jump in sarcomere behavior seen in the ground truth is smoothed out in the automated time series processing because during the abrupt jump tracking temporarily fails for the entire outer ring of sarcomeres. There are two main implications of this result. First, that large abrupt motion may cause Sarc-Graph to fail. Second, that Sarc-Graph is able to track sarcomeres on either end of a time period where signal is temporarily lost, and interpolate data across the missing frames. Similar to example S3, the plot of λ avg 1 and λ avg 2 is a better reflection of the ground truth than the mean time series curve.
Example S5 shows multiple sarcomere chains that are all angled out of plane, appear to overlap at multiple points, and undergo inhomogeneous deformation. In this case, we are only able to recover time series data for 27 out of 70 sarcomeres. This results in mean time series data that is a poor reflection of the ground truth. However, the λ avg 1 and λ avg 2 data is a much better match to the ground truth. This is a positive sign towards the efficacy of our proposed metric even when the number of sarcomeres recovered is relatively low. Though only 27 out of 70 sarcomeres were fully tracked, segmented but only partially tracked sarcomeres appear on the spatial graph shown in Fig. 6 S5. Looking forward, additional criteria to further refine the spatial graph and further delineate "probable" and "improbable" links can readily be added to Sarc-Graph within the spatial_graph module.
increasing complexity Figure 6: Example performance on synthetic data of increasing complexity with a known ground truth: Segmentation and tracking (marker color corresponds to sarcomere contraction in the illustrated frame); Constructing a spatial graph (line color corresponds to sarcomere orientation); Recovering individual time series data (both Sarc-Graph output and ground truth shown); Recovering average deformation behavior (both Sarc-Graph output and ground truth shown).  Table 2: Comparison of Sarc-Graph computeds, s avg , OOP, C iso , and C || with the ground truth values of these parameters for each synthetic data example.
In Table 2, we report median individual sarcomere shortening (s), mean time series sarcomere shortening (s avg ), Orientational Order Parameter (OOP), equivalent isotropic contraction (C iso ), and contraction in the dominant direction of sarcomere orientation (C || ) for each synthetic data example. For all of the synthetic data cases, we are able to compute a ground truth for each of these parameters from the known ground truth contraction, displacement, and orientation of the sarcomere geometry. For examples S1 and S2, the parameters recovered by Sarc-Graph are a near perfect match to the ground truth. However, for all other samples minor discrepancies arise ins, OOP, C iso , and C || . For example S4,s is over estimated. For examples S3 and S5, OOP is over estimated, C iso is under estimated, and C || is under estimated. Consistent with the results shown in Fig. 6, we believe that this error is due to bias in the region where sarcomeres are recovered in S3, and failure of the algorithm for overlapping and severely out of plane sarcomere chains in S4 and S5. We note that for all examples but S1 and S2 -where sarcomere contraction is synchronous -recovered s avg is not a good match to the ground truth. This is consistent with the normalized sarcomere length time series plot shown in Fig. 6. These errors indicate that care should be taken when comparing results across beating hiPSC-CM movies that may have systemic differences. The accessibility of our code for generating additional synthetic data can aid in this preliminary check.
In all examples, most glaringly example S5, more sarcomeres are segmentated and tracked for a small number of frames (and thus appear on the spatial graph) then are tracked for enough frames to reconstruct the full time series. The recovered time series (λ avg 1 and λ avg 2 ) and parameters (C iso , C || ) associated with F avg are typically a closer match to the synthetic data ground truth behavior than the mean curve of the recovered time series data. Example S5with out of plane inhomogeneous deformation and overlapping sarcomere chains -is perhaps the best reflection of the challenges that we have seen in our experience thus far with the experimental data. We note that the extensible code used to generate the synthetic data is published on GitHub so implementing and testing additional examples is straightforward.

Experimental data examples
Here, we show the performance of Sarc-Graph on the five experimental data examples listed in Table 1. In Fig. 7 -9, we illustrate the key Sarc-Graph outputs. In Table 3, we reports, s avg , OOP, C avg , and C || for each example. In the Supplementary Information section, we include a movie for each sample that shows both the change in length of the successfully tracked sarcomeres, and how our computed metric F avg changes over the course of the movie.
In Fig. 7, we show the performance of Sarc-Graph on two similar examples of beating hiPSC-CM movies, examples E1 and E2. For example E1, we are able to segment ≈ 500 sarcomeres per frame and successfully track 74 sarcomeres. The time series plot of absolute sarcomere length change with respect to frame number shows three distinct contraction events during the movie. Though the individual sarcomeres are clearly not perfectly in sync, there is enough of a unifying pattern for these three peaks to emerge. These three distinct contraction events are also reflected in the plot of λ avg 1 and λ avg 2 . For example E2, we are able to segment ≈ 500 sarcomeres per frame and successfully track 60 sarcomeres. The time series plots of absolute sarcomere length change and λ avg 1 and λ avg 2 with respect to frame number also show three distinct peaks. We also show the spatial graph representation of each example and a plot of normalized cross-correlation between pairs of individual sarcomere time series curves with respect to network distance.
Qualitatively, examples E1 and E2 are different in that the sarcomeres in example E1 appear vertically aligned throughout while the sarcomeres in example E2 appear tangentially aligned around the edge of the cell and unaligned in the center. This qualitative observation is consistent with the values of OOP reported in Table 3. In addition, we can use our computed metric F avg to go beyond morphology alone and quantify the function of the observed hiPSC-CMs. By looking at the time series plots of λ avg 1 and λ avg 2 with respect to frame number, we can see that example E1 is deforming quite anisotropically λ avg 1 < λ avg 2 while the deformation of example E2 is much closer to isotropic where λ avg 1 ≈ λ avg 2 . In example E1, C || is meaningfully greater than C iso while in example E2 the two parameters are much Figure 7: Example performance on experimental data E1 and E2: Raw image visualization; z-disc and sarcomere segmentation; Sarcomeres that are tracked for 1/3 or more of the movie, red corresponds to a contracted state, blue corresponds to a relaxed state (frame 20); Sarcomeres that are tracked for 3/4 or more of the movie, red corresponds to a contracted state, blue corresponds to a relaxed state (frame 20), note that these sarcomeres are included in the time series analysis; Individual time series data for normalized sarcomere length; Average deformation behavior; Illustration of the data represented as a spatial graph, color corresponds to sarcomere orientation; Normalized sarcomere time series cross-correlation score plotted as a function of the distance along the network. Supplementary Movies SI1 and SI2 are included for further visualization. Figure 8: Example performance on experimental data E3 and E4: Raw image visualization; z-disc and sarcomere segmentation; Sarcomeres that are tracked for 1/3 or more of the movie, red corresponds to a contracted state, blue corresponds to a relaxed state (frame 50 for E3, frame 40 for E4); Sarcomeres that are tracked for 3/4 or more of the movie, red corresponds to a contracted state, blue corresponds to a relaxed state (frame 50 for E3, frame 40 for E4), note that these sarcomeres are included in the time series analysis; Individual time series data for normalized sarcomere length; Average deformation behavior; Illustration of the data represented as a spatial graph, color corresponds to sarcomere orientation; Normalized sarcomere time series cross-correlation score plotted as a function of the distance along the network. Supplementary Movies SI3 and SI4 are included for further visualization. Figure 9: Example performance on experimental data E5: Raw image visualization; z-disc and sarcomere segmentation; Sarcomeres that are tracked for 1/3 or more of the movie, red corresponds to a contracted state, blue corresponds to a relaxed state (frame 50); Sarcomeres that are tracked for 3/4 or more of the movie, red corresponds to a contracted state, blue corresponds to a relaxed state (frame 50), note that these sarcomeres are included in the time series analysis; Individual time series data for normalized sarcomere length; Average deformation behavior; Illustration of the data represented as a spatial graph, color corresponds to sarcomere orientation; Normalized sarcomere time series cross-correlation score plotted as a function of the distance along the network. Supplementary Movie SI5 is included for further visualization.
closer in value. Essentially, example E1 is experiencing more substantial oriented contraction than example E2. We also note that for these two examples, wheres is similar but OOP is different, metrics derived from F avg are able to capture functional differences in contraction.  Table 3: Sarc-Graph computeds, s avg , OOP, C iso , and C || for each experimental data example.
In Fig. 8 and λ avg 2 , and scalar metrics OOP, C iso , and C || , we observe high sarcomere alignment and corresponding highly aligned contraction. Notably, the cell contracts substantially in the direction perpendicular to fiber alignment, thus exhibiting auxetic deformation during contraction. For example E4 we are able to segment approximately 700, and track 228 sarcomeres. This is notable because the conditions in E4 are far from ideal for image analysis. In both cases, five distinct contraction events are observed over the course of the movie and are visible from both the average normalized sarcomere length time series and the λ avg 1 and λ avg 2 time series.
In Fig. 9, we show the performance of Sarc-Graph on example E5. Example E5 is of a hiPSC-CM on a glass slide with poor fiber alignment, curved fibers, and relatively low deformation over the course of the movie. With Sarc-Graph, we are able to segment approximately 800 and track 539 sarcomeres for example E5. Though individual sarcomeres in this movie contract substantially (s = 0.13), there is low overall synchrony, and overall average deformation is quite low (C iso = 0.0060). Despite low synchrony, Sarc-Graph still detects 5 distinct contractions in both the average time series plot and the λ avg 1 and λ avg 2 time series plot. Notably, overall contraction is also slightly higher (C || > C iso ) in the dominant direction of sarcomere orientation. This example is notable in because Sarc-Graph is able to segment and track a high number of sarcomeres from an irregular geometry and detect subtle sarcomere behavior beyond the limits of what has been accomplished with the previous state of the art [10].

Conclusion and outlook
The objective of this work is to provide an open source computational framework to quantitatively analyze movies of beating hiPSC-CMs. To accomplish this, we introduce tools to segment and track z-discs and sarcomeres, and analyze their spatiotemporal behavior. Notably, we are able to automatically segment and track a high number of sarcomeres across multiple experimental conditions without any input parameter tuning, and we introduce two important new approaches for the analysis of beating hiPSC-CM: a method for computing average deformation gradient, and a method for treating the hiPSC-CMs as spatial graphs. Looking forward, we see this work as an important tool for substantial future study of hiPSC-CM behavior. Finally, our proposed computational framework is heavily documented and has a modular extensible design. We anticipate that many components of our open source software will be directly useful to other researchers.

Additional information
Sarc-Graph, the code to generate the synthetic data, the example data, and examples of the full analysis output can all be accessed through GitHub: https://github.com/elejeune11/Sarc-Graph.

Supplementary Information
SI-1. A movie of E1 with a schematic illustration of deformation gradient F avg and tracked sarcomeres overlayed (sarcomere color corresponds to contraction level with red indicating the highest level of contraction), magnitudes of average principal stretches (λ 1 , λ 2 computed from F avg ), and average normalized sarcomere length, both with respect to the movie frame number. This movie is a supplement to Fig. 7, sample E1.

SI-2.
A movie of E2 with a schematic illustration of deformation gradient F avg and tracked sarcomeres overlayed (sarcomere color corresponds to contraction level with red indicating the highest level of contraction), magnitudes of average principal stretches (λ 1 , λ 2 computed from F avg ), and average normalized sarcomere length, both with respect to the movie frame number. This movie is a supplement to Fig. 7, sample E2.

SI-3.
A movie of E3 with a schematic illustration of deformation gradient F avg and tracked sarcomeres overlayed (sarcomere color corresponds to contraction level with red indicating the highest level of contraction), magnitudes of average principal stretches (λ 1 , λ 2 computed from F avg ), and average normalized sarcomere length, both with respect to the movie frame number. This movie is a supplement to Fig. 8, sample E3.

SI-4.
A movie of E4 with a schematic illustration of deformation gradient F avg and tracked sarcomeres overlayed (sarcomere color corresponds to contraction level with red indicating the highest level of contraction), magnitudes of average principal stretches (λ 1 , λ 2 computed from F avg ), and average normalized sarcomere length, both with respect to the movie frame number. This movie is a supplement to Fig. 8, sample E4.

SI-5.
A movie of E5 with a schematic illustration of deformation gradient F avg and tracked sarcomeres overlayed (sarcomere color corresponds to contraction level with red indicating the highest level of contraction), magnitudes of average principal stretches (λ 1 , λ 2 computed from F avg ), and average normalized sarcomere length, both with respect to the movie frame number. This movie is a supplement to Fig. 9, sample E5. A still frame from this movie is also shown in Fig. 1.