An Exact Hypergraph Matching algorithm for posture identification in embryonic C. elegans

The nematode Caenorhabditis elegans (C. elegans) is a model organism used frequently in developmental biology and neurobiology [White, (1986), Sulston, (1983), Chisholm, (2016) and Rapti, (2020)]. The C. elegans embryo can be used for cell tracking studies to understand how cell movement drives the development of specific embryonic tissues. Analyses in late-stage development are complicated by bouts of rapid twitching motions which invalidate traditional cell tracking approaches. However, the embryo possesses a small set of cells which may be identified, thereby defining the coiled embryo’s posture [Christensen, 2015]. The posture serves as a frame of reference, facilitating cell tracking even in the presence of twitching. Posture identification is nevertheless challenging due to the complete repositioning of the embryo between sampled images. Current approaches to posture identification rely on time-consuming manual efforts by trained users, which limits the efficiency of subsequent cell tracking. Here, we cast posture identification as a point-set matching task in which coordinates of seam cell nuclei are identified to jointly recover the posture. Most point-set matching methods comprise coherent point transformations that use low order objective functions [Zhou, (2016) and Zhang, (2019)]. Hypergraphs, an extension of traditional graphs, allow more intricate modeling of relationships between objects, yet existing hypergraphical point-set matching methods are limited to heuristic algorithms which do not easily scale to handle higher degree hypergraphs [Duchenne, (2010), Chertok, (2010) and Lee, (2011)]. Our algorithm, Exact Hypergraph Matching (EHGM), adapts the classical branch-and-bound paradigm to dynamically identify a globally optimal correspondence between point-sets under an arbitrarily intricate hypergraphical model. EHGM with hypergraphical models inspired by C. elegans embryo shape identified posture more accurately (56%) than established point-set matching methods (27%), correctly identifying twice as many sampled postures as a leading graphical approach. Posterior region seeding empowered EHGM to correctly identify 78% of postures while reducing runtime, demonstrating the efficacy of the method on a cutting-edge problem in developmental biology.


Introduction
Point-set matching describes the task of finding an alignment between two sets of points. The problem appears in computer vision applications such as point-set registration [11], object recognition [12], and multiple object tracking [13]. Often the point-sets are modeled via graphs, abstract mathematical objects in which points are represented as vertices and edges define relationships between pairs of vertices.
User-defined attributes characterize the vertices and edges, such as coordinate positions or shape descriptions for vertices and lengths of chords connecting vertices for edges. Specified attributes give insight into observable relationships between vertices and allow for structural analyses of graphs. Graph matching is the optimization problem defined by the search for a correspondence of vertices between a pair of attributed graphs. The optimization problem uses binary variables x_ij to specify a matching of vertex i in the first graph to vertex j in the second. The graph matching domain consists of assignment matrices of size n_1 × n_2, for matching graphs of size n_1 and n_2.
The space X (Eq 1) comprises assignment matrices, each of which describes a one-to-one alignment between nodes of the two graphs. The specification of the graph matching objective function allows for joint assignment costs, i.e., how a pair of vertex-to-vertex assignments jointly changes the quality of the match. Let C be an n_1 × n_2 matrix and D be an n_1 × n_2 × n_1 × n_2 tensor storing the vertex-to-vertex and edge-to-edge dissimilarities, respectively. The graph matching optimization problem is expressed in Eq 2, which takes the form of the quadratic assignment problem (QAP).
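Eqs 1 and 2 are referenced but not reproduced in this text. Under the definitions above, a standard form of the assignment space and of the QAP objective would read as follows. This is a sketch, not the authors' notation: the convention n_1 ≤ n_2 and the index names i, j, p, q are assumptions.

```latex
% Assumed convention: n_1 \le n_2, so every vertex of the first graph is matched
% and every vertex of the second graph is used at most once.
\mathcal{X} = \Big\{ X \in \{0,1\}^{n_1 \times n_2} \;:\;
  \sum_{j=1}^{n_2} x_{ij} = 1 \;\; \forall i, \qquad
  \sum_{i=1}^{n_1} x_{ij} \le 1 \;\; \forall j \Big\}

% Linear (vertex-to-vertex) term plus quadratic (edge-to-edge) term.
\min_{X \in \mathcal{X}} \;
  \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} C_{ij}\, x_{ij}
  \; + \;
  \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \sum_{p=1}^{n_1} \sum_{q=1}^{n_2}
    D_{ijpq}\, x_{ij}\, x_{pq}
```

The quadratic term is what makes the problem a QAP: the cost D_ijpq is only incurred when both assignments i → j and p → q are made together.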
Graphs are limited in their expressive power as edges can only relate pairs of vertices; hypergraphs extend the definition of a graph to include hyperedges which can specify relationships among an arbitrary number of vertices. Hypergraph matching then concerns finding an optimal vertex correspondence between pairs of attributed hypergraphs. The number of vertices aligned by the most comprehensive hyperedge defines the degree of a hypergraph.
Maximum degree hypergraphs with hyperedges composed of all n_1 vertices yield the most comprehensive point-set matching function possible. The optimization objective function captures the dissimilarity arising between the matching: (l_1, l_2, ..., l_n1) → (l_1, l_2, l_3, ..., l_n1). Then, for a given assignment matrix X ∈ X, the hypergraph matching objective can be expressed using n_1 dissimilarity tensors of dimension 2, 4, ..., 2d, ..., 2n_1, each measuring dissimilarity between degree d hyperedges, respectively. Define Z^(d) as the tensor mapping the dissimilarity for the degree d hyperedges. The hypergraph matching objective is expressed in Eq 7.
Hypergraph matching allows for the modeling of intricate point-set matching problems through high multiplicity assignment objective function formulations. The Z^(d) dissimilarity terms measure degree d hyperedge dissimilarity comprising d simultaneous vertex assignments. The range in assignment problem objective complexity from d=1 to d=n_1 trades off model capacity for increased computation. The traditional linear assignment problem (d=1) is solvable in polynomial time [14], but treats points between sets independently. Existing graphical methods (d=2) and hypergraphical methods (d>2) rely on approximate searches and do not generalize to high degree formulations of Eq 7. Exact Hypergraph Matching (EHGM) is able to find globally optimal solutions to hypergraph matching problems of arbitrary degree, allowing for the modeling of intricate point-set matching tasks.

Related Research
Finding an exact solution to the QAP is an NP-hard problem. That is, unless P=NP, no polynomial-time algorithm exactly solves the QAP [15]. Higher order assignment problems (i.e., hypergraph matching) are also NP-hard as they are at least as hard as the QAP [16]. As a result, recent methods for graph matching and lower-degree hypergraph matching focus on heuristic solutions which offer no guarantee on performance [1,2,3,4,5]. Heuristic hypergraph matching methods are adapted from existing graph matching algorithms. In particular, spectral methods for solving graph matching (Eq 2) have been extended to solve hypergraph matching. Duchenne et al. [3] adapt Leordeanu's [11] work to obtain a rank-1 approximation of the affinity tensor via higher order power iteration. However, calculating affinity tensors (Z^(d) terms) is computationally prohibitive, especially for higher degree hypergraphs, due to the exponentially growing number of entries in the tensors. Simplifying assumptions such as supersymmetry and sparseness are used with sampling methods to build large affinity tensors [3,17]. Chertok and Keller propose similar methodology to [3], but instead unfold the affinity tensor and use the leading left singular vector to approximate the adjacency matrix [4]. All such methods operate outside the permutation matrix space. The Hungarian algorithm or a similar binarization step is used to yield a valid assignment, e.g., as in [11].
Exactness allows for a more rigorous analysis of a hypergraphical point-set matching model than is possible using heuristic techniques. The guarantee of a globally optimal correspondence allows an iterative tuning of the underlying model in pursuit of accurate characterization, whereas the output of a heuristic algorithm could be incorrect due either to the stochasticity of the search or to inadequacy of the optimization objective. Branch-and-bound is a paradigm originally developed to exactly solve the travelling salesman problem, a type of QAP [18,19]. Branch-and-bound methods recursively commit partial assignments and solve successive subproblems within X. The paradigm iteratively partitions the search space while bounding the optimum at each branch. At each step the method prunes branches which cannot lead to the optimum. Convergence occurs when only feasible assignments achieving a global optimum remain. The NP-hardness of the QAP implies convergence occurs only after implicit enumeration of X.

Overview of EHGM & Application to C. elegans
EHGM deviates from recent graph matching and hypergraph matching methodology as an exact method, guaranteeing convergence to a globally optimal solution (S1: Convergence of EHGM). Heuristic hypergraph matching methods approximate the assignment matrix using the dissimilarity tensor [3,4], whereas EHGM builds upon the seminal branch-and-bound algorithm [18]. EHGM extends the methodology to branch and prune based upon a given hypergraphical model. A k-tuple of nodes at branch m is greedily selected, while another step encapsulates the full hypergraphical objective upon selection. These changes enable flexibility in altering the hypergraph matching objective, particularly in allowing for high degree hypergraphical modeling.
EHGM is applied to model posture in embryonic Caenorhabditis elegans (C. elegans), a small, free-living roundworm. The nematode features approximately 550 cells upon hatching, including a set of twenty seam cells and two associated neuroblasts. The seam cells and neuroblasts form in lateral pairs along the left and right sides of the worm, resulting in eleven pairs upon hatching [7]. The neuroblasts appear in the final hours of development, just prior to hatching. The pairs of cells are named, posterior to anterior: T, V6, V5, Q (neuroblasts), V4, V3, V2, V1, H2, H1, and H0. Each pair's left and right cell is named accordingly; for example, H1L and H1R comprise the H1 pair. We define posture as the identification of all seam cells and neuroblasts, which together reveal the shape of the coiled embryo. Posture identification allows for traditional frame-to-frame tracking of imaged cells belonging to various tissues such as the gut, nerve ring, and bands of muscle [10]. Images are captured at five-minute intervals (Fig 1-B) in order to achieve the resolution necessary to track cells of other tissues without disturbing embryo development. Current methods for posture identification rely on trained users to manually annotate the imaged nuclei using a 3D rendering tool [20]. The process takes several minutes per image volume and must be performed on approximately 100 image volumes per embryo [10]. Manual annotation strategies motivated us to develop EHGM, as established methods for point-set matching fail to adequately capture the relationships between seam cells throughout myriad twists and deformations of the developing embryo. EHGM uses hypergraphical models comprising biologically driven geometric features to more accurately identify posture than established graphical methods. The limited expressive power of graphical models hinders accurate seam cell identification; graphical models accurately identify posture in 27% of samples compared to 56% using a hypergraphical model. User labelling of the posterior-most seam cell nuclei improves the success of hypergraph matching, yielding correct identification of all nuclei in 77% of samples. The improved accuracy in posture identification attributed to high-degree hypergraphical modeling solved via EHGM paves a path toward automatic posture identification while presenting a general framework for approaching similarly challenging point-set matching tasks.

Posture Identification Models
Posture was predicted via EHGM according to three models: a graphical model, denoted Sides, and two hypergraphical models. The two hypergraphical models, Pairs and Posture, showcase EHGM as existing algorithms cannot find solutions under such high degree hypergraphs. The three models use incrementally higher degree terms to describe posture. Sides follows the form of Eq 2 and leverages pairwise assignments to calculate lengths and widths of portions of the embryo. Pairs uses degree four and degree six hyperedges to better model local regions of the embryo than is possible with graphical methods which rely on pairwise relationships. Posture further demonstrates the capabilities of EHGM by including a degree n_1 hyperedge to maximize context in evaluating a hypothesized posture. Geometric features such as pair-to-pair rotation angles and left-right flexion angles were developed to more accurately measure and compare posture hypotheses. The calculation of each angle or distance requires identification of multiple seam cells in tandem, necessitating the use of hyperedges.

Posture Identification Accuracy
Annotators curated a dataset of seam cell nuclei center coordinates from 16 imaged embryos. Each imaged embryo yielded approximately 80 image volumes for a total of N=1264 labelled seam cell nuclei coordinate sets. Homogeneity in C. elegans embryo development allowed use of samples spanning multiple embryos to fit models via a leave-one-out approach (S1: Model Fitting, S1: Posture Modeling). EHGM allows for known correspondences, henceforth referred to as seeds, to be given as input prior to search initialization. The algorithm was evaluated both in a traditional point-set matching scenario given no a priori information, and in a series of seeded simulations. Seeded trials assumed incrementally more pairs given sequentially from the tail pair, T, to the fourth pair, V4 (or Q for n_1=22 samples). KerGM [2], a leading algorithm for heuristic graph matching, was applied to posture identification. The algorithm used the same connectivity matrix as Sides, but processed results frame-to-frame serially, relying on the correct posture identification at the prior image as input to search.
EHGM is able to store complete assignments encountered during the search as it compares against the current solution at the final branch. This allowed for an analysis of the similarity between cost minimizing posture hypotheses and progressively higher cost solutions encountered during search. The top x accuracy describes the percentage of all N samples in which EHGM returned the correct posture among the x lowest cost solutions; i.e., the top 1 accuracy describes the percentage of samples in which the correct posture was returned as the cost minimizing posture, and the top 3 accuracy is the percentage of samples in which the correct assignment was among the 3 lowest cost posture hypotheses returned by the search. Top x accuracies are reported alongside the median runtime and the median cost ratio. The cost ratio is defined as the ratio of the correct posture's objective to the cost minimizing posture's objective. A cost ratio greater than one implies the objective of the hypothesized posture is lower than that of the correct posture, suggesting the model is not aptly characterizing posture as an incorrect posture hypothesis was preferred by the model.
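The two evaluation metrics defined above can be sketched in a few lines. This is a minimal illustration on made-up toy data, not the authors' evaluation code; the function names `top_x_accuracy` and `cost_ratio` and the data layout are assumptions.

```python
from statistics import median

def top_x_accuracy(ranked_hypotheses, truths, x):
    """Fraction of samples whose true assignment is among the x lowest-cost hypotheses.

    ranked_hypotheses[s] is a list of (cost, assignment) sorted by ascending cost
    (hypothetical layout for the stored solutions of sample s).
    """
    hits = 0
    for ranked, truth in zip(ranked_hypotheses, truths):
        if truth in [assignment for _, assignment in ranked[:x]]:
            hits += 1
    return hits / len(truths)

def cost_ratio(correct_cost, returned_cost):
    """Ratio of the correct posture's cost to the cost-minimizing posture's cost."""
    return correct_cost / returned_cost

# Toy data: three samples, each with the same true labelling.
truth = ("a", "b", "c")
ranked = [
    [(1.0, ("a", "b", "c")), (1.5, ("b", "a", "c"))],                          # top 1 hit
    [(2.0, ("b", "a", "c")), (2.5, ("a", "b", "c")), (4.0, ("c", "b", "a"))],  # top 3 hit
    [(3.0, ("c", "a", "b")), (3.3, ("b", "c", "a"))],                          # miss
]
truths = [truth, truth, truth]
correct_costs = [1.0, 2.5, 6.0]   # cost of the true labelling under the model
returned_costs = [1.0, 2.0, 3.0]  # cost of the minimizer found by the search

top1 = top_x_accuracy(ranked, truths, 1)
top3 = top_x_accuracy(ranked, truths, 3)
median_cr = median(cost_ratio(c, r) for c, r in zip(correct_costs, returned_costs))
```

A cost ratio of exactly 1 means the correct posture was the minimizer; values above 1 flag samples in which the model preferred an incorrect hypothesis.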
Table 1 shows the percentage of all N samples in which the correct posture (correct identification of all seam cells) was returned as the minimizer according to KerGM and each of the models solved via EHGM: Sides, Pairs, and Posture. KerGM identified 27% of sampled postures correctly, outperforming Sides (10%). Pairs and Posture more effectively identified posture with 52% and 56% top 1 accuracies, respectively. Both hypergraphical models also reported a median cost ratio of 1.00, compared to 1.28 for Sides, suggesting the hypergraphical representations of coiled posture provided enhanced discriminatory power across samples. The hypergraphical models demonstrated small trade-offs between accuracy and runtime. The Posture model's degree n_1 hypergraphical features improved accuracy over Pairs, 56% to 52%, in exchange for longer median runtime, 60 minutes to 43 minutes. Differences between the top 1 and top 3 accuracies reflect the challenge in posture identification. The optimal costs under the Pairs and Posture models were often close to the costs of similar posture hypotheses. Notably, the Posture model returned the correct posture in the top 3 hypotheses in approximately 67% of samples, an approximate 20% increase in relative accuracy over the top 1 percentage, 56%.

Table 1: Hypergraphical model Posture achieves highest accuracy. Posture identification accuracies across all N=1264 samples. KerGM is compared to proposed models. The first columns list the top x accuracy as a percentage of samples. The column titled R shows the median runtime of each model in minutes. CR reports the median cost ratio, defined as the ratio of the correct posture cost to the returned posture cost.
Posture identification results were stratified by the presence of the Q neuroblasts; 875 of the 1264 samples contain only the seam cells while the remaining 389 samples are mature enough to have the Q neuroblasts. Table 2 depicts the findings presented in Table 1 split by Q neuroblast presence. KerGM and all models solved via EHGM achieved a higher accuracy on Q samples. Notably, the Posture model's top 3 accuracy is higher on the Q samples (82%) than the pre-Q samples (60%). The extra pair of coordinates provided substantial context, further defining the coiled shape and helping to penalize incorrect postures.
Seeded experiments specifying nuclear identities provided a priori information starting with the tail pair, and incrementally identified more pairs in the posterior region. Each experiment was given five minutes of maximum runtime; a semi-automated solution requiring more runtime was deemed infeasible. Top 1 and top 3 accuracy percentages are reported by EHGM model and number of seeded pairs in Table 3. Seeding yielded decreasing marginal improvements to accuracy and runtime.

Table 2: Hypergraphical models leverage Q neuroblasts to identify posture. The samples are split according to the absence (top) or presence (bottom) of the Q neuroblasts, which form in the last two hours of development. There are 875 n_1=20 cell samples and 389 n_1=22 Q samples. Reported methods more accurately identify embryonic posture in the Q samples, suggesting the increased continuity along the body of the embryo allows for more consistent posture identification.

Table 3: Seeding posterior pair identities promotes accurate posture identification and reduces runtime. Top 1 and top 3 seeded posture identification accuracies across all samples. All trials had a five-minute maximum runtime. The rows again correspond to each model. Columns specify which pairs were given as seeds prior to search. The None columns recreate the original no information task. The subsequent columns specify which pairs are correctly identified prior to search.

Discussion
We have presented EHGM as a dynamic and effective tool for intricate point-set matching tasks. The hypergraph matching algorithm provides a means to gauge the efficacy of modeling point correspondences in conservatively-sized problems; problems featuring larger numbers of points likely contain the context required to match adequately via lower degree models. For example, postures in samples containing Q nuclei were more accurately identified across models, but the largest marginal gain in accuracy came from Sides (d=2) to Pairs (d=4,6). The results suggest that added context throughout the embryo would further improve posture identification accuracy, reducing the reliance on higher degree (and thus more computationally expensive) hypergraphical objective function formulations. EHGM specifically addresses a gap in the literature concerning challenging point-set matching applications in which domain-specific features lead to rigorously testable models. Seeding allows a wider range of problems to be approached, and mitigates the computational expense of the algorithm for scenarios featuring larger point-sets.
Posture identification in embryonic C. elegans is a challenging problem benefiting from high degree hypergraphical modeling. EHGM equipped with biologically inspired hypergraphical models led to substantial improvement in posture identification. The top 1 accuracy doubled from 27% with a graphical model to 56% via the Posture model (Table 1). The top 3 accuracy rate improved to 67%, highlighting the challenge in precisely specifying the coiled embryo due to the similarity of competing posture hypotheses. The presence of Q neuroblasts further contributed to accurate posture identification. The added context empowered the Posture model to identify the correct posture in 82% of Q samples (Table 2).
The top x percentage accuracy metric reflects the need to correctly identify all seam cells in order to recover the underlying posture, but does not distinguish between hypotheses that are incorrect due to one cell identity swap and those reflecting a more systemic modeling inadequacy. A qualitative analysis highlighted a few themes among incorrectly predicted postures. The foremost errors concern the tail pair cells, TL and TR; spurious identifications occurred when the tail pair coiled against another part of the body of the embryo, causing one tail cell identity to be interchanged with a cell of a nearby body pair. The variance of feature measurements in the posterior region resulted in similar costs for postures with minor differences about the posterior region. Pair seeding allows for the strengths of EHGM to compensate for the most challenging aspect of posture identification. The posterior region of the embryonic worm is especially flexible and contributes to the majority of reported errors. Feature engineering stands to create hypergraphical models more capable of reliable posture identification, particularly in contextualizing the posterior region. The method and application outline a protocol for challenging point-set matching tasks.

Exact Hypergraph Matching
EHGM extends the branch-and-bound paradigm to exactly solve hypergraph matching. The algorithm performs the search in the permutation space X subject to a given branch size k which specifies the number of vertices assigned at each branch. A size n_1 hypergraph will require M := n_1/k branch steps, where branch m concerns the assignment of vertices ((m − 1)k + 1, (m − 1)k + 2, ..., mk); vertices 1, 2, ..., mk have been assigned upon completion of the m-th branch. The set P contains all possible length-k permutations of the indices of the unordered point set, |P| = n_2!/(n_2 − k)!. P is incrementally subset into queues Q_m ⊆ P at branches m = 1, 2, ..., M. The queue Q_m is subset according to both a pruning rule which eliminates permutations leading to a suboptimal solution and the one-to-one constraints of X. The search converges to a global optimum upon the implicit enumeration of Q_1 = P.
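To make the sizes above concrete, the following sketch evaluates the number of branch steps M and the initial queue size |P| for the posture problem (k=2, with n_1 = n_2 = 20 or 22). The helper names `num_branches` and `initial_queue_size` are hypothetical, and `math.perm` requires Python 3.8 or later.

```python
import math

def num_branches(n1, k):
    """Number of branch steps M = n1 / k; k is assumed to divide n1."""
    if n1 % k != 0:
        raise ValueError("branch size k must divide n1")
    return n1 // k

def initial_queue_size(n2, k):
    """|P| = n2! / (n2 - k)!, the number of ordered k-tuples of distinct indices."""
    return math.perm(n2, k)

m_seam = num_branches(20, 2)        # 10 branches for the 20 seam cells
p_seam = initial_queue_size(20, 2)  # 380 candidate (left, right) pairs at branch 1
m_q = num_branches(22, 2)           # 11 branches once the Q neuroblasts are present
p_q = initial_queue_size(22, 2)     # 462 candidate pairs at branch 1
full_space = math.factorial(20)     # size of the full permutation space for n = 20
```

The first queue is small (380 tuples), but the full space holds 20! ≈ 2.4 × 10^18 assignments, which is why the pruning rule rather than explicit enumeration determines the runtime.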
The objective function f is further stratified according to the branch size k. Lower degree (d ≤ 2k) hyperedge dissimilarity tensors are computed prior to search. Branches comprising k-tuples of vertices are partially assigned in a greedy manner according to these lower degree hyperedge dissimilarities via the selection rule H. Later branches accrue higher degree (d > 2k) hyperedge dissimilarities which are calculated at time of branching; the intent of the method is to rely on lower degree terms to steer the search towards an optimum in an effort to minimize the number of branches explored. The aggregation rule I accrues higher degree hyperedge dissimilarity terms upon branching, further guiding the pruning step and ensuring the complete specification of the objective f. The branching and selection rules are designed to reduce computation performed throughout the search. A partial assignment at branch m, K_m = (l_((m−1)k+1), l_((m−1)k+2), ..., l_(mk)) ∈ Q_m, is selected via precomputed lower degree hyperedge dissimilarity tensors Z^(1), ..., Z^(2k). A larger branch size k results in a selection rule with larger scope of the optimization landscape, better equipped to place optimal branches earlier in each queue Q_m at time of branching. However, computing the lower degree dissimilarity tensors prior to search can be prohibitively expensive for larger point-sets.
Subsequent branches m = 2, 3, ..., M then use the general selection rule H_m to order the permutations of the m-th branch: K_m = (l_((m−1)k+1), l_((m−1)k+2), ..., l_(mk)) ∈ Q_m. Branch K_m incurs a selection rule cost H_m according to Eq 5 comprising lower degree hyperedge dissimilarities for assignments both within branch m and between branches 1, 2, ..., m − 1 and branch m. The partial assignment constraints K_m allow further simplification of notation; the reversed order of summation indices satisfies the criterion that only hyperedge dissimilarities pertaining to branch m assignments are considered via H_m.
The greedy selection rule orders queues Q_m, but does not account for higher degree (2k < d ≤ n_1) hyperedge dissimilarities. Precomputing higher degree dissimilarity tensors can be both computationally expensive and inefficient, as ideally only a small percentage of combinations are queried throughout the search. The aggregation rule I_m, m = 3, 4, ..., M, measures the dissimilarity attributable to higher degree (2k < d ≤ mk) hyperedges accessible due to branch m partial assignments. The aggregation rule updates the cost of branch K_m assignments, further informing the pruning step to subset the next queue Q_(m+1). The greedy selection rule H_m in tandem with the aggregation rule I_m aims to minimize the total computation performed in finding an optimum. The definition of I_m follows from the general selection rule H_m, but is applied to the higher degree hyperedge dissimilarities. The aggregation rule I_m (Eq 6) can be expressed as the degree d dissimilarities calculable upon the branch m assignments for degrees 2k < d ≤ mk.
The m-th branch allows for hyperedge dissimilarities up to degree mk concerning the first mk assignments. The M-th branch yields a complete assignment, allowing the evaluation of maximum degree n_1 hyperedge dissimilarities. The partitioning and further regrouping of each H_m and I_m as defined fully accounts for the objective f while allowing efficient computation during the search (S1: Hypergraphical Objective Decomposition, S1: Convergence of EHGM).
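The branch, select, and prune cycle described above can be illustrated with a deliberately simplified sketch. It is not the EHGM implementation: the selection rule H and aggregation rule I are merged into a single hypothetical callable `increment`, all costs are computed on demand rather than precomputed, and the toy data (`gaps`, `obs`) are invented one-dimensional points. The one property it shares with the description is that every incremental cost is nonnegative, so the cost of a partial assignment is a lower bound on the cost of any completion and pruning never discards the optimum.

```python
from itertools import permutations

# Toy data (hypothetical): template gaps between consecutive points on a line,
# and the same six points observed in shuffled order.
gaps = [1, 2, 3, 4, 5]
obs = [6, 0, 15, 3, 1, 10]

def increment(prefix, cand):
    """Nonnegative cost added when the tuple `cand` is appended to `prefix`."""
    chain = prefix + cand
    cost = 0
    for i in range(len(prefix), len(chain)):
        if i >= 1:  # degree-2 term: gap to the previously assigned point
            cost += (abs(obs[chain[i]] - obs[chain[i - 1]]) - gaps[i - 1]) ** 2
        if i >= 2:  # degree-3 term: span over three consecutive points
            span = abs(obs[chain[i]] - obs[chain[i - 2]])
            cost += (span - gaps[i - 2] - gaps[i - 1]) ** 2
    return cost

def branch_and_bound(n, k, increment):
    """Exact minimizer of summed nonnegative increments over permutations of range(n)."""
    if n % k != 0:
        raise ValueError("branch size k must divide n")
    best = {"cost": float("inf"), "assignment": None}

    def recurse(prefix, cost):
        if len(prefix) == n:  # complete assignment reached at the final branch
            if cost < best["cost"]:
                best["cost"], best["assignment"] = cost, prefix
            return
        free = [j for j in range(n) if j not in prefix]  # one-to-one constraint
        # Greedy selection: order the queue of k-tuples by incremental cost.
        queue = sorted((increment(prefix, c), c) for c in permutations(free, k))
        for inc, cand in queue:
            if cost + inc >= best["cost"]:
                break  # queue is sorted, so every remaining candidate is pruned too
            recurse(prefix + cand, cost + inc)

    recurse((), 0.0)
    return best["cost"], best["assignment"]

cost, assignment = branch_and_bound(6, 2, increment)
```

With branch size k=2, the search assigns two points per branch, as in the posture models where a lateral pair is assigned at each step. Sorting each queue means a good incumbent is usually found on the first dive, after which most of the remaining queue is pruned by the bound.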

Posture Identification in Embryonic C. elegans
Caenorhabditis elegans (C. elegans) is a small, free-living nematode found across the world. The worm is often studied as a model of nervous system development due to its relative simplicity [6,9]. The adult worm features only 302 neurons, the morphology and synaptic patterning of which have been determined via electron microscopy [6].
The complete embryonic cell lineage has also been determined [7]; methods and technology have been developed to allow study of cell position and tissue development in the embryo [21,22,23,24,25,26]. Systems-level studies of these processes may be able to discover larger-scale principles underlying developmental events.
The embryo features a set of twenty seam cells and two associated neuroblasts. The seam cells and neuroblasts together describe anatomical structure in the coiled embryo, acting as a type of "skeleton" outlining its body. Identification of the seam cells and neuroblasts defines the embryo's posture. Fluorescent proteins are used to label cell nuclei, including the seam cell nuclei, so that they may be visualized during imaging, e.g., with light sheet microscopy [27].
Volumetric images are captured at five-minute intervals in order to achieve subcellular resolution without damaging the worm's development [10]. Seam cell nuclei appear in the fluorescent images as homogeneous spheroids. Their positions relative to other nuclei and other salient cues present in the image volumes comprise the information that trained users employ to manually identify seam cells. The annotation interface [20] is used both to annotate seam cell nuclei and to track remapped nuclei, as in Fig 3 [10].
We cast posture identification as hypergraph matching and use EHGM to solve the resulting optimization problem. The proposed models (Sides, Pairs, and Posture) trade off modeling capacity for increased computation to identify optimal solutions. Sides expresses posture identification as graph matching; edge-wise (degree d=2) features take the form of standardized chord lengths between nuclei laterally and sequentially along each side. The first hypergraphical model, Pairs, employs a greater local context than Sides, using degree four and degree six hyperedges to describe relationships between seam cells. Hyperedges formed by two or three sequential pairs (d=4,6) better detail local regions throughout the embryo than a graphical model can. The traditional point-set matching task requires a labelled point-set and a second unidentified point-set. Higher order features such as bend and twist angles may vary largely frame-to-frame depending on the posture at the moment of imaging. However, elongation throughout late-stage development causes macroscopic trends in these geometric features. We estimate a template posture as a composite of feature measurements from a corpus of manually annotated postures.
The templates are time dependent to reflect the elongation from the first point of imaging throughout development until hatching. See S1: Model Fitting for details on template estimation.
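As an illustration of the edge-wise (d=2) features used by Sides, the sketch below computes raw lateral and sequential chord lengths from an ordered coordinate list. It is a minimal example under stated assumptions: the function name `chord_lengths`, the coordinate layout, and the synthetic straight "worm" are all hypothetical, and the standardization step mentioned above is omitted.

```python
import math

def chord_lengths(coords):
    """Raw chord lengths for seam cell coordinates ordered TL, TR, V6L, V6R, ...

    coords is a list of (x, y, z) tuples with left and right cells interleaved.
    Returns (widths, left_lengths, right_lengths):
      widths        - lateral chords between the left and right cell of each pair
      left_lengths  - sequential chords between consecutive left-side cells
      right_lengths - sequential chords between consecutive right-side cells
    """
    left, right = coords[0::2], coords[1::2]
    widths = [math.dist(l, r) for l, r in zip(left, right)]
    left_lengths = [math.dist(a, b) for a, b in zip(left, left[1:])]
    right_lengths = [math.dist(a, b) for a, b in zip(right, right[1:])]
    return widths, left_lengths, right_lengths

# Synthetic example: ten pairs on a straight, untwisted "worm" of unit width.
coords = [p for i in range(10)
          for p in ((float(i), 0.0, 0.0), (float(i), 1.0, 0.0))]
widths, left_lengths, right_lengths = chord_lengths(coords)
```

Each of these quantities depends on exactly two cell identities, which is why they fit a graph matching (QAP) objective. Angles between successive pairs need four or more identities at once and therefore require hyperedges.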
Together, the fitted models are used with EHGM to identify posture in imaged C. elegans embryos. The branch size k=2 is set for all models, i.e., a lateral pair of seam cell identities is assigned at each branch starting with the tail pair cells TL and TR. The successive pair cells, V6L and V6R, are assigned given the established cells and hypergraphical relationships accessible with the hypothesized identities.

[...] on the methods. We also thank Dr. Hank Eden and Dr. Matthew Guay for their careful readings and suggestions. The code and data are available at https://github.com/lauziere/EHGM.

[...] which occurs when Q_m = ∅. The recursion will continue until Q_1 is empty, signaling the complete enumeration of the search space S_n.

Model Fitting
Expert annotations are used to derive features such that the correct assignment consistently achieves a minimal cost across the training set. Features can be engineered and analyzed in the context of point-set matching just as in traditional supervised learning tasks.
Features are expressed as attributes over hyperedge multiplicities d = 1, 2, ..., n_1. Hyperedge features g_s^(d), s = 1, ..., n_d, are given as input. Each feature g_s^(d) is assumed to follow a Gaussian distribution, and if n_d ≥ 2 the features are modeled jointly as a multivariate Gaussian distribution. Measurements from the data are used to derive estimates of the parameters of the Gaussian distributions, μ and Σ. The most common approach in heuristic methods is to use the previous frame's feature values as the centers of the distributions. This standard approach is effective for features that vary minimally frame-to-frame. However, certain angle measurements may vary greatly between frames. Mean estimates across the training data can better account for macroscopic patterns in features. The variances are then estimated from the feature values across the training set.
The dissimilarity costs arise from the Mahalanobis distance between a hypothesized assignment's feature measurements and the estimated template mean values, scaled by the estimated covariance matrix. The dissimilarity tensors Z^(d) are expressed as a function of the n_d features of hyperedge d. A partial assignment up to degree d, [(l_1, ..., l_d) → (l_1, ..., l_d)], incurs a cost according to the n_d features.
The traditional approach uses the labeled coordinates in the prior frame to build corresponding prior frame feature measurements. These prior frame feature measurements then serve as the estimated center of the Gaussian distribution. The covariance matrix estimation follows accordingly, in which the variation in frame-to-frame differences is estimated from sample data.
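The template-based cost described above can be sketched as follows: estimate a mean vector and covariance matrix from training feature vectors, then score a hypothesized assignment's features by Mahalanobis distance to the template. This is an illustration only. The function names are hypothetical, the training values are invented, and returning the squared distance (rather than its square root) is an assumption, since the exact form of the cost is not shown in this text. The code is pure Python to stay self-contained; a numerical library would normally be used for the linear solve.

```python
def fit_template(samples):
    """Mean vector and sample covariance matrix of feature vectors (rows of samples)."""
    n, d = len(samples), len(samples[0])
    mu = [sum(row[j] for row in samples) / n for j in range(d)]
    sigma = [[sum((row[a] - mu[a]) * (row[b] - mu[b]) for row in samples) / (n - 1)
              for b in range(d)] for a in range(d)]
    return mu, sigma

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def mahalanobis_sq(features, mu, sigma):
    """Squared Mahalanobis distance (f - mu)^T Sigma^{-1} (f - mu)."""
    diff = [f - m for f, m in zip(features, mu)]
    return sum(d * w for d, w in zip(diff, solve(sigma, diff)))

# Invented training measurements of two features of one hyperedge.
training = [[1.0, 2.0], [3.0, 2.5], [2.0, 4.0], [4.0, 3.5]]
mu, sigma = fit_template(training)
cost_at_mean = mahalanobis_sq(mu, mu, sigma)         # zero by construction
cost_off_mean = mahalanobis_sq([4.0, 2.0], mu, sigma)
```

Because the covariance rescales each feature, a deviation in a highly variable feature (such as a posterior-region angle) is penalized less than the same deviation in a stable one.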

Posture Modeling
Embryonic C. elegans posture modeling uses the aforementioned template hypergraph to quantify hypothesized seam cell identities throughout the search process. The developing embryo elongates and, as a result, becomes more coiled due to the constraining eggshell. As such, the template hypergraphs are updated according to binned time intervals. Parameters are estimated from data according to the point in development between the first image and hatch.
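One simple way to realize the time binning is sketched below. Equal-width bins over the interval between first image and hatch, the bin count, and the name `time_bin` are assumptions for illustration; the text does not specify how the intervals are chosen.

```python
def time_bin(t, t_first, t_hatch, n_bins):
    """Map imaging time t to a bin index in 0..n_bins-1.

    Bins are equal-width fractions of the interval between the first
    image (t_first) and hatch (t_hatch); this layout is an assumption.
    """
    frac = (t - t_first) / (t_hatch - t_first)
    frac = min(max(frac, 0.0), 1.0)  # clamp times outside the interval
    return min(int(frac * n_bins), n_bins - 1)

# Hypothetical usage: four template bins over a 100-minute window.
early = time_bin(10, 0, 100, 4)
late = time_bin(99, 0, 100, 4)
```

Each bin index would then select the set of template parameters estimated from the training samples falling in the same developmental window.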
Each image is captured at time $t$ with $n_1 = 20$ located nuclei centroids. The coordinates can be stored as $X \in \mathbb{R}^{n \times 3}$, with $X_i = [x_i, y_i, z_i]$ representing the $i$th centroid in $\mathbb{R}^3$. The seam cells are ordered posterior to anterior: TL, TR,

Fig 1-A depicts center points of seam cell nuclei located in an example image volume as imaged in the eggshell (left) and straightened to reveal the bilateral symmetry in seam cell locations (right). Fig 1-B shows four sequential images of an embryo, with five minutes between images.
Fig 2-A highlights muscle cell nuclei (red dots) with the identified seam cells to contextualize the embryo's positioning. The posture is used to remap the muscle cells such that traditional cell tracking approaches can be applied in the late-stage embryo (Fig 2-B). Fig 2-C depicts the cell remapping process [10]. The muscle cells are remapped according to splines fitted to the posture. The untwisted cell positions are then tracked frame-to-frame (Fig 2-D).
Fig 3 depicts manually identified postures in the first two successive image volumes of Fig 1-B. Manual identification is performed in Medical Image Processing, Analysis and Visualization (MIPAV), a 3D rendering program used for manual annotation [20].

Figure 1: High spatial resolution, low temporal resolution imaging necessitates posture identification. A: Manually identified seam cell nuclei from an imaged C. elegans embryo. The cells form in pairs; they are labelled posterior to anterior: T, V6, ..., H0. The identification of all seam cells reveals the embryo's posture. Natural cubic splines through the left and right-side seam cells estimate the coiled body. The left image depicts identified nuclei connected to outline the embryonic worm. The fitted splines are used to untwist the worm, generating the remapped straightened points in the diagram on the right. B: Labelled nuclear coordinates from a sequence of four images. The embryo repositions in the five-minute intervals between images, causing failure of traditional tracking approaches.

Fig 4 demonstrates four types of models applied to perform posture identification on the first two sampled images in Fig 1-B. Linear models (Fig 4-A & Fig 4-B) are ill-equipped to identify posture due to the repositioning of the embryo between successive images, so linear models are not evaluated on sampled data. The graphical model Sides (Fig 4-C & Fig 4-D) associates local seam cells via edges (purple). Edge-wise features such as lengths and widths vary if the

Figure 2: Posture identification allows the tracking of other cells during late-stage embryogenesis. A: Seam cell nuclei coordinates (black) and muscle nuclei coordinates (red) in a sequence of three volumetric images. The untwisting process (green arrows) uses the seam cells to remap muscle coordinates to a common frame of reference. B: The remapped muscle nuclei are tracked frame-to-frame (blue arrows). C: A higher magnification view from the right coordinate plot of A. The left, right, and midpoint splines are used to create a change of basis defined by the tangent (black), normal (blue), and binormal (cyan) vectors. Ellipses are inscribed along the tangent of the midpoint spline, approximating the skin of the coiled embryo. D: A portion of the left (red) and center (blue) remapped muscle coordinates. Black lines connect the coordinates, frame-to-frame.

Figure 3: Manual posture identification in two successive image volumes of Fig 1-B using MIPAV. The 20 fluorescently imaged seam cell nuclei are rendered in two successive image volumes. Scale bar: 10 µm. A & B: Seam cell nuclei appearing in two successive image volumes visualized in MIPAV. The five-minute interval allows the embryo to reposition between images, yielding entirely different postures. C & D: Manual seam cell identification by trained users reveals the posture. The curved lines are cubic splines as described in Fig 2-C.

Figure 4: Posture identification applied to the two successive images in Fig 3 according to a series of increasingly intricate models. The embryo repositions between images. A & B: Linear models (LAP) cannot quantify relationships between seam cells; posture identification is impossible without context of neighboring cell identities. C & D: A graphical model (Sides) specifies edges (purple) between pairs of seam cell nuclei. Edge lengths are relatively static frame-to-frame, but the similarity of edge lengths throughout the embryo causes the edges to have a weak signal in identifying seam cells. E & F: The Pairs model uses degree four (red) and degree six (blue) hyperedges to model a greater local context than is possible in a graphical model. G & H: The Posture model extends the Pairs model to use a degree $n_1$ (black) hyperedge to evaluate all seam cell assignments jointly.
Fig 5 depicts top 1 accuracies and median runtimes across seeded experiments for the Pairs and Posture models, split by Q pair labelling. In particular, seeding the first two pairs, T and V6, greatly reduces the median runtime while also netting the largest gains in top 1 accuracy, partially attributable to EHGM converging in the given timeframe.

Table columns: Top 1 (%), Top 2 (%), Top 3 (%), Top 5 (%), Top 10 (%), R (minutes), CR

Figure 5: Evaluating the Pairs and Posture models as seam cell identities were seeded. The Pairs and Posture models' top 1 accuracies and median runtimes by Q pair labelling. Posterior pair seeding drastically improved top 1 accuracy and reduced runtime when applying both models. Q pair samples required more runtime ($n_1 = 22$ as opposed to $n_1 = 20$), but the added context improved posture identification accuracy. The majority of samples converged within 5 minutes when seeded with the T and V6 pairs of nuclei.
Fig 6 shows the two rendered fluorescent images from Fig 1-A in Medical Image Processing, Analysis and Visualization (MIPAV), a 3D rendering tool

Fig 7-A presents the hyperedge connectivity among nodes in the Pairs model [28]. The Posture model extends the Pairs model by leveraging complete posture ($d = n_1$) features in an effort to further discriminate between posture hypotheses that appear similar in sequential regions of the embryo. Geometric features help contextualize the coiled posture. Fig 8 illustrates three of the features used in the Pairs and Posture models. The angle Θ measures the angle between three successive pair midpoints. The angles Θ decrease throughout development as the worm elongates. Pair-to-pair twist angles ϕ and τ penalize posture hypotheses in which posterior to anterior transitions are jagged and unnatural in appearance. See S1: Posture Modeling for further details and specification of model features.
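The bend feature Θ can be sketched as below. The sketch assumes that Θ is the angle at the middle pair midpoint, formed by the vectors pointing to the two neighboring midpoints, so that a straight body gives π and a tighter coil gives a smaller angle. The helper names and toy coordinates are hypothetical; the exact feature definition is given in the supplement rather than here.

```python
import math

def midpoint(p, q):
    """Midpoint of two 3D points, e.g. the left and right nuclei of a pair."""
    return tuple((a + b) / 2.0 for a, b in zip(p, q))

def bend_angle(pair_a, pair_b, pair_c):
    """Angle (radians) at the midpoint of pair_b between the midpoints
    of pair_a and pair_c. Each pair is a (left, right) tuple of 3D points."""
    m_a, m_b, m_c = (midpoint(*pair) for pair in (pair_a, pair_b, pair_c))
    u = [a - b for a, b in zip(m_a, m_b)]
    v = [c - b for c, b in zip(m_c, m_b)]
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))

# Hypothetical pairs: three collinear midpoints, then a right-angle bend.
pair_1 = ((0.0, -1.0, 0.0), (0.0, 1.0, 0.0))
pair_2 = ((1.0, -1.0, 0.0), (1.0, 1.0, 0.0))
pair_3 = ((2.0, -1.0, 0.0), (2.0, 1.0, 0.0))
pair_bent = ((1.0, -1.0, 1.0), (1.0, 1.0, 1.0))

theta_straight = bend_angle(pair_1, pair_2, pair_3)
theta_bent = bend_angle(pair_1, pair_2, pair_bent)
```

Because the feature depends on six point-to-nucleus assignments at once, it cannot be expressed as an edge attribute in a pairwise graphical model, which is what motivates the degree six hyperedges.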

Fig 9 depicts EHGM applied to the sample image depicted in Fig 1-A. The initial pair (TL and TR) is selected, instantiating a search tree (Fig 9-A). Successive seam cell identities are partially assigned according to the given hypergraphical model in a pair-wise fashion. Each branch greedily queues hypothesized point-pair assignments conditioned on the previous branch assignments (black arrows within a branch). The next leading V6 pair (Fig 9-E) is chosen upon exhaustion of the leading hypothesized V6 pair (Fig 9-B). EHGM continues the recursion to implicitly identify a globally optimal posture under the given hypergraphical model; each possible initial pair will follow this illustrated process subject to pruning of the minimizing posture accessed via the hypothesized tail pair in Fig 9-A.
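The search pattern described above (assign a pair per branch, queue candidates by cost, prune against the best complete assignment found so far) can be illustrated with a generic branch-and-bound sketch. This is not the EHGM implementation: the function `branch_and_bound`, the toy 2D points, and the simple width-and-spacing cost are assumptions chosen so the example stays small. Pruning is valid here only because every step cost is nonnegative, so the accumulated cost is a lower bound on any completion.

```python
import math
from itertools import permutations

def branch_and_bound(n_points, n_pairs, step_cost):
    """Exact search assigning one ordered pair of point indices per branch.

    step_cost(partial, pair) must return a nonnegative cost for appending
    `pair` to the tuple of already assigned pairs `partial`.
    """
    best = {"cost": float("inf"), "assignment": None}

    def recurse(partial, used, cost):
        if len(partial) == n_pairs:
            if cost < best["cost"]:
                best["cost"], best["assignment"] = cost, partial
            return
        free = [i for i in range(n_points) if i not in used]
        # Queue candidate pairs greedily, cheapest first.
        queue = sorted((step_cost(partial, p), p) for p in permutations(free, 2))
        for c, pair in queue:
            if cost + c >= best["cost"]:
                break  # sorted queue: all remaining candidates are pruned too
            recurse(partial + (pair,), used | set(pair), cost + c)

    recurse((), frozenset(), 0.0)
    return best["cost"], best["assignment"]

# Hypothetical toy data: three lateral pairs about 1 unit wide, 2 units apart.
points = [(0.0, 0.0), (0.1, 1.0), (2.0, 0.2), (2.1, 0.9), (4.2, 0.0), (3.9, 1.1)]

def midpoint(pair):
    (x1, y1), (x2, y2) = points[pair[0]], points[pair[1]]
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def step_cost(partial, pair):
    # Width term: a lateral pair should be about 1 unit apart.
    cost = abs(math.dist(points[pair[0]], points[pair[1]]) - 1.0)
    if partial:
        # Spacing term: successive pair midpoints should be about 2 units apart.
        cost += abs(math.dist(midpoint(partial[-1]), midpoint(pair)) - 2.0)
    return cost

best_cost, best_assignment = branch_and_bound(len(points), 3, step_cost)
```

The returned cost equals the minimum over all complete assignments, even though most of the tree is never visited. In this toy cost each step depends only on the previous pair; the hypergraphical models in the text condition on larger sets of established identities, which the `partial` argument would allow.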

Figure 6: Rendered image volumes in the MIPAV GUI. The imaged twisted embryo (left) and imaged straightened embryo (right) rendered in Medical Image Processing, Analysis and Visualization (MIPAV) [20]. The fluorescent images are those depicted in Fig 1-A. Trained users navigate the MIPAV GUI to identify seam cells based upon relative positioning and other salient features such as specks of fluorescence on the skin. Correct identification of all imaged nuclei reveals the coiled embryonic posture. Green (left), red (center), and purple (right) splines yield an approximation of the coiled embryo's posture. Yellow lines connect seam cell nuclei laterally. The splines are used with the image volume to sweep planes orthogonal to the center spline, yielding the straightened embryo image.

Figure 7: The Pairs hypergraphical model uses expansive local contexts about each portion of the embryo. A: The Pairs hyperedges connect local seam cell nuclei in sets of four and six. B: Degree four hyperedges connect sequential pairs of seam cells while degree six hyperedges connect sequential triplets of pairs. The posterior-most degree four hyperedge and a central degree six hyperedge are bolded.

Figure 8: Hypergraphical geometric features contextualize seam cell assignments. Anatomically inspired geometric features describe the bend and twist of a posture assignment. A: Three pairs of sequential nuclei: red, green, blue. Rectangles represent pair midpoints. The angle Θ in red is used as a degree six feature given six point-to-nucleus assignments. B, C: Degree four hypergraphical features measuring twist angles ϕ and τ. These angles measure posterior to anterior twist pair-to-pair and left-right twist, respectively.

Figure 9: EHGM applied to the sample image depicted in Fig 1-A. A: Two points are selected at the initial branch for TL and TR, respectively. Candidates for the successive pair, V6L and V6R, are queued based on hypergraphical relationships between the established cell identities TL and TR and each hypothesized V6 pair (lower costs are green to higher costs in red). B: The leading hypothesis at branch m=2 given the initial branch pair is chosen. The recursion continues to queue V5 pair choices at branch m=3. Black arrows within branch m specify the ordering of the branch given established cell assignments. Each branch creates a new subproblem of completing the posture given partially assigned identities. C: The tree continuing from the V5 pair hypothesis is fully explored according to the established recursion. D: The next leading V5 hypothesis is initiated upon exhaustion of the subtree formed at panel C. E: Implicit enumeration of the subtree formed at panel B causes the search to progress to the second leading V6 hypothesis.

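aggregate from training data for higher variance patterns:

$$\bar{g}^{(d)}_s = \frac{1}{N} \sum_{L=1}^{N} g^{(d)}_s(\mathcal{X}_L, X_L) \qquad (22)$$

where $\mathcal{X}_L$ and $X_L$ are the correct permutation and observed point set, respectively, for sample $L$. The variance-covariance matrix uses estimated means to estimate variances and covariances among feature measurements in the annotated data:

$$\hat{\Sigma}^{(d)}_{g,ab} \propto \sum_{L=1}^{N} \left( g^{(d)}_a(\mathcal{X}_L, X_L) - \bar{g}^{(d)}_a \right) \left( g^{(d)}_b(\mathcal{X}_L, X_L) - \bar{g}^{(d)}_b \right)$$

The dissimilarity tensors $Z^{(d)} \in \mathbb{R}^{n \times n \times \cdots \times n}$ (with $2d$ modes) use both sets of estimates to compute costs. The Mahalanobis distance is used to describe the scaled distance between the observed attributed hyperedge and an estimated feature description. Let $g^{(d)} = [g^{(d)}_1, \ldots, g^{(d)}_{n_d}]$ collect the feature measurements of a hypothesized assignment; then

$$Z^{(d)}_{l_1 l'_1 l_2 l'_2 \ldots l_d l'_d} = \left( g^{(d)} - \bar{g}^{(d)} \right)^{\top} \left( \hat{\Sigma}^{(d)}_g \right)^{-1} \left( g^{(d)} - \bar{g}^{(d)} \right)$$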