A Context-Aware Delayed Agglomeration Framework for Electron Microscopy Segmentation

Electron Microscopy (EM) image (or volume) segmentation has become increasingly important in recent years as an instrument for connectomics. This paper proposes a novel agglomerative framework for EM segmentation. In particular, given an over-segmented image or volume, we propose a novel framework for accurately clustering regions of the same neuron. Unlike existing agglomerative methods, the proposed context-aware algorithm divides superpixels (over-segmented regions) of different biological entities into different subsets and agglomerates them separately. In addition, this paper describes a "delayed" scheme for agglomerative clustering that postpones some of the merge decisions pertaining to newly formed bodies in order to generate a more confident boundary prediction. We report significant improvements in segmentation accuracy attained by the proposed approach over existing standard methods on 2D and 3D datasets.


Introduction
Extracting the network structure among neurons in the animal brain has gained prominence lately in the field of neuroscience. Rapid advances in imaging technology, in particular Electron Microscopy (EM) techniques, have enabled us to trace neural bodies at an unprecedented level of detail. However, recording at such high resolution (at nanometer scale) generates massive amounts of data, far too large to annotate manually. Automated region labeling or segmentation is considered to be the most viable strategy for generating a dense reconstruction of neural anatomy. Some recent reconstruction efforts yielded impressive results utilizing machine learning/computer vision tools such as image segmentation, and offered valuable biological insights to the neuroscience community [1] [2].
Image segmentation for natural scenes has a long history in the computer vision literature [3] [4] [5]. In recent years, there have also been many fruitful attempts to automatically identify meaningful regions in EM images using segmentation techniques [6] [7] [8] [9] [10] [11]. Most of these studies initially apply a pixelwise classifier (we denote locations on both 2D and 3D EM data as pixels in this paper) to determine whether or not any particular pixel belongs to the cell boundary. The quantified confidence values of the pixelwise classifier are used to produce an initial (over-)segmentation through methods such as Watershed [12].
Different approaches generate the final segmentation in different ways, by merging or clustering the over-segmented bodies into the corresponding neuronal cells. Andres et al. [7] address this problem by searching for the optimal subset of superpixel borders that form closed surfaces. Several studies work with the watershed merge tree in order to identify the regions to be combined for the final segmentation [13] [14]. Some approaches apply agglomerative or hierarchical clustering [8] [10] [15] for this purpose. For anisotropic datasets, where the depth resolution (z-dimension) is coarser than the planar resolution (x, y dimensions), a body segmented on one section overlaps with multiple regions in the adjacent sections. Therefore, a complete 3D reconstruction needs to establish the correct correspondence among segmented regions across multiple planes through an alignment or co-segmentation technique [9] [11]. In this study, we restrict ourselves to segmentation of individual images for anisotropic data; the subsequent alignment is out of the scope of this paper. For both isotropic and anisotropic reconstruction, the outputs of segmentation algorithms need to be corrected afterwards, either manually [2] or by combining them with a manually traced skeletonized representation [1].
Biologically, the interior of a neuron comprises several distinct sub-structures (or sub-categories) such as cytoplasm, mitochondria, vesicles etc. An ideal binary pixel classifier, which assigns a pixel to one of two categories (cell boundary and cell interior), should label all locations within these sub-categories as cell interior. Several past studies [16] [17] [18] [19] recommend increasingly complex pixelwise detector models to attain a binary prediction output. In contrast, some recent works [15] [11] represent the sub-categories (cytoplasm, mitochondria etc.) of the cell body by multiple classes and apply relatively simpler classifiers (in terms of model size, learning time and convenience) to this multiclass classification problem. The results of [15] [11] suggest that using prior domain knowledge to divide a problem into multiple components can achieve high segmentation quality with simpler classifier models requiring less computation.
However, we believe the methods of [15] [11] do not exploit the full benefit of multiclass predictions on EM data. Regardless of the quality of the multiclass pixel classifier output, the algorithms in [15] [11] do not distinguish regions of one sub-structure (e.g., cytoplasm) from those of another (e.g., mitochondria) at the superpixel level. That is, the classification is divided into multiple sub-classes at the pixel level, but the subsequent fusion or superpixel clustering step does not use this additional information to compute the final segmentation. This often leads to sub-optimal performance. For example, Fig 1 shows an EM image (a plane in a 3D volume) and the corresponding pixel predictions for the mitochondria sub-class. By not using this sub-class prediction explicitly in the clustering step, the final output of [15] failed to merge many regions into the correct cell (marked by 'S') and connected some of them to wrong cells (marked by 'M').
This paper introduces a context-aware scheme for combining over-segmented regions by using prior knowledge of sub-classes. We adopt an agglomerative or hierarchical clustering framework [8] [10] because of advantages such as low space and time complexity and the flexibility to tune for over- or under-segmentation. We develop a two-pass agglomeration policy in which the (estimated) cytoplasm regions are grouped together in the first phase and the remaining mitochondria bodies are then absorbed into the cell cytoplasm. In these two stages, the superpixels are agglomerated based on different merge criteria that are defined by different contexts, which is why we call it context-aware agglomeration. Given fairly accurate sub-structure detection, our context-aware approach significantly reduces false split and false merge errors (an example is shown in Fig 1). In addition, this strategy substantially reduces the training data requirement, as well as the predictor model complexity, which in turn significantly increases learning speed. The findings of this study further inspired us to design an interactive training algorithm [20] for a region boundary predictor that does not require exhaustively labeled groundtruth. Generating such an exhaustive annotation is considered to be a bottleneck for neural reconstruction [21].
We also propose a modified version of the hierarchical clustering algorithm to cluster the superpixels in both phases of the context-aware framework. The proposed clustering method emphasizes minimizing under-segmentation errors, since these errors are conventionally costlier to correct than over-segmentation errors [8]. In order to minimize the number of false merges, we 'delay' the merge decisions on certain types of boundaries so that they are resolved at a later time. Compared to the traditional agglomerative scheme of [10], the proposed modification reduces the number of false merges significantly. We also analyze why our agglomeration approach performs better than the Global multicut scheme [7] on the dataset used in our experiments.
The paper is organized as follows. We define the problem in Section 2 and briefly describe the existing clustering segmentation algorithms in Section 2.1. Then we explain the proposed delayed agglomeration scheme in Section 2.2. This delayed strategy is employed in both the stages of our context-aware algorithm discussed in Section 2.3. Section 3 reports our experimental setup, both quantitative and qualitative results and their analyses. We conclude and discuss our findings further in Section 4.

Methods
A formal definition of the problem we are addressing assumes an initial over-segmentation $\mathcal{S} = \{S_1, S_2, \ldots, S_N\}$, comprising N superpixels, of an EM image or volume with M neurites (neuronal regions), where $N \gg M$. Let $L(S)$ be the neurite region that S actually belongs to. Our goal is to correctly assign these N superpixels such that each $S_i$, $i = 1, 2, \ldots, N$, is assigned to its corresponding $L(S_i)$.
We denote a boundary between two superpixels (i.e., over-segmented regions) by a pair of regions $e \triangleq \{S_i, S_j\}$ and the set of all such boundaries by $E$. In a graph representation, each region $S_i$ is considered a node and the boundary or face between two regions is regarded as an edge, a notation we use throughout the paper. Also, let the boundary label map $B: \mathcal{S} \times \mathcal{S} \to \{0, 1\}$ assign 1 to a boundary that actually separates one neurite region from another and 0 to a boundary incorrectly generated by over-segmentation. The problem of correctly merging each $S_i$ into its corresponding $L(S_i)$ is similar to a clustering problem in which the number of clusters cannot be computed a priori. Following [8] [10] [15], we adopt an agglomerative approach for superpixel clustering.
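As an illustration of this graph view, the following minimal sketch builds the edge set E from a label image (or volume) of superpixel ids and derives the boundary label map B from a known superpixel-to-neurite assignment. It assumes 4-connectivity in 2D (6-connectivity in 3D); the function names and the `superpixel_to_neurite` mapping are hypothetical and not part of the paper's code.

```python
import numpy as np


def region_adjacency_edges(labels: np.ndarray) -> set:
    """Return E: all pairs (i, j), i < j, of superpixels sharing a face.

    Neighbours are taken along each array axis, i.e. 4-connectivity in 2D
    and 6-connectivity in 3D (an assumption of this sketch).
    """
    edges = set()
    for axis in range(labels.ndim):
        shifted = np.moveaxis(labels, axis, 0)
        a = shifted[:-1].ravel()  # each pixel ...
        b = shifted[1:].ravel()   # ... and its successor along this axis
        differ = a != b
        lo = np.minimum(a[differ], b[differ])
        hi = np.maximum(a[differ], b[differ])
        edges.update(zip(lo.tolist(), hi.tolist()))
    return edges


def boundary_label_map(edges, superpixel_to_neurite: dict) -> dict:
    """B(e) = 1 for a true boundary between neurites, 0 for a false one."""
    return {
        (i, j): int(superpixel_to_neurite[i] != superpixel_to_neurite[j])
        for i, j in edges
    }
```

For example, on the 3 × 3 label image `[[1, 1, 2], [1, 3, 2], [3, 3, 2]]` the edge set is `{(1, 2), (1, 3), (2, 3)}`; if superpixels 1 and 3 belong to the same neurite, B marks only the two edges touching superpixel 2 as true boundaries.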
In our context-aware scheme, the set of superpixels is divided into two subsets: 1) the set $S_c$ of potential cytoplasm superpixels, and 2) the set $S_m$ of potential mitochondria superpixels. The set of cytoplasm superpixels is clustered first with the proposed delayed agglomeration algorithm. Agglomeration of the mitochondria superpixels is also performed by the proposed delayed method, but with a different merge criterion. In order to help the reader appreciate the novelty of the proposed approach, we introduce the prior studies on agglomerative clustering for EM segmentation [8] [10] [15] in Section 2.1. Afterwards, Sections 2.2 and 2.3 discuss the delayed agglomeration and the context-aware framework, respectively.

Prior Works on Agglomerative Clustering for EM Segmentation
Several existing EM segmentation approaches [8] [10] [15] tackle the problem of superpixel clustering by agglomerative hierarchical clustering, as described in Table 1 Algorithm 1. These methods assume a superpixel boundary estimator $h: \mathcal{S} \times \mathcal{S} \to \mathbb{R}$ that assigns real-valued confidences to all edges in $E$. This boundary estimator may represent the real-valued prediction of a classifier distinguishing true boundaries from false ones [10], compute the mean value of boundary pixel probabilities [8], or return the overlap percentage between the borders of two adjacent superpixels. The value $h(\{S_i, S_j\}) \in [0, 1]$ indicates how confident the estimator is about the existence of a true boundary between $S_i$ and $S_j$: a large $h(\{S_i, S_j\})$ implies the estimator is very confident that the boundary $\{S_i, S_j\}$ is correct, while a small value implies the boundary was probably generated as an artifact of over-segmentation and is therefore false. Given such a function h, the hierarchical clustering algorithm iteratively dissolves boundaries in increasing order of confidence $h(e)$ (Line 1 in Table 1 Algorithm 1) until a stopping criterion is satisfied, e.g., $h(e) > \delta$, where $\delta$ is a predefined threshold. After each merge, it updates the neighborhood structure of the merged superpixel, i.e., the neighbors of the absorbed region become neighbors of the newly merged region.
Each time a superpixel border is dissolved in standard agglomerative clustering, it modifies the characteristic representations of the pixels within the superpixels and on the boundary. This requires the confidences $h(e)$ on these boundaries to be recomputed (Line 1 in Table 1 Algorithm 1). Edges for which $h(e)$ decreases due to a merge receive a higher priority to be dissolved than they had before. The proposed delayed agglomeration strategy modifies this step and postpones merging these edges until later.
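The standard scheme can be sketched as follows. This is not the authors' implementation: it assumes the mean-boundary-probability estimator mentioned above, so each edge carries the (sum, count) of the probability values on its face and $h(e)$ = sum / count; when a region is absorbed, its faces fuse with those of the absorbing region and h is recomputed. Stale heap entries are skipped lazily instead of being deleted. The function name `agglomerate` is hypothetical.

```python
import heapq


def agglomerate(boundary_stats, delta):
    """Standard agglomerative clustering (illustrative sketch of Algorithm 1).

    boundary_stats: dict (i, j) -> (sum, count) of boundary-probability values,
    so h(e) = sum / count. Returns a dict region -> final cluster id.
    """
    stats = {tuple(sorted(e)): v for e, v in boundary_stats.items()}
    neighbors = {}
    for i, j in stats:
        neighbors.setdefault(i, set()).add(j)
        neighbors.setdefault(j, set()).add(i)
    cluster = {r: r for r in neighbors}

    def h(e):
        s, c = stats[e]
        return s / c

    heap = [(h(e), e) for e in stats]
    heapq.heapify(heap)
    while heap:
        conf, e = heapq.heappop(heap)
        if e not in stats or conf != h(e):
            continue  # stale entry: edge removed or re-scored since the push
        if conf > delta:
            break  # stopping criterion h(e) > delta
        ri, rj = e  # absorb R_j into R_i
        del stats[e]
        neighbors[ri].discard(rj)
        for rb in neighbors.pop(rj) - {ri}:
            old, new = tuple(sorted((rj, rb))), tuple(sorted((ri, rb)))
            s, c = stats.pop(old)
            s2, c2 = stats.get(new, (0.0, 0))
            stats[new] = (s + s2, c + c2)  # the two faces fuse into one
            neighbors[rb].discard(rj)
            neighbors[rb].add(ri)
            neighbors[ri].add(rb)
            heapq.heappush(heap, (h(new), new))  # recomputed confidence
        for r, p in cluster.items():
            if p == rj:
                cluster[r] = ri
    return cluster
```

With edges `{(1, 2): (0.05, 1), (1, 3): (0.3, 1), (2, 3): (0.2, 1), (3, 4): (0.26, 1), (1, 4): (0.9, 1)}` and `delta = 0.28`, merging regions 1 and 2 lowers h on the fused face {1, 3} from 0.3 to 0.25, so that face is dissolved immediately and the call returns `{1: 1, 2: 1, 3: 1, 4: 4}`.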

Proposed Delayed Agglomerative Clustering
Our adaptation begins with the boundary of lowest estimator confidence and repeatedly dissolves edges in ascending order of $h(e)$. Recall that $h(e)$ may measure the prediction of a superpixel boundary classifier, the mean probability value on boundary pixels, or the fraction of overlap between the borders of two superpixels. After two regions have been joined, the boundaries of the combined region are updated and the estimator function h is applied to recompute the new confidences. Edges for which $h(e)$ decreases are set aside to be considered at a later stage. They are reexamined after all the borders initially generated by the over-segmentation process have been checked.
This method is described in Table 2 Algorithm 2. After region $R_j$ is absorbed into $R_i$, we do not immediately consider all the new boundaries $\{R_i, R_b\}$ between the recently merged $R_i$ and its updated neighbors $R_b$. We maintain a set of edges W and insert the new edge $\{R_i, R_b\}$ into W, with its updated confidence, only if $h(\{R_i, R_b\})$ has not decreased (Table 2 Algorithm 2). The faces for which $h(\{R_i, R_b\})$ decreases from its previous value are kept aside until no members are left in W and are then reconsidered if their modified confidence is less than the agglomeration threshold (Line 2 in Table 2 Algorithm 2). Once all $e \in W$ have been considered for merging and W is empty, these boundaries repopulate W (Line 2 in Table 2 Algorithm 2) and renew the agglomeration process, which continues until there exists no e such that $h(e) \leq \delta$.
Effectively, the proposed strategy 'delays' the merging of new edges $\{R_i, R_b\}$ resulting from a merge: either because $h(\{R_i, R_b\})$ increases, which moves the edge further back in W, or because the edge is set aside when $h(\{R_i, R_b\})$ decreases. To avoid propagating wrong decisions made on smaller superpixels to larger ones, this design postpones the merge decisions on newly formed bodies until later. Our analyses support the conclusion that deferring decisions on these edges significantly reduces false merges during agglomeration.

Table 1. Algorithm 1 (standard agglomerative clustering). Input: $S_1, S_2, \ldots, S_N$ and confidence function h. Line 4: merge $R_j$ into $R_i$ and update E. doi:10.1371/journal.pone.0125825.t001
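The delayed policy can be sketched on top of the same (sum, count) edge representation, again assuming a mean-boundary-probability estimator. The sketch makes one interpretive assumption that the text leaves open: the "previous value" of a fused edge $\{R_i, R_b\}$ is its confidence before the merge, or that of $\{R_j, R_b\}$ if $R_i$ and $R_b$ were not neighbors yet. Names such as `delayed_agglomerate` are hypothetical; this is not the NeuroProof implementation.

```python
import heapq


def delayed_agglomerate(boundary_stats, delta):
    """Delayed agglomeration (illustrative sketch in the spirit of Algorithm 2).

    boundary_stats: dict (i, j) -> (sum, count); h(e) = sum / count.
    Returns a dict region -> final cluster id.
    """
    stats = {tuple(sorted(e)): v for e, v in boundary_stats.items()}
    neighbors = {}
    for i, j in stats:
        neighbors.setdefault(i, set()).add(j)
        neighbors.setdefault(j, set()).add(i)
    cluster = {r: r for r in neighbors}

    def h(e):
        s, c = stats[e]
        return s / c

    W = [(h(e), e) for e in stats]  # priority queue W
    heapq.heapify(W)
    delayed = set()  # edges set aside because their confidence dropped
    while True:
        while W:
            conf, e = heapq.heappop(W)
            if e not in stats or e in delayed or conf != h(e):
                continue  # stale or postponed entry
            if conf > delta:
                heapq.heappush(W, (conf, e))  # keep it for a later round
                break
            ri, rj = e  # absorb R_j into R_i
            del stats[e]
            neighbors[ri].discard(rj)
            for rb in neighbors.pop(rj) - {ri}:
                old, new = tuple(sorted((rj, rb))), tuple(sorted((ri, rb)))
                # Assumed "previous value" of the fused edge (see lead-in).
                prev = h(new) if new in stats else h(old)
                s, c = stats.pop(old)
                s2, c2 = stats.get(new, (0.0, 0))
                stats[new] = (s + s2, c + c2)
                delayed.discard(old)
                neighbors[rb].discard(rj)
                neighbors[rb].add(ri)
                neighbors[ri].add(rb)
                if h(new) < prev:
                    delayed.add(new)  # postpone the decision on this edge
                else:
                    delayed.discard(new)
                    heapq.heappush(W, (h(new), new))
            for r, p in cluster.items():
                if p == rj:
                    cluster[r] = ri
        # No mergeable edge left in W: repopulate it with postponed edges.
        pending = [e for e in delayed if e in stats and h(e) <= delta]
        if not pending:
            return cluster
        for e in pending:
            delayed.discard(e)
            heapq.heappush(W, (h(e), e))
```

On the edges `{(1, 2): (0.05, 1), (1, 3): (0.3, 1), (2, 3): (0.2, 1), (3, 4): (0.26, 1), (1, 4): (0.9, 1)}` with `delta = 0.28`, the face {1, 3} drops to 0.25 after regions 1 and 2 merge and is set aside; region 4 is then absorbed into region 3, which raises the fused face to about 0.47, so `delayed_agglomerate(edges, 0.28)` returns `{1: 1, 2: 1, 3: 3, 4: 3}`. Dissolving edges immediately in ascending order would instead have joined region 3 to {1, 2}.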

Time Complexity.
Asymptotically, the worst-case running time of the delayed algorithm remains the same as that of traditional agglomerative clustering. Instead of adding the adjacent boundaries back to the priority queue, the delayed algorithm stores them in a separate list. Later, building a queue from this list requires $O(n_1)$ time, where the length $n_1$ of the new list is smaller than the length n of the original one (which contains all edges): $n > n_1$.
Our implementation is tuned to reduce the running time of delayed agglomeration. Notice that a subset of adjacent boundaries is not pushed back or updated in the queue (Line 13 of Table 2 Algorithm 2). We can also apply a simple trick to avoid updates at each merge altogether: instead of increasing the key of edges with increased h value (Line 11 of Table 2 Algorithm 2), we can postpone the check and increase the key only when the edge becomes a candidate for merging (Line 7 of Table 2 Algorithm 2) or when it is being considered for insertion into W (Line 6). Thus, we can reduce the computation by $O(dn \log n)$, where d is the degree of a superpixel $S \in \mathcal{S}$ and n is the queue size.
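The deferred increase-key trick relies on one property: after insertion, an edge's key can only grow. Under that assumption, a stale (too small) key never hides the true minimum, so the real key only needs to be fetched when an entry reaches the top of the heap. A minimal sketch follows; the class `LazyIncreaseQueue` and its `current_key` callback are hypothetical names.

```python
import heapq


class LazyIncreaseQueue:
    """Min-priority queue that defers increase-key until an entry is popped.

    Assumes keys never decrease after insertion (edges whose h decreased are
    handled separately, e.g. set aside in a delayed list).
    """

    def __init__(self, current_key):
        self._heap = []
        self._key = current_key  # callable: item -> up-to-date key

    def push(self, item):
        heapq.heappush(self._heap, (self._key(item), item))

    def pop_min(self):
        while self._heap:
            stored, item = heapq.heappop(self._heap)
            fresh = self._key(item)
            if fresh > stored:
                # Key grew since insertion: perform the deferred increase now.
                heapq.heappush(self._heap, (fresh, item))
                continue
            return fresh, item
        raise IndexError("pop from empty queue")
```

For example, if edges `a`, `b`, `c` are pushed with keys 0.1, 0.2 and 0.3 and the key of `a` later rises to 0.5 without touching the heap, `pop_min()` still returns `(0.2, 'b')`, then `(0.3, 'c')`, then `(0.5, 'a')`. Each increase costs one extra push at pop time instead of one heap update per affected edge at every merge.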

Proposed Context-aware Segmentation
The proposed context-aware agglomeration is composed of two phases. We separate the set $S_m$ of potential mitochondria superpixels from the set $S_c$ of potential cytoplasm superpixels, assuming the existence of an effective mitochondria superpixel detector (e.g., [22]). The regions in $S_c$ are agglomerated first by the proposed delayed policy. Motivated by [6] [10] [15], a Random Forest (RF) [23] classifier $h_c$ is trained to act as the boundary predictor function for clustering the set $S_c$ of cytoplasm superpixels. During $h_c$ training, mitochondria-cytoplasm borders are treated the same way as cell membranes.
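The separation and the training targets for $h_c$ can be sketched as follows. The partition rule (mean mitochondria probability above a threshold) is the one described in the Results section; the threshold value 0.5 and all function names are hypothetical. Following the text, $h_c$ is trained on cytoplasm-cytoplasm boundaries and on mitochondria-cytoplasm borders, with the latter labeled like cell membranes; mitochondria-mitochondria edges are simply skipped in this sketch.

```python
import numpy as np


def split_by_context(labels, mito_prob, threshold=0.5):
    """Partition superpixel ids into (S_m, S_c) by mean mitochondria probability.

    labels: non-negative integer superpixel ids per pixel.
    threshold: hypothetical value; the paper only says "a certain threshold".
    """
    flat = labels.ravel()
    sums = np.bincount(flat, weights=mito_prob.ravel())
    counts = np.bincount(flat)
    ids = np.unique(flat)
    means = sums[ids] / counts[ids]
    s_m = {int(i) for i, m in zip(ids, means) if m > threshold}
    s_c = {int(i) for i in ids} - s_m
    return s_m, s_c


def hc_training_labels(edges, s_m, neurite_of):
    """Targets for training h_c (1 = keep the boundary, 0 = dissolve it)."""
    targets = {}
    for a, b in edges:
        if a in s_m and b in s_m:
            continue  # not part of the h_c training set in this sketch
        if a in s_m or b in s_m:
            targets[(a, b)] = 1  # mitochondria-cytoplasm border: like membrane
        else:
            targets[(a, b)] = int(neurite_of[a] != neurite_of[b])
    return targets
```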
In the second step, the mitochondria-cytoplasm edges are merged with the same delayed scheme explained in Section 2.2, but with a different estimator function $h_m$. In order to absorb mitochondria into the corresponding cells, we apply the delayed agglomeration algorithm with an estimator $h_m$ based on the overlap ratio $\rho(S_m, S_c)$ between the two types of regions. In effect, the mitochondria superpixels are combined with the cytoplasm superpixels in descending order of the overlap ratio between these two types of regions. That is, a mitochondria superpixel is merged into the adjacent cytoplasm region with the largest overlap between their boundaries. The combined cytoplasm-mitochondria superpixel created by such a merge then identifies the next mitochondria superpixel with the largest overlap to absorb in the next step. We show snapshots of this process at different values of $\rho(S_m, S_c)$ in Fig 2. It is worth noting that, for 3D segmentation, the overlap is computed across the many different planes on which the two regions are neighbors.
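The greedy absorption order can be illustrated with a small sketch. It assumes, as a hypothetical definition, that $\rho(S_m, S_c)$ is the length (or area, in 3D) of the face shared by a mitochondrion and a cytoplasm region divided by the mitochondrion's total boundary, and that absorption stops below a hypothetical minimum ratio `rho_min`. The cytoplasm ids would be the clusters produced by the first phase. Because a mitochondrion joins its host region, the contact of the combined region with a neighboring mitochondrion includes the face between the two mitochondria, which is how the combined region "identifies the next mitochondria superpixel".

```python
def absorb_mitochondria(contact, s_m, s_c, rho_min=0.5):
    """Greedily absorb mitochondria into cytoplasm regions by overlap ratio.

    contact: dict (a, b) -> shared boundary size, one entry per unordered pair.
    rho_min: hypothetical stopping ratio. Returns {mitochondrion: host cytoplasm}.
    """
    shared, perim = {}, {}
    for (a, b), n in contact.items():
        shared[frozenset((a, b))] = n
        perim[a] = perim.get(a, 0) + n
        perim[b] = perim.get(b, 0) + n
    host = {c: c for c in s_c}  # region -> cytoplasm region it now belongs to
    remaining = set(s_m)
    while remaining:
        best = None
        for m in remaining:
            per_root = {}  # shared boundary of m with each combined region
            for pair, n in shared.items():
                if m in pair:
                    (other,) = pair - {m}
                    if other in host:
                        root = host[other]
                        per_root[root] = per_root.get(root, 0) + n
            for root, n in per_root.items():
                rho = n / perim[m]  # assumed definition of rho(S_m, S_c)
                if best is None or rho > best[0]:
                    best = (rho, m, root)
        if best is None or best[0] < rho_min:
            break  # no remaining mitochondrion overlaps enough
        _, m, root = best
        host[m] = root  # m becomes part of the combined region
        remaining.discard(m)
    return {m: host[m] for m in s_m if m in host}
```

With contacts `{(1, 10): 6, (10, 11): 4, (1, 11): 1, (2, 11): 2}`, mitochondrion 10 is absorbed into cytoplasm region 1 first (ratio 0.6); the combined region then shares 5 of the 7 boundary units of mitochondrion 11, so 11 also joins region 1, although on its own it overlapped region 1 by only 1/7.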

Results
We have applied the proposed method to EM images of two different modalities: isotropic Focused Ion Beam Scanning Electron Microscope (FIBSEM) data and anisotropic serial section Transmission Electron Microscopy (ssTEM) data. For both types of input data, the image (volume for the isotropic data) is first over-segmented for the agglomeration to be applied on. In the following sections, we explain our over-segmentation process and the error measures used to evaluate segmentation performance before reporting the results on FIBSEM and ssTEM data in Sections 3.3 and 3.4 respectively.

Over-segmentation and training
We learn a classifier to assign each individual pixel into multiple categories, such as cell boundary, cytoplasm, mitochondria and mitochondria boundary, using the interactive tool Ilastik [24]. Our pixelwise detector is a Random Forest (RF) classifier [23] trained on a few sparse samples from the dataset. The locations with lowest pixelwise cell boundary prediction are utilized as markers for the Watershed algorithm [12] to produce an over-segmentation of the image/volume. Unless otherwise specified, the same pixel prediction and watershed regions are provided as input to all (competing) methods.
The set $S_m$ of probable mitochondria superpixels is populated with all regions whose mean mitochondria probability (estimated by our pixelwise RF classifier trained in Ilastik) exceeds a certain threshold. The remaining superpixels constitute the set $S_c$ of possible cytoplasm regions. The training set for the superpixel boundary classifier $h_c$ consists of all boundaries among members of $S_c$ as well as the mitochondria-cytoplasm borders. Similar to [6] [15], each superpixel edge is represented by statistical properties of the multiclass probabilities estimated by Ilastik. These properties include the mean, standard deviation and 4 quartiles of the predictions at the data locations on the boundary and in each of the two regions it separates, as well as the differences between these region statistics. All of these features can be updated in constant time after a merge, a property that substantially improves the efficiency of the segmentation algorithm. The code and an example dataset are publicly available at https://github.com/janelia-flyem/NeuroProof.git.
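One way to obtain such constant-time updates is to keep mergeable sufficient statistics: counts, sums and sums of squares give the mean and standard deviation exactly, and a fixed-bin histogram gives approximate quantiles. The sketch below is an assumption about how this can be done, not a description of NeuroProof's internals; the bin count, the quantile levels and the names `ProbStats` and `edge_features` are hypothetical, and a single probability channel is shown (one such object per class would cover the multiclass case).

```python
import numpy as np

NBINS = 64  # hypothetical histogram resolution for probabilities in [0, 1]


class ProbStats:
    """Mergeable summary of probability values on a region or a boundary."""

    def __init__(self, values=()):
        v = np.asarray(values, dtype=float)
        self.n = v.size
        self.s = v.sum()
        self.ss = (v * v).sum()
        self.hist = np.histogram(v, bins=NBINS, range=(0.0, 1.0))[0]

    def merge(self, other):
        """Absorb `other`; the cost depends on NBINS, not on region size."""
        self.n += other.n
        self.s += other.s
        self.ss += other.ss
        self.hist = self.hist + other.hist
        return self

    def features(self, levels=(0.25, 0.5, 0.75, 1.0)):
        """Mean, standard deviation and approximate quantiles (assumed levels)."""
        mean = self.s / self.n
        std = np.sqrt(max(self.ss / self.n - mean * mean, 0.0))
        cdf = np.cumsum(self.hist) / self.n
        upper_edges = np.linspace(0.0, 1.0, NBINS + 1)[1:]
        # Upper edge of the first bin whose cumulative fraction reaches q.
        qs = [upper_edges[np.searchsorted(cdf, q)] for q in levels]
        return np.array([mean, std, *qs])


def edge_features(boundary, region_a, region_b):
    """Boundary stats, both region stats and their absolute difference."""
    fb, fa, fc = boundary.features(), region_a.features(), region_b.features()
    return np.concatenate([fb, fa, fc, np.abs(fa - fc)])
```

Merging `ProbStats([0.1, 0.2])` with `ProbStats([0.3, 0.4])` yields the same features as building `ProbStats([0.1, 0.2, 0.3, 0.4])` from scratch, which is what allows the features of a merged region or fused face to be updated without revisiting its pixels.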

Segmentation error measures
We report both types of segmentation error, namely under- and over-segmentation, separately because one of them (under-segmentation) is costlier to correct than the other. Split versions of the variation of information (VI) [25] and the Rand Error (RE) [10] were selected to evaluate segmentation errors. Given a groundtruth (GT), $GT = \{g_1, \ldots, g_M\}$, and a segmentation (SG), $SG = \{r_1, \ldots, r_P\}$, we compute the over-segmentation (OE) and under-segmentation (UE) errors by splitting the terms in VI and RE. For split-VI, the over- and under-segmentation errors are quantified as follows:

$$\mathrm{VI}_{OE} = -\sum_{i=1}^{M}\sum_{j=1}^{P} \frac{|g_i \cap r_j|}{Z} \log \frac{|g_i \cap r_j|}{|g_i|} \qquad (1)$$

$$\mathrm{VI}_{UE} = -\sum_{i=1}^{M}\sum_{j=1}^{P} \frac{|g_i \cap r_j|}{Z} \log \frac{|g_i \cap r_j|}{|r_j|} \qquad (2)$$

In these equations, $|\cdot|$ denotes size, $\cap$ denotes the intersection between two regions (terms with an empty intersection are omitted), and Z is a normalizing constant. From an information-theoretic perspective, these two terms are the conditional entropies of SG given GT and of GT given SG, respectively. We also quantify segmentation error by the average percentage ($\times 10^{-5}$) of pairs of voxels falsely split and merged by each method. Formally, the over-segmentation (OE) and under-segmentation (UE) errors are computed as follows.

$$\mathrm{RE}_{OE} = \%\ \text{pixel pairs in the same cluster in GT but in different clusters in SG} \qquad (3)$$
$$\mathrm{RE}_{UE} = \%\ \text{pixel pairs in the same cluster in SG but in different clusters in GT}$$
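For concreteness, both split measures can be computed from the contingency counts $|g_i \cap r_j|$ of two label arrays. This sketch assumes the natural logarithm and Z equal to the total number of pixels for split-VI, and reports split-RE as plain percentages of all unordered pixel pairs (no $\times 10^{-5}$ rescaling). It uses the identity that pairs in the same GT cluster but different SG clusters equal all same-GT pairs minus the pairs that share both labels. The function names are hypothetical.

```python
import numpy as np


def contingency(gt, sg):
    """Co-occurring label pairs (g_i, r_j) and their counts |g_i ∩ r_j|."""
    pairs, counts = np.unique(
        np.stack([gt.ravel(), sg.ravel()]), axis=1, return_counts=True
    )
    return pairs, counts.astype(float)


def split_vi(gt, sg):
    """(VI_OE, VI_UE) = (H(SG | GT), H(GT | SG)), natural log, Z = #pixels."""
    pairs, nij = contingency(gt, sg)
    z = nij.sum()
    g_size = dict(zip(*np.unique(gt, return_counts=True)))
    r_size = dict(zip(*np.unique(sg, return_counts=True)))
    gi = np.array([g_size[g] for g in pairs[0]], dtype=float)
    rj = np.array([r_size[r] for r in pairs[1]], dtype=float)
    oe = -(nij / z * np.log(nij / gi)).sum()
    ue = -(nij / z * np.log(nij / rj)).sum()
    return oe, ue


def n_pairs(x):
    """Number of unordered pairs among x items (elementwise)."""
    return x * (x - 1) / 2.0


def split_rand_error(gt, sg):
    """(RE_OE, RE_UE) as percentages of all unordered pixel pairs."""
    _, nij = contingency(gt, sg)
    same_both = n_pairs(nij).sum()
    same_gt = n_pairs(np.unique(gt, return_counts=True)[1].astype(float)).sum()
    same_sg = n_pairs(np.unique(sg, return_counts=True)[1].astype(float)).sum()
    total = n_pairs(nij.sum())
    return (100 * (same_gt - same_both) / total,
            100 * (same_sg - same_both) / total)
```

Splitting one of two groundtruth regions in half (`gt = [0, 0, 1, 1]`, `sg = [0, 1, 2, 2]`) gives a split-VI of $(0.5 \ln 2, 0)$ and a split-RE of $(100/6, 0)$: only over-segmentation error is reported, as expected.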

Segmentation Performances-FIBSEM data
Dataset: The first set of experiments was conducted on isotropic datasets from the fruit fly visual system imaged at 10 nm isotropic resolution using FIBSEM technology. This data is segmented as a volume (i.e., 3D segmentation), and both the voxelwise multiclass predictor and the supervoxel boundary classifier are learned on one $250^3$ volume and applied to two $520^3$ test volumes.
Competing methods: We have compared the following algorithms in this study: 1) LASH: Standard agglomeration with an RF supervoxel classifier learned based on the iterative procedure of [10]. 2) LASH-D: LASH classifier with delayed agglomeration (proposed extension). 3) GALA [15]: an agglomerative method with repetitive learning phases like LASH, except it accumulates the training sets of multiple phases. 4) CADA-F: Proposed two stage delayed agglomeration with standard RF learned using training set accumulation similar to GALA. 5) CADA-L: Proposed delayed agglomeration with a depth-limited RF (depth = 20) learned without training set accumulation. 6) Global Multicut: the optimization framework for finding a closed-surface segmentation proposed in [7]. For [7], the boundary confidences were generated by the CADA-L predictor.
Performance evaluation: In order to compare different supervoxel clustering schemes, we trained (on one $250^3$ volume) and segmented two $520^3$ volumes 5 times and averaged their scores. We plot the average $\mathrm{VI}_{UE}$ and $\mathrm{VI}_{OE}$ on the x- and y-axes, respectively, in the plots in the left column of Fig 3. As the plots show, both the delayed agglomeration and the two-phase segmentation process attained significant improvements over past methods: compare the performance of LASH (red +) with LASH-D (black x) and that of GALA (cyan *) with the CADA variants (green square and blue circle). Compared to the rest of the techniques, the two variants of the proposed method, namely CADA-L and CADA-F, appear to achieve the most favorable segmentations by reducing over-segmentation steeply without increasing the number of false merges much. Among the agglomerative approaches, the delayed version reduces the time needed for segmentation by a factor of approximately 5.
It is also worth mentioning that, in a two-stage segmentation scheme, the performance of a depth-limited RF (CADA-L, green square) learned without accumulating the training set over multiple passes is very similar to that of the standard RF (CADA-F, blue circle) trained over cumulative learning passes. Training the full-depth RF (CADA-F) with multiple passes took several hours, whereas training a depth-limited RF in a single iteration (CADA-L) required 5 minutes.

Fig 4 shows three sample planes from test volume 1. In Fig 5, we show example outputs of the methods LASH-D, GALA [15], Global multicut [7] and CADA-L on these planes. The three columns correspond to the three planes, and each row presents the outputs of one of the aforementioned methods. The segmentation labeling is overlaid with artificial (randomly selected) colors. For all methods except the proposed CADA-L, we selected the parameter that results in the lowest false merge (under-segmentation) error with a false split (over-segmentation) error below 0.7; for CADA-L, we selected the lowest over-segmentation (error value approx. 0.56). The results are largely consistent with the quantitative ones. All three competing methods, especially Global Multicut, leave many false boundaries intact. The false splits are not limited to cytoplasm-mitochondria borders; both Global and GALA over-segmented some cytoplasm regions as well. By separating these two sub-classes within cell bodies, the proposed method CADA-L was able to eliminate the false merges between them.

Runtime comparison: In Table 3, we report the running times of the context-oblivious standard agglomerative clustering used in LASH [10]; the context-oblivious delayed agglomerative clustering used in LASH-D and GALA [15]; the context-aware delayed agglomeration used in CADA-L and CADA-F; and the Global Multicut [7] method. Both the standard and delayed agglomeration were executed up to the same threshold δ = 0.2. The context-aware method executes two phases of agglomeration, which is why CADA required more time than LASH-D. The Global multicut algorithm relies on the solution of an optimization problem (which requires optimization packages such as CPLEX or Gurobi) in order to find the edges to merge for producing the final segmentation.
All the algorithms except CADA-L and Global Multicut perform standard agglomeration multiple times (we repeated it 5 times) in order to obtain extensive training sets for superpixel boundary learning. Both CADA-L and the Global method used the same classifier, learned from the initial set of boundaries that existed in the over-segmented data (without training set augmentation).
In the following subsections, we analyze why the proposed strategies improve the segmentation performance over the existing approaches.
Context-aware vs Context-oblivious agglomeration.
It is perhaps intuitive that traditional context-oblivious agglomeration results in a higher degree of over-segmentation than the context-aware method. The mitochondria-cytoplasm borders indeed have strong feature similarity with cell membranes, and consequently superpixel boundary predictors cannot distinguish between these two types of borders perfectly. Recall that, for segmentation, we need to dissolve the mitochondria-cytoplasm borders but retain the cell boundaries. In order to substantiate our claim, we trained a superpixel boundary classifier in a context-oblivious fashion (0: false cell membrane, 1: true cell membrane) and computed its confidences on these two types of boundaries. Fig 6 shows the histograms of confidence levels for actual cell boundaries and mitochondria-cytoplasm borders in red and blue, respectively. If we wish to minimize false merges among neurons, we have to stop agglomeration at a low threshold (δ ≤ 0.3). The overlap between the two distributions in the range 0.1 to 0.5 suggests that many of the mitochondria borders will not be merged, leading to over-segmentation.
In addition, due to their dissimilar appearance, the distributions of the same features computed on cytoplasm and on mitochondria are substantially different from each other. Combining these two types of feature distributions impedes the identification of false boundaries between cytoplasm superpixels, such as the one in the lower left corner of the output of GALA in Fig 1. In practice, mitochondria from two different cells could also lead to false merges. Often the mitochondria regions of two cells lie close to the cell membrane, or to other mitochondria regions from neighboring cells, blurring the boundary.

than those of the Global [7] method in clustering cytoplasm regions only (mitochondria not merged). In order to analyze why this happens, we saved the initial confidences (predictor confidences at the beginning of agglomeration) of $h_c(e)$ on all e that were incorrectly split (over-segmented) by the Global method, Note that the agglomerative process correctly reduced the confidences of many false boundaries that received a high score from the predictor at the beginning (high x value but low y value). This refinement is possible through the evolution of the superpixels in the agglomerative process, an advantage the Global method of [7] cannot benefit from. The Global method [7], in comparison, generated many more false positive boundaries as depicted by the If several boundaries within a chain of supervoxel faces receive very high predictor confidences, then, by construction, the Global method tends to retain another boundary e within the same chain with a low $h_c(e)$. Such a tendency may be the reason behind the high concentration of false splits with low $h_c$ within the rectangular region in Fig 8 (left).

Delayed vs standard agglomeration.
In order to illustrate the improved accuracy attained by delayed agglomeration over the standard one, we collected all faces that were incorrectly dissolved by the standard agglomeration algorithm (LASH) and examined their confidences under a delayed scheme (LASH-D) operating at $\delta_c = 0.14$. The confidences (clipped to 0.25) of these 534 edges generated by standard and delayed agglomeration are plotted in Fig 8 (right) as blue squares and red +, respectively. The proposed delayed agglomeration correctly increased the confidences $h_c$ of many of these faces; 41 of them exceeded the threshold of 0.14 (green line) and thus avoided a false merge. In addition to these common supervoxel edges, the standard and delayed algorithms independently generated 163 and 4 additional incorrect merges, respectively.

Segmentation Performances-ssTEM data
This section reports the 2D segmentation results that our method and others produced on a different data modality, namely ssTEM images. These images were part of those generated for the work of [2] and were obtained from the authors. Fifteen 500 × 500 images were used for training both the pixel classifier and the superpixel boundary classifier. For ssTEM data, we follow the techniques and use 2D versions of the features described in Section 3.1. The same pixel predictions and watershed regions are provided as input to all competing methods. The segmentation is performed on each image independently, without connecting regions across planes.  methods. Our method CADA-L produces less over-segmentation than GALA [15] at almost all threshold values. The results of the Global method [7] were too poor to show on this plot (lowest over-segmentation error of 4.11 with an average under-segmentation error of 0.13).
In Fig 10, we show input images and the segmentation results (overlaid on the images) of GALA and our method at the same under-segmentation error. In the qualitative output of Fig 10, GALA seems to struggle to absorb the mitochondria regions despite multiple learning iterations, and even merges two cells on one occasion. While a more accurate mitochondria predictor could potentially reduce the segmentation errors of the proposed method, context-oblivious algorithms such as GALA would still be less effective around mitochondria.

Discussion
We argue that, due to considerable ambiguity in appearance, it is only rational for an EM segmentation algorithm to be context-aware in each of its stages, i.e., at both the pixel and superpixel levels (and in alignment for anisotropic data). The results reported in this paper support our claim that a context-aware clustering of sub-classes such as cytoplasm and mitochondria can improve segmentation accuracy significantly, given fairly accurate sub-class detection. Our examination of both isotropic and anisotropic data suggests that cell structures cannot be meaningfully identified without the mitochondria regions, and that it is non-trivial to combine mitochondria detection with a segmentation that ignores it (e.g., [7]) in order to produce the final segmentation. Our analysis also illustrates how a delayed agglomerative procedure benefits from the intermediate boundary probabilities and significantly improves the efficiency of the segmentation process.
In addition to reducing over- and under-segmentation errors, one of the variants of our classifier, namely CADA-L, can be trained considerably faster than the classifiers of the other methods, because CADA-L requires substantially fewer training examples and no training iterations. A context-oblivious strategy gains significantly (compare LASH-D with GALA in Fig 3) by accumulating the training set over multiple iterations. However, in the context-aware approach, one does not benefit much by accumulating the training set (CADA-F in Fig 3) compared with a classifier trained in a single iteration (CADA-L). One possible explanation is that previous context-oblivious strategies require the extra iterations to mitigate the impact of the noise introduced by mitochondria superpixels. This explanation implies that detecting the sub-classes, and considering them separately as necessary, is perhaps the key to training a boundary classifier accurately and efficiently.
We further investigated this conjecture and developed a semi-supervised active learning algorithm to train the supervoxel boundary classifier with fewer than 20% of the total examples [20]. The requirement of exhaustive labels is a critical bottleneck for automatic EM segmentation, especially for reconstructing larger brain regions or a whole animal brain, where one may anticipate the need to train several different classifiers [21]. The interactive training of both pixels (using Ilastik [24], for example) and superpixel boundaries (using [20]) holds the promise of removing the need for such complete groundtruth and paves the way for scaling up EM reconstruction algorithms.
We have applied our context-aware algorithm to segment 216 FIBSEM volumes of $520^3$ voxels each, at 10 nm isotropic resolution, from the Medulla region of the fly retina. To our knowledge, this is one of the largest reconstruction attempts for this animal. Compared with the result of [15] on two of the $520^3$ blocks, our segmentation resulted in an estimated 30% reduction in subsequent manual correction time. In addition, our segmentation was sufficiently accurate for regions pertaining to post-synaptic densities (PSDs), i.e., the synaptic partners of a cell. During the manual annotation of these PSDs, the output of our segmentation method helped the experts improve their performance [26].
for his support with the visualization software Raveler and Margaret Jefferies for her assistance in writing.