The Filament Sensor for Near Real-Time Detection of Cytoskeletal Fiber Structures

A reliable extraction of filament data from microscopic images is of high interest in the analysis of acto-myosin structures as early morphological markers in mechanically guided differentiation of human mesenchymal stem cells and the understanding of the underlying fiber arrangement processes. In this paper, we propose the filament sensor (FS), a fast and robust processing sequence which detects and records location, orientation, length, and width for each single filament of an image, and thus allows for the above described analysis. The extraction of these features has previously not been possible with existing methods. We evaluate the performance of the proposed FS in terms of accuracy and speed in comparison to three existing methods with respect to their limited output. Further, we provide a benchmark dataset of real cell images along with filaments manually marked by a human expert as well as simulated benchmark images. The FS clearly outperforms existing methods in terms of computational runtime and filament extraction accuracy. The implementation of the FS and the benchmark database are available as open source.


Introduction
In the last decade it has become evident that for cell dynamics the mechanical environment can be as important as traditionally investigated biochemical cues [1,2].Especially striking is the mechanically induced differentiation of human mesenchymal stem cells (hMSCs) cultured on elastic substrates of different stiffness [3].During the early stage of this mechano-guided differentiation process in hMSCs, the structure and polarization of actinmyosin stress fibers play a key role and can be used as an early morphological marker [4] and analogously for myoblast differentiation [5].These stress fibers are approximately linear filaments of varying width and length.
Using fluorescence microscopy, filament arrangements can be visualized and reliable extraction of all filaments' location, length, width and orientation (e.g.relative to the cell's main axis) is essential for further processing and the fundamental understanding of the complex cell and matrix mechanics.In particular, studies of the dynamics of stress fibers where multiple time series of living cells are recorded in parallel over periods of 24-48 hours, demand a fast and reliable fiber detection algorithm.Such experiments are essential towards a more detailed understanding of the role of the acto-myosin cytoskeleton in mechanically induced differentiation [4].

The Filament Sensor and the Ground Truth Database
The above task demands from the filament sensor (FS) to extract I) fast and unsupervised II) robustly III) all filament features: location, length, width and orientation; where II) implies dealing with several specific problems illustrated in Figure 1 IIa) detecting darker lines crossing bright lines, IIb) dealing with image inhomogeneities and IIc) dealing with image blur.The FS proposed in this contribution is specifically designed to meet these challenges.Dealing with image inhomogeneity calls for the application of local image processing tools.Blurring effects will be mitigated by line enhancement through direction sensitive methods.Crossings of lines of varying intensities can be rather successfully detected by what we call line Gaussians which are specific oriented thin masks, cf. Figure 3.After local binarization, finally an adaption of the semilocal line sensor approach to fingerprint analysis [6] is applied to exhaustively extract all filament features.As the FS is modularized, employs local and orientation dependent image analysis methods and outputs the entire filament data, expert knowledge such as detecting fewer filaments in specific low variance areas, say, can be easily incorporated.
In order to assess our method we have devised two ground truth databases.One database comprises filaments in hMSCs of varying image quality, manually labeled by an expert.The other database has been simulated from ground truth filament data providing for a test scenario with completely controlled knowledge.
In order to compare our new method to existing methods (discussed in Section 4) we have to restrict our comprehensive output to the limited output of others.Specifically for eLoG (elongated Laplacian of Gaussian) method [4] as well as for the Hough transformation such limited output is given by angular histograms: accumulated pixel length of filaments per angular orientation.For the constrained inverse diffusion (CID) method [7] we can only compare sets of pixels which is also possible for the eLoG method.
A Java implementation of the FS, the ground truth databases, and a python script for evaluation are available under free open source and open data licenses under the project's web page http://www.stochastik.math.uni-goettingen.de/SFB755_B8.

Related Work
Despite the existence of a tremendous body of techniques for image processing and especially line detection (for an overview e.g.[8,Chapter 4]), currently the methods used for filament identification in cell images are often ad hoc, require partial manual processing and a fair amount of runtime, e.g.[4,9,7].The latter two requirements are particularly undesirable if large numbers of images are to be evaluated, as is typically the case.
Fundamental global methods in this context include the Hough transformation [10] and brightness thresholding via the Otsu method [11].However, variable brightness of cell plasma and filaments demand that such methods must at least be supplemented by local methods like the Laplacian of Gaussian (LoG, e.g.[12,13]), anisotropic diffusion (e.g.[14,15]), or a beamlet approach, e.g.[16].
Many algorithms exist that focus on the analysis of networks of strongly curved microtubuli (as opposed to the properties of single filaments), such as line thinning [17], active contours [18] and the recently proposed constrained inverse diffusion (CID) method [7].Other methods which are geared toward extraction not only of filament pixel position but also of local orientation include the FiberScore algorithm [19], elongated Laplacians of Gaussians (eLoGs) [4] and gradient based methods [20,9].

FS
The eLoG method, like the gradient method aims at detecting not only filament pixels but also their orientations.While filament length and width are not extracted, counting the number of pixels per orientation, these methods yield histograms of cumulated filament length per orientation angle which are then further analyzed [4,9].Line thinning and CID identify only a skeletal filament network structure without filament orientation, length, nor width.Rather, these methods are geared towards detection of thin microfilaments and not of wide stress fibers we are interested in.
The CID method uses a maximum cross correlation coefficient method for templates with different orientations as a preprocessing step.While this yields considerable line enhancement in many cases, in some cases it leads to situations where bright lines sever fainter lines they cross, as illustrated in Figure 3. Certainly, this is undesirable in view of the aforementioned challenge IIa.One could try to account for this by adding cross masks to the list of filters.This, however, increases the number of filters and thereby calculation time considerably; additionally bright lines acquire a halo.
The FiberScore program [19] produces local orientation and centerline images and provides global information on accumulated line length and average width.It produces no line objects as output.It turned out that the methods applied in FiberScore did not yield optimal results for our cell images.The most important obstacle, however, for using FiberScore is that neither the program's nor its framework's source codes are freely available.Although the original developer has very helpfully supported us to make the program run we could not tailor FiberScore to our needs.
An overview over the different types of output of some methods is given in Table 1.

Implementation of the Filament Sensor (FS)
The cells used in this work were acquired from a commercial vendor[21] which anonymized personal data of cell donors.
Our proposed method decomposes into two parts.First, a set of specifically tailored image enhancement and binarization procedures are applied, followed, secondly, by a width aware segment sensor, which extracts filament data from the binary image.Suitable default values detailed below have been determined by expert knowledge on hMSCs images of a database separate from the two databases we have tested on in Section 4.

Parameter Choice and Typical Workflow
The empirically determined default values proposed by the FS yield good results for most images.Otherwise, starting from the default values a local optimum in the parameter space can usually be determined very quickly.Tracing results are fairly weakly dependent on parameter variation in the optimal parameter region and image properties do not vary strongly over individual 24-48 hours time series of images with overall acceptable quality.Therefore, to commonly adjust parameters for a whole batch of images, it is usually sufficient to consider the first and last image only.In this sense, the FS is "semi-automated".

Preprocessing and Binarization
Apart from an adjustable contrast enhancement and a normalization of the image to 256 grey levels, only local methods are employed.The first two preprocessing steps use generic Gaussian and Laplacian filters with adjustable variance and magnitude (with empirically successful default values), respectively, the order of which can be interchanged.
The linear Gaussian.In the third preprocessing step we accumulate intensities along rod kernels which have been used for cross correlations [19].This filter is orientation sensitive and thus very well suited for line enhancement.It is comparable to the eLoG approach but performs much better in terms of calculation time because the filter uses a Gaussian mask which is then restricted to lines of different orientations only and convolved with the image.In our implementation we choose the filter size to be l = 2 • 3σ + 1 pixels (here " • " denotes rounding to the next integer and σ is 3 pixels by default) and we use 2l − 2 lines; one line per pair of opposite boundary pixels of the mask.Because the filter operation is only performed on pixels on a line, it is by a factor of approximately l faster than the eLoG approach which uses square filter masks.The highest response is taken as the new grey value of the pixel.In fact, this filter proves to be very successful at enhancing darker lines crossing bright lines, cf. Figure 3. Local binarization is achieved via a combination of Gaussian weighted adaptive means [22] and global thresholding, where the global threshold is not compared to single pixels but to neighborhood means.
Homogeneous noise cancellation is achieved by one of two filters.The default filter calculates mean brightness along rods as used in the linear Gaussian filter around the white pixels of the binary image and then thresholds the ratio of standard deviation to a function f of the mean µ of these mean brightnesses.The simpler alternative filter thresholds the ratio of Gaussian weighted standard deviation and the same function f of the mean µ of pixel brightnesses in neighborhood of white pixels.The function f is given by Optionally, a filter performing morphological "closing" [8, Chapter 3.3.2]can be applied to the binary image.

A Width Aware Segment Sensor
After preprocessing and binarization, filament data is now extracted from the white pixels using the following algorithm.Denote the loci of the white pixels by W .
1. Every white pixel at locus (x, y) ∈ W is assigned a width, w(x, y) which is the largest radius of a circular mask (cf. Figure 5) in which the ratio of white pixels of the binary image is above an adjustable tolerance (with default value 95 %).
In particular, this gives a range of widths w 1 < . . .< w k attained by pixels in W and it extends the binary image W to a nested sequence of binary images W j = {(x, y) ∈ W : w(x, y) ≥ w j }, j = 1, . . ., k.The filament data set F and the orientation field O are each initialized by the empty set.
2. For every j = k, . . ., 1 in decreasing order, apply the segment sensor to W j as follows.
(a) For each pixel (x, y) ∈ W j the segment sensor probes into each direction (360 by default giving 180 orientations) for the maximal length at which pixels can be found, connected by a straight line to (x, y) in W j as illustrated in Figure 6.The orientation for which the added length of the two opposing directions is largest, is stored, if the added length exceeds an adjustable threshold of minimal filament length (20 pixels by default).Also, the corresponding endpoints are stored along with additional data that uniquely defines all pixels on the corresponding segment.This yields a new data set of segments, on which stored pixels lie.
(b) In the next step, segments are called in the order of their length, long segments first.For every segment, the orientation field is looked up.If less than 30 % of its pixels have a conflicting orientation entry in O, -i.e. the entry in O differs by less than an adjustable minimal tolerance angle (of default value 20 degrees) from the segment's orientation -the segment is accepted as valid.For pixels lying on the segment with width increased by two (in order to avoid duplication), the segment's orientation is stored to O if O does not yet have an entry there.
The segment is then also added to F. If at least 30 % of the pixels on a segment have a conflicting orientation, we have the following cases.
i.If O does not carry a conflicting orientation for any of the endpoints, the segments is discarded.ii.Otherwise, the endpoints with conflicting orientations are iteratively removed from the segment until the remaining segment's endpoints have no longer a conflicting orientation.If the resultant segment length is above 75% of the threshold of minimal filament length, this new segment is added to list of segments and the original one is removed.The new segment is revisited when its length is called.
As lines are blurred due to scattering and as the preprocessing usually enhances line width, the FS tends to find greater line width than a human expert.Taking into account this expert knowledge, the FS returns a filament width reduced by one for filaments with width larger than 1.Such and other adjustments are feasible for the FS because it extracts filaments individually.

Proof of Concept
The performance of the FS is assessed via two ground truth databases and compared against three existing methods for which implementations are available.eLoG (elongated Laplace of Gaussian) where we rely on [4] but use a faster implementation of our own, HT (Hough transformation) following [10], and CID (constrained inverse diffusion) by [7].
Since none of these methods provides for the full filament data, as elaborated in Section 2, comparison based on ground truth is only possible via angular histograms recording accumulated length (for eloG and HT) or via centerline pixel detection (for CID and eLoG) detailed below.

Ground Truth Databases
To obtain ground images of different quality, images of cells that were bleached by long exposure to light, less bleached cells from time series with moderate exposure to light and completely unbleached fixed cells have been chosen.From this pool, 10 images, two of very poor quality (labeled 1 and 2 below), two of poor quality (3 and 4), three of medium quality (5 through 7), and three of good quality (8 through 10) (cf. Figure 1) have been selected and their stress filament structure manually labeled by an expert.
For simulations of another 10 images we take a simple straightforward approach.The filament process is viewed as a random marked Poisson point process (e.g.[23]) where the marks indicate length and orientation of filaments centered at the respective points.Only filaments with their center point contained in an independently generated ellipse are recorded (unbiased sampling) and the cytoplasm background is mimicked by strongly blurring a subset of the filament pixels.We let angular orientation follow an independent mixture of one or two wrapped Gaussians -giving an angular distribution with a realistic mode pattern [24].Grey levels of cell background and of filaments are independently perturbed; the resulting image is blurred by a Gaussian filter, and finally, independent Gaussian white noise is added.
Like the FS, also eLoG and CID allow for visual display of filament pixels.Along with the ground truth we provide such images in the top row of Figure 7 and in the left top display of Figure 10 for a typical real cell image as well as in the top row of Figure 8 and in the right top display of Figure 10 for a typical simulated cell image.

Angular Histograms
As noted before, the two methods eLoG and HT allow for angular histograms (accumulated filament length per angular orientation) as displayed as well for the FS in the bottom two rows of Figures 7 and 8.

Total histogram mass.
Inspecting the ratio of detected total mass of histograms over total mass of ground truth histograms (top row of Figure 9) we observe the following.
1.For real cells, all three methods detect too much mass, the amount of excess mass correlates negatively with image quality.This effect is consistently strongest for HT (the maximum is 72.2-fold mass) and weakest for the FS (between 1.2-and 3-fold mass) while the eLoG method yields results in between (between 3-and 12.9-fold mass).The good performance of the FS is a consequence of the usage of the expert knowledge that blurred lines appear broadened in the binary image.By their very construction this expert knowledge cannot be incorporated by the alternate methods.
2. For simulated cells, HT detects consistently slightly too much mass (between 1.3and 2.8-fold) while eLoG and FS detect consistently too little mass (between 0.54and 1.00-fold).
Normalized histograms are obtained by dividing by total histogram mass, making them comparable to one another, and in particular to ground truth normalized histograms.Figure 7: Images and angular histograms for a hMSC stem cell (black curves from kernel smoothing with a Gaussian of standard deviation σ = 10).Top row: filament segments are marked yellow (correctly identified), red (non-identified true segments), and green (erroneously identified non true segments).
For comparison we use the shortlist implementation [25] of the earth movers (or 1st Wasserstein) distance of histograms [26] with arc length as ground distance.Although an evaluation building on normalized histograms only, is misleading as it ignores excess mass which is unknown in general.For a complete picture with ground truth available, however, from the illustration in the bottom row of Figure 9 we observe that 1. on the average the FS yields the lowest distance, directly followed by eLoG, where  the FS outperforms HT in most of the cases; 2. the eLoG method performs slightly better than the FS on smaller cells with small ground truth mass; those sparse histograms are very sensitive to detection errors, an effect which is damped by the large excess mass the eLoG method produces; In consequence the FS is (on the average) the most robust even in terms of histogram shape only, ignoring excess mass.

Pixel Identification
The CID method does not aim at identifying filament orientations, furthermore, it only returns pixels along thin filament skeletons having width one.For comparison in terms of correctly identified filament pixels we thus define the following procedure.
Let N t denote the number of ground truth pixels for which at least one pixel identified by the method is in a 3 × 3-square around it, N fn the number of other ground truth pixels and N fp the number of pixels detected by the method for which no ground truth pixel is in a 3 × 3-square around it.We define the false negative ratio as R fn = N fn /N where N = N t + N fn and the false positive ratio as R fp = N fp /N .The results are displayed in the bottom two rows of Figure 10.This comparison which can also be performed for eLoG shows that 1. for real cells, in terms of detecting true filament segments, consistently, eLoG performs best, followed by the FS, which is closely followed by CID; this effect seems not to correlate with image quality;  2. for real cells, in terms of non-detection of non-marked filament segments, consistently, CID performs best, closely (note the logarithmic scale) followed by the FS and further followed by eLoG; this effect -which is much stronger than the one precedingcorrelates negatively with image quality and once again illustrates the tendency of the eLoG method toward detecting consistently far too many filament pixels; 3. for simulated cells, similar effects are visible, yet on a much smaller scale, however, in terms of missing true segments, the FS outperforms CID and in terms of overdetection, the FS performs equally less optimal as eLoG, where the absolute rates are very low for all methods.
In summary, CID finds too little pixels and the eLoG finds too many while the FS achieves the balance between detecting too many and too little filament pixels.

Speed
Using an orientational grid of 180 angles, eLoG takes more than 20 minutes per image compared to approximately 20 seconds for the FS and for HT (these runtimes have been observed on a single core 1.73 GHz Intel Celeron with 2 GB ram).In contrast, the CID method required more than 20 hours (running on an AMD Opteron 6140 with 32 cores at 2.6 GHz and 128 GB ram).This comparatively short runtime in connection with the semi-automated nature of its workflow (fast, uninterrupted batch processing) thus makes the FS a viable tool for the task of analyzing the actin cytoskeleton via time series of live cell images.

Discussion
For the task of automated and exhaustive extraction of straight filament structures in images, in particular from live human mesenchymal stem cells (hMSCs), we have developed the Filament Sensor (FS).Reliable extraction of the entire filament data is essential for future analysis, e.g. for detailed statistical studies of the relationship between environment elasticity and early differentiation of stem cells and myoblast differentiation [3,4,5] .To the knowledge of the authors, no method has previously been available for this task.
We have provided for a proof of concept by checking against a database manually marked by a human expert and a simulated database.Moreover, we have compared our FS against three other methods, the output of which, however, encompasses only a small portion of the entire filament data we are interested in.In this comparison, the FS performs best in terms of histogram mass and histogram distances and takes a middle ground between false positives (where, for real cells, eLoG is best and CID is worst) and false negatives (where, for real cells, CID is best and eLoG is worst).
In terms of speed, the FS overwhelmingly outperforms all competitors.While the HT would require additional tracing, eLoGs are slower by a factor of more than 60 and CID by a factor of more than 10 3 (on a stronger machine).Clearly the latter is beyond any acceptable runtime for the processing of thousands of images we are aiming at.
We can thus say that the FS provides an automated fast and robust tool to exhaustively extract complete filament data from hMSCs images.
At the end of this exposition we would like to point out two challenges we leave for future work.
1.As the FS starts finding wide lines first and then proceeds to thinner lines, it will sometimes detect lines of variable width as several pieces with different width.These line fragments could be matched to produce a single long line.
2. A future version of the FS may include slightly curved stress fibers and a correspondingly appropriate matching procedure.We plan to explore the application of curved Gabor filters [27] for detecting curved stress fibers and for coping with fiber crossings.
Recently, Gabor wavelets have proved to be beneficial for a similar application in retinal vessel tracking [28].

Figure 1 :
Figure 1: Images of human mesenchymal stem cells of varying quality in view of filament expression.Scale bars are 50 µm.

Figure 3 :
Figure 3: The cross correlation filter suppresses the faint line running from the upper left to the lower right which is enhanced under line Gaussians.

Figure 4 :
Figure 4: Left: an isotropic two dimensional Gaussian.Middle: its restriction to a line.Right: an elongated Gaussian.

Figure 6 :
Figure 6: Some connectivity lines (left) with detail (right) along which the segment sensor probes.

Figure 8 :
Figure 8: Images and angular histograms for a simulated cell (black curves from kernel smoothing with a Gaussian of standard deviation σ = 10).Top row: filament segments are marked yellow (correctly identified), red (non-identified true segments), and green (erroneously identified non true segments).

Figure 9 :
Figure 9: Top row: total mass found in relation to ground truth.Bottom row: earth movers distances in arc length of normalized histograms to ground truth.Data points corresponding to same methods are connected only for better visualization.
(a) CID for the real cell from Figure 7 (b) CID for the simulated cell from Figure 8 R fn for simulated cells

Figure 10 :
Figure 10: Comparison of R fn (false negative ratios) and R fp (false positive ratios).Data points corresponding to same methods are connected only for better visualization.

Table 1 :
Data collected by various methods.