Probabilistic fluorescence-based synapse detection

Deeper exploration of the brain’s vast synaptic networks will require new tools for high-throughput structural and molecular profiling of the diverse populations of synapses that compose those networks. Fluorescence microscopy (FM) and electron microscopy (EM) offer complementary advantages and disadvantages for single-synapse analysis. FM combines exquisite molecular discrimination capacities with high speed and low cost, but rigorous discrimination between synaptic and non-synaptic fluorescence signals is challenging. In contrast, EM remains the gold standard for reliable identification of a synapse, but offers only limited molecular discrimination and is slow and costly. To develop and test single-synapse image analysis methods, we have used datasets from conjugate array tomography (cAT), which provides voxel-conjugate FM and EM (annotated) images of the same individual synapses. We report a novel unsupervised probabilistic method for detection of synapses from multiplex FM (muxFM) image data, and evaluate this method both by comparison to EM gold standard annotated data and by examining its capacity to reproduce known important features of cortical synapse distributions. The proposed probabilistic model-based synapse detector accepts molecular-morphological synapse models as user queries, and delivers a volumetric map of the probability that each voxel represents part of a synapse. Taking human annotation of cAT EM data as ground truth, we show that our algorithm detects synapses from muxFM data alone as successfully as human annotators seeing only the muxFM data, and accurately reproduces known architectural features of cortical synapse distributions. This approach opens the door to data-driven discovery of new synapse types and their density. We suggest that our probabilistic synapse detector will also be useful for analysis of standard confocal and super-resolution FM images, where EM cross-validation is not practical.


Introduction
Deeper understanding of the basic mechanisms and pathologies of the brain's synaptic networks will require advances in our quantitative understanding of structural, molecular, and functional diversity within the vast populations of individual synapses that define those networks [1] [2] [3] [4]. Regardless of the subject of interest, synapse heterogeneity makes assay at the single-synapse level paramount. Here, we introduce and characterize a novel image analysis method for automated detection and molecular measurement of individual synapses and single-synapse molecular profiling of diverse synapse populations from multiplex fluorescence microscopy (muxFM) image data. The proposed methodology for structural identification and molecular analysis of single synapses at scale will be an enabling step toward deeper experimental analysis of the relationships between synaptic structure, molecules, and function. Reliable, high-throughput methods for large-scale synapse detection will also help to analyze volume images large enough to contain complete neural arbors, and thus to allow discernment of the relationships between detected synapses and their presynaptic and postsynaptic parent neurons [5].
The synapse detection methodology described here is not the first to grapple with the challenges of detecting synapses in immunofluorescence images [6] [7] [8] [9] [10]. The special utility and novelty of this tool partially lies in (1) producing outputs in the form of probability maps, reflecting the limited certainty with which synapses can be detected by most experimental modalities [11], and (2) the superior utility for both interactive and algorithmic exploration which is conferred by the query-based architecture resulting from the unsupervised framework. The probabilistic detection algorithm we introduce has perhaps its closest precedent in probabilistic synapse detectors that were introduced recently for the analysis of Focussed Ion Beam Scanning Electron Microscope (FIBSEM) images [12] [13]. The relationship in particular with [8] will be further discussed later in this paper.
Single-synapse profiling of large and diverse synapse populations poses formidable challenges [14] [15] [16]. Electron microscopy (EM) of appropriately labeled specimens defines the current 'gold standard' for synapse detection: the nanometer resolution of EM is necessary for the unambiguous identification of defining synaptic features such as presynaptic vesicles, synaptic clefts, and postsynaptic densities [17] [18]. Unfortunately, EM data acquisition is technically difficult, slow, and burdened by large data processing and storage requirements, and it offers only limited capacity to discriminate among the hundreds of different synaptic proteins that constitute the synaptic proteome. In contrast, fluorescence microscopy (FM) of tagged specimens is much faster and less expensive, easier to segment for analysis, and offers much greater molecular discrimination power. Unfortunately, the ability of FM to detect and discriminate individual synapses is compromised by resolution limits and the close crowding of synapses in most neural tissue specimens of interest. Robust FM detection of synapses is nonetheless potentially possible by combining measures that extend resolution limits and multiplexing for localization and co-localization of synaptic markers. In designing the algorithm and software reported here, we first relied on images acquired with conjugate array tomography (cAT), which combines the strengths of FM-AT with those of electron microscopic array tomography (EM-AT), allowing both EM and muxFM imaging of individual cortical specimens. Array tomography's ultrathin physical sectioning provides z-axis resolution far beyond the light microscopic diffraction limit, as well as high sensitivity and high lateral resolution, while greatly simplifying voxel-conjugate registration of FM-AT and EM-AT images. FM-AT can moreover multiplex large numbers of synaptic markers by its combination of sequential and spectral label multiplexing. Thus, cAT provides an ideal platform for the development and rigorous design and testing of algorithms aimed at single-synapse molecular analysis and population molecular profiling.
www.nsf.gov); GS; and U.S. National Geospatial Intelligence Agency (HM0177-13-1-0007, HM04761610001, www.nga.mil); GS. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: I have read the journal's policy and the authors of this manuscript have the following competing interests: SJS and KDM have founder's equity interests in Aratome, LLC (Menlo Park, CA), an enterprise that produces array tomography materials and services. SJS and KDM are also listed as inventors on two US patents regarding array tomography methods that have been issued to Stanford University.
The remarkable structural and molecular diversity within mammalian synapse populations challenges our present biological understanding of how to define a synapse [11]. Difficulties also arise from a very broad distribution of synapse size, with the smallest synapses occurring at the highest frequencies. Thus, detection of a synapse inevitably involves setting some minimum-size criterion for any candidate cell-cell contact specialization to qualify as a synapse. For FM, the 'size' metric is typically the intensity of one or more fluorescent synaptic protein tags. The fact that there are clearly non-synaptic 'backgrounds', and that the observed size distributions are log-normal, makes synapse detection highly sensitive to a rather arbitrary minimum-size threshold. This sensitivity in turn makes key results of widespread interest, such as the synaptic density in a region or the presence or absence of a synapse at a given microscopic site, uncomfortably dependent on that same size threshold value. The probabilistic synapse detector proposed here may lead both to relief from such arbitrary threshold parameters and to improvements in our biological understanding of what defines a synapse.
The unsupervised probabilistic synapse detector reported here accepts molecular-morphological synapse models in the form of user queries, and delivers a volumetric map of the probability that each voxel represents part of a synapse. These maps can then be used directly to detect, classify, and map putative synapses, with confidence statistics for each. Taking human annotation of cAT EM data as ground truth, we show that our algorithm detects synapses from muxFM data alone as effectively as human annotators who see only the muxFM data, and can reproduce known architectural features of cortical synapse distributions. The algorithm is validated on the most comprehensive AT datasets currently available. Though we here address only array tomography image data, our probabilistic synapse detector may also be useful for analysis of widely available confocal and super-resolution FM images.

Overview
The proposed algorithm is inspired by biological knowledge of synapse characteristics. Synapses include two major structural components: a presynaptic terminal and a postsynaptic terminal. Detecting synapses using data from immunofluorescence (IF) imaging involves identifying such adjacent presynaptic and postsynaptic antibody markers, as shown in Fig 1, which diagrams the locations of four major excitatory synaptic proteins. Fig 2 is an example of an excitatory synapse with images of presynaptic and postsynaptic antibody markers (synapsin and PSD-95) overlaid upon an EM image. For this example, only two antibody markers are shown for visual simplicity; in practice, any number of presynaptic or postsynaptic antibody markers may be used by the proposed algorithm for synapse detection.
Manual synapse identification involves determining the punctum size and brightness in one channel, and then considering adjacency to similarly defined puncta in other channels. However, without corresponding EM data, detections using only IF data have an associated degree of uncertainty. Thus, we propose a query-based probabilistic synapse detection method that reflects the thought process underlying expert manual synapse detection.
The first step is to distinguish signal from background noise. This calculation encodes the probability that the pixel value represents authentic antigen detection. The second step is to determine whether the foreground pixels correspond to a 2D punctum, since photons emanating from only a single pixel usually reflect noise. Therefore, adjacent 'positive' pixels, which are more likely to reflect a synaptic punctum, are augmented. Third, puncta that span multiple slices have a higher probability of belonging to a synapse than those that do not. To account for this, the probability of a punctum belonging to a synapse is attenuated based on whether the prospective punctum spans multiple slices. The last step in computing the synapse probability map is to evaluate the presence of adjacent presynaptic and postsynaptic puncta by correlating the corresponding IF volumes. This produces a probability map, where the value at each voxel is the probability that it belongs to a synapse. This algorithm provides a general framework for the evaluation of a wide variety of synapse subtypes, user-defined by setting the presynaptic and postsynaptic antibodies and puncta size.
Fig 1. Left: An axon terminal packed with small round vesicles of neurotransmitter (right) is closely apposed to a dendritic spine; at the junction a slightly increased electron density on the presynaptic plasma membrane ('presynaptic active zone') is precisely matched across the approximately 30 nm wide synaptic cleft by a dark extension into the dendritic spine, the 'postsynaptic density.' This synapse is perforated (the slight break in increased density halfway along the synapse). The membranous structure within the spine head is a 'spine apparatus.' Because of a fortunate plane of section, the plasma membrane of this spine is continuous with its parent dendritic shaft (left edge of photo), which contains longitudinally-sectioned microtubules. The scale bar represents 500 nm. Right: Cartoon diagramming the molecular architecture of an excitatory PSD-95-expressing synapse [19]. Basic biological knowledge about synapse structure and protein composition as depicted in this figure is used to inform the proposed query-based probabilistic algorithm. https://doi.org/10.1371/journal.pcbi.1005493.g001
The following sections describe in detail each step in the process, as diagrammed in Fig 3. Before that, let us relate our proposed approach to the current state-of-the-art method [8], which inspired this work. The method in [8] requires manually annotating a large number of excitatory synapses using the EM data, and then using this as labeled data for training (supervised training). EM data allows the user to differentiate between symmetric and asymmetric synapses, but does not allow for subtype identification (limited labels/supervision). Thus, the support vector machine (SVM) classifier used in [8] is trained with synapses containing the marker for PSD-95, but does not take into account synapses without the PSD-95 marker (limited classes in the supervision). In contrast, the approach proposed here and detailed below is unsupervised, allowing the user to detect synapses with multiple proteometric compositions without first using other methods to identify large numbers of synapses for training. Our proposed approach does not require the user to manually inspect associated EM for training data; in fact, we do not require associated EM data with the IF data. Instead, we enable the user to 'define' (biologically inspired) a synapse by specifying which synaptic markers should be present (query) and what the minimum size of those markers should be, allowing a more class-specific synaptic search. This is also critical for the discovery of new types of synapses, exploitation of new markers, and data-based discovery.
The method in [8] needs re-training for every new synapse class to be detected in the IF data (and potentially for new data acquisition protocols as well). There is a direct numerical comparison of the two methods in the experimental section, showing that the proposed algorithm is not only unsupervised and more broadly applicable than [8], but also outperforms it in the cases where both methods can be used.
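To make the notion of a query concrete, the following minimal sketch shows one way such a user-defined synapse model could be represented. It is illustrative only: the class names `MarkerSpec` and `SynapseQuery`, their fields, and the example sizes are hypothetical and are not taken from the published software or from the query tables of this paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class MarkerSpec:
    """One antibody channel in a query (hypothetical structure)."""
    name: str       # antibody marker, e.g. "PSD-95" or "synapsin"
    xy_size: int    # minimum punctum width in pixels
    z_slices: int   # minimum number of slices the punctum must span


@dataclass(frozen=True)
class SynapseQuery:
    """A user-defined synapse model: which markers must be present."""
    presynaptic: List[MarkerSpec] = field(default_factory=list)
    postsynaptic: List[MarkerSpec] = field(default_factory=list)

    def channels(self) -> List[str]:
        """Names of all IF channels the query needs, presynaptic first."""
        return [m.name for m in self.presynaptic + self.postsynaptic]


# Illustrative values only; real sizes depend on the biology and the microscope.
excitatory = SynapseQuery(
    presynaptic=[MarkerSpec("synapsin", xy_size=2, z_slices=2)],
    postsynaptic=[MarkerSpec("PSD-95", xy_size=2, z_slices=2)],
)
print(excitatory.channels())
```

Changing the synapse subtype being searched for then amounts to constructing a different query object, with no training data involved.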
Step 1: Computation of foreground probability
Raw immunofluorescence image data is noisy; for example, speckles of the antibody markers often bind to cellular structures not associated with synapses, such as mitochondria. In addition, fluorescence imagery contains signal from sources other than fluorescently-labelled antibodies, e.g. from background autofluorescence. Finally, all digital imagery contains inherent noise from sources such as camera read noise and photon shot noise. The noise produced by these sources is usually smaller in magnitude than the signal originating from authentic synaptic labeling, but weak signal cannot simply be filtered out and dismissed from consideration, since it may originate from a true synaptic site, and we want to allow for the possibility that a concordance of weak evidence will lead to the detection of a synapse. Thus, the first step of the algorithm consists of differentiating the bright voxels, the foreground (potential objects of interest), from a noisy background in a probabilistic fashion.
IF data volumes, when stained for synaptic markers, are also extremely sparse: approximately 2% of the voxels in the dataset belong to the foreground, as indicated in Fig 4. Therefore, the IF image volume can be used to approximate the distribution of the background noise.
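As a rough illustration of this idea, the sketch below estimates a Gaussian background from all pixels of one slice and converts each intensity into a foreground probability. It is a minimal sketch, not the published implementation: the function name `foreground_probability`, the use of nested Python lists instead of an array library, and the reading of the background probability as the Gaussian upper-tail probability (so that the foreground probability equals the Gaussian CDF) are assumptions made for illustration.

```python
from statistics import NormalDist, fmean, pstdev
from typing import List

Slice = List[List[float]]


def foreground_probability(slice_2d: Slice) -> Slice:
    """Per-slice foreground probability under a Gaussian background model.

    Assumption: because foreground voxels are sparse, the mean and standard
    deviation of the whole slice approximate those of the background, and the
    foreground probability of an intensity is the Gaussian CDF at that value.
    """
    pixels = [v for row in slice_2d for v in row]
    mu = fmean(pixels)
    sigma = pstdev(pixels)
    if sigma == 0.0:
        # A perfectly flat slice carries no evidence either way.
        return [[0.5 for _ in row] for row in slice_2d]
    background = NormalDist(mu, sigma)
    return [[background.cdf(v) for v in row] for row in slice_2d]


# Toy slice: flat background of 10 with a single bright pixel of 100.
toy = [[10.0] * 5 for _ in range(5)]
toy[2][2] = 100.0
p_f = foreground_probability(toy)
print(round(p_f[2][2], 4), round(p_f[0][0], 4))
```

In the toy slice the bright pixel receives a foreground probability close to 1, while the background pixels stay below 0.5. Each slice would be processed independently, so slice-to-slice variations in staining do not leak into the estimate.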
Let $v(x, y, z)$ be the intensity value of a voxel at position $(x, y)$ in slice $z$, for a given channel of the IF data. A probabilistic model, $p_B$, is computed which characterizes all the pixels that belong to the background, which includes approximately 98% of the voxels. The background noise model is computed independently for each slice to account for variations in tissue and imaging properties. The background model $p_B$ is assumed to be a Gaussian distribution, whose mean and variance $(\mu_B, \sigma_B^2)$ are empirically computed from each slice $z$ (the $z$ index is omitted in Eq 1 for simplicity of notation). Then, the probability of a voxel belonging to the background, i.e. not being 'bright', is given by
$$p_B(v(x, y)) = \frac{1}{\sqrt{2\pi\sigma_B^2}} \int_{v(x, y)}^{\infty} \exp\left(-\frac{(t - \mu_B)^2}{2\sigma_B^2}\right) dt. \quad (1)$$
Therefore, the probability of a voxel associated with the foreground, $p_F$, is computed as
$$p_F(v(x, y)) = 1 - p_B(v(x, y)). \quad (2)$$
Step 2: Probability of 2D puncta
Once foreground pixels have been identified in a probabilistic fashion, the next step is to determine if they form a 2D punctum. Since synapses appear as bright puncta in the IF image data, voxels which form puncta should have a higher probability of being associated with a synapse than those which do not. The probability of a voxel belonging to a 2D punctum, $p_P$, is computed by multiplying the voxel's foreground probability by that of its neighbors in a predefined neighborhood region,
$$p_P(x, y) = \prod_{(x', y') \in W(x, y)} p_F(v(x', y')), \quad (3)$$
where $W$ is the neighborhood size, defined by the smallest expected punctum size. For computational efficiency, this product is implemented as a box filter applied to the logarithm of the probability map. In our experiments, $W$ was set to be slightly larger than the size of the point spread function of the microscopes used. This step transforms each voxel's value from the probability it belongs to the foreground to the probability it belongs to both the foreground and to a punctum.
The proposed approach clearly differentiates the high probability bright pixels from the background low probability pixels. The 'dark' rings around the puncta are an artifact of the deconvolution performed prior to image alignment, and their spatial extent has been taken into account in the spatially-oriented next steps of the algorithm. The AT data appears 'quantized' because it has been upsampled from its native 100 nm per pixel resolution to 2.33 nm per pixel to align the AT data with the EM data.
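The 2D punctum step can be sketched as a product of foreground probabilities over a small window, computed as a sum of logarithms. This is a minimal sketch under stated assumptions, not the published implementation: the function name `punctum_probability_2d`, the clipping of windows at the image border, the floor `eps` that avoids taking the logarithm of zero, and the restriction to odd window widths are all choices made here for illustration.

```python
import math
from typing import List

Slice = List[List[float]]


def punctum_probability_2d(p_f: Slice, w: int, eps: float = 1e-12) -> Slice:
    """Product of foreground probabilities over a w-by-w window (w odd).

    The product is evaluated as exp(sum(log p)), i.e. a box filter applied to
    the log-probabilities. Assumption: windows are clipped at the border.
    """
    rows, cols = len(p_f), len(p_f[0])
    log_p = [[math.log(max(v, eps)) for v in row] for row in p_f]
    half = w // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for rr in range(max(0, r - half), min(rows, r + half + 1)):
                for cc in range(max(0, c - half), min(cols, c + half + 1)):
                    total += log_p[rr][cc]
            out[r][c] = math.exp(total)
    return out


# A 3x3 blob of high foreground probability versus one isolated bright pixel.
blob = [[0.1] * 5 for _ in range(5)]
for r in range(1, 4):
    for c in range(1, 4):
        blob[r][c] = 0.9
isolated = [[0.1] * 5 for _ in range(5)]
isolated[2][2] = 0.9

blob_p = punctum_probability_2d(blob, w=3)
isolated_p = punctum_probability_2d(isolated, w=3)
print(blob_p[2][2], isolated_p[2][2])
```

The centre of the blob keeps a substantial probability, whereas the isolated pixel is driven towards zero by its dim neighbors, which is the behavior the text describes for single-pixel noise.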
Step 3: Probability of 3D puncta
Potential synaptic puncta can span multiple slices of a given channel; puncta that span multiple slices have a higher probability of being associated with a synapse than those that do not. Therefore, we propose a factor $f(x, y, z)$ which diminishes the probability values associated with voxels which do not maintain a similar probability value in adjacent slices,
$$f(x, y, z) = \exp\left(-\sum_{j = j_{start}}^{j_{end}} \left[p_P(x, y, z) - p_P(x, y, z + j)\right]^2\right). \quad (4)$$
The pixel's 2D puncta probability is compared to that of its neighbor in the slice(s) before, $j_{start}$, and the slice(s) after, $j_{end}$. The number of slices compared depends on the input size parameter for each antibody. The factor attenuates values for 2D puncta that do not span the required number of slices, as shown in Fig 7. The 3D puncta probability map is then computed by multiplying the 2D puncta probability map by this factor, which further refines the detection probability by considering the slice-to-slice spatial distribution, going from 2D to 3D.
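The slice-to-slice attenuation can be sketched as follows. This is a minimal sketch rather than the published implementation: it assumes the factor is the exponential of the negative sum of squared differences between a voxel's 2D punctum probability and that of the same (x, y) position in neighboring slices, and it skips neighbors that fall outside the volume. The function name `punctum_probability_3d` and the `volume[z][y][x]` indexing are hypothetical.

```python
import math
from typing import List

Volume = List[List[List[float]]]  # indexed as volume[z][y][x]


def punctum_probability_3d(p_p: Volume, j_start: int, j_end: int) -> Volume:
    """Attenuate 2D punctum probabilities that are not stable across slices.

    For each voxel, the penalty sums squared differences to slices z + j for
    j in [j_start, j_end]; the result is p_P * exp(-penalty).
    Assumption: neighbors outside the volume are ignored.
    """
    n_z = len(p_p)
    out = []
    for z, plane in enumerate(p_p):
        new_plane = []
        for y, row in enumerate(plane):
            new_row = []
            for x, p in enumerate(row):
                penalty = 0.0
                for j in range(j_start, j_end + 1):
                    if 0 <= z + j < n_z:
                        penalty += (p - p_p[z + j][y][x]) ** 2
                new_row.append(p * math.exp(-penalty))
            new_plane.append(new_row)
        out.append(new_plane)
    return out


# One voxel column, three slices: a persistent punctum and a one-slice flash.
persistent = punctum_probability_3d([[[0.9]], [[0.9]], [[0.9]]], -1, 1)
transient = punctum_probability_3d([[[0.0]], [[0.9]], [[0.0]]], -1, 1)
print(persistent[1][0][0], transient[1][0][0])
```

The punctum that keeps the same probability across all three slices is left unchanged, while the one that appears in a single slice is strongly attenuated.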
Step 4: Adjacency of presynaptic and postsynaptic puncta
In electron microscopic images, synapses are identified by the presence of synaptic vesicles on the presynaptic side, the close adjacency of the membranes of the presynaptic axon terminal to a postsynaptic dendrite or dendritic spine, and the presence of a distinct postsynaptic specialization, as diagrammed in Fig 1. Synapses are identified in IF data by the close spatial arrangement of pre- and postsynaptic antibody markers, which correspond to proteins associated with synapses. Therefore, the next step in our approach is to look for the presence of presynaptic puncta in the neighborhood of postsynaptic puncta. More precisely, for each postsynaptic antibody voxel (e.g., PSD-95), we search in the adjacent 3D neighborhood of the corresponding presynaptic (e.g., synapsin) volume for a high-probability signal. To accomplish this, a rectangular grid is defined in the presynaptic channels around each postsynaptic voxel, as shown in Fig 8. The size of the grid is defined by the initial query parameters, which depend on both the inherent biology and the microscope resolution. The logarithm of the 3D puncta probability map, Eq (5), is integrated in each grid location and the maximum is taken as the presynaptic signal level around the given postsynaptic location, where the grid $G$ is centered at the current voxel $(x, y, z)$ and divided into $K \times K \times K$ subregions $G_k$. To search in a grid around a defined voxel location for the presynaptic signal, $K$ is set to 3. When searching for the postsynaptic signal, $K$ is set to 1 since postsynaptic signals are expected to loosely co-localize. These values can be adapted to the data resolution. The postsynaptic antibody pixel probability is then combined with this presynaptic signal level to produce the final synapse probability map, $p_{synap}(x, y, z)$ (Eq 8). Again, the probability information is maintained (Fig 9), now including the morphological relationship between the channels.
This grid-like approach makes the method robust to slight image alignment and registration issues, as well as to deconvolution artifacts.
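A simplified version of the grid search can be sketched as below. Several details are assumptions made for illustration and should not be read as the published method: each subregion is a cube of `cell` voxels per side, the integrated log-probability of a subregion is turned back into a probability with `exp` (the product of the probabilities in that subregion), the final value is the postsynaptic probability multiplied by the best presynaptic subregion, and subregions are clipped at the volume border. The function names and the `eps` floor are hypothetical.

```python
import math
from typing import List, Optional

Volume = List[List[List[float]]]  # indexed as volume[z][y][x]


def cell_log_sum(log_p: Volume, z0: int, y0: int, x0: int,
                 cell: int) -> Optional[float]:
    """Sum of log-probabilities in one cubic subregion, clipped to the volume.

    Returns None when the subregion lies entirely outside the volume.
    """
    n_z, n_y, n_x = len(log_p), len(log_p[0]), len(log_p[0][0])
    total, count = 0.0, 0
    for z in range(max(0, z0), min(n_z, z0 + cell)):
        for y in range(max(0, y0), min(n_y, y0 + cell)):
            for x in range(max(0, x0), min(n_x, x0 + cell)):
                total += log_p[z][y][x]
                count += 1
    return total if count else None


def synapse_probability(post: Volume, pre: Volume, k: int = 3,
                        cell: int = 1, eps: float = 1e-12) -> Volume:
    """Combine postsynaptic probability with the best nearby presynaptic cell."""
    log_pre = [[[math.log(max(v, eps)) for v in row] for row in plane]
               for plane in pre]
    half = (k * cell) // 2  # the grid of k*cell voxels is centred on the voxel
    out = []
    for z, plane in enumerate(post):
        out_plane = []
        for y, row in enumerate(plane):
            out_row = []
            for x, p_post in enumerate(row):
                sums = []
                for iz in range(k):
                    for iy in range(k):
                        for ix in range(k):
                            s = cell_log_sum(log_pre,
                                             z - half + iz * cell,
                                             y - half + iy * cell,
                                             x - half + ix * cell, cell)
                            if s is not None:
                                sums.append(s)
                # The grid always covers the voxel itself, so sums is non-empty.
                out_row.append(p_post * math.exp(max(sums)))
            out_plane.append(out_row)
        out.append(out_plane)
    return out


# A postsynaptic punctum with, and without, a presynaptic neighbor.
post = [[[0.0, 0.9, 0.0]]]
with_partner = synapse_probability(post, [[[0.8, 0.0, 0.0]]])
without_partner = synapse_probability(post, [[[0.0, 0.0, 0.0]]])
print(with_partner[0][0][1], without_partner[0][0][1])
```

Because the maximum is taken over neighboring subregions rather than at the exact voxel, a presynaptic punctum that is offset by one cell still supports the detection, which mirrors the tolerance to small registration errors noted in the text.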

Results
The proposed method was evaluated on a series of array tomography (AT) datasets published in [8] and [19]. These datasets were acquired using the AT methods described in [20]. Each dataset was stained and imaged with antibodies for presynaptic and postsynaptic proteins and then aligned and registered. In the conjugate AT (cAT) dataset, the tissue samples were also imaged with a scanning electron microscope (SEM) and then the IF data were up-sampled, aligned, and registered to the EM data [8]. Synapses identifiable in the EM image data were labeled and used to provide ground truth. Table 1 lists the synaptic markers used. Synapsin is ubiquitous in both excitatory and inhibitory synapses; therefore, it is used as a presynaptic marker for excitatory and inhibitory queries. PSD-95, the postsynaptic density marker used here, is generally considered a reliable marker of excitatory synapses [8], [21], [20]. For each tissue sample, there were multiple antibody staining cycles and each cycle contained up to three different antibodies. Each round of staining included the fluorescent DNA stain DAPI, which facilitated the registration and alignment process. The exact sequence of antibody application can be found in [8] and [19].
All the primary antibodies used are from commercial sources (see Table 1) and have been thoroughly characterized in previous work. The authors in [8] and [19] performed AT-specific controls described in detail in [20] [19]. Such controls include, but are not limited to, comparison with a different antibody for the same or similar antigen to test for the specificity of staining, comparison between adjacent sections to test the consistency of staining, and comparison with an antibody against a spatially exclusive antigen or nuclear label to evaluate background staining. Highly cross-adsorbed secondary antibodies of the appropriate species were used, such as ThermoFisher Scientific A-11029, A-11032, and A-21236 for detecting mouse primary antibodies. The application of these antibodies without a primary antibody did not result in any labeling.
Table 1. Synaptic markers used in this work across the various datasets. Not all markers were present in each dataset. Details, including the order of antibody application, can be found in [8] and [19]. Columns: Antigen, Host, Antibody Source, RRID.
Evaluation on conjugate array tomography
Experimental setup. The proposed method was first evaluated on the cAT dataset published in [8], using the associated EM image data to create the 'ground truth' needed for evaluation. The datasets themselves are described in Table 2. To evaluate the method's performance on excitatory synapses, the set of query parameters in Table 3 was used. For inhibitory synapse detection, the queries listed in Table 4 were used. These parameters were based on prior literature concerning synaptic proteins and their respective antibodies [7] [19]. Only 20 inhibitory synapses were manually identified in the KDM-SYN-120905 dataset; therefore, inhibitory synapse detection performance is only reported for the larger KDM-SYN-140115 dataset. For evaluation and visualization purposes, the output probability map, $p_{synap}(x, y, z)$, from each query was thresholded, and adjacent voxels above the threshold were grouped into detections.
Table 3. Excitatory synapse detection queries for the cAT data. Note that the size dimensions in x, y correspond to the window width W in Eq (3), and the z range corresponds to the number of slices, j, mentioned in Eq (4).
Performance metrics. The ground truth used in this work is obtained from the EM data since it represents the current 'gold standard' for manual synapse identification. Prior to imaging with the scanning electron microscope (SEM), the tissue was embedded in Lowicryl, which preserves fine ultrastructural detail [22]. Not every synapse present in the tissue is identifiable with EM data, and not every synapse is marked with the antibodies used (that is, not identifiable with IF data, the only input for our algorithm). Consequently, there are synapses whose presence may be inferred from the IF data, but cannot be validated by visual inspection of the EM data. Similarly, there are synapses which are visually identifiable in the EM data, but, for a variety of reasons, were not stained by the antibody markers. These are examples of data points for which validation with images only (EM or IF) is not possible, and which an IF-based algorithm cannot be expected to detect or reject. These edge cases, which were excluded from evaluation, were estimated to be less than 10% of the total population of synapses.
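The thresholding and grouping used for evaluation (described under Experimental setup) can be sketched as a connected-component search over the probability map. This is a minimal sketch under stated assumptions: "adjacent" is taken to mean face-adjacent (6-connectivity), voxels must be strictly above the threshold, and the function name `detections` and the `volume[z][y][x]` indexing are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Volume = List[List[List[float]]]  # indexed as volume[z][y][x]
Voxel = Tuple[int, int, int]

# Face-adjacent neighbors only (an assumption about what "adjacent" means).
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def detections(prob: Volume, threshold: float) -> List[List[Voxel]]:
    """Group voxels above the threshold into connected detections."""
    n_z, n_y, n_x = len(prob), len(prob[0]), len(prob[0][0])
    seen = set()
    groups: List[List[Voxel]] = []
    for z in range(n_z):
        for y in range(n_y):
            for x in range(n_x):
                if prob[z][y][x] <= threshold or (z, y, x) in seen:
                    continue
                # Breadth-first flood fill from this seed voxel.
                queue = deque([(z, y, x)])
                seen.add((z, y, x))
                group = []
                while queue:
                    cz, cy, cx = queue.popleft()
                    group.append((cz, cy, cx))
                    for dz, dy, dx in STEPS:
                        nz, ny, nx = cz + dz, cy + dy, cx + dx
                        if (0 <= nz < n_z and 0 <= ny < n_y and 0 <= nx < n_x
                                and (nz, ny, nx) not in seen
                                and prob[nz][ny][nx] > threshold):
                            seen.add((nz, ny, nx))
                            queue.append((nz, ny, nx))
                groups.append(group)
    return groups


# One row of five voxels: two voxels form one detection, one voxel another.
toy_map = [[[0.9, 0.8, 0.1, 0.7, 0.2]]]
found = detections(toy_map, threshold=0.6)
print(len(found), sorted(len(g) for g in found))
```

Each returned group is a list of voxel coordinates, so per-detection statistics such as size or mean probability can be computed directly from the original map.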

We report in Table 5 the precision and recall values obtained for these two tested datasets. We differentiate two cases: first, considering all synapses manually identified in the EM data and counting all false positives returned by the program (referred to in Table 5 as 'EM'); and second, considering the subset of detections that can be manually verified by an expert using only IF data (referred to in Table 5 as 'IF'). For example, detections that the EM data lists as false positives but that are impossible to verify using only IF data are removed from this second evaluation. Similarly, manually identified synapses in the EM data which do not appear in the IF data are also removed from this second evaluation.
Results. Once the final probability map for each query was computed, maps for excitatory synapses were thresholded at 0.6 for the KDM-SYN-140115 dataset and 0.55 for the KDM-SYN-120905 dataset. The maps for inhibitory synapses were thresholded at 0.7. These thresholds were based on the intersection of the precision/recall curves in Fig 10. The difference among threshold values likely reflects the different signal/noise distributions of each antibody. Note that this threshold, the only non-biological parameter of the system, can be ignored when working directly on the output (Eq 8), or easily set for the entire dataset by visually inspecting a few detections. Fig 10 shows the relationship between the final threshold and accuracy in greater detail. As shown in Table 5, the proposed algorithm successfully detects most synapses in both datasets, with only a small fraction of false positive detections. Based on the IF-only indicator, we observe that the algorithm performs at human level (approximately 90% accuracy), with false positives and false negatives limited to cases in which human experts (including co-authors of this manuscript) are also not confident of their own result [8].
Table 5. Results of excitatory and inhibitory synapse detection. Precision is defined as the number of true positive detections / (true positive detections + false positive detections). Recall is defined as the number of true synapses detected / (true synapses detected + missed synapses). The value after the precision-recall values is the 95% confidence interval as computed by the Agresti-Coull method [23].
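For readers who want to reproduce this style of reporting, the sketch below computes precision, recall, and an Agresti-Coull 95% interval. The counts are hypothetical and chosen only for illustration; they are not values from Table 5. The sketch also assumes a one-to-one match between true positive detections and detected synapses, so a single count `tp` serves both metrics. The function names are hypothetical.

```python
import math
from typing import Tuple


def agresti_coull(successes: int, trials: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Agresti-Coull interval: returns (adjusted estimate, lower, upper)."""
    n_adj = trials + z * z
    p_adj = (successes + z * z / 2.0) / n_adj
    half_width = z * math.sqrt(p_adj * (1.0 - p_adj) / n_adj)
    return p_adj, max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    return tp / (tp + fp), tp / (tp + fn)


# Hypothetical counts, for illustration only.
tp, fp, fn = 90, 10, 10
precision, recall = precision_recall(tp, fp, fn)
_, lower, upper = agresti_coull(tp, tp + fp)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
print(f"95% interval for precision: [{lower:.3f}, {upper:.3f}]")
```

The Agresti-Coull adjustment adds z²/2 pseudo-successes and z²/2 pseudo-failures before applying the usual normal approximation, which keeps the interval well behaved for the small counts typical of inhibitory synapses.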

As shown in Table 6, the proposed algorithm performs as well as the state-of-the-art method for excitatory synapse detection [8], while eliminating the need to undergo the labor-intensive process of cultivating a training dataset. Furthermore, due to the approximate ten-to-one ratio of excitatory to inhibitory synapses, creating training sets for inhibitory synapses is difficult. Our method is insensitive to the number of synapses per class as it only returns possible synapses which match the query parameters. Finally, the fact that we can skip training also makes the proposed system more applicable to diverse datasets without the need to redesign the entire process, as demonstrated here. Fig 11 shows an example of a true positive detection of excitatory synapses in the KDM-SYN-120905 dataset. The figure shows an example of a 'synaptogram', where each row (third to sixth rows) shows a different channel of immunofluorescent signal and each column is a 2D slice. The first row, marked as Label, shows the manual annotation of the synaptic cleft, i.e., the ground truth, and the second row, marked as Result, corresponds to the output of the proposed synapse detection algorithm. Rows 3-6 are corresponding sections of each channel's foreground probability map (the output of Step 2). The seventh row, marked as EM, shows the corresponding EM data. The panel below the synaptogram shows enlarged, consecutive slices of the EM data, which was used to manually annotate the synapse. Fig 12 shows an example of a false positive which cannot be differentiated from a real detection by an expert without the assistance of EM data (not available to our algorithm). Fig 13 shows a similar situation for a false negative detection.
Table 6. State-of-the-art detection results for excitatory synapses from [8]. The value after the precision-recall values is the 95% confidence interval as computed by the Agresti-Coull method [23].
Evaluation on array tomography
The proposed method was evaluated on the array tomography dataset published in [19], which contains a portion of the mouse barrel cortex extending from Layer 3 to Layer 5. Unlike the conjugate array tomography dataset, this dataset has no associated EM imagery. This larger series of datasets includes 11 volumes representing a total of 2,306,233 μm³ of cortical volume. Since no gold standard is available for these data, the proposed method was evaluated by verifying known properties of the dataset: there is an approximately ten-to-one ratio of excitatory to inhibitory synapses [24], and there are more inhibitory synapses in Layer 4 than Layer 5 in the mouse barrel cortex [25] [26] [27]. For this dataset, the query parameters were adjusted to reflect the different synaptic markers used. Tables 7 and 8 list the query parameters used for detecting both inhibitory and excitatory synapses, similar to those in [7] [8] [19].
Thresholding the probability map. Once the probability maps were computed (Fig 14), they were thresholded for evaluation purposes only. Thresholds for each dataset were determined by examining the synaptic density values across various thresholds, as shown in Fig 15. As the figure shows, the appropriate thresholds for each dataset exist in a narrow band, consistent with the results in the cAT dataset. Thresholding at the optimal value for each dataset, as set by the plots in Fig 15, yielded 2,326,692 excitatory synapses and 252,833 inhibitory synapses. This amounts to approximately 1.12 synapses per cubic micrometer and an overall ratio of 9.2 excitatory to inhibitory synapses, which is consistent with results in the literature [30] [29]. Previous quantitative electron microscopy indicates that the synapse density should be higher in Layer IV than in Layer V [31], consistent with the results from our algorithm, as shown in the graphs in Fig 16. For all three inhibitory synaptic queries, there is a synapse density difference of more than 50% between Layer IV and Layer V. There is also a greater than 50% synaptic density difference between Layer IV and Layer V for excitatory synapses containing VGluT2, as supported by [7] [32] [33] [34]. These results further support the validity of the proposed method by confirming known biological properties of a large dataset.
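The density and ratio quoted above follow directly from the reported counts and the total imaged volume of 2,306,233 cubic micrometers, as the short check below shows. Only the variable names are introduced here; the three numbers are those given in the text.

```python
excitatory = 2_326_692     # detected excitatory synapses
inhibitory = 252_833       # detected inhibitory synapses
volume_um3 = 2_306_233     # total cortical volume in cubic micrometers

density = (excitatory + inhibitory) / volume_um3
ratio = excitatory / inhibitory

print(f"density = {density:.2f} synapses per cubic micrometer")
print(f"excitatory to inhibitory ratio = {ratio:.1f}")
```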
Fig 13 (caption fragment). While the corresponding EM sections show a synapse, there is insufficient synaptic IF signal available to justify the presence of a synapse using solely IF data. Again, the algorithm makes the same mistake a human expert would make when working only with the IF data. Each 'block' is 1.086 μm × 1.130 μm. As before, the bottom panel shows enlarged, consecutive slices of the EM data, which was used to manually annotate the synapse. The scale bar on the lower left side is 500 nm. https://doi.org/10.1371/journal.pcbi.1005493.g013
The threshold of the estimated probability can be set to optimize a specific desired property (density in this case), thereby becoming an additional 'query.' The threshold can in fact add flexibility, since different thresholds might lead to selective detection of different types of synapses. This possibility will be studied when new data become available, now that the unsupervised algorithm introduced here can be applied to such data (previous algorithms were essentially limited to making binary decisions for detecting the synapses they had been trained to detect). This means that the only parameter of the proposed algorithm that is not straightforwardly physical (virtually all image processing algorithms have critical parameters) can add flexibility to the technique. Finally, the threshold can be ignored if we work directly with the probability map, Eq (8), e.g., to compute 'fuzzy volumes.' This unique aspect of the proposed algorithm's output will also be the subject of study when running the algorithm on new AT data in the future. Thresholding in [8] was done via manual inspection to obtain an average punctum size of 0.09 μm², a number found by [8] to be the most effective PSD-95 size for synapse detection. This value might change with new data protocols or different antibody markers, thereby requiring new supervised training. We replaced the thresholding step with a step that computes the foreground probability of each pixel.
Approximately 2% of the IF data volume is foreground; the rest is background. Therefore, we model the probability of a pixel belonging to the background using a Gaussian that models the entire dataset as the background. The proposed system allows this threshold to be changed according to other biological (or instrument or antibody) queries.
Fig 15. Plots showing the variation of putative synapse density across different thresholds. In the first row, each curve represents a dataset in [19] and the red lines show the expected synaptic density. For excitatory synapses, the expected density is 0.9 ± 0.15 synapses per μm³. For inhibitory synapses, the expected density is 0.1 ± 0.05 synapses per μm³ [28] [29]. The first row shows the relationship between density and the threshold for each dataset, while the second row shows the average density of all the datasets as a function of the threshold. The error bars represent the standard error. https://doi.org/10.1371/journal.pcbi.1005493.g015
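The Gaussian background model described in this section can be illustrated with a short sketch. This is one plausible reading rather than the authors' code: it assumes a single Gaussian is fitted to all voxel intensities (reasonable when only about 2% of the volume is foreground) and that the foreground probability of a voxel is the Gaussian cumulative distribution function evaluated at its intensity. The function name `foreground_probability` and the handling of a constant volume are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def foreground_probability(volume):
    """Per-voxel foreground probability under a Gaussian background model.

    Assumption: the whole volume is treated as background when estimating
    the mean and standard deviation, and a voxel's foreground probability
    is the background CDF at its intensity (bright voxels approach 1).
    """
    volume = np.asarray(volume, dtype=np.float64)
    mu = volume.mean()
    sigma = volume.std()
    if sigma == 0:
        # Hypothetical choice: a constant volume carries no foreground signal.
        return np.zeros_like(volume)
    return norm.cdf(volume, loc=mu, scale=sigma)


# Synthetic example: dim noisy background with about 2% bright voxels.
rng = np.random.default_rng(0)
demo = rng.normal(loc=100.0, scale=10.0, size=(20, 20, 20))
bright = np.zeros(demo.shape, dtype=bool)
bright.flat[:160] = True  # 160 of 8000 voxels, i.e. 2%
demo[bright] = 200.0

demo_prob = foreground_probability(demo)
print(demo_prob[bright].min(), demo_prob[~bright].mean())
```

Because the output is a probability rather than a binary mask, no per-channel intensity threshold has to be chosen at this step.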

Discussion
The proposed synapse detection method serves the potential future needs of both basic and clinical neuroscience. Methods for large-scale synapse detection could analyze image volumes large enough to contain complete neural arbors, and thus allow the discernment of the relationships between detected synapses and their presynaptic and postsynaptic parent neurons. An understanding of the statistics of synapse variation in any given synaptic network is certain to be critical to interpreting and modeling results of mechanistic physiological study. Moreover, advances in imaging methods for tracing complete axonal and dendritic arbors [35] are likely to allow network analyses at the level of individual neurons and their synaptic connections, which might be optimally detected and measured by probabilistic means like those introduced here. When combined with complete arbor measurements [36], emerging methods for in situ measurement of single-cell transcriptomes [37] should allow single-synapse measurements to be associated with specific presynaptic and postsynaptic parent neurons of known transcriptomic profiles. Such capacities are likely to enhance our understanding of the molecular origins of synapse diversity. On the clinical side, the analysis tool we introduce here is likely to advance our abilities to detect possible abnormalities of synapse population statistics that have long been hypothesized to underlie a wide variety of mental and neurological disorders [44]. Quantification of synapse populations in human postmortem and biopsy specimens [45] and in animal models of disease has already provided important insights into disease etiologies [39] [45] [42]. More reliable measurements based on probabilistic tools like those introduced here seem likely to facilitate future efforts to better understand disease mechanisms and to develop the quantitative assays essential to the discovery of effective therapies [46].
This work introduces a model-based unsupervised synapse detection algorithm that incorporates fundamental biological knowledge of how synapses are identified in immunofluorescence data. We created a series of probabilistic excitatory detectors for various subtypes of synapses, and included the 3D spatial relationships typical of synaptic structures. This novel approach provides a probability-based detection algorithm that yields not only detections but also confidence values for them. The implementation of synapse detection as a probability map (i.e., the probability of each pixel belonging to a synapse), as opposed to a binary detection/no-detection result, may provide a powerful tool to assist experts throughout the exploratory process to gain new insights from the immunofluorescence data, including potentially discovering new subtypes of synapses. Investigating the influence of different biological and AT components on the actual probability values, from the noise of the system to the expression level of the proteins and the subclass of the synapses, is an important new topic that will become possible when the proposed algorithm is applied to the large new datasets currently being generated. Creating conjugate array tomography datasets requires specialized equipment, including a Field Emission Scanning Electron Microscope (FESEM) to provide ground truth validation of synapse results. The computational work presented in this paper, together with the publicly available code and data, is a step toward making this kind of analysis robust enough to no longer require expensive FESEM validation.
The algorithm is computationally very simple, and its only parameters are the user's definition of a synapse subtype, which makes it ready for massive datasets. The results were validated with the best available cAT and AT data, producing state-of-the-art results without the need for supervised training. As demonstrated here, the proposed framework can be exploited for the explicit detection of synapses or their properties, the latter being critical for the discovery of new subtypes as well as the patterns of distribution of known subtypes. These, and the potential consequences of the approach proposed here for other modalities, are the subjects of our current efforts.