Signal-based optical map alignment

In genomics, optical mapping technology provides long-range contiguity information to improve genome sequence assemblies and detect structural variation. Originally a laborious manual process, optical mapping is now offered in high-throughput, automated form by Bionano Genomics platforms, which use chips packed with nanochannels through which unwound DNA is guided while its fluorescent backbone and specific restriction sites are recorded. Although the raw image data are of high quality, the processing and assembly software accompanying the platforms is closed source and does not seem to make full use of the data, labeling approximately half of the measured signals as unusable. Here we introduce two new software tools, independent of Bionano Genomics software: OptiScan, to extract and process molecules from raw images, and OptiMap, to perform molecule-to-molecule and molecule-to-reference alignments using a novel signal-based approach. We demonstrate that the molecules detected by OptiScan can yield better assemblies, and that the approach taken by OptiMap makes use of a larger fraction of the molecules in the raw data. These tools lay the foundation for a suite of open-source methods to process and analyze high-throughput optical mapping data. The Python implementations of the OptiTools are publicly available through http://www.bif.wur.nl/.
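Signal-based alignment compares molecules on their pixel intensity profiles rather than on discretized label positions. The following is a minimal sketch of that idea under stated assumptions, not OptiMap's actual method. Scoring by Pearson correlation over the overlapping pixels, the `min_overlap` parameter and all function names are hypothetical. Only the default threshold of 0.60 for keeping a pair comes from the response letter.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences; 0.0 if either is flat."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def best_offset(query, target, min_overlap=5):
    """Slide `query` along `target` in both orientations and return the
    (score, offset, reversed) triple with the highest correlation over the
    overlapping pixels. Hypothetical scoring, for illustration only."""
    best = (-1.0, 0, False)
    for is_reversed in (False, True):
        q = query[::-1] if is_reversed else query
        for offset in range(-(len(q) - min_overlap), len(target) - min_overlap + 1):
            # Overlap expressed in target coordinates
            start = max(0, offset)
            end = min(len(target), offset + len(q))
            if end - start < min_overlap:
                continue
            score = pearson(q[start - offset:end - offset], target[start:end])
            if score > best[0]:
                best = (score, offset, is_reversed)
    return best


def aligning_pairs(molecules, threshold=0.60):
    """Keep molecule pairs whose best score exceeds the threshold."""
    pairs = []
    ids = sorted(molecules)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            score, offset, rev = best_offset(molecules[a], molecules[b])
            if score > threshold:
                pairs.append((a, b, score, offset, rev))
    return pairs
```

Correlation is used here because it does not depend on absolute intensity, which may vary between molecules. Any real scoring function would need to handle stretching and noise, which this sketch ignores.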

Reviewer's comment: The authors used signal intensities to measure the real length between the peaks. Do the signal intensities from the peaks also reflect the number of cut sites in the pixel?
This is a good observation. We previously studied the relationship between peak height and the number of labels in the same pixel; however, we found only a low correlation. We trained a logistic regression model on a window centered on each peak to predict whether it contained a single or a dual label (one or two cuts/labels within a pixel). The model reached a precision of 63% at a recall of 43%. Possible explanations for this low performance are that peak intensities are highly noisy and/or that other factors influence peak height, such as the surrounding sequence or local DNA structure. Further analysis is needed to clarify this relationship. We have now added sentences noting that further study is needed to determine what different peak patterns mean (lines 320-321).
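The experiment described above can be sketched in four steps: extract a fixed-length intensity window around each peak, fit a logistic regression, and report precision and recall for the "dual label" class. This is a minimal illustration, not the code used in the study. The window half-width, learning rate, epoch count and all function names are hypothetical.

```python
import math


def peak_windows(signal, peaks, half_width=3):
    """Fixed-length intensity windows centred on each peak index.
    Positions falling outside the signal are padded with 0.0."""
    windows = []
    for p in peaks:
        windows.append([signal[i] if 0 <= i < len(signal) else 0.0
                        for i in range(p - half_width, p + half_width + 1)])
    return windows


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=500):
    """Plain stochastic gradient descent on the log-loss (y: 1 = dual label)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            b -= lr * g
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
    return w, b


def predict_proba(w, b, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def precision_recall(y_true, y_score, threshold=0.5):
    """Precision and recall for the positive class (1 = dual label)."""
    tp = fp = fn = 0
    for truth, score in zip(y_true, y_score):
        predicted = score >= threshold
        if predicted and truth:
            tp += 1
        elif predicted:
            fp += 1
        elif truth:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

In practice `X` would hold the windows returned by `peak_windows` and `y` the single/dual annotations. A precision of 63% at a recall of 43% means that about two thirds of the peaks flagged as dual were truly dual, while fewer than half of all dual peaks were found.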
Reviewer's comment: The alignment was iterated 20 times to compare OptiMap and RefAligner. Is this done on the same references with the same molecules?
A random set of references (with their corresponding molecules) was selected at each of the 20 iterations. We realise that this was not clear from the text and have now changed the wording (lines 349-351).
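The evaluation protocol described above draws a new random set of references, together with their molecules, in each of the 20 iterations. It can be sketched as follows. The number of references per draw (`n_refs`), the seed and the function name are hypothetical; only the count of 20 iterations comes from the text.

```python
import random


def evaluation_rounds(ref_to_molecules, n_refs, n_iterations=20, seed=0):
    """Yield, per iteration, a freshly drawn random subset of references
    and the molecules that belong to them (hypothetical helper)."""
    rng = random.Random(seed)  # seeded so the draws are reproducible
    refs = sorted(ref_to_molecules)
    for _ in range(n_iterations):
        chosen = rng.sample(refs, n_refs)
        molecules = [m for r in chosen for m in ref_to_molecules[r]]
        yield chosen, molecules
```

Both aligners would then be run on the same `(chosen, molecules)` draw within an iteration. Across iterations, the references and molecules differ.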
Reviewer's comment: In "For RefAligner, we chose 10 × 10^10 for the maximum p-value threshold", what does this mean? How was the p-value calculated for this?
We thank the reviewer for bringing this up. The score threshold contained a typo and should have read ; we have now fixed this in the text (line 348). Regarding the second part of the question, we could not find any literature or software manual describing how the p-value is calculated by the Bionano Genomics software (RefAligner).

Reviewer's comment: What is (>125 kb) in the "OptiMap performs better all-round" section?
This was a typo and has now been changed to kb (line 349).
Reviewer's comment: In "For OptiMap (dense mode), we used a score threshold of t = 0.65 for the naive mode (first round) and 0.55 for the sparse mode (second round)", what does "t" mean?
We had tried to explain t as the score threshold in the text: "We designate a pair of molecules as an aligning pair if the score is above the threshold (t, default 0.60) and discard the remaining pairs." However, we agree that this was confusing, so we have replaced t altogether and now use the notation " " throughout the paper to denote the score threshold (lines 208, 229, 237, 243, 346, 347 and 357).
Reviewer's comment: Are the graphs in Fig 10 made with OptiScan or BGN (are the data stretched)?
We thank the reviewer for their attention to the figure. The figures were made from the perspective of OptiMap, after linear stretching, because this shows where OptiMap goes wrong (Fig 10B: in the top example there is local stretching just after 200 bp, and in the bottom example after 150 bp). This is now explained in the figure caption.
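The difference between linear (global) and local stretching can be illustrated with a short sketch. It assumes that molecule label positions have already been matched one-to-one to reference labels. It fits a single least-squares stretch factor and inspects the residuals. This is a hypothetical illustration, not OptiMap's stretching procedure, and the function names are invented.

```python
def linear_stretch_factor(molecule_labels, reference_labels):
    """Least-squares global stretch factor s minimising sum((s*dx - dy)^2),
    where dx and dy are matched label positions measured from the first
    matched label of the molecule and the reference respectively."""
    x0, y0 = molecule_labels[0], reference_labels[0]
    dxs = [x - x0 for x in molecule_labels]
    dys = [y - y0 for y in reference_labels]
    denom = sum(dx * dx for dx in dxs)
    if denom == 0:
        raise ValueError("need at least two distinct molecule label positions")
    return sum(dx * dy for dx, dy in zip(dxs, dys)) / denom


def stretch_residuals(molecule_labels, reference_labels):
    """Residuals (stretched molecule position minus reference position)
    after applying one global linear stretch."""
    s = linear_stretch_factor(molecule_labels, reference_labels)
    x0, y0 = molecule_labels[0], reference_labels[0]
    return [y0 + s * (x - x0) - y for x, y in zip(molecule_labels, reference_labels)]
```

A uniformly stretched molecule (for example labels at 0, 10, 20 and 30 against 0, 11, 22 and 33) gives residuals of about zero. A molecule that stretches only in one region leaves residuals that drift away from zero around that region. This is the kind of local stretching that a single linear factor cannot absorb.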