Fig 1.

Summary of the SmartMap analysis workflow and algorithm.

(A) Flowchart outlining the workflow for traditional ChIP-seq (or ICeChIP-seq) analysis [29–31] utilizing only unireads (left, green) vs. the workflow for SmartMap analysis utilizing multireads with an iterative Bayesian reweighting algorithm (right, blue). (B) Schematic showing the Bayesian reweighting algorithm utilized in the SmartMap analysis. Each mapping associated with a read is assigned a weight such that the weight is greater for those mappings associated with loci of greater map weight density. For a more detailed description of the algorithm, see Methods.
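
The reweighting idea in panel B can be illustrated with a minimal sketch. This is not the SmartMap implementation: the function name `reweight`, the fixed-size binning of positions, and the rule that each read's weights are renormalized in proportion to the summed weight in each bin are all assumptions made for illustration, and the alignment-score ("scored") mode is ignored.

```python
from collections import defaultdict

def reweight(reads, iterations=1, bin_size=200):
    """Toy iterative reweighting of multiread alignments.

    reads: list of reads; each read is a list of (chrom, position)
           candidate alignments.
    Returns a list of weight lists, parallel to `reads`.
    The binning scheme and update rule are illustrative assumptions.
    """
    # Iteration 0: split each read's unit weight evenly among its alignments.
    weights = [[1.0 / len(alns)] * len(alns) for alns in reads]
    for _ in range(iterations):
        # Accumulate the current weight density in each genomic bin.
        density = defaultdict(float)
        for alns, ws in zip(reads, weights):
            for (chrom, pos), w in zip(alns, ws):
                density[(chrom, pos // bin_size)] += w
        # Reassign each read's weights in proportion to the local density,
        # so alignments in denser loci receive more weight.
        new_weights = []
        for alns in reads:
            local = [density[(chrom, pos // bin_size)] for chrom, pos in alns]
            total = sum(local)
            new_weights.append([d / total for d in local])
        weights = new_weights
    return weights

# Two unireads on chr1 and one multiread split between chr1 and chr2.
example_reads = [
    [("chr1", 100)],
    [("chr1", 150)],
    [("chr1", 120), ("chr2", 500)],
]
example_weights = reweight(example_reads, iterations=1)
```

In this toy input, the multiread's chr1 alignment sits in a bin that also holds two unireads, so after one iteration it carries most of that read's weight, while each read's weights still sum to one.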

Fig 2.

Characteristics of validation dataset.

(A) Schematic outlining the workflow to validate and optimize SmartMap. A set of six million randomly selected 200bp loci were used to simulate paired-end reads. The true read depth distribution was then compared to both uniread and SmartMap analyses, with each analysis conducted in both “scored” and “unscored” modes, per Methods. (B, C) Number of (B) alignments or (C) reads vs. number of alignments per read for the validation datasets. (D) Mean absolute error of read depth at true origin loci in SmartMap scored mode vs. number of reweighting iterations. (E) Genome browser view showing the read depth in the (top) uniread, (center) SmartMap (0 iterations), and (bottom) SmartMap (1 iteration) datasets of an example locus.

Table 1.

Alignment statistics for the datasets used in this study.

Fig 3.

SmartMap and uniread analyses of the validation dataset.

Iteration 0 and iteration 1 refer to SmartMap analysis with 0 and 1 iterations of reweighting, respectively. Scored and unscored refer to whether alignment score was considered in analysis, per Methods. Throughout this figure, dashed lines are used to keep overlapping curves readable and do not indicate discontinuities in the data. (A) Quantile plot of read depth at the true origin loci, with Gold Standard dataset and analysis conducted in (left) scored mode or (right) unscored mode. (B) Quantile plot of excess read depth in SmartMap datasets relative to corresponding uniread dataset at true origin loci in (left) scored mode and (right) unscored mode. (C) QQ plot of read depth at true origin loci in the SmartMap (1 iteration) scored dataset vs. uniread scored dataset. Color scale represents percentile of each point, from 1st to 99th percentiles. (D-E) Median (D) read depth or (E) excess read depth vs. mappability score (UMAP50) [25] of the true origin loci. (F-G) Average read depth (F) at true origin loci and (G) outside true origin loci. (H) Mean absolute error of read depth at true origin loci for each dataset, with Gold Standard as the reference point. (I) Mean proportion of alignments intersecting with the true read of origin for each weight after SmartMap with no reweighting (green) and one iteration of reweighting (red) in scored mode. Dashed line represents line with slope of unity. (J) Mean weighted overlap proportion score between alignments intersecting the true read of origin and the true read locus for each weight after SmartMap with no reweighting (green) and one iteration of reweighting (red) in scored mode. The weighted overlap proportion score is meant to represent the proportion of a read’s weight that maps to the correct location due to a particular alignment. It is computed as a weighted geometric mean of the proportion of the alignment covered by the true read and the proportion of the true read covered by the alignment.
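
The geometric-mean part of the overlap score in panel J can be sketched as follows. The caption does not state which weights enter the geometric mean, so the exponent `alpha` (defaulting to equal weighting) is an assumption, as are the function name and the half-open interval convention.

```python
def overlap_score(aln_start, aln_end, true_start, true_end, alpha=0.5):
    """Weighted geometric mean of two overlap proportions.

    Intervals are half-open: [start, end).
    alpha is a hypothetical weighting exponent; the weighting actually
    used in the paper is not given in the caption.
    """
    overlap = max(0, min(aln_end, true_end) - max(aln_start, true_start))
    # Proportion of the alignment covered by the true read.
    p_aln = overlap / (aln_end - aln_start)
    # Proportion of the true read covered by the alignment.
    p_true = overlap / (true_end - true_start)
    return (p_aln ** alpha) * (p_true ** (1 - alpha))
```

Under these assumptions, an alignment that coincides with the true read scores 1, a disjoint alignment scores 0, and two 200bp intervals offset by 100bp score 0.5.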

Table 2.

Analysis of reads across genomic windows.

Table 3.

Analysis of high-depth regions under SmartMap analysis.

Fig 4.

SmartMap and uniread analyses of ICeChIP-seq input depth.

All analyses conducted on 200bp genomic windows for the Inputs defined in Table 2. (A) Quantile plot of read depth for SmartMap and uniread analyses. (B) Median read depth vs. mappability score (UMAP50) for SmartMap and uniread analyses. (C) Quantile plot of excess read depth in SmartMap relative to uniread analysis. (D) Median excess read depth vs. mappability score (UMAP50). (E) QQ plot of read depth in SmartMap vs. uniread analysis. Color scale represents percentile of each point, from 1st to 99th percentiles. Dashed line represents line with slope of unity. (F) Quantile plots of depth-normalized log ratio of read depths of biological input replicates under SmartMap and uniread analysis. Graph breaks are present on both the upper and lower ends of the graphs. (G) Mean absolute depth-normalized log ratio for the comparisons presented in panel F.
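
One plausible reading of the depth-normalized log ratio in panels F and G is sketched below. The exact definition is given in the paper's Methods, not in this caption, so the choices here are assumptions: each window's depth is divided by the replicate's total depth, the ratio is taken in log base 2, and windows with zero depth in either replicate are skipped. The function names are hypothetical.

```python
import math

def depth_normalized_log_ratios(rep_a, rep_b):
    """Per-window log2 ratio of depth-normalized read depths.

    rep_a, rep_b: read depths over the same ordered genomic windows.
    Windows with zero depth in either replicate are skipped (assumption).
    """
    total_a, total_b = sum(rep_a), sum(rep_b)
    ratios = []
    for a, b in zip(rep_a, rep_b):
        if a > 0 and b > 0:
            ratios.append(math.log2((a / total_a) / (b / total_b)))
    return ratios

def mean_absolute_log_ratio(rep_a, rep_b):
    """Mean of the absolute log ratios, as summarized in panel G."""
    ratios = depth_normalized_log_ratios(rep_a, rep_b)
    return sum(abs(r) for r in ratios) / len(ratios)
```

With this definition, two replicates that differ only in sequencing depth give a mean absolute log ratio of zero, so lower values indicate better agreement between replicates.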

Fig 5.

ICeChIP-seq histone modification density in SmartMap and uniread analyses.

All analyses conducted on 200bp tiled genomic windows. (A-B) (A) Mean or (B) Median HMD vs. mappability score (UMAP50) for SmartMap and uniread analyses. (C-D) Scatterplots of (C) specificity or (D) log specificity for uniread vs. SmartMap analyses. Specificity is measured as the enrichment of each on- or off-target internal standard nucleosome as a percentage of on-target enrichment.
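
The specificity measure in panels C and D is simple arithmetic, shown here as a minimal sketch. Enrichment is taken as the ratio of IP to input read counts for each barcoded standard, which is an assumption about the upstream calculation; the function names and the nucleosome labels and counts in the example are hypothetical.

```python
def enrichment(ip_reads, input_reads):
    """IP over input read count for one internal standard (assumed definition)."""
    return ip_reads / input_reads

def specificity_percent(enrichments, on_target):
    """Express each standard's enrichment as a percentage of the on-target one.

    enrichments: dict mapping standard name -> enrichment.
    on_target:   key of the on-target standard.
    """
    reference = enrichments[on_target]
    return {name: 100.0 * e / reference for name, e in enrichments.items()}

# Illustrative counts only; not data from the paper.
example_enrichments = {
    "on_target": enrichment(800, 1000),
    "off_target_1": enrichment(40, 1000),
    "off_target_2": enrichment(8, 1000),
}
example_specificity = specificity_percent(example_enrichments, "on_target")
```

By construction the on-target standard is always 100%, and a perfectly specific antibody would give off-target values near 0%.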

Table 4.

Analysis of ICeChIP Calibrant Barcodes.

Fig 6.

Assessment of histone modifications at promoters of repetitive DNA elements.

(A) Mean histone modification densities (HMDs) about promoters for classes of all repetitive elements, as defined by k-means clustering. Corresponding analyses of LINE, SINE, and Simple Repeat elements are shown in S13 Fig. (B) Heatmap of repeat promoters with newly measurable HMD in SmartMap analysis, sorted on first principal component of repetitive elements. (C) Proportion of each cluster made up of each repeat class or family for all repeats (left), LINE elements (center), and SINE elements (right). All significance tests performed as post-hoc Bonferroni-corrected pairwise 2x2 chi-square tests. (D) Quantile boxplots of average normalized RNA-seq read depth across LINE elements for each LINE cluster. Solid line with marker represents 90th percentile; dashed line with marker represents 95th percentile. Significance test shows difference in median by Bonferroni-corrected pairwise Mood’s median tests. Significance markers: *p<0.01, **p<10^-5, ***p<10^-10.
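
The testing scheme named in panel C, pairwise 2x2 chi-square tests with a Bonferroni correction, can be sketched in plain Python. The caption does not say whether a continuity correction was applied or how the correction factor was counted, so this sketch uses the uncorrected Pearson statistic and multiplies each p-value by the number of cluster pairs; both choices and all names are assumptions.

```python
import math
from itertools import combinations

def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value). Assumes no row or column total is zero.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

def pairwise_bonferroni(counts):
    """Compare every pair of clusters for one repeat class.

    counts: dict mapping cluster -> (n_in_class, n_not_in_class).
    Returns dict mapping (cluster_x, cluster_y) -> Bonferroni-adjusted p.
    """
    pairs = list(combinations(counts, 2))
    adjusted = {}
    for x, y in pairs:
        (a, b), (c, d) = counts[x], counts[y]
        _, p_value = chi2_2x2(a, b, c, d)
        adjusted[(x, y)] = min(1.0, p_value * len(pairs))
    return adjusted
```

Each adjusted p-value would then be compared against the thresholds given by the significance markers in the caption.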

Table 5.

Clustering of Repetitive Elements.

Table 6.

Clustering of LINE Elements.

Table 7.

Clustering of SINE Elements.

Fig 7.

Analysis of increased usable read depth.

This figure graphically represents the data in Tables 1 and 2. (A) The percent increase in the number of reads usable in SmartMap analysis (reads with 1–50 alignments) relative to uniread analysis (reads with 1 alignment). (B) Percentage of the total number of regions with an increase in read depth in the SmartMap dataset relative to the uniread dataset. For all datasets except the ENCODE RNA-seq datasets, the list of regions analyzed is the set of 200bp genomic windows across the relevant genome (hg38, mm10, or dm3). For the ENCODE RNA-seq dataset, the list of regions analyzed is the set of distinct Refseq genes. (C) Percent increase in the number of regions with nonzero read depth in the SmartMap dataset relative to the uniread dataset. Regions are defined as per panel B.
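
The quantity in panel A follows directly from the definitions in the caption: unireads have exactly 1 alignment, and SmartMap-usable reads have 1–50 alignments. A minimal sketch, with a hypothetical function name and input format (a list of alignment counts, one per read):

```python
def percent_increase_usable(alignments_per_read, max_alignments=50):
    """Percent increase in usable reads for SmartMap relative to uniread analysis."""
    # Unireads: exactly one alignment.
    unireads = sum(1 for n in alignments_per_read if n == 1)
    # SmartMap-usable reads: between 1 and max_alignments alignments.
    usable = sum(1 for n in alignments_per_read if 1 <= n <= max_alignments)
    return 100.0 * (usable - unireads) / unireads
```

For example, a set of reads with alignment counts `[1, 1, 2, 50, 51, 0]` has two unireads and four usable reads, a 100% increase.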

Table 8.

Benchmarking SmartMap software.
