
Fig 1.

Bamgineer conceptual overview.

Haplotype-specific CNVs are simulated using re-paired reads. Red and blue represent read pairs from different haplotypes; purple represents new reads corresponding to the red haplotype. A. Original BAM file used as the input. B. Allele-specific loss: reads matching a target haplotype are removed. C. Allele-specific single-copy gain (CN = 3). D. Allele-specific two-copy gain (CN = 4). In C and D, new read pairs are constructed from existing reads, and reads are modified at SNP loci to match the desired haplotype.

Fig 2.

Example of allele-specific CNV calls generated from modified BAM files.

A) Genome-wide (left) and chromosome (right) views of allele-specific copy number, BAF and depth ratio for a balanced gain of the p-arm of chromosome 22, inferred using Sequenza. Blue and red lines show the allele-specific copy number profiles for each chromosome (lines are offset from discrete copy number values by ± 0.1 to separate the two alleles visually). The small blue and red marks in the top panel (orange circle) show the balanced gain on the p-arm of chromosome 22; BAF is unaffected because the gain is balanced. Each black dot in the right-hand panels represents a genomic locus, and the red lines indicate the inferred value for consecutive segments. B) Allele-specific gain of the entire chromosome 21 (orange circle). Only one copy of the chromosome is gained, so the allele frequency falls from 0.5 to ~0.33 in the chromosome view. C) Genome-wide (left) and chromosome (right) views for 36 events (21 gains and 25 losses) sampled from The Cancer Genome Atlas (TCGA) for Bladder Urothelial Carcinoma (BLCA) at 100% tumor content. As expected, depth ratio and BAFs are approximately 0.5 and zero, respectively.
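The allele frequencies quoted in panels A and B follow from simple arithmetic on haplotype copy numbers. The sketch below is a minimal illustration assuming a 100% pure sample and a heterozygous SNP; the function name `b_allele_fraction` is hypothetical and is not part of Bamgineer or Sequenza.

```python
def b_allele_fraction(copies_a: int, copies_b: int) -> float:
    """Expected read fraction from the less abundant haplotype at a
    heterozygous SNP, assuming a pure (100% tumor) sample."""
    total = copies_a + copies_b
    if total <= 0:
        raise ValueError("total copy number must be positive")
    return min(copies_a, copies_b) / total


print(b_allele_fraction(1, 1))  # normal diploid: 0.5
print(b_allele_fraction(2, 1))  # allele-specific single-copy gain: ~0.33
print(b_allele_fraction(2, 2))  # balanced gain: 0.5, BAF unchanged
```

A balanced gain keeps the two haplotypes at equal dosage, which is why BAF stays at 0.5 in panel A while the single-haplotype gain in panel B shifts it to about one third.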

Table 1.

Average number of copy number gains, losses and percent genome altered (PGA) across TCGA cancer types and the three exemplar tumors selected for our study.

Fig 3.

Log2 ratios from simulated exemplar tumors at varying purity levels.

To assess the ability of Bamgineer to recapitulate CNV profiles of known cancers, we compared, for 10 TCGA cancer types, the profile of an exemplar tumor (top track for each case) with CNVs called from BAM files generated by Bamgineer at five purity levels: 100%, 80%, 60%, 40% and 20% (subsequent tracks for each case). Log2 ratios of copy number segments inferred using the Sequenza algorithm are shown as a heatmap (blue: loss, red: gain; data range -1.5 to 1.5) for one exemplar tumor per cancer type at each tumor cellularity. We generated exemplar composite seg files by combining high-ranking tumor profiles from TCGA segments for each cancer, ranked by similarity score (Eq 6; top track for each cancer type). The quantized segments from each representative tumor were used as the CNV input for Bamgineer along with a single normal BAM file, and the output was sampled to create a tumor-normal admixture at the desired purity. The purity value shown for each tumor type (in gray) is the median purity for that cancer type according to TCGA segments. As purity decreases, we observed a corresponding decrease in the log2 ratios of tumor segments (reduced color intensity). Occasional discrepancies between exemplar and simulated CNV profiles appear to be due to variation in Sequenza's segmentation of copy number calls as purity falls, rather than to inaccurate allele-specific coverage at these locations.
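The fading color intensity at lower purity can be reproduced with a simple mixture model. The sketch below is illustrative only: it assumes a diploid normal, a depth ratio normalized to a diploid baseline, and a clonal event; `expected_log2_ratio` is a hypothetical helper, not a function of Bamgineer or Sequenza.

```python
import math


def expected_log2_ratio(tumor_cn: int, purity: float, normal_cn: int = 2) -> float:
    """Expected tumor/normal log2 depth ratio for a segment with copy number
    `tumor_cn` in a sample with tumor fraction `purity` (assumed model)."""
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    # Average copy number over the tumor and normal cell populations.
    mixed_cn = purity * tumor_cn + (1.0 - purity) * normal_cn
    if mixed_cn <= 0:
        raise ValueError("no DNA expected for this segment")
    return math.log2(mixed_cn / normal_cn)


# Same purity levels as in the figure: single-copy gain (CN = 3) and loss (CN = 1).
for purity in (1.0, 0.8, 0.6, 0.4, 0.2):
    gain = expected_log2_ratio(3, purity)
    loss = expected_log2_ratio(1, purity)
    print(f"purity {purity:.0%}: gain {gain:+.3f}, loss {loss:+.3f}")
```

Under these assumptions both gains and losses move toward a log2 ratio of zero as purity falls, which makes segments harder to separate from the diploid baseline.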

Fig 4.

Exome-wide simulated copy number profiles at a range of tumor purities yield expected allelic and copy number ratios.

Allelic ratio for allele-specific copy number gain (A) and loss (B) events at heterozygous SNP loci for affected haplotypes (blue), unaffected haplotypes (red), and SNPs outside engineered CNV regions (green; negative controls) at different tumor cellularity levels (x-axis) across all cancers. C) Boxplots of tumor-to-normal log2 depth ratio for copy number gain (red) and loss (blue) segments from Sequenza across all cancers (Table 1). D) Accuracy of Sequenza copy number segment calls for gains (red triangles) and losses (blue circles) decreases as simulated tumor content decreases.

Fig 5.

Simulated low-frequency CNVs in circulating tumor DNA data yield expected allelic ratios and retain the underlying bimodal fragment size distribution.

A) Percentage increase in read count above the original input BAM file for the affected haplotype at three SNPs in the EGFR region. Shifts in coverage of specific allelic variants and in haplotype representation are consistent with the targeted allele frequencies. B) Comparison of DNA insert size distributions from a targeted 5-gene cell-free DNA sequencing library before and after introduction of read pairs supporting an EGFR single-copy gain. Insert sizes are binned at 5 bp. C) Empirical cumulative distribution functions (ECDFs) of all fragment lengths (blue; median 173, mean 194.5) and of the read pairs newly introduced for the allele-specific EGFR gain (red; median 175, mean 197.7). Despite the multimodal fragment size distribution of cell-free DNA (two major peaks at ~160 and ~330 bp), the fragment size distributions of the original read pairs and of the read pairs introduced to simulate the EGFR gain are reasonably consistent (two-sided KS test: statistic 0.11, p-value 0.81; we note minor discrepancies in the relative intensity of the second peak at ~330 bp relative to the original BAM).
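The KS statistic reported in panel C is the largest vertical gap between two ECDFs. The sketch below computes it in plain Python to make the comparison concrete; `ks_statistic` is a hypothetical helper, and the fragment lengths are made-up toy values for illustration, not data from the study.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the empirical CDFs of the two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The ECDFs only change at observed values, so checking those is enough.
    for x in set(a) | set(b):
        ecdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        ecdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        largest_gap = max(largest_gap, abs(ecdf_a - ecdf_b))
    return largest_gap


# Toy fragment lengths (bp), invented for illustration only.
original_fragments = [150, 160, 165, 170, 175, 180, 320, 330, 340, 345]
introduced_fragments = [155, 160, 170, 175, 185, 325, 330, 335]
print(ks_statistic(original_fragments, introduced_fragments))
```

A small statistic means the two ECDFs track each other closely. For a p-value as well, `scipy.stats.ks_2samp` performs the full two-sample test.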
