MixHMM: Inferring Copy Number Variation and Allelic Imbalance Using SNP Arrays and Tumor Samples Mixed with Stromal Cells

doi:10.1371/journal.pone.0010909

Figure 1.

Chromosome instability events as CNV states for copy number up to four.

All nine possible CNV states and genotypes with copy numbers up to 4 are presented here as a “pseudo chromosome”. (See Table 1 for an alternative representation of 20 states with copy numbers up to 7.) All states are assumed to be derived from the underlying normal two-copy state (‘FM’), which has regions from both chromosomes (‘F’ in blue, ‘M’ in red). The top track indicates the composition of each state based on the source chromosomes. The second track gives a graphical representation of the state composition along different regions. The third track gives the copy number of each region, from 0 to 4; regions are separated by vertical bars. The fourth track shows an example set of haplotypes making up the region (‘A’ and ‘B’ are the alternate alleles). There are up to four distinct genotypes in each state, with the genotype for each individual SNP shown in a vertical column (for example, the SNP genotype indicated by the red arrow is ‘AAB’). In the homozygous deletion state (‘O’), both regions are deleted (labeled in gray). In the LOH states (labeled with only ‘F’s), one of the source chromosomes is deleted, while the other can be amplified one or more times. The normal state (‘FM’) has regions from both chromosomes. The remaining states harbor regions from both source chromosomes with one or both regions amplified. States such as ‘MM’, ‘FMM’, etc. are not listed because they are not distinguishable from ‘FF’ and ‘FFM’ by genotyping array data.
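
The state counts quoted in this caption (nine states for copy numbers up to 4, 20 states for copy numbers up to 7) can be reproduced with a short enumeration. The sketch below is illustrative only and is not taken from the MixHMM software: the function names `cnv_states` and `genotypes` are hypothetical, and the code assumes that a state is fully described by its number of ‘F’ and ‘M’ copies, with the ‘F’ count never smaller than the ‘M’ count because mirrored states are indistinguishable on a genotyping array.

```python
def cnv_states(max_copy_number):
    """Enumerate state names, keeping only states with F copies >= M copies.

    Mirrored states such as 'MM' or 'FMM' are skipped because they cannot be
    told apart from 'FF' or 'FFM' with genotyping array data.
    """
    states = []
    for n in range(max_copy_number + 1):
        if n == 0:
            states.append("O")  # homozygous deletion
            continue
        for m in range(n // 2 + 1):
            states.append("F" * (n - m) + "M" * m)
    return states


def genotypes(state):
    """List the distinct SNP genotypes a state can show (hypothetical helper).

    Each source chromosome carries either allele 'A' or allele 'B' at a SNP;
    every copy of that chromosome carries the same allele.
    """
    if state == "O":
        return []  # nothing left to genotype
    f, m = state.count("F"), state.count("M")
    found = set()
    for f_allele in "AB":
        for m_allele in "AB":
            found.add("".join(sorted(f_allele * f + m_allele * m)))
    return sorted(found)


if __name__ == "__main__":
    for s in cnv_states(4):
        print(s, genotypes(s))
```

With these assumptions, ‘FFM’ yields the four genotypes AAA, AAB, ABB and BBB, which includes the ‘AAB’ genotype marked by the red arrow in the figure.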

Table 1.

CNV states and genotypes.

Figure 2.

LRR distributions and BAF distributions in simulated mixed samples.

A) Mixing of LRR. Each line represents a state of a certain copy number (color code on right) mixed with a proportion of normal ‘FM’ cells (proportion on top), with ‘FM = 0’ corresponding to a pure tumor sample. B) Mixing of BAF. Each subplot represents a certain CNV state (name on top) mixed with a proportion of ‘FM’ cells (proportion on left), with ‘FM = 0’ corresponding to a pure tumor sample.
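
To make the effect of stromal contamination concrete, the following minimal sketch computes idealized LRR and BAF values for a tumor state mixed with normal ‘FM’ cells. It assumes a simple linear mixing of allele copies and an uncompressed LRR equal to log2(copy number / 2); the emission model actually used by MixHMM may differ, and the function names are hypothetical.

```python
import math


def mixed_copy_number(f_copies, m_copies, normal_fraction):
    """Average copy number when tumor cells are mixed with normal 'FM' cells."""
    tumor_fraction = 1.0 - normal_fraction
    return tumor_fraction * (f_copies + m_copies) + normal_fraction * 2.0


def expected_lrr(f_copies, m_copies, normal_fraction):
    """Idealized LRR (assumption: log2 of mixed copy number over 2)."""
    total = mixed_copy_number(f_copies, m_copies, normal_fraction)
    if total == 0:
        return float("-inf")  # pure homozygous deletion
    return math.log2(total / 2.0)


def expected_baf(f_copies, m_copies, normal_fraction):
    """Idealized BAF at a SNP where 'F' carries allele B and 'M' carries allele A."""
    total = mixed_copy_number(f_copies, m_copies, normal_fraction)
    if total == 0:
        return float("nan")  # no DNA, BAF undefined
    tumor_fraction = 1.0 - normal_fraction
    # Normal cells contribute exactly one B copy (from 'F') at this SNP.
    b_copies = tumor_fraction * f_copies + normal_fraction * 1.0
    return b_copies / total


if __name__ == "__main__":
    # Copy-neutral LOH ('FF') at increasing proportions of normal cells.
    for p in (0.0, 0.3, 0.5, 0.7):
        print(p, round(expected_lrr(2, 0, p), 3), round(expected_baf(2, 0, p), 3))
```

Under these assumptions, the BAF of a heterozygous SNP in the ‘FF’ state moves from 1.0 in a pure tumor sample toward 0.5 as the proportion of ‘FM’ cells grows, while its LRR stays at 0.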

Figure 3.

CNV detection from simulated data.

A) Detection of copy number (CN) and allelic imbalance (AI) from simulation of pure tumor and mixed samples on Chromosome 1. Each of the 20 states is simulated as a 300-SNP region. The numbers on the left side are proportions of ‘FM’ cells. The underlying simulated truth is depicted in the ‘simu. CN’ and ‘simu. AI’ panels. The BAF and LRR plots are of simulated pure tumor cells. In the PennCNV and GenoCNA CN tracks, the copy numbers range from 0 to 4, the baseline (gray) represents 2n, and the flat box (the orange fragment) is copy-neutral LOH. The results of MixHMM are separated into copy number and allelic imbalance. In the CN tracks, the baseline (gray) represents 2n, and the copy numbers range from 0n through 7n. In the AI tracks, the baseline represents 0, and the values range from 0 through 0.5. B) Box plots of recovery rates of copy number and allelic imbalance detected using MixHMM from the simulation. The numbers on the left side are proportions of ‘FM’ cells. Values for each copy number/imbalance come from simulations of 220 regions, with each region spanning 100 SNPs.

Figure 4.

CN detection in dilution series of a breast cancer cell line (CRL-2324D).

The numbers on the left of each track are the proportions of normal (‘FM’) cells. The BAF and LRR tracks are for the pure tumor sample. Some putative CNV states detected with MixHMM from the pure tumor sample are labeled below all tracks. The chromosome and approximate start and end locations are labeled on top of each column. The arrowhead in the left panel points to a short region with LRR values between those of 1n and 2n. In the CN tracks, the baseline (gray) represents 2n, and the copy numbers range from 0n through 7n.

Figure 5.

Comparison of three algorithms in dilution series of a breast cancer cell line (CRL-2324D).

Each subplot shows the recovery rates (upper row) and false discovery rates (lower row) in a cancer sample with a certain proportion of normal cells (proportion labeled above each column). The collapsed CNV states are labeled on the x-axis, with copy number = 0 (‘0n’), 1 (‘1n’), 2 (‘FF’, ‘FM’), 3 (‘3n’), and ≥ 4 (‘4n’). The blue points (connected with blue solid lines) are results using MixHMM, the red points (connected with red dotted lines) are for PennCNV, and the green points (connected with green dashed lines) are for GenoCNA. When no SNPs are detected in a state, no point is shown in the plot.
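
The per-state rates plotted here can be sketched as follows. The definitions used are assumptions, since this caption does not spell them out: recovery is taken as the fraction of SNPs truly in a state that are called as that state, and false discovery as the fraction of SNPs called as a state that are not truly in it. The helper names `collapse_state` and `recovery_and_fdr` are hypothetical.

```python
def collapse_state(state):
    """Map a full state name (e.g. 'FFM') to the collapsed x-axis label."""
    if state == "O":
        return "0n"
    copy_number = len(state)
    if copy_number == 2:
        return state  # 'FF' and 'FM' are kept apart
    if copy_number >= 4:
        return "4n"
    return f"{copy_number}n"


def recovery_and_fdr(truth, calls, state):
    """Per-SNP recovery and false discovery rate for one collapsed state.

    Returns None for a rate whose denominator is zero, mirroring the
    missing points in the plot when no SNPs are detected in a state.
    """
    true_count = sum(1 for t in truth if t == state)
    called_count = sum(1 for c in calls if c == state)
    correct = sum(1 for t, c in zip(truth, calls) if t == state and c == state)
    recovery = correct / true_count if true_count else None
    fdr = (called_count - correct) / called_count if called_count else None
    return recovery, fdr


if __name__ == "__main__":
    # Toy per-SNP labels, invented purely for illustration.
    truth = [collapse_state(s) for s in ["F", "F", "FM", "FM", "FFM"]]
    calls = [collapse_state(s) for s in ["F", "FM", "FM", "FM", "FM"]]
    for label in ("1n", "FM", "3n"):
        print(label, recovery_and_fdr(truth, calls, label))
```

Returning `None` rather than 0 keeps an undefined rate from being mistaken for a perfect or a failed result.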

Figure 6.

Detection of copy number (CN) and allelic imbalance (AI) in tumor samples.

A) A melanoma sample (‘LAC-mel’) composed of almost pure tumor cells. B) A breast cancer sample (‘BT5’) with about 30% normal cells. Selected state regions detected by MixHMM are labeled below all tracks. The top panels are the results of PennCNV detection. The chromosome arm and approximate start and end positions are shown on top of each panel. The range of copy number (CN) is from 0 to 7, with the baseline representing 2n. The range of allelic imbalance (AI) is from 0 (for balanced states) to 0.5 (for LOH states); the AI of total deletion (‘O’) is set to 0.5 in this analysis. In the PennCNV track, the solid orange fragments on the baseline represent copy-neutral LOH (‘FF’).
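
One way to obtain an AI scale with the range described here is to take the absolute deviation of the ‘F’ allele fraction from 0.5. This definition is an assumption made for illustration and is not confirmed by this caption; the function name `allelic_imbalance` is hypothetical. The sketch reproduces the stated end points: 0 for balanced states, 0.5 for LOH states, and 0.5 by convention for total deletion.

```python
def allelic_imbalance(state):
    """Assumed AI of a pure-tumor state: |F fraction - 0.5|.

    'O' (total deletion) has no alleles, so it is set to 0.5 by convention,
    matching the choice described for this figure.
    """
    if state == "O":
        return 0.5
    f_copies = state.count("F")
    m_copies = state.count("M")
    return abs(f_copies / (f_copies + m_copies) - 0.5)


if __name__ == "__main__":
    for s in ("O", "F", "FF", "FM", "FFM", "FFMM", "FFFM"):
        print(s, round(allelic_imbalance(s), 3))
```

Under this assumed definition, intermediate states such as ‘FFM’ fall strictly between 0 and 0.5.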

Figure 7.

BAF emission probability.

The blue lines represent the distributions when , the red lines represent the distributions when . Here stands for the population frequency of the ‘B’ allele. Each subplot represents the distributions of a certain CNV state (state names labeled on top). The first track on top of the graphs shows the copy number of each state.
