Figure 1.

Schematic illustration of CopySeq.

A. ‘Locus selection’, i.e., definition and selection of loci of interest for copy-number genotyping. B. ‘Mappability assessment’, i.e., construction of k-mer mappability locus maps. Sequence sub-stretches not uniquely mappable by k-mers are identified in each locus (represented by red blocks) and masked (i.e., excluded from further analysis). C. ‘Read-mapping’, by default carried out with MAQ [24] (other read-mappers, such as BWA [23], can optionally be applied). D. ‘Copy-number genotyping’: The locus-specific read-depth is determined, and the locus-specific ‘read-depth ratio’ is computed and corrected for both the locus-specific k-mer mappability and G+C-content bias (see Materials and Methods). A Gaussian classifier infers locus copy numbers by comparing locus-specific read-depth ratios with the read-depth ratio distributions expected for different copy-number genotypes (distributions for the copy-number genotypes 0, 1, 2, 3, 4, and 5 are indicated with different colors). E. Copy-number genotypes are reported.
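
The classification step in panel D can be illustrated with a minimal Python sketch. It assumes that the expected read-depth ratio for copy number c is c/2 (relative to a diploid reference) and that all classes share one hypothetical standard deviation; the distribution parameters actually used are described in the Materials and Methods and are not reproduced here. The function names and the default `sd` are illustrative only.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def classify_copy_number(ratio, max_cn=5, sd=0.1):
    """Return the copy number whose expected ratio distribution best explains `ratio`.

    Assumptions (not from the source): the expected read-depth ratio for
    copy number c is c / 2, and every class shares the same standard deviation.
    """
    best_cn, best_density = None, -1.0
    for cn in range(max_cn + 1):
        density = gaussian_pdf(ratio, cn / 2.0, sd)
        if density > best_density:
            best_cn, best_density = cn, density
    return best_cn
```

With a shared standard deviation this reduces to picking the nearest expected ratio; class-specific variances or priors would change the decision boundaries.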

Figure 2.

Copy-number genotyping results in a chromosome 1 CNV set.

A. Copy-number genotyping concordance between CopySeq- and microarray-based [14] copy-number genotypes inferred for 99 CNVs on chromosome 1 in 118 individuals, using different CNV size cutoffs. Plotted circles represent the total number of high-confidence genotypes, with the largest circle corresponding to >10,000 copy-number genotypes and the smallest to 348 copy-number genotypes. As expected, the genotyping concordance increases with higher CNV size cutoffs. B–C. Copy-number genotyping results for chromosome 1 example CNVs across 150 individuals, i.e., a bi-allelic deletion (chr1:150,822,330–150,853,218; see B) as well as a bi-allelic duplication (chr1:164,451,105–164,460,994; see C). Copy-number genotypes inferred by CopySeq are indicated with different colors: ‘0’, red; ‘1’, orange; ‘2’, grey; ‘3’, blue; ‘4’, purple. Individuals have been arranged according to population: squares, CEU; triangles, CHB+JPT; circles, YRI. The scaled read-depth ratio (indicated on the y-axis) has been calculated by multiplying the read-depth ratio by two.
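
As a rough illustration of the two quantities used in this figure, the sketch below computes the scaled read-depth ratio (the ratio multiplied by two, as stated above) and a simple concordance measure. Defining concordance as the fraction of identical calls among genotypes made by both methods is an assumption here, and the dictionary layout and function names are hypothetical.

```python
def scaled_read_depth_ratio(ratio):
    """Scale a read-depth ratio by two, so that a diploid locus sits near 2."""
    return 2.0 * ratio

def genotype_concordance(calls_a, calls_b):
    """Fraction of shared (cnv_id, sample_id) keys with identical copy-number calls.

    Both arguments are hypothetical dicts mapping (cnv_id, sample_id) -> int.
    Returns None when the two call sets share no keys.
    """
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        return None
    matches = sum(1 for key in shared if calls_a[key] == calls_b[key])
    return matches / len(shared)
```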

Figure 3.

Copy-number genotype inference in olfactory receptor (OR) loci across 150 individuals.

A. Distribution of locus-specific read-depth measurements in 808 OR loci. Altogether 121,200 data points are depicted (808 loci times 150 samples). Points relate the GC-adjusted read-depth to the expected read-depth, which is estimated based on the k-mer mappability of a locus and the genomic sequencing coverage of a sample. CopySeq copy-number genotypes are indicated by colors (bottom to top): ‘0’, red; ‘1’, orange; ‘2’, grey; ‘3’, blue; ‘4’, purple; ‘5’, green; ‘6’, brown; ‘7’, yellow; ‘8’, light blue; ‘9’, black. B–D. Dissecting a complex CNV region with CopySeq. The displayed region (chr11:4,921,968–4,930,581) harbors a multi-allelic CNV involving both a deletion and a duplication. The deletion results in an OR51A2–OR51A4 fusion-gene [4]. Read-depths are shown on the left and the inferred locus-structure on the right. CopySeq was carried out in conjunction with breakpoint-junction analysis [26], generating the following copy-number genotypes. NA19138: ‘2’ for OR51A4, ‘2’ for OR51A2, ‘0’ for the fusion-gene (B); NA12716: ‘0’, ‘0’, ‘2’ (C); NA19172: ‘2’, ‘4’, ‘0’ (D). Orange and blue boxes indicate open-reading frames (ORFs), and orange/blue lines denote the respective loci (with 3′- and 5′-regions). Both ORFs are on the reverse strand of the reference genome. The gene fusion occurred near the ORFs' 5′-end within a sequence stretch where both share extensive homology (thus, no reads map to this stretch uniquely). E. Copy-number genotype map of OR loci in 150 individuals. Each bar represents the frequency of a copy-number genotype (y-axis) at a particular OR locus (x-axis). Colors indicate copy-number genotype frequencies (color scheme is on the right).

Figure 4.

Distribution of inter-individual copy-number differences in autosomal OR loci.

A. Commonly variable loci account for the majority of inter-individual OR copy number differences. OR loci were ranked by the frequency at which they displayed a copy-number genotype other than ‘2’ (indicating a CNV), followed by iterative exclusion of the rarest CNVs (i.e., first the loci that most rarely vary in copy-number were excluded, then the more common ones). Pair-wise copy-number differences between all samples were calculated, and average copy-number differences across all pair-wise comparisons determined. The y-axis indicates the inter-individual copy-number difference as a percentage of the maximum average copy-number difference, and the x-axis indicates the percentage of all copy-number variable (polymorphic) OR loci for each OR frequency rank step. For example, ∼15% of the OR loci account for ∼80% of the inter-individual OR copy-number differences between any two samples. B. Distribution of inter-individual OR copy number differences computed separately for each pair of samples. Pair-wise copy-number differences were computed as quantitative differences between copy-number genotype values summed up over all OR loci between pairs of samples (x-axis). (For example, the difference for a given locus is 2 if a copy-number genotype of ‘0’ is inferred in one sample and a copy-number genotype of ‘2’ in the other.) Blue solid line: OR genes; red solid line: OR pseudogenes; red dotted line: OR pseudogenes, excluding the CNV-enriched OR7E family.
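
The pair-wise measure described in panel B can be sketched as follows: per-locus absolute differences between copy-number genotypes are summed for each pair of samples, then averaged over all pairs. The nested-dictionary layout and function names are hypothetical, and restricting the sum to loci genotyped in both samples is an assumption.

```python
from itertools import combinations

def pairwise_difference(genotypes_a, genotypes_b):
    """Sum of absolute copy-number differences over loci present in both samples."""
    shared = genotypes_a.keys() & genotypes_b.keys()
    return sum(abs(genotypes_a[locus] - genotypes_b[locus]) for locus in shared)

def average_pairwise_difference(samples):
    """Mean pair-wise difference across all unordered sample pairs.

    `samples` is a hypothetical dict: sample_id -> {locus_id: copy_number}.
    """
    pairs = list(combinations(sorted(samples), 2))
    if not pairs:
        return 0.0
    total = sum(pairwise_difference(samples[a], samples[b]) for a, b in pairs)
    return total / len(pairs)
```

Repeating `average_pairwise_difference` on progressively smaller locus sets, and dividing by the value for the full set, would give a curve of the kind shown in panel A.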

Figure 5.

Heritability of CNVs in a parent-offspring trio of European ancestry.

A. Chromosomal origin of the largest human OR genomic cluster and pedigree of the European family. B–D. CNV inheritance, indicated in terms of scaled read-depth ratios and inferred copy-number genotypes among 96 bi-allelic OR loci located in the largest human OR cluster (11@55.6; see nomenclature in http://genome.weizmann.ac.il/horde/; chr11:54,842,512–56,344,668). The x-axis represents genomic coordinates, and individual OR positions are marked by ticks. The copy-number genotypes identified in NA12891 (B), NA12892 (C), and NA12878 (D), were inferred based on low-coverage genomic data (Table S1) and are consistent with Mendelian segregation. Bi-allelic CNVs were classified according to copy-number genotypes identified in the European (CEU) individuals. Copy-number genotypes are color-coded: ‘1’, orange; ‘2’, grey; ‘3’, blue.
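
One way to test whether trio genotypes at a bi-allelic locus are consistent with Mendelian segregation is sketched below. It is not the procedure used for this figure; it assumes that each haplotype carries a copy count drawn from a small set (for example 0 or 1 at a deletion locus, 1 or 2 at a duplication locus), and the function and parameter names are hypothetical.

```python
def is_mendelian_consistent(father_cn, mother_cn, child_cn, allele_copies=(0, 1)):
    """Check whether a child's copy number can arise from one allele of each parent.

    Assumption (not from the source): each haplotype carries a copy count
    drawn from `allele_copies`, e.g. (0, 1) for a bi-allelic deletion or
    (1, 2) for a bi-allelic duplication.
    """
    def transmissible(parent_cn):
        # Copy counts a parent could pass on, given its diploid copy number.
        return {a for a in allele_copies if (parent_cn - a) in allele_copies}

    from_father = transmissible(father_cn)
    from_mother = transmissible(mother_cn)
    return any(f + m == child_cn for f in from_father for m in from_mother)
```

For instance, a father with genotype ‘1’ and a mother with genotype ‘2’ at a deletion locus can have a child with ‘1’ or ‘2’, but not ‘0’.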

Figure 6.

Concordance of copy-number genotypes inferred in OR loci with microarray-based calls and qPCR experiments.

A. Comparison of >5,000 copy-number genotypes inferred in OR loci, using CopySeq, with microarray-based [14] copy-number genotypes. The comparison is based on 46 OR loci, assessed in 118 individuals. Circle size indicates the number of comparisons falling into a certain bin (the largest circle, representing >3,000 copy-number genotypes, corresponds to concordant copy-number genotype calls of the homozygous reference allele, i.e., copy-number = ‘2’). Blue lines denote the function y = x and have been included to facilitate evaluation of the data. B. Validation of 50 copy-number genotypes in 5 OR loci × 10 samples by qPCR. Experimentally determined qPCR values are expressed in terms of adjusted Ct values, which were estimated as described in the Materials and Methods section.

Figure 7.

Analysis of ‘young’ and ‘ancient’ ORs.

The figure displays the distribution of sequence identities with the most similar (‘nearest’) paralog for non-variable, bi-allelic, and multi-allelic OR loci. Each point represents the sequence identity of an OR to its nearest paralog (y-axis), with points grouped by type of locus (non-variable, NV; bi-allelic, BI; multi-allelic, MU). Green points: OR locus lacks a one-to-one ortholog in the chimpanzee genome; blue points: OR locus has a one-to-one ortholog in the chimpanzee genome (as assessed by comparing human and chimpanzee ORFs at the DNA level using BLAST, and classifying as one-to-one orthologs sequences displaying mutually highest sequence identity). Blue and green rhomboids represent the corresponding distribution average; red rhomboids represent averages for NV, BI, and MU. Rhomboid error bars represent 95% confidence intervals of the average.
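
The ‘mutually highest sequence identity’ criterion corresponds to a reciprocal-best-hit rule, which can be sketched as below. The sketch works on precomputed identity values only; it does not run BLAST, and the dictionary layout, identifiers, and tie-breaking behavior are assumptions made for illustration.

```python
def best_hits(identities):
    """Map each query to its highest-identity target.

    `identities` is a hypothetical dict: (query_id, target_id) -> percent identity.
    Ties are broken by whichever pair is encountered first.
    """
    best = {}
    for (query, target), identity in identities.items():
        if query not in best or identity > best[query][1]:
            best[query] = (target, identity)
    return {query: target for query, (target, _) in best.items()}

def one_to_one_orthologs(human_vs_chimp, chimp_vs_human):
    """Return sorted (human, chimp) pairs that are each other's best hit."""
    human_best = best_hits(human_vs_chimp)
    chimp_best = best_hits(chimp_vs_human)
    return sorted(
        (human, chimp)
        for human, chimp in human_best.items()
        if chimp_best.get(chimp) == human
    )
```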

Figure 8.

Analysis of the population distribution of bi-allelic OR loci reveals shared and population-specific CNVs.

Venn diagram of 265 bi-allelic OR loci, which were distributed according to their recorded presence in the three populations analyzed (CEU, CHB+JPT, and YRI). Numbers in parentheses indicate OR loci in which a single copy-number genotype other than ‘2’ (indicating a CNV) was observed across 150 individuals; these loci may display rare, rather than population-specific CNVs.
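
The region counts of such a Venn diagram can be derived by grouping loci according to the set of populations in which a CNV was recorded. The sketch below assumes a hypothetical mapping from locus identifier to population labels; it does not reproduce the counts in the figure.

```python
from collections import Counter

def venn_regions(locus_populations):
    """Count loci per Venn region, keyed by the set of populations showing a CNV.

    `locus_populations` is a hypothetical dict: locus_id -> iterable of
    population labels in which a non-'2' genotype was observed.
    """
    counts = Counter()
    for populations in locus_populations.values():
        region = frozenset(populations)
        if region:  # loci with no CNV in any population are not part of the diagram
            counts[region] += 1
    return counts
```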
