Genome-Wide Signatures of ‘Rearrangement Hotspots’ within Segmental Duplications in Humans

doi:10.1371/journal.pone.0028853

Figure 1.

A schematic illustrating our hierarchical approach.

mrsFAST was used to obtain the read depth distribution of the NA18507 human genome, allowing a maximum of two mismatches (n = 2) against the repeat-masked reference human genome (build 36). A mean-based approach was used to computationally predict the boundaries of regions associated with excessive read depth. MAQ was used to obtain the consensus genome (mapping quality Q>30 and n = 2) from the NA18507 genome assembly. The consensus sequence for regions of highly excessive read depth was then obtained so that a window-based alignment algorithm could be applied. The previously identified novel 4.8 Mb sequence from de novo assembly of this genome was also included in the rearrangement analysis.
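The boundary-prediction step can be illustrated with a minimal sketch. The caption does not give the exact rule, so the cutoff used here (mean plus `k` standard deviations of per-window depth), the window size, and the function name `excess_depth_regions` are all assumptions made for illustration, not the authors' implementation.

```python
from statistics import mean, pstdev


def excess_depth_regions(depths, window=1000, k=3.0):
    """Return half-open (start_bp, end_bp) intervals of excessive read depth.

    depths: per-window read depth values along one chromosome.
    window: window size in bp (assumed value).
    k:      number of standard deviations above the mean that counts as
            "excessive" (assumed rule; the source only says "mean-based").
    """
    cutoff = mean(depths) + k * pstdev(depths)
    regions = []
    start = None
    for i, depth in enumerate(depths):
        if depth > cutoff:
            # Open a new region at the first window above the cutoff.
            if start is None:
                start = i
        elif start is not None:
            # Close the region at the first window back under the cutoff.
            regions.append((start * window, i * window))
            start = None
    if start is not None:
        # The region runs to the end of the chromosome.
        regions.append((start * window, len(depths) * window))
    return regions


# Toy profile: flat background with a three-window spike.
profile = [10] * 50 + [100] * 3 + [10] * 50
print(excess_depth_regions(profile))
```

Adjacent windows above the cutoff are merged into a single interval, so the returned coordinates approximate region boundaries only to the resolution of the chosen window size.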


Figure 2.

Segmental duplication (SD) units which represent the most complex rearrangements within the NA18507 human genome.

a) A total of 1963 complex SD units (i.e., ≥10 rearrangements) were identified that differed significantly (p<1.0×10⁻⁶) from the rest of the duplicated regions of the NA18507 genome. The plot illustrates the concordance of the predicted autosomal complex regions with previous studies [17], [19]. b) Genes that completely or partially overlapped with detected SD units; 73% (41/56) of the most variable genes in three different populations were detected in our analysis of the NA18507 human genome. Among the 1626 genes identified in this study, 10% (i.e., 166/1626) of genes that overlapped with an SD unit revealed extreme inter- and intra-chromosomal rearrangements, 50% of which have been previously validated [17]. c) Observed gene content transfer between hotspot and non-hotspot agenic SD units. d) Scatter plot illustrating DNV count for hotspot and non-hotspot SD units. e) A histogram illustrating the mean read depth (RD) of the computationally predicted SD unit breakpoints. The blue bars represent the mean read depth for each of the 20,237 SD unit breakpoints and the red bars represent the mean read depth for hotspot regions.


Figure 3.

The physical positions of rearrangement hotspots that have been mapped within the proximal/distal breakpoints of a pathogenic deletion (red horizontal block) or duplication (green horizontal block).


Figure 4.

Landscape of chromosomal rearrangements in the NA18507 human genome.

Chromosomal rearrangements located within duplicated regions are plotted against the human genome. Green bars represent the signature of intra-chromosomal rearrangements, black bars represent inter-chromosomal rearrangements and red bars represent ‘rearrangement hotspots’. Cytobands with duplications for each chromosome and selected genes that completely or partially overlapped with SD units are also indicated.


Figure 5.

Signature of rearrangement hotspots located at a) 16p12.1 and b) 22q11.21.

a) A 40 kbp region within 16p12.1 is illustrated with its corresponding derivative copies, which were localized by hierarchical analysis. This region consists of the NPIPL3 gene derivatives. The inter- and intra-chromosomal localization of the copies is approximated in the physical map within the chromosome contig (18p11.21). The alignments are color coded by chromosome (i.e., color-coded rectangles below the read depth plot) and FISH validation is illustrated for both inter- and intra-chromosomal localization. The pathogenic deletions and duplications located within these regions [27]–[32] are depicted as red and green bars, respectively. The blue bars under the contig represent the approximated inversions previously reported by Antonacci et al. [26]. b) Analysis of a 37 kbp duplicated region within 22q11.21 revealed that it is composed of a core 2.7 kbp tandem duplicon copied from different chromosomes. Black lines represent the read depth (x-axis), green shading represents an SD unit, and blue bars represent regions with common repeat elements. The horizontal blocks (color coded by chromosome) are the rearrangement (intra/inter) fragments with >90% sequence similarity and >100 bp in length.
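The fragment criteria stated in this caption (>90% sequence similarity and >100 bp in length) amount to a simple filter followed by an intra/inter label. The sketch below assumes a hypothetical `Fragment` record with illustrative field names; it is not the data structure used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """One aligned rearrangement fragment (hypothetical record layout)."""
    source_chrom: str   # chromosome of the SD unit being analyzed
    target_chrom: str   # chromosome where the matching copy was found
    length_bp: int      # aligned length in base pairs
    identity: float     # sequence similarity as a fraction in [0, 1]


def passes_thresholds(frag, min_identity=0.90, min_length_bp=100):
    """Keep fragments with >90% similarity and >100 bp, as in the caption."""
    return frag.identity > min_identity and frag.length_bp > min_length_bp


def rearrangement_type(frag):
    """Label a fragment as intra- or inter-chromosomal."""
    return "intra" if frag.source_chrom == frag.target_chrom else "inter"


fragments = [
    Fragment("chr22", "chr22", 2700, 0.97),
    Fragment("chr22", "chr14", 150, 0.92),
    Fragment("chr22", "chr5", 90, 0.99),    # too short
    Fragment("chr22", "chr1", 500, 0.88),   # too divergent
]
kept = [(f.target_chrom, rearrangement_type(f))
        for f in fragments if passes_thresholds(f)]
print(kept)
```

Both comparisons are strict, matching the ">" signs in the caption, so a fragment of exactly 100 bp or exactly 90% similarity is excluded.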


Figure 6.

Rearrangement hotspots comprising a 68 kb gene desert located within the 1q21.1 region.

Validation of a gene desert in which extreme intra-chromosomal rearrangement, without any signature of inter-chromosomal duplication, was observed in our in silico predictions. The rearrangement consists of gene fragments from the NBPF gene family located within the p and q arms of chromosome 1.
