Replicative and non-replicative mechanisms in the formation of clustered CNVs are indicated by whole genome characterization

Clustered copy number variants (CNVs) as detected by chromosomal microarray analysis (CMA) are often reported as germline chromothripsis. However, such cases might need further investigation by massively parallel whole genome sequencing (WGS) in order to accurately define the underlying complex rearrangement, predict the mechanisms of formation and identify additional complexities. Here, we utilized WGS to delineate the rearrangement structure of 21 clustered CNV carriers first investigated by CMA and identified a total of 83 breakpoint junctions (BPJs). The rearrangements were further sub-classified depending on the patterns observed: I) cases with only deletions (n = 8) often had additional structural rearrangements, such as insertions and inversions typical of chromothripsis; II) cases with only duplications (n = 7) or III) combinations of deletions and duplications (n = 6) demonstrated mostly interspersed duplications and BPJs enriched with microhomology. In two cases the rearrangement mutational signatures indicated both a breakage-fusion-bridge cycle process and halted formation of a ring chromosome. We also observed two cases with Alu- and LINE-mediated rearrangements as well as two unrelated individuals with seemingly identical clustered CNVs on 2p25.3, possibly a rare European founder rearrangement. In conclusion, through detailed characterization of the derivative chromosomes we show that multiple mechanisms are likely involved in the formation of clustered CNVs and add further evidence for chromoanagenesis mechanisms in both “simple” and highly complex chromosomal rearrangements. Finally, WGS characterization adds positional information, important for a correct clinical interpretation and for deciphering the mechanisms involved in the formation of these rearrangements.

Introduction
Structural variants (SVs) contribute to genomic diversity in humans [1] and include copy number variants (CNVs) (deletions, duplications), copy number neutral (balanced) variants (inversions and translocations), and more complex rearrangements resulting from chromothripsis and/or chromoanasynthesis [2,3]. Complex SVs (complex chromosomal rearrangements, CCRs) often result in congenital and developmental abnormalities and contribute to cancer development, although carriers with unaffected phenotypes have also been reported [4].
A rare but regularly observed phenomenon in clinical genetic diagnostic laboratories is the co-localization of multiple CNVs on the same chromosome. Even though chromosomal microarray analysis (CMA) may identify such rearrangements, further characterization with whole genome sequencing (WGS) may be useful. A previous WGS study of two closely located duplications revealed additional copy-neutral complex genomic rearrangements associated with paired duplications, such as inverted fragments, duplications with a nested deletion and other complexities, which were cryptic to CMA [5].
Proposed mechanisms that could explain the formation of multiple CNVs on the same chromosome include chromothripsis and chromoanasynthesis [6,7], while the term chromoanagenesis, a form of chromosome rebirth, describes both phenomena independently of the underlying mechanism [8].
Chromothripsis is a chromosome-shattering phenomenon in which part of a chromosome, an entire chromosome or a few chromosomes are fragmented into multiple pieces and reassembled in random order and orientation, resulting in complex genomic rearrangements [9]. During this process, some of the generated fragments can be lost, resulting in heterozygous deletions. One of the distinctive features of chromothripsis is that the rearrangement breakpoints (BPs) are localized to relatively small genomic regions, usually spanning a few Mb. The causes of such clustered fragmentation are still unclear; however, some studies suggest that chromothripsis could be generated through the physical isolation of chromosomes within micronuclei, where the "trapped" lagging chromosome(s) undergo defective DNA replication and repair, resulting in chromosome pulverization [10,11]. Others hypothesized that the clustered DNA double-strand breaks (DSBs) during chromothripsis could be initiated by ionizing radiation [9,12], breakage-fusion-bridge cycles associated with telomere attrition [9,13], aborted apoptosis [14] or endogenous endonucleases [15]. The highly characteristic breakpoint-junction (BPJ) sequences in the derivative chromosomes point to non-homologous end-joining (NHEJ) [16] or microhomology-mediated end-joining (MMEJ) [17] as the likely repair mechanisms rejoining the shattered DNA fragments [9,18,19]. Although non-allelic homologous recombination (NAHR) was excluded as a chromothripsis repair mechanism [20], our recent report showed that homologous Alu elements may also mediate germline chromothripsis [15]. Chromothripsis was deciphered with the help of whole genome next generation sequencing (WGS) in microscopic complex chromosomal rearrangements involving three or more BPs [18,19,21,22], as well as in microscopically balanced reciprocal translocations [23,24].
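To make the shatter-and-reassemble model concrete, the Python sketch below simulates it on a toy scale. It illustrates the model described above and is not an analysis used in this study; the fragment labels, loss probability and random seed are arbitrary assumptions. Fragments of one homologue are either lost or kept, kept fragments are rejoined in random order and orientation, and, with the other homologue intact, every fragment ends up at copy number 1 or 2.

```python
import random
import string
from typing import Dict, List, Tuple


def simulate_chromothripsis(
    n_fragments: int, loss_prob: float, seed: int = 0
) -> Tuple[List[Tuple[str, str]], Dict[str, int]]:
    """Toy model: shatter one homologue into labelled fragments, drop some,
    and rejoin the rest in random order and orientation.

    Returns the derivative chromosome as (fragment, strand) pairs and the
    copy number of every fragment, assuming the other homologue is intact.
    """
    if not 1 <= n_fragments <= 26:
        raise ValueError("this sketch labels fragments A-Z")
    rng = random.Random(seed)
    fragments = list(string.ascii_uppercase[:n_fragments])
    # Each fragment is independently lost with probability loss_prob.
    retained = [f for f in fragments if rng.random() >= loss_prob]
    rng.shuffle(retained)  # random order
    derivative = [(f, rng.choice("+-")) for f in retained]  # random orientation
    kept = set(retained)
    copy_number = {f: 2 if f in kept else 1 for f in fragments}
    return derivative, copy_number


derivative, copy_number = simulate_chromothripsis(8, loss_prob=0.4, seed=1)
print(derivative)
print([copy_number[f] for f in sorted(copy_number)])  # each value is 1 (lost) or 2 (kept)
```

The two-state copy-number output is the dosage pattern that, together with random fragment order and orientation, is used later in this article to recognize chromothripsis-like cases.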
Chromoanasynthesis [25] was described using high-resolution chromosomal microarray analysis (CMA) and refers to clustered copy number changes, including deletions, duplications and triplications, that are flanked by regions of normal dosage. Small templated insertions and microhomologies found at most BPJs indicate that chromoanasynthesis likely involves replication errors, such as fork stalling and template switching (FoSTeS) [26] and/or microhomology-mediated break-induced replication (MMBIR) [27]. Another rare but distinct mechanism of formation is atypical chromoanasynthesis, which seems to involve only single chromosomes and to generate exclusively duplications [28], either clustered on one chromosome arm or scattered throughout the entire chromosome.
It has also been shown that clustered duplications confined to a single chromosome are not necessarily integrated in tandem into the chromosome of origin, but can be integrated at multiple positions in the derivative chromosome and carry non-templated insertions at the BPJs, indicating a different mutational mechanism, such as alternative NHEJ mediated by the DNA polymerase Polθ [28]. Finally, evidence suggests that chromothripsis and replicative errors are responsible not only for highly complex rearrangements involving several chromosomes or a large number of chromosomal segments; even simpler rearrangements involving a small number of segments on a single chromosome could have formed through chromosome shattering or replicative errors [21].
To delineate the derivative chromosomes and analyze the plausible underlying mechanisms of formation of multiple CNVs on a single chromosome, we characterized 21 germline complex rearrangements initially detected by CMA. The rearrangements involved only duplications, only deletions, or both deletions and duplications. Underlying mechanisms of rearrangement formation were inferred from the BPJ architecture as well as from the overall connective picture.

Results
We investigated the BPs of 21 individuals with clustered germline CNVs using WGS (mate-pair or paired-end sequencing) to elucidate potential underlying mechanisms of rearrangement formation and possibly clinically relevant genomic imbalances or gene disruptions. Cases were included if they harbored two or more CNVs on the same chromosome. The clinical symptoms were variable, including congenital malformations and neurodevelopmental disorders. Phenotypes and CMA results are presented in Table 1.
Segregation analysis had been performed in 20 cases and showed that the CNVs were inherited in 8 and de novo in 12. Parental DNA samples for investigation of parental origin were available in seven of the de novo cases; the rearrangement was on the maternal chromosome in four cases and on the paternal chromosome in three cases (S1 Table). We also excluded the presence of copy number neutral inversions in the parents. Among the eight inherited cases, the rearrangement segregated from a phenotypically unaffected mother (n = 6) or father (n = 2), indicating that the complex chromosomal rearrangement may be an incidental finding. We detected a complex overall picture with 83 BPs associated with deletions, duplications, inversions and insertions (Table). Resolution was at single-nucleotide level in 83 BPJs (75%) (Table 2).

Classification of complex clustered CNVs
Based on the CNV type, all rearrangements were classified into a deletions-only group (n = 8), a duplications-only group (n = 7) and a deletions-and-duplications group (n = 6) (S1 Fig). Examples from each group are presented in Fig 1. The average number of BPJs per case was 4 (range = 2-14). The rearrangements in the duplications-only group contained the fewest BPJs per case (average = 3, range = 2-5) and consisted mostly of DUP-DIP-DUP rearrangements (Table 1). The rearrangements in the deletions-only group contained slightly more junctions (average = 4, range = 2-7). The rearrangements belonging to the deletions-and-duplications group showed the highest degree of complexity, with more BPJs per case (average = 6, range = 2-14).

Clustered CNVs show additional complexities at nucleotide-level resolution
In total, WGS revealed additional duplicated or deleted fragments not detected by CMA in 16 out of 21 cases (76%) (Table 3). In most of the cases, the obtained BPJs allowed us to resolve the exact structure of the rearranged chromosomes. For one case (P5513_206) from the duplications-only group, there was no conclusive order for the duplicated fragments, hence three possibilities are shown in Fig 2. In one highly complex case (P1426_301), the full connective picture of the rearranged chromosomes could not be established (Fig 3).
In four cases where CMA suggested two clustered duplications separated by a diploid fragment (P4855_511, P2109_150, P06 and P74), WGS revealed a nested deletion within the duplicated segment (S2 Fig). Notably, all four rearrangements were maternally inherited, indicating that the duplication and the deletion are located in cis. In addition, WGS allowed detection of copy-neutral segments (inversions and insertions); in total, 37 inversions were detected within the clustered CNVs (Table 3). The deletions-only group contained a large number of inverted fragments, similar to the deletions-and-duplications group, while the duplications-only group contained only four duplicated fragments with inverted orientation, in three cases (P2109_151, P4855_512 and P5513_206) (Table 3).
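Simple copy-number arithmetic explains why CMA read these nested-deletion cases as two duplications separated by a diploid segment. The Python sketch below uses hypothetical bins, coordinates and helper name. It sums per-bin copy number over events on one homologue: a duplication with a nested deletion in cis gives exactly the same profile as two separate duplications flanking a normal segment, so only junction-level data can tell them apart.

```python
from typing import List, Tuple


def copy_number_profile(
    n_bins: int, events: List[Tuple[str, int, int]], ploidy: int = 2
) -> List[int]:
    """Per-bin copy number after applying events to one homologue.

    Each event is (kind, start, end) over half-open bin ranges; "DUP" adds
    one copy and "DEL" removes one copy in that range.
    """
    delta = {"DUP": 1, "DEL": -1}
    cn = [ploidy] * n_bins
    for kind, start, end in events:
        for i in range(start, end):
            cn[i] += delta[kind]
    return cn


nested = copy_number_profile(10, [("DUP", 2, 8), ("DEL", 4, 6)])
two_dups = copy_number_profile(10, [("DUP", 2, 4), ("DUP", 6, 8)])
print(nested)              # [2, 2, 3, 3, 2, 2, 3, 3, 2, 2]
print(nested == two_dups)  # True: indistinguishable by dosage alone
```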

Additional disease-causing genes were revealed by WGS
Several OMIM morbid genes were identified in clustered CNVs detected by CMA (S3 Table). A CNV was assessed as pathogenic or likely pathogenic in 11 cases, as benign in one case, and as a variant of unknown significance in the remaining cases (Table 1). The pathogenicity classification followed the American College of Medical Genetics and Genomics (ACMG) guidelines [29] and took into account the segregation analysis, the number of OMIM morbid genes or specific disease-related genes, the size of the CNVs and/or whether the CNVs had been reported previously in patients with a similar phenotype. None of the CNVs disrupted an OMIM morbid gene; all likely pathogenic or pathogenic classifications were based on gene dosage sensitivity. In four cases (P2046_133, P5513_206, P5513_116 and P1426_301), WGS enabled detection of additional OMIM morbid genes that could not be revealed by CMA (S3 Table).

Duplications are mostly interspersed and not tandem
Thirteen of the 21 rearrangements consisted of 36 duplicated fragments (Table 1): 17 of these fragments belong to the duplications-only group (7 individuals) and 19 belong to the deletions-and-duplications group (6 individuals). In all cases, the WGS data analysis could determine whether the duplications were tandem (3 fragments) or interspersed (33 fragments).
Notably, the majority of the duplications were interspersed (92%). There was a single tandem duplication in the duplications-only group (P4855_512) and two tandem duplications in the deletions-and-duplications group (P5371_204 and P2109_176) (Fig 1B). All interspersed duplications were intrachromosomal and 46% of the duplicated fragments were inverted, indicating random orientation of the duplicates. The duplicates of the interspersed duplications clustered tightly: 79% of the duplicates were inserted next to another duplicate. P5513_206 represents such a rearrangement: it consists of five interspersed duplications, all inserted in a clustered but seemingly random manner in the same region (Fig 2).
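The tandem/interspersed distinction can be read from the breakpoint junctions. Below is a minimal Python sketch. It assumes junctions are given as pairs of joined positions on the same chromosome and that copy number has already shown the interval to be duplicated. A tandem duplication produces a junction joining the end of the duplicated segment directly back to its start, whereas an interspersed copy is joined to sequence elsewhere. The function name and the 10 bp tolerance are hypothetical, and inverted tandem copies are not modelled.

```python
from typing import Iterable, Tuple


def classify_duplication(
    seg_start: int,
    seg_end: int,
    junctions: Iterable[Tuple[int, int]],
    tol: int = 10,
) -> str:
    """Return "tandem" if a junction joins seg_end back to seg_start, else "interspersed".

    junctions: pairs of genomic positions joined at a BPJ (order ignored).
    tol: allowed breakpoint imprecision in bp (hypothetical default).
    """
    for a, b in junctions:
        lo, hi = sorted((a, b))
        if abs(lo - seg_start) <= tol and abs(hi - seg_end) <= tol:
            return "tandem"
    return "interspersed"


print(classify_duplication(1_000, 5_000, [(5_000, 1_000)]))                     # tandem
print(classify_duplication(1_000, 5_000, [(5_000, 20_000), (19_990, 1_000)]))  # interspersed
```

Because junction orientation is ignored, a deletion of the same interval would look similar. That is why the sketch assumes the interval is already known to be duplicated.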

Mutational signatures indicating underlying mechanisms of rearrangement formation
Molecular signatures at the BPJs further enabled the reconstruction of underlying mutational mechanisms. For example, blunt joints, absent or short microhomology (1-4 bp) and small insertions or deletions at the BPJs are characteristic of DNA DSB repair through direct ligation by NHEJ. In the clustered CNVs studied here, most of the BPJs in the deletions-only group showed such signatures (Table 2, S2 Table), pointing to involvement of NHEJ. Alternatively, DNA DSBs can be repaired by alternative NHEJ (alt-NHEJ) mechanisms, such as MMEJ, a more error-prone repair pathway that is highly dependent on microhomology [17]. MMEJ may result in deletions of the DNA regions flanking the original BP, and longer stretches of both templated (sequences found within 100 nucleotides upstream [28,30,31], which is associated with short single-strand overhangs after a DSB. This typically leads to inserts of 5-25 bp before ligation and hence to short stretches of microhomology at the BPJ [31], similar to what is seen in MMEJ. In addition, canonical NHEJ and alt-NHEJ can operate simultaneously in the same cell [32], and this possibility needs to be taken into consideration as well. Overall, microhomologies were most prevalent at the BPJs of the complex rearrangements containing duplications (54% and 59% for the duplications-only and deletions-and-duplications groups, respectively) (Table 2, S5 Fig). A model of replication-based mechanisms, for example multiple template switching, could better explain the formation of these complex rearrangements (Fig 3B, Fig 4). Such mechanisms are commonly associated with features similar to those of MMEJ, as well as with de novo single nucleotide variants around the BPJs [33].
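Microhomology at a junction is the stretch of identical sequence present at both breakpoint ends in the reference, so the junction sequence cannot assign those bases to either side. The Python sketch below measures it for a simple deletion junction, assuming the breakpoint is reported left-aligned. It then applies the size classes mentioned above. The helper names are hypothetical, and treating more than 4 bp as "longer microhomology" is an assumption for illustration, not a threshold taken from this study.

```python
def microhomology_length(ref: str, left_end: int, right_start: int) -> int:
    """Microhomology at a deletion junction joining ref[:left_end] to ref[right_start:].

    Assumes a left-aligned breakpoint, so homology lies in the bases just 3' of
    the left breakpoint: ref[left_end:left_end+k] == ref[right_start:right_start+k].
    Those k bases could belong to either side of the junction.
    """
    if not 0 <= left_end < right_start <= len(ref):
        raise ValueError("expected 0 <= left_end < right_start <= len(ref)")
    deleted = right_start - left_end
    k = 0
    while (k < deleted and right_start + k < len(ref)
           and ref[left_end + k] == ref[right_start + k]):
        k += 1
    return k


def junction_class(mh: int, inserted: str = "") -> str:
    """Coarse junction signature; the >4 bp cut-off is an assumption."""
    if inserted:
        return "insertion"
    if mh == 0:
        return "blunt"
    if mh <= 4:
        return "short microhomology (NHEJ-compatible)"
    return "longer microhomology (MMEJ/replicative-compatible)"


ref = "GGGG" + "TGC" + "AAAAAA" + "TGC" + "CCCC"
mh = microhomology_length(ref, 4, 13)  # deletes TGCAAAAAA; TGC is shared
print(mh, junction_class(mh))          # 3 short microhomology (NHEJ-compatible)
```

Inverted junctions would additionally require comparing against the reverse complement of one flank, which this sketch leaves out.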

Fig 2. Three different plausible end products in a complex case involving five duplications.
In case P5513_206, the five duplications were shown not to be in tandem, but inserted in a seemingly random but clustered manner. The exact location of each duplicate could not be determined using WGS alone, but three plausible outcomes are shown. Here we show a schematic drawing of the 11 chromosomal segments involved on human chromosome 14q, labelled A-K. In the linear representation the copy number status is indicated as black (normal) or blue (duplicated). Each BP is shown as a short vertical black line. The genomic coordinates of identified BPs are indicated above the line, and repeat elements disrupted by a BP are shown below the line. In the three solutions the regions are shown as boxes and copy number status is indicated as white (normal) and blue (duplicated).

Identical rearrangements on 2p25.3 in two unrelated individuals
Seemingly identical rearrangements on 2p25.3 were identified in individuals P4855_511 (from Sweden) and P06 (from Denmark), both belonging to the duplications-only group based on CMA results. However, these two cases were later redefined as a duplication with a "nested" deletion inside the duplicated fragment. An identical blunt BPJ without microhomology (the BPJ of the nested deletion) was detected in both P4855_511 and P06. The duplication junction was resolved at nucleotide level only in P4855_511, where a 3 bp microhomology (TGC) was detected at the BPJ through split reads in the deep paired-end data. For case P06, no split read spanning the duplication BPJ was present in the shallow mate-pair WGS data, and several attempts to amplify the BPJ by breakpoint PCR and Sanger sequencing failed due to GC-rich sequences in the area. Hence, we could compare the sequences of only one junction, which were identical, including an SNV (rs4971462) in cis upstream of the junction (S4 Fig). This may suggest that the 2p25.3 rearrangement could be a rare founder variant in Europe. However, using the WGS data from P4855_511 and the Affymetrix CytoScan HD SNP array data from P06, we analyzed 100 common SNVs surrounding the rearrangement and found that the haplotypes for these variants varied in a way that would be expected for two unrelated individuals. Hence, it was not possible to assess whether the rearrangement in these two individuals has occurred through separate events or in a common ancestor. No evidence suggests that the region is a hotspot for CNV formation: no common repeat structure was present in the BPJs, and analysis of the junction sequence from the common BPJ (S4 Fig) with the Predict a Secondary Structure Web Server (https://rna.urmc.rochester.edu/RNAstructureWeb/Servers/Predict1/Predict1.html) showed no significant secondary structure. The remaining rearrangements were all unique.
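One simple way to probe whether two carriers could share a haplotype around a rearrangement is to look for opposite homozygotes. This assumes unphased biallelic genotypes coded 0/1/2 (alternative allele count) at the same SNVs. A site where one individual is 0 and the other is 2 cannot lie within a segment shared identical by descent. The Python sketch below implements this generic check. It is a swapped-in illustration, not the haplotype analysis performed here, and the helper names are hypothetical.

```python
from typing import List, Optional, Tuple


def opposite_homozygous_sites(g1: List[int], g2: List[int]) -> List[int]:
    """Indices where one genotype is 0 and the other is 2 (excludes a shared haplotype)."""
    if len(g1) != len(g2):
        raise ValueError("genotype vectors must cover the same SNVs")
    return [i for i, (a, b) in enumerate(zip(g1, g2)) if {a, b} == {0, 2}]


def compatible_window(g1: List[int], g2: List[int], center: int) -> Optional[Tuple[int, int]]:
    """Largest SNV window (inclusive indices) around `center` with no opposite homozygotes.

    Returns None if the SNV at `center` itself excludes a shared haplotype.
    """
    blocked = set(opposite_homozygous_sites(g1, g2))
    if center in blocked:
        return None
    left = center
    while left - 1 >= 0 and left - 1 not in blocked:
        left -= 1
    right = center
    while right + 1 < len(g1) and right + 1 not in blocked:
        right += 1
    return left, right


g1 = [0, 1, 2, 1, 0, 2]
g2 = [2, 1, 2, 0, 0, 0]
print(opposite_homozygous_sites(g1, g2))  # [0, 5]
print(compatible_window(g1, g2, 2))       # (1, 4)
```

A very short compatible window around the junction argues against a long shared ancestral segment. A recent common ancestor could still leave only a short shared block, however, which is consistent with the inconclusive result reported above.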
Finally, the junction architecture may indicate that the nested deletion occurred via non-replicative mechanisms (e.g. NHEJ), which do not require microhomology. Although the tandem duplication might have arisen during replication, we hypothesize that both events occurred within a single cell cycle, as the duplication co-segregates with the deletion in both families.

Alu-Alu and LINE mediated rearrangements
We and others have previously shown that the sequence homology between Alu elements (average 71%) may facilitate unequal crossover between genomic segments and generate Alu-Alu mediated CNVs, inversions, translocations and chromothripsis [15,34,35]. In the current cohort, a DEL-INV-DEL rearrangement on 17p13.3 (P2109_123) is associated with Alu-Alu fusion elements at both junctions, suggesting an Alu-Alu mediated mechanism in this complex rearrangement. The sequence identity between the AluSx_AluSx1 and AluSq2_AluSq2 pairs is 73.3% and 78.6%, respectively. Notably, both pairs are in opposite orientation in the reference genome, which resulted in inversion of fragment C (Fig 4A). As the sequence identity of the involved Alu pairs is < 90%, it might not be sufficient for homologous recombination, whereas MMEJ or FoSTeS/MMBIR could potentially generate Alu-Alu mediated rearrangements here, as previously suggested by other studies [34][35][36]. Indeed, the 17p13.3 region is known to be Alu rich, and consequently many Alu-Alu mediated CNVs and complex genomic rearrangements associated with multiple disorders have been reported [35]. Similarly, in P2109_176, involving a combination of deletions, duplications and other copy-neutral rearrangements on chromosome 2, we observed LINE elements at all 11 BPs, indicating underlying LINE-mediated mechanisms (Fig 4B). Here, we found 3-5 bp Finally, 14 out of 25 BPs in the most complex case (P1426_301), containing deletions, duplications and inversions, are located within repeat regions of different classes, likely providing microhomology for multiple template switching (Fig 3).
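Identity figures such as 73.3% and 78.6% depend on how an alignment is scored. The Python sketch below shows one common convention on an already computed pairwise alignment: identical non-gap columns divided by all alignment columns, with gaps counted against identity. It then compares the result with the ~90% level discussed above. The alignment step itself is outside the sketch. Because elements in opposite orientation must be compared after reverse-complementing one copy, a reverse-complement helper is included. Helper names are hypothetical, and other identity conventions give different numbers.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (N is kept as N)."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences ('-' = gap).

    Counts identical non-gap columns over all alignment columns.
    """
    if not aln_a or len(aln_a) != len(aln_b):
        raise ValueError("alignments must be non-empty and of equal length")
    same = sum(1 for x, y in zip(aln_a.upper(), aln_b.upper()) if x == y and x != "-")
    return 100.0 * same / len(aln_a)


def below_recombination_threshold(identity: float, threshold: float = 90.0) -> bool:
    """True when identity is under the ~90% level discussed in the text."""
    return identity < threshold


print(reverse_complement("AACG"))  # CGTT
identity = percent_identity("ACGT-ACGTA", "ACGTTACCTA")
print(round(identity, 1), below_recombination_threshold(identity))  # 80.0 True
```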

Discussion
In the current study we present 21 individuals with two or more clustered non-recurrent CNVs confined to a single chromosome, involving both chromosomal arms (two cases) or a single chromosomal arm (19 cases). WGS enabled us to decipher the true nature of the rearrangements, including detection of copy neutral variants within or flanking the rearrangements. The individuals had a wide range of clinical symptoms, including congenital malformations and neurodevelopmental disorders. Dosage of the genes located within the deleted and/or duplicated fragments and/or disruption of genes located at the BPJs could be responsible for the clinical manifestations. In the current cohort, the higher resolution of WGS compared with CMA resulted in a reduction of the number of affected morbid OMIM genes in three cases (14%) and in an increase in one individual (5%). However, this information did not influence the overall assessment of clinical relevance.
WGS analysis revealed additional complexities such as inversions and interspersed duplicates in most cases, in line with previous findings in a cohort with autism spectrum disorder, where 84.4% of large complex SVs involved inversions [3]. In addition, we found that most of the interspersed duplications were inserted next to one another in a seemingly random manner, similar to the few cases reported before [28].
For ultra-complex chromosomal rearrangements such as those seen in P1426_301 and P00, the large number of genomic pieces with breakpoints often located in repetitive regions complicates the mapping of the final structure of the derivative chromosome(s). Third-generation sequencing, including the Pacific Biosciences SMRT long-read sequencing platform or Nanopore MinION sequencing, has shown promising results [37,38] for bridging repetitive sequences, thereby overcoming one of the largest limitations of short-read sequencing. A limitation of the current study is that we did not apply any of these technologies, which would be the next step needed to completely resolve the structure of the derivative chromosomes in P1426_301. Long-read sequencing might also add information in case P5513_206, which is presented here with three possible arrangements of the duplicated fragments.
By mapping all the BPs and resolving the links between the generated fragments, we observed several hallmarks of germline chromothripsis and chromoanasynthesis [4,25,39]. First, all the BPs associated with the complex rearrangements were clustered and confined to a single chromosome. Second, the rearranged fragments within the derivative chromosomes had random order and orientation. Third, the copy-number states detected in the deletions-only group oscillated between one and two, typical of chromothripsis, while the rearrangements including duplications mostly resembled chromoanasynthesis. Fourth, signatures of the NHEJ and MMEJ pathways were mostly detected at the BPJs of the complex rearrangements in the deletions-only group, which is compatible with previous reports describing BPJs associated with chromothripsis [9,18,19,32]. Even though both chromothripsis and chromoanasynthesis are generally of paternal origin [6,40], the de novo chromosomal rearrangements in the current study occurred on the maternal and paternal chromosomes to a similar extent. Of the seven de novo cases where we had parental samples, three had characteristics of chromoanasynthesis and replicative errors, and two of those arose on the maternal chromosome. This contrasts with the expectation that replicative error-mediated chromosomal aberrations would be biased towards spermatogenic origin. In addition, among the four cases with characteristics of chromothripsis, two were of paternal origin and two of maternal origin. Finally, we confirmed that Alu- or LINE-mediated mechanisms may also underlie chromothripsis formation.
Most of the reported germline chromothripsis cases are nearly dosage-neutral, possibly due to embryonic selection against loss of dosage-sensitive genes. However, there are a few reports of heavy imbalances detected by CMA suggesting a chromothripsis event [41][42][43][44][45]. Such cases need further investigation by paired-end or mate-pair sequencing in order to decipher the balanced rearrangements involved and to understand the underlying mechanisms. Our approach of applying high-resolution sequencing to such cases with clustered deletions confirmed that additional copy-neutral SVs may coexist. The combined picture of such complex rearrangements resembled the catastrophic phenomenon of chromosome "shattering", in which some of the fragments may be lost (deleted), while the retained fragments are reassembled by the repair machinery in random order and orientation. The finding that clustered duplications and combinations of deletions and duplications typical of chromoanasynthesis revealed mostly non-tandem, often inverted duplicates, with BPJs enriched for microhomology, further supports the notion that replication-based mechanisms may explain the complex nature of these derivative chromosomes. In summary, we suggest that seven cases in the current study (P2109_190, P72, P2109_302, P2109_123, P2109_188, P81 and P00) represent chromothripsis, ten cases (P06, P4855_511, P2109_150, P2109_151, P74, P4855_512, P5513_206, P2109_162, P5513_116, P5371_204) are chromoanasynthesis events and four cases (P2109_185, P2109_176, P2046_133 and P1426_301) have ambiguous mutational signatures. All four ambiguous cases showed large non-templated insertions in the BPJs (typical of Polθ-driven atypical chromoanagenesis or retrotransposition-mediated chromothripsis), but three of them harbored both duplications and deletions (typical of chromoanasynthesis) and one contained only deletions (typical of chromothripsis).
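The grouping summarized above can be expressed as a few rules of thumb. The Python sketch below is a deliberately coarse, hypothetical simplification and not the authors' procedure, which also weighed fragment order, orientation and junction signatures case by case. In the sketch, large non-templated insertions at the BPJs make a case ambiguous. Otherwise, any duplication points towards chromoanasynthesis and deletions alone point towards chromothripsis.

```python
def classify_case(
    has_deletions: bool,
    has_duplications: bool,
    large_nontemplated_insertions: bool,
) -> str:
    """Coarse, hypothetical grouping of a clustered-CNV case from three features."""
    if large_nontemplated_insertions:
        return "ambiguous"  # e.g. possible Polθ-driven joining
    if has_duplications:
        return "chromoanasynthesis-like"
    if has_deletions:
        return "chromothripsis-like"
    return "unclassified"


print(classify_case(True, False, False))  # chromothripsis-like
print(classify_case(True, True, False))   # chromoanasynthesis-like
print(classify_case(True, True, True))    # ambiguous
```

Applied to the group labels in this cohort, such rules reproduce the counts stated above only because the ambiguous cases were defined by their insertions. They should not be read as a validated classifier.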
Of the seven chromothripsis cases, one was Alu-Alu mediated (P2109_123) and one was likely mediated by replicative errors, with the DSBs joined through alt-NHEJ (P2109_188), while the remaining cases showed more consistent signatures of canonical NHEJ or MMBIR. Among the cases involving duplications, or both duplications and deletions, most BPJs showed signatures of replicative errors with microhomology at the breakpoints, some possibly caused by repeat elements. The exceptions were three cases from the deletions-and-duplications group (P2109_185, P2109_176, P1426_301), with non-templated insertions ranging from 8 to 52 bp and short microhomology (2-6 nt) at the BPJs. These features are not fully consistent with replicative joining mechanisms such as FoSTeS/MMBIR, but it is possible that these cases are mediated by replicative errors and that Polθ is involved in the stitching of the chromosomes, implying that two repair machineries operated in the same cell.
In two of the cases in our cohort (P5513_116 and P2109_185), the clustered CNVs were detected on both arms of the chromosomes involved (chromosomes X and 5, respectively). Notably, these two cases show similar patterns, in which a terminal duplication of one chromosomal arm is inserted, in inverted orientation, in place of a terminal deletion of the other arm. A breakage-fusion-bridge cycle could explain parts of this kind of rearrangement. Briefly, the process starts when a chromosome loses its telomere; after replication, the two sister chromatids fuse into a dicentric chromosome [46]. During anaphase, the two centromeres are pulled towards opposite poles, resulting in breakage of the dicentric chromosome. Random breakage may cause large inverted duplications. After the breakage, the new chromosome ends lack telomeres, which results in a new breakage-fusion-bridge cycle; the cycles stop once the chromosome end acquires a telomere. This mechanism has previously been suggested to explain some cases of chromothripsis [9,13,47]. Here, with telomeric regions of both chromosome arms involved, it is likely that the breakage-fusion-bridge cycle was accompanied by an attempted formation of a ring chromosome. However, chromosome analysis and FISH had previously shown that no ring chromosome was formed in either of these cases. In addition, as mentioned above, case P2109_185 showed characteristics of Polθ involvement in the stitching, with large non-templated insertions in the BPJs.
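One breakage-fusion-bridge cycle can be traced on a toy chromosome to show where the inverted duplication comes from. The Python sketch below represents a chromosome as centromere-to-end segments with a strand. It fuses the replicated sister chromatids at their uncapped ends, which gives the chromatid followed by its inverted copy, and breaks the resulting dicentric at a chosen position. Segment labels and the break position are hypothetical, and real breakage positions are random.

```python
from typing import List, Tuple

Segment = Tuple[str, str]  # (label, strand), strand "+" or "-"


def invert(segments: List[Segment]) -> List[Segment]:
    """Reverse segment order and flip each strand (an inversion)."""
    flip = {"+": "-", "-": "+"}
    return [(label, flip[strand]) for label, strand in reversed(segments)]


def bfb_cycle(chromosome: List[Segment], break_index: int) -> Tuple[List[Segment], List[Segment]]:
    """One breakage-fusion-bridge cycle for a chromosome that has lost its telomere.

    `chromosome` lists segments from the centromere ("cen") to the uncapped end.
    Sister chromatids fuse at their uncapped ends (chromatid + inverted chromatid),
    and the dicentric bridge breaks at `break_index`. Each daughter is returned
    reading from its own centromere.
    """
    dicentric = chromosome + invert(chromosome)
    if not 0 < break_index < len(dicentric):
        raise ValueError("break must fall within the dicentric chromosome")
    daughter_1 = dicentric[:break_index]
    daughter_2 = invert(dicentric[break_index:])
    return daughter_1, daughter_2


chrom = [("cen", "+"), ("A", "+"), ("B", "+"), ("C", "+"), ("D", "+")]
d1, d2 = bfb_cycle(chrom, 7)
print(d1)  # ... ('C','+'), ('D','+'), ('D','-'), ('C','-'): inverted duplication of C-D
print(d2)  # [('cen','+'), ('A','+'), ('B','+')]: terminal deletion of C-D
```

Both daughters still lack a telomere, so either can enter another cycle, as described in the text.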
In conclusion, the BP characterization of the derivative chromosomes showed that multiple mechanisms are likely involved in the formation of clustered CNVs, including replication-independent canonical NHEJ and alt-NHEJ, replication-dependent MMBIR/FoSTeS and breakage-fusion-bridge cycles, as well as Alu- and LINE-mediated pathways. WGS characterization adds positional information important for a correct interpretation of complex CNVs and for determining their clinical significance, and helps decipher the mechanisms involved in the formation of these rearrangements.

Ethics statement
The local ethical board in Stockholm, Sweden approved the study (approval number KS 2012/222-31/3). This ethics permit allows us to use clinical samples for analysis of scientific importance as part of clinical development. Included subjects were part of clinical cohorts investigated at the respective centers, and the current study reports de-identified results that cannot be traced to a specific individual. All subjects have given oral consent to be part of these clinical investigations.

Study cohort
The subjects included in this study (n = 21) were initially referred to the Department of Clinical Genetics at the Karolinska University Hospital (n = 13), Kennedy Center (n = 5), Sahlgrenska University Hospital (n = 2) or Linköping University Hospital (n = 1). All subjects were part of clinical cohorts investigated at respective centers with CMA due to congenital developmental disorders, intellectual disability or autism. Karyotypes and phenotypes are provided in Table 1.

Chromosome microarray analysis
Genomic DNA was prepared from whole blood using standard procedures. CMA was carried out using either SNP (single nucleotide polymorphism) or oligonucleotide microarrays. Fluorescent in situ hybridization (FISH) analysis or quantitative PCR (qPCR) with Power SYBR Green reagents (Applied Biosystems, Carlsbad, CA, USA) was employed to verify the structural variants. FISH-, qPCR-, or array comparative genomic hybridization (aCGH) analysis was used to investigate parental inheritance when possible.
In 13 cases (P2046_133, P2109_123, P2109_150, P2109_151, P2109_162, P2109_188, P2109_190, P2109_302, P4855_511, P4855_512, P2109_176, P1426_301, P2109_185), CMA was performed with a 180K custom oligonucleotide microarray with whole genome coverage and a median resolution of approximately 18 kb (Oxford Gene Technology (OGT), Oxfordshire, UK). Experiments were performed at the Department of Clinical Genetics at Karolinska University Hospital, Stockholm, Sweden, according to the manufacturer's protocol. Slides were scanned using an Agilent Microarray Scanner (Agilent Technologies, Santa Clara, CA, USA). Raw data were normalized using Feature Extraction Software (Agilent Technologies, Santa Clara, CA, USA), and log2 ratios were calculated as the log2 of the normalized intensity in the sample divided by the mean intensity across the reference sample. The log2 ratios were plotted and segmented by circular binary segmentation in the CytoSure Interpret software (OGT, Oxfordshire, UK). Oligonucleotide probe positions were annotated to the human genome assembly GRCh37 (hg19). Aberrations were called using a cut-off of three probes and log2 ratios of -0.65 and 0.35 for deletions and duplications, respectively.
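The calling rule above (a minimum number of consecutive probes beyond a log2 ratio cut-off) can be sketched as follows. This Python sketch is a thresholding stand-in and does not implement circular binary segmentation or the CytoSure software. It assumes the deletion cut-off is negative (-0.65), since deletions lower the ratio (a single-copy loss gives log2(1/2) = -1). Function names and the example ratios are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def log2_ratios(sample: List[float], reference: List[float]) -> List[float]:
    """Per-probe log2(sample / reference) from normalized intensities."""
    return [math.log2(s / r) for s, r in zip(sample, reference)]


def call_aberrations(
    ratios: List[float],
    del_cutoff: float = -0.65,
    dup_cutoff: float = 0.35,
    min_probes: int = 3,
) -> List[Tuple[str, int, int]]:
    """Runs of >= min_probes consecutive probes beyond a cut-off.

    Returns (type, first_probe_index, last_probe_index) for each call.
    """
    def state(r: float) -> Optional[str]:
        if r <= del_cutoff:
            return "DEL"
        if r >= dup_cutoff:
            return "DUP"
        return None

    calls = []
    i, n = 0, len(ratios)
    while i < n:
        s = state(ratios[i])
        j = i
        # Extend the run while neighbouring probes share the same state.
        while j + 1 < n and state(ratios[j + 1]) == s:
            j += 1
        if s is not None and j - i + 1 >= min_probes:
            calls.append((s, i, j))
        i = j + 1
    return calls


ratios = [0.0, 0.05, -1.0, -0.9, -1.1, 0.0, 0.58, 0.6, 0.55, 0.57, 0.0, 0.58, 0.6]
print(call_aberrations(ratios))  # [('DEL', 2, 4), ('DUP', 6, 9)]; the 2-probe gain is dropped
```

The duplication cut-off of 0.35 sits well below the expected log2(3/2) ≈ 0.58 for a single-copy gain, which leaves room for noise and mosaicism.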

Mate-pair WGS
Mate-pair libraries were prepared using the Nextera mate-pair kit following the manufacturer's instructions (Illumina, San Diego, CA, USA). The subjects were investigated with the gel-free protocol, in which 1 μg of genomic DNA was fragmented using an enzymatic method, generating fragments in the range of 2-15 kb. The final library was subjected to 2x100 bases paired-end sequencing on an Illumina HiSeq2500 sequencing platform.

Paired-end WGS
The PCR-free paired-end Illumina WGS data were produced at the National Genomics Infrastructure (NGI), Stockholm, Sweden. The WGS data were generated using the Illumina HiSeq X Ten platform, with an average coverage of 30X per sample. The average insert size of the WGS libraries was 350 bp, and the read length was 2x150 bp.
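The coverage figures for the two library types follow from simple arithmetic. The Python sketch below assumes a haploid human genome size of about 3.1 Gb, an approximation and not a value from this study. Sequence coverage counts sequenced bases. Physical (span) coverage counts how many read-pair inserts overlap a position, which is why mate-pair libraries with 2-15 kb inserts can detect rearrangements even at shallow sequence depth. The 5 kb insert and 20 million pairs in the example are hypothetical illustration values.

```python
GENOME_SIZE = 3_100_000_000  # approximate haploid human genome size (assumption)


def sequence_coverage(n_pairs: int, read_length: int, genome_size: int = GENOME_SIZE) -> float:
    """Mean base coverage: sequenced bases (two reads per pair) / genome size."""
    return n_pairs * 2 * read_length / genome_size


def physical_coverage(n_pairs: int, insert_size: int, genome_size: int = GENOME_SIZE) -> float:
    """Mean span coverage: how many inserts overlap a given position on average."""
    return n_pairs * insert_size / genome_size


# Paired-end: about 310 million 2x150 bp pairs give ~30X, with 350 bp inserts ~35X span.
print(sequence_coverage(310_000_000, 150))  # 30.0
print(physical_coverage(310_000_000, 350))  # 35.0
# Mate-pair (hypothetical): 20 million 2x100 bp pairs with 5 kb inserts.
print(round(sequence_coverage(20_000_000, 100), 2))    # 1.29
print(round(physical_coverage(20_000_000, 5_000), 2))  # 32.26
```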
The WGS data and Sanger reads were analyzed for junction features such as microhomology, insertions, single nucleotide variants (SNVs) and repeat elements using BLAT (https://genome.ucsc.edu/cgi-bin/hgBlat?command=start) and an in-house developed analysis tool dubbed SplitVision (https://github.com/J35P312/SplitVision) (S1 Appendix). In short, SplitVision searches for split reads bridging each BPJ. A consensus sequence of these reads is generated through multiple sequence alignment using ClustalW [52,53] and assembly using a greedy algorithm, maximizing the length and support of each consensus sequence. The consensus sequences are then mapped to the reference genome using BWA. The exact BPs, as well as any microhomology and/or insertions at the BPJs, are determined from the orientation, position and CIGAR strings of the primary and supplementary alignments of the consensus sequences. Additionally, SplitVision searches for repeat elements and SNVs close to the BPJs (<1 kb). Repeat elements are found using the UCSC RepeatMasker annotation [54] and SNVs are called using SAMtools [55]. Lastly, the SNVs were filtered based on SweFreq (SweGen Variant Frequency Dataset) [56] and gnomAD (http://gnomad.broadinstitute.org). The allele frequency threshold was set to 0, removing any previously reported SNVs as well as SNVs located in regions not covered by the SweGen dataset. The quality of the remaining SNVs was assessed using the Integrative Genomics Viewer (IGV) tool [57].
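The CIGAR-based junction logic described above can be sketched for a single split read or consensus sequence with one primary and one supplementary alignment. In this Python sketch, the query interval each alignment covers is recovered from its CIGAR string: leading soft or hard clips give the offset, and reverse-strand records are flipped back to read orientation. Overlapping intervals then indicate microhomology, and a gap between them indicates inserted sequence. This illustrates the principle and is not SplitVision's code; the function names are hypothetical.

```python
import re
from typing import Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def query_interval(cigar: str, is_reverse: bool) -> Tuple[int, int]:
    """[start, end) of the read covered by an alignment, in original read orientation."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    read_len = sum(n for n, op in ops if op in "MIS=XH")  # clips count toward read length
    lead = 0
    for n, op in ops:
        if op in "SH":
            lead += n
        else:
            break
    aligned = sum(n for n, op in ops if op in "MI=X")
    start, end = lead, lead + aligned
    if is_reverse:  # SAM stores reverse-strand CIGARs relative to the reverse complement
        start, end = read_len - end, read_len - start
    return start, end


def junction_feature(primary: Tuple[int, int], supplementary: Tuple[int, int]) -> Tuple[str, int]:
    """Classify the junction between two query intervals of the same read."""
    (_, e1), (s2, _) = sorted([primary, supplementary])
    if e1 > s2:
        return ("microhomology", e1 - s2)  # bases claimed by both alignments
    if s2 > e1:
        return ("insertion", s2 - e1)      # bases claimed by neither alignment
    return ("blunt", 0)


p = query_interval("60M40S", is_reverse=False)  # (0, 60)
s = query_interval("43M57H", is_reverse=True)   # (57, 100)
print(junction_feature(p, s))  # ('microhomology', 3)
```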

10X Genomics Chromium WGS
10X Genomics Chromium WGS was performed on sample P00 at NGI, Stockholm, Sweden. Libraries were prepared using the 10X Chromium controller and sequenced on an Illumina HiSeq X Ten platform. Data were analyzed using two separate pipelines developed by 10X Genomics: the default Long Ranger pipeline (https://support.10xgenomics.com/genome-exome/software/downloads/latest) and a custom de novo assembly pipeline based on the Supernova de novo assembler (https://support.10xgenomics.com/de-novo-assembly/software/downloads/latest). The custom de novo assembly pipeline included mapping of raw Supernova contigs with bwa mem in intra-contig mode, as well as extraction of split contigs using a Python script (https://github.com/J35P312/Assemblatron).
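Split contigs can be recognized by the SAM "SA" tag, which BWA adds to a record that has supplementary alignments. Its value has the form `rname,pos,strand,CIGAR,mapQ,NM;`, repeated once per supplementary alignment. With pysam, the string could be read from a record via `AlignedSegment.get_tag("SA")` after checking `has_tag("SA")`. The parsing is shown below in plain Python so it runs without a BAM file. This is a sketch of the idea, not the Assemblatron script; the helper names and the mapping-quality cut-off are hypothetical.

```python
from typing import List, NamedTuple


class SplitAlignment(NamedTuple):
    rname: str
    pos: int      # 1-based leftmost mapping position, as written in SAM
    strand: str   # "+" or "-"
    cigar: str
    mapq: int
    nm: int


def parse_sa_tag(sa_value: str) -> List[SplitAlignment]:
    """Parse a SAM 'SA' tag value: 'rname,pos,strand,CIGAR,mapQ,NM;' repeated."""
    alignments = []
    for entry in sa_value.split(";"):
        if not entry:
            continue  # the value ends with ';', leaving an empty last field
        rname, pos, strand, cigar, mapq, nm = entry.split(",")
        alignments.append(SplitAlignment(rname, int(pos), strand, cigar, int(mapq), int(nm)))
    return alignments


def confident_split_partners(sa_value: str, min_mapq: int = 20) -> List[SplitAlignment]:
    """Keep supplementary alignments with mapping quality >= min_mapq (hypothetical cut-off)."""
    return [a for a in parse_sa_tag(sa_value) if a.mapq >= min_mapq]


sa = "2,123456,-,40S60M,60,1;5,789,+,70M30S,10,0;"
for a in confident_split_partners(sa):
    print(a.rname, a.pos, a.strand, a.cigar)  # 2 123456 - 40S60M
```

Each retained partner, together with the primary record, gives one candidate junction whose exact breakpoint can then be refined from the CIGAR strings.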