Whole Genome and Exome Sequencing of Monozygotic Twins with Trisomy 21, Discordant for a Congenital Heart Defect and Epilepsy

Congenital heart defects (CHD) occur in 40% of patients with trisomy 21, while the other 60% have a structurally normal heart. This suggests that the increased dosage of genes on chromosome 21 is a risk factor for abnormal heart development. Interaction of genes on chromosome 21 or their gene products with certain alleles of genes on other chromosomes could contribute to CHD. Here, we identified a pair of monozygotic twins with trisomy 21 but discordant for a ventricular septal defect and epilepsy. Twin zygosity was confirmed by microsatellite genotyping. We hypothesized that genetic differences arising from post-twinning mutations caused the discordant phenotypes. Thus, next generation sequencing (NGS) technologies were applied to sequence both the whole genome and the exome from their leukocytes. Analysis of the sequencing data revealed 21 putative discordant exonic variants between the twins in either the genome or the exome data. However, Sanger sequencing of the 15 variants chosen for validation showed no differences between the twins. The fact that no discordant DNA variants were found suggests that sequence differences in DNA from leukocytes of monozygotic twins might be extremely rare. It also emphasizes the limitation of the current NGS technology in identifying causative genes for discordant phenotypes in monozygotic twins.


Introduction
Down syndrome (DS; OMIM 190685) is a human chromosomal disorder caused by an extra copy of a genomic region on chromosome 21. It is one of the most common human genetic disorders, occurring in approximately 1 in 750 live births [1]. The extra copy of chromosome 21 results largely from the failure of normal chromosomal segregation in maternal meiosis, which accounts for approximately 87% of DS patients [2]. Common characteristic facial features include oblique eyes, flat nasal bridge, epicanthus, and protruding tongue. Other phenotypes include intellectual deficit, hypotonia and other associated developmental disorders and congenital anomalies [3].
One of the severe phenotypes is congenital heart defect (CHD), which occurs in approximately 40% of patients with trisomy 21, compared with about 0.8% of the general population [1]. This suggests that the increased dosage of genes on chromosome 21 is a risk factor but not sufficient for abnormal heart development [4]. Interaction of genes on chromosome 21 or their gene products with certain alleles of genes on other chromosomes could contribute to CHD.
In genetic disorders, phenotypic discordance in monozygotic twins may provide a better understanding of the factors involved in disease etiology [5]. Although monozygotic twins are generally considered to be genetically identical, genetic differences may arise during embryonic development, for example, single nucleotide mutations, deletions, conversion, copy number variation and postzygotic mitotic recombination. These variations have been suggested as possible genetic mechanisms causing discordance in monozygotic twins [6]. Some recent studies described single nucleotide polymorphism differences between monozygotic twins [7].
The detection of genetic differences generally relies on genome comparison between a sequence of interest and the reference sequence. However, in the case of monozygotic twins, their sequences can be compared against each other. Identified mismatches could be selected for validation as possible mutations causing the discordant phenotype. For instance, discordance of monozygotic twins with autosomal dominant neurofibromatosis type 1 (NF1) was explained by the presence of a de novo NF1 mutation in all investigated cells of the affected twin, while the cells from the unaffected twin were mosaic [7].
Since only a small number of genetic differences between twins are expected [8], a highly sensitive method with high resolution and whole genome coverage should ideally be applied [9]. With the advent of next generation sequencing (NGS) technology, whole genome single nucleotide differences can efficiently be mapped [10]. Moreover, a genome-wide coverage would allow for a nonbiased approach, not restricted to certain pre-selected regions. A conventional Sanger sequencing approach can be used to validate the candidate discordant variants obtained from NGS [11].
Here, we identified a pair of Thai monozygotic twin boys with trisomy 21 discordant for a CHD and epilepsy. We hypothesized that genetic differences arising from post-twinning mutations, e.g., single nucleotide variants (SNVs) or small insertions or deletions (Indels), caused the discordant phenotypes. Therefore, NGS was used to identify such genetic differences.

Ethical statement
The study was approved by the institutional review board of Faculty of Medicine of Chulalongkorn University. Written informed consent was obtained from the parents of patients included in the study.

Patients
Thai twin boys born at the King Chulalongkorn Memorial Hospital, Bangkok, had hypotonia, low-set ears, upslanting eyes and flat nasal bridge. A clinical diagnosis of Down syndrome was given and later confirmed by chromosome analysis indicating three copies of chromosome 21 in both twins. Here we assigned twin A as the older brother and twin B as the younger one. Echocardiography revealed that twin A had a ventricular septal defect (VSD), requiring a corrective operation when he was one year and two months old. Although twin B had a normal heart, he developed seizures when he was six months old. Despite treatment with several antiepileptics, he continued to have seizures until he was given vigabatrin at one year of age; the seizures then stopped. No such seizures were observed in twin A.

Zygosity analysis
After informed consent, six milliliters of peripheral blood was obtained from both twins and their mother. Genomic DNA was isolated from their white blood cells using QIAamp DNA blood

Genome sequencing and targeted capture exome sequencing
Genomic DNA from both twins was sent for whole genome sequencing (WGS) using the service offered by Beijing Genomics Institute (BGI), China. The sequencing was performed using the Applied Biosystems SOLiD 4.0 (Sequencing by Oligonucleotide Ligation and Detection) system. The primary sequencing data were analyzed using the standard SOLiD analysis workflow. After that, the sequencing reads were aligned to the human genome reference sequence (UCSC hg18) using BioScope software.
Whole exome sequencing (WES) of these genomic DNAs was done using Illumina HiSeq 2000 with the service from Macrogen, Inc., South Korea. Real Time Analysis (RTA) software version 1.7 was used to perform base calling and quality scoring. The reads were then aligned to UCSC hg19 using the Burrows-Wheeler Alignment (BWA) tool [12].

Discordant SNVs/Indels analysis
Both WGS and WES datasets of the twins were deposited in the NCBI Sequence Read Archive (SRA) with the sample IDs "SAMN02680286" and "SAMN02688784" for twins A (sample name G3142) and B (G3143), respectively. The experimental IDs for WGS and WES are SRX485008 and SRX522555 for twin A and SRX487546 and SRX522556 for twin B, respectively. These samples can be downloaded from the NCBI BioSample database and were registered under the project ID "PRJNA240916" in the NCBI BioProject database.
Candidate single nucleotide variants (SNVs) and small insertions or deletions (Indels) were extracted by comparing the twins' alignment data (BAM files). Variant calling was done simultaneously on both alignment data from twins A and B (SAMtools mpileup) in order to avoid false positive variants. If a variant could only be observed in one twin but missing due to no or not enough coverage in the other twin, such a variant would be excluded from the candidate discordant variant set.
VarScan version 2.2.5 was used to identify SNVs from the mpileup alignment data. It compared the read counts, base quality and allele frequency between the twins. Discordant SNVs were called at sites with a sequencing depth greater than or equal to 10X. These SNVs had to be present in at least three reads with a minimum variant base quality score >15. Genome Analysis Toolkit (GATK) version 1.0.5974 was used to detect Indels using the Somatic Indel Detector command. The Indel results were compared between the two twins. Indels detected in only one twin were called discordant. The resulting discordant variants were filtered again by excluding variants that were likely to be non-functional, e.g., synonymous variants and/or variants located outside the exonic regions.
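The SNV criteria above can be illustrated with a minimal Python sketch. It is not the VarScan implementation and not the authors' pipeline; the `SiteCounts` structure and both function names are hypothetical, and treating "discordant" as "called in exactly one twin" is a simplifying assumption. The base quality threshold is taken as greater than 15, as we read it from the text.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Per-twin read summary at one position (hypothetical structure)."""
    depth: int                # total reads covering the position
    variant_reads: int        # reads supporting the variant allele
    variant_base_qual: float  # mean base quality of variant-supporting bases


def is_called(site, min_variant_reads=3, min_base_qual=15):
    """A variant counts as present with >= 3 supporting reads and quality > 15."""
    return (site.variant_reads >= min_variant_reads
            and site.variant_base_qual > min_base_qual)


def is_candidate_discordant(twin_a, twin_b, min_depth=10):
    """Keep a site only if both twins are covered and exactly one carries the variant."""
    # A variant seen in one twin but uncovered in the other is excluded,
    # because absence cannot be distinguished from missing data.
    if twin_a.depth < min_depth or twin_b.depth < min_depth:
        return False
    return is_called(twin_a) != is_called(twin_b)


# Illustrative (made-up) counts: the variant is well supported in twin A only.
covered = is_candidate_discordant(SiteCounts(25, 6, 30.0), SiteCounts(22, 0, 0.0))
# Same variant, but twin B has too few reads to judge, so the site is dropped.
uncovered = is_candidate_discordant(SiteCounts(25, 6, 30.0), SiteCounts(4, 0, 0.0))
```

Requiring adequate depth in both twins before comparing them is what prevents a coverage gap from being reported as a genetic difference.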
For exome sequencing data, the alignment and variant calling were done on each twin's data as a standard analysis service from Macrogen using SAMtools. We used an in-house variant calling script to call SNVs and Indels. To detect discordant variants, we compared the twins side by side at each variant locus. A discordance was reported only if sufficient read coverage at the corresponding position was confirmed in both twins. Nonsynonymous variants with at least 30X coverage were chosen. Subsequently, we picked variants that had more than four variant-supporting reads in one twin and no variant-supporting reads in the other.
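The exome comparison can be sketched in the same spirit. This is an illustration only, not the in-house script mentioned above: the record layout, field names and example values are hypothetical, and the coverage threshold is taken as 30X, as we read it from the text.

```python
def discordant_exome_variants(variants, min_coverage=30, min_supporting_reads=4):
    """Return loci where one twin has > 4 variant reads and the other has none.

    Each record is a dict with hypothetical keys:
    locus, cov_a, alt_a, cov_b, alt_b (coverage and variant-supporting reads).
    """
    selected = []
    for v in variants:
        # Both twins must be adequately covered at the locus.
        if v["cov_a"] < min_coverage or v["cov_b"] < min_coverage:
            continue
        only_a = v["alt_a"] > min_supporting_reads and v["alt_b"] == 0
        only_b = v["alt_b"] > min_supporting_reads and v["alt_a"] == 0
        if only_a or only_b:
            selected.append(v["locus"])
    return selected


# Made-up records for illustration only.
example = [
    {"locus": "site1", "cov_a": 45, "alt_a": 9, "cov_b": 38, "alt_b": 0},
    {"locus": "site2", "cov_a": 45, "alt_a": 9, "cov_b": 12, "alt_b": 0},  # twin B under-covered
    {"locus": "site3", "cov_a": 50, "alt_a": 4, "cov_b": 41, "alt_b": 0},  # only four supporting reads
    {"locus": "site4", "cov_a": 33, "alt_a": 0, "cov_b": 60, "alt_b": 7},
]
candidates = discordant_exome_variants(example)
```

Here `site1` and `site4` pass, while `site2` fails the coverage requirement and `site3` fails the "more than four reads" requirement.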
To screen out obvious false positive variants, each of the resulting variants from both whole genome and exome sequencing was visualized along with its alignment data from both twins using Integrative Genomics Viewer (IGV) software version 2.1. The variants that passed the aforementioned criteria would be validated.

Discordant SNVs/Indels validation
Twins' genomic DNA extracted from leukocytes as previously described was amplified by polymerase chain reaction (PCR), using primers specific to the candidate genes that had the resulting discordant SNVs/Indels (Table S1). The PCR products were sent to Macrogen Inc., South Korea for performing Sanger sequencing. Sequences were compared between both twins to verify the discordant variants.

Zygosity analysis
Zygosity analysis showed that both twins had the same alleles at all thirteen microsatellite markers, strongly suggesting that they were monozygotic twins.

Discordant SNVs/Indels analysis
Whole genome sequencing of the twins resulted in average coverages of 27.22X and 28.67X (Table 1). Discordant SNV and Indel calling with VarScan and GATK yielded 5,701 variants. After excluding non-exonic and synonymous variants, eight discordant variants (two SNVs and six Indels) were obtained from the whole genome sequencing data (Table 2).
The exome sequencing dataset had higher average coverage, at 44.8X and 36.7X for twins A and B, respectively (Table 1). Exome sequencing of both twins resulted in a total of 226,983 variants prior to the discordance analysis. We selected only discordant variants that had average sequencing depths of greater than or equal to 10X, resulting in 34,226 discordant variants. After applying the stringent filtering criteria, 13 putative discordant SNVs and no Indels were chosen for validation (Table 2). None of these variants from the exome dataset were shared with the discordant variants from the whole genome sequencing dataset.

Discordant SNVs/Indels validation
We chose 15 variants (all eight from the whole genome dataset and seven of the 13 from the exome dataset) to be validated by conventional Sanger sequencing (Table 3). The variants from the exome dataset were selected by manual inspection of each variant's alignment data, using IGV to clarify the mapping result. They were selected when only two haplotypes were found, when they were not located at the start or end position of reads, and when their base quality scores averaged more than 20. In addition, because CHD occurs in approximately 40% of patients with trisomy 21, the underlying variants causing discordant CHD should be common variants; we therefore selected variants with a variant frequency of not less than 5% [13]. Of the 13 discordant variants found by exome sequencing, seven met the criteria and were subjected to Sanger sequencing. Electropherograms of all 15 candidate variants showed no differences between the twins (Figure 1).

Discussion
In this study, we used NGS to sequence the genome and exome of monozygotic twins with trisomy 21, discordant for VSD and epilepsy. A rigorous screening for discordance revealed 15 SNVs and 6 Indels potentially causing the twin discordance. However, validation of 15 of these 21 variants via Sanger sequencing of the corresponding genes showed no differences between the twins.
Because only non-synonymous discordant variants in the coding regions were investigated, it is possible that one of the 5,690 discordant variants in the non-coding regions identified by genome sequencing [14] or one of the 367 synonymous coding variants identified by exome sequencing [15] could have functional consequences and contribute to the discordant phenotype.
Such negative results support the notion that genetic differences between monozygotic twins, even those with discordant phenotypes, are very rare. Several previous studies also failed to map discordant SNVs in monozygotic twins with discordant phenotypes. In particular, Baranzini et al. used three platforms, namely whole genome sequencing (WGS), duplicate array hybridization (DAH) and RNA sequencing (RNA-Seq), to identify discordant SNVs in monozygotic twins discordant for multiple sclerosis. In their study, 3,241, 126, and 322 discordant SNVs were found by WGS, DAH, and RNA-Seq, respectively. Interestingly, they found that no discordant SNVs inferred by one approach were replicated by a second approach, while 98% of concordant SNVs could be replicated by at least two methods. The validation of 15 discordant SNVs via Sanger sequencing showed identical genotypes in the twin pairs [16]. Recent studies of monozygotic twins discordant for VACTERL association, using both WES and high-density microarray approaches, also failed to identify discordant variants that could explain the discordant phenotype [17]. However, discordant variants between monozygotic twins do exist. A study of monozygotic twins discordant for schizophrenia showed two discordant SNVs which were confirmed as actual differences by Sanger sequencing [18].
NGS has been successful in finding concordant variants in patients with the same disease. This was demonstrated in a WGS study of four family members, consisting of two siblings affected with Miller syndrome and primary ciliary dyskinesia and their unaffected parents, in which the causative gene was successfully identified [19]. Even for a complex disorder like autism, NGS has had some success in identifying causative genes. Whole exome sequencing of 16 probands revealed candidate homozygous recessive mutations in four unrelated families [20].
One possible explanation for the high false positive rate for discordant variants is that NGS technology has a high error rate. In particular, the overall miscall error rate for the Illumina platform is typically around 1% [21]. Given a human genome size of around 3 billion base pairs, it can be assumed that genome sequencing of one person could contain approximately 30 million miscalled positions. This high error rate of the genome sequencing process is well illustrated by the study of monozygotic twins discordant for schizophrenia. Of the 846 discordant SNVs identified by genome sequencing, only two SNVs were confirmed as actual differences by Sanger sequencing [18].
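The arithmetic behind these figures can be reproduced in a few lines of Python. This is only a back-of-envelope sketch using the round numbers quoted in the text (a 1% miscall rate, a 3 billion base pair genome, and 2 confirmed SNVs out of 846 candidates); the variable names are our own.

```python
GENOME_SIZE_BP = 3_000_000_000   # approximate human genome size used in the text
MISCALL_RATE = 0.01              # ~1% overall miscall rate quoted for the Illumina platform

# Expected number of miscalled positions if every base were called once at that rate.
expected_miscalls = round(GENOME_SIZE_BP * MISCALL_RATE)

# Fraction of NGS-derived discordant SNVs confirmed by Sanger sequencing
# in the schizophrenia twin study cited in the text.
confirmed, candidates = 2, 846
confirmation_rate = confirmed / candidates

print(f"expected miscalls: {expected_miscalls:,}")
print(f"confirmed fraction: {confirmation_rate:.2%}")
```

Under these assumptions the expected miscalls come to 30 million positions, and roughly 0.24% of the candidate discordant SNVs in the cited study survived validation.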
We conducted WGS on the twins' DNA in June 2011 and found no discordant variants. Suspecting that the coverage might not be enough on the coding region, we then performed exome sequencing in June 2012, and again found no discordant variants between the two twins. We did not compare variants found by WGS and WES simultaneously. Although the alignment programs and the reference genomes (hg18 vs hg19) used for WGS and WES were different, they should not pose any problems in variant selection as we did not impose the rule that the putative discordant variants must be present in both WGS and WES.
Explanations for a pair of monozygotic twins with identical leukocytes' DNA but discordant phenotype include mosaic genomic alteration. If a somatic mutation occurs before twinning, both twins will have the mutation variant. However, they might show discordant phenotypes because the level of mosaicism in the relevant tissues of the unaffected twin does not reach the necessary level for clinical expression. In addition to mosaic state, epigenetics could be another possible explanation for the different phenotypes in monozygotic twins [22].
Of note, DNA derived from blood may not be suitable for NGS studies of discordant twins [23]. About 70% of all monozygotic twins are monochorionic and share blood circulation during pregnancy. Hematopoietic stem cells can therefore be transferred between them, creating chimeric hematopoietic systems. It is thus possible that post-twinning somatic mutations in one twin could be detected in the co-twin's blood system. This would mask the underlying mutations that cause the disease in the affected twin. Therefore, it is important to sample the tissue with the discordant phenotype for DNA extraction. Unfortunately, the tissues with discordant phenotypes (heart and brain) of these twins were unobtainable.
In conclusion, we applied NGS technology to monozygotic twins with trisomy 21 discordant for VSD and epilepsy. Using stringent filtering criteria, 15 SNVs and 6 Indels were found. However, validation of 15 of these potential discordant variants via Sanger sequencing of the corresponding genes showed no differences. The false positive results emphasize the limitation of current NGS technology in identifying rare genes causing discordant phenotypes in monozygotic twins.