Prenatal Detection of Aneuploidy and Imbalanced Chromosomal Arrangements by Massively Parallel Sequencing

Fetal chromosomal abnormalities are the most common indication for invasive prenatal testing. G-band karyotyping and several molecular genetic methods are currently established for diagnosing chromosomal abnormalities. Although these methods are highly reliable, their major limitation remains either restricted resolution or limited coverage of the human genome in a single assay. Massively parallel sequencing (MPS) technologies can reach single-base-pair resolution and allow genome-wide detection of intragenic deletions and duplications. They therefore challenge karyotyping and microarrays as the tool of choice for prenatal diagnosis. Here we report a novel and robust MPS-based method to detect aneuploidy and imbalanced chromosomal arrangements in amniotic fluid (AF) samples. We sequenced 62 AF samples on the Illumina GAIIx platform. With an average of 0.01× whole-genome sequencing data, we detected 13 samples with numerical chromosomal abnormalities by z-test. With approximately 2× whole-genome sequencing data and the SegSeq algorithm, we detected microdeletions/microduplications ranging from 1.4 Mb to 37.3 Mb in 5 chorionic villus sampling (CVS) samples. Our work demonstrates that MPS is a robust and accurate approach for detecting aneuploidy and imbalanced chromosomal arrangements in prenatal samples.


Introduction
Chromosomal abnormalities occur in 1 of 160 live births [1]. The risk of giving birth to a child with a chromosomal abnormality, especially Down syndrome (OMIM# 190685), increases throughout a woman's reproductive years [2,3]. Prenatal diagnosis of fetal chromosomal abnormalities is the most common indication for invasive prenatal testing. Chromosomal abnormalities account for 6–11% of all stillbirths and neonatal deaths [4,5]. Consequently, screening and diagnostic programs to detect the most common trisomies in live-born infants are well established [6]. Several methods are currently well established in clinical laboratories for prenatal diagnosis of chromosomal abnormalities [7,8]. These include G-band karyotyping and molecular genetic methods such as multiplex ligation-dependent probe amplification (MLPA), fluorescence in situ hybridization (FISH), quantitative fluorescent PCR (QF-PCR) and microarray-based comparative genomic hybridization (arrayCGH). Although these methods have proved highly reliable, their major limitation remains either restricted resolution or limited coverage of the human genome in a single assay [9,10].
To overcome these limitations, we developed a new method based on a massively parallel sequencing (MPS) platform to detect fetal chromosomal abnormalities. The method does not depend on particular genetic markers or on cell culture. We directly sequenced 62 DNA samples extracted from uncultured amniotic fluid (AF) of pregnant women. After statistical analysis, numerical abnormalities among the 46 chromosomes could be clearly detected. The whole process takes only 7 days, and the results were validated by full karyotyping. We then investigated copy number variations in 5 prenatal samples previously characterized by arrayCGH. We demonstrate that, at different levels of whole-genome sequencing coverage, the MPS platform can be applied to identify aneuploidy and imbalanced chromosomal arrangements. These approaches are also more sensitive and effective than conventional methods.

Detection of Fetal Aneuploidy
In this study, the 32 test cases were analyzed by sequencing under double-blind conditions. For all 62 AF samples (30 normal controls and 32 test cases), we obtained 0.55–3.28 million 35-bp single-end reads (19.25–114.80 Mb of sequence data). This corresponds to 0.006–0.0386× human genome depth and yielded 0.20–1.15 million unique reads (UR) per sample (Table 1). For each chromosome, the number of unique reads was counted and its UR% (the percentage of all unique reads that map to that chromosome) was calculated. The complete set of UR% values for the 62 samples is listed in Table S1. UR% values for chromosomes X and Y are also shown in Fig. 1, and the sex of each fetus can be determined from them. For the 30 normal control samples, the mean and standard deviation (SD) of UR% were calculated for each chromosome. The z-score of every chromosome except Y was then calculated for each normal control and test case, using the formula described in Materials and methods (Table S2). Of the 32 test cases, 5 had a z-score > 20 for chromosome 18 (indicating trisomy 18) and 6 had a z-score > 20 for chromosome 21 (indicating trisomy 21). Another 2 had a z-score ≤ −20 for chromosome X (indicating XO). All other chromosomes had z-scores within ±3 in all 32 test cases (indicating euploidy) (Fig. 2, Table S1). The detection of fetal aneuploidy by sequencing analysis agreed with the karyotyping results.
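The UR%/z-score procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline. The function names are hypothetical, and the toy counts are invented and cover only three chromosomes. UR% is taken as the percentage of all unique reads falling on a chromosome. Each z-score uses the mean and sample SD of the normal controls, and |z| > 3 is treated as abnormal, as in Materials and methods.

```python
from statistics import mean, stdev

def ur_percent(counts):
    """Convert per-chromosome unique-read counts into UR% (share of all unique reads)."""
    total = sum(counts.values())
    return {chrom: 100.0 * n / total for chrom, n in counts.items()}

def control_stats(control_urp, chroms):
    """Mean and sample SD of UR% for each chromosome across the normal controls."""
    stats = {}
    for c in chroms:
        values = [sample[c] for sample in control_urp]
        stats[c] = (mean(values), stdev(values))
    return stats

def z_scores(sample_urp, stats):
    """z = (UR% of the sample - control mean) / control SD, per chromosome."""
    return {c: (sample_urp[c] - mu) / sd for c, (mu, sd) in stats.items()}

def call_aneuploidy(z, cutoff=3.0):
    """Report chromosomes whose |z| exceeds the cut-off as a gain or a loss."""
    return {c: ("gain" if s > 0 else "loss") for c, s in z.items() if abs(s) > cutoff}

# Hypothetical toy data: 10,000 unique reads per sample, three chromosomes only.
CHROMS = ["chr1", "chr18", "chr21"]
controls = [
    {"chr1": 8000, "chr18": 1200, "chr21": 800},
    {"chr1": 8100, "chr18": 1105, "chr21": 795},
    {"chr1": 7900, "chr18": 1295, "chr21": 805},
]
test_case = {"chr1": 7900, "chr18": 1150, "chr21": 950}

stats = control_stats([ur_percent(c) for c in controls], CHROMS)
z = z_scores(ur_percent(test_case), stats)
calls = call_aneuploidy(z)
print({c: round(v, 2) for c, v in z.items()})
print(calls)  # {'chr21': 'gain'}
```

In a real analysis the control statistics would be computed once from all 30 controls and for chromosomes 1–22 and X. The Y chromosome is left out, as in the study.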
Normalized UR% values for all 22 autosomes and chromosome X were calculated (Fig. 3). Five cases of trisomy 18, 6 cases of trisomy 21 and 2 cases of XO were successfully detected in the cohort of 32 test cases, corresponding to 100% sensitivity and 100% specificity.
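The 100% figures follow from simple counts. The sketch below assumes that each of the 32 test cases is scored once as abnormal or euploid, with the karyotype as the reference. The text does not state this explicitly, so it is an assumption. The helper name is hypothetical.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Per-case counts for the 32 blinded test cases (karyotype as reference).
# Assumption: each case is scored once (abnormal vs. euploid), not per chromosome.
abnormal = 5 + 6 + 2          # trisomy 18 + trisomy 21 + XO, all detected
tp, fn = abnormal, 0
tn, fp = 32 - abnormal, 0     # the 19 euploid cases, none falsely flagged
sens, spec = sensitivity_specificity(tp, fn, tn, fp)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")  # sensitivity = 100%, specificity = 100%
```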

Detection of imbalanced chromosomal arrangements
To further validate our analysis method on cases with imbalanced chromosomal arrangements, we obtained 44–70 M sequence tags for the 5 "known" samples, corresponding to 1.4–2.36× whole-genome depth. To detect segmental copy number variations, we used SegSeq, a sequence-based MATLAB CNV detection package, with the YH genome as the comparative reference. SegSeq analysis detected microdeletions and microduplications in all 5 samples (Table 2). All 5 CNVs, ranging from 1.4 Mb to 37.3 Mb, were validated by arrayCGH and mapped precisely to the correct locations. Four pregnancies were terminated after genetic diagnosis, and only one resulted in a live birth (Table S3).
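Depth figures like these come from dividing the total number of sequenced bases by the genome size. The sketch below reproduces the order of magnitude of the reported values under two assumptions: a haploid genome size of 3.0 Gb, and one 100-bp read per CVS "sequence tag". The text does not state the genome size or whether a tag is a read or a read pair. Small differences from the published values are expected for the same reason.

```python
def mean_depth(n_reads, read_length_bp, genome_size_bp=3.0e9):
    """Average depth = total sequenced bases / haploid genome size (assumed 3.0 Gb)."""
    return n_reads * read_length_bp / genome_size_bp

# AF libraries: 35-bp single-end reads; CVS libraries: assumed 100 bp per tag.
runs = {
    "AF, fewest reads": (0.55e6, 35),
    "AF, most reads": (3.28e6, 35),
    "CVS, fewest tags": (44e6, 100),
    "CVS, most tags": (70e6, 100),
}
for label, (n, length) in runs.items():
    print(f"{label}: {n * length / 1e6:.2f} Mb, {mean_depth(n, length):.4f}x")
```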
Two samples (A10021383, A10071659) were selected for further confirmation on the HumanOmni2.5M chip (Illumina). The CNV Partition algorithm plug-in was used to detect copy number variations in these two samples. A duplication at 2q36–q37.3 (230369496–242444380) and a deletion at 13q32.3–q33.3 (97091318–106462788) were correctly detected, in full concordance with the arrayCGH and sequencing results.
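As a quick consistency check, the sizes of the two confirmed CNVs can be computed from the reported coordinates. The sketch assumes hg18 coordinates on chromosomes 2 and 13, as implied by the cytobands. The `chrN:start-end` notation and the helper name are hypothetical conveniences. Both sizes fall within the 1.4–37.3 Mb range reported for the five CNVs.

```python
import re

INTERVAL = re.compile(r"(chr[0-9XY]+):(\d+)-(\d+)")

def parse_interval(text):
    """Parse 'chrN:start-end' into (chrom, start, end); commas in numbers are ignored."""
    m = INTERVAL.fullmatch(text.replace(",", ""))
    if m is None:
        raise ValueError(f"unrecognised interval: {text!r}")
    return m.group(1), int(m.group(2)), int(m.group(3))

# Coordinates as reported in the text; chromosome names follow the cytobands.
cnvs = {
    "2q36-q37.3 duplication": "chr2:230369496-242444380",
    "13q32.3-q33.3 deletion": "chr13:97091318-106462788",
}
sizes_mb = {}
for label, interval in cnvs.items():
    chrom, start, end = parse_interval(interval)
    sizes_mb[label] = (end - start) / 1e6   # add 1 bp if coordinates are inclusive
    print(f"{label} on {chrom}: {sizes_mb[label]:.2f} Mb")
```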

Discussion
In prenatal diagnosis, massively parallel sequencing has so far been reported only for noninvasive detection of trisomies 21, 18 and 13 from cell-free fetal DNA. Its scope there is restricted by the limited amount of fragmented fetal DNA in maternal plasma [11]. This limitation also makes it difficult for noninvasive MPS-based prenatal diagnosis to detect all chromosomal aneuploidies and sex chromosome abnormalities accurately. In this study, we demonstrate for the first time that combining MPS with a powerful bioinformatics analysis method can accurately diagnose fetal aneuploidy and imbalanced chromosomal structural abnormalities. To date, this is the first retrospective use of MPS (so-called next-generation sequencing) for prenatal diagnosis of imbalanced chromosomal rearrangements. It shows that MPS is practically feasible for large-scale prenatal diagnosis of fetal chromosomal abnormalities.
By establishing a normal-control sequencing-tag data set, we have shown that this new approach needs only a small amount of DNA (100 ng). With it, aneuploidies can be identified at ultra-low sequencing coverage (0.016×). Compared with the gold standard in clinical practice (G-band karyotyping), MPS is not restricted to a particular gestational window. As long as 100 ng of genomic DNA can be extracted from fetal tissue at any gestational week, MPS can be performed and all fetal aneuploidies reported within 7 days. If necessary, additional reads can be generated to detect microdeletions or microduplications in the fetal genome that may cause severe developmental retardation. In our study, fetal tissues such as amniotic fluid, CVS and placenta could be analyzed without cell culture, since 100 ng of genomic DNA is sufficient for library preparation and sequencing. The method can therefore support research into the molecular mechanisms of miscarriage, stillbirth and fetal death when tissues are difficult to culture. Our validation study on the 5 arrayCGH samples also suggests what more sophisticated sequencing protocols and bioinformatics algorithms could achieve. With them, it is possible to detect smaller chromosomal copy number variations as well as complex rearrangements across the whole fetal genome. Examples include balanced chromosomal rearrangements, structural variations and even single-gene disorders. As benchtop sequencing systems such as MiSeq (Illumina) and Ion Torrent PGM (Life Technologies) enter clinical laboratories, the whole process will take less time at an acceptable cost. MPS is therefore likely to play an increasingly important role in the future development of prenatal screening and diagnosis.
A potential weakness of the study concerns the figure showing the ratio of unique reads on each chromosome. In it, chromosomes 19 and 22 have a markedly larger coefficient of variation because of their high GC content, which makes detection of trisomy 19 and trisomy 22 difficult. Further work will address this problem, for example by computational correction of GC content across chromosomes. Other chromosomal abnormalities should, in principle, also be detectable. These include balanced translocations, incomplete aneuploidy caused by mosaicism, and partial duplication or deletion of a chromosome. Further studies are required to determine how effectively massively parallel genomic sequencing detects these rare aberrations. Another weakness of the new methodology is the starting material for library construction: the conventional Illumina library construction approach requires 100 ng of fetal genomic DNA. To further reduce the risk to pregnancies, it would be important to reduce the amount of AF or CVS sample required. Other library construction methods, such as in vitro transposition, may offer an alternative solution.
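GC correction of read counts is commonly done by grouping genomic bins by GC content and rescaling each group to the genome-wide level. The sketch below is one simple variant of that idea: a stratified-median correction in plain Python. It is not the authors' method, since the text names GC correction only as future work. It assumes read counts have already been tallied in fixed-size bins with a known GC fraction. The bin values are hypothetical. Corrected bin counts would then be summed per chromosome before UR% is computed.

```python
from collections import defaultdict
from statistics import median

def gc_correct(bins, gc_step=0.01):
    """Stratified-median GC correction.

    bins: list of (read_count, gc_fraction) for fixed-size genomic bins.
    Each count is multiplied by (overall median count) / (median count of bins
    in the same GC stratum), so over- or under-sequenced GC strata are brought
    back to the genome-wide level.
    """
    def stratum(gc):
        return round(gc / gc_step)

    by_stratum = defaultdict(list)
    for count, gc in bins:
        by_stratum[stratum(gc)].append(count)
    stratum_median = {k: median(v) for k, v in by_stratum.items()}
    overall = median(count for count, _ in bins)

    corrected = []
    for count, gc in bins:
        m = stratum_median[stratum(gc)]
        corrected.append(count * overall / m if m > 0 else 0.0)
    return corrected

# Hypothetical bins: GC-rich bins (55% GC) are under-sequenced relative to 40% GC bins.
bins = [(100, 0.40), (104, 0.40), (80, 0.55), (76, 0.55)]
corrected = gc_correct(bins)
print([round(x, 2) for x in corrected])
```

After correction, the two GC strata have the same mean count. This is the effect that would be expected to shrink the extra variance seen for GC-rich chromosomes.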
In conclusion, we have demonstrated the usefulness of massively parallel sequencing for detecting fetal aneuploidy and imbalanced chromosomal abnormalities in genomic DNA from prenatal samples. In principle, massively parallel sequencing can also reveal other features of the genetic material in amniotic fluid, such as histone modifications and DNA methylation. Given the rapid reduction in sequencing cost, we expect the strategy described in this article to become a powerful tool for detecting all kinds of chromosomal abnormalities in clinical settings.

Subject Enrollment and Sample Recruitment
The study was approved by the Institutional Review Board of Beijing Obstetrics and Gynecology Hospital of Capital Medical University. Informed consent was obtained from each participant.
A total of 32 pregnant women at high risk of Down syndrome were recruited as test cases from Beijing Obstetrics and Gynecology Hospital between January and May 2010. Amniocentesis was performed at 19–22 weeks of gestation, and standard G-band karyotyping analysis was carried out. Another 30 euploid AF samples from the same hospital (20 with male fetuses and 10 with female fetuses) were included as normal controls. To validate the sensitivity of the analysis method for detecting microdeletions/microduplications, 5 chorionic villus sampling (CVS) samples were also recruited. These samples had been characterized by karyotyping and arrayCGH at the Prenatal Genetic Diagnosis Centre (PGDC), Department of Obstetrics & Gynecology, Chinese University of Hong Kong.

Sample Preparation and Sequencing
Genomic DNA was extracted from uncultured AF samples with the Micro DNA Kit (Tiangen) and quantified with the Quant-iT dsDNA HS Assay Kit (Invitrogen). For each sample, 100 ng of genomic DNA was sheared into fragments of 100–400 bp with a Bioruptor (Diagenode). The fragments then underwent end repair, "A"-overhang addition and adapter ligation. Fragments of 300 bp (±25 bp) were selected by 2% agarose gel electrophoresis and amplified with 12 cycles of PCR using multiplex primers. PCR products were purified with the Agencourt AMPure Kit (Beckman). The size distribution of each library was checked with the Agilent Bioanalyzer DNA 1000 kit (Agilent Technologies), and its concentration was measured by quantitative PCR (qPCR). Libraries with different index tags were pooled in equimolar amounts and sequenced with single-end 36-cycle multiplex sequencing on the Illumina GAIIx platform.
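Equimolar pooling can be planned from the qPCR molarities, because 1 nM equals 1 fmol/µL. Where only a mass concentration is known, it can be converted with the usual average of about 660 g/mol per base pair of dsDNA. The helper names, library names, molarities and the 50 fmol target below are hypothetical. The mean fragment length used for conversion should be the whole library fragment, adapters included.

```python
def ng_per_ul_to_nM(conc_ng_per_ul, mean_length_bp, g_per_mol_per_bp=660.0):
    """dsDNA molarity: nM = (ng/uL * 1e6) / (660 g/mol per bp * fragment length in bp)."""
    return conc_ng_per_ul * 1e6 / (g_per_mol_per_bp * mean_length_bp)

def equimolar_volumes(molarities_nM, fmol_each):
    """Volume (uL) of each library that contributes fmol_each femtomoles (1 nM = 1 fmol/uL)."""
    return {name: fmol_each / conc for name, conc in molarities_nM.items()}

# Hypothetical qPCR molarities for three indexed libraries.
libraries = {"AF_01": 10.0, "AF_02": 5.0, "AF_03": 20.0}
volumes = equimolar_volumes(libraries, fmol_each=50.0)
print(volumes)                               # {'AF_01': 5.0, 'AF_02': 10.0, 'AF_03': 2.5}
print(round(ng_per_ul_to_nM(1.0, 400), 3))   # 3.788
```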
For the 5 DNA samples from CVS tissue, sequencing libraries with an insert size of 500 bp (±25 bp) were prepared. Paired-end 100-cycle multiplex sequencing was performed on the Illumina HiSeq 2000.

Bioinformatics analysis
z-score for detection of fetal aneuploidy. 35-bp single-end reads from the 62 AF samples (30 normal controls and 32 test cases) were aligned with ELAND against the repeat-masked human genome build 36 (hg18). Unique reads (UR), which can be mapped to the reference genome sequence without any mismatches or alternative … For autosomes, a type I error rate (α) of 0.01 was considered, and a z-score cut-off of 3 was set for calling fetal trisomy.
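The formula itself did not survive in this copy of the text. A standard form consistent with the description in Results is given below as a reconstruction, not a quotation. UR_i is the number of unique reads on chromosome i in a sample, and k is assumed to run over all chromosomes counted. μ_i and σ_i are the mean and SD of UR%_i over the 30 normal controls. Under a normal approximation, the two-sided tail probability at |z| = 3 is below the stated α.

```latex
\mathrm{UR\%}_{i} = 100 \times \frac{\mathrm{UR}_{i}}{\sum_{k} \mathrm{UR}_{k}},
\qquad
z_{i} = \frac{\mathrm{UR\%}_{i} - \mu_{i}}{\sigma_{i}},
\qquad
P(|Z| > 3) = 2\,[1 - \Phi(3)] \approx 0.0027 < \alpha = 0.01
```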
SegSeq algorithm for detection of microdeletion/microduplication. We mapped the Illumina reads to the human reference genome (hg18, NCBI 36.3) with the Short Oligonucleotide Analysis Package aligner SOAP2 (http://soap.genomics.org.cn/) [12]. Up to 5 mismatches in total were allowed (-v 5), with a seed length of 40 (-l 40), a minimal alignment length of 40 (-s 40) and the paired-end insert size taken into account. Only uniquely mapped reads were retained for the subsequent CNV analysis. For CNV detection, we employed the MATLAB package SegSeq (http://www.broadinstitute.org/cgi-bin/cancer/publications/pub_paper.cgi?mode=view&paper_id=182) [13], with YH Illumina reads [14] as the reference control. Segments with a SegSeq copy ratio below 0.75 or above 1.25 were reported as variations. CNVs in critical regions of known disease syndromes provide an important signal for clinical screening and diagnosis.
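The copy-ratio thresholds are easier to see against the expected values for a diploid genome. A one-copy loss should give a ratio near 0.5 and a one-copy gain a ratio near 1.5, so 0.75 and 1.25 sit halfway between each of these and the normal ratio of 1.0. The sketch below applies only this final thresholding step to already-segmented data. The segmentation itself is SegSeq's job and is not reproduced here. The `Segment` type, the function name and the example segments are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    chrom: str
    start: int
    end: int
    copy_ratio: float   # test / reference read-depth ratio, e.g. as reported by a segmenter

def call_cnvs(segments, loss_below=0.75, gain_above=1.25):
    """Label segments outside [loss_below, gain_above] as deletions or duplications."""
    calls = []
    for seg in segments:
        if seg.copy_ratio < loss_below:
            calls.append((seg, "deletion"))
        elif seg.copy_ratio > gain_above:
            calls.append((seg, "duplication"))
    return calls

# Made-up segments and ratios, for illustration only.
segments = [
    Segment("chr1", 1_000_000, 13_000_000, 1.48),
    Segment("chr5", 5_000_000, 14_000_000, 0.52),
    Segment("chr7", 0, 50_000_000, 1.02),
]
calls = call_cnvs(segments)
for seg, kind in calls:
    print(f"{seg.chrom}:{seg.start}-{seg.end} {kind} (ratio {seg.copy_ratio})")
```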

CNV Detection by arrayCGH and SNP typing array
Genomic copy number variants (CNVs) were detected with a targeted, high-resolution 44K oligonucleotide array constructed specifically for prenatal screening. The array was designed to target the common trisomic aneuploidies and most known microdeletion and microduplication syndromes. This Fetal DNA Chip includes telomeric and pericentromeric regions and examines the genome at a resolution of 100 kb (http://www.fetalmedicine.hk/en/Fetal_DNA_Chip.asp) [15]. The Fetal DNA Chip is specially designed, with most known common non-pathogenic CNV regions removed (http://www.fetalmedicine.hk/en/Fetal_DNA_chip/Appendix_I.pdf). It therefore provides a means to detect chromosomal aberrations at a resolution of ~100 kb across the genome. Array quality was analysed with Agilent DNA Analytics software. Arrays with a Derivative Log Ratio spread >0.25 were excluded from further analysis. Copy number variations were reported after excluding known non-pathogenic CNVs listed in the Database of Genomic Variants.
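The two filtering steps above can be sketched as follows: array-level QC on the Derivative Log Ratio spread, then removal of calls that fall in known non-pathogenic CNV regions. The sketch takes the DLR spread as already computed by the Agilent software. It also assumes that the benign regions have been loaded as (chromosome, start, end) tuples. The rule that a call is dropped when at least 50% of it is covered by a benign region is a hypothetical choice, because the text does not give the overlap criterion. All names and coordinates are hypothetical.

```python
def passes_qc(dlr_spread, max_spread=0.25):
    """Keep an array only if its Derivative Log Ratio spread is not above 0.25."""
    return dlr_spread <= max_spread

def overlap_fraction(call, region):
    """Fraction of the call (chrom, start, end) covered by the region."""
    (c1, s1, e1), (c2, s2, e2) = call, region
    if c1 != c2:
        return 0.0
    return max(min(e1, e2) - max(s1, s2), 0) / (e1 - s1)

def filter_benign(calls, benign_regions, min_overlap=0.5):
    """Drop calls covered >= min_overlap by any known non-pathogenic CNV region."""
    return [c for c in calls
            if all(overlap_fraction(c, r) < min_overlap for r in benign_regions)]

# Hypothetical data: array QC values, CNV calls and benign regions.
arrays = {"array_1": 0.18, "array_2": 0.31}
kept_arrays = [name for name, spread in arrays.items() if passes_qc(spread)]
calls = [("chr1", 1_000_000, 1_200_000), ("chr7", 5_000_000, 6_500_000)]
benign = [("chr1", 950_000, 1_250_000)]
reported = filter_benign(calls, benign)
print(kept_arrays, reported)
```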
To confirm the accuracy of the MPS-based CNV detection method, two CVS samples were analyzed with the HumanOmni2.5-Quad BeadChip according to the manufacturer's protocol (Illumina). The CNV Partition plug-in software was used for DNA copy number analysis.