Use of amplicon-based sequencing for testing fetal identity and monogenic traits with single circulating trophoblast (SCT) prenatal diagnosis

A major challenge for cell-based non-invasive prenatal testing (NIPT) is to distinguish individual presumptive fetal cells from maternal cells in pregnancies with a female fetus. We sought a rapid, robust, versatile, and low-cost next-generation sequencing method to facilitate this process. Toward this goal, single isolated cells underwent whole genome amplification (WGA) before genotyping. Multiple highly polymorphic genomic regions (including HLA-A and HLA-B), each with 10-20 highly informative single nucleotide polymorphisms (SNPs) within a 200 bp interval, were amplified using a method modified from previously published approaches. To increase the power of cell identification, approximately 40 Human Identification SNP (Applied Biosystems) test amplicons were also used. This method allowed reliable differentiation of fetal and maternal cells. In fully informative cases, two haplotypes were found within the maternal reads, and fetal cells showed reads carrying one but not the other maternal haplotype, together with a paternal haplotype absent from the mother. For SNP typing, a difference of at least 2 SNPs and at least 10% of informative SNPs was required to differentiate a fetal cell from a maternal cell. A paternal DNA sample is not required with this method. The assay also successfully detected point mutations causing Tay-Sachs disease, cystic fibrosis, and hemoglobinopathies in single lymphoblastoid cells, and monogenic disease-causing mutations in three cell-based NIPT cases. This method could be applicable to any monogenic diagnosis.


Introduction
For studies of trophoblasts from specific at-risk cases, we also prepared amplicons for DHCR7 and RAPSN (Table 1). The primers for these amplicons carried adaptor sequences compatible with the Illumina TruSeq HT i5 and i7 adaptors.

medRxiv preprint doi: https://doi.org/10.1101/2020.06.01.20108100; this version posted June 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

[…] followed by a customized R script (S4 File) were used to call variants within selected intervals (S2 File). The cutoff for calling a variant was at least ten reads of the less frequent allele and at least 5% of all reads.
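The calling cutoff above can be written as a small filter. This is a minimal Python sketch, not the authors' S4 File R script. It assumes that per-allele read counts at one SNP position are already available (for example, parsed from bam-readcount output). The function and parameter names (`call_genotype`, `min_minor_reads`, `min_minor_fraction`) are hypothetical. The most abundant allele is always reported. Any other allele is called only if it has at least ten reads and at least 5% of all reads at that position.

```python
from typing import Dict, Set


def call_genotype(
    counts: Dict[str, int],
    min_minor_reads: int = 10,
    min_minor_fraction: float = 0.05,
) -> Set[str]:
    """Return the set of alleles called at one SNP from per-allele read counts.

    The most frequent allele is always kept; every less frequent allele must
    pass both cutoffs (absolute read count and fraction of all reads).
    """
    total = sum(counts.values())
    if total == 0:
        return set()  # no coverage: no call
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    called = {ranked[0][0]}  # major allele
    for allele, n_reads in ranked[1:]:
        if n_reads >= min_minor_reads and n_reads / total >= min_minor_fraction:
            called.add(allele)
    return called


print(sorted(call_genotype({"A": 180, "G": 20})))  # ['A', 'G']
print(sorted(call_genotype({"A": 300, "G": 12})))  # ['A'] (12/312 is below 5%)
print(sorted(call_genotype({"A": 50, "G": 8})))    # ['A'] (fewer than 10 reads)
```

Both cutoffs are applied together: a minor allele with many reads can still be rejected at very deep positions, and a minor allele that makes up a large share of reads is rejected when its absolute read count is too low.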

Variants from each sample were summarized and compared with their paired control using an R script (S6 File). Typically, a cell with more than 2 SNPs and more than 10% of comparable SNPs differing from its maternal gDNA control was considered a likely fetal cell; otherwise, it was classified as uninformative. […] CIGAR information. The new concise sequences were tabulated and grouped by sequence similarity according to their Levenshtein distance. We assigned a haplotype to each major group of concise sequences. Pair-wise haplotype comparison between the maternal gDNA sample and the putative fetal cells was performed with another R script (S7 File). All scripts are hosted and maintained at https://github.com/xmzhuo/NIPT_genotyping.
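The pair-wise comparison and calling rule can be sketched as follows. This is an illustrative Python version, not the S6 File R script. Genotype calls are assumed to be sets of alleles keyed by a SNP identifier. A SNP counts as different when the cell carries an allele that is absent from the maternal gDNA. Only SNPs called in both samples are comparable, and indel alleles are skipped, mirroring the masking of indels described in the text. All names are hypothetical. This paragraph phrases the cutoff as "more than", whereas the Abstract and Results use "at least". The sketch uses inclusive (`>=`) comparisons, and both thresholds are parameters.

```python
from typing import Dict, Set, Tuple

Calls = Dict[str, Set[str]]  # SNP ID -> alleles called in one sample


def count_nonmaternal(cell: Calls, mother: Calls) -> Tuple[int, int]:
    """Return (SNPs where the cell has a non-maternal allele, comparable SNPs)."""
    different = comparable = 0
    for snp_id, cell_alleles in cell.items():
        maternal_alleles = mother.get(snp_id)
        if not cell_alleles or not maternal_alleles:
            continue  # not called in both samples: not comparable
        if any(len(a) != 1 for a in cell_alleles | maternal_alleles):
            continue  # mask indel alleles
        comparable += 1
        if cell_alleles - maternal_alleles:  # an allele the mother lacks
            different += 1
    return different, comparable


def classify_cell(cell: Calls, mother: Calls,
                  min_snps: int = 2, min_fraction: float = 0.10) -> str:
    """Apply the two-part rule: enough differing SNPs, and a large enough share."""
    different, comparable = count_nonmaternal(cell, mother)
    if (comparable > 0 and different >= min_snps
            and different / comparable >= min_fraction):
        return "likely fetal"
    return "uninformative"


# Ten comparable SNPs; the cell carries a non-maternal allele at two of them.
mother = {f"rs{i}": {"A"} for i in range(10)}
cell = dict(mother)
cell["rs0"] = {"A", "G"}
cell["rs1"] = {"A", "T"}
print(count_nonmaternal(cell, mother))  # (2, 10)
print(classify_cell(cell, mother))      # likely fetal
```

Only alleles the mother lacks are counted. A maternal cell, or a fetal cell that happens to share all tested alleles with the mother, therefore yields zero differences and is reported as uninformative rather than maternal.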

We compared the coverage of our SNP amplicons in gDNA and in NIPT cell WGA products (Fig 2). For gDNA, most samples had very high coverage, as reflected by the many gDNA samples with a large number of scorable SNPs. The WGA product of maternal white blood […] with Samtools and retrieving the read depth with bam-readcount (Fig 3). This step produces a […] Information on indels was masked to avoid confusion in later steps. We then performed a pair-wise comparison of each WGA product with the maternal gDNA; the script calculates how many SNPs differ between the two DNA samples. In an example case with two cells (Fig 3), […] suggests that it is likely to be a fetal cell. The other cell shows calls identical to those of the maternal gDNA, which suggests that it may not be a fetal cell.

[…] having an allele that the mother does not have. Cell B is homozygous for all SNPs, is indistinguishable from the mother, and is interpreted as maternal or noninformative (fetal with […]

We compared DNA from 156 blood samples, including gDNA and DNA from WBCs and fetal cells (confirmed by orthogonal methods such as gender PCR or low-coverage WGS), to study the sensitivity and specificity of SNP typing for identifying alleles that are present in a cell but absent from the mother and therefore mark a true fetal cell. We used the WBC WGA product from the matched maternal blood as a true negative (Table 2). In our study, a cutoff of more than 6% differing SNPs largely avoided false calls without missing real fetal cells. In our application, we required a difference of at least two SNPs and at least 10% of scorable SNPs to call a fetal cell. To date, we found that 68.9% of genotyped putative cells from 156 cases are […]
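The threshold reasoning above can be checked with a few lines of code. WBC WGA products serve as true negatives and orthogonally confirmed fetal cells as true positives. This Python sketch is illustrative only. The input format, a list of (percentage of differing SNPs, is_fetal) pairs, and the function name `error_counts` are hypothetical. The example values are invented and do not come from Table 2. The sketch looks only at the percentage cutoff and ignores the additional minimum-SNP requirement.

```python
from typing import Iterable, Tuple


def error_counts(samples: Iterable[Tuple[float, bool]],
                 threshold_pct: float) -> Tuple[int, int]:
    """Count errors when samples with > threshold_pct differing SNPs are called fetal.

    samples: (percent of comparable SNPs that differ from the mother,
              True for a confirmed fetal cell,
              False for a maternal WBC negative control).
    Returns (false positives, false negatives).
    """
    false_pos = false_neg = 0
    for pct_diff, is_fetal in samples:
        called_fetal = pct_diff > threshold_pct
        if called_fetal and not is_fetal:
            false_pos += 1
        elif is_fetal and not called_fetal:
            false_neg += 1
    return false_pos, false_neg


# Invented toy values for illustration only, not study data.
toy = [(0.0, False), (3.5, False), (5.0, False),
       (18.0, True), (32.0, True), (9.0, True)]
for t in (2.0, 6.0, 10.0):
    print(t, error_counts(toy, t))
# 2.0 (2, 0)
# 6.0 (0, 0)
# 10.0 (0, 1)
```

Sweeping the cutoff this way makes the trade-off explicit. A low cutoff admits maternal WBCs as false positives, while a high cutoff starts to miss fetal cells.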

In NIPT #1000 (Fig 4), we isolated eight putative fetal cells from the mother's blood for WGA. Cells G78, G212, G227, and G232 each had at least four SNPs that differed from the maternal gDNA; thus, they were confirmed to be fetal in origin. Of the remaining four cells, G79, G113, and G320 had 0-2 SNPs that differed from the maternal gDNA; these were considered uninformative and possibly maternal white blood cells accidentally isolated from the mother's blood. Cell G106 had only 2 differing SNPs among fewer than 20 informative SNPs and was considered a low-quality, uninformative sample.

[…] informative SNPs per sample. The Y-axis is the number of SNPs with non-maternal alleles in a cell. As expected, the maternal gDNA samples showed no alleles absent from the mother. Data […]

Haplotyping of NIPT WGA products potentially has higher power than SNP typing for identifying fetal cells. Because the WGA product typically went through 14-16 cycles of amplification from trace amounts of input DNA, there is a small chance of introducing new mutations, which would reduce the precision of SNP typing at low read depth. For example, some of the cells from case #1000 (Fig 4) had one low-depth SNP difference from the maternal gDNA. To address this issue, we developed a haplotyping approach for multiple highly polymorphic […] interpretation (i.e., a single nucleotide change is unlikely to change the classification of a major haplotype group, which is comparable to the HLA typing approach). Mutations introducing a random change are not rare, but mutations that switch a SNP from one polymorphic allele to the other are much rarer. In addition, the haplotype of an amplicon can be treated as a combination of alleles at a given number of SNPs, which theoretically generates far more distinct haplotypes than there are alleles at a single SNP and gives higher power to differentiate two cells. Third, we can estimate the […] is consistent with an artifact from the final step of amplicon-seq.
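As a rough illustration of the combinatorial point, assume that every SNP in an amplicon is biallelic (an assumption; the text does not state it). An amplicon with $n$ such SNPs then admits $H(n)$ possible haplotypes, compared with two alleles at any single SNP. For the 28 SNP sites used for the HLA-A amplicon in the Fig 6 example:

$$
H(n) = 2^{n}, \qquad H(28) = 2^{28} = 268{,}435{,}456
$$

All concise sequences from one amplicon have length $n$. A read $r$ carrying a single-base amplification artifact is therefore at Hamming distance 1 from its true haplotype. By the triangle inequality, its distance to another haplotype $h'$ that differs from the true one at $d$ SNPs is bounded below:

$$
d_H(r, h_{\text{true}}) = 1 \;\Rightarrow\; d_H(r, h') \ge d_H(h_{\text{true}}, h') - d_H(r, h_{\text{true}}) = d - 1
$$

For $d > 2$, the artifact read therefore remains strictly closer to its own haplotype group than to the other one.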

We performed haplotyping with the following steps (Fig 5). After regular alignment with […] overlapping Read 1 and Read 2 were merged. If the amplicon was longer than the two reads joined together, we merged the two reads and padded the gap with a run of Ns. The merged reads were remapped with BWA-MEM using a lenient setting to tolerate a larger gap. The remapped reads were processed with an R script to extract the selected SNPs, and each read was reconstituted as a concise sequence while preserving the read ID. The concise reads were then tallied and ranked by frequency (typically, only the top 10 were kept, which usually accounted for more than 99.99% of all reads). Levenshtein distances were calculated between these read types, which typically yielded only one or two major groups representing the haplotypes of the amplicon. To compare the haplotypes of more than two samples, […] will determine whether these samples share the same read group (haplotype). For example, suppose the maternal gDNA carries a mocked haplotype 1 (TA) and haplotype 2 (GC). A positive fetal cell should be identified as carrying at least one new haplotype 3 (GA), such as TA/GA, or an apparent GA/GA resulting from dropout of maternal haplotype 1 or 2 (Fig 5).

[…] for constructing concise haplotypes. To demonstrate the workflow, we present example short reads with TA and GC haplotypes. The haplotyping approach can effectively differentiate a candidate cell from maternal gDNA.
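The concise-sequence and comparison steps can be sketched in Python. This is an illustration under stated assumptions, not the authors' R scripts (S7 File). The sketch does three things:

- It walks a SAM-style CIGAR string to pick out the bases that an aligned read carries at chosen 0-based reference positions.
- It tallies the resulting concise reads with `collections.Counter`, keeping the top 10 as in the text.
- It reports haplotypes present in a candidate cell but absent from the mother, using the mocked TA/GC/GA example.

The function names and example reads are hypothetical. Grouping of near-identical read types (Levenshtein distance) is omitted here.

```python
import re
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def concise_sequence(read_seq: str, ref_start: int, cigar: str,
                     snp_positions: Sequence[int]) -> str:
    """Bases carried by one aligned read at the given reference positions.

    ref_start and snp_positions are 0-based reference coordinates.
    Positions that the read does not cover (or deletes) become 'N'.
    """
    base_at = {}
    ref_pos, read_pos = ref_start, 0
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op in "M=X":      # aligned bases: consume read and reference
            for i in range(length):
                base_at[ref_pos + i] = read_seq[read_pos + i]
            ref_pos += length
            read_pos += length
        elif op in "IS":     # insertion / soft clip: consume read only
            read_pos += length
        elif op in "DN":     # deletion / skipped region: consume reference only
            ref_pos += length
        # H (hard clip) and P (padding) consume neither
    return "".join(base_at.get(p, "N") for p in snp_positions)


def top_read_types(concise_reads: Iterable[str], k: int = 10) -> List[Tuple[str, int]]:
    """Tally concise reads and keep the k most frequent read types."""
    return Counter(concise_reads).most_common(k)


def novel_haplotypes(cell_haplotypes: Iterable[str],
                     maternal_haplotypes: Iterable[str]) -> List[str]:
    """Haplotypes seen in the cell but not in the maternal gDNA."""
    return sorted(set(cell_haplotypes) - set(maternal_haplotypes))


snps = [101, 105]  # two hypothetical SNP positions (0-based)
reads = [("ACGTACGT", 100, "8M"), ("ACGTACGT", 100, "2M1I5M")]
print([concise_sequence(s, start, cig, snps) for s, start, cig in reads])
# ['CC', 'CG']
print(novel_haplotypes(["TA", "GA"], ["TA", "GC"]))  # ['GA']
print(novel_haplotypes(["GA"], ["TA", "GC"]))        # ['GA'] (maternal haplotype dropped out)
print(novel_haplotypes(["TA", "GC"], ["TA", "GC"]))  # []
```

Mapping reference positions through the CIGAR string, rather than indexing the read directly, keeps the SNP columns aligned when a read carries insertions, deletions, or soft clips.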

In the case shown in Fig 6, we had gDNA from both parents. As described previously, we extracted all 28 SNP sites in the HLA-A amplicon and reconstructed a concise 28-nt sequence for each read. In this case, the four most frequent read types of the potential fetal cell could be grouped into two major groups separated by a Levenshtein distance of more than 2 (S1 Fig). The intra-group difference had a distance of less than 0.5, which suggests a difference of only one nucleotide. This difference likely results from artifacts introduced during extensive amplification (WGA followed by PCR). The same pattern was observed in the maternal and paternal gDNA. When all read types from maternal, paternal, and fetal DNA were plotted together, the inheritance pattern of the haplotypes was apparent: one fetal haplotype matched the mother and the other matched the father. From these haplotype groupings, we concluded that this is a true fetal cell.
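The grouping of read types by Levenshtein distance, and the parent-of-origin check on the grouped haplotypes, can be illustrated as follows. This Python sketch uses a simple greedy rule as a stand-in for whatever clustering the authors' R scripts apply. Read types are visited in order of decreasing count. Each one joins the first group whose founding (most frequent) sequence is within `max_dist` edits; otherwise it starts a new group. The default `max_dist=1` follows the one-nucleotide artifacts described above. The function names, the threshold, and the toy sequences are assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance (substitutions, insertions, deletions) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def group_read_types(tally: Sequence[Tuple[str, int]],
                     max_dist: int = 1) -> List[Dict]:
    """Greedily merge read types into groups founded by their most frequent member."""
    groups: List[Dict] = []
    for seq, count in sorted(tally, key=lambda item: item[1], reverse=True):
        for group in groups:
            if levenshtein(seq, group["haplotype"]) <= max_dist:
                group["count"] += count
                group["members"].append(seq)
                break
        else:  # no existing group is close enough: start a new one
            groups.append({"haplotype": seq, "count": count, "members": [seq]})
    return groups


def parental_origin(haplotype: str, maternal: Sequence[str],
                    paternal: Sequence[str], max_dist: int = 1) -> str:
    """Label a fetal haplotype by which parent carries a close match."""
    in_mother = any(levenshtein(haplotype, h) <= max_dist for h in maternal)
    in_father = any(levenshtein(haplotype, h) <= max_dist for h in paternal)
    if in_mother and in_father:
        return "shared"
    if in_mother:
        return "maternal"
    if in_father:
        return "paternal"
    return "unexplained"


# Toy read-type tally for one cell: two real haplotypes plus 1-nt artifacts.
fetal_tally = [("TAGCTA", 50), ("GCTAGC", 40), ("TAGCTT", 3), ("GCTAGG", 2)]
for g in group_read_types(fetal_tally):
    print(g["haplotype"], g["count"],
          parental_origin(g["haplotype"], ["TAGCTA", "CCCCCC"], ["GCTAGC", "AAAAAA"]))
# TAGCTA 53 maternal
# GCTAGC 42 paternal
```

A haplotype carried by both parents is reported as "shared" rather than assigned arbitrarily, because such a haplotype cannot on its own show that a cell differs from the mother.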

[…] cell has an allele that the mother does not have, indicating that the putative fetal cell is indeed fetal. The 24 SNPs to the right are not informative for the fetal cell because, at those SNPs, the cell carries no allele that is absent from the mother. Data from NIPT case number 1000.

We also wished to use this method to genotype monogenic disease mutations. We first […] were obtained from the Coriell Institute, and single lymphoblasts were picked from tissue culture and processed for WGA. The cells were isolated and genotyped by the methods described herein. We successfully detected the known variants in the WGA products of cells carrying these mutations (Fig 7). The allele dropout rate was 15% for unfixed and 8% for fixed lymphoblasts. […] allele. For the DHCR7 family, seven fetal cells were recovered and four were genotyped. One cell was heterozygous for the mutation (Fig 8), while two cells had dropout of the normal allele […] cell in Fig 8C shows absence of the maternal mutation but presence of the paternal mutation. […] were heterozygous, while one cell had dropout of the normal allele and one cell had dropout of the paternal mutant allele. There is a small probability that the fetus carries the maternal mutation but that it dropped out in all four cells; more likely, the fetus does not carry the maternal mutation, in agreement with the CVS data. The fetus does carry the paternal mutation.

[…] normal allele are colored grey and summed in red. In panel B, the mother is heterozygous for a DHCR7 mutation that is also present in the father. Fetal trophoblast G1286 is also heterozygous […] out for the N88K variant cannot be ruled out, and multiple cells must be tested to gain statistical evidence that the fetus has not inherited the N88K variant. All results agreed with data from amniocentesis or CVS. Data from NIPT case numbers 1180, 1492, and 1607.
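The need to test multiple cells can be quantified with a simple model, which is an assumption rather than the authors' analysis. Assume that each fetal cell independently loses one allele at a heterozygous site at the allele dropout (ADO) rate measured in lymphoblasts, and that either allele is equally likely to be the one lost. Under that model, a specific allele is missed in one cell with probability ADO/2, and in all n cells with probability (ADO/2)^n. The Python sketch below computes that probability and the smallest number of cells that pushes it below a chosen level. The function names are hypothetical. The model ignores other error sources, such as a maternal cell misclassified as fetal, and assumes that lymphoblast ADO rates carry over to trophoblasts.

```python
def prob_allele_missed(ado_rate: float, n_cells: int) -> float:
    """P(a specific heterozygous allele drops out in every one of n_cells)."""
    if not 0.0 <= ado_rate <= 1.0:
        raise ValueError("ado_rate must be between 0 and 1")
    return (ado_rate / 2) ** n_cells


def cells_needed(ado_rate: float, alpha: float) -> int:
    """Smallest number of cells for which prob_allele_missed <= alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be between 0 and 1")
    n = 0
    while prob_allele_missed(ado_rate, n) > alpha:
        n += 1
    return n


# ADO rates reported above for unfixed and fixed lymphoblasts.
for ado in (0.15, 0.08):
    print(ado, prob_allele_missed(ado, 4), cells_needed(ado, 0.001))
```

Because the per-cell miss probability is at most 0.5, the loop in `cells_needed` always terminates. Under this model, each additional informative fetal cell reduces the chance of missing an inherited allele by a constant factor.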
