Improving the calling of non-invasive prenatal testing on 13-/18-/21-trisomy by support vector machine discrimination

With the advance of next-generation sequencing (NGS) technologies, non-invasive prenatal testing (NIPT) has been developed and employed in fetal aneuploidy screening on 13-/18-/21-trisomies through detecting cell-free fetal DNA (cffDNA) in maternal blood. Although Z-test is widely used in NIPT NGS data analysis, there is still necessity to improve its accuracy for reducing a) false negatives and false positives, and b) the ratio of unclassified data, so as to lower the potential harm to patients as well as the induced cost of retests. Combining the multiple Z-tests with indexes of clinical signs and quality control, features were collected from the known samples and scaled for model training using support vector machine (SVM). We trained SVM models from the qualified NIPT NGS data that Z-test can discriminate and tested the performance on the data that Z-test cannot discriminate. On screenings of 13-/18-/21-trisomies, the trained SVM models achieved 100% accuracies in both internal validations and unknown sample predictions. It is shown that other machine learning (ML) models can also achieve similar high accuracy, and SVM model is most robust in this study. Moreover, four false positives and four false negatives caused by Z-test were corrected by using the SVM models. To our knowledge, this is one of the earliest studies to employ SVM in NIPT NGS data analysis. It is expected to replace Z-test in clinical practice.


Introduction
On the basis of the discovery of cell-free fetal DNA (cffDNA) in maternal plasma and serum [1] as well as the advance of next-generation sequencing (NGS) technology [2], Non-invasive prenatal testing (NIPT) has been developed in 2008 [3,4] and applied in clinical use recent years for fetal aneuploidy detection mainly on Down's syndrome, Edward's syndrome and Patau's syndrome, respectively corresponding to 21-trisomy(T21), 18-trisomy(T18) and 13-trisomy(T13) [3][4][5]. Before the application of the NGS-based NIPT, there were mainly two PLOS  methods to detect 13-/18-/21-trisomies in clinical practice. One is the non-invasive serological test with high rate of false positive and false negative [6]; the other is the golden standard-the invasive amniocentesis with a rate of 1/250 inducing abortion [7]. Comparatively, NIPT is much more accurate than serological test and safer than amniocentesis. The International Society for Prenatal Diagnosis [8], the National Society of Genetic Counselors [9], the American College of Obstetricians and Gynecologists and the Society for Maternal-Fetal Medicine [10] had published committee opinions stating that such a cffDNA testing could be offered to pregnant women at high risk for fetal aneuploidy as a screening option after counseling. Except those employing deep sequencing or array-based methods, most NIPTs were performed using the low-coverage next-generation sequencing (NGS) platforms such as Verifi [11], Materni21 [12], panorama [13] and NIFTY [14]. Similar to copy number variation analysis, the sequencing reads from a test sample were mapped and counted as depth in bins of a certain size, following by a measurement of deviation from negative control. Since the triploid fetus has 2-5% more cffDNA than diploid fetus, Z-test was frequently employed in deviation measurement [3,15].
Statistically, Z score indicates the significance of deviation from the baseline, e.g. Z > 3 means that the test data approximates the baseline with P < 0.001 and hence is likely to be from a triploid sample. Types of Z-tests were employed in different NIPT studies, such as Chiu et al. using average of negatives as baseline [3] and Zhang et al. using internal reference as baseline [15]. However, these one-Z-test based approaches have many problems in clinical practice. First, only single Z score is insufficient to give accurate prediction on different samples due to read distribution bias among individuals. Further, fetal fraction has been proven to be crucial in trisomy determination [16]; however, it was not involved in one-Z-test based approach in NIPT NGS data analysis. These problems could result in inaccurate prediction, high cost of retesting and delay of treatment.
As shown in Fig 1, the distributions of Z scores of negatives and positives overlapped in a certain intervals, where the cutoff Z = 3 was unable to discriminate. A simulation shown in S1 Fig indicates that small portion of negative samples could have Z > 3 while small portion of positive samples could have Z < 3, especially when fetal fraction is around or less than 5%. In clinical practice, it is guaranteed that any sample with Z score in an interval (1.96, 4), called "grey zone", requires a retest. It is because that only using Z = 3 as cutoff to separate positives from negatives may result in inaccurate results.
Therefore, it is meaningful to develop a more precise method for NIPT data analysis. The support vector machine (SVM) is an excellent tool for this purpose. It is a supervised machine learning (ML) algorithm that identifies an arbitrarily defined framework for discriminating query data using a model build from training dataset with selected features [17]. SVM has already shown high robustness and accuracy in fields [18], such as cancer subtype classification [19], splice site prediction [20] and single nucleotide polymorphism (SNP) prediction [21].
For NIPT on 13-/18/-21-trisomies, it has been reported that positive samples are much fewer than negatives [22,23]. Referred to a clinical experience from~150,000 pregnancies in mainland China [23], the positive rates of 13-/18-/21-trisomies were respectively 0.045%, 0.15% and 0.52%. The large difference in number of positive and negative could lead to class imbalance in ML model training if all data were employed in research. However SVM could reduce the effect of class imbalance by selecting the support vectors from all given input data. Further, feature co-linearity would not affect the SVM model in discrimination. Therefore in this study, SVM is employed to improve the prediction on NIPT NGS data with a purpose to replace the one-Z-test based approach in current clinical practice. Combining multiple Z values with indexes of clinical signs and quality control, SVM model was trained for each dataset of 13-/18-/21-trisomy to accurately discriminate the samples, especially the "grey zone" NIPT results and those falsely predicted before.

Specimen source
This study was a retrospective analysis on the NIPT NGS data obtained from March to July 2016 at Guangzhou DaAn Clinical Laboratory Center. Informed consent was obtained from all participants. Information such as gestational week and maternal age was obtained while the names of participants have been masked so as to protect their privacy. The trisomy samples were validated by amniocentesis. The NIPT experiments were based on the semi-conductor sequencing platform same as in Liao et al 's paper in 2013 [24]. The reported results were output through a CFDA-certified standard operation protocol (SOP) and a DaAn Gene's compiled bioinformatics plugin named "Seqboost" developed on the basis of Liao et al 's paper [24] that described a one-Z-test based approach for NIPT prediction.

Data summary
In total 5518 NIPT data were collected during the period from two semi-conductor sequencers located in the NIPT lab in Guangzhou (Table 1). There are forty-six data from triploid samples with one data labeled "#5267" is positive in both T18 and T21 (S1 Table). Hence there are forty-seven triploid cases, respectively five for T13, fifteen for T18 and twenty-seven for T21. Average age of pregnant mother with negative results was 31.83 (95% CI: 15-51), slightly larger than the average age of ones with positive results (31.70, 95% CI: 17-47). Another 500 negative samples were recruited as reference negative control for NIPT calling.
As shown in S1 Table, a series of values were listed to demonstrate the information of these data, including "Z_run" as the Z scores output by "Seqboost" in one's run, "Real_state" as the results confirmed by prenatal or postnatal diagnosis, fetal fraction predicted using SeqFF [25], peak value of read length, maternal age and gestational week. According to CFDA's NIPT policy and DaAn Gene's SOP, Z score = 3 is the cutoff to distinguish negatives and positives. Negatives and positives are shown in dark and red respectively. Green dash indicates the cutoff of Z = 3 that was frequently used as a criterion in discrimination. Blue dashes shows the "grey zone" interval between Z = 1.96 and 4, which means failure in discrimination using Z = 3, and requires a retest.
Hence in routine NIPT, the data with "Z_run � 3" would be regarded as positive, meaning it's significantly deviated from the baseline of reference dataset; while those with "Z_run < 3" would be regarded as negative. Hence, the data predicted as positive with "Real_state = -1" as negative were false positives; those predicted as negative with "Real_state = 1" as positive were false negatives.
Of these 5518 data, 766 data with unique reads fewer than 3,000,000 or predicted fetal fraction less than 5% were labeled as "QC-filtered" on the basis of quality control (QC) according to the SOP. The remaining "QC-pass" 4752 data were categorized into three groups for specified chromosomes on the basis of the principle of statistics: Group "N" as those with Z scores smaller than 1.96, meaning not significantly higher than baseline of reference dataset (P > 0.05); Group "P" as those with Z scores larger than 4, meaning significantly higher than baseline of reference dataset (P < 0.0001); Group "Unclassified" as those with Z scores between 1.96 and 4, meaning retest is required for double check in nowadays' NIPT. For each specified chromosome, data in Groups "N" and "P" were employed to train models and conduct internal validation in this study. Data in Group "Unclassified" and "QC-filtered" were used in performance test. We also employed the trained model to correct the four false positives and four false negatives caused by Z-test in previous NIPT reports.

Feature selection and data reanalysis
Reads generated from semi-conductor Ion Proton Sequencer (Life Technologies) were trimmed and mapped to human genome 19 (hg19), following by recalibration and realignment through the automated pipeline of the supporting Ion Torrent Suite Software (Life Technologies). Then reads were filtered using SAMtools' command [26] 'samtools view-F 1024 -q 10' to remove PCR duplicates and low quality (mapping quality smaller than 10) reads. Thus, the remaining high-quality unique reads were used for the following analysis. Similar with the CFDA-certified DaAn Gene's SOP, read-depth for each contiguous 20kb bin was calculated using the genome-CoverageBed program in BEDtools [27]. To remove the bias of read-depth distribution caused by data volume difference, GC content and casual sequencing bias respectively, three types of normalization were applied in four steps: 1) Intra-run normalization was used to eliminate the difference between each data; 2) Winsorization that was a transformation reducing the influence of outliers by moving observations outside a certain fractile in the distribution to that fractile [28], was employed to reduce the extreme read-depth among each contiguous window consisting of 15 bins of 20 kb; 3) LOESS was employed to remove GC-bias same as in Chiu et al. 's paper [3]; 4) Intra-run normalization again due to steps 2) and 3) could induce bias of data size. Mean and standard deviation (s.d.) of read-depth of each chromosome were calculated for further statistical analysis. The normalized read-depth of each bin was added up every 15 bins to smooth the readdepth signal. Then the mean and standard deviation of merged read-depth on each chromosome was calculated to statistical analysis for fetal aneuploidy evaluation. For each data, six Z scores were called as described by the following formula: where Z_baseline_vs_n means the Z score normalized to the average of reference negative samples on chromosome i, and ref. means the normalized read-depth values of reference negative samples.
where Z_baseline_vs_p means the Z score normalized to the average of predicted reference positive data, fetal% means fetal DNA fraction. The predicted reference positive data is equal to the mean value of reference negative data multiplied by a factor (1+fetal%/2) based on the assumption that half of fetal fraction would be increased when trisomy happens.
where Z_chr_vs_n means the Z score normalized to the internal reference autosome value that is the median of all averages of normalized read-depth in each autosome of this sample, which was similar in Lau's paper [15]. Similarly, we have: where Z_chr_vs_p means the Z score normalized to the predicted positive internal reference autosome value that is the median of predicted positive averages of normalized read-depth in each autosome of this sample.
Z sample vs n i ¼ À ðmeanðref : i Þ À mean i Þ S m � MAD i . ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi where Z_sample_vs_n means the Z score normalized to the average of sample data, MAD means the median absolute deviation of read-depth, window means the number of windows on the chromosome i, and S m is a factor equal to 1.4826 and makes S m � MAD i . ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi window i p approximate to the standard deviation of read-depth of sample data.
where Z_sample_vs_p i means the Z score normalized to the mean value of predicted positive sample data.

SVM discrimination
Six Z scores together with fetal fraction, peak value of read length, maternal age and gestational week, were collected for support vector machine classification. For the ten features selected for SVM classification model training, the six Z score-based features were essential because their distributions between negatives and positives were significantly different (Wilcox Rank Sum test, P < 2.2×10 −16 ), while the other four features were not biasedly distributed ( Table 2). Ten features were collected from the data in Groups "N" and "P" for model building on specified chromosomes. The six Z scores obtained from formula (1) to (6) do not require scaling because they were already normalized, while the other 4 features including fetal fraction, peak value of read length, maternal age and gestational week, would be normalized to same scale ranging from 0 to 3 by the command 'svm-scale-l 0 -u +3'. For the training dataset, '-s' was used to save the scaling range, and '-r' was used to restore the saved scaling range on test data. Then, the SVM model was constructed by 'svm-train' and employed to do prediction by 'svm-predict' in LBSVM package. Despite SVM was quite efficient in handling sparse data, we also evaluated its performance by assigning two class weights (using '-wi' in training) that were inversely correlated with their instance number, aiming to improve the accuracy in unbalanced data.
We trained SVM models using those ten selected features from chromosomes 13, 18 and 21 datasets separately. Let us denote class labels as y i 2 {-1, 1} for normal state and trisomy of each specified chromosome, respectively. Given a set of training data {x i , y i }, i = 1, 2, . . ., n, the SVM returns a maximum margin separating hyper-plane with w and an offset b using subject to: . . . ; n: where w are feature weights representing the hyper-plane, ε i � 0are slack variables designed to allow misclassified data points, and C > 0 is the penalty parameter for misclassification.
By solving for the Lagrangian dual of formula (7), we could obtain a simplified optimization problem This dual problem could be efficiently solved using quadratic programming or sequential minimal optimization (SMO) algorithm.
Here, the solution for w in formula (7) is also given by Once the optimal solution for α i,. . ., α n is found, the optimal b is then determined using the maximum margin condition: The decision function for any new point x is then with f > 0 assign to class 1 and f < 0 assign to class -1.
The inner product ϕ x i ð Þϕ x j � � in formula (8) could also be represented as a kernel function Here, we applied two kinds of kernel functions for our data: linear kernel function and radial basis function (RBF). The linear kernel function is based on inner products of input features between any two samples, so we could verify if our data are linearly separable. The feature space of the RBF kernel, on the other hand, enable us to learn a nonlinear classification by transforming input features into an implied feature space with an infinite number of dimensions.
The RBF kernel is defined as: where γ is a kernel parameter controlling the sensitivity of the kernel function. For trainings with linear kernel function, only one hyper-parameter C needs to be adjusted to select an appropriate model through k-fold cross validation. A low C makes the decision boundary smooth, while a high C could select more samples as support vectors and classify more training samples correctly, which thus is prone to be over-fitting. For the RBF kernel, besides C, there is another kernel parameter γ, defining the extent of influence for those support vectors, with high values meaning a narrow range of influence. To select optimal parameters (C and γ for RBF kernel; C for linear kernel), we employed a grid search approach with 0.1 as step and 5-fold cross validation by using grid.py from LIBSVM [17] and expand.grid from 'caret' package [29]. And to prevent over-fitting, C values were carefully checked to avoid solutions with large values.

Other discrimination methods
Other discrimination methods such as linear discriminant analysis (LDA) [28], quadratic discriminant analysis (QDA), decision tree (Dtree) were also tested on the same NIPT dataset in this study. Both of LDA and QDA assumed samples drawn from a multivariate normal distribution Nðμ; ΣÞ with mean vector μ and covariance matrix Σ. The probability of k class was given by: where p k was the prior class probability of k classes. LDA arises when the covariance matrix for each classes were assumed to be the same, in which case the discrimination boundary could be simplified to: In QDA analysis, the covariance matrix from each class is different and its discrimination boundary is: where S k was the covariance matrix for k class; LDA and QDA were trained by the 'lda' and 'qda' methods in 'MASS' package, respectively. For Dtree, we directly applied the 'ctree' method in the R package 'party', which utilized a binary recursive portion approach to rank and select those input variables according to their association with the input classes. We also employed AdaBoost to create a highly accurate prediction rule using the 'caret' package [29]. We implemented AdaBoost.M1 with decision trees as weak learners. The final classifier of AdaBoost was a weighted combination of weak classifiers, Where h t , β t were the induced weak classifiers and their assigned weights, respectively. The AdaBoost model was also trained with the same input as SVM and 5-fold cross validation to avoid the chance of over fitting.

Performance tests
The performances of SVM models trained using different hyper-parameter settings were compared respectively for chromosomes 13/18/21. Generally, four types of SVM models were compared: RBF kernel without class weight, RBF kernel with class weight, linear kernel without class weight and linear kernel with class weight. Firstly for the data in Groups "N" and "P" on specified chromosomes, an internal validation was done using the model built based on these data themselves. Importantly, the trained models were applied to predict the data in Group "Unclassified", which was the most meaningful application in this study. As well, the models were applied to predict the data in Group "QC-filtered". Similarly, we tested the SVM models trained using different parameter settings. Performances of models trained using two kernel functions were compared. Also, model trained using class weight was compared to those trained without using class weight.
Secondly, other ML models such as LDA, QDA and Dtree, were tested in prediction of 13-/ 18-/21-trisomies on the three groups of datasets. The performances of other ML models were compared with the performances of SVM models mentioned above. We also conducted a comparison between SVM models with optimal parameters and Adaboost.
For visualization of performances of ML models, four ML models trained using feature D1 from formula (1) and feature D3 from formula (3) were tested in internal validation of chromosome 21. The two-dimension hyper-planes for discrimination were plotted using 'contour' in R package 'graphic'. We also visualized the performance of SVM models using three features (features D1, D3 and D7 as fetal fraction) in a 3-D plot and corresponding three 2-D plots.

Inaccuracy of one-Z-test approach in NIPT prediction
We employed all of six Z-tests from formulas (1) to (6) to demonstrate their distributions of Z scores on respectively chromosomes 13/18/21 in all QC-pass NIPT NGS data. As shown in Fig  2, none of Z-test could clearly distinguish positives from negatives. Using Z = 3 as cutoff, Z scores from formulas (1), (3) and (5) were able to identify all the true positives but a number of false positives existed especially formula (3). Using Z = -3 as cutoff, Z scores from formulas (2), (4) and (6) had both false positives and false negatives in discrimination. Though these six Z scores were significantly biased in distributions between positives and negatives ( Table 2, Wilcox Rank Sum test, P < 2.2×10 −16 ), the simple discriminating method that was based on a certain cutoff value would always give false positive or negatives in NIPT calling. Except these six Z scores, the other four indexes were not significantly biased in distribution between positives and negatives ( Table 2).
This result suggests that the one-Z-test based approach could not guarantee the prediction accuracy of NIPT NGS data because of the simplicity of discriminating rule. Hence the calling of NIPT demands a more comprehensive approach that could combine different vectors to improve the accuracy.

Performance of SVM models
As described in the pipeline (Fig 3), we trained the SVM models using the known datasets from Group "N" and "P" for chromosomes 13/18/21 respectively. Firstly, models were respectively trained using different hyper-parameter settings for chromosomes 13/18/21. To compare the performances between kernel functions, both linear and RBF kernel functions were employed to build the models. Further, parameter '-wi' was used to adjust C for class weight. Parameter optimization was employed to find the best C and g using a grid search method with 0.1 as step and 5-fold cross-validation (See Methods).
On one hand, the models with using RBF kernel performed better than the ones with linear kernel ( Table 3). The models with RBF kernel achieved 100% accuracy in both internal (Group "N" & "P") and external validation (Group "Unclassified") for all three specified chromosomes. However, the models with linear kernel did not predict the positives in Group "Unclassified" well. On the other hand, the models using class weight performed as good as those not using. It was because that the models not using class weight were sufficiently accurate in the datasets of this study (S2 Table). Taken together, the SVM models with using RBF kernel function and class weight were selected in the following analysis.
In internal validations, the SVM models predicted the training dataset with 100% accuracies on all three chromosomes. Importantly, as in external validation, the SVM models performed good in predicting the QC-pass data that used to be predicted as "Unclassified" (1.96 � Z score � 4) by Z-test. For chromosome 21, all 61 data in "grey zone" (1.3% of all QC-pass data) were accurately predicted using the training model, including four positives and fifty-seven negatives. It was noted that two of the four positives having Z score smaller than 3 (2.44 and 2.52 respectively), however, the SVM model could correct these false negatives. This result indicated that false negatives would be induced if only using Z score = 3 as discrimination cutoff that was employed by previous studies [3,24]. Fortunately, SVM model trained by known dataset could uncover such false negatives. For chromosome 18, all 48 data in "grey zone" (1.0% of all QC-pass data) were accurately predicted using the trained SVM model, including 4 positives and 44 negatives. For chromosome 13, all 42 data in "grey zone" (0.9% of all QC-pass data) were accurately predicted as "negative". In summary, all of the data that could not be classified using Z-test (nearly 3% of all QCpass data), were precisely predicted using the corresponding trained SVM models. This result suggested that SVM model could save around 3% of resource in retests. In consideration of millions of NIPTs were taken each year, such a reduction of cost is meaningful in clinical practice.
Surprisingly, the SVM models also acted effectively in predicting the 766 QC-filtered data (S2 Table). For chromosome 21, the model precisely predicted all the QC-filtered data, including 4 positives and 762 negatives. For chromosome 18, 763 out of 766 data were correct (99.61%). One positive was wrongly predicted as "negative" with prediction probability 64%, while two negatives were incorrectly predicted as "positive" with predicted probabilities 88.6% and 97.6% respectively. For chromosome 13, 765 out of 766 data were correct (99.87%). Only one positive data that was regarded as "negative" with Z score = 2.79, was also incorrectly predicted as "negative" by the SVM model. This demonstrated that the SVM model could perform well in most of QC-filtered data but could not uncover all false negatives, suggesting that quality control is still necessary to guarantee the accuracy of NIPT.

Comparison with other ML models
Compared with other ML models, SVM models performed relatively better in the datasets of this study. SVM models obtained 100% accuracy in both internal validation and prediction on "Unclassified" dataset (Table 4) Table 3. Performance of SVM models on NIPT prediction using different parameter setting.

Chr21
Group "N" & "P" Group "Unclassified" Four types of SVM models were compared in both internal and external validation for each of chromosome 13/18/21. e w means employing class weight to adjust parameter C; opt means employing optimization for parameters C and gamma in cross validation. f Sens. is short for sensitivity; Spec. is short for specificity.
https://doi.org/10.1371/journal.pone.0207840.t003 three chromosomes. We also compared Adaboost that employed Dtree in model training with SVM models in prediction on training dataset (S2 Fig). Both SVM models using different kernels have similar high accuracy with Adaboost in chromosome 21 and 18, while SVM models performed slightly better in chromosome 13. This result may indicate that SVM models performed comparatively well in NIPT prediction under this dataset. However, other ML models also have the potentials in improving calling. Like Adaboost employing Dtree to create a high accurate prediction rule can enhance the accuracy. Similarly, neuronal network could also employ a class of model to create a high accurate prediction rule. Table 4. Performance of different discrimination models on NIPT prediction using ten selected features.

SVM-based NIPT
We took chromosome 21 as an example to visualize how the SVM models worked (Fig 4). Using two out of ten features (D1 from formula (1) and D3 from formula (3)), nearly all of four ML models illustrated good discrimination lines on the training dataset (Groups "N" and "P"), except that LDA has one false negative. A 3-D plot and its three 2-D plots were also given to show how SVM model works in discriminating negatives and positives (S3 Fig), using feature D1, D3, and D7 (fetal fraction). These results were for visualization of how the ML models discriminated the data, whereas all of ten features were used in model training.

Correcting the previous false callings caused by Z-test
We also employed the trained SVM models to predict eight cases that were wrongly predicted by Z-test before. As shown in Table 5 (2), (4) and (6)) also significantly lower than -3. Similarly, four false negatives also showed ambiguous values of features D1 to D6, suggesting that none of these six Z scores were reliable to do prediction independently. This further demonstrated that the SVM model was better than the commonly used one-Z-test based approach.
In summary, SVM model has shown its potential in discrimination of NIPT results in this study, especially compared with the current one-Z-test based method. The SVM models using RBF kernel achieved 100% accuracies in trisomy detection of 13/18/21 in both internal and external validation. With this improvement, it is expected to reduce the cost of retests on samples in grey zone as well as the cost caused by false positives and false negatives. As shown in Fig 3, we expect that the SVM model could be further improved if 1) more known data were validated and added up to model-training; 2) more impacted features were discovered and added up to model-training. Some other clinical signs such as the values from serological test could be employed together with NIPT data to do prediction. In future, we plan to validate and optimize our ML method for trisomy prediction in a larger dataset.

Discussion
Due to the inaccuracy of serological testing and the potential harm of amniocentesis, NIPT is suggested to be adopted in nowadays' prenatal screening with purpose to detect 13-/18-/21-trisomies. In clinical practice, positive result of NIPT would be suggest to take amniocentesis while negative result does not require. Therefore, a false negative NIPT report would result in wrong diagnosis and delay of treatment, while a false positive NIPT report would require the patient to take an unnecessary amniocentesis with a risk of abortion. Some may argue that the current one-Z-test based NIPT prediction approach is precise enough, however, let us do a simple calculation: assume that 1,000,000 women come take NIPT and 1% of them have trisomy fetuses, which means there are 990,000 negatives and 10,000 positives; based on the accuracy of one-Z-test approach provided by Chiu et al. [3] (Sensitivity 97.9% and Specificity 99.7%), there would be 210 false negatives and 2970 false positives; this means 210 women would be wrongly diagnosed and give birth to trisomy fetuses and 2970 women would take an unnecessary amniocentesis with 12 of them would miscarry the normal fetus when taking amniocentesis. In fact, the number of newborn in China was 4 million in 2015 [30] but it is believed to increase in future since the implementation of two-child policy. Therefore we are motivated to improve the accuracy of prediction on NIPT NGS data, which the one-Z-test based approach could not satisfy.
Recently, many groups have noted that the one-Z-test approach cannot satisfy the accurate prediction on NIPT NGS data. Bayindir et al. supplemented a Meta Z-test, which means the Z test of Z score, in discrimination [31]. Yu et al. improved the count-based analysis by supplementing another size-based approach [32]. Using more cutoff values is a good way to ensure the prediction accuracy of positives and negatives, however it would also increase the number of unclassified samples and hence demand more retests. Further, the fetal fraction was a key  [24]. BGI's NIFTY employed a logarithmic likelihood odds ratio between binary hypotheses that took fetal fraction in consideration [14], but it still relies on a single-dimension cutoff to predict the result. Other information like maternal age is also important in NIPT NGS data prediction [33]. In this study we showed that ML method is a good way to solve the problems above. Combining multiple Z-tests and other features, the trained SVM models achieve extremely high prediction accuracy and decrease the number of unclassified data. In fact the enhancement is instantaneous as there is few steps in parameter optimizing. Both linear and RBF kernels can achieve same high accuracy in prediction. This suggests that positives and negatives have significant differences in the distributions of selected features. We also noted that other ML methods could achieve similar improvement. Since the effectiveness of ML depends on the selected features and dataset, it is uncertain that SVM definitely performs best in NIPT NGS data analysis. However we have achieved our goal of improving the NIPT prediction accuracy to an extremely high level by using SVM models. We also tested some other ML methods using the same features in this study. For LDA and QDA, collinearity between features could be one of reasons of lower accuracy in prediction, while SVM allows collinearity between features. Besides SVM, Adaboost also had high accuracy in prediction. It is worth to keep testing these machine-learning algorithms if there are more features and more data in future, since our objective is to find the best approach for clinical use. Temporarily, SVM showed the most robustness according to this study. For the ten features selected for current SVM model training, the four non-Z-score features actually were not significantly biased in distributions between negatives and positives, though IONA's paper reported that maternal age was useful in correcting its NIPT results [33], which might be due to the differences in sample composition.
In conclusion, we developed an accurate SVM-based approach and showed its potential in trisomy prediction on chromosomes 13/18/21. Compared with the one-Z-test approach, it has advantages in prediction accuracy and effectiveness, resulting in lower rate of false result and lower cost of retest. Other MLs could also improve the prediction accuracy on NIPT NGS data, and SVM is suggested according to this study. Such a ML approach could also have potential in detection of aneuploidy of other chromosomes or even micro-duplication and deletion, which would be included in our program if sufficient diagnosed cases were available. For further validating and optimizing our ML methods, we are planning to gather a larger and more comprehensive dataset in future. We suggest that the ML methods would be employed in NIPT prediction instead of the one-Z-test based approach in clinical practice.

S1 Fig. Z value distributions of current one-Z-test based NIPT in simulation.
Each of the three normal distributions were simulated by bootstrapping 10,000 times for negative samples (green line), positive samples with fetal fraction 5% (cyan line) and positive samples with fetal fraction 10% (red line) respectively. Yellow dash line means Z score equal to 3. Dark dash lines show the interval of grey zone. When fetal DNA fraction is around 5% that is possible to happen in real, it became difficult to distinguish positives and negatives from samples in grey zone.  Table. Detailed information of performance test of SVM models on NIPT prediction using different parameter setting. Column of Group "N"&"P" is the result of internal validation; column of Group "Unclassified" is the result of external validation for the QC-pass data that could not be classified by one-Z-test method; column of Group "QC-filtered" is the result of external validation for the QC-filtered data. The rows of SVM models using RBF kernel with class weight and optimal parameter setting are bold. (XLSX) S3 Table. Detailed information of performance test of different ML models on NIPT prediction. Column of Group "N"&"P" is the result of internal validation; column of Group "Unclassified" is the result of external validation for the QC-pass data that could not be classified by one-Z-test method; column of Group "QC-filtered" is the result of external validation for the QC-filtered data. The rows of SVM models are bold. (XLSX) Validation: Jianfeng Yang.