Comparative analysis of novel MGISEQ-2000 sequencing platform vs Illumina HiSeq 2500 for whole-genome sequencing

doi:10.1371/journal.pone.0230301

Table 1.

Summary of the dataset.

Fig 1.

Post-filtering data quality control.

(A), (B) Distribution of per-base quality scores across read positions for the MGISEQ-2000 (A) and HiSeq 2500 (B) platforms, shown for forward (R1) and reverse (R2) reads. For each position in the reads, the quality scores of all reads were used to calculate the mean, median, and quantile values shown in the box plots. (C) Overall quality score distribution for MGISEQ-2000 and HiSeq 2500 data. (D) Distribution of GC content in the data generated by MGISEQ-2000 and HiSeq 2500. FastQC [15] was used for the analysis.
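
The per-position summary behind such box plots can be illustrated with a minimal sketch. It is not FastQC's implementation; the function name `per_position_quality`, the Phred+33 encoding, and the "inclusive" quantile method are assumptions made for illustration.

```python
from statistics import mean, median, quantiles


def per_position_quality(quality_strings, offset=33):
    """Summarise Phred scores at each read position.

    quality_strings: FASTQ quality lines (Phred+33 encoding is assumed).
    Returns one dict per position with the mean, median, and quartiles.
    """
    # Collect the scores observed at each position; reads may differ in length.
    columns = []
    for qual in quality_strings:
        for pos, char in enumerate(qual):
            if pos == len(columns):
                columns.append([])
            columns[pos].append(ord(char) - offset)

    summary = []
    for scores in columns:
        if len(scores) > 1:
            q1, _, q3 = quantiles(scores, n=4, method="inclusive")
        else:
            # A single observation has no spread.
            q1 = q3 = scores[0]
        summary.append(
            {"mean": mean(scores), "median": median(scores), "q1": q1, "q3": q3}
        )
    return summary


# Three toy reads of unequal length ('I' = Q40, '?' = Q30, '5' = Q20).
stats = per_position_quality(["II5", "I?5", "I5"])
```

Each entry of `stats` holds the values needed to draw one box of the plot for that read position.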

Fig 2.

Analysis of the coverage distribution for MGISEQ-2000 and HiSeq 2500 with the use of the E704 sample.

(A) Fraction of the genome covered exactly the corresponding number of times. (B) Fraction of the genome covered at least the corresponding number of times. The analysis was performed using the R [17] and BEDtools [18] software packages.
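
The two quantities plotted in panels (A) and (B) can be derived from a depth histogram, as in the following sketch. The authors used R and BEDtools; this Python version is only an illustration, and the function name `coverage_fractions` and the input format (a mapping from depth to number of bases at that depth) are assumptions.

```python
def coverage_fractions(depth_histogram):
    """Turn a depth histogram into exact and cumulative coverage fractions.

    depth_histogram: dict mapping depth -> number of genome bases at that depth
    (a hypothetical input format chosen for this example).
    Returns (exact, at_least), both dicts keyed by depth.
    """
    total = sum(depth_histogram.values())
    if total == 0:
        raise ValueError("the histogram contains no bases")

    exact = {}
    at_least = {}
    remaining = total  # bases with coverage >= the current depth
    for depth in sorted(depth_histogram):
        bases = depth_histogram[depth]
        exact[depth] = bases / total         # panel (A): covered exactly `depth` times
        at_least[depth] = remaining / total  # panel (B): covered at least `depth` times
        remaining -= bases
    return exact, at_least


# Toy genome of 100 bases: 10 uncovered, 30 covered once, 60 covered twice.
exact, at_least = coverage_fractions({0: 10, 1: 30, 2: 60})
```

In this toy case 60% of the bases are covered exactly twice and 90% are covered at least once.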

Fig 3.

The results of the QC analysis of read alignment to the reference genome.

(A) The distribution of insert lengths for read pairs of the E704-I library (blue line) and the E704-M library (red line). (B) The number of random errors for HiSeq 2500 (blue line) and MGISEQ-2000 (red line). Reads were aligned with BWA-MEM [19]. QC analysis was performed using bamstats [20, 21].

Table 2.

Mapping statistics for the datasets.

Fig 4.

The total number of “errors” (the sum of “FP” and “FN”) in the detection of SNPs (“total SNP error”) and indels (“total indel error”), obtained by comparing the genomic variants of E704-M (A) and E704-I (B). Four software packages were used for variant calling: SAMtools, Strelka2, Sentieon, and GATK. Baseline data is shown in the S2 File.
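
The error metric used in this figure (false positives plus false negatives, counted separately for SNPs and indels) can be sketched as a set comparison. This is not the authors' pipeline: the function name `total_errors`, the variant representation as `(chrom, pos, ref, alt)` tuples, and the rule that a SNP has single-base `ref` and `alt` alleles are assumptions for illustration.

```python
def total_errors(called, baseline):
    """Count FP + FN for SNPs and indels by exact-match set comparison.

    called, baseline: sets of (chrom, pos, ref, alt) tuples
    (a hypothetical representation chosen for this example).
    """
    def is_snp(variant):
        _, _, ref, alt = variant
        return len(ref) == 1 and len(alt) == 1

    result = {}
    for label, keep in (("snp", is_snp), ("indel", lambda v: not is_snp(v))):
        called_subset = {v for v in called if keep(v)}
        baseline_subset = {v for v in baseline if keep(v)}
        fp = len(called_subset - baseline_subset)  # called but absent from baseline
        fn = len(baseline_subset - called_subset)  # in baseline but not called
        result[label] = {"FP": fp, "FN": fn, "total": fp + fn}
    return result


baseline = {("chr1", 100, "A", "G"), ("chr1", 200, "AT", "A"), ("chr2", 50, "C", "T")}
called = {("chr1", 100, "A", "G"), ("chr1", 300, "G", "C"), ("chr2", 60, "G", "GA")}
errors = total_errors(called, baseline)
```

Exact tuple matching is the simplest possible comparison; dedicated benchmarking tools also normalise variant representation before matching, which this sketch does not attempt.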

Table 3.

Variant calling statistics for the datasets^a.

Table 4.

Variant calling for E704-M versus E704-I.
