Whole Genome Mapping with Feature Sets from High-Throughput Sequencing Data

doi:10.1371/journal.pone.0161583

Fig 1.

The workflow of the method consisting of four main steps.

1) Pick clones, combine them into pools, and sequence each pool’s DNA by NGS; 2) Resolve each clone’s F-sets; 3) Build a physical map from the clones’ FS-sets and split the physical contigs into bins according to the clones’ overlaps and K-sets; 4) Assemble the pools’ NGS reads into sequence contigs, allocate the sequence contigs to the physical map, and connect the allocated sequences to form longer sequence scaffolds.
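
Step 2 can be illustrated with a minimal sketch. It assumes that a clone's candidate feature set is the intersection of the feature sets of all pools containing that clone (one pool per pooling dimension). The function name `intersect_pool_features` and the dictionary layout are hypothetical and chosen only for illustration; the authors' actual procedure may include further filtering.

```python
def intersect_pool_features(clone_pools, pool_features):
    """Return, for each clone, the features shared by every pool that contains it.

    clone_pools:   dict mapping clone name -> list of pool names (assumed layout)
    pool_features: dict mapping pool name  -> set of features seen in that pool
    """
    result = {}
    for clone, pools in clone_pools.items():
        sets = [pool_features[p] for p in pools]
        # A feature is kept only if it was observed in all of the clone's pools.
        result[clone] = set.intersection(*sets) if sets else set()
    return result


# Toy 2D pooling example: each clone sits in one row pool and one column pool.
clone_pools = {"c1": ["row1", "col1"], "c2": ["row1", "col2"]}
pool_features = {
    "row1": {"a", "b", "c", "d"},
    "col1": {"a", "b", "x"},
    "col2": {"c", "d", "x"},
}
clone_sets = intersect_pool_features(clone_pools, pool_features)
```

In this toy example the feature `x` is dropped from both clones because it never appears in their shared row pool, which is how adding pooling dimensions helps separate clones.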

Fig 2.

Detection rates of F-sets.

The horizontal axis represents sequencing depth, and the vertical axis represents detection rate. “FS” means feature sequence, “KMER” means k-mer, and “DIM” means the pools’ dimensions. (A) and (B) were simulated using a random pooling strategy. (C) and (D) were simulated using a solid pooling strategy. (A) and (C) were simulated with sequencing errors. (B) and (D) were simulated without sequencing errors. Curves with squares show the trend of the detection rate of FS. Curves with triangles show the trend of the detection rate of k-mers. Red curves represent pools. Green, blue and purple curves represent clones in 3D, 6D and 9D pooling, respectively.

Fig 3.

Detection and correct rates of the intersected and final F-set.

The horizontal axis represents pool coverage, and the vertical axis represents detection rate or correct rate. (A) and (B) were simulated for FS. (C) and (D) were simulated for k-mers. (A) and (C) show statistics from the clones’ intersected F-sets. (B) and (D) show statistics from the clones’ final F-sets. Curves with squares show the trend of the detection rate. Curves with triangles show the trend of the correct rate. Green, blue and purple curves represent clones in 3D, 6D and 9D pooling, respectively.
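
The caption does not define the two rates, so the following sketch rests on an assumption: the detection rate is taken as the fraction of a clone's true features that were recovered, and the correct rate as the fraction of reported features that are true. The function name `detection_and_correct_rate` is hypothetical.

```python
def detection_and_correct_rate(true_set, called_set):
    """Return (detection rate, correct rate) for one clone.

    Assumed definitions (not stated in the caption):
      detection rate = |true AND called| / |true|
      correct rate   = |true AND called| / |called|
    """
    hits = len(true_set & called_set)
    detection = hits / len(true_set) if true_set else 0.0
    correct = hits / len(called_set) if called_set else 0.0
    return detection, correct


# Toy example: four true features, three reported, two of them right.
detection, correct = detection_and_correct_rate({"a", "b", "c", "d"}, {"a", "b", "x"})
```

Under these assumed definitions, a stricter filter applied after the intersection would be expected to raise the correct rate, possibly at the cost of the detection rate.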

Fig 4.

Clone usage at different levels of pool coverage.

The horizontal axis represents pool coverage, and the vertical axis represents clone usage.

Fig 5.

Splitting a contig into bins according to the clones’ order in the physical contig.

The k-mer set of each bin was derived from the intersections or differences among overlapping clones. Assembled sequences were allocated to the best bins. Paired-end sequences were used to connect assembled sequences located in the same or nearby bins to form larger sequences. The orientation of connected sequences was determined by the sequence loci and the directions of the paired-end alignments. The red blocks labeled with the prefix “seq” indicate assembled sequences from the long-read assembler. The prefix “PE” labels the paired ends. The symbol “+” or “-” indicates the direction of a paired-end alignment.
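
The binning idea can be sketched as follows. This is an illustrative simplification, not the authors' implementation: it assumes that only adjacent clones in the contig order overlap, and it takes the "best" bin to be the one sharing the most k-mers with the sequence. The names `kmers`, `split_into_bins`, and `best_bin` are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def split_into_bins(ordered_clones):
    """Split a contig into bins from its ordered clones.

    ordered_clones: list of (clone_name, kmer_set) in contig order.
    Assumption: only adjacent clones overlap.
    Each clone yields a "unique" bin (set difference with its neighbours),
    and each adjacent pair yields an "overlap" bin (set intersection).
    """
    bins = []
    n = len(ordered_clones)
    for i, (name, ks) in enumerate(ordered_clones):
        unique = set(ks)
        if i > 0:
            unique -= ordered_clones[i - 1][1]
        if i + 1 < n:
            unique -= ordered_clones[i + 1][1]
        bins.append((name, unique))
        if i + 1 < n:
            next_name, next_ks = ordered_clones[i + 1]
            bins.append((f"{name}&{next_name}", ks & next_ks))
    return bins


def best_bin(seq_kmers, bins):
    """Return the label of the bin sharing the most k-mers, or None if none match."""
    if not bins:
        return None
    label, shared = max(
        ((label, len(seq_kmers & ks)) for label, ks in bins),
        key=lambda item: item[1],
    )
    return label if shared > 0 else None


# Toy contig of three overlapping clones with symbolic k-mers.
clones = [
    ("A", {"k1", "k2", "k3", "k4"}),
    ("B", {"k3", "k4", "k5", "k6"}),
    ("C", {"k5", "k6", "k7", "k8"}),
]
bins = split_into_bins(clones)
placed = best_bin({"k3", "k4", "k5"}, bins)
```

Here the query shares two k-mers with the A/B overlap bin and one with the B/C overlap bin, so it is placed in the former. A real allocation would also need to handle ties and repeated k-mers, which this sketch ignores.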

Fig 6.

Comparison between the original and simulated genome sequences.

“TAIR_Test” represents the simulated sequence and “TAIR10” represents the original genome sequence. “M” means mitochondria and “C” means chloroplast. (A) Circle view of the alignments. The upper semi-circle displays the chromosomes of “TAIR_Test” and the lower semi-circle displays the chromosomes of “TAIR10”. (B) Dot-plot view of all chromosomes. (C) Full view of the alignments of chromosome 1. (D) Detailed view of a segment of the chromosome 1 alignments.