Fig 1.
FACS profiles of cells derived from D. japonica and S. mediterranea.
Fluorescence-activated cell sorting (FACS) profiles of cells derived from the whole body of the D. japonica SSP strain (Dj-SSP) and the S. mediterranea CIW4 strain (Sm-CIW4). The numbers of cells analyzed were 9,104 and 11,588, respectively. Each dot indicates the relative fluorescence intensity of Calcein AM and Hoechst 33342, and red indicates a relatively high density of cells. Calcein AM labels the cytosol of viable cells, and its intensity is plotted on a logarithmic scale on the X-axis. Hoechst 33342 labels chromosomes and is used to estimate genome size; its intensity is plotted on a linear scale on the Y-axis.
Fig 2.
k-mer spectra of the D. japonica and control genomes.
(A) Control results for S. venezuelensis, which is known to be highly heterozygous (0.927%); the spectrum shows a bimodal peak consistent with this genome characteristic. (B) Results for D. japonica; neither a unimodal nor a bimodal peak was found.
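As a rough illustration of what a k-mer spectrum is, the sketch below counts every k-mer in a set of reads and then tabulates how many distinct k-mers occur at each multiplicity. It is a toy example, not the tool used for this figure: the function name `kmer_spectrum` is hypothetical, and for simplicity it does not merge a k-mer with its reverse complement, as dedicated k-mer counters usually do.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Return a Counter mapping multiplicity -> number of distinct k-mers.

    Hypothetical helper for illustration only; reverse complements are
    NOT collapsed, and reads shorter than k contribute nothing.
    """
    kmer_counts = Counter()
    for read in reads:
        # Slide a window of width k across the read.
        for i in range(len(read) - k + 1):
            kmer_counts[read[i:i + k]] += 1
    # Histogram of the counts: how many k-mers were seen once, twice, ...
    return Counter(kmer_counts.values())


spectrum = kmer_spectrum(["ACGTACGT"], k=4)
```

Plotted with multiplicity on the X-axis, a heterozygous genome typically produces two peaks, because k-mers spanning heterozygous sites are seen at about half the coverage of k-mers shared by both haplotypes.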
Table 1.
Information about transcriptome sequences.
Table 2.
Statistics of transcriptome assembly.
Fig 3.
Schematic overview of the Reference Gene Model.
Alternative splicing produces multiple mRNA isoforms (transcript variants) from a single gene locus. Because the transcriptome sequences are derived from each isoform, reads from a shared exon map to multiple variants in the reference during mapping. To solve this multi-mapping problem, the assembled contigs are combined according to the contig graph that the Newbler program constructs during de novo assembly. The combined contigs, called the Reference Gene Model, represent a virtual genome sequence without introns. Mapping the MiSeq RNA-seq reads to this model makes it possible to count reads, detect single nucleotide polymorphisms (SNPs), and analyze mutations quantitatively on a per-gene basis. Sequences derived from variants that lack specific exons are mapped to the model using a local alignment algorithm to bridge the exon gaps.
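The benefit of counting in units of a gene rather than an isoform can be sketched with a toy example. This is not the Newbler contig-graph procedure itself: the function `count_reads_per_gene`, the read identifiers, and the isoform-to-gene table are all hypothetical. The sketch only shows that a read hitting several isoforms of one gene becomes unambiguous once the isoforms are collapsed into a single gene unit.

```python
from collections import Counter


def count_reads_per_gene(read_hits, isoform_to_gene):
    """Count each read once for the gene it maps to.

    read_hits: dict of read id -> list of isoform ids the read aligned to.
    isoform_to_gene: dict of isoform id -> gene id (assumed to be known).
    Reads whose hits span more than one gene stay ambiguous and are skipped.
    """
    counts = Counter()
    for read_id, isoforms in read_hits.items():
        genes = {isoform_to_gene[iso] for iso in isoforms}
        if len(genes) == 1:
            counts[next(iter(genes))] += 1
    return counts


# Hypothetical data: gene g1 has two isoforms that share an exon.
isoform_to_gene = {"g1.a": "g1", "g1.b": "g1", "g2.a": "g2"}
read_hits = {
    "r1": ["g1.a", "g1.b"],  # shared exon: multi-mapped at the isoform level
    "r2": ["g1.a"],
    "r3": ["g2.a"],
    "r4": ["g1.b", "g2.a"],  # hits two genes: still ambiguous
}
gene_counts = count_reads_per_gene(read_hits, isoform_to_gene)
```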
Fig 4.
(A) Histogram of the heterozygosity rate for SNPs. Typically, SNPs with a variant ratio close to 1.0 are considered homozygous, and SNPs whose variant ratios form a narrow normal distribution peaking at 0.5 are considered heterozygous. (B) Histogram of the number of SNPs per gene per 1,000 bp.
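The two quantities plotted in this figure can be sketched as follows. This is a minimal illustration rather than the analysis pipeline behind the figure: the function names are hypothetical, and the 0.9 cutoff for homozygous calls and the 0.3 to 0.7 window for heterozygous calls are assumptions chosen only for demonstration.

```python
def variant_ratio(variant_reads, total_reads):
    """Fraction of reads at a site that carry the variant base."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return variant_reads / total_reads


def classify_snp(ratio, hom_cutoff=0.9, het_window=(0.3, 0.7)):
    """Label a SNP by its variant ratio (thresholds are assumed values)."""
    if ratio >= hom_cutoff:
        return "homozygous"
    if het_window[0] <= ratio <= het_window[1]:
        return "heterozygous"
    return "ambiguous"


def snps_per_kb(snp_count, gene_length_bp):
    """Normalize a gene's SNP count to SNPs per 1,000 bp."""
    if gene_length_bp <= 0:
        raise ValueError("gene_length_bp must be positive")
    return snp_count * 1000 / gene_length_bp
```

For example, a site covered by 100 reads of which 50 carry the variant has a ratio of 0.5 and is labeled heterozygous, and a 1,500 bp gene with 30 SNPs has a density of 20 SNPs per 1,000 bp.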
Table 3.
Summary of SNP analysis.
Fig 5.
Pattern of SNP frequency and commonality among sequencing platforms.
Alignments of MiSeq, 454, Sanger and genome reads against the Reference Gene Model. The colored vertical bars indicate the site of a SNP, and the Y-axis indicates read depth. (A) shows genes with many SNPs, and the top-right image is an enlarged view of a SNP-rich region. SNPs common between MiSeq and 454 are detected for both types of genes. (B) shows genes with no SNPs. Regarding EST sequencing reads produced by the Sanger method, some SNPs are common between MiSeq and 454 (C) but are not detected by Sanger sequencing (D).
Table 4.
Analysis of synonymous and non-synonymous SNPs of ORF sequences.
Fig 6.
KOG-annotation-based classification of genes that have extremely large numbers of mutations.
Genes without any SNPs and genes with 20 or more SNPs per 1,000 bp were classified by KOG function. The heat plot shows the log2 ratio of genes with ≥20 SNPs to genes with no SNPs, with red indicating a high proportion of genes that had an extremely large number of mutations and green indicating that the majority of genes had no SNPs. * indicates a fraction that contained only one of these two types of genes. The right column shows data on conserved proteins and identical proteins from a comparison with a different planarian species (S. mediterranea). These definitions are derived from the identical-match ratio calculated over the amino-acid region conserved between homologous proteins.
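The value shown in each heat-plot cell can be sketched as below. This is an illustration under stated assumptions: the function name `log2_snp_ratio` is hypothetical, and it takes raw gene counts per KOG category, whereas the figure may instead use proportions. Categories where either count is zero have no defined ratio, which corresponds to the cells marked with *.

```python
import math


def log2_snp_ratio(n_high_snp_genes, n_no_snp_genes):
    """log2 of (genes with >=20 SNPs per 1,000 bp) / (genes with no SNPs).

    Returns None when either count is zero, because the ratio is then
    undefined (the '*' cells in the heat plot).
    """
    if n_high_snp_genes == 0 or n_no_snp_genes == 0:
        return None
    return math.log2(n_high_snp_genes / n_no_snp_genes)
```

A positive value (red) means SNP-rich genes outnumber SNP-free genes in that category, and a negative value (green) means the opposite.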