Figure 1.
Barcodes of five representative human viruses: (a) HIV, (b) Enterovirus, (c) Rabies virus, (d) SARS Coronavirus, and (e) Hepatitis B virus.
For each barcode, the x-axis lists all unique 4-mers in alphabetical order, the y-axis is the concatenated genomes of the same kind of virus, contracted 2,000-fold, and the gray level shows the frequency of each 4-mer within the 2,000 bp window at the corresponding location.
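The barcode layout described in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes non-overlapping 2,000 bp windows, drops a trailing partial window, skips 4-mers containing ambiguous bases, and merges each 4-mer with its reverse complement (as in the Figure 2 caption), which gives 136 columns. The names `canonical`, `CANONICAL_KMERS`, and `barcode` are hypothetical.

```python
from itertools import product

K = 4          # k-mer length used in the barcodes
WINDOW = 2000  # window size in bp (assumed non-overlapping)
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the alphabetically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


# Barcode columns: unique 4-mers after merging reverse complements, sorted.
CANONICAL_KMERS = sorted({canonical("".join(p)) for p in product("ACGT", repeat=K)})
INDEX = {kmer: i for i, kmer in enumerate(CANONICAL_KMERS)}


def barcode(sequence, window=WINDOW):
    """Return one row of 4-mer frequencies per full window of the sequence."""
    sequence = sequence.upper()
    rows = []
    for start in range(0, len(sequence) - window + 1, window):
        chunk = sequence[start:start + window]
        counts = [0] * len(CANONICAL_KMERS)
        for i in range(len(chunk) - K + 1):
            idx = INDEX.get(canonical(chunk[i:i + K]))
            if idx is not None:  # skip k-mers with ambiguous bases such as N
                counts[idx] += 1
        total = sum(counts)
        rows.append([c / total if total else 0.0 for c in counts])
    return rows
```

Each returned row corresponds to one horizontal line of the barcode image, so rendering the rows as gray levels reproduces the layout described in the caption.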
Figure 2.
Identification and typing of HEVs.
The x-axis of each plot is the distance between the feature vector of each virus's barcode and the average feature vector of all the viruses used (in A, the average over four kinds of virus: HIV, HEV, SARS coronavirus, and rabies virus; in B, the average over all subtypes of HEV). The y-axis is the distance between the feature vector of each virus's barcode and a normalization vector whose every dimension equals 1/136, where 136 is the total number of unique 4-mers when each k-mer is paired with its reverse complement [32]. (A): the red dots represent HEVs (395 genomes), the blue ones HIV (279 genomes), the magenta ones SARS coronavirus (101 genomes), and the cyan ones rabies virus (63 genomes). (B): the blue dots represent poliovirus (78 genomes), the green ones echovirus (52 genomes), the red ones the new enterovirus strains 68-71 (72 genomes), and the magenta ones coxsackievirus groups A and B (85 genomes).
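The two plot coordinates described in this caption can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code. The caption does not name the distance measure, so Euclidean distance is assumed here, and each feature vector is taken to be a list of 136 frequencies. The names `euclidean`, `mean_vector`, and `plot_coordinates` are hypothetical.

```python
import math

# 256 possible 4-mers, 16 of which equal their own reverse complement:
# (256 - 16) / 2 + 16 = 136 unique 4-mers after pairing.
N_KMERS = 136


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_vector(vectors):
    """Component-wise average of a list of feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def plot_coordinates(vectors):
    """Return (x, y) per genome: distance to the average vector and
    distance to the normalization vector with 1/136 in every dimension."""
    average = mean_vector(vectors)
    uniform = [1.0 / N_KMERS] * N_KMERS
    return [(euclidean(v, average), euclidean(v, uniform)) for v in vectors]
```

A genome whose 4-mer usage is perfectly even would sit at y = 0, so the y-coordinate measures how far a barcode departs from uniform k-mer usage, while the x-coordinate measures how far it departs from the group average.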
Figure 3.
Phylogenetic trees for the HEVs based on a specific HEV gene (A) versus the HEV barcodes (B).
The edge lengths in the trees reflect the genetic distance calculated according to the Kimura-parameter model. The reliability of the VP1-based tree was estimated using 1,000 bootstrap replications. The names beside the trees denote the serotypes of the corresponding HEV species.
Table 1.
Comparison of the single-gene-based and whole-genome barcode-based phylogenetic trees (the numbers in parentheses are the numbers of virus types for the corresponding serotype).
Table 2.
Information on the complete genome sequences of the five classes of viruses.