Genome Profiling (GP) Method Based Classification of Insects: Congruence with That of Classical Phenotype-Based One

Background
Ribosomal RNAs have been widely used for the identification and classification of species, and have produced data giving new insights into phylogenetic relationships. Recently, multilocus genotyping and even whole-genome-sequencing-based technologies have been adopted in ambitious comparative biology studies. However, such technologies are still far from routine use in species classification studies because of their high costs in labor, equipment and consumables.

Methodology/Principal Findings
Here, we describe a simple and powerful approach for species classification called genome profiling (GP). The GP method, composed of random PCR, temperature gradient gel electrophoresis (TGGE) and computer-aided gel image processing, is highly informative and less laborious. For demonstration, we classified 26 species of insects using the GP and 18S rDNA-sequencing approaches. As measured by a congruence value, the GP method gave a better correspondence to the classical phenotype-based approach than did 18S rDNA sequencing. To our surprise, use of a single probe in GP was sufficient to identify the relationships between the insect species, making this approach more straightforward.

Conclusion/Significance
The data gathered here, together with those of previous studies, show that GP is a simple and powerful method that can be applied universally to identifying and classifying species. The current success supports our previous proposal that a GP-based web database can be constructed and would be effective for the global identification/classification of species.

Criterion 2. If more than 50% of the elements of a cluster of the subject tree A correspond to elements of a single cluster belonging to the object tree B, the same yet discriminative name is assigned: Cx for the cluster of the subject tree and Cx' for the cluster belonging to tree B.

Criterion 3. No name is given if 50% or fewer of the elements of a cluster of the subject tree find corresponding elements in any single cluster of the object tree.
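The naming criteria can be sketched as a small set comparison. This is only an illustrative reading: the function names `overlap_fraction` and `classify_cluster` are hypothetical, clusters are represented as plain sets of element labels, and criterion 1 is taken to mean "identical element sets", which is inferred from Example 1 rather than from a stated definition.

```python
from typing import Iterable, Optional, Set, Tuple


def overlap_fraction(subject: Set[str], obj: Set[str]) -> float:
    """Fraction of the subject cluster's elements that also occur in obj."""
    if not subject:
        return 0.0
    return len(subject & obj) / len(subject)


def classify_cluster(
    subject: Set[str], object_clusters: Iterable[Set[str]]
) -> Tuple[int, Optional[int]]:
    """Return (criterion, index of the matching object cluster).

    criterion is 1 for an identical cluster (assumed meaning of criterion 1),
    2 for an overlap of more than 50%, and 3 (index None) otherwise.
    """
    best_idx, best_frac = None, 0.0
    for idx, obj in enumerate(object_clusters):
        if subject == obj:
            return 1, idx
        frac = overlap_fraction(subject, obj)
        if frac > best_frac:
            best_idx, best_frac = idx, frac
    if best_frac > 0.5:
        return 2, best_idx
    return 3, None
```

Under this sketch, a subject cluster of three elements that shares two of them with an object cluster falls under criterion 2, whereas a share of exactly one half falls under criterion 3, since the threshold is strict.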
How to calculate the cluster matching score (CMS):

For a Cx cluster (criterion 1): the CMS is the number of elements at level ℓ-1 contained in the Cx cluster, which is itself at level ℓ, i.e.,

$$\mathrm{CMS}_{C_x} = \text{number of elements in the } C_x \text{ cluster} \qquad (1)$$

If the cluster of interest contains elements at levels lower than ℓ-1, those elements are scored with a reduction rate defined in terms of ℓ, ℓ' and their difference, which represent the level of the cluster of interest (the one whose CMS is calculated), the level of the lower-hierarchy cluster or element, and the level difference, respectively. In that case, the CMS is expressed as

$$\mathrm{CMS}_{C_x} = \sum_{\text{elements in } C_x} (\text{element} \times \text{reduction rate}) \qquad (2)$$

For a Cx' cluster (criterion 2): when a sub-cluster is involved, the following consideration is made in calculating the CMS. An m-membered sub-cluster within a cluster of n elements (n ≥ m) can appear in S states (only single, same-level elements are considered):

$$S = {}_{n}C_{m} = \frac{n!}{m!\,(n-m)!}$$

One possible way to account for this degeneracy in the CMS of the Cx' cluster is to divide by S. When a higher level cluster is assigned as Cx' and is composed of single and/or cluster elements positioned at level ℓ-1 and/or lower than level ℓ-1, the CMS of the Cx' cluster is expressed as

$$\mathrm{CMS}_{C_x'} = \sum_{\text{elements in } C_x'} (\text{element} \times \text{reduction rate}) \qquad (4)$$

Definition of branch length and number:

If a cluster positioned at level ℓ contains elements with a branch length greater than 1 (one unit, ℓ-(ℓ-1) = 1, is the branch length corresponding to a difference of one level), then the length of that particular branch (length > 1, marked in red in Panel 1c) is calculated from the level difference of the branch. Branches with a level difference greater than 4 were ignored in this study for convenience.
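A minimal sketch of the two countable pieces of the CMS follows. The degeneracy S is the ordinary binomial coefficient. The explicit form of the reduction rate is not given here, so it is passed in as a caller-supplied function `reduction` (a hypothetical placeholder); taking the level difference as the cluster level minus the element level is likewise an assumption, and the function names are illustrative only.

```python
from math import comb
from typing import Callable, Iterable


def degeneracy(n: int, m: int) -> int:
    """Number of states S = nCm for an m-membered sub-cluster among n elements."""
    if not 0 <= m <= n:
        raise ValueError("requires 0 <= m <= n")
    return comb(n, m)


def cms(
    cluster_level: int,
    element_levels: Iterable[int],
    reduction: Callable[[int], float],
) -> float:
    """Sum element scores for a cluster positioned at cluster_level.

    Elements one level below the cluster score 1 each (Eq. 1); deeper
    elements are weighted by reduction(level difference) (Eq. 2).
    The reduction function is an assumption supplied by the caller.
    """
    total = 0.0
    for level in element_levels:
        diff = cluster_level - level
        if diff < 1:
            raise ValueError("elements must lie below the cluster")
        total += 1.0 if diff == 1 else reduction(diff)
    return total
```

For instance, `degeneracy(3, 2)` counts the three ways a two-membered sub-cluster can be drawn from three same-level elements, and `cms(2, [1, 1, 1], reduction)` reduces to the plain element count of Eq. 1 because no element lies deeper than one level.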

Definition of congruence value (Vc):
The congruence value Vc is defined so that 0 ≤ Vc ≤ 1. Vc' denotes the Vc obtained after coarse-graining the more finely structured tree of a pair. Coarse-graining can be done by bunching clusters at different levels under a bunching criterion, such as compressing height differences of less than 15%.
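One possible reading of the bunching criterion is sketched below. It assumes that each cluster has a node height, that the 15% is measured relative to the total tree height, and that a height joins the current group when its gap to the previous height in that group is below the threshold. None of these details is specified in the text, and `bunch_levels` is a hypothetical name.

```python
from typing import List, Sequence


def bunch_levels(heights: Sequence[float], threshold: float = 0.15) -> List[List[float]]:
    """Group sorted node heights whose gap to the previous height is
    smaller than threshold * (largest height). Assumed interpretation."""
    if not heights:
        return []
    ordered = sorted(heights)
    tree_height = ordered[-1]
    groups = [[ordered[0]]]
    for h in ordered[1:]:
        # Relative gap to the last height already placed in the current group.
        if tree_height > 0 and (h - groups[-1][-1]) / tree_height < threshold:
            groups[-1].append(h)
        else:
            groups.append([h])
    return groups
```

Each resulting group would then be treated as a single level when the coarse-grained tree is compared with its partner.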

Case studies
3.1 Example 1 (pair of Trees A and B):
Case π of Tree A as subject one; In Panel 1a, all of the possible clusters to be named are indicated with a star symbol (*). First, the most basic level clusters of the subject tree A (i.e., P1 and P2) are subjected to the cluster matching examination against the clusters of the object tree B. Since the possible clusters P1 and P4 contain exactly the same elements (e1 and e2), they are assigned as congruent clusters and named C1 (Panel 1b). In contrast, P2 in the subject tree A cannot find a corresponding cluster in the object tree B and thus cannot be named (denoted as X in Panel 1b). The trees with the clusters named so far are shown in Panel 1b. Next, the level 2 cluster is dealt with, i.e., P3, which has the elements e3, e4 and C1 (a cluster-element). When the possible cluster P3 is compared with the clusters in the object tree B, P5 is the only possible candidate for the congruent cluster. Since the P5 cluster has the same elements, i.e., e3, e4 and C1, both P3 and P5 can be given the same name, C2, as shown in Panel 1c.
Case (-π) of Tree B as subject one: In the second session, the roles of subject and object are reversed for Trees A and B, making Tree A the object and Tree B the subject, as in Panel 1d. Now there are two (q1 and q2) and three (q3 to q5) possible clusters in the subject and object trees, respectively. The most basic level cluster in the subject tree B is q1 only. The cluster q1 is examined to find its corresponding cluster in the object tree A, which yields q3, and thus both clusters are assigned as C1.
Next, the higher level cluster q2 (here, the top level cluster) is subjected to the same examination using Panel 1e. The cluster q2 finds the matching cluster q5, which contains the same elements, e3, e4 and C1, as q2 does. Therefore, q2 and q5 can be given the same name, C2, as shown in Panel 1f. Let us calculate Vc for all of the possible pairs.

CMS and the number of branches for the pair of Trees A and B:
For Case π (Panel 1c)

Example 2 (pair of Tree C and Tree D):
Case π of Tree C as subject one; In Panel 2a, there are three (P1 to P3) and two (P4 and P5) possible corresponding clusters in the subject and object trees, respectively.
In the first step, the most basic level clusters of the subject tree C (i.e., P1 and P2) are subjected to the cluster matching examination against the clusters of the object tree D. P1 of the subject tree contains two elements, e1 and e3, and P4 of the object tree also contains e1 and e3, along with e2, at the same cluster level. Therefore, P1 and P4 can be assigned as C1 and C1', respectively, based on criterion 2 (Panel 2b). In contrast, P2 in the subject tree C cannot find a corresponding cluster in the object tree D and thus cannot be named (denoted as X (extra naming) in Panel 2b).
The second step deals with the higher level cluster of the subject tree C, i.e., P3, which has the elements e2, e4 and C1 (a cluster-element). When the cluster P3 is compared with the clusters in the object tree D, P5 is the only possible candidate for the congruent cluster. The P5 cluster is composed of the elements e2, e4 and C1', and thus, based on criterion 2, P3 and P5 can be named C2 and C2', respectively, as shown in Panel 2c.

Case (-π) of Tree D as subject one;
In the second session, the roles of subject and object are reversed, making tree C the object and tree D the subject, as in Panel 2d. Now there are two (q1 and q2) and three (q3 to q5) possible clusters in the subject and object trees, respectively. The most basic level cluster in the subject tree D is q1 only, containing three elements, e1, e2 and e3. The cluster q1 is examined to find its corresponding cluster in the object tree C and finds q3 (containing two elements, e1 and e3), which accounts for more than 50% of the elements of q1 (two of three). Therefore, based on criterion 2, q1 and q3 can be assigned as C1 and C1', respectively (Panel 2e). Next, the higher level cluster q2 is subjected to the same examination using Panel 2e. The cluster q2 (having the elements C1 and e4) can be matched with the cluster q5, which contains the elements e2, e4 and C1'. In this case, e4 is an element common to both trees, and C1' of the object tree represents more than 50% of the elements of the C1 cluster. Therefore, as a whole, the q5 cluster has more than 50% of the elements of the q2 cluster of the subject tree, and thus q2 and q5 can be assigned as C2 and C2', respectively (Panel 2f).
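The 50% comparisons used in Example 2 can be written out explicitly. The set notation, and the convention of counting C1 as matched by C1', are a reading of the text above rather than additional definitions.

```latex
% Case \pi: P_1 (subject tree C) against P_4 (object tree D)
\[
\frac{|P_1 \cap P_4|}{|P_1|}
  = \frac{|\{e_1, e_3\}|}{|\{e_1, e_3\}|} = 1 > 0.5,
\qquad P_1 \neq P_4 \;\Rightarrow\; \text{criterion 2 } (C_1,\ C_1')
\]

% Case -\pi: q_1 (subject tree D) against q_3 (object tree C)
\[
\frac{|q_1 \cap q_3|}{|q_1|}
  = \frac{|\{e_1, e_3\}|}{|\{e_1, e_2, e_3\}|} = \frac{2}{3} \approx 0.67 > 0.5
\;\Rightarrow\; \text{criterion 2 } (C_1,\ C_1')
\]

% Case -\pi: q_2 = \{C_1, e_4\} against q_5 = \{e_2, e_4, C_1'\},
% with C_1 counted as matched by C_1'
\[
\frac{2}{2} = 1 > 0.5,
\qquad q_2 \neq q_5 \;\Rightarrow\; \text{criterion 2 } (C_2,\ C_2')
\]
```

In each comparison the fraction exceeds one half but the two clusters are not identical, which is why the primed name is used for the object-tree cluster.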

CMS and the number of branches for the pair of Trees C and D:
For Case π (Panel 2c)

Example 3 (pair of Tree E and Tree F):
Case π of Tree E as subject one; In Panel 3a, there are five (P1 to P5) and three (P6 to P8) possible corresponding clusters in this case. Following the same algorithm as adopted in Examples 1 and 2, clusters C1 to C3 and C1', C2' and C3' can be assigned as shown in Panel 3b.

Case (-π) of Tree F as subject one;
In the second session, where the roles of subject and object are reversed, the named clusters C1 to C3 and C1', C2' and C3' can be assigned as shown in Panel 3d.

CMS and the number of branches for the pair of Trees E and F:
For Case π (Panel 3b)

Example 4 (pair of Tree G and Tree H):
Case π of Tree G as subject one;
Case (-π) of Tree H as subject one;

Example 5 (pair of Tree I and Tree J):
Case π of Tree I as subject one;
Case (-π) of Tree J as subject one;

Example 6 (pair of Tree K and Tree L):
Case π of Tree K as subject one;
Case (-π) of Tree L as subject one;

CMS and the number of branches for the pair of Trees K and L:
For Case π (Panel 6a)
Based on criterion 3, no name is given, because 50% or fewer of the elements of the clusters of the subject tree found corresponding elements in a cluster of the object tree.

Pair of phenotype and GP-based (Fig. 3) trees:

CMS and the number of branches for the pair of phenotype and GP-based (Fig. 3) trees:
For Case π;