Complexity of the Mycoplasma fermentans M64 Genome and Metabolic Essentiality and Diversity among Mycoplasmas

Recently, the genomes of two Mycoplasma fermentans strains, M64 and JER, have been completely sequenced. A gross comparison indicated that the M64 genome is significantly larger than that of JER. The difference comes mainly from repetitive sequences, including seven families of simple and complex transposable elements ranging from 973 to 23,778 bp. Analysis of these repeats identified a new, distinct family of Integrative Conjugal Elements of M. fermentans, designated ICEF-III. Using the concept of "reaction connectivity", the metabolic capabilities of M. fermentans, as manifested by completely and partially connected biomodules, were revealed. The reported essential genes of M. pulmonis, M. arthritidis, M. genitalium, B. subtilis, and E. coli were compared with the genes predicted from the M64 genome. This comparison indicated that more than 73% of the Mycoplasma essential genes are preserved in M. fermentans. The highly and partly connected reactions were further examined by a novel combined analysis of phylogenetic trees, metabolic networks, and essential genes. This analysis indicated that some pathways with partially connected reactions (e.g. purine and pyrimidine metabolism) may be important for the conversion of intermediate metabolites. Taken together, in light of systems and network analyses, the diversity among Mycoplasma species is manifested in the variations of their limited metabolic abilities during evolution.


Introduction
Mycoplasma, a member of the class Mollicutes, is a genus of bacteria lacking a rigid cell wall. A number of species in this genus are medically and agriculturally important. Extensive genetic and genomic investigations have been carried out to shed light on their biology and pathogenicity [1,2,3,4,5,6,7,8,9,10]. In addition, Mycoplasmas are genetically simple bacteria with drastically reduced genomes. They are therefore interesting to study because of their presumably limited but crucial metabolic capabilities and biological activities. To date, the genomes of at least twenty species have been completely determined, with sizes ranging from 0.58 Mbp (M. genitalium) to 1.36 Mbp (M. penetrans) [4,5,6,7,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24] (http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi). These sequences contain the basic and crucial information needed for a better understanding of this interesting group of microorganisms.
Basic and clinical research on M. fermentans has been intensive in recent years. Nevertheless, its role, and the molecular mechanisms involved, in HIV pathogenesis, sexually transmitted genital tract infection, systemic infection, rheumatic disorders, chronic fatigue syndrome, and other diseases have remained elusive [25,26,27,28,29,30,31,32,33,34,35]. M. fermentans is a fastidious microorganism commonly isolated or detected from the human genitourinary and respiratory tracts [36,37]. It has been associated with disease in both otherwise healthy individuals and AIDS patients. It was first described several decades before its detection in patients with AIDS in the late 1980s. It was then considered an opportunistic pathogen or a sexually transmissible cofactor contributing to the pathology and pathogenesis of HIV-associated diseases [38]. A previous study detected it in the peripheral blood lymphocytes and urine of AIDS patients, which suggested that it might act as a polyclonal activator of both T and B lymphocytes and thereby stimulate HIV replication. It is thought to behave as a cofactor or immunomodulator in HIV-related diseases [39]. M. fermentans might also be a systemic pathogen, causing fatal disease through infection of the bone marrow in non-HIV-infected patients [40]. It may actively invade cultured cells such as HeLa cells and survive as an intracellular pathogen [41]. It has also been found inside cells or adhering to the cell surface [42]. M. fermentans may also play a critical role in genital tract infection and in rheumatic disorders such as rheumatoid arthritis [43,44]. In addition, several studies have examined the relationship between Mycoplasma and chronic fatigue syndrome (CFS) [45]. These studies showed that Mycoplasma species, including M. fermentans, M. hominis, and M. penetrans, could be detected in patients with CFS; however, no solid evidence that these organisms cause CFS has been reported.
In conclusion, M. fermentans is a clinically interesting Mycoplasma species, and recent studies have drawn attention to its possible involvement in several critical human diseases.
Three types of transposable elements have been found in some M. fermentans strains: insertion sequences (IS), ICEFs (integrative conjugal elements of M. fermentans), and prophage [46,47,48,49]. These elements vary widely in size, from 973 to 23,778 bp, and some are present in multiple copies. For instance, nine copies of IS1630 have been detected in M. fermentans [47]; this element encodes an IS30 family-like transposase and has diverse target site specificity. IS1550, also known as ISMi1, has been shown to be a possibly active element in E. coli. In addition, IS1550 has been found integrated into IS1630 and itself interrupted by insertion of the large transposable element ICEF [46,47,48,49]. ICEF is similar to conjugative self-transmissible integrating elements (constins). However, it lacks homologs of integrases, transposases, and recombinases, so it is considered distinct from typical constins in its mechanism of integration and excision. Moreover, the approximately 16-kb M. fermentans prophage φMFV1 might be integrated as single or multiple copies in the genome [49]. Taken together, these elements might be important for the plasticity of the genome and the evolution of M. fermentans [49,50,51].
To increase our understanding of the biology of M. fermentans, which might help reveal its pathogenic roles in the suspected diseases, a comprehensive comparative genomic analysis was performed. It covered M. fermentans strains M64 and JER and the other Mycoplasmas. We examined the DNA sequences responsible for the dramatic difference in size between the M64 and JER genomes. The metabolic abilities of M. fermentans M64 were analyzed by a systems analysis method based on the evaluation of reaction connectivity. Finally, an integrated analysis of phylogenetic trees, metabolic networks, and essential genes was carried out to uncover the essentiality and diversity of metabolic reactions in M. fermentans M64 during evolution.

M. fermentans M64 Harbors a Large Number of Transposable Elements
Similar to Mycoplasma mycoides subsp. mycoides [16], sequence analyses indicated that the M. fermentans M64 genome possesses a high density of transposable elements (Figures S1 and S2) [52]. Nine copies of two types of large repetitive sequences and many copies of relatively small insertion sequence (IS) elements account for 21.6% of the genome (Table 1 and Figure 1) [46,47,48,49]. The large repeats include three families of ICEF (22.3 to 23.8 kb). Four copies belong to the previously described families: ICEF-IA (23.8 kb), which had already been sequenced, and ICEF-IIA, B, and C (22.3 kb), which had been partly characterized. Three further copies belong to ICEF-III (ICEF-IIIA and B), a new family discovered in our recent study [52]. Two of these are complete (20.2 and 20.8 kb), and one is partial (20.7 kb) because it was truncated by IS1550A, with a 52-bp truncation at the 3′ end. In addition, two copies of φMFV1 prophage DNA (15.8 and 15.6 kb) are inserted in the chromosome. A total of 15 complete and 4 incomplete copies of three families of IS elements were identified:
- eight IS1550 (973 to 1,416 bp): six complete copies, one copy interrupted by ICEF-IIA, and another copy possibly truncated by the transposition of ICEF-IA;
- seven IS1630 (1,178 to 1,384 bp): one interrupted by ICEF-IIIA;
- four ISMf1 (1,395 to 1,570 bp): three complete and one incomplete.
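As a quick consistency check on the inventory above, the sketch below tallies the IS copies family by family. It assumes, as our reading of the text, that the single IS1630 copy interrupted by ICEF-IIIA is the only incomplete IS1630 copy. With that reading, the totals reproduce the stated 15 complete and 4 incomplete copies. It also converts the 21.6% repeat fraction into base pairs, using the 1,118,751-bp M64 genome size given in the next section.

```python
# Tally of IS element copies in M. fermentans M64, transcribed from the text.
# Assumption: the IS1630 copy interrupted by ICEF-IIIA is its only incomplete copy.
IS_INVENTORY = {
    "IS1550": {"complete": 6, "incomplete": 2},  # one interrupted by ICEF-IIA, one possibly truncated by ICEF-IA
    "IS1630": {"complete": 6, "incomplete": 1},  # one interrupted by ICEF-IIIA
    "ISMf1": {"complete": 3, "incomplete": 1},
}

GENOME_BP = 1_118_751     # M64 genome size
REPEAT_FRACTION = 0.216   # share of the genome occupied by the repeats


def tally(inventory):
    """Return (complete, incomplete) copy numbers summed over all IS families."""
    complete = sum(family["complete"] for family in inventory.values())
    incomplete = sum(family["incomplete"] for family in inventory.values())
    return complete, incomplete


if __name__ == "__main__":
    complete, incomplete = tally(IS_INVENTORY)
    print(f"{complete} complete + {incomplete} incomplete = {complete + incomplete} IS copies")
    print(f"repeats cover about {round(GENOME_BP * REPEAT_FRACTION):,} bp")
```

The second line works out to roughly 242 kb of repeat-derived sequence in the M64 chromosome.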

M. fermentans Strains M64 and JER Differ Significantly in Large Transposable Elements
Sequence comparison indicated that the genome of M. fermentans M64 (1,118,751 bp) [52] is 141 kb larger than that of the JER strain (977,524 bp) [21]. It is also 115 kb larger than that of the nearly complete PG18 strain (1,004,014 bp; Accession no. AP009608) [53]. These differences arise mainly from differences in the copy numbers of ICEF and φMFV1 prophage DNA (Figure 2). Notably, the PG18 genome still contains gaps of unknown size and potential assembly problems, so the actual difference between it and the M64 genome remains to be determined accurately. The M. fermentans JER genome essentially lacks all of the previously identified ICEF elements and φMFV1 prophage DNA, but it contains two partial copies of the newly identified ICEF-III elements [48,49]. Despite these differences, the organization of the M64 and JER genome sequences is largely similar. On the other hand, at least two apparently complete ICEFs and one φMFV1 prophage DNA were found within the almost complete PG18 genome. As expected, however, the organization of the PG18 genome is considerably different from that of M64, possibly because of gaps and sequence arrangement problems in the incomplete genome.
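The size differences quoted above follow directly from the three genome lengths, and the short sketch below simply recomputes them. The PG18 value comes from a gapped assembly, so its difference should be read as approximate.

```python
# Genome sizes (bp) as given in the text; PG18 is a nearly complete, gapped assembly.
GENOME_SIZES_BP = {"M64": 1_118_751, "JER": 977_524, "PG18": 1_004_014}


def size_difference_kb(larger, smaller, sizes=GENOME_SIZES_BP):
    """Difference in genome length between two strains, in kilobases."""
    return (sizes[larger] - sizes[smaller]) / 1000


for other in ("JER", "PG18"):
    print(f"M64 - {other}: {size_difference_kb('M64', other):.1f} kb")
```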
Analyses of the integrity and flanking sequences of the IS elements and ICEFs suggested that the current genome architecture of M. fermentans M64 might result, at least in part, from transposition and/or recombination events of these elements at different periods of time. Almost all of the elements are polymorphic in DNA sequence; the exceptions are members of the IS1550A-1, -2, and -3 subfamilies. The terminal repeats and target site duplications of each family are as follows:
- ICEF-I and ICEF-II: the four ICEFs (ICEF-I, -IIA, -IIB, and -IIC) are terminated with 24-bp imperfect inverted repeats and flanked by distinct 8-bp target site duplications (Figure 1).
- ICEF-III: the new ICEF-IIIs are all terminated with 23-bp imperfect inverted repeats and flanked by 8-bp target site duplications. The exception is the incomplete ICEF-IIIB, which is disrupted by IS1550A.
- φMFV1: both copies of the prophage genome are terminated with 4-bp inverted repeats and flanked by identical 6-bp (TTTTTA) target site duplications. This suggests that TTTTTA could be the preferred transposition target sequence, as reported previously [49].
- IS1550: the seven complete copies are terminated with 29-bp imperfect inverted repeats, and four of them are flanked by 3-bp direct repeats.
- IS1630: the seven complete copies are terminated with 27-bp heterogeneous inverted repeats, and four of them are flanked by 16-, 23-, 25-, and 26-bp duplicated junction sequences.
- ISMf1: the three complete copies contain 47-bp imperfect terminal inverted repeats, but none of them has a noticeable target site duplication.
Sequence comparison of ICEF-IA, IIA, and IIIA was combined with phylogenetic analysis of 11 intact ICEs (Integrative Conjugal Elements) in six Mycoplasma species. These analyses indicated that ICEF-III is distinct from the other two ICEF families in M. fermentans and unique among all reported ICEs (Figure S3).
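Two features are used above to characterize each element: terminal inverted repeats (TIRs) and target site duplications (TSDs). Both can be checked mechanically once an element's coordinates are known. The sketch below is a simplified illustration, not the procedure used in this study, and it rests on three assumptions. Coordinates are 0-based and end-exclusive. A TIR is scored by counting mismatches between the left end and the reverse complement of the right end (0 means perfect, more than 0 means imperfect). A TSD is called only when the two flanking k-mers are identical. The toy sequence is hypothetical; only the 6-bp TTTTTA motif is taken from the prophage description.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def target_site_duplication(genome: str, start: int, end: int, tsd_len: int):
    """Return the duplicated motif if the tsd_len bases just left of
    genome[start:end] equal the tsd_len bases just right of it, else None."""
    if start < tsd_len or end + tsd_len > len(genome):
        return None
    left = genome[start - tsd_len:start]
    right = genome[end:end + tsd_len]
    return left if left == right else None


def tir_mismatches(element: str, tir_len: int) -> int:
    """Mismatches between the left terminus and the reverse complement of the
    right terminus; 0 means a perfect inverted repeat."""
    left = element[:tir_len]
    right_rc = revcomp(element[-tir_len:])
    return sum(a != b for a, b in zip(left, right_rc))


if __name__ == "__main__":
    tir = "GGCATTACGA"                                        # hypothetical terminus
    element = tir + "ACGTACGTACGT" + revcomp(tir)[:-1] + "A"  # one mismatch at the very end
    genome = "CCCC" + "TTTTTA" + element + "TTTTTA" + "GGGG"   # toy flanks with a 6-bp TSD
    start = genome.index(element)
    end = start + len(element)
    print(target_site_duplication(genome, start, end, 6))  # TTTTTA
    print(tir_mismatches(element, len(tir)))               # 1
```

A real scan would also allow mismatches within the TSD and search a range of TIR lengths, because the families above differ (4 to 47 bp).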
Some of these elements lack target site duplications. This suggests that post-transposition DNA rearrangements or additional transposition events might have occurred before the strains diverged, because these families of transposable elements (except for the φMFV1 prophage) are present in M. fermentans strains M64, JER, and PG18. The high frequency of DNA rearrangements in M. fermentans was shown by Hu et al. in a previous study [51].

Pan-genome Analysis Across the M64, JER, and PG18 Strains
To unveil variations in gene content among M. fermentans M64, JER, and PG18, BLASTP similarity searches were carried out across all predicted proteins to dissect the gene repertoires of these strains. Homologs were recognized using an E-value cutoff of < 10^-5. Gene duplication and horizontal gene transfer may have been active to different degrees in each strain. As a result, the number of genes shared by any two strains, or present in all three, may differ from strain to strain. For instance, the gene MfeM64YM_0098 of M64 has a total of 8 homologs in M64: MfeM64YM_0098, MfeM64YM_0108, MfeM64YM_0229, MfeM64YM_0238, MfeM64YM_0385, MfeM64YM_0474, MfeM64YM_0483, and MfeM64YM_0992. In contrast, PG18 has only 4 homologs (MBIO_0117, MBIO_0267, MBIO_0276, and MBIO_0750), and JER has only 5 (MFE_08000, MFE_02760, MFE_02860, MFE_04903, and MFE_04960). Thus, pan-genome analysis showed that 762 to 837 genes constitute the core genome of M. fermentans strains M64 (837), JER (762), and PG18 (809) (Figure 3B). More gene homologs are shared by M64 (955) and PG18 (872) than by M64 (849) and JER (774), in agreement with the phylogenetic relationship among the three strains (Figure 3A). This corroborates the closer relationship between M64 and PG18 obtained from the analysis of the concatenated sequences of dnaE, pyk, and rpoA (Figure 3B, Table S1). As expected, M64, which has the largest chromosome and is rich in mobile elements, possesses many more strain-specific genes (83 genes) than JER (17) and PG18 (11). Further sequence analysis indicated that most strain-specific genes encode proteins without annotated function. According to the KEGG Orthology database annotations, the core genome (762 to 837 genes) contains 378 well-annotated orthologous groups. Approximately half of these belong to the genetic information processing biomodules (Figure 3C).
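The sketch below illustrates the strain-by-strain counting described above, in which duplicated genes are counted individually. It is an illustrative reconstruction, not the authors' pipeline, and it makes the following assumptions:
- The input is all-against-all BLASTP output in the standard 12-column tabular format (-outfmt 6), with the E-value in column 11.
- Genes are assigned to strains by the locus-tag prefixes seen above (MfeM64YM_, MFE_, MBIO_).
- After the E-value cutoff is applied, every gene of each strain is placed into one exclusive Venn region.

Because the regions are exclusive, its "shared" counts leave out core genes and may be defined differently from the pairwise totals quoted above.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-5
PREFIXES = {"MfeM64YM_": "M64", "MFE_": "JER", "MBIO_": "PG18"}


def strain_of(gene_id):
    """Assign a gene to a strain from its locus-tag prefix."""
    for prefix, strain in PREFIXES.items():
        if gene_id.startswith(prefix):
            return strain
    raise ValueError(f"unrecognized locus tag: {gene_id}")


def homolog_strains(blast_lines, cutoff=E_VALUE_CUTOFF):
    """Map each query gene to the set of other strains in which it has a
    BLASTP hit with E-value below the cutoff (tabular -outfmt 6 input)."""
    hits = defaultdict(set)
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        q_strain, s_strain = strain_of(query), strain_of(subject)
        if evalue < cutoff and s_strain != q_strain:  # skip within-strain paralogs
            hits[query].add(s_strain)
    return hits


def venn_counts(genes_by_strain, hits):
    """For each strain, count its genes that are core (hit in every other
    strain), shared only with a subset of strains, or strain-specific."""
    counts = {}
    for strain, genes in genes_by_strain.items():
        others = set(genes_by_strain) - {strain}
        region = defaultdict(int)
        for gene in genes:
            found = hits.get(gene, set())
            if found == others:
                region["core"] += 1
            elif not found:
                region["specific"] += 1
            else:
                region["shared_with_" + "+".join(sorted(found))] += 1
        counts[strain] = dict(region)
    return counts
```

Given the predicted gene IDs per strain and the BLAST output lines, `venn_counts(genes_by_strain, homolog_strains(lines))` returns per-strain region counts. Each copy of a duplicated gene contributes separately, which is why the core count can differ between strains (837, 762, and 809 above).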

M. fermentans M64 Metabolic Capabilities Revealed by Reaction Connectivity-Based Analyses
Global analyses of M. fermentans M64 metabolism indicated that only 131 of the 1,050 predicted proteins mapped to the metabolic pathways in the KEGG database (KEGG Mapper; http://www.genome.jp/kegg/tool/map_pathway1.html). Relatively large numbers of proteins were found in the carbohydrate (57 proteins) and nucleotide (31 proteins) metabolic networks. Proteins in the amino acid (25 proteins), cofactor and vitamin (18 proteins), and lipid (8 proteins) metabolic networks were sparsely distributed (Figure S4). A gross examination of the reactions catalyzed by predicted proteins in the global metabolic network immediately showed that M. fermentans can synthesize or degrade only a limited number of metabolites. This has also been established for other Mycoplasma species.
The functionality of the 58 pathways containing M64 proteins was evaluated by examining reaction connectivity [54]. Only nine pathways contained five or more connected reactions (Table 2). Several carbohydrate metabolic pathways seemed to have sufficient connected reactions to drive the synthesis or degradation of the pertinent metabolites. M. fermentans M64 has a complete glycolysis/gluconeogenesis pathway (18 connected reactions), which suggests that it can catabolize D-glucose to pyruvate via the Embden-Meyerhof-Parnas (EMP) pathway. The pyruvate may then be converted to acetyl-CoA, acetate, or D-lactate through the pyruvate metabolism pathway, which contains a single cluster of nine connected reactions driven by ten predicted proteins. Because M. fermentans M64 lacks citrate cycle enzymes, it may degrade D-glucose fermentatively. On the other hand, the pentose phosphate pathway contains a large cluster of 13 and a small cluster of two connected reactions. It may be used to synthesize phosphoribosyl diphosphate (PRPP), which may then enter the purine (11 connected reactions) and pyrimidine (10 connected reactions) metabolism pathways for nucleotide synthesis. Only six genes were mapped to the fructose and mannose metabolism pathway. Nevertheless, their encoded proteins form a cluster of six connected reactions that drives the conversion of D-fructose to glyceraldehyde-3-phosphate, which may then enter the glycolysis pathway for further degradation. Neither the purine nor the pyrimidine pathway for de novo nucleotide synthesis seemed to be complete. M. fermentans M64 must therefore rely on an external supply of nucleotides or nucleotide synthesis intermediates to produce the building blocks of DNA and RNA (Table 2 and Figure S5). It does, however, appear to possess the enzymes needed to form the following subnetworks of connected reactions:
- between RNA synthesis and the metabolism of (i) guanosine, guanine, GMP, GDP, and GTP; (ii) adenosine, adenine, AMP, ADP, and ATP; and (iii) UTP and CTP;
- between DNA synthesis and the metabolism of (iv) guanosine, guanine, deoxyguanosine, dGMP, dGDP, and dGTP; and (v) dAMP, dADP, and dATP.
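One simple way to make the notion of connected reactions concrete is to treat two reactions as connected when they share a metabolite, and to report the connected components as clusters. The sketch below does exactly that. It is our simplified reading, not necessarily the exact criterion of [54]. In particular, the set of currency metabolites ignored when linking reactions (ATP, H2O, and so on) is an assumption; without such a list, nearly every reaction would join one giant cluster. The example reactions are the arginine-degrading steps and the MetK reaction discussed in the next paragraph.

```python
from collections import defaultdict

# Assumed list of currency metabolites that do not count as links between reactions.
CURRENCY = {"ATP", "ADP", "AMP", "NAD+", "NADH", "H2O", "Pi", "PPi", "CO2", "NH3"}


def reaction_clusters(reactions, currency=CURRENCY):
    """reactions: name -> (substrates, products).
    Returns clusters of reactions linked through shared non-currency
    metabolites, largest cluster first."""
    users = defaultdict(set)                      # metabolite -> reactions using it
    for name, (subs, prods) in reactions.items():
        for met in set(subs) | set(prods):
            if met not in currency:
                users[met].add(name)
    neighbours = defaultdict(set)
    for names in users.values():
        for name in names:
            neighbours[name] |= names - {name}
    seen, clusters = set(), []
    for start in reactions:                       # depth-first search per component
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            rxn = stack.pop()
            if rxn not in component:
                component.add(rxn)
                stack.extend(neighbours[rxn] - component)
        seen |= component
        clusters.append(component)
    return sorted(clusters, key=len, reverse=True)


EXAMPLE_REACTIONS = {
    "ArcA": (["L-arginine", "H2O"], ["L-citrulline", "NH3"]),
    "ArcB": (["L-citrulline", "Pi"], ["L-ornithine", "carbamoyl phosphate"]),
    "ArcC": (["carbamoyl phosphate", "ADP"], ["ATP", "CO2", "NH3"]),
    "MetK": (["L-methionine", "ATP", "H2O"], ["S-adenosyl-L-methionine", "PPi", "Pi"]),
}

if __name__ == "__main__":
    for cluster in reaction_clusters(EXAMPLE_REACTIONS):
        print(len(cluster), sorted(cluster))
```

MetK shares only currency metabolites (ATP, H2O, Pi) with the Arc reactions, so it stays in a one-reaction cluster. Removing ATP from `CURRENCY` would merge it with ArcC, which shows how sensitive cluster sizes are to this choice.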
Like other Mycoplasma species, M. fermentans M64 has very few amino acid metabolism proteins predicted from its genome. Only two of the amino acid metabolic pathways have connected reactions: the "arginine and proline" and "cysteine and methionine" pathways, with clusters of 3 and 2 reactions, respectively. In "arginine and proline metabolism", arginine deiminase (ArcA), ornithine carbamoyltransferase (ArcB), and carbamate kinase (ArcC) form a cluster of reactions that degrades arginine to ammonia. This is in agreement with a previous study showing that M. fermentans can utilize arginine [62]. Some proteins do not form highly connected reaction clusters, but they might still be important for the survival of the microorganism. For instance, proline iminopeptidase (Pip) may release proline from peptides, and aspartate-ammonia ligase (AsnA) catalyzes the conversion of aspartate to asparagine. In addition, S-adenosylmethionine synthetase (MetK) is an indispensable enzyme for the conversion of methionine to S-adenosyl-L-methionine. This product can then be used as the substrate of cytosine-specific methyltransferase (Dcm) for DNA and rRNA methylation. Overall, therefore, the network and connectivity analysis suggested that M. fermentans M64 must rely on an external supply of all amino acids for protein synthesis.
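The three Arc enzymes named above make up the arginine deiminase pathway. Summing their reactions (written in the catabolic direction, with protons omitted) shows why this pathway is useful to a cell without a respiratory chain. Each arginine yields ornithine, two ammonia, CO2, and one ATP made by substrate-level phosphorylation. The sketch below performs that bookkeeping; the dictionary layout is ours.

```python
from collections import Counter

# Arginine deiminase pathway in the catabolic direction; protons omitted.
ADI_PATHWAY = {
    "ArcA": ({"L-arginine": 1, "H2O": 1}, {"L-citrulline": 1, "NH3": 1}),
    "ArcB": ({"L-citrulline": 1, "Pi": 1}, {"L-ornithine": 1, "carbamoyl phosphate": 1}),
    "ArcC": ({"carbamoyl phosphate": 1, "ADP": 1}, {"ATP": 1, "CO2": 1, "NH3": 1}),
}


def net_reaction(steps):
    """Sum the steps: negative coefficients are net inputs, positive are net
    outputs; intermediates that cancel to zero are dropped."""
    net = Counter()
    for substrates, products in steps.values():
        net.subtract(substrates)
        net.update(products)
    return {met: n for met, n in net.items() if n != 0}


if __name__ == "__main__":
    net = net_reaction(ADI_PATHWAY)
    inputs = " + ".join(f"{-n} {m}" for m, n in net.items() if n < 0)
    outputs = " + ".join(f"{n} {m}" for m, n in net.items() if n > 0)
    print(f"{inputs} -> {outputs}")
```

Citrulline and carbamoyl phosphate cancel out as intermediates. This cancellation is also what links the three reactions into a single connected cluster.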

Conservation of Genes and Biomodules among the Mycoplasmas
Clustering analysis of the proteins from 27 sequenced Mycoplasma and Phytoplasma species identified two groups of genes that are highly conserved among these species (Figure 4). The first comprises most genes responsible for DNA replication, nucleotide excision repair, homologous recombination, transcription, and translation. The second is a significant fraction of the membrane transporters, including some of the ABC transporters and members of the protein transport and bacterial secretion systems. Genes related to carbohydrate metabolism are conserved in almost all species. The exceptions are M. haemofelis, M. suis, M. arthritidis, and M. hominis, which have lost several genes involved in pyruvate and other carbohydrate metabolism, such as pdhA, pdhB, pdhC, and pdhD. Moreover, distinct patterns in other metabolic pathways can be observed for M. haemofelis and M. suis. Intriguingly, these two species are hemotropic Mycoplasmas. They belong to a special group, also known as hemoplasmas, with a tropism for red blood cells [55,56]. This distinctive propensity may be associated with the evolution of their pool of metabolism- and cellular process-related genes. M. arthritidis and M. hominis, whose hosts are rodents and humans, respectively, also form a unique phylogenetic clade in Mycoplasmas, referred to as the M. hominis cluster. Neither can ferment glucose, but both are capable of arginine hydrolysis [57,58]. All of the Mycoplasma species share conserved energy metabolism-related genes, including genes encoding NADH dehydrogenase and F-type ATPase. However, they lack a cytochrome-containing complex, indicating a truncated electron transport chain. ATP is therefore generated by inefficient substrate-level phosphorylation in the flavin-terminated respiratory pathway rather than by oxidative phosphorylation [59,60,61]. In addition, the pentose phosphate pathway, an alternative to glycolysis, may generate NADPH for reductive biosynthetic reactions in Mycoplasma species. Only a limited number of recognizable genes are conserved in the biomodules for the metabolism of most carbohydrates, amino acids, and cofactors and vitamins. This suggests that most, if not all, of these metabolic pathways are defective, unless cryptic genes carry out the missing functions [62]. This observation agrees with the fact that Mycoplasma requires a wide spectrum of substrates and factors for growth.
Figure 3 (partial caption). (B) Venn diagram showing the number (including duplications) of genes unique to each strain or shared by these strains (BLASTP E-value < 10^-5). Superscripts represent the name initials of the strains that show the indicated numbers of genes with homologs in other strains. The numbers of genes in the nearly complete PG18 genome may be slightly underestimated owing to gaps and potential unresolved duplications in the genome assembly. (C) Functional distribution of the 378 orthologous groups of genes in the core genome. Some of the proteins have more than one KEGG pathway/biomodule annotation. doi:10.1371/journal.pone.0032940.g003
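Conservation patterns such as those in Figure 4 are usually derived from a gene presence/absence (phylogenetic profile) matrix. The sketch below shows three basic operations on such a matrix: finding genes conserved in every species, listing the species that lack a given gene, and computing a Jaccard distance between species profiles that a clustering method could use. It is illustrative only, and the clustering procedure behind Figure 4 is not assumed here. The pdh loss pattern follows the text; `core_gene` is a hypothetical placeholder for a universally conserved gene.

```python
# Toy presence/absence profiles: the pdhA-D losses follow the text;
# "core_gene" is a hypothetical placeholder for a universally conserved gene.
PDH = {"pdhA", "pdhB", "pdhC", "pdhD"}
PROFILES = {
    "M. fermentans": PDH | {"core_gene"},
    "M. pulmonis": PDH | {"core_gene"},
    "M. haemofelis": {"core_gene"},
    "M. suis": {"core_gene"},
    "M. arthritidis": {"core_gene"},
    "M. hominis": {"core_gene"},
}


def conserved_in_all(profiles):
    """Genes present in every species."""
    return set.intersection(*profiles.values())


def species_lacking(gene, profiles):
    """Species whose profile does not contain the gene, sorted by name."""
    return sorted(sp for sp, genes in profiles.items() if gene not in genes)


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|; 0 for identical gene content."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    print(conserved_in_all(PROFILES))
    print(species_lacking("pdhA", PROFILES))
    print(jaccard_distance(PROFILES["M. fermentans"], PROFILES["M. suis"]))
```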
The analysis of the proteins involved in genetic information processing indicated that they are widely distributed among the sequenced members of the family Mycoplasmataceae (Figure 4). Interestingly, genes involved in nucleotide metabolism and genetic information processing exhibited highly conserved profiles, although M. haemofelis and M. suis seemed to be deficient in some of them. The DNA polymerase III holoenzyme of M. fermentans and the other Mycoplasma species resembles the Gram (+) type and is simpler than that of the Gram (-) model organism [63]. The Mycoplasma polymerase III holoenzymes are made up of the same subunits as that of the Gram (+) Bacillus subtilis, which comprises the α, β, γ, τ, δ, and δ′ subunits. The gene encoding the ε subunit, which carries 3′-5′ exonuclease activity, could not be found in their genomes. However, the Mycoplasma α subunit resembles the PolC-type (Gram (+) type) α subunit, which contains both a DnaE (α subunit) domain with DNA polymerase activity and a DnaQ (ε subunit) domain with 3′-5′ exonuclease activity. This suggests a B. subtilis-like mechanism of DNA replication.
The DNA-directed RNA polymerase α, β, and β′ subunits are sufficient to form a minimal core (ββ′α2) of RNA polymerase for catalyzing the polymerization of nucleoside triphosphates into RNA [64]. The primary σ factor, which promotes the attachment of RNA polymerase to specific promoter sites for transcription initiation, is highly conserved among the sequenced Mycoplasma species. In addition to these subunits, the δ subunit, which is ubiquitous among Gram (+) bacteria, was also found to be conserved in some of the Mycoplasma species. The δ subunit is involved in both the initiation and recycling phases of transcription and may also serve as a virulence factor [65]. It increases the specificity of transcription and, through enhanced recycling, the efficiency of RNA synthesis. Previous studies reported that mutation of the δ subunit gene results in an extended lag phase of growth [66] and a defect in starvation-induced stationary-phase survival or recovery [67]. This subunit does not exist in all Mycoplasma species. Where the δ subunit genes are present, the subunit is likely responsible for more complex regulation and flexibility of gene expression. The aminoacyl-tRNA synthetases and ribosomal genes also show a high degree of conservation among the sequenced Mycoplasmas. Like the Gram (+) B. subtilis, Mycoplasmas do not possess glutaminyl-tRNA synthetase. They may adopt the same strategy as B. subtilis and other organisms that lack this enzyme. First, a nondiscriminating glutamyl-tRNA synthetase charges glutamate onto both tRNAGlu and tRNAGln. Then glutamyl-tRNAGln amidotransferase (GatABC complex) converts glutamyl-tRNAGln to glutaminyl-tRNAGln. This is partly supported by the observation that M. fermentans and 21 other Mycoplasma species and strains have a protein homolog (MfeM64YM_0687 in M. fermentans) that resembles the nondiscriminating glutamyl-tRNA synthetase of B. subtilis (61.7% similarity) more than the discriminating E. coli enzyme (53% similarity). Furthermore, the genes encoding the GatABC complex are also present in M. fermentans M64 and the other sequenced Mycoplasmas. Mycoplasma species have slightly fewer ribosomal proteins than E. coli (55 in total). A total of 47 ribosomal proteins are highly conserved among the sequenced Mycoplasma species and strains. This count is based on the manually curated annotations in the KEGG database (http://www.genome.jp/kegg/pathway/ko/ko03010.html), with smaller proteins (<100 aa) validated by TBLASTN. The S1, S6, S21, L9, and L32 proteins are absent from some of the species, and S22, L25, and L30 could not be found in any of them. In line with our observations, Kawauchi et al. [68] used two-dimensional polyacrylamide gel electrophoresis to analyze M. capricolum ribosomes. They identified 30 proteins of the large ribosomal subunit and 21 proteins of the small ribosomal subunit. The M. capricolum ribosome lacks the S1, S22, L25, and L30 proteins. The S1 protein is present in seven species: some belong to three closely related phylogenetic clusters (the M. lipophilum, M. synoviae, and M. pulmonis clusters) [58], and two are other species. Furthermore, the S21 protein is restricted to nine species of the hemoplasma, pneumoniae, and spiroplasma groups [57]. Together, these observations suggest that some of these proteins may be specific to certain phylogenetic clades and dispensable in Mycoplasmas, as reported previously [12,69].
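The three-way classification used above (conserved in all species, absent from some, absent from all) can be computed directly from a protein-by-species presence table. As a check on the numbers, 47 conserved + 5 variable (S1, S6, S21, L9, L32) + 3 absent (S22, L25, L30) = 55, the E. coli total cited above. The presence table in the sketch is a hypothetical three-species toy, not data from this study.

```python
def classify(presence, species):
    """presence: protein -> set of species in which it was detected.
    Returns (conserved in all, present in some, absent from all), each sorted."""
    species = set(species)
    conserved = sorted(p for p, found in presence.items() if found >= species)
    variable = sorted(p for p, found in presence.items() if found and not found >= species)
    absent = sorted(p for p, found in presence.items() if not found)
    return conserved, variable, absent


# Hypothetical toy data for three species; "L2" stands in for a conserved protein.
SPECIES = {"sp1", "sp2", "sp3"}
PRESENCE = {
    "L2": {"sp1", "sp2", "sp3"},
    "S1": {"sp1"},
    "S21": {"sp2", "sp3"},
    "S22": set(),
    "L25": set(),
}

# Category sizes reported in the text for the sequenced Mycoplasmas.
REPORTED_COUNTS = {"conserved": 47, "variable": 5, "absent": 3}

if __name__ == "__main__":
    print(classify(PRESENCE, SPECIES))
    print(sum(REPORTED_COUNTS.values()))  # 55
```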

Potential Essential Genes in M. fermentans
Previous studies identified essential genes in the closely related M. pulmonis, M. arthritidis, and M. genitalium and in the distantly related B. subtilis and E. coli. These genes were compared with the proteins predicted from the M. fermentans M64 genome to shed light on which M. fermentans genes are likely essential for a minimal self-replicating cell (Figure 5). Although the essential genes of M. fermentans have yet to be validated, the comparison identified a subset of essential genes in Mycoplasma species that seem to be distinct from those of the two distantly related bacteria. M. fermentans is more similar to M. pulmonis than to M. arthritidis and M. genitalium, in accordance with their phylogenetic distance (Figure S6). The essential genes participating in translation and transcription show a relatively high degree of conservation. In addition, a significant fraction of the essential genes involved in replication and repair, nucleotide metabolism, amino acid metabolism, and membrane transport also appear to be highly conserved in the analyzed species. On the other hand, some of the proteins seem to be essential in only one or more, but not all, species. Genome reduction may therefore have been accompanied by a shift in the essential gene sets during evolution, as seen in carbohydrate metabolism and membrane transport (Figure 5).
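Once essential genes from each reference organism and the M64 proteins have been mapped to shared orthologous groups, the comparison above reduces to set operations. The mapping step, e.g. by BLASTP, is assumed to have been done already. The sketch computes two quantities:
- for each reference species, the fraction of its essential genes that have an M64 ortholog;
- the same fraction for the union of the Mycoplasma sets.

The second quantity is one possible way, not necessarily the authors' way, to obtain a summary figure such as the 73% quoted in the abstract. The group IDs are hypothetical.

```python
def preserved_fractions(essential_by_species, m64_groups):
    """Fraction of each species' essential ortholog groups that have an M64 member."""
    return {sp: len(ess & m64_groups) / len(ess)
            for sp, ess in essential_by_species.items() if ess}


def pooled_fraction(essential_by_species, m64_groups):
    """Same fraction for the union of all essential sets."""
    pooled = set().union(*essential_by_species.values())
    return len(pooled & m64_groups) / len(pooled) if pooled else 0.0


# Hypothetical ortholog-group IDs, for illustration only.
ESSENTIAL = {
    "M. pulmonis": {"g1", "g2", "g3", "g4"},
    "M. genitalium": {"g1", "g2", "g5", "g6"},
}
M64_GROUPS = {"g1", "g2", "g3", "g7"}

if __name__ == "__main__":
    print(preserved_fractions(ESSENTIAL, M64_GROUPS))  # {'M. pulmonis': 0.75, 'M. genitalium': 0.5}
    print(pooled_fraction(ESSENTIAL, M64_GROUPS))      # 0.5
```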

Conservation of Residual Metabolism and Essential Genes among the Mycoplasmas
Analysis of gene conservation among the Mycoplasmas indicated that some of the residual or partial metabolic capabilities of M. fermentans are highly conserved among Mycoplasma species. A cross-examination of the essential genes in M. pulmonis [69], M. arthritidis [12], and M. genitalium [70] suggested that many of the conserved genes in the residual metabolic functions are likely essential in Mycoplasmas. The incomplete nucleotide and carbohydrate metabolic pathways were chosen as examples to illustrate this property. Although Mycoplasmas lack the full capability for de novo nucleotide synthesis, 21 purine and 22 pyrimidine metabolic genes were found in the M. fermentans genome. The nucleotide metabolic networks of the 21 sequenced Mycoplasma species were compared phylogenetically. The comparison indicated that the M. fermentans genes in purine metabolism are more conserved than those in the pyrimidine pathway (Figure 6A). The enzymes catalyzing clusters of connected reactions in the purine metabolic pathway are highly conserved among the Mycoplasmas. The exceptions are Dgk and DeoB, which are present in only six and twelve other species, respectively. Most species have the enzymes for the synthesis of ATP from adenine or adenosine, and of GTP and dGTP from guanine or guanosine. However, five species lack the ribonucleoside-diphosphate reductase complex that converts ADP to dADP. Three are phylogenetically related species in the M. lipophilum cluster (M. fermentans, M. agalactiae, and M. bovis), and two are related species in the M. hominis cluster (M. arthritidis and M. hominis) [58]. In these species, dATP might therefore be synthesized from dADP or dAMP, or taken up from an extracellular source. On the other hand, the enzymes in the pyrimidine metabolic pathway are relatively less conserved among the Mycoplasmas. Because each lacks at least one key enzyme, all sequenced species may rely on an external supply of UTP, dCTP, and dTTP from the host or culture medium.
For instance, M. fermentans lacks genes encoding two enzymes:
- dCTP deaminase, which converts dCTP to dUTP;
- nucleoside-diphosphate kinase, which converts UDP to UTP, dCDP to dCTP, and dTDP to dTTP.
Moreover, it also lacks ribonucleoside-triphosphate reductase, which catalyzes the conversion of CTP to dCTP. Some species, such as M. fermentans, carry the pyrG gene and thus may be able to convert UTP to CTP.
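The argument above is a reachability question: starting from the metabolites a cell can obtain, which others can it make with the enzymes it has? The sketch below encodes only the conversions named in this paragraph and the preceding one. It marks which of those enzymes M. fermentans carries according to the text and then runs a breadth-first search. It is a toy network, not a full KEGG model, and enzyme names serve only as labels.

```python
from collections import defaultdict, deque

# (substrate, product, enzyme) for conversions named in the text.
CONVERSIONS = [
    ("UDP", "UTP", "nucleoside-diphosphate kinase"),
    ("dCDP", "dCTP", "nucleoside-diphosphate kinase"),
    ("dTDP", "dTTP", "nucleoside-diphosphate kinase"),
    ("UTP", "CTP", "PyrG"),
    ("CTP", "dCTP", "ribonucleoside-triphosphate reductase"),
    ("ADP", "dADP", "ribonucleoside-diphosphate reductase"),
]

# Of the enzymes above, only PyrG is reported as present in M. fermentans.
M_FERMENTANS_ENZYMES = {"PyrG"}


def reachable(supplied, enzymes, conversions=CONVERSIONS):
    """Breadth-first search over conversions catalysed by the enzymes present."""
    graph = defaultdict(list)
    for substrate, product, enzyme in conversions:
        if enzyme in enzymes:
            graph[substrate].append(product)
    seen, queue = set(supplied), deque(supplied)
    while queue:
        met = queue.popleft()
        for nxt in graph[met]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


if __name__ == "__main__":
    print(sorted(reachable({"UDP", "CTP", "ADP"}, M_FERMENTANS_ENZYMES)))
    print(sorted(reachable({"UTP"}, M_FERMENTANS_ENZYMES)))
```

With only PyrG, supplying UTP yields CTP, but UDP, CTP, and ADP lead nowhere new. This is consistent with the text's conclusion that UTP, dCTP, and dTTP must come from outside.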
With respect to carbohydrate metabolism, M. fermentans M64 and the other Mycoplasma species seem to possess a highly conserved glycolysis pathway. By contrast, their pentose phosphate and pyruvate metabolic networks are incomplete and only partially conserved (Figure 6B). Upstream of glycolysis, α-D-glucose and β-D-glucose may enter the pathway through the enzyme GlcK, which was found only in M. fermentans M64 and seven other species. A few species, including M. fermentans, M. pulmonis, M. hyorhinis, and M. crocodyli, carry both AmyC and MalL. This suggests that in these species starch may be converted to dextrin and then to α-D-glucose, which subsequently enters the glycolysis pathway. In addition, FruA, FruK, and Fba are present with different degrees of conservation, which suggests that M. fermentans M64 and some other Mycoplasma species may also utilize fructose. Fba interconverts D-fructose 1,6-bisphosphate and glycerone phosphate plus D-glyceraldehyde 3-phosphate and is highly conserved. FruA, in contrast, is absent from both hemoplasma species and from six species in the M. hominis group, including members of the M. lipophilum, M. hominis, and M. synoviae clusters. FruK is present in all three species belonging to the spiroplasma group, in two species in the pneumoniae group, and in two species in the hominis group (M. hyorhinis and M. fermentans). Within the pentose phosphate pathway, the enzymes Tkt, Rpe, and RpiB for the conversion of D-glyceraldehyde 3-phosphate to D-ribulose 5-phosphate are conserved in all species except the hemoplasmas. The enzyme PrsA, which converts D-ribulose 5-phosphate to 5-phospho-alpha-D-ribose 1-diphosphate (PRPP), is found in all Mycoplasma species. Many Mycoplasma species seem able to catabolize pyruvate into acetyl-CoA and acetate through a conserved pathway. However, only some have a complete set of enzymes to further metabolize pyruvate to lactate or ethanol. Lactate dehydrogenase, which interconverts lactate and pyruvate, is present in M. fermentans, M. agalactiae, M. bovis, and M. crocodyli in the hominis group, in M. mycoides and M. leachii in the spiroplasma group, and in M. penetrans. Of these species, M. fermentans, M. crocodyli, and M. penetrans can also convert acetate to ethanol by means of aldehyde dehydrogenase and alcohol dehydrogenase, which exist in only a few species. This implies that M. fermentans M64 and several other Mycoplasma species may be able to carry out lactic or alcoholic fermentation. Some enzymes in the incomplete amino acid and vitamin and cofactor metabolic networks were also found to be conserved. This suggests that at least some of the residual metabolic activities may be related to survival (Figure S7).
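The inference in the last few sentences, from enzyme presence to possible fermentation end products, can be written as a subset test. A species is credited with a route only if it carries every enzyme that route requires. The sketch below uses the species and enzymes named in this paragraph. The shorthand enzyme labels are ours, and species not mentioned here are left out rather than scored.

```python
# Enzymes recorded per species, as named in the text (labels are shorthand).
ENZYMES = {
    "M. fermentans": {"ldh", "aldehyde_dh", "alcohol_dh"},
    "M. agalactiae": {"ldh"},
    "M. bovis": {"ldh"},
    "M. crocodyli": {"ldh", "aldehyde_dh", "alcohol_dh"},
    "M. mycoides": {"ldh"},
    "M. leachii": {"ldh"},
    "M. penetrans": {"ldh", "aldehyde_dh", "alcohol_dh"},
}

# Enzymes each fermentation route requires beyond the conserved pyruvate steps.
ROUTES = {
    "lactic": {"ldh"},
    "alcoholic": {"aldehyde_dh", "alcohol_dh"},
}


def fermentation_routes(enzymes, routes=ROUTES):
    """Map each species to the routes whose required enzymes it fully carries."""
    return {sp: sorted(route for route, needed in routes.items() if needed <= present)
            for sp, present in enzymes.items()}


if __name__ == "__main__":
    for species, routes in fermentation_routes(ENZYMES).items():
        print(f"{species}: {', '.join(routes)}")
```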

Genes Outside of the Mycoplasma Clade
Sequence analysis identified 15 genes in M. fermentans M64 for which none of the significant BLASTP hits came from a Mycoplasma species, suggesting that these genes are not conserved in the Mycoplasma clade (see comments in Table S2). These genes might be vestiges of genome reduction from the last common Gram-positive ancestor, or they might have been acquired by horizontal transfer. They were dispersed across the chromosome. Of these, 7 genes were closest to homologs from Gram-positive bacteria and 8 to homologs from Gram-negative bacteria. All 7 closest Gram-positive bacteria belonged to the Firmicutes, 6 of them to the Clostridia. These 7 genes, with the highest homology to Firmicutes, were regarded as possibly retained after the onset of genome minimization. In contrast, the 8 genes closest to Gram-negative bacteria (4 to species of the Bacteroidetes/Chlorobi group, 3 to Cyanobacteria, and 1 to Spirochaetes) might have been transferred laterally to M. fermentans M64. No in-frame UGA codon, which encodes tryptophan in Mycoplasma, was detected in these 8 candidate horizontally transferred genes, suggesting that they may not have resided in the genome long enough to acquire the Mycoplasma codon bias. The 15 genes encode 5 conserved hypothetical proteins, 3 AAA+ superfamily ATPases, and 7 other proteins. During the evolution of M. fermentans M64, AAA+ superfamily ATPases might have helped compensate for genome minimization, because these proteins perform a wide variety of functions through molecular remodeling driven by ATP hydrolysis, including protein degradation and DNA recombination, replication, and repair [73]. Moreover, hsdS, one of the 15 genes, encodes a subunit of a type I restriction enzyme, which could protect the host against foreign DNA such as that of bacteriophages.
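The UGA argument above depends on scanning each candidate ORF codon by codon in its reading frame. A minimal sketch, assuming each sequence starts at its start codon and ends with its stop codon (the ORF names and sequences below are hypothetical toy inputs), counts internal in-frame TGA codons, which Mycoplasma reads as tryptophan:

```python
def inframe_tga_count(orf: str) -> int:
    """Count internal in-frame TGA (UGA) codons in an ORF.

    Assumes the ORF starts at its start codon; the final codon is
    skipped because it is assumed to be the stop codon.
    """
    seq = orf.upper().replace("U", "T")  # accept RNA or DNA alphabet
    n_codons = len(seq) // 3             # ignore any trailing partial codon
    codons = [seq[3 * i:3 * i + 3] for i in range(n_codons)]
    return sum(1 for codon in codons[:-1] if codon == "TGA")


# Hypothetical candidate ORFs (toy sequences, not from M. fermentans)
candidates = {
    "orf_a": "ATGTGAAAATAA",  # one internal in-frame TGA
    "orf_b": "ATGATGAAATAA",  # TGA appears only out of frame
}
no_uga = [name for name, seq in candidates.items() if inframe_tga_count(seq) == 0]
print(no_uga)  # ['orf_b']
```

Only in-frame occurrences count: orf_b contains the substring TGA, but it straddles two codons and so is not a UGA codon.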
For each of these 15 candidates, a protein phylogeny was reconstructed from the M. fermentans M64 protein and its twelve most similar homologs from different organisms in the BLASTP results; M. fermentans M64 and those twelve organisms were then analyzed by 23S rRNA phylogeny to clarify their evolutionary relationships. For ORF MfeM64YM_1027, M. fermentans M64 occupied a distinct branch with strong support in the 23S rRNA tree and was distant from Clostridium perfringens (Figure S8A). However, the MfeM64YM_1027 product was closest to the C. perfringens homolog, raising the possibility that this gene is a remnant retained after genome minimization (Figure S8B). A similar result was observed for another gene, MfeM64YM_0060, whose product was closest to the homolog of a Gram-negative bacterium (Figure S8C and D). Taken together, the genes retained after the onset of genome reduction and the candidate horizontally transferred genes may play critical roles in the versatility of a small genome and in shaping its limited metabolism.
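The reasoning in the preceding paragraph rests on comparing a gene's closest relative in its protein tree with the organism's closest relative in the 23S rRNA tree. A simplified sketch of that comparison, assuming pairwise distances from M. fermentans M64 have already been read off each tree (the taxon names and distances below are invented toy values, not data from this study), flags a gene whenever the two nearest neighbors differ. Real tree comparisons also weigh topology and branch support, which this sketch ignores.

```python
def nearest(dist_from_focal: dict[str, float]) -> str:
    """Return the taxon with the smallest distance to the focal organism."""
    return min(dist_from_focal, key=dist_from_focal.get)


def incongruent(protein_dist: dict[str, float], rrna_dist: dict[str, float]):
    """Compare nearest neighbors in the protein and 23S rRNA trees.

    Returns (differs, protein_nearest, rrna_nearest), using only taxa
    present in both distance tables.
    """
    shared = sorted(protein_dist.keys() & rrna_dist.keys())
    p = nearest({t: protein_dist[t] for t in shared})
    r = nearest({t: rrna_dist[t] for t in shared})
    return p != r, p, r


# Hypothetical distances from the focal organism (toy values)
protein_dist = {"taxon_A": 0.4, "taxon_B": 0.9, "taxon_C": 1.1}
rrna_dist = {"taxon_A": 0.8, "taxon_B": 0.5, "taxon_C": 0.3}
print(incongruent(protein_dist, rrna_dist))  # (True, 'taxon_A', 'taxon_C')
```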

Conclusion
The genome of M. fermentans M64 was successfully determined in our previous study [52], despite the large proportion of repetitive sequences that complicated assembly of the whole-genome shotgun sequences. The rarity of conserved transposable elements among the Mycoplasmas suggests that most elements were integrated into the genomes after speciation. Clustering analysis of Mycoplasma genes indicated conservation of degenerate metabolic pathways and cellular processes, suggesting that some of the remaining metabolic functions might be indispensable. Because complete amino acid, nucleotide, and vitamin and cofactor synthesis pathways are lacking, these compounds or their intermediate metabolites must be obtained from external sources and imported in forms such as peptides, amino acids, and nucleotides through poorly understood transport systems. The completed genomes of M. fermentans strains M64 and JER and of the other Mycoplasmas should facilitate elucidation of these systems and of the biology of this interesting group of microorganisms with reduced genomes.

Comparative Metabolomics
The metabolism of M. fermentans M64 and the other Mycoplasma species was compared by clustering analysis of homologous genes. The clustering analysis and the comparison of the presence and absence of genes in different functional groups were based on the KO (KEGG Orthology) assignments of the homologous proteins in the KEGG database. M. fermentans M64 protein sequences were submitted to the KEGG Automatic Annotation Server (KAAS) [74] to obtain KO numbers and pathway assignments, whereas the corresponding information for the other Mycoplasma species and Phytoplasma OY was extracted from the organism-specific files in the KEGG database.
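The comparison described above reduces each genome to a set of KO identifiers and then compares presence and absence across species. A minimal sketch, assuming the KO assignments are available as a two-column "query<TAB>KO" listing with unassigned queries on lines of their own (treat this format, the gene names, and the example KO sets as illustrative assumptions), builds a presence/absence matrix and a Jaccard distance that a clustering step could consume:

```python
def parse_ko_list(text: str) -> set[str]:
    """Collect KO identifiers from a 'query<TAB>KO' listing.

    Lines without a KO column (unassigned queries) are skipped.
    """
    kos = set()
    for line in text.splitlines():
        fields = line.strip().split("\t")
        if len(fields) >= 2 and fields[1].startswith("K"):
            kos.add(fields[1])
    return kos


def presence_matrix(species_kos: dict[str, set[str]]):
    """Return (sorted KO list, {species: 0/1 presence vector})."""
    all_kos = sorted(set().union(*species_kos.values()))
    rows = {sp: [int(ko in kos) for ko in all_kos] for sp, kos in species_kos.items()}
    return all_kos, rows


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |A & B| / |A | B|; defined as 0.0 for two empty sets."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


# Hypothetical inputs for illustration
m64 = parse_ko_list("gene1\tK00845\ngene2\ngene3\tK01810\n")
other = {"K01810", "K00873"}
kos, rows = presence_matrix({"M64": m64, "other": other})
print(kos)   # ['K00845', 'K00873', 'K01810']
print(rows)  # {'M64': [1, 0, 1], 'other': [0, 1, 1]}
print(jaccard_distance(m64, other))  # 1 - 1/3, about 0.667
```

Jaccard distance is one reasonable choice for binary gene-content profiles; the source does not state which distance or linkage its clustering used.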

Conservation of Metabolisms
A new method for evaluating the phylogenetic conservation of metabolism within a genus or group of organisms was developed to identify the most conserved metabolic functions among the Mycoplasma species. The analysis is based on the hypothesis that the probability that an enzymatic reaction is preserved in two closely related species differs from that in two distantly related species. The M. fermentans M64 metabolic networks, constructed from the KO numbers and KEGG pathway information, served as the reference networks. The conservation of enzymes, reactions, and metabolic pathways between M. fermentans M64 and the 19 Mycoplasma species and Phytoplasma OY was calculated with the following equations:

$$\mathrm{ECS} = E \times D$$

$$\mathrm{RCS} = \sum \mathrm{ECS} = \sum E \times D$$

where the enzyme conservation score (ECS) measures the conservation of an enzyme in a compared species; E is the presence (1) or absence (0) of the enzyme; and D is the phylogenetic distance between M. fermentans and the compared species, expressed as the substitution rate in the 23S rRNA phylogenetic tree of the 21 Mycoplasma species and one outgroup species. The phylogenetic tree was constructed with PHYML 3.0 [75] and evaluated with the aLRT method [76]. The phylogenetic distance serves as a weight for enzyme conservation: an enzyme present in two distantly related species is likely to be more highly conserved within the genus or group than one shared only by close relatives, and vice versa. Each reaction conservation score (RCS) represents the conservation of the enzyme orthologs that catalyze the same reaction across all compared species.
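The two scores defined above can be computed directly. A minimal sketch, assuming that for one reaction we already have, for each compared species, whether the catalyzing enzyme ortholog is present and the 23S rRNA distance D to M. fermentans (the distances below are hypothetical, chosen so the arithmetic is exact):

```python
def ecs(present: bool, distance: float) -> float:
    """Enzyme conservation score: ECS = E x D, with E = 1 if present else 0."""
    return (1 if present else 0) * distance


def rcs(observations) -> float:
    """Reaction conservation score: sum of ECS over the compared species.

    `observations` is an iterable of (present, distance) pairs, one per
    compared species, for the enzyme orthologs catalyzing one reaction.
    """
    return sum(ecs(present, distance) for present, distance in observations)


# Hypothetical (presence, distance) pairs for three compared species
observations = [
    (True, 0.25),   # ortholog present in a moderately distant species
    (True, 0.5),    # ortholog present in a distant species
    (False, 0.125), # ortholog absent: contributes nothing
]
print(rcs(observations))  # 0.75
```

Because D enters multiplicatively, an ortholog shared with a distant species raises the RCS more than one shared with a close relative, which is the weighting rationale given in the text.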

Essential Genes
The essential genes of M. pulmonis, M. arthritidis, M. genitalium, B. subtilis, and E. coli identified in previous studies [12,69,70,71,72], together with the remaining proteins predicted from their genomes, were submitted to KAAS to obtain KO numbers via sequence homology searches. The M. fermentans M64 essential genes were predicted by clustering them with the essential and putative non-essential proteins according to KO number, and by BLASTP [77] searches of each protein against a consolidated database of the proteomes of the six species analyzed.
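The KO-based part of the prediction above can be sketched as set lookups. This covers only the KO matching step, not the BLASTP step, and it assumes that a KO found among the reference essential genes takes precedence when the same KO is also non-essential elsewhere; that tie-break, the gene names, and the KO labels are hypothetical. The second function computes the kind of "fraction of reference essential genes preserved" figure reported for M. fermentans.

```python
def predict_essential(query_kos: dict[str, str],
                      essential_kos: set[str],
                      nonessential_kos: set[str]) -> dict[str, str]:
    """Label each query gene by the essentiality of its KO in reference species.

    Assumption: 'essential' wins if a KO appears in both reference sets.
    """
    calls = {}
    for gene, ko in query_kos.items():
        if ko in essential_kos:
            calls[gene] = "essential"
        elif ko in nonessential_kos:
            calls[gene] = "non-essential"
        else:
            calls[gene] = "unassigned"
    return calls


def fraction_preserved(reference_essential_kos: set[str],
                       query_kos: dict[str, str]) -> float:
    """Fraction of reference essential KOs that also occur in the query genome."""
    if not reference_essential_kos:
        return 0.0
    present = set(query_kos.values())
    return len(reference_essential_kos & present) / len(reference_essential_kos)


# Hypothetical gene-to-KO assignments
query = {"g1": "KA", "g2": "KB", "g3": "KZ"}
print(predict_essential(query, {"KA", "KC"}, {"KB"}))
print(fraction_preserved({"KA", "KC"}, query))  # 0.5
```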

Genes Outside of the Mycoplasma Clade
The M. fermentans M64 genes outside of the Mycoplasma clade were identified by examining all significant hits of BLASTP [77] searches against the proteomes of sequenced prokaryotic organisms retrieved from the UniProt database (http://www.uniprot.org/downloads). As described previously [15], a gene was considered to be outside of the Mycoplasma clade if none of its significant matches belonged to a Mycoplasma species. The closest organism for each such gene was then assigned as the top non-Mycoplasma hit with more than 70% sequence coverage in the alignment with the M. fermentans sequence.
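The two filtering rules above can be sketched as follows. The sketch assumes the significant hits for one query are already parsed, ordered best first, and carry the subject organism name, alignment start and end on the query, and query length; the Hit record, the genus test by name prefix, and the handling of queries with no hits are assumptions for illustration, not the original pipeline.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Hit:
    """One significant BLASTP hit (hypothetical record layout)."""
    organism: str  # subject organism name, e.g. "Mycoplasma sp. X"
    qstart: int    # alignment start on the query (1-based)
    qend: int      # alignment end on the query (1-based)
    qlen: int      # full query length


def query_coverage(hit: Hit) -> float:
    """Fraction of the query covered by the alignment."""
    return (abs(hit.qend - hit.qstart) + 1) / hit.qlen


def is_mycoplasma(hit: Hit) -> bool:
    return hit.organism.startswith("Mycoplasma ")


def classify(hits: List[Hit], min_cov: float = 0.70) -> Tuple[bool, Optional[str]]:
    """Return (outside_clade, closest_organism) for one query gene.

    `hits` are the significant hits, ordered best first.
    """
    if not hits:
        return False, None  # genes without hits are not classified (assumption)
    if any(is_mycoplasma(h) for h in hits):
        return False, None  # at least one Mycoplasma hit: inside the clade
    for h in hits:
        if query_coverage(h) > min_cov:
            return True, h.organism  # top non-Mycoplasma hit above coverage cutoff
    return True, None  # outside the clade, but no hit passes the coverage cutoff


# Hypothetical hit lists
print(classify([Hit("Mycoplasma sp. X", 1, 300, 400)]))  # (False, None)
print(classify([Hit("Organism A", 1, 200, 400), Hit("Organism B", 11, 400, 400)]))
```

In the second example, Organism A covers only 50% of the query, so Organism B (97.5% coverage) is assigned as the closest organism.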