Evolutionary Convergence and Nitrogen Metabolism in Blattabacterium strain Bge, Primary Endosymbiont of the Cockroach Blattella germanica

Bacterial endosymbionts of insects play a central role in upgrading the diet of their hosts. In certain cases, such as aphids and tsetse flies, endosymbionts complement the metabolic capacity of hosts living on nutrient-deficient diets, while the bacteria harbored by omnivorous carpenter ants are involved in nitrogen recycling. In this study, we describe the genome sequence and inferred metabolism of Blattabacterium strain Bge, the primary Flavobacteria endosymbiont of the omnivorous German cockroach Blattella germanica. Through comparative genomics with other insect endosymbionts and free-living Flavobacteria we reveal that Blattabacterium strain Bge shares the same distribution of functional gene categories only with Blochmannia strains, the primary Gamma-Proteobacteria endosymbiont of carpenter ants. This is a remarkable example of evolutionary convergence during the symbiotic process, involving very distant phylogenetic bacterial taxa within hosts feeding on similar diets. Despite this similarity, different nitrogen economy strategies have emerged in each case. Both bacterial endosymbionts code for urease but display different metabolic functions: Blochmannia strains produce ammonia from dietary urea and then use it as a source of nitrogen, whereas Blattabacterium strain Bge codes for the complete urea cycle that, in combination with urease, produces ammonia as an end product. Not only does the cockroach endosymbiont play an essential role in nutrient supply to the host, but also in the catabolic use of amino acids and nitrogen excretion, as strongly suggested by the stoichiometric analysis of the inferred metabolic network. Here, we explain the metabolic reasons underlying the enigmatic return of cockroaches to the ancestral ammonotelic state.


Introduction
In 1887, Blochmann first described symbiotic bacteria in the fatty tissue of blattids [1]. Later, Buchner [2] suggested that symbionts are involved in the decomposition of metabolic end products from the insect host. A classic example is the cockroach. Several pioneering studies correlated the presence of cockroach endosymbionts with the metabolism of sulfate and amino acids [3,4]. These endosymbionts were classified as the genus Blattabacterium [4], belonging to the class Flavobacteria in the phylum Bacteroidetes [5], and they live in specialized cells in the host's abdominal fat body. Apart from cockroaches, they have only been found in the primitive termite Mastotermes darwiniensis [6]. Phylogenetic analyses of the Blattabacterium-cockroach symbiosis supported the hypothesis of co-evolution between symbionts and hosts dating back more than 140 million years [7,8]. Recently, the genome sizes of the Blattabacterium symbionts of three cockroach species, B. germanica, Periplaneta americana, and Blatta orientalis, were determined by pulsed field gel electrophoresis as approximately 650 ± 15 kb [9]. The same authors also demonstrated the sole presence of Blattabacterium strains in the fat body of those cockroach species by rRNA-targeting techniques. Phylogenetic analyses based on 16S rDNA also confirmed the affiliation of these endosymbionts to the class Flavobacteria [9]. Therefore, they are phylogenetically quite distinct from the majority of intensively studied insect endosymbionts, which belong to the phylum Proteobacteria, mainly the class Gamma-Proteobacteria. Recently, the highly reduced genome of ''Candidatus Sulcia muelleri'' (hereafter S. muelleri), an insect endosymbiont belonging to the class Flavobacteria, has also been completely sequenced [10].
Primary endosymbionts such as Buchnera aphidicola or Wigglesworthia glossinidia complement the metabolic capacity of aphids or tsetse flies, respectively, which feed on different nutrient-deficient diets [11]. There are also examples of metabolic complementation between two co-primary endosymbionts and their hosts. This is the case of S. muelleri, living in the sharpshooter Homalodisca vitripennis, which coexists with a Gamma-Proteobacteria endosymbiont, ''Candidatus Baumannia cicadellinicola'' (hereafter B. cicadellinicola). The two have developed a metabolic complementation to supply the host with the nutrients lacking in the limited xylem diet [12]. Another example is the case of B. aphidicola and ''Candidatus Serratia symbiotica'', co-primary endosymbionts of the cedar aphid Cinara cedri that complement each other in the provision of essential nutrients [13,14].
Omnivorous insects also harbor endosymbionts. This is the case, for example, of ants of the genus Camponotus and their primary endosymbionts, the Gamma-Proteobacteria ''Candidatus Blochmannia floridanus'' [15] and ''Candidatus Blochmannia pennsylvanicus'' [16] (hereafter B. floridanus and B. pennsylvanicus, respectively). In this association, endosymbionts play an important role in nitrogen recycling [17].
Evolutionary convergences are generally considered as evidence of evolutionary adaptation. The study of endosymbiont evolution could provide examples of evolutionary convergences if we were able to show that very distant phylogenetic groups present similar functional repertoires and metabolic capabilities when they have evolved endosymbiosis in organisms having similar feeding behaviors. This may be the case of Blochmannia (a gammaproteobacterium) and Blattabacterium (a flavobacterium) that have independently evolved in carpenter ants and cockroaches, two omnivorous insects.
In this study, we determine the genome sequence of an endosymbiotic flavobacterium, Blattabacterium strain Bge, primary endosymbiont of the German cockroach B. germanica. We have also inferred its metabolism to try to understand why cockroaches excrete ammonia instead of being uricotelic like other terrestrial invertebrates, thus breaking the so-called ''Needham's rule'' [18], a question that has puzzled physiologists for a long time. Finally, we compare the inferred metabolism with that of B. floridanus, the primary endosymbiont involved in nitrogen recycling in the carpenter ant Camponotus floridanus, an insect that also has a complex diet.

Author Summary
Bacterial endosymbionts from insects are subjected to a process of genome reduction from the moment they interact with their host, especially when the symbiosis is strict (the partners live together permanently) and the endosymbiont is maternally inherited. The type of genes that are retained correlates with specific metabolic host requirements. Here, we report the genome sequence of Blattabacterium strain Bge, the primary endosymbiont of the German cockroach B. germanica. Cockroaches are omnivorous insects and Blattabacterium cooperates with their metabolism, not only with essential nutrient metabolism but also through an efficient use of amino acids and the nitrogen excretion by the combination of a urea cycle and urease activity. The repertoires of functions that are maintained in Blattabacterium are similar to those already observed in Blochmannia spp., the primary endosymbiont of carpenter ants, also an omnivorous insect. This constitutes a nice example of evolutionary convergence of two endosymbionts belonging to very different bacterial phyla that have evolved a similar repertoire of functions according to the host. However, the current set of genes and, more importantly, those that were lost in the process of genome reduction in both endosymbiont lineages have also contributed to a different involvement of Blattabacterium and Blochmannia in nitrogen metabolism.
chromosome is 637 kb, and the G+C content is 27.1%. Only 23.4 kb are non-coding, distributed in 480 intergenic regions with an average length of 49 bp. The overall coding density (96.3%) is the highest among insect endosymbionts known to date, indicating a highly compact genome. Surprisingly, it is even higher than that of the most reduced insect endosymbiont, ''Candidatus Carsonella ruddii'' (93.4%) [19]. In addition, 1.5 kb correspond to 139 overlapping regions with an average length of 11 bp. Of these overlaps, 94 (67.6%) are between genes on the same strand and are 1 to 70 bp long. The other 45 cases (32.4%) involve two genes on opposite strands and are between 2 and 50 bp long. In only one of these cases do the two genes overlap at their start regions, whereas in the rest the overlap is in the terminal region of the genes. By contrast, in ''Ca. Carsonella ruddii'' 92% of the 126 overlaps are in tandem orientation, and thus on the same strand, and only five cases are between opposite strands, involving the termini and starts of the overlapping genes.
Assembly of the pyrosequencing data gave highly reliable contigs that, combined with the data from Sanger sequencing, resulted in a single contig representing the entire genome. Probably due to the formation of a secondary structure, a single 33 bp stretch in an intergenic region upstream of the GroEL gene was not covered by pyrosequencing data but only by Sanger reads. Furthermore, annotation of the ORFs allowed a clear assignment of protein functions even in cases with only weak similarities to existing database entries. Not a single case of a possible host gene incorporated in the symbiont genome was found. Nor did we find coding sequences affiliated with Blattabacterium strain Bge outside the genome that could have been assigned to the host genome.
A total of 627 putative genes have been assigned (Figure S1), 586 of which are protein-coding genes (CDS) and 40 are RNA-specifying genes (34 tRNAs, 3 rRNAs located in a single operon, one tmRNA, and the RNA components of RNase P and the signal recognition particle). The only pseudogene found corresponds to the protein component of RNase P. This gene, coding for 118 amino acids, is disrupted by an in-frame stop codon at amino acid position 53. The RNase P proteins of the free-living Flavobacterium psychrophilum [20], Flavobacterium johnsoniae (http://genome.jgi-psf.org/flajo/flajo.info.html) and Gramella forsetii [21] contain a lysine residue at that position. Therefore, it is possible that the stop codon was generated by an A-to-T point mutation at position 157 of the nucleotide sequence. Despite this mutation, RNase P could be functional, since it has been described that in vitro the RNA component can act enzymatically without a functional protein component [22]. Regarding the coding genes, it is interesting that, despite the compactness of the genome, there are eight gene duplicates: miaB, rodA, serC, lpdA, ppiC, argD, hemD, and uvrD.
No specific sequence of the origin of replication (oriC), such as dnaA boxes, was found in the genome [23]. Likewise, dnaA, which codes for the protein that initiates replication by binding to such sequences, was also absent. Thus, the putative origin of replication was determined by GC skew analysis. The transitional region where the GC skew changes from negative to positive (Figure S2) places the replication origin in the gene dapB. It is worth mentioning that neither dnaA nor any of the genes normally adjacent to the replication site in bacteria (dnaN, hemE, gidA, and parA) have been found in this genome. However, Blattabacterium strain Bge has retained recA, which could trigger replication by an alternative mechanism [15,23].

Functional analysis of the predicted protein-coding genes
We have inferred the metabolism of Blattabacterium strain Bge from its complete genome (Figure 1). Blattabacterium strain Bge possesses a limited capacity for nutrient uptake, with only one ABC-type transport system, which may be specialized in fructose transport because this bacterium, contrary to the other sequenced endosymbionts, seems unable to use glucose as a nutrient. On the other hand, Blattabacterium strain Bge also codes for a glycerol uptake facilitator that enables transport of solutes such as O2, CO2, NH3, glycerol, urea, and water. Therefore, it is possible that Blattabacterium strain Bge obtains carbon from glycerol as a supplementary source.
A sodium/drug antiporter, NorM, is also encoded by this genome. This drug efflux system is common among enterobacteria but not among flavobacteria. In this group it is only known for the free-living bacteria F. psychrophilum and G. forsetii. This system can act as a multidrug transporter and can also transport oligosaccharidyl lipids and polysaccharide compounds.
There is an array of metal ion homeostasis transporters. In Blattabacterium strain Bge, there is a Trk transport system, a uniporter of the monovalent potassium cation, which requires a proton motive force and ATP in order to function. Only W. glossinidia has a similar transport system, although the encoded subunits differ: trkA and trkB in Blattabacterium; trkA and trkH in W. glossinidia. Other solutes are transported by symport systems. Blattabacterium strain Bge is able to take up glutamate and aspartate via a proton symporter. Both metabolites play an important role in the metabolism of this bacterium (see below). A phosphate/sodium symporter is also present.
Regarding electron transport, the encoded NADH-dehydrogenase (ndh) oxidizes NADH without proton translocation. There is also a succinate dehydrogenase (sdhABD). Electrons are transferred to a membrane-bound menaquinone (MQ) and a molybdenum-oxidoreductase, which accepts electrons from the MQ. With these elements, a proton motive force can be generated.
Blattabacterium strain Bge seems to be able to reduce intracellular sulfate to sulfite. A number of genes required for sulfur assimilation are present in the genome, including those encoding the two subunits of sulfate adenylyltransferase (cysN and cysD), the adenosine phosphosulfate (APS) reductase (cysH), and the sulfite reductase proteins (cysI and cysJ). The step converting adenosine-5′-phosphosulfate (APS) into 3′-phosphoadenosine-5′-phosphosulfate (PAPS) is missing. The sulfite generated is further reduced to sulfide and assimilated into the sulfur-containing amino acids L-cysteine and L-methionine.
Blattabacterium strain Bge is able to synthesize its own cell wall and plasma membrane. However, it has lost the entire pathway required for lipopolysaccharide (LPS) biosynthesis, like all sequenced Buchnera strains and B. cicadellinicola. This property explains why Blattabacterium strain Bge, like these bacteria, is surrounded by a host vacuolar membrane, as shown in the electron-microscopy images (Figure S3).
Regarding amino acid biosynthesis, Blattabacterium strain Bge has the genes encoding biosynthetic enzymes needed to synthesize 10 essential (His, Trp, Phe, Leu, Ile, Val, Lys, Thr, Arg, and Met) and 7 nonessential (Gly, Tyr, Cys, Ser, Glu, Asp, and Ala) amino acids. Thus, the endosymbiont metabolism relies on Pro, Gln and Asn supplied by the host. Also present is the complete machinery to synthesize nucleotides, fatty acids, and the cofactors folic acid, lipoic acid, FAD, NAD, pyridoxine, and riboflavin. Finally, genes encoding enzymes for the synthesis of siroheme and menaquinone were also identified.
With respect to the metabolism of carbohydrates, genome analysis of Blattabacterium strain Bge indicates the presence of a truncated glycolysis pathway, since the genes that encode phosphofructokinase (pfkA) and pyruvate kinase (pyk) are missing, as is any sugar-phosphorylating system except that for fructose. Therefore, the pathway begins with fructose-1-phosphate and continues with the canonical enzymatic steps until the synthesis of phosphoenolpyruvate (PEP). Given the lack of pyruvate kinase genes, Blattabacterium strain Bge must produce pyruvate via the malic enzyme (NADP+-dependent malate dehydrogenase). Additionally, a complete non-oxidative pentose phosphate pathway is encoded in Blattabacterium strain Bge. As is the case with Wigglesworthia, the glycolytic enzymes seem to be involved in gluconeogenesis rather than glycolysis, complementing the non-oxidative pentose phosphate pathway [24].
In summary, although the Blattabacterium strain Bge genome shows a strong reduction in gene number in all the functional categories compared to its free-living relatives (see below), the core of essential functions and pathways is particularly well preserved.

Comparative analysis and functional convergence
The protein genes of Blattabacterium strain Bge were classified according to COG categories (Figure 2, Table 2). This distribution was compared with those of twelve selected bacteria: four Flavobacteria, which included three free-living species (F. psychrophilum, F. johnsoniae and G. forsetii) and the endosymbiont S. muelleri, and eight Proteobacteria endosymbionts, seven Gamma-Proteobacteria (B. floridanus, B. pennsylvanicus, B. cicadellinicola, B. aphidicola Aps, B. aphidicola Cce, S. glossinidius, and W. glossinidia) and one Alpha-Proteobacterium (Wolbachia sp. from Drosophila simulans). Taking the observed distribution of COG categories for Blattabacterium strain Bge as the expected distribution followed by each of the other bacteria examined, the hypothesis of equal distribution was rejected in all but the carpenter ant endosymbionts, the Gamma-Proteobacteria B. floridanus and B. pennsylvanicus (Table 2). These results suggest that it is the hosts' diet (cockroaches and carpenter ants are both omnivores) rather than phylogenetic closeness that is more strongly linked with the type of genes retained. This appears to be a clear case of functional evolutionary convergence in a broad sense. The proximity between the endosymbionts from omnivorous hosts was also confirmed when a dendrogram was created using the matrix of Kulczynski phenetic distances (Figure 3A). To locate the phylogenetic position of Blattabacterium strain Bge and compare it with the COG-based functional analysis, we used a phylogenetic tree based on 16S rDNA gene sequences (Figure 3B). As expected, the 16S rDNA gene analysis clearly separates the Bacteroidetes and Proteobacteria phyla. Blattabacterium strain Bge clusters monophyletically within the Bacteroidetes phylum. The functional clustering differs clearly from the phylogenetic one.

Nitrogen economy of Blattabacterium strain Bge
A striking trait of this genome is the presence of a complete urea cycle (Figure 4). This feature has been described in few bacteria, and in only one member of the Bacteroidetes phylum, the cellulolytic soil bacterium Cytophaga hutchinsonii [25]. Moreover, to date, there are no reports of a complete urea cycle in an endosymbiont. The Blattabacterium strain Bge genome also retains the genes for the catalytic core of urease and we have detected urease activity in endosymbiont-enriched extracts of cockroach fat body (see below).
The genome of Blattabacterium strain Bge has two urease genes, ureAB and ureC, coding for the catalytic subunits, but lacks all genes for the accessory proteins supposedly required to produce an active enzyme in most bacteria. The ureAB fusion is not a novel situation, since fused urease genes have also been described in other bacterial genomes, as is the case of the free-living bacterium C. hutchinsonii [25]. Regarding the lack of accessory genes, a similar situation is found in Bacillus subtilis cells expressing urease activity, which are able to grow with urea as the sole nitrogen source [26]. To corroborate the presence of an active urease in Blattabacterium strain Bge, we performed an enzymatic assay on crude extracts of the endosymbiont-enriched fraction of the B. germanica fat body. Figure S4 shows a representative result for the urease assay. Although the specific activity detected under our experimental conditions was low (2 mU mg⁻¹ protein; 1 U of urease corresponds to the formation of 1 µmol of ammonia per min), it was reproducible. Urease activity was also reproducibly detected in endosymbiont extracts from P. americana fat body (data not shown).
To further study the inferred metabolism in relation to nitrogen economy, we carried out a stoichiometric analysis of the reactions involved in the Krebs and urea cycles as well as other directly related reactions, such as urease, the malic enzyme, and their links to amino acid utilization (Figure 1 and Figure 4). Our results strongly suggest a key involvement of the endosymbionts in nitrogen metabolism and excretion in the German cockroach, in addition to their role in providing essential amino acids and coenzymes to the host. It is also worth mentioning that the endosymbiont metabolism relies on a supply of Gln from the host to cater for all its biosynthetic needs, including the urea cycle. Stoichiometric analysis shows that eleven out of fourteen elementary modes produce ammonia (Table S1). It follows that the metabolic network of Blattabacterium strain Bge could potentially use amino acids efficiently as energy and reducing-power sources, generating nitrogen waste in the form of ammonia (Figure 4).

Comparison of nitrogen economy in endosymbionts of omnivorous hosts
Urease genes are also present in the Blochmannia endosymbiont genome [15], but the biochemical function of urease in the carpenter ant endosymbionts is completely different from that in Blattabacterium. Studies of gene expression [27] and feeding experiments with 15N-labelled urea [17] in carpenter ants corroborate the role of urease in the transfer of nitrogen from dietary urea into the hemolymph amino acid pool. This requires an endosymbiont glutamine synthetase to act as an essential step in nitrogen conservation during amino acid anabolism. Thus, although carpenter ants are omnivorous, their bacterial endosymbionts may upgrade their diet via an efficient nitrogen economy [17]. German cockroaches are also omnivorous; however, their endosymbionts lack genes encoding a glutamine synthetase-like activity, so the ammonia generated cannot be reassimilated. This is a clear indication that the metabolic function of urease is not the same in the German cockroach and carpenter ant endosymbionts. Therefore, although we have revealed a functional convergence between the cockroach and carpenter ant endosymbionts, which is probably due to their hosts' omnivorous diets, they differ greatly in metabolic detail, particularly in terms of nitrogen metabolism.
Traditionally, Blattabacterium endosymbionts have been postulated to be involved in the metabolism of uric acid in cockroaches. For instance, uric acid accumulation has been observed in aposymbiotic cockroaches [28,29]. Metabolic use of nitrogen derived from fat body urates has been observed in B. germanica under certain conditions (e.g., in females on a low-protein diet [30] and on consumption of empty spermatophores by starved females [31]). Interestingly, fat body endosymbionts have been implicated in uric acid degradation to CO2 in experiments with the wood cockroach Parcoblatta fulvescens injected with 14C-hypoxanthine [32]. Although involvement of gut microbiota cannot be completely ruled out, endosymbiont metabolism seemed more likely [33]. However, our results show that the endosymbiont genome does not code for any activity related to either the synthesis or the catabolism of urates. Therefore, and contrary to early reports based on putative cultured endosymbiotic bacteria [29], Blattabacterium strain Bge cannot participate directly in the metabolism of this nitrogen compound. Since uricase activity has been detected in the fat body of the cockroach [28,34,35], the host could contribute uric acid-derived metabolites to the nitrogen economy of the endosymbiont which, in turn, would produce ammonia and carbon dioxide as final catabolic products.
[Displaced figure caption fragment: "... Table 2). In all cases except one, the null hypothesis that the corresponding cluster was obtained by chance was rejected (bootstrap values were equal to or higher than 90%)."]

The question of ammonotelism
The genome sequencing, metabolic inference, detection of a urease in the endosymbiont and the stoichiometric analysis of the central pathways of Blattabacterium strain Bge shed light on a whole series of hitherto unexplained classical physiological studies on ammonotelism in cockroaches [33,36,37]. It has been speculated that some terrestrial invertebrates, like gastropods, annelids [36] and isopods [38], exploit ammonia excretion as ''a return to the cheapest way'' [38] to eliminate nitrogen. The case of the German cockroach and its bacterial endosymbionts indicates that this explanation might not apply here. The evolution of terrestrial metazoa has favored the emergence of uricotely (e.g. the majority of insects) and ureotely (e.g. mammals) as water-saving strategies. Meanwhile, ammonotely, the ancestral character present in aquatic animals, has classically been considered maladaptive for terrestrial animals [18]. Symbiosis seems to play a role in this ''return'' of cockroaches to ammonotely by providing the enzymes required for this new nitrogen metabolism. Thus, the metabolic capabilities acquired by symbiogenesis [39] allow the exploration of new ecological niches and dietary regimes.

Materials and Methods
Blattabacterium strain Bge genomic DNA preparation
B. germanica (Blattaria: Blattellidae) was reared in the Entomology laboratory (Cavanilles Institute for Biodiversity and Evolutionary Biology, University of Valencia). The cockroaches were kept in the laboratory at 25°C and fed with a mixture of dog food (2/3) and sucrose (1/3).
The bacterial endosymbionts were extracted from the fat body of B. germanica females. To do so, cockroaches were killed by a 15 to 20 min treatment with ethyl acetate and the bacterial cells were separated from the fat body as in [15]. An enriched fraction of bacteriocytes was thus obtained and used to extract total DNA following a CTAB (cetyltrimethylammonium bromide) method.

Sequencing of Blattabacterium strain Bge genome
The complete genome sequence of Blattabacterium strain Bge was obtained by a hybrid sequencing approach based on ABI 3730 sequencers and the pyrosequencing system (454; Life Science). To construct shotgun libraries, DNA fragments were generated by random mechanical shearing with a sonicator and subsequently separated by pulsed field gel electrophoresis. Inserts of 1-2 kb and 3-5 kb were purified and cloned into the vector from the XL-TOPO PCR cloning kit. Plasmid DNA was extracted using 96-well plates (Millipore) with the PerkinElmer MULTIPROBE II robot according to the manufacturers' instructions. DNA sequencing was performed on an ABI PRISM 3730 Genetic Analyzer (Applied Biosystems). In the initial random sequencing phase, 9,227 sequences were obtained with 1.5-fold sequence coverage. Given the lack of joining between sequences, which may have been due to a large number of sequences from the host, a strict sequence analysis was performed with a specific bioinformatic tool called Categorizer. It classifies sequences by their n-mer composition to distinguish between Blattabacterium strain Bge and contaminating host sequences. This classifier was trained with sets of sequences identified as belonging to Blattabacterium strain Bge and to the host. With these sets, we constructed a feature vector or model representing the 4- to 7-mer usage pattern of each organism. The n-mer composition of each read was then compared with these models using a k-nearest neighbor (KNN) algorithm.
Although the number of retrieved host sequence reads was higher than that of Blattabacterium strain Bge sequences for both sequencing approaches, the pyrosequencing approach generated enough sequences to close the gaps identified with the first method. The Gap4 tool from the Staden Package [40] was used for the total assembly.
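The n-mer classification step described above can be illustrated with a minimal sketch. This is not the Categorizer tool itself, whose implementation is not described here; the function names, the Euclidean distance, and the majority vote are assumptions made for illustration only.

```python
from collections import Counter
import math

VALID_BASES = set("ACGT")


def kmer_profile(seq, k_values=(4, 5, 6, 7)):
    """Return the normalized k-mer usage of a sequence for each k in k_values."""
    seq = seq.upper()
    profile = Counter()
    for k in k_values:
        n_windows = len(seq) - k + 1
        if n_windows <= 0:
            continue
        for i in range(n_windows):
            kmer = seq[i:i + k]
            # Skip windows containing ambiguous bases such as N.
            if set(kmer) <= VALID_BASES:
                profile[kmer] += 1.0 / n_windows
    return profile


def profile_distance(p, q):
    """Euclidean distance between two sparse k-mer profiles (an assumed metric)."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in keys))


def knn_classify(read, training, k_neighbors=3, k_values=(4, 5, 6, 7)):
    """Label a read by majority vote among its nearest training profiles.

    training is a list of (label, profile) pairs built with kmer_profile.
    """
    profile = kmer_profile(read, k_values)
    ranked = sorted(training, key=lambda item: profile_distance(profile, item[1]))
    votes = Counter(label for label, _ in ranked[:k_neighbors])
    return votes.most_common(1)[0][0]
```

In practice the training list would hold profiles of reads already assigned to the endosymbiont or to the host, and each unassigned read would be passed to knn_classify.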

Electron microscopy of Blattabacterium strain Bge
Fat body of B. germanica was isolated and prefixed in a 2.5% paraglutaraldehyde fixative mixture buffered with 0.1 M phosphate at pH 7.2 (PB). Prefixation was performed at 4°C for 24 h and the tissue was then rinsed several times in PB. To avoid the loss of this dispersed tissue, the fat body was embedded in agar (2%), forming small blocks. After prefixation, these blocks were fixed in 2% osmium tetroxide for one hour, dehydrated in graded alcohol and propylene oxide, stained in a saturated uranyl acetate solution (2%) and embedded in araldite to form the definitive blocks. Thin sections (0.05 µm) were made using the Reichert-Jung ULTRA-CUT E (Leica) ultramicrotome and were then stained with uranyl acetate and lead citrate. A JEOL-JEM 1010 electron microscope was used for the analysis.

ORF prediction and gene annotation
The putative coding regions (CDSs) in the Blattabacterium strain Bge genome were identified with the GLIMMER3 program [41]. This program was first trained with sequences of closely related organisms from the Flavobacteria group. The coding sequence model obtained was then used by GLIMMER3 to scan the genome and predict potential coding regions by considering the putative existence of initiation codons and ORF length. Start and stop codons of each putative CDS were curated manually through visual inspection of the Blattabacterium strain Bge Genome Browser, a database specially designed for this symbiont. The putative proteins were initially analyzed by reciprocal best hits to determine orthology between genes of Blattabacterium and those of bacteria belonging to the Flavobacteria group. According to this criterion, two genes are orthologs when each is the best hit of the other in the opposite genome. Sequences that could not be assigned to any function in comparison with flavobacterial genomes were identified by searching a non-redundant protein database using BLASTX [42]. Final annotation was performed using BLASTP comparison with proteins in the NCBI database and Pfam domains identified using the Sanger Centre Pfam search website. Non-coding RNAs were identified by different approaches. The tRNAscan program was used to predict tRNAs, whereas other small RNAs, like tmRNA, the RNA component of RNase P and the signal recognition particle RNA, were identified by programs like ARAGORN, BRUCE and SRPscan, as well as by consulting the Rfam database [43,44,45].
In the absence of a diagnostic cluster of DnaA boxes, the origin of replication was identified by GC-skew, calculated as (C−G)/(C+G), using the program OriginX [46]. The origin is located in the transitional region where the GC-skew changes from negative to positive values.
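The reciprocal-best-hit criterion used above for orthology can be sketched as follows. The input format (a dictionary of pairwise similarity scores, for example BLAST bit scores) and the gene identifiers in the usage note are hypothetical and serve only to illustrate the logic.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    scores maps (query, subject) pairs to a similarity score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)
```

For instance, if a hypothetical gene bge_1 has fp_1 as its best hit and fp_1 in turn has bge_1 as its best hit, the pair is reported; a gene whose best hit points back to a different gene is not.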
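As a rough illustration of the skew calculation, the sketch below computes (C−G)/(C+G) in non-overlapping windows and reports where the sign changes from negative to positive. It is not the OriginX program; the window size and the linear (non-circular) treatment of the sequence are simplifying assumptions.

```python
def gc_skew(seq, window=1000, step=1000):
    """Return (window_start, skew) pairs, with skew = (C - G) / (C + G)."""
    values = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window].upper()
        c = chunk.count("C")
        g = chunk.count("G")
        skew = (c - g) / (c + g) if (c + g) else 0.0
        values.append((start, skew))
    return values


def negative_to_positive_transitions(skews):
    """Return the start of each window where the skew turns from negative to positive."""
    return [
        cur_start
        for (_, prev), (cur_start, cur) in zip(skews, skews[1:])
        if prev < 0 < cur
    ]
```

On a real circular chromosome the last window should also be compared with the first, which this sketch omits for brevity.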

Inferred metabolism of Blattabacterium strain Bge
The ORFs orthologous to known genes in other species were catalogued based on non-redundant classification schemes, such as COG (Clusters of Orthologous Groups of Proteins). A metabolic network was reconstructed using the automatic annotation server KAAS-KEGG [47]. Based on our genome annotation, each pathway was examined against the BRENDA [48] and EcoCyc [49] databases.

COG categories: statistical tests
The COG distribution of each species was compared with that of Blattabacterium strain Bge using chi-square tests. To avoid the problem of multiple testing, we applied the Bonferroni correction so that for each individual test the significance level was 0.05/12 = 0.0042. That is, if the p-value is lower than 0.0042, the hypothesis is rejected. The first p-value corresponds to the standard chi-square test (Chi2 p-value, df = 19). Due to the asymptotic nature of this test, expected frequencies should be higher than 5. However, some frequencies might be expected to have low values. To correct for this, we also performed a Monte Carlo version of this test (MC p-value). We performed 19,999 simulations under the null hypothesis, which together with the observed Chi2 statistic constituted a set of 20,000 values. The MC p-value therefore cannot be lower than 1/20,000 = 5.00E-5.
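A minimal sketch of the Monte Carlo version of the test is given below, under the assumption that each simulation draws a multinomial sample of the same size as the observed one from the reference proportions. The function names are illustrative, and every reference category is assumed to have a non-zero count.

```python
import random
from collections import Counter


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / n_tests


def chi2_statistic(observed, expected_props):
    """Pearson chi-square statistic against expected proportions."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, expected_props))


def monte_carlo_p_value(observed, reference_counts, n_sim=19999, seed=0):
    """Return (statistic, MC p-value) using the reference as expected distribution."""
    total = sum(reference_counts)
    props = [count / total for count in reference_counts]
    n = sum(observed)
    stat = chi2_statistic(observed, props)
    rng = random.Random(seed)
    categories = range(len(props))
    hits = 0
    for _ in range(n_sim):
        # Draw one multinomial sample of size n under the null hypothesis.
        draws = Counter(rng.choices(categories, weights=props, k=n))
        simulated = [draws.get(i, 0) for i in categories]
        if chi2_statistic(simulated, props) >= stat:
            hits += 1
    # The observed statistic counts as one of the n_sim + 1 values.
    return stat, (hits + 1) / (n_sim + 1)
```

With n_sim = 19999 the smallest attainable p-value is 1/20,000, matching the bound stated above, and bonferroni_threshold(0.05, 12) gives the per-test level of about 0.0042.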

Kulczynski distance matrix and dendrogram
The Kulczynski distance between species 1 and 2 is given by $d_{12} = 1 - 0.5\left(\frac{\sum_j \min(y_{1j}, y_{2j})}{\sum_j y_{1j}} + \frac{\sum_j \min(y_{1j}, y_{2j})}{\sum_j y_{2j}}\right)$, where $j$ (from 1 to 20) indexes the normalized COG categories (values from 0 to 1). The dendrogram was derived from the corresponding distance matrix by applying a complete-linkage clustering method, in which the distance between clusters A and B is given by the highest distance between any two species belonging to A and B, respectively. The statistical significance of the clusters of the dendrogram was evaluated by bootstrap analysis based on 100,000 replicates.
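The distance and the complete-linkage rule described above can be written compactly as follows. This is an illustrative sketch, not the code used for the analysis; the profile dictionary and species keys in the usage note are hypothetical.

```python
def kulczynski_distance(y1, y2):
    """Kulczynski distance: 1 - 0.5 * (shared / sum(y1) + shared / sum(y2))."""
    shared = sum(min(a, b) for a, b in zip(y1, y2))
    return 1.0 - 0.5 * (shared / sum(y1) + shared / sum(y2))


def complete_linkage_distance(cluster_a, cluster_b, profiles):
    """Largest pairwise distance between members of two clusters.

    profiles maps a species name to its vector of COG category values.
    """
    return max(
        kulczynski_distance(profiles[a], profiles[b])
        for a in cluster_a
        for b in cluster_b
    )
```

Given a dictionary such as profiles = {"sp1": [...], "sp2": [...], "sp3": [...]}, calling complete_linkage_distance(["sp1"], ["sp2", "sp3"], profiles) returns the value that an agglomerative procedure would compare when deciding which clusters to merge next.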

Phylogenetic analyses
The 16S rDNA sequences were aligned with the MAFFT program (v6.240) [50]. The positions used for the phylogenetic analysis were selected with Gblocks v0.91b [51]. In total, 1530 nucleotides were selected. The phylogenetic reconstruction was carried out by maximum likelihood using the PHYML program [52]. The best evolutionary model chosen by MODELTEST [53] was GTR + Gamma (Γ) + I (proportion of invariant sites). Bootstrap values were based on 1000 replicates.

Urease assay
Abdominal fat bodies from dissected B. germanica adult females were homogenized with a Dounce homogenizer in 50 mM HEPES buffer containing 1 mM EDTA, pH 7.5. The crude extract was centrifuged for 25 min at 6000 rpm at 4°C, and the pellet was resuspended in the homogenization buffer. The supernatant and a crude extract of cockroach heads (host tissue without endosymbionts) were used in control experiments. The resuspended pellet, or bacteria-enriched fraction, was treated with lysozyme (3.5 U mL⁻¹) for 30 min at 4°C and sonicated for 5 sec. Urease activity was determined by incubating the extract at 37°C with 110 mM urea. At different time intervals the reaction was stopped by adding 1 vol. of 10% trichloroacetic acid, and the ammonia produced was measured by the colorimetric Berthelot method [54] as described in [55]. The protein content was measured with a NanoDrop ND-1000 instrument.
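For readers who wish to convert such a time course into a specific activity, a minimal sketch follows. It assumes the conventional definition of 1 U as 1 µmol of ammonia formed per minute, a linear time course, and a least-squares slope; the function names and any numbers passed to them are hypothetical and are not data from this study.

```python
def least_squares_slope(times_min, ammonia_umol):
    """Slope of ammonia (umol) versus time (min) by ordinary least squares."""
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_a = sum(ammonia_umol) / n
    numerator = sum((t - mean_t) * (a - mean_a) for t, a in zip(times_min, ammonia_umol))
    denominator = sum((t - mean_t) ** 2 for t in times_min)
    return numerator / denominator


def specific_activity_mu_per_mg(times_min, ammonia_umol, protein_mg):
    """Specific activity in mU per mg protein (1 U = 1 umol ammonia per min)."""
    rate_u = least_squares_slope(times_min, ammonia_umol)  # umol per min = U
    return rate_u * 1000.0 / protein_mg
```

A hypothetical extract containing 1 mg of protein that forms 0.02 µmol of ammonia every 10 min would give a value close to 2 mU per mg.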

Stoichiometric analysis
Stoichiometric analysis (using METATOOL) [56] was performed on the central pathways directly involved in amino acid catabolism, including the Krebs and urea cycles. Information about the reversibility of reactions was checked in the BRENDA database [48]. The input file for METATOOL is available upon request from the corresponding author.
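The idea behind such an analysis, that a flux distribution must balance every internal metabolite at steady state, can be illustrated with a toy two-reaction fragment (arginase followed by urease). This fragment was chosen only for illustration; it is not the network analyzed with METATOOL, and the function names are hypothetical.

```python
from collections import defaultdict

# Toy stoichiometry: negative coefficients are consumed, positive are produced.
REACTIONS = {
    "arginase": {"arginine": -1, "H2O": -1, "ornithine": 1, "urea": 1},
    "urease": {"urea": -1, "H2O": -1, "NH3": 2, "CO2": 1},
}


def net_reaction(fluxes, reactions=REACTIONS):
    """Sum the stoichiometry of all reactions weighted by their fluxes."""
    net = defaultdict(float)
    for name, flux in fluxes.items():
        for metabolite, coefficient in reactions[name].items():
            net[metabolite] += flux * coefficient
    # Drop metabolites whose production and consumption cancel out.
    return {m: v for m, v in net.items() if abs(v) > 1e-12}


def is_steady_state(fluxes, internal, reactions=REACTIONS):
    """True if no internal metabolite accumulates or is depleted."""
    net = net_reaction(fluxes, reactions)
    return all(metabolite not in net for metabolite in internal)
```

With equal fluxes through both reactions, urea is balanced and the overall conversion is arginine plus two water molecules to ornithine, two ammonia and one carbon dioxide, which is the kind of balanced overall reaction reported for each elementary mode.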

Database submission
The genome was submitted to GenBank and has been assigned accession number CP001487.

Figure S1 Circular map of the Blattabacterium strain Bge genome. From outer to inner circles: Genome length (in bp), COG categories separately for both strands, GC content (red: % value above average of 27.1%, green: below average), GC skew (red: positive skew, blue: negative skew), and tRNA genes for both strands.

Table S1 Stoichiometric analysis. The results correspond to the stoichiometric analysis of the set of reactions represented in Figure 4. The METATOOL program calculates the stoichiometric matrix and several structural properties of the metabolic network under study. We indicate the Convex Basis (i.e., the dimension of the vectorial space in which all the system solutions can be represented) and the Elementary Modes (i.e., all the flux patterns which can be accomplished at steady state and cannot be decomposed into simpler flux distributions). Any steady-state solution can be represented as a linear combination of elements of the convex basis. In every case the balanced overall reaction and the involved enzymes are indicated.