Codon usage bias reveals genomic adaptations to environmental conditions in an acidophilic consortium

The analysis of codon usage bias has been widely used to characterize different communities of microorganisms. In this context, the aim of this work was to study codon usage bias in a natural consortium of five acidophilic bacteria used for biomining. The codon usage bias of the consortium was contrasted with that of genes from an alternative collection of acidophilic reference strains and from metagenome samples. Results indicate that acidophilic bacteria preferentially have low codon usage bias, consistent with both their capacity to live in a wide range of habitats and their slow growth rate, a characteristic probably acquired independently of their phylogenetic relationships. In addition, the analysis showed significant differences in the unique sets of genes of the autotrophic species of the consortium relative to other acidophilic organisms, principally in genes coding for proteins involved in metal and oxidative stress resistance. The lower values of codon usage bias obtained for this unique set of genes suggest greater transcriptional adaptation to living in extreme conditions, probably acquired as a means of resisting the elevated metal concentrations present in the mine.


Introduction
A total of 61 sense codons translate into 20 different amino acids; this many-to-one mapping is known as the redundancy of the genetic code, or codon degeneracy. Codon usage bias (CUB) refers to differences in the relative frequencies of synonymous codons within a coding sequence, differences that have been correlated with functional and adaptive properties [1][2][3]. The absence of CUB means that synonymous codons are used randomly, without preference, to code for their corresponding amino acids. A coding sequence is said to have low or weak CUB when synonymous codons are used in a mostly random way. In contrast, high or strong CUB arises when synonymous codons are used preferentially, the most extreme case being when exactly one codon is used to represent each amino acid.
The analysis of CUB has been used to characterize both specific and general properties of genes from communities of microorganisms [4]. Botzman et al. found an association between the lifestyles of several prokaryotic organisms and variations in their CUB [5]. Their results indicated that species living in a wide range of habitats have low CUB, consistent with the need to adapt to different environments. Their results also suggest that species may adjust more readily to metabolic variability by maintaining low CUB.
Bacteria that use a small subset of optimal codons (high CUB) also show fast growth rates [6], supporting the idea that optimization of the translation machinery is correlated with maximization of growth rate. Complementing these studies, an analysis of 11 sequenced microbial samples showed that organisms living in the same ecological niche share common codon usage preferences, regardless of their phylogenetic diversity [7]. Such evidence highlights the value of analyzing CUB to characterize bacterial communities, an approach not previously applied to acidophilic species.
Acidophilic bacteria are characterized by their survival under low pH and high concentrations of metal cations. They are among the most studied microorganisms living in extreme environments and are widely employed for the recovery of precious metals from mineral ores. During the extraction of metal ions from different ores or concentrates, several microbial species work in concert to convert insoluble metal sulfides into water-soluble metal sulfates [8,9]. It is now known that biomining communities of extremophile microorganisms acting in a coordinated manner achieve higher performance in metal extraction processes [10][11][12].
Several efforts have been made to isolate and characterize bacterial species and communities from different extreme environments [13][14][15][16][17][18]. At the molecular level, most studies have focused on identifying and quantifying particular components of each bacterium, such as proteins involved in iron/sulfur oxidation, metal resistance and biofilm formation [19][20][21]. While this strategy can suggest direct correlations between some of these components and a greater capacity for mineral bioleaching, only a few global-scale studies aimed at identifying genomic advantages or common properties of such communities have been undertaken.
The first study to shed light on community gene structures in a mine environment was published in 2004 by Tyson et al. [22]. Their metagenomic analysis showed that a microbial community inhabiting acid mine drainage combines carbon and nitrogen fixation pathways in order to survive in such an extreme environment. To investigate the genomic properties of a bacterial community from an industrially bioleached mine, a metagenomic analysis of a surface layer of low-grade copper tailings was recently undertaken at the Dexing Copper Mine in China [23,24]. The results showed that metal cation transport and DNA repair are highly represented processes within the community, and highlighted the presence of Acidithiobacillus and Acidiphilium species. In addition, the aforementioned studies provide a complete dataset of genes from acidophilic bacterial species, making it possible to study, characterize and classify extreme communities of microorganisms according to their CUB.
Recently, a consortium of five natural copper-bioleaching acidophilic bacteria was described [25]. The consortium consists of Acidithiobacillus thiooxidans Licanantay, Acidiphilium multivorum Yenapatur, Leptospirillum ferriphilum Pañiwe, Acidithiobacillus ferrooxidans Wenelen and Sulfobacillus thermosulfidooxidans Cutipay, which were isolated directly from copper mines and selected for their high capacity to solubilize copper and resist high concentrations of metal cations. This consortium is currently employed in a fully operational biotechnology system at CODELCO, Radomiro Tomic Division (Patent Registration No. CL 48319, Antofagasta, Chile).
To determine whether this natural consortium of extreme acidophilic bacteria exhibits any particular genomic advantages, the CUB of its genes was compared with that of: i) an acidophilic biomining consortium (metagenomic data) from a surface layer of low-grade copper tailings (Dexing Copper Mine, China), ii) an alternative (non-consortium) collection of reference acidophilic bacterial strains, independently isolated from different geographic locations around the world, and iii) a global bacterial CUB profile generated from a set of reference genes compiled from the 2014 COG database. Given the particular niches they inhabit, the consortium, non-consortium and metagenomic data were compared to determine whether differences in CUB patterns are correlated with the extreme environments they inhabit.

Orthologous genes
Each bacterium in the consortium group was paired with a strain of the same species from the non-consortium group: Sb. thermosulfidooxidans Cutipay and CBAR-13; At. thiooxidans Licanantay and ATCC19377; A. multivorum Yenapatur and AIU301; L. ferriphilum Pañiwe and ML-04; and At. ferrooxidans Wenelen and ATCC23270. For each pair, orthologous genes were identified using both ORTHOMCL v1.4 [42] and Inparanoid v4.1 [43]. Only gene pairs predicted as orthologous by both tools were kept.
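Keeping only the pairs reported by both tools amounts to a set intersection. The sketch below assumes each tool's output has already been parsed into (gene in strain 1, gene in strain 2) ID pairs; the parsing step, the gene IDs and the function names are hypothetical, and neither tool's output format is reproduced. It also shows, as an assumption about how the conserved/unique split used in the Results could be derived, how the consensus pairs partition each strain's genes.

```python
def consensus_orthologs(orthomcl_pairs, inparanoid_pairs):
    """Keep only gene pairs predicted as orthologous by both tools.

    Each argument is an iterable of (gene_in_strain_1, gene_in_strain_2)
    tuples, assumed to be already parsed from the tools' output files.
    """
    return set(orthomcl_pairs) & set(inparanoid_pairs)


def split_conserved_unique(genes_1, genes_2, ortholog_pairs):
    """Hypothetical helper: partition each strain's genes into those that
    appear in a consensus ortholog pair (conserved) and the rest (unique)."""
    conserved_1 = set(genes_1) & {a for a, _ in ortholog_pairs}
    conserved_2 = set(genes_2) & {b for _, b in ortholog_pairs}
    return (
        (conserved_1, set(genes_1) - conserved_1),
        (conserved_2, set(genes_2) - conserved_2),
    )
```

For example, if ORTHOMCL reports (a1, b1), (a2, b2), (a3, b9) and Inparanoid reports (a1, b1), (a2, b2), (a4, b4), only the first two pairs survive, and a3 and a4 end up among the unique genes of strain 1.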

COG category assignment
COG categories for all groups were assigned based on a protein BLAST search against the 2014 COG database with e-value and identity cutoffs of 1e-5 and 40%, respectively. For protein sequences from the metagenomic sample, only those with a length of at least 90% of the hit length were considered (35,000 sequences in total). This set was taken as the metagenome group.
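The cutoffs above can be applied to tabular BLAST output as in the sketch below. It assumes BLAST+ was run with `-outfmt "6 qseqid sseqid pident evalue qlen slen"`; that column choice, the inclusive treatment of the cutoffs, keeping only the lowest-e-value hit per query, and reading "length of at least 90% of the hit length" as `qlen >= 0.9 * slen` are all assumptions. Mapping subject IDs to COG categories is not shown.

```python
import csv
import io

# Assumed column order (BLAST+ tabular output with custom specifiers):
#   -outfmt "6 qseqid sseqid pident evalue qlen slen"
FIELDS = ["qseqid", "sseqid", "pident", "evalue", "qlen", "slen"]


def filter_cog_hits(tabular_text, max_evalue=1e-5, min_identity=40.0,
                    min_len_fraction=None):
    """Return {query_id: best passing hit} under the e-value/identity cutoffs.

    min_len_fraction (e.g. 0.9 for metagenomic proteins) additionally
    requires qlen >= min_len_fraction * slen; None disables that filter.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row:
            continue
        hit = dict(zip(FIELDS, row))
        evalue = float(hit["evalue"])
        if evalue > max_evalue or float(hit["pident"]) < min_identity:
            continue
        if (min_len_fraction is not None
                and int(hit["qlen"]) < min_len_fraction * int(hit["slen"])):
            continue
        previous = best.get(hit["qseqid"])
        # keep the hit with the lowest e-value for each query (assumption)
        if previous is None or evalue < float(previous["evalue"]):
            best[hit["qseqid"]] = hit
    return best
```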

COG database to gene database
A gene sequence database based on the 2014 COG protein database was generated for use as a reference set of non-lifestyle-specific genes. GenBank accessions for the proteins in the COG database were retrieved from NCBI (ftp://ftp.ncbi.nih.gov/pub/COG/COG2014/data). Using these accession codes, the associated genome GenBank (.gbk) files were downloaded from NCBI. Finally, the gene sequences corresponding to COG proteins were extracted from these files, yielding a gene sequence database of 1,737,559 DNA sequences.

Kullback-Leibler codon information bias (CIB)
We use the Kullback-Leibler codon information bias (CIB) defined in [44] to quantify the use of synonymous codons in genes relative to a reference scenario in which each synonymous codon is used equally often to code for its corresponding amino acid (see [45] for an examination of various measures of CUB based on other principles). More explicitly, CIB is a measure of codon usage bias based on information-theoretic concepts, namely entropy, that takes into account how amino acids are distributed. As such, CIB is a natural and intuitively appealing quantity for measuring the departure of a coding sequence from equal usage of synonymous codons (details in S1 Appendix). CIB is zero if and only if the codons that code for each amino acid are used equally often to represent that amino acid, that is, when synonymous codon usage is unbiased. It attains its maximum value, which is determined by the relative frequencies of all the amino acids, precisely when each amino acid is represented by exactly one codon. Small values of CIB correspond to low (weak or less selective) codon usage bias, while larger values correspond to a greater concentration of the codon relative frequencies on fewer codons (strong or more selective codon usage bias). For this study, CIB was rescaled to take values between 0 and 1.
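The properties listed above can be made concrete with one candidate formula. Assuming CIB has the form of a Kullback-Leibler divergence between the observed codon frequencies and a reference in which each amino acid's frequency is split equally among its $n_a$ synonymous codons (the exact expression and rescaling used in this study are given in [44] and S1 Appendix and may differ in detail), one obtains

$$\mathrm{CIB} = \sum_{a} p(a) \sum_{c \in a} p(c \mid a)\,\log\bigl(n_a\, p(c \mid a)\bigr), \qquad \mathrm{CIB}_{\max} = \sum_{a} p(a) \log n_a ,$$

which is zero exactly under equal synonymous usage and equals $\mathrm{CIB}_{\max}$ exactly when each amino acid uses a single codon. The Python sketch below computes this quantity for one coding sequence with the standard genetic code, ignores stop codons, and divides by $\mathrm{CIB}_{\max}$; that division is an assumed way of mapping the value onto 0-1.

```python
import math
from collections import Counter

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
# n_a: number of synonymous codons for each amino acid (stop codons excluded).
N_SYN = Counter(aa for aa in CODON_TABLE.values() if aa != "*")


def cib(seq, rescale=True):
    """KL-style codon information bias of one coding sequence (sketch).

    Stop codons and codons containing non-ACGT characters are ignored.
    With rescale=True the value is divided by sum_a p(a) log n_a
    (assumed rescaling onto [0, 1]).
    """
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    counts = Counter(c for c in codons if CODON_TABLE.get(c, "*") != "*")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    aa_counts = Counter()
    for codon, n in counts.items():
        aa_counts[CODON_TABLE[codon]] += n
    # sum over codons of p(c) * log(n_a * p(c|a)); zero under equal synonymous usage
    kl = sum(
        (n / total) * math.log(N_SYN[CODON_TABLE[c]] * n / aa_counts[CODON_TABLE[c]])
        for c, n in counts.items()
    )
    if not rescale:
        return kl
    kl_max = sum((n / total) * math.log(N_SYN[aa]) for aa, n in aa_counts.items())
    return kl / kl_max if kl_max > 0 else 0.0
```

With this sketch, `cib("GCTGCCGCAGCG")` (all four alanine codons once each) gives 0, while a sequence that uses exactly one codon per amino acid gives 1. Amino acids with a single codon (Met, Trp) contribute nothing to either the divergence or its maximum.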

Data analysis and statistical tools
The value of CIB was computed for every gene annotated for all bacterial species under consideration and for every putative gene belonging to the metagenomic sample.In addition, CIB was calculated for the 1,737,559 genes in the gene database derived from the COG database.This constitutes 97.3% of the 1,785,722 genes listed in the 2014 COG database.The remaining 2.7% of genes in the COG database were excluded from this study as it was not technically possible to recover the coding sequences needed for calculating the codon relative frequencies; the computation of CIB requires both the amino acid relative frequencies and the codon relative frequencies.
Differences in the pattern of CIB were analyzed between strains of the same organism, as well as between individual organisms and the gene database generated from the COG database, as follows. Consider two groups of CIB values, for instance, genes of A. multivorum Yenapatur with COG category P and genes of A. multivorum AIU301 that also have COG category P. First, the distributions of CIB were tested for equality using the two-sample Anderson-Darling test [46]. The Anderson-Darling test is similar to the more familiar two-sample Kolmogorov-Smirnov test, but is generally more powerful, with greater sensitivity to discrepancies in the tails of the distributions. Its null hypothesis is "the two groups have the same distribution" and its alternative hypothesis is "the two groups have different distributions". Second, if the Anderson-Darling test detected a difference, a further test was performed to determine whether the values of CIB in one group stochastically dominate those in the other group.
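The first step can be sketched as follows. The code computes the two-sample Anderson-Darling statistic in the k-sample form of Scholz and Stephens (the form on which kSamples is based), without any tie correction, and, as a swapped-in simplification, estimates the p-value by random permutation of the pooled values instead of the asymptotic approximation. Because kSamples also handles ties and offers other p-value methods, this sketch will not reproduce its output exactly; the function names are hypothetical.

```python
import random


def ad_two_sample(x, y):
    """Two-sample Anderson-Darling statistic A2_kN (Scholz-Stephens form,
    no tie correction). Its null expectation is k - 1 = 1 for two samples."""
    pooled = sorted(list(x) + list(y))
    N = len(pooled)
    total = 0.0
    for sample in (x, y):
        s = sorted(sample)
        n = len(s)
        m = 0        # number of values in this sample <= pooled[j - 1]
        inner = 0.0
        for j in range(1, N):
            z = pooled[j - 1]
            while m < n and s[m] <= z:
                m += 1
            inner += (N * m - j * n) ** 2 / (j * (N - j))
        total += inner / n
    return total / N


def ad_permutation_test(x, y, n_perm=999, seed=1):
    """Monte Carlo permutation p-value for H0: x and y share one distribution."""
    rng = random.Random(seed)
    observed = ad_two_sample(x, y)
    pooled = list(x) + list(y)
    k = len(x)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled CIB values
        if ad_two_sample(pooled[:k], pooled[k:]) >= observed:
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)
```

In practice `x` and `y` would be the CIB values of one COG category in two strains; the `(extreme + 1) / (n_perm + 1)` form keeps the Monte Carlo p-value strictly positive.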
Stochastic dominance, also known as simple stochastic ordering or strong stochastic ordering [47], means that, for every threshold, the probability of observing a value of CIB greater than that threshold in one group is at least as large as the probability of observing a value greater than the same threshold in the other group. Equivalently, one group stochastically dominates the other if the graphs of their cumulative distribution functions do not cross, though they may touch. When it applies, stochastic dominance establishes a strong relationship between two statistical samples and provides a method of comparison, in which case one sample can be said to be stochastically smaller or larger than the other. Two groups satisfying this relationship can then be ranked, for example, by their mean values, without the need to consider measures of dispersion. For the analysis in this paper, a permutation test for stochastic dominance, with the p-value computed by Monte Carlo estimation, was implemented in C++11 in conjunction with the R statistical computing software V3.3.2 (see S1 Appendix).
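The "non-crossing CDFs" definition can be checked directly on two samples: empirical CDFs are step functions that change only at observed values, so comparing them at the pooled data points is enough. The sketch below does that and adds a one-sided Monte Carlo permutation test whose statistic is the largest gap between the two empirical CDFs (a one-sided Kolmogorov-Smirnov-type statistic). The statistic and function names are assumptions for illustration; the test actually used in this study is described in S1 Appendix and may differ.

```python
import bisect
import random


def _ecdf(sample):
    s = sorted(sample)
    return lambda t: bisect.bisect_right(s, t) / len(s)


def cdf_gaps(x, y):
    """Return (max of F_y - F_x, max of F_x - F_y) over the pooled data points."""
    fx, fy = _ecdf(x), _ecdf(y)
    grid = sorted(set(x) | set(y))
    return (max(fy(t) - fx(t) for t in grid),
            max(fx(t) - fy(t) for t in grid))


def empirically_dominates(x, y):
    """True if x is empirically stochastically larger than y: the ECDF of x
    never lies above that of y (the two curves may touch)."""
    return cdf_gaps(x, y)[1] <= 0.0


def dominance_permutation_test(x, y, n_perm=999, seed=1):
    """One-sided Monte Carlo p-value for 'x stochastically larger than y'
    against H0: both samples come from the same distribution."""
    rng = random.Random(seed)
    observed = cdf_gaps(x, y)[0]
    pooled = list(x) + list(y)
    k = len(x)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if cdf_gaps(pooled[:k], pooled[k:])[0] >= observed:
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)
```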
Unless otherwise indicated, the computation of CIB and all statistical analyses were carried out using the R statistical computing software V3.3.2.The kSamples package was used for the Anderson-Darling test and the Bioconductor Biostrings package was used to process DNA sequence data.
Hierarchical clustering of CIB values using average linkage was carried out with the TM4 MeV v4.9.0 stand-alone local client, using the Pearson product-moment correlation coefficient as the distance metric [48].
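For readers without MeV, average-linkage clustering with a correlation-based distance can be sketched as below. It assumes each strain is represented by a profile of mean CIB values across COG categories and that the Pearson coefficient r is turned into a distance as 1 - r; that conversion, the naive O(n^3) merging loop and the function names are assumptions, not MeV's implementation.

```python
import math


def pearson(u, v):
    """Pearson product-moment correlation of two equal-length profiles."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return sum(a * b for a, b in zip(du, dv)) / den if den else 0.0


def average_linkage(profiles):
    """Agglomerative average-linkage clustering with distance 1 - r.

    profiles maps a strain name to its mean CIB per COG category.
    Returns the dendrogram as nested tuples of names.
    """
    names = list(profiles)
    dist = {(a, b): 1.0 - pearson(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [(name, [name]) for name in names]  # (subtree, member leaves)
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                left, right = clusters[i][1], clusters[j][1]
                # average linkage: mean distance over all cross-cluster leaf pairs
                avg = sum(dist[(a, b)] for a in left for b in right) / (len(left) * len(right))
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]
```

With made-up profiles in which strains A and B rise across categories while C and D fall, the sketch returns a tree grouping (A, B) and (C, D) as sibling clades.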

Codon usage bias in biomining organisms
Mining sites are characterized by low pH and the prevalence of aerobic conditions. These extreme conditions induce selective pressures that affect indigenous organisms; for example, the principal acidophilic organisms are autotrophs able to use ferrous iron and reduced sulfur compounds, released from sulfide minerals during oxidative dissolution, as electron donors [8]. Apart from nutritional selection, it is plausible to hypothesize that genes from such organisms have also been selected to improve the organisms' ability to survive under extreme conditions [49].
To assess putative gene sequence differences between copper-bioleaching acidophilic species and other organisms, we considered a set of five such bacteria inhabiting the same niche (the consortium group) [25] and compared them with three specifically selected groups (Table 1). The first group (non-consortium) includes 18 previously sequenced acidophilic bacteria isolated from different mining sites. The second group (metagenome) consists of 35,000 sequences from a copper mine metagenomic sample that includes 274 bacterial families; it is the subset of the complete metagenomic sample to which a COG category could be assigned and whose length was at least 90% of the length of the matching sequence in the COG database. The third and final group (COG) was constructed from almost all genes present in the 2014 edition of the COG database [50]. For these comparisons, we used the codon information bias (CIB) as a measure of codon usage bias (see Materials and Methods).
First, all the genes in each bacterial group were assigned COG categories. Then, for each category, the distribution of CIB values of the genes was compared with the distribution of CIB values of the genes in the same category of the COG database. The consortium and non-consortium groups of acidophilic species showed significant differences in the distribution of CIB relative to the COG database in almost every category (see Fig 1 and S1 Table), with the largest differences observed in processes related to amino acid and nucleotide metabolism, cell motility and inorganic ion transport (COG categories E, F, N, and P). In particular, the autotrophic species shown in Fig 2 (At. thiooxidans, At. ferrooxidans, L. ferriphilum and Sb. thermosulfidooxidans) exhibited smaller values of CIB on average than the COG database, independently of gene length. As indicated, this is a typical characteristic of organisms that are able to live in a wide range of habitats and that need to adapt their metabolism efficiently to different environments [51]. Acidophilic organisms also show wide and versatile metabolic diversity, coupled with an extraordinary physiological capacity to live under extreme conditions [52]. The lower CIB seen in both acidophilic groups is consistent with this capacity to accommodate metabolic variability, in agreement with previous analyses of codon usage in other microbial communities [5]. In addition, the acidophilic strains studied here are characterized by low growth rates [25], supporting the hypothesis that bacterial species with low codon usage bias grow slowly [6].
All three groups of genes (consortium, non-consortium and metagenome) exhibit distributions of CIB that differ from those of genes in the COG database (S1 Table). Remarkably, genes in the first two groups have CIB values that are stochastically smaller than those of genes in the COG database in almost all COG categories, which is a much stronger statement than merely saying that they have smaller CIB on average. In contrast, although the metagenome group and the COG database have different distributions of CIB, only COG categories D (cell division) and T (signal transduction mechanisms) in the metagenomic sample display a clear stochastic ordering relative to genes in the COG database (S1 Table), despite the mean CIB for genes in the metagenome group exceeding the mean CIB of

Specific codon usage bias in the acidophilic-bacteria consortium used for biomining
To study whether a higher capacity to survive in extreme environments is correlated with a particular pattern of codon usage, a fourth group was selected from the non-consortium group as counterpart strains to the acidophilic bacteria of the biomining consortium. This new group was composed of Sb. thermosulfidooxidans CBAR-13, At. thiooxidans ATCC19377, At. ferrooxidans ATCC23270, L. ferriphilum ML-04 and A. multivorum AIU301, all of which were isolated from different mining sites.
In general, the observed differences in CIB are similar in both groups (Fig 2 and S2 Table), indicating that these acidophilic organisms probably share some aspects of codon usage bias regardless of where they were isolated or their phylogenetic relationships. This is suggestive of co-evolution of the genetic code in these species [53].
In particular, the heterotrophic A. multivorum strains showed larger values of CIB on average than the remaining species, which are autotrophic, mainly in COG categories related to translation, transcription, signaling and general metabolism. Unlike the other members of the consortium, A. multivorum has the specific role of degrading organic metabolites that are highly toxic to autotrophic organisms [54]. High CIB is associated with high functional specialization and faster translation rates [55], which in this case probably improves the ability of A. multivorum to sense, metabolize and degrade organic compounds.
Unexpectedly, the low growth rate of the two A. multivorum strains [30] does not match their higher CIB. However, in its extreme environmental niche (the mining site), the growth of A. multivorum depends on other members of the community, which produce the organic compounds the bacterium consumes and oxidize the thiosulfate compounds that are toxic to it [54,56]. This creates a mutual dependence within the consortium, reflected in the similar growth rates observed for these species [56].
The next step was to divide the genes of each species into two sets: those conserved in both strains of the species and those unique to one of the strains. For each set, the two-sample Anderson-Darling test was used to decide whether the two strains of each species had the same distribution of CIB among the genes in each COG category.
The results reveal that the conserved genes in all pairs of biomining strains studied exhibit essentially the same distribution of CIB (S3 Table), supporting the previous observation that conserved genes in organisms with a biomining lifestyle apparently co-evolved in order to survive in extreme environments. In addition, the clustering of conserved genes in Fig 3A shows that biomining strains from the same species fall naturally into the same clade; hence the distribution of CIB for conserved genes across COG categories clusters the bacteria in a way similar to their phylogenetic distance [25]. This indicates that the CIB of conserved genes follows phylogenetic relationships and has not been significantly affected by the specific niche the strains inhabit.
On the other hand, significant differences were identified within the group of unique genes (S4 Table). The smallest values of CIB, corresponding to low codon usage bias, were observed in the autotrophic species of the consortium. Clustering based on unique genes produced a different organization (of species and COG categories) from clustering based on conserved genes (Fig 3B). Interestingly, unlike the situation observed for conserved genes, the consortium species At. thiooxidans Licanantay and L. ferriphilum Pañiwe, both of which stand out for their high copper-bioleaching performance [25], cluster together in the same clade. This supports the idea that the CIB of unique genes is more strongly affected by the common niche (the copper mine site) than by phylogenetic relationships. Note that although the two strains of Sb. thermosulfidooxidans share the same clade, both were isolated from Chilean copper deposits, which is also consistent with unique genes grouping these strains according to geographic or environmental factors.
As mentioned, low CIB may reflect better environmental adaptation resulting from greater metabolic variability, which probably affects, directly or indirectly, the adaptation of the consortium to the extreme conditions it inhabits. Mining sites are also characterized by high concentrations of metal cations. The COG categories directly related to adaptation to these conditions [25], such as L, P and V, which involve metal resistance and iron and sulfur oxidation, presented the greatest differences between the consortium species and their non-consortium counterparts.
Within this group of unique genes (Table 2), some of the lowest CIB values recorded were found in: the TonB protein complex, which participates in siderophore-mediated iron acquisition [57]; the cation efflux systems, the phosphate transporter and the Cu-ATPase, principal proteins related to copper resistance [58,59]; and the enzymes RecN and MutS, which are involved in double-strand break repair and DNA mismatch repair, respectively [60,61].
One of the principal characteristics of the acidophilic bioleaching consortium, compared with other acidophilic organisms, is the ability of its members to resist elevated concentrations of heavy metal cations [25]. Most of the consortium species tolerate at least twice the external copper concentration tolerated by their counterpart biomining strains (S5 Table). Next, a collection of genes previously classified as involved in copper resistance and oxidative stress protection was selected from the acidophilic consortium [25], together with their homologs from the non-consortium counterparts. Within this group, the copper efflux proteins CopA and Cus and the antioxidant defense enzymes thioredoxin reductase (NadpH), superoxide dismutase (sodA) and peroxiredoxin (bcp) exhibit lower CIB values than the homologous genes of the counterpart acidophilic species (Table 3). The consortium species not only contain a higher number of these components [25], but these genes also have low CIB values, suggesting greater transcriptional adaptation to living in a wide range of habitats, probably acquired as a means of resisting the extremely elevated copper concentrations present in the mine.

Conclusions
In general, acidophilic organisms present similar patterns of CIB regardless of where they were isolated or their phylogenetic relationships, mainly characterized by low CIB in autotrophic species. However, the particular copper mining environment influences the CIB of unique genes and copper-resistance genes, probably conferring on the acidophilic consortium a greater capacity to resist high concentrations of metal cations. Finally, studies of CIB in acidophilic organisms offer an alternative approach for identifying and characterizing new strains with higher capacities for bioleaching metal ores.

Fig 1. Average value of CIB for genes belonging to the consortium biomining species and the selected comparison groups under study. Each value is the average CIB calculated over all the species in each independent group, classified according to COG category. The asterisks mark the four COG categories for which the greatest difference was observed between the mean CIB for the consortium and the mean CIB for the 2014 COG database. https://doi.org/10.1371/journal.pone.0195869.g001

Fig 2. Average value of CIB for genes belonging to ten strains of bacteria (consortium and non-consortium) and the COG database, binned by gene length in bases. Each bin contains genes from ℓ-499 to ℓ bases in length, where ℓ can be read off the x-axis. The y-axis indicates the mean value of CIB for genes belonging to the bin indicated on the x-axis. Bacterial strains belonging to the same species are plotted using the same point shape. Strains belonging to the biomining consortium are shown as filled points linked by solid lines, while non-consortium strains are shown as hollow points linked by dotted lines. The average CIB values for the 2014 COG database are plotted as crosses linked by a red solid line. https://doi.org/10.1371/journal.pone.0195869.g002