Codon usage bias reveals genomic adaptations to environmental conditions in an acidophilic consortium

  • Andrew Hart,

    Roles Investigation, Methodology, Writing – original draft

    Affiliation UMI 2071 CNRS-UCHILE, Facultad de Ciencias Físicas y Matemáticas, Centro de Modelamiento Matemático, Universidad de Chile, Casilla 170, Correo 3, Santiago, Chile

  • María Paz Cortés,

    Roles Formal analysis, Investigation, Software

    Affiliations Mathomics, Centro de Modelamiento Matemático, Universidad de Chile, Santiago, Chile, Fondap-Center of Genome Regulation, Facultad de Ciencias, Universidad de Chile, Santiago, Chile

  • Mauricio Latorre,

    Roles Data curation, Formal analysis, Funding acquisition, Writing – original draft

    Affiliations Mathomics, Centro de Modelamiento Matemático, Universidad de Chile, Santiago, Chile, Fondap-Center of Genome Regulation, Facultad de Ciencias, Universidad de Chile, Santiago, Chile, Laboratorio de Bioinformática y Expresión Génica, INTA, Universidad de Chile, Macul, Santiago, Chile, Universidad de O'Higgins, Instituto de Ciencias de la Ingeniería, Rancagua, Chile

  • Servet Martinez

    Roles Investigation, Writing – original draft

    Affiliation Departamento de Ingeniería Matemática, UMI 2071 CNRS-UCHILE, Facultad de Ciencias Físicas y Matemáticas, Centro de Modelamiento Matemático, Universidad de Chile, Casilla 170, Correo 3, Santiago, Chile

Abstract

The analysis of codon usage bias has been widely used to characterize different communities of microorganisms. In this context, the aim of this work was to study the codon usage bias in a natural consortium of five acidophilic bacteria used for biomining. The codon usage bias of the consortium was contrasted with that of genes from an alternative collection of acidophilic reference strains and metagenome samples. Results indicate that acidophilic bacteria preferentially have low codon usage bias, consistent with both their capacity to live in a wide range of habitats and their slow growth rate, a characteristic probably acquired independently of their phylogenetic relationships. In addition, the analysis showed significant differences between the unique gene sets of the autotrophic species of the consortium and those of other acidophilic organisms, principally in genes coding for proteins involved in metal and oxidative stress resistance. The lower codon usage bias observed in these unique genes suggests greater transcriptional adaptation to living in extreme conditions, probably acquired as a means of resisting the elevated metal concentrations present in the mine.

Introduction

The 61 sense codons encode 20 different amino acids, a property known as the redundancy of the genetic code or degeneracy of codons. Codon usage bias (CUB) refers to differences in the relative frequencies of synonymous codons within a coding sequence, differences which have been correlated with functional and adaptive properties [1–3]. The absence of CUB means that synonymous codons are used randomly, without preference, to code for their corresponding amino acids. A coding sequence is said to have low or weak CUB when synonymous codons are employed in a mostly random way. In contrast, high or strong CUB arises when synonymous codons are used preferentially, the most extreme case being when exactly one codon is used to represent each amino acid.
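
To make the notion of synonymous codon frequencies concrete, the following minimal Python sketch tabulates, for each amino acid, the relative frequency of each of its synonymous codons in a coding sequence. It assumes the standard genetic code (NCBI translation table 1) and an in-frame sequence, and it skips stop codons; the function name `synonymous_codon_frequencies` is illustrative and not taken from the study.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_codon_frequencies(seq):
    """Return {amino_acid: {codon: relative frequency among its synonymous codons}}.

    Assumes an in-frame coding sequence; stop codons and codons containing
    characters other than A, C, G, T are ignored. Unused codons are omitted.
    """
    seq = seq.upper()
    codons = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    per_aa = defaultdict(dict)
    totals = Counter()
    for codon, n in codons.items():
        aa = CODE.get(codon)
        if aa is None or aa == "*":
            continue
        per_aa[aa][codon] = n
        totals[aa] += n
    return {aa: {c: n / totals[aa] for c, n in cs.items()} for aa, cs in per_aa.items()}
```

For example, `synonymous_codon_frequencies("GCTGCCGCAGCGAAAAAA")` gives each of the four alanine codons a frequency of 0.25 (no bias for alanine) and gives AAA a frequency of 1.0 for lysine (the extreme case, a single codon for that amino acid).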

The analysis of CUB has been used to characterize both specific and general properties of genes from communities of microorganisms [4]. Botzman and Margalit found an association between the lifestyles of several prokaryotic organisms and variations in their CUB [5]. Their results indicated that species living in a wide range of habitats have low CUB, which is consistent with the need to adapt to different environments. Their results also suggest that species may adjust more readily to metabolic variability by maintaining low CUB.

Bacteria which use a small subset of optimal codons (high CUB) also present fast growth rates [6], supporting the idea that optimization of the translation machinery is correlated with maximization of growth rate. Complementing these studies, an analysis of 11 sequenced microbial samples showed that organisms living in the same ecological niche share a common codon usage preference, regardless of their phylogenetic diversity [7]. Such evidence highlights the value of CUB analysis for characterizing bacterial communities, an approach not hitherto applied to acidophilic species.

Acidophilic bacteria are characterized by their survival under low pH and high concentrations of metal cations. They are some of the most studied microorganisms living in extreme environments and are widely employed for the recovery of precious metals from mineral ores. During the extraction of metal ions from different ores or concentrates, several microbial species work in concert to convert insoluble metal sulfides into water-soluble metal sulfates [8,9]. Biomining communities of extremophile microorganisms that act in a coordinated manner are known to achieve higher levels of performance in metal extraction processes [10–12].

Several efforts have been made to isolate and characterize bacterial species and communities from different extreme environmental sites [13–18]. At the molecular level, most studies have focused on identifying and quantifying particular components of each bacterium, such as proteins involved in iron/sulfur oxidation, metal resistance and biofilm formation [19–21]. While this strategy can suggest direct correlations between some of these components and a greater capacity for mineral bioleaching, only a few global-scale studies aimed at attributing genomic advantages or common properties to such communities have been undertaken.

The first study to shed light on community gene structure in a mine environment was presented in 2004 by Tyson et al. [22]. Using metagenomic analysis, they determined that a microbial community inhabiting acid mine drainage combines carbon and nitrogen fixation pathways in order to survive in such an extreme environment. With the aim of investigating genomic properties of a bacterial community from an industrially bioleached mine, a metagenome analysis of a surface layer of low-grade copper tailings was recently undertaken at the Dexing Copper Mine in China [23,24]. The results showed that metal cation transport and DNA repair are highly represented processes within the community and highlighted the presence of Acidithiobacillus and Acidiphilium species. In addition, the aforementioned studies provide a complete dataset of genes from acidophilic bacterial species, opening the possibility of studying, characterizing and classifying extreme communities of microorganisms according to their CUB.

Recently, a consortium of five natural copper-bioleaching acidophilic bacteria was presented [25]. The consortium is made up of the bacteria Acidithiobacillus thiooxidans Licanantay, Acidiphilium multivorum Yenapatur, Leptospirillum ferriphilum Pañiwe, Acidithiobacillus ferrooxidans Wenelen and Sulfobacillus thermosulfidooxidans Cutipay, which were directly isolated from copper mines and selected based on their high capacity to solubilize copper and resist high concentrations of metal cations. In addition, this consortium is currently employed in a fully operational biotechnology system at CODELCO, Radomiro Tomic Division (Patent Registration No. CL 48319, Antofagasta, Chile).

To determine whether this natural consortium of extreme acidophilic bacteria exhibits any particular genomic advantages, the CUB of genes belonging to the consortium was contrasted with: i) an acidophilic biomining consortium (metagenomic data) from a surface layer of low grade copper tailings (Dexing Copper Mine, China), ii) an alternative (non-consortium) collection of reference acidophilic bacterial strains (which were independently isolated from different geographic locations around the world) and iii) a global bacterial CUB profile generated from a set of reference genes compiled from the 2014 COG database. Considering the particular niches they inhabit, the consortium, non-consortium and metagenomic data were compared with a view to determining whether discrepancies in patterns of CUB are correlated with the extreme environments they inhabit.

Materials and methods

Genome sequences

The gene and protein sequences for Sulfobacillus thermosulfidooxidans strain Cutipay, Acidithiobacillus thiooxidans strain Licanantay, Acidiphilium multivorum strain Yenapatur, Leptospirillum ferriphilum strain Pañiwue and Acidithiobacillus ferrooxidans strain Wenelen, which constitute the bacterial species belonging to a Chilean biomining consortium, were previously described [25–27]. Their sequences are available at Sequences for the group of non-consortium species were downloaded from NCBI. This group consists of Acidimicrobium ferrooxidans DSM 10331, Acidiphilium cryptum JF-5, Acidiphilium multivorum AIU301, Acidiphilium sp. PM, Acidithiobacillus caldus ATCC 51756, At. caldus SM-1, At. ferrivorans SS3, At. ferrooxidans ATCC 23270, At. ferrooxidans ATCC 53993, At. thiooxidans ATCC19377, Desulfosporosinus acidiphilus DSM 22704, Leptospirillum ferriphilum ML-04, L. ferrooxidans C2-3, Sulfobacillus acidophilus TPY, Sb. thermosulfidooxidans DSM 9293, Sb. thermosulfidooxidans CBAR-13, Thiomonas intermedia K12 and Thiomonas sp. 3As (Accessions NC_013124.1; NC_009484.1, NC_009467.1-NC_009474.1; NC_015186.1; NZ_AFPR01000001.1-NZ_AFPR01000627.1; NZ_CP005986.1-NZ_CP005989.1; NC_015850.1-NC_015854.1; NC_015942.1; NC_011761.1; NC_011206.1; NZ_AFOH00000000.1; NC_018066.1-NC_018068.1; NC_018649.1; NC_017094.1; NC_015757.1; FWWY01000001.1-FWWY01000002.1; NZ_LGRO00000000.1; NC_014153.1-NC_014155.1 and NC_014144.1-NC_014145.1 respectively) [28–40]. Metagenomic data from a bioleaching heap sample presented in Zhang, X. et al. [23] were downloaded from the MG-RAST repository (Accession 4664533.3) [41]. Putative gene sequences in this sample and their taxonomic categories were also obtained from this repository (328,571 sequences in total, with taxonomic assignments made through similarity searches against RefSeq proteins using the cutoffs: minimum alignment length 15 bp; e-value 1e-5; identity 60%).

Orthologous genes

Each bacterium in the consortium group was paired with a strain of the same species in the non-consortium group: Sb. thermosulfidooxidans Cutipay and CBAR-13; At. thiooxidans Licanantay and ATCC19377; A. multivorum Yenapatur and AIU301; L. ferriphilum Pañiwue and ML-04; and At. ferrooxidans Wenelen and ATCC23270. For each pair, orthologous genes were predicted using both OrthoMCL v1.4 [42] and Inparanoid v4.1 [43]. Only gene pairs predicted as orthologous by both tools were kept.
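
The consensus rule (keep a pair only if both tools predict it) amounts to a set intersection. The sketch below assumes that each tool's output has already been parsed into clusters of gene identifiers, which are then expanded into cross-strain pairs; the input format and the function names are hypothetical, since neither OrthoMCL nor Inparanoid emits exactly this structure.

```python
from itertools import product


def cluster_to_pairs(cluster, strain_a_genes):
    """Expand one ortholog cluster into (strain A gene, strain B gene) pairs.

    cluster: gene IDs from both strains (hypothetical, pre-parsed format).
    strain_a_genes: set of gene IDs that belong to strain A.
    """
    a_genes = [g for g in cluster if g in strain_a_genes]
    b_genes = [g for g in cluster if g not in strain_a_genes]
    return set(product(a_genes, b_genes))


def consensus_orthologs(clusters_tool1, clusters_tool2, strain_a_genes):
    """Keep only the ortholog pairs predicted by both tools."""

    def all_pairs(clusters):
        pairs = set()
        for cluster in clusters:
            pairs |= cluster_to_pairs(cluster, strain_a_genes)
        return pairs

    return all_pairs(clusters_tool1) & all_pairs(clusters_tool2)
```

Working with explicit pairs rather than whole clusters means that a many-to-many cluster from one tool only contributes the pairs that the other tool also supports.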

COG category assignment

COG categories for all groups were assigned based on a protein BLAST search against the 2014 COG database with e-value and identity cutoffs of 1e-5 and 40% respectively. In the case of protein sequences from the metagenomic sample, only those with a length of at least 90% of the hit length were considered (35000 sequences in total). This set was taken as the metagenome group.
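
A minimal sketch of this filtering step follows. It assumes BLAST+ tabular output produced with `-outfmt "6 qseqid sseqid pident evalue qlen slen"` (a column choice assumed here; the study does not state it) and keeps, for each query, the first hit that passes the cutoffs, which in BLAST tabular output is typically the best-scoring one. The optional length criterion (query at least 90% of the hit length) is the one described above for metagenomic proteins; the lookup from COG protein to COG category is left out.

```python
def filter_cog_hits(lines, min_identity=40.0, max_evalue=1e-5, min_len_frac=None):
    """Return {query_id: subject_id} for the first hit per query that passes the cutoffs.

    Each line is assumed to be BLAST+ tabular output with the columns
    qseqid, sseqid, pident, evalue, qlen, slen (an assumed -outfmt choice).
    min_len_frac=0.9 applies the "query >= 90% of hit length" rule.
    """
    kept = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        qseqid, sseqid, pident, evalue, qlen, slen = line.rstrip("\n").split("\t")[:6]
        if qseqid in kept:
            continue  # a hit for this query was already accepted
        if float(pident) < min_identity or float(evalue) > max_evalue:
            continue
        if min_len_frac is not None and int(qlen) < min_len_frac * int(slen):
            continue
        kept[qseqid] = sseqid
    return kept
```

Calling it with `min_len_frac=0.9` would correspond to the metagenome group; the default (no length rule) to the genome-derived groups.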

COG database to gene database

A gene sequence database based on the 2014 COG protein database was generated for use as a reference set of non-lifestyle-specific genes. GenBank accessions for proteins in the COG database were retrieved from NCBI ( Using those accession codes the associated genome gbk files were downloaded from NCBI. Finally, gene sequences corresponding to COG proteins were retrieved from these files and a gene sequence database was constructed containing a total of 1,737,559 DNA sequences.

Kullback–Leibler codon information bias (CIB)

We use the Kullback–Leibler codon information bias (CIB) defined in [44] to quantify the use of synonymous codons in genes relative to the reference scenario in which each synonymous codon is used equally often to code for its corresponding amino acid (see [45] for an examination of various measures of CUB based on other principles). More explicitly, CIB is a measure of codon usage bias based on information-theoretic concepts, namely relative entropy (Kullback–Leibler divergence), which takes into account how amino acids are distributed in the sequence. As such, CIB is a natural and intuitively appealing quantity for measuring the departure of a coding sequence from equal usage of synonymous codons (details in S1 Appendix). CIB is zero if and only if the codons that code for each amino acid are used equally often to represent that amino acid, that is, when synonymous codon usage is unbiased. It attains its maximum value, which is determined by the relative frequencies of all the amino acids, precisely when each amino acid is represented by exactly one codon. Small values of CIB correspond to low (less selective or weak) codon usage bias, while larger values correspond to a greater concentration of the codon relative frequencies on fewer codons (stronger or more selective codon usage bias). For this study, CIB was rescaled to take values in the range 0–1.
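
Under the description above, CIB can be read as the Kullback–Leibler divergence between the observed codon frequencies p(c) and a reference distribution q(c) = p(a)/n_a that keeps the observed amino acid frequencies p(a) but spreads each amino acid uniformly over its n_a synonymous codons, giving CIB = Σ_c p(c) log[p(c)/q(c)]. This is 0 for uniform synonymous usage and reaches Σ_a p(a) log n_a when every amino acid uses a single codon. The Python sketch below follows that reading. The exact definition is in [44] and S1 Appendix; dividing by Σ_a p(a) log n_a to obtain a 0–1 value is our assumption about the rescaling, not necessarily the one used in the study, and excluding stop codons is likewise assumed.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
# n_a: number of synonymous (sense) codons for each amino acid.
N_SYN = Counter(aa for aa in CODE.values() if aa != "*")


def cib(seq, rescale=True):
    """Kullback-Leibler codon information bias of an in-frame coding sequence.

    KL divergence between observed codon frequencies p(c) and q(c) = p(a) / n_a.
    With rescale=True the value is divided by its maximum, sum_a p(a) log n_a
    (an assumed normalization).
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    # keep sense codons only (drops stop codons and codons with ambiguous bases)
    counts = {c: n for c, n in counts.items() if CODE.get(c, "*") != "*"}
    total = sum(counts.values())
    if total == 0:
        return 0.0
    aa_counts = Counter()
    for codon, n in counts.items():
        aa_counts[CODE[codon]] += n
    # p(c) * log(p(c) / q(c)), where p(c) / q(c) = n * n_a / (count of amino acid a)
    kl = sum(
        (n / total) * math.log(n * N_SYN[CODE[c]] / aa_counts[CODE[c]])
        for c, n in counts.items()
    )
    if not rescale:
        return kl
    kl_max = sum((m / total) * math.log(N_SYN[aa]) for aa, m in aa_counts.items())
    return kl / kl_max if kl_max > 0 else 0.0
```

With this normalization, a sequence using the four alanine codons equally scores 0, one using only GCT scores 1, and `cib("GCTGCTAAAAAG")` (fully biased alanine, unbiased lysine) scores 2/3. Sequences made only of single-codon amino acids (Met, Trp) have a maximum of 0 and are returned as 0 here.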

Data analysis and statistical tools

The value of CIB was computed for every gene annotated for all bacterial species under consideration and for every putative gene belonging to the metagenomic sample. In addition, CIB was calculated for the 1,737,559 genes in the gene database derived from the COG database. This constitutes 97.3% of the 1,785,722 genes listed in the 2014 COG database. The remaining 2.7% of genes in the COG database were excluded from this study as it was not technically possible to recover the coding sequences needed for calculating the codon relative frequencies; the computation of CIB requires both the amino acid relative frequencies and the codon relative frequencies.

Differences in the pattern of CIB were analyzed between strains of the same organism, as well as between individual organisms and the gene database generated from the COG database. This was accomplished as follows. Consider two groups of CIB values, for instance, genes of A. multivorum Yenapatur with COG category P and genes from A. multivorum AIU301 that also have COG category P. Firstly, the distributions of CIB were tested for equality using the two-sample Anderson-Darling test [46]. The Anderson-Darling test is similar to the more familiar two-sample Kolmogorov-Smirnov test, but is generally more powerful, with greater sensitivity to discrepancies in the tails of the distributions. Its null hypothesis is "the two groups have the same distribution" and its alternative hypothesis is "the two groups have different distributions". Secondly, if a difference was detected by the Anderson-Darling test, a further test was performed to determine whether the values of CIB in one group stochastically dominate those in the other group.
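
For readers working outside R, the following Python sketch computes the two-sample Anderson-Darling statistic in the form given by Scholz and Stephens (k = 2, without the adjustment for ties) and attaches a Monte Carlo permutation p-value. It is a substitute for, not a reproduction of, the kSamples implementation used in the study, which also handles ties and can report asymptotic, exact or simulated p-values; the function names are our own.

```python
import random


def ad_statistic(x, y):
    """Two-sample Anderson-Darling statistic A^2_kN (Scholz & Stephens), k = 2,
    in the form for continuous data (no adjustment for ties)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    sizes = (len(x), len(y))
    N = sizes[0] + sizes[1]
    below = [0, 0]  # M_ij: observations of sample i among the j smallest pooled values
    total = 0.0
    for j in range(1, N):
        below[pooled[j - 1][1]] += 1
        for i in (0, 1):
            total += (N * below[i] - j * sizes[i]) ** 2 / (sizes[i] * j * (N - j))
    return total / N


def ad_permutation_test(x, y, n_perm=1000, seed=0):
    """Return (A^2, Monte Carlo permutation p-value) for H0: same distribution."""
    rng = random.Random(seed)
    observed = ad_statistic(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled sample
        if ad_statistic(pooled[:len(x)], pooled[len(x):]) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Two well-separated samples yield a p-value at the Monte Carlo floor of 1/(n_perm + 1), while two perfectly interleaved samples yield a p-value close to 1.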

Stochastic dominance, also known as simple stochastic ordering or strong stochastic ordering [47], means that, for every threshold, the probability of observing a value of CIB greater than that threshold in one group is at least as large as the probability of observing a value greater than the same threshold in the other group. Equivalently, one group stochastically dominates the other if the graphs of their cumulative distribution functions do not cross, though they may touch. When it applies, stochastic dominance establishes a strong relationship between two statistical samples and provides a method of comparison, in which case one sample can be said to be stochastically smaller or larger than the other. Two groups satisfying this relationship can be ranked, say, according to their mean values, without the need to consider measures of dispersion. For the analysis in this paper, a permutation test for stochastic dominance using Monte Carlo estimation to compute the p-value was implemented in C++11 in conjunction with the R statistical computing software V3.3.2 (refer to S1 Appendix).
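
The definition above can be checked directly on samples, because empirical CDFs are step functions that change only at observed values. The Python sketch below does that and adds a Monte Carlo permutation p-value based on a one-sided Kolmogorov-Smirnov-type statistic. That statistic is our stand-in chosen for illustration: the test actually used is specified in S1 Appendix and may differ.

```python
import bisect
import random


def ecdf(sample):
    """Empirical CDF of a sample as a callable t -> P(X <= t)."""
    s = sorted(sample)
    return lambda t: bisect.bisect_right(s, t) / len(s)


def empirically_dominates(x, y):
    """True if sample x is (weakly) stochastically larger than sample y,
    i.e. F_x(t) <= F_y(t) at every t. Both ECDFs only jump at observed
    values, so checking the pooled values is sufficient."""
    fx, fy = ecdf(x), ecdf(y)
    return all(fx(t) <= fy(t) for t in set(x) | set(y))


def one_sided_ks(x, y):
    """max_t [F_y(t) - F_x(t)]: large when x tends to exceed y."""
    fx, fy = ecdf(x), ecdf(y)
    return max(fy(t) - fx(t) for t in set(x) | set(y))


def dominance_permutation_test(x, y, n_perm=1000, seed=0):
    """Monte Carlo permutation p-value for 'x tends to be larger than y'
    (stand-in statistic; not necessarily the test used in the study)."""
    rng = random.Random(seed)
    observed = one_sided_ks(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if one_sided_ks(pooled[:len(x)], pooled[len(x):]) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Here `empirically_dominates(x, y)` returning True means the sample `x` is stochastically larger than `y` in the sense above; two samples whose ECDFs cross, for example `[0.0, 1.0]` and `[0.4, 0.6]`, dominate in neither direction.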

Unless otherwise indicated, the computation of CIB and all statistical analyses were carried out using the R statistical computing software V3.3.2. The kSamples package was used for the Anderson-Darling test and the Bioconductor Biostrings package was used to process DNA sequence data.

Hierarchical clustering of CIB using average linkage was carried out by means of the TM4 MeV v4.9.0 stand-alone local client using the Pearson product-moment correlation coefficient as the distance metric [48].
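
The clustering step can be sketched without MeV. The Python code below performs average-linkage (UPGMA-style) agglomeration on vectors of mean CIB per COG category, assuming that the distance between two strains is 1 − r, where r is their Pearson correlation; we have not verified that this is exactly MeV's convention. Vectors with zero variance make r undefined and are not handled, and the input format is hypothetical.

```python
import math


def pearson(u, v):
    """Pearson product-moment correlation of two equal-length vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def average_linkage(profiles):
    """Agglomerative clustering with average linkage and distance 1 - r.

    profiles: {strain: [mean CIB per COG category, ...]} (hypothetical input).
    Returns the merges as (cluster_a, cluster_b, distance) in merge order.
    """
    names = list(profiles)
    dist = {a: {b: 1.0 - pearson(profiles[a], profiles[b]) for b in names} for a in names}
    clusters = {name: [name] for name in names}  # label -> member strains
    merges = []
    while len(clusters) > 1:
        labels = list(clusters)
        best = None
        for i, la in enumerate(labels):
            for lb in labels[i + 1:]:
                members_a, members_b = clusters[la], clusters[lb]
                # average linkage: mean distance over all cross-cluster pairs
                d = sum(dist[a][b] for a in members_a for b in members_b)
                d /= len(members_a) * len(members_b)
                if best is None or d < best[0]:
                    best = (d, la, lb)
        d, la, lb = best
        clusters[f"({la},{lb})"] = clusters.pop(la) + clusters.pop(lb)
        merges.append((la, lb, d))
    return merges
```

Because the distance depends on correlation rather than magnitude, two strains whose CIB profiles differ only by a constant factor across COG categories are merged at distance close to 0.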

Results and discussion

Codon usage bias in biomining organisms

Mining sites are characterized by low pH and predominantly aerobic conditions. These extreme conditions impose selective pressures on indigenous organisms; for example, the principal acidophilic organisms are autotrophic and able to use ferrous iron and reduced sulfur compounds, released from sulfide minerals during oxidative dissolution, as electron donors [8]. Apart from nutritional selection, it is plausible to hypothesize that genes from such organisms have also been selected to improve the ability of the organism to survive under extreme conditions [49].

To assess putative gene sequence differences between copper-bioleaching acidophilic species and other organisms, we considered a set of five such bacteria which inhabit the same niche (the consortium group) [25] and compared these with three specifically selected groups of bacteria (Table 1). The first group (non-consortium) includes a total of 18 previously sequenced acidophilic bacteria, which were isolated from different mining sites. The second comparison group (metagenome) is made up of 35000 sequences from a copper mine metagenomic sample including 274 bacterial families. This group is the subset of the complete metagenomic sample to which a COG category could be assigned and which had a length of at least 90% of the size of the match in the COG database. The third and final comparison group (COG) was constructed from almost all genes present in the 2014 edition of the COG database [50]. To make these comparisons, we used the codon information bias (CIB) as a measure of codon usage bias (see Materials and Methods).

First, all the genes in each bacterial group were assigned COG categories. Then, the distribution of CIB values calculated for genes in each category was compared with the distribution of CIB values computed for genes in the corresponding category in the COG database. The results indicate that the consortium and non-consortium groups of acidophilic species showed significant differences in the distribution of CIB relative to the COG database in almost every category (see Fig 1 and S1 Table), with the largest differences observed in processes related to amino acid and nucleotide metabolism, cell motility and inorganic ion transport (COG categories E, F, N and P). In particular, the autotrophic species shown in Fig 2 (At. thiooxidans, At. ferrooxidans, L. ferriphilum and Sb. thermosulfidooxidans) exhibited smaller values of CIB on average than the COG database, independently of gene length.

Fig 1. Average value of CIB for genes belonging to the consortium biomining species and the selected comparison groups under study.

Each value is the average CIB calculated over all the species from each independent group classified according to COG category. The asterisks mark the four COG categories for which the greatest difference was observed between the mean CIB for the consortium and the mean CIB for the 2014 COG database.

Fig 2. Average value of CIB for genes belonging to ten strains of bacteria (consortium and non-consortium) and the COG database binned by gene length in bases.

Each bin contains genes from k − 499 to k bases in length, where k can be read off the x-axis. The y-axis indicates the mean value of CIB for genes belonging to the bin indicated on the x-axis. Bacterial strains belonging to the same species are plotted using the same point shape. Strains belonging to the biomining consortium are distinguished by filled points linked by solid lines while non-consortium strains are hollow points linked by dotted lines. The average CIB values for the 2014 COG database are plotted as crosses linked by a red solid line.
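
The length binning behind Fig 2 can be sketched as follows, assuming bins 500 bases wide whose upper limits are multiples of 500 (so a bin labelled 1000 holds genes of 501 to 1000 bases); the input format of (length, CIB) pairs and the function name are hypothetical.

```python
import math
from collections import defaultdict


def mean_cib_by_length(genes, width=500):
    """Mean CIB per gene-length bin.

    genes: iterable of (length_in_bases, cib) pairs (hypothetical input).
    A gene of length L goes to the bin labelled k = ceil(L / width) * width,
    which covers lengths k - width + 1 to k (e.g. 501..1000 for k = 1000).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for length, value in genes:
        k = math.ceil(length / width) * width
        sums[k] += value
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}
```

Computing one such dictionary per strain and one for the COG gene database gives the series plotted against the x-axis.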

As noted in the Introduction, low CUB is a typical characteristic of organisms that are able to live in a wide range of habitats and that need to adapt their metabolism efficiently to different environments [51]. Acidophilic organisms also show wide and versatile metabolic diversity, coupled with an extraordinary physiological capacity to live under extreme conditions [52]. The lower CIB seen in both acidophilic groups is consistent with their capacity to adjust to metabolic variability, in agreement with previous analyses of codon usage in other communities of microorganisms [5]. In addition, the acidophilic strains studied here are characterized by low growth rates [25], supporting the hypothesis that bacterial species with low codon usage bias grow slowly [6].

All three groups of genes (consortium, non-consortium and metagenome) exhibit distributions of CIB that differ from those of genes in the COG database (S1 Table). Remarkably, genes in the first two groups have CIB values that are stochastically smaller than those of genes in the COG database in almost all COG categories, which is a much stronger statement than merely saying that they have smaller CIB on average.

In contrast, while the metagenome group and COG database have different distributions of CIB, only COG categories D (cell division) and T (signal transduction mechanisms) from the metagenomics sample display a clear stochastic relationship to genes in the COG database (S1 Table), despite the mean CIB for genes in the metagenome group exceeding the mean CIB of COG database genes (Fig 1). Considering that this metagenomics sample covers a total of 274 bacterial families, it is plausible to argue that the high diversity of microorganisms in this community covers an extensive variety of patterns of CIB, which is reflected in the statistical results obtained.

Specific codon usage bias in the acidophilic-bacteria consortium used for biomining

To study whether a higher capacity to survive in extreme environments is correlated with a particular pattern of codon usage, a fourth group of species was selected from the non-consortium group as counterpart strains to the acidophilic bacteria consortium used for biomining. This new group was composed of Sb. thermosulfidooxidans CBAR-13, At. thiooxidans ATCC19377, At. ferrooxidans ATCC23270, L. ferriphilum ML-04 and A. multivorum AIU301, all of which were isolated from different mining sites.

In general, the observed differences in CIB are similar in both groups (Fig 2 and S2 Table), indicating that these acidophilic organisms probably share some aspects of codon usage bias independently of the place where they were isolated or their phylogenetic relationship. This is suggestive of co-evolution of the genetic code in these species [53].

In particular, the heterotrophic A. multivorum strains showed larger values of CIB on average compared to the remaining species, which are autotrophic. This was seen mainly in COG categories related to translation, transcription, signaling and general metabolism. Unlike the other members of the consortium, A. multivorum has the specific role of degrading organic metabolites highly toxic to autotrophic organisms [54]. High CIB is associated with high functional specialization and faster translational rates [55], which in this case probably improves the ability of A. multivorum to sense, metabolize and degrade organic compounds.

Unexpectedly, the low growth rate of the two A. multivorum strains [30] does not correspond to their higher CIB. However, in their extreme environmental niche (mining site), the growth of A. multivorum depends on the presence of other members of the community to produce the organic sources the bacterium consumes and to oxidize the thiosulfate compounds toxic to it [54,56]. This establishes mutual dependence within the consortium which is reflected by the similar growth rates observed in these species [56].

The next step was to divide all the genes belonging to each species into two sets, those that are conserved in the two strains of the species and those that are unique to one of the strains. For each of these sets, the two-sample Anderson-Darling test was used to decide whether or not the two strains of each species had the same distribution of CIB among the genes in each COG category.
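
The split into conserved and unique genes follows from the ortholog pairs: a gene is conserved if it appears in some ortholog pair for its strain, and unique otherwise. A minimal Python sketch, with a hypothetical function name and input format:

```python
def split_conserved_unique(genes_a, genes_b, ortholog_pairs):
    """Partition the genes of two strains into conserved and unique sets.

    genes_a, genes_b: gene IDs of each strain; ortholog_pairs: (gene_a, gene_b)
    pairs kept after the ortholog consensus step (hypothetical input format).
    """
    in_pairs_a = {a for a, _ in ortholog_pairs}
    in_pairs_b = {b for _, b in ortholog_pairs}
    genes_a, genes_b = set(genes_a), set(genes_b)
    return {
        "conserved": (genes_a & in_pairs_a, genes_b & in_pairs_b),
        "unique": (genes_a - in_pairs_a, genes_b - in_pairs_b),
    }
```

Each of the resulting sets would then be divided by COG category before the per-category Anderson-Darling comparison between the two strains.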

The results reveal that the conserved genes in all the pairs of biomining strains studied exhibit essentially the same distribution of CIB (S3 Table), supporting the previous observation that the conserved genes in biomining lifestyle organisms apparently co-evolved in order to survive in extreme environments. In addition, the clustering of conserved genes in Fig 3A shows that biomining strains from the same species fall naturally into the same clade, and hence the distribution of CIB for conserved genes among the various COG categories clusters the bacteria in a way similar to phylogenetic distance [25]. This indicates that the CIB of the conserved genes conforms to a phylogenetic relationship and has not been significantly affected by the specific niche they inhabit.

Fig 3. Hierarchical clustering of 10 biomining bacteria according to CIB values grouped by COG category.

A. Conserved genes, B. Unique genes. For each strain, the mean value of CIB was calculated for genes (both conserved and unique) within each COG category. The strains were then hierarchically clustered using average linkage with the Pearson product-moment correlation coefficient measuring the distance between the vectors of mean CIB per COG category. The color bar ranges from green (low CIB, 0.0) to red (high CIB, 0.5).

On the other hand, significant differences were identified within the group of unique genes (S4 Table). The smallest values of CIB were observed in the autotrophic species in the consortium, corresponding to low codon usage bias. Clustering based on unique genes revealed a different organization (species and COG categories) compared to clustering in terms of conserved genes (Fig 3B). Interestingly, unlike the situation observed for the conserved group of genes, consortium species At. thiooxidans Licanantay and L. ferriphilum Pañiwe (both of which stand out due to their high copper-bioleaching performance) [25], cluster together in the same clade. This supports the idea that the CIB of unique genes is most likely affected by the common niche (copper mine site) rather than any phylogenetic relationship. Note that although the two strains of Sb. thermosulfidooxidans share the same clade, both of these strains were isolated from Chilean copper deposits, which is also consistent with unique genes causing these species to be clustered according to geographic/environmental effects.

As mentioned, low CIB may suggest better environmental adaptation as a product of higher metabolic variability, which probably (directly or indirectly) affects the adaptation of the consortium to the extreme environmental conditions it inhabits. In this context, mining sites are also characterized by a high concentration of metal cations. The COG groups directly related to the adaptation to these conditions [25], such as categories L, P and V, which involve metal resistance, and iron and sulfur oxidation, presented the greatest differences between species in the consortium and their non-consortium counterparts.

Within this group of unique genes (Table 2), the following genes presented some of the lowest CIB values recorded: the protein complex TonB, which participates in iron acquisition through siderophore mechanisms [57]; the cation efflux systems, phosphate transporter and Cu-ATPase, principal proteins related to copper resistance [58,59], and the enzymes RecN and MutS which are involved in double-strand break and mismatch DNA repair respectively [60,61].

Table 2. Unique genes from the consortium involved in copper bioleaching with the smallest CIB values.

One of the principal characteristics of the acidophilic consortium used for bioleaching, in comparison to other acidophilic organisms, is the ability of its members to resist elevated concentrations of heavy metal cations [25]. Most of the consortium species resist at least twice the external copper concentration tolerated by their counterpart biomining species (S5 Table). Next, a collection of genes previously classified as involved in copper resistance and oxidative stress protection was selected from the acidophilic consortium [25], together with homologs from their non-consortium counterparts. Within this group, the copper efflux proteins CopA and Cus and the antioxidant defense enzymes thioredoxin reductase (NadpH), superoxide dismutase (sodA) and peroxiredoxin (bcp) exhibit lower CIB values than the homologous genes of their acidophilic counterpart species (Table 3). The species belonging to the consortium not only contain a higher number of these components [25], but these genes also have low CIB values, suggesting greater transcriptional adaptation to living in a wide range of habitats, probably acquired as a means of resisting the extreme, elevated copper conditions present in the mine.

Table 3. List of genes from the consortium with lower CIB values compared to its non-consortium counterpart.


In general, acidophilic organisms present similar patterns of CIB independently of the place where they were isolated or their phylogenetic relationships and this is mainly characterized by low CIB in autotrophic species. However, the particular copper mining environment influences the CIB in unique genes and genes for copper-resistance, probably conferring the acidophilic consortium with a greater capacity to resist high concentrations of metal cations. Finally, studies of CIB in acidophilic organisms provide an alternative application for identifying and characterizing new strains with higher capacities for bioleaching metal ores.

Supporting information

S1 Table. Summary of statistical relationships between the CIB distribution in each group under study and genes from the COG2014 database for each COG category.

Each table entry displays the p-value from the permutation test for stochastic dominance between genes from the indicated group and COG category and genes assigned the same COG category in the COG database.


S2 Table. Summary of statistically significant differences in CIB distribution between biomining strains (the consortium species and their non-consortium counterparts) and genes from the COG2014 database for each COG category.

Each table entry displays the difference in the mean value of CIB for genes of the indicated strain and COG category, and the mean value of CIB for genes assigned the same COG category in the COG database.


S3 Table. Statistically significant differences in CIB distribution of conserved genes between the consortium strain and its counterpart for each species and COG category.

Each table entry displays the p-value for the two-sample Anderson-Darling test applied to genes conserved in the consortium and non-consortium strains for the indicated species and COG category.


S4 Table. Statistically significant differences in CIB distribution of unique genes between the consortium strain and its non-consortium counterpart for each species and COG category.

Each table entry displays the p-value for the two-sample Anderson-Darling test applied to unique genes in the consortium and non-consortium strains for the indicated species and COG category.


S5 Table. Minimum inhibitory concentration (MIC) of copper for each consortium strain and its non-consortium counterpart.

*Data from [25].


S1 Appendix. Kullback–Leibler codon information bias (CIB) and stochastic dominance.
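S1 Appendix defines CIB in terms of the Kullback–Leibler divergence. Purely as an illustration, the sketch below assumes that CIB is the divergence, in nats, between a gene's observed sense-codon frequencies p(c) and a reference q(c) = p(a)/k_a that keeps the gene's amino acid composition p(a) but spreads each amino acid uniformly over its k_a synonymous codons (standard genetic code, stop codons ignored). The exact definition in S1 Appendix and [45] may differ, for example in the log base or in how single-codon amino acids are handled, and `kl_codon_bias` is a hypothetical name.

```python
import math
from collections import Counter

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
SYN_SIZE = Counter(CODE.values())  # number of synonymous codons per amino acid


def kl_codon_bias(cds):
    """Hypothetical KL-based codon bias of one coding sequence (assumed definition).

    Returns sum_c p(c) * ln(p(c) / q(c)), where p is the observed sense-codon
    distribution and q keeps the amino acid composition of p but uses every
    synonymous codon equally often.
    """
    seq = cds.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    # Keep only valid sense codons; stops and ambiguous triplets are skipped.
    counts = Counter(c for c in codons if c in CODE and CODE[c] != "*")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no sense codons")
    aa_counts = Counter()
    for codon, n in counts.items():
        aa_counts[CODE[codon]] += n
    kl = 0.0
    for codon, n in counts.items():
        aa = CODE[codon]
        # p(c) / q(c) simplifies to n * k_a / (count of amino acid a).
        kl += (n / total) * math.log(n * SYN_SIZE[aa] / aa_counts[aa])
    return kl
```

Because q matches the gene's amino acid composition, this quantity is zero when every amino acid uses its synonymous codons equally and grows as usage concentrates on fewer codons; Met and Trp, which have a single codon each, contribute nothing.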


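The S3 and S4 Table captions report p-values from the two-sample Anderson-Darling test [46]. A minimal standard-library sketch of such a test follows, assuming the no-ties form of the Scholz-Stephens k-sample statistic and a permutation p-value in place of the published asymptotic approximation; the function names are hypothetical, and this is not necessarily how the tables were computed. SciPy's `scipy.stats.anderson_ksamp` also implements the Scholz-Stephens k-sample test, with an interpolated p-value.

```python
import bisect
import random


def ad_k_sample_statistic(samples):
    """Scholz-Stephens k-sample Anderson-Darling statistic (no-ties form).

    A2 = (1/N) * sum_i (1/n_i) * sum_{j=1}^{N-1} (N*M_ij - j*n_i)**2 / (j*(N - j)),
    where Z_(1) <= ... <= Z_(N) is the pooled ordered sample and M_ij is the
    number of observations in sample i that are <= Z_(j).
    """
    pooled = sorted(x for s in samples for x in s)
    big_n = len(pooled)
    total = 0.0
    for s in samples:
        s_sorted = sorted(s)
        n_i = len(s_sorted)
        inner = 0.0
        for j in range(1, big_n):
            m_ij = bisect.bisect_right(s_sorted, pooled[j - 1])
            inner += (big_n * m_ij - j * n_i) ** 2 / (j * (big_n - j))
        total += inner / n_i
    return total / big_n


def ad_permutation_test(a, b, n_perm=2000, seed=0):
    """Two-sample test of H0: a and b come from the same distribution."""
    rng = random.Random(seed)
    observed = ad_k_sample_statistic([a, b])
    pooled = list(a) + list(b)
    n_a = len(a)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling under H0
        stat = ad_k_sample_statistic([pooled[:n_a], pooled[n_a:]])
        # Small tolerance so floating-point rounding does not break ties.
        if stat >= observed - 1e-12 * max(1.0, observed):
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

With tied CIB values the no-ties formula is only an approximation of the Scholz-Stephens tie-adjusted statistic, but the permutation p-value remains valid because it only requires a fixed statistic evaluated under random relabelling.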
Acknowledgments

All the authors also acknowledge the support of the National Laboratory of High Performance Computing (NLHPC) at the CMM (PIA ECM-02.-CONICYT).

References

1. Ikemura T (1985) Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol 2: 13–34. pmid:3916708
2. Wang H, Liu S, Zhang B, Wei W (2016) Analysis of synonymous codon usage bias of Zika virus and its adaption to the hosts. PLoS One 11: e0166260. pmid:27893824
3. Shen X, Huang T, Wang G, Li G (2015) How the sequence of a gene specifies structural symmetry in proteins. PLoS One 10: e0144473. pmid:26641668
4. Ran W, Higgs PG (2012) Contributions of speed and accuracy to translational selection in bacteria. PLoS One 7: e51652. pmid:23272132
5. Botzman M, Margalit H (2011) Variation in global codon usage bias among prokaryotic organisms is associated with their lifestyles. Genome Biol 12: R109. pmid:22032172
6. Rocha EP (2004) Codon usage bias from tRNA's point of view: redundancy, specialization, and efficient decoding for translation optimization. Genome Res 14: 2279–2286. pmid:15479947
7. Roller M, Lucic V, Nagy I, Perica T, Vlahovicek K (2013) Environmental shaping of codon usage and functional adaptation across microbial communities. Nucleic Acids Res 41: 8842–8852. pmid:23921637
8. Rawlings DE (2005) Characteristics and adaptability of iron- and sulfur-oxidizing microorganisms used for the recovery of metals from minerals and their concentrates. Microb Cell Fact 4: 13. pmid:15877814
9. Panda S, Akcil A, Pradhan N, Deveci H (2015) Current scenario of chalcopyrite bioleaching: a review on the recent advances to its heap-leach technology. Bioresour Technol 196: 694–706. pmid:26318845
10. Li S, Zhong H, Hu Y, Zhao J, He Z, Gu G (2014) Bioleaching of a low-grade nickel-copper sulfide by mixture of four thermophiles. Bioresour Technol 153: 300–306. pmid:24374030
11. Yang H, Feng S, Xin Y, Wang W (2014) Community dynamics of attached and free cells and the effects of attached cells on chalcopyrite bioleaching by Acidithiobacillus sp. Bioresour Technol 154: 185–191. pmid:24389460
12. Goebel BM, Stackebrandt E (1994) Cultural and phylogenetic analysis of mixed microbial populations found in natural and commercial bioleaching environments. Appl Environ Microbiol 60: 1614–1621. pmid:7517131
13. Hodar C, Moreno P, di Genova A, Latorre M, Reyes-Jara A, Maass A, et al. (2012) Genome wide identification of Acidithiobacillus ferrooxidans (ATCC 23270) transcription factors and comparative analysis of ArsR and MerR metal regulators. Biometals 25: 75–93. pmid:21830017
14. Latorre M, Ehrenfeld N, Cortes MP, Travisany D, Budinich M, Aravena A, et al. (2016) Global transcriptional responses of Acidithiobacillus ferrooxidans Wenelen under different sulfide minerals. Bioresour Technol 200: 29–34. pmid:26476161
15. Liu Y, Yin H, Zeng W, Liang Y, Baba N, Qiu G, et al. (2011) The effect of the introduction of exogenous strain Acidithiobacillus thiooxidans A01 on functional gene expression, structure and function of indigenous consortium during pyrite bioleaching. Bioresour Technol 102: 8092–8098. pmid:21705214
16. Acuna LG, Cardenas JP, Covarrubias PC, Haristoy JJ, Flores R, Nuñez H, et al. (2013) Architecture and gene repertoire of the flexible genome of the extreme acidophile Acidithiobacillus caldus. PLoS One 8: e78237. pmid:24250794
17. Peng T, Ma L, Feng X, Tao J, Nan M, Liu Y, et al. (2017) Genomic and transcriptomic analyses reveal adaptation mechanisms of an Acidithiobacillus ferrivorans strain YL15 to alpine acid mine drainage. PLoS One 12: e0178008. pmid:28542527
18. Chen L, Ren Y, Lin J, Liu X, Pang X, Lin J (2012) Acidithiobacillus caldus sulfur oxidation model based on transcriptome analysis between the wild type and sulfur oxygenase reductase defective mutant. PLoS One 7: e39470. pmid:22984393
19. Farah C, Vera M, Morin D, Haras D, Jerez CA, Guiliani N (2005) Evidence for a functional quorum-sensing type AI-1 system in the extremophilic bacterium Acidithiobacillus ferrooxidans. Appl Environ Microbiol 71: 7033–7040. pmid:16269739
20. Ramirez P, Guiliani N, Valenzuela L, Beard S, Jerez CA (2004) Differential protein expression during growth of Acidithiobacillus ferrooxidans on ferrous iron, sulfur compounds, or metal sulfides. Appl Environ Microbiol 70: 4491–4498. pmid:15294777
21. Frankel ML, Demeter MA, Lemire JA, Turner RJ (2016) Evaluating the metal tolerance capacity of microbial communities isolated from Alberta oil sands process water. PLoS One 11: e0148682. pmid:26849649
22. Tyson GW, Chapman J, Hugenholtz P, Allen EE, Ram RJ, Richardson PM, et al. (2004) Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428: 37–43. pmid:14961025
23. Zhang X, Niu J, Liang Y, Liu X, Yin H (2016) Metagenome-scale analysis yields insights into the structure and function of microbial communities in a copper bioleaching heap. BMC Genet 17: 21. pmid:26781463
24. Hu Q, Guo X, Liang Y, Hao X, Ma L, Yin H, et al. (2015) Comparative metagenomics reveals microbial community differentiation in a biological heap leaching system. Res Microbiol 166: 525–534. pmid:26117598
25. Latorre M, Cortes MP, Travisany D, Di Genova A, Budinich M, Reyes-Jara A, et al. (2016) The bioleaching potential of a bacterial consortium. Bioresour Technol 218: 659–666. pmid:27416516
26. Travisany D, Cortes MP, Latorre M, Di Genova A, Budinich M, Bobadilla-Fazzini RA, et al. (2014) A new genome of Acidithiobacillus thiooxidans provides insights into adaptation to a bioleaching environment. Res Microbiol 165: 743–752. pmid:25148779
27. Travisany D, Di Genova A, Sepulveda A, Bobadilla-Fazzini RA, Parada P, Maass A (2012) Draft genome sequence of the Sulfobacillus thermosulfidooxidans Cutipay strain, an indigenous bacterium isolated from a naturally extreme mining environment in Northern Chile. J Bacteriol 194: 6327–6328. pmid:23105067
28. Mi S, Song J, Lin J, Che Y, Zheng H, Lin J (2011) Complete genome of Leptospirillum ferriphilum ML-04 provides insight into its physiology and environmental adaptation. J Microbiol 49: 890–901. pmid:22203551
29. Kelly DP, Wood AP (2000) Reclassification of some species of Thiobacillus to the newly designated genera Acidithiobacillus gen. nov., Halothiobacillus gen. nov. and Thermithiobacillus gen. nov. Int J Syst Evol Microbiol 50 Pt 2: 511–516.
30. Wakao N, Nagasawa N, Matsuura T, Matsukura H, Matsumoto T, Hiraishi A, et al. (1994) Acidiphilium multivorum sp. nov., an acidophilic chemoorganotrophic bacterium from pyritic acid mine drainage. J Gen Appl Microbiol 40: 143–159.
31. Colmer AR, Temple KL, Hinkle ME (1950) An iron-oxidizing bacterium from the acid drainage of some bituminous coal mines. J Bacteriol 59: 317–328. pmid:15436401
32. Magnuson TS, Swenson MW, Paszczynski AJ, Deobald LA, Kerk D, Cummings DE (2010) Proteogenomic and functional analysis of chromate reduction in Acidiphilium cryptum JF-5, an Fe(III)-respiring acidophile. Biometals 23: 1129–1138. pmid:20593301
33. San Martin-Uriz P, Gomez MJ, Arcas A, Bargiela R, Amils R (2011) Draft genome sequence of the electricigen Acidiphilium sp. strain PM (DSM 24941). J Bacteriol 193: 5585–5586. pmid:21914891
34. Valdes J, Quatrini R, Hallberg K, Dopson M, Valenzuela PD, Holmes DS (2009) Draft genome sequence of the extremely acidophilic bacterium Acidithiobacillus caldus ATCC 51756 reveals metabolic versatility in the genus Acidithiobacillus. J Bacteriol 191: 5877–5878. pmid:19617360
35. Orellana LH, Jerez CA (2011) A genomic island provides Acidithiobacillus ferrooxidans ATCC 53993 additional copper resistance: a possible competitive advantage. Appl Microbiol Biotechnol 92: 761–767. pmid:21789491
36. Pester M, Brambilla E, Alazard D, Rattei T, Weinmaier T, Han J, et al. (2012) Complete genome sequences of Desulfosporosinus orientis DSM765T, Desulfosporosinus youngiae DSM17734T, Desulfosporosinus meridiei DSM13257T, and Desulfosporosinus acidiphilus DSM22704T. J Bacteriol 194: 6300–6301. pmid:23105050
37. Fujimura R, Sato Y, Nishizawa T, Oshima K, Kim SW, Hattori M, et al. (2012) Complete genome sequence of Leptospirillum ferrooxidans strain C2-3, isolated from a fresh volcanic ash deposit on the island of Miyake, Japan. J Bacteriol 194: 4122–4123. pmid:22815442
38. Anderson I, Chertkov O, Chen A, Saunders E, Lapidus A, Nolan M, et al. (2012) Complete genome sequence of the moderately thermophilic mineral-sulfide-oxidizing firmicute Sulfobacillus acidophilus type strain (NAL(T)). Stand Genomic Sci 6: 1–13.
39. Li B, Chen Y, Liu Q, Hu S, Chen X (2011) Complete genome analysis of Sulfobacillus acidophilus strain TPY, isolated from a hydrothermal vent in the Pacific Ocean. J Bacteriol 193: 5555–5556. pmid:21914875
40. Arsene-Ploetze F, Koechler S, Marchal M, Coppee JY, Chandler M, Bonnefoy V, et al. (2010) Structure, function, and evolution of the Thiomonas spp. genome. PLoS Genet 6: e1000859. pmid:20195515
41. Meyer F, Paarmann D, D'Souza M, Olson R, Glass EM, Kubal M, et al. (2008) The metagenomics RAST server—a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 9: 386. pmid:18803844
42. Li L, Stoeckert CJ Jr., Roos DS (2003) OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res 13: 2178–2189. pmid:12952885
43. Remm M, Storm CE, Sonnhammer EL (2001) Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J Mol Biol 314: 1041–1052. pmid:11743721
44. Comeron JM, Aguade M (1998) An evaluation of measures of synonymous codon usage bias. J Mol Evol 47: 268–274. pmid:9732453
45. Hart A, Martínez S (2016) An entropy-based technique for classifying bacterial chromosomes according to synonymous codon usage. J Math Biol: 1–15.
46. Scholz FW, Stephens MA (1987) K-sample Anderson-Darling tests. J Am Stat Assoc 82: 918–924.
47. Szekli R (1995) Stochastic ordering and dependence in applied probability. Springer-Verlag.
48. Saeed AI, Sharov V, White J, Li J, Liang W, Bhagabati N, et al. (2003) TM4: a free, open-source system for microarray data management and analysis. Biotechniques 34: 374–378. pmid:12613259
49. Zhang X, Liu X, Liang Y, Guo X, Xiao Y, Ma L, et al. (2017) Adaptive evolution of extreme acidophile Sulfobacillus thermosulfidooxidans potentially driven by horizontal gene transfer and gene loss. Appl Environ Microbiol 83.
50. Tatusov RL, Galperin MY, Natale DA, Koonin EV (2000) The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res 28: 33–36. pmid:10592175
51. Carbone A, Kepes F, Zinovyev A (2005) Codon bias signatures, organization of microorganisms in codon space, and lifestyle. Mol Biol Evol 22: 547–561. pmid:15537809
52. Prakash T, Taylor TD (2012) Functional assignment of metagenomic data: challenges and applications. Brief Bioinform 13: 711–727. pmid:22772835
53. Wong JT (1975) A co-evolution theory of the genetic code. Proc Natl Acad Sci U S A 72: 1909–1912. pmid:1057181
54. Okabe S, Odagiri M, Ito T, Satoh H (2007) Succession of sulfur-oxidizing bacteria in the microbial community on corroding concrete in sewer systems. Appl Environ Microbiol 73: 971–980. pmid:17142362
55. Brandis G, Hughes D (2016) The selective advantage of synonymous codon usage bias in Salmonella. PLoS Genet 12: e1005926. pmid:26963725
56. Okibe N, Johnson DB (2004) Biooxidation of pyrite by defined mixed cultures of moderately thermophilic acidophiles in pH-controlled bioreactors: significance of microbial interactions. Biotechnol Bioeng 87: 574–583. pmid:15352055
57. Moeck GS, Coulton JW (1998) TonB-dependent iron acquisition: mechanisms of siderophore-mediated active transport. Mol Microbiol 28: 675–681. pmid:9643536
58. Navarro CA, Orellana LH, Mauriaca C, Jerez CA (2009) Transcriptional and functional studies of Acidithiobacillus ferrooxidans genes related to survival in the presence of copper. Appl Environ Microbiol 75: 6102–6109. pmid:19666734
59. Orell A, Navarro CA, Arancibia R, Mobarec JC, Jerez CA (2010) Life in blue: copper resistance mechanisms of bacteria and archaea used in industrial biomining of minerals. Biotechnol Adv 28: 839–848. pmid:20627124
60. Pellegrino S, Radzimanowski J, de Sanctis D, Boeri Erba E, McSweeney S, Timmins J (2012) Structural and functional characterization of an SMC-like protein RecN: new insights into double-strand break repair. Structure 20: 2076–2089. pmid:23085075
61. Kunkel TA, Erie DA (2005) DNA mismatch repair. Annu Rev Biochem 74: 681–710. pmid:15952900
62. Clum A, Nolan M, Lang E, Glavina Del Rio T, Tice H, Copeland A, et al. (2009) Complete genome sequence of Acidimicrobium ferrooxidans type strain (ICP). Stand Genomic Sci 1: 38–45.
63. Wakao N, Nagasawa N, Matsuura T, Matsukura H, Matsumoto T, Hiraishi A, et al. (1994) Acidiphilium multivorum sp. nov., an acidophilic chemoorganotrophic bacterium from pyritic acid mine drainage. J Gen Appl Microbiol 40: 143–159.
64. You XY, Guo X, Zheng HJ, Zhang MJ, Liu LJ, Zhu YQ, et al. (2011) Unraveling the Acidithiobacillus caldus complete genome and its central metabolisms for carbon assimilation. J Genet Genomics 38: 243–252. pmid:21703548
65. Liljeqvist M, Valdes J, Holmes DS, Dopson M (2011) Draft genome of the psychrotolerant acidophile Acidithiobacillus ferrivorans SS3. J Bacteriol 193: 4304–4305. pmid:21705598
66. Valdés J, Pedroso I, Quatrini R, Dodson RJ, Tettelin H, Blake R 2nd, et al. (2008) Acidithiobacillus ferrooxidans metabolism: from genome sequence to industrial applications. BMC Genomics 9: 597.
67. Valdes J, Ossandon F, Quatrini R, Dopson M, Holmes DS (2011) Draft genome sequence of the extremely acidophilic biomining bacterium Acidithiobacillus thiooxidans ATCC 19377 provides insights into the evolution of the Acidithiobacillus genus. J Bacteriol 193: 7003–7004. pmid:22123759
68. Mi S, Song J, Lin J, Che Y, Zheng H, Lin J (2011) Complete genome of Leptospirillum ferriphilum ML-04 provides insight into its physiology and environmental adaptation. J Microbiol 49: 890–901. pmid:22203551
69. Li B, Chen Y, Liu Q, Hu S, Chen X (2011) Complete genome analysis of Sulfobacillus acidophilus strain TPY, isolated from a hydrothermal vent in the Pacific Ocean. J Bacteriol 193: 5555–5556. pmid:21914875