Analysis and Phylogeny of Small Heat Shock Proteins from Marine Viruses and Their Cyanobacteria Host

Small heat shock proteins (sHSPs) are oligomeric stress proteins characterized by an α-crystallin domain (ACD) flanked by an N-terminal arm and a C-terminal extension. Publications on sHSPs have reported that they exist in prokaryotes and eukaryotes but, to our knowledge, not in viruses. Here we show that sHSPs are present in some cyanophages that infect the marine unicellular cyanobacteria Synechococcus and Prochlorococcus. These phage sHSPs contain a conserved ACD flanked by a relatively conserved N-terminal arm and a short C-terminal extension, with or without the conserved C-terminal anchoring module (CAM) L-X-I/V, which has been suggested to be involved in oligomerization. In addition, cyanophage sHSPs have the signature pattern P-P-[YF]-N-[ILV]-[IV]-x(9)-[EQ] in the predicted β2 and β3 strands of the ACD. Phylogenetically, cyanophage sHSPs form a monophyletic clade closer to bacterial class A sHSPs than to cyanobacterial sHSPs. Furthermore, three sHSPs from their cellular host, Synechococcus, are phylogenetically close to plant sHSPs. Implications of the evolutionary relationships between the sHSPs of cyanophages, bacterial class A, cyanobacteria, and plants are discussed.


Introduction
The small heat shock proteins (sHSPs) are a family of stress proteins found in archaea, bacteria, fungi, plants and animals [1][2][3][4]. sHSP monomers are characterized by a conserved domain of approximately 90 amino acids called the α-crystallin domain (ACD), consisting of eight β-strands that form a β-sandwich fold (Pfam PF00011: Hsp20/alpha-crystallin). This domain is flanked by an N-terminal arm and a C-terminal extension that are variable in both length and sequence between orthologues, which may reflect functional specificity and/or preferential chaperone activity [5,6]. sHSPs generally exist as oligomers that are usually polydisperse and that change size and organization on exposure to stress and when interacting with substrate [6]. In vitro, sHSPs have been shown to prevent the irreversible aggregation of non-native proteins during heat shock. Mutations in sHSPs are associated with a variety of severe diseases, including myopathies, dystrophies, and cataracts [7,8]. Phylogenetic analyses indicated that sHSPs were already present in the last common ancestor of prokaryotes and eukaryotes [9,10].
Phages play a major role in marine systems. They are the most abundant forms of life in the Earth's oceans, with concentrations exceeding 10 million per milliliter of seawater [11]. They influence marine biogeochemical cycles by controlling host abundance and community composition and by recycling photosynthetically fixed organic carbon as dissolved organic material via viral lysis [12]. Cyanophages infect the marine unicellular cyanobacteria Synechococcus and its sister group Prochlorococcus, which dominate the picophytoplankton in the oceans [13,14]. To date, the vast majority of phages known to infect cyanobacteria are myoviruses [15,16], which are related to phage T4 [17,18]. It has been reported that the sequenced genomes of Synechococcus and Prochlorococcus phages contain genes with an hsp20/alpha-crystallin domain (PF00011) [18][19][20].

Sequence databases, alignment and phylogeny
We searched for sHSPs in the completely sequenced genomes of viruses in the biological databases (GenBank, protein database, and genomes database) using BLASTp, tBLASTn and an HMM profile. We also searched for sHSPs in the completely sequenced genomes of their host cyanobacteria, Synechococcus and Prochlorococcus. sHSP sequences from several species were aligned with ClustalW. Secondary structures indicated in the alignment were assigned according to the crystal structure of wheat HSP16.9 [21]. GenBank accession numbers of the cyanophage and cyanobacterial sequences used in this alignment are listed in Tables 1 and 2, respectively. Phylogenetic trees were constructed using PhyML [22] and BioNJ [23]. Only the ACD and C-terminal extension were used for the phylogenetic analysis. For PhyML, the WAG substitution model was used, and the statistical confidence of the nodes was calculated with the aLRT test.
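The restriction of the phylogenetic analysis to the ACD and C-terminal extension can be illustrated with a minimal sketch. It assumes the alignment is already available as equal-length gapped strings and that the first alignment column of the ACD is known. The function name, the column index and the toy sequences are hypothetical illustrations and are not taken from the study.

```python
def trim_to_acd_and_cterm(alignment, acd_start_col):
    """Keep alignment columns from acd_start_col onward, dropping gap-only columns.

    alignment: dict mapping a sequence name to its gapped, aligned sequence.
    acd_start_col: 0-based index of the first ACD column (assumed to be known).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")

    # Discard the N-terminal arm: everything before the first ACD column.
    trimmed = {name: seq[acd_start_col:] for name, seq in alignment.items()}

    # Drop columns that contain only gaps once the N-terminal arm is removed.
    n_cols = lengths.pop() - acd_start_col
    keep = [i for i in range(n_cols)
            if any(seq[i] != "-" for seq in trimmed.values())]
    return {name: "".join(seq[i] for i in keep) for name, seq in trimmed.items()}


# Hypothetical toy alignment (not real sHSP sequences).
toy_alignment = {
    "phage_A": "MSK-PPYNIVE-LK",
    "phage_B": "M-KTPPFNVIQ-LR",
    "host_C":  "MTR-APYDIRE-VK",
}
toy_trimmed = trim_to_acd_and_cterm(toy_alignment, acd_start_col=4)
# toy_trimmed["phage_A"] == "PPYNIVELK"
```

The trimmed sequences could then be written to a file and passed to a tree-building program; that step is left out here because it depends on the local installation.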

Results and Discussion
Publications on sHSPs have reported that they are present in archaea, bacteria, fungi, plants and animals but not in viruses. Here, we searched for sHSPs in the completely sequenced genomes of viruses in the biological databases (GenBank, protein database, and genomes database) using BLASTp, tBLASTn and an HMM profile. These searches showed that sHSPs are present only in marine viruses (cyanophages) that infect the unicellular cyanobacteria Synechococcus and Prochlorococcus (Table 1). We found that the genomes of many, but not all, of these cyanophages contain a single-copy sHSP gene. Small cyanophage genomes such as those of Synechococcus phage P60 (47872 bp) and Synechococcus phage Syn5 (46214 bp) do not contain any sHSP genes. Interestingly, Prochlorococcus phages P-SSM2 and P-SSM4 lack the core T4-like chaperonin genes (rnlA, 31, and 57A), although both phages contain sHSPs [19]. sHSPs could play the same role as the core T4-like chaperonin genes, intervening in scaffolding during maturation of the capsid [27].
Protein sequence analysis of cyanophage sHSPs showed that they contain a conserved ACD (~92 amino acids) flanked by a relatively conserved N-terminal arm and a short C-terminal extension. The lengths of the arm and the extension are variable. The conserved C-terminal anchoring motif (CAM) L-X-I/L/V, implicated in inter-dimer interactions, is present in 12 of 19 Synechococcus phages (Figure 1). The Prochlorococcus phages do not contain a classical CAM, but A-X-P, L-X-G and L-X-A motifs are present in the C-terminal extensions of Prochlorococcus phages Syn33, P-SSM2 and P-SSM7, respectively. It was reported that the sHSP Tsp36 also contains a non-classical CAM, I-X-P [28]. The end of the N-terminal arm contains a conserved double proline, and another conserved proline is present at the beginning of the C-terminal extension (Figure 1). Furthermore, an A-G doublet characteristic of bacterial class A sHSPs is also present in cyanophage sHSPs [29,30]. This doublet is sandwiched by hydrophobic residues: the aliphatic residue L and an aromatic residue (F/Y/W). Aromatic residues in this position are found only in bacterial class A and animal sHSPs [29]. Cyanophage sHSPs also have, in the predicted β7 strand, a conserved arginine that is important for dimerization and associated with human diseases (Figure 1). Synechococcus phages S-PM2 and S-CAM1 and Prochlorococcus phage Syn1 contain the hydrophilic amino acid asparagine in place of this arginine, and Synechococcus phage S-CRM01 contains a lysine. The ACD contains a variable region corresponding to the L57 loop (residues 109-121) (Figure 1). The Arg in the β7 strand could form a salt bridge with Asp or Glu in the L57 loop of the neighboring monomer, probably with the Asp or Glu at position 117 (Figure 1).
Using I-TASSER, we constructed a 3D model of the sHSP from Synechococcus phage S-MbCM6 (HspSP-MbCM6). Figure 2A shows that the 3D model is similar to the structure of wheat Hsp16.9 [21]. The 3D structure alignment between HspSP-MbCM6 and wheat Hsp16.9 (Figure 2B) showed that the best-conserved region is the ACD. A 3D alignment of the model against the PDB database with PDBeFold revealed high similarity (RMSD of 1.40 Å and 20% identity) with 1gme.
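The screen for a classical CAM in the C-terminal extension, described above, can be sketched as a simple pattern scan. This is only an illustration under stated assumptions: it uses the L-X-I/L/V form given in this section, it expects the C-terminal extension to have been cut out beforehand, and the function name and toy peptides are hypothetical rather than sequences from Table 1.

```python
import re

# Classical CAM as written in this section: L, any residue, then I, L or V.
# The lookahead allows overlapping occurrences to be reported.
CLASSICAL_CAM = re.compile(r"(?=L.[ILV])")


def find_classical_cam(c_terminal_extension):
    """Return the 0-based start positions of L-X-[ILV] in a C-terminal extension."""
    return [m.start() for m in CLASSICAL_CAM.finditer(c_terminal_extension)]


# Hypothetical toy extensions (not real cyanophage sequences).
with_cam = "PKELKIEG"      # contains L-K-I starting at index 3
without_cam = "PKEAKPEG"   # contains only a non-classical A-X-P motif

positions_with = find_classical_cam(with_cam)        # [3]
positions_without = find_classical_cam(without_cam)  # []
```

An empty result does not imply that the extension lacks an anchoring motif altogether, since non-classical variants such as A-X-P, L-X-G or L-X-A would need their own patterns.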
We also searched for sHSPs in the genomes of the host cyanobacteria, Synechococcus and Prochlorococcus, to determine whether the sHSPs in cyanophages result from lateral gene transfer (LGT) from cyanobacteria to phage. LGT from cyanobacteria to cyanophages is well documented for photosynthesis genes [31]. Fifteen sequenced genomes of Synechococcus contain 1, 2 or 3 sHSP genes (Table 2).

[Figure caption, partial] [...] is in cyan and green background, respectively. Alignment was generated using ClustalW. Secondary structures indicated above are assigned according to the crystal structure of wheat HSP16.9 (1gme) [21]. GenBank accession numbers of the sequences used in this alignment are listed in Table 1.

[...] hydrophobic residues L142, V144 and I147 of HspSP-MbCM6 and V148, V150 and L152 of HspS-PCC7335.1 occupied the three pockets (Figure 4B and 4C). These results suggest that sHSPs of cyanophages and cyanobacteria could form hetero-oligomers, provided they have compatible N-terminal interactions (Figure 4D).
To establish the phylogenetic relationships between the sHSPs of cyanophages and those of prokaryotes and eukaryotes, we aligned ACD sequences from bacteria, archaea, cyanobacteria, fungi, plants and animals with ClustalW and constructed a phylogenetic tree using PhyML [22] and BioNJ [23]. The multiple sequence alignment of Figure 3 shows that the pattern P-P-[YF]-N-[ILV]-[IV]-x(9)-[EQ] is a signature of cyanophage sHSPs. This pattern can be used with PHI-BLAST to search specifically for cyanophage sHSPs in metagenomic databases. Furthermore, the relatively conserved sequences of the N-terminal arms of cyanophage sHSPs make it possible to build an HMM profile that can also be used to extract cyanophage sHSPs specifically from metagenomic databases. Figure 5 shows that sHSPs form two groups: bacterial class A, cyanophage and animal sHSPs form one group, and bacterial class B, archaeal, cyanobacterial, fungal and plant sHSPs form the second. The same result is obtained using BioNJ (not shown). In addition, cyanophage sHSPs form a monophyletic clade closer to bacterial class A sHSPs than to cyanobacterial sHSPs. This suggests that cyanophages acquired an sHSP gene from a bacterial class A ancestor by LGT.
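The signature pattern reported above is written in PROSITE-style syntax, and a short sketch shows how such a pattern could be applied to a set of protein sequences. The code below is a plain regular-expression scan, not PHI-BLAST, and it handles only the syntax elements needed here (residue sets, x, repeat counts and exclusions). The function names and the toy sequence are hypothetical.

```python
import re

# Signature pattern of cyanophage sHSPs as given in the text.
SIGNATURE = "P-P-[YF]-N-[ILV]-[IV]-x(9)-[EQ]"


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        # Split an element such as 'x(9)' or 'x(2,4)' into its core and repeat counts.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"    # excluded residues
        if low and high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        parts.append(core)
    return "".join(parts)


def find_signature(sequence, pattern=SIGNATURE):
    """Return (0-based start, matched segment) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


# Hypothetical toy sequence built to contain the signature (not a real sHSP).
toy_sequence = "AA" + "PPYNIV" + "A" * 9 + "E" + "KK"
signature_regex = prosite_to_regex(SIGNATURE)   # "PP[YF]N[ILV][IV].{9}[EQ]"
toy_hits = find_signature(toy_sequence)         # one match starting at index 2
```

A scan of this kind only reports exact pattern matches. PHI-BLAST additionally requires sequence similarity around the pattern, so the two approaches are not interchangeable.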
Based on the relationship between phylogeny and oligomeric polydispersity described by Fu et al. [30], we may suppose that cyanophage sHSPs form polydisperse oligomers, like their bacterial class A ancestor sHSP. It is important to note that three sHSPs from their cellular host, Synechococcus, form a monophyletic clade that is phylogenetically close to plant sHSPs (Figure 5). Cyanobacteria are among the most ancient organisms on Earth, and fossils of these photosynthetic bacteria indicate a striking resemblance between current species and those living over 2 billion years ago [32]. Thus, the ACD of the sHSP gene family must be at least 2 billion years old. We may suppose that plants acquired sHSP genes from the cyanobacterial endosymbionts that gave rise to the chloroplast.

Conclusions
This study revealed the presence of sHSPs in viruses and highlighted their structural characteristics and their phylogenetic relationships with those of prokaryotes and eukaryotes. We expect that studying sHSPs in a simple system such as viruses and cyanobacteria will help answer many unresolved questions, such as the mechanism of their interaction with substrate. Moreover, such studies could help clarify the origin and evolution of this ancient gene family, which is at least 2 billion years old.