The Coiled Coils of Cohesin Are Conserved in Animals, but Not In Yeast

Background
The SMC proteins are involved in DNA repair, chromosome condensation, and sister chromatid cohesion throughout Eukaryota. Long, anti-parallel coiled coils are a prominent feature of SMC proteins, and are thought to serve as spacer rods to provide an elongated structure and to separate domains. We reported recently that the coiled coils of mammalian condensin (SMC2/4) showed moderate sequence divergence (≈10–15%) consistent with their functioning as spacer rods. The coiled coils of mammalian cohesins (SMC1/3), however, were very highly constrained, with amino acid sequence divergence typically <0.5%. These coiled coils are among the most highly conserved mammalian proteins, suggesting that they make extensive contacts over their entire surface.
Methodology/Principal Findings
Here, we broaden our initial analysis of condensin and cohesin to include additional vertebrate and invertebrate organisms and multiple species of yeast. We found that the coiled coils of SMC1/3 are highly constrained in Drosophila and other insects, and more generally across all animal species. However, in yeast they are no more constrained than the coils of SMC2/4 and Ndc80/Nuf2p, suggesting that they are serving primarily as spacer rods.
Conclusions/Significance
SMC1/3 functions for sister chromatid cohesion in all species. Since its coiled coils apparently serve only as spacer rods in yeast, it is likely that this is sufficient for sister chromatid cohesion in all species. This suggests an additional function in animals that constrains the sequence of the coiled coils. Several recent studies have demonstrated that cohesin has a role in gene expression in post-mitotic neurons of Drosophila, and other animal cells. Some variants of human Cornelia de Lange Syndrome involve mutations in human SMC1/3. We suggest that the role of cohesin in gene expression may involve intimate contact of the coiled coils of SMC1/3, and impose the constraint on sequence divergence.


Introduction
The structural maintenance of chromosomes (SMC) proteins interact with DNA to carry out several critical functions within the cell, including DNA repair, chromosome condensation, and sister chromatid cohesion during mitosis [1][2][3][4][5][6][7][8]. Three SMC protein complexes have been identified in eukaryotes: SMC1/3 (cohesin), SMC2/4 (condensin), and SMC5/6. Each SMC complex consists of two SMC protein subunits and a varying number of accessory proteins. The SMCs are structurally characterized by a long, antiparallel coiled coil that exhibits the hydrophobic packing of Crick [9]. The SMC subunits fold at the central hinge, and the coiled coils bring the N- and C-terminal domains together to form the ATPase head domain (Fig. 1). The two SMC subunits of each pair form a dimer, with the hinges associated in the middle and the ATPase domains at the ends of the long coiled-coil rods (Fig. 1).
Research efforts on the function of cohesin (SMC1/3 and two associated proteins) have focused primarily on its well-documented role in holding sister chromatids together until they are ready for separation in anaphase [10][11][12][13][14][15], and more recently on a role in mitotic spindle pole formation [16]. During mitosis, SMC1 complexes with SMC3 to form cohesin (we use SMC1 to designate SMC1p (yeast) and SMC1A or SMC1L1 (animals)). During meiosis in animals, a second isoform (SMC1b) complexes with SMC3 to form a second cohesin [17,18]. An attractive model for the mechanism of sister chromatid cohesion proposes that the ATPase heads come together and the long coiled coils form a ring that could trap the DNA of each sister chromatid ([5,19], but see also [20]). In this model, and indeed in most thinking about the possible function of cohesin and condensin, the coiled coils are thought to serve a structural role as long spacer arms. However, we recently reported that the mammalian cohesins possess the most highly conserved coiled-coil domains in the genome [21], suggesting that they are not just spacer arms (discussed below).
Emerging evidence demonstrates that cohesin also has important functions in interphase cells, independent of its role in cohesion. Cohesin binds to specific chromosomal sites across the genome, and is involved in regulating gene expression in post-mitotic cells. In yeast, cleavage of the Scc1/Mcd1p subunit of cohesin plays a role in transcriptional silencing [22], and SMC1 and SMC3 may be part of the mechanism defining the boundaries of the transcriptionally silenced HMR locus [23]. Mutations in the coiled coils of both the SMC1 and SMC3 subunits of cohesin were discovered in patients with a mild variant of Cornelia de Lange Syndrome (CdLS) [24][25][26][27], a human neurological developmental disorder associated with mental retardation. In Drosophila, studies with cohesin mutants demonstrated that cohesin is required for developmental axonal pruning [28] and that abnormal cleavage of the cohesin complex alters wild-type larval locomotion [29]. Finally, cohesin binding sites and those of the zinc-finger protein CCCTC-binding factor (CTCF), which acts as an insulator protein that blocks enhancer-promoter interactions, overlap significantly in mammalian chromosomes ([30][31][32][33][34]; see [33,34] for commentary). CTCF is found in Drosophila and vertebrates [35]. In the human genome, 9,000 cohesin-binding sites were found, and 90% of them also bound CTCF [31][32][33]. Additionally, this post-mitotic function for cohesin appears not to be limited to neurons, as immunoblots demonstrate that cohesin is differentially expressed in a variety of murine tissue extracts [32]. Overall, these studies demonstrate that cohesin functions in regulating gene expression in addition to its role in sister chromatid cohesion.
In a previous study, we noted that the coiled-coil segments of SMC1 and SMC3 were among the most highly conserved mammalian proteins, showing sequence divergence across different mammalian species of only 0-1% over ≈700 amino acids [21]. To put this in context, we analyzed a variety of coiled-coil proteins. Some coiled coils (Ndc80, Nuf2p, giantin) showed sequence divergence of ≈20% across different species of mammals, and we concluded that this was a typical sequence divergence of coiled coils serving primarily as inert spacer rods. Ndc80/Nuf2p appear to be found in all eukaryotes and form a heterodimeric coiled coil that is involved in linking the kinetochore to microtubules [36][37][38]. The coiled coils of condensins (SMC2/4) showed 10-15% sequence divergence, suggesting that they are also serving primarily as rods. Proteins that are known to use their coiled coils for packing into filaments, such as skeletal muscle myosin II and intermediate filament proteins, showed a divergence of only 1-3%, reflecting the constraints to maintain the protein-protein contacts over their surface. The coiled coils of SMC1/3 were even more highly conserved than myosin II and intermediate filament proteins, which implied these coiled coils are not serving just as spacer rods. We concluded that the coiled coils of SMC1/3 probably have interactions over their entire length and circumference that constrain sequence divergence.
We also found that the coiled coils of SMC1/3 were much more highly conserved than those of SMC2/4 when we compared human sequences to avian, amphibian, and Drosophila orthologs [21]. This suggested that the mechanism imposing the constraint in SMC1/3 coiled coils was found in all vertebrates, and perhaps all animal species, and raised the question whether this mechanism is found in all eukaryotes. As genomic sequences have become increasingly available for a number of yeast, nematode, Drosophila, and other invertebrate species [39][40][41][42][43], we decided to extend our analysis to yeast and additional animal species. We determined that the coiled coils of SMC1/3 are more conserved than those of SMC2/4 and Ndc80/Nuf2p in vertebrates and several species of invertebrates, but are no more conserved than those of SMC2/4 and Ndc80/Nuf2p across multiple yeast species. In addition, the meiotic SMC1b coils are not constrained in animals.
Tropomyosin, an actin-binding coiled-coil protein found throughout Eukaryota, served as our control reference sequence of a coiled coil that is universally highly conserved [9,44,45]. Tropomyosin is an essential element of the thin filament of striated muscle, while its non-muscle isoforms have been implicated in a number of functions including actin filament stability and cytokinesis [46,47]. Its sequence conservation is apparently due to its binding to actin and the troponin proteins, involving most of the surface of the coiled coil. Our finding that SMC1/3 coils are not constrained in yeast suggests that there is a special mechanism involving the SMC1/3 coils that operates in metazoans but not in yeast or in animal cells undergoing meiosis.

Protein Sequence Acquisition and Coiled-Coil Domain Determination
Seven coiled-coil proteins (SMC1/3, SMC2/4, Ndc80, Nuf2p, tropomyosin (Tm)) from Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens served as the reference sequences for this study. Protein sequences for each were obtained from GenBank, with the majority annotated as Reference Sequences. The remaining sequences were obtained by using these sequences as the query in either a BLAST search of GenBank's protein database, or a Eukaryota Genomic BLAST (tBLASTn) search of the whole genome shotgun sequence databases (http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi?organism=euk). For those species which only had whole genome shotgun sequences available, orthologous proteins were identified based upon the highest % identity with the query sequence for that organism, and on whether their coiled-coil domains, as identified using the COILS prediction program (see below), closely approximated those of the query sequences. The gene sequences were then translated using the SIXFRAME tool at the Biology Workbench of the San Diego Supercomputer Center (http://seqtool.sdsc.edu/CGI/BW.cgi).
The coiled-coil domains of each protein were identified using the 28-residue window output from the COILS program (http://www.ch.embnet.org/software/COILS_form.html). To maintain consistency with our previous study, only those regions of the sequence where the score was ≥0.5 were used for the analysis of the coils [21,48]. All SMC proteins in this study had disruptions of their coiled coils [49]. The boundaries of the individual coiled-coil segments were identified and the non-coil segments were removed. For a select subset of SMC proteins, coiled-coil boundaries were determined using other coiled-coil prediction software, including PairCoil2 ([50]; http://groups.csail.mit.edu/cb/paircoil2/) and PCOILS ([51]; http://toolkit.tuebingen.mpg.de/pcoils). All three programs predicted slightly different boundaries for the coiled-coil segments, which, under our analysis criteria, resulted in slightly different percent divergences. However, the sequence divergences never changed by more than 1 or 2 percent, so the trends we report using the COILS program (in this and our previous study) are not affected by the small differences in boundaries.
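The segment selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure as actually run: it assumes the per-residue COILS scores (28-residue window) have already been parsed into a list of floats, and the function names `coil_segments` and `extract_coil` are hypothetical. Residues scoring at or above the 0.5 cutoff are kept, contiguous runs define the segment boundaries, and the non-coil residues between them are dropped.

```python
from typing import List, Tuple


def coil_segments(scores: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges where score >= threshold."""
    segments = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # a new coiled-coil segment begins here
        elif start is not None:
            segments.append((start, i))  # segment ended at the previous residue
            start = None
    if start is not None:
        segments.append((start, len(scores)))  # segment runs to the C-terminus
    return segments


def extract_coil(sequence: str, scores: List[float], threshold: float = 0.5) -> str:
    """Concatenate the predicted coiled-coil residues, dropping non-coil segments."""
    if len(sequence) != len(scores):
        raise ValueError("sequence and scores must have the same length")
    return "".join(sequence[a:b] for a, b in coil_segments(scores, threshold))
```

Repeating the extraction with a different `threshold`, or with scores from another predictor, is one way to see how sensitive the segment boundaries are to the choice of program.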
For ease of viewing, the SMC1/3, SMC2/4, Ndc80, Nuf2p and tropomyosin sequences are grouped by organism. For those sequences that served as the reference sequences in this study, the following additional information is provided: 1) protein length (or noted as a partial sequence), and 2) the boundaries of the coiled-coil segments as determined by COILS. All remaining sequences are listed by their GenBank accession number only, with translated ORFs and partial length sequences reported as such. In some instances, the protein sequence was reconstructed from multiple genomic segments, and that is noted.
As all sequences (except for those from Ciona intestinalis) were obtained from GenBank, only their accession numbers are provided.

Amino Acid Sequence Divergence
After identification of the coiled-coil domains for each protein, all non-coil segments were removed from the sequence. The N- and C-terminal coiled coils were then combined and analyzed as one continuous coil. Orthologs were always compared to the reference sequence for each organism. A preliminary alignment of the proteins was used to identify sequence gaps that were removed prior to the determination of sequence divergence. Each pair of sequences was aligned using the "BLAST 2 sequences" tool accessed through the Biology Workbench of the San Diego Supercomputer Center (http://seqtool.sdsc.edu/CGI/BW.cgi). The percent amino acid sequence divergence was calculated from the output of the paired alignment. As with previous work [21], we used the simplest measure of sequence divergence, i.e., the percent of amino acid changes between the two sequences. Conserved amino acid substitutions were not considered.
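The divergence measure can be illustrated with a minimal Python sketch. It assumes the two coil sequences are supplied as equal-length aligned strings with `-` marking gaps, which is an assumption about the input format rather than the actual BLAST 2 sequences output, and the function name `percent_divergence` is hypothetical. Columns containing a gap are skipped, and every remaining mismatch counts as one change; reading "not considered" to mean that conservative substitutions receive no special treatment is also an assumption of this sketch.

```python
def percent_divergence(aligned_ref: str, aligned_query: str, gap: str = "-") -> float:
    """Percent of aligned, ungapped positions at which the two residues differ."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    differences = 0
    for a, b in zip(aligned_ref, aligned_query):
        if a == gap or b == gap:
            continue  # gap columns are excluded from the comparison
        compared += 1
        if a != b:
            differences += 1  # any substitution counts, conservative or not
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * differences / compared
```

For example, two changes over six ungapped positions give a divergence of 100 × 2/6 ≈ 33%.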

Results
We analyzed the sequence divergence of the coiled-coil domains of Tm, Ndc80/Nuf2p, SMC2/4, and SMC1/3 in multiple species of yeast, nematodes, insects, and other animals. Overall, the most important comparison is the sequence divergence of the coiled coils of SMC1/3 compared to SMC2/4, which we believe function primarily as spacer rods. Table 1 shows the sequence divergence across 10 species of yeast, each compared to S. cerevisiae. The analysis is most informative for the first four, which are the closest to S. cerevisiae (members of Saccharomyces sensu stricto [40], which diverged approximately 10-20 MYA (5.26K generations/year) [52,53]), but the conclusions are similar for the more divergent species. The coils of SMC1/3 are no more conserved than the coils of SMC2/4 or Ndc80/Nuf2p. Our reference coiled-coil protein tropomyosin, however, is highly constrained across these same species, providing a benchmark coiled-coil sequence within Saccharomycotina. The fact that tropomyosin's coils are highly constrained seems consistent with its recently demonstrated essential role in cytokinesis [47], and the involvement of much of its surface in binding to actin. Table 2 shows the same comparisons for insects and nematodes. Based upon the Drosophila phylogeny recently generated from the analysis of whole genome shotgun sequences, three species of Drosophila were selected for comparison to D. melanogaster: D. sechellia (time to last common ancestor (TLCA) ≈1.2 MYA (10 generations/year)), D. pseudoobscura (TLCA ≈24 MYA), and D. grimshawi (TLCA ≈40 MYA) [43,54,55]. In several insect species, the Ndc80/Nuf2p sequences returned from the tBLASTn search of the whole genome shotgun sequences showed either exceptionally high or low sequence divergence compared to D. melanogaster. We included in the tables all sequences that seemed to have the correct domain structure, in particular coiled coils of approximately the right length and position.
We note that Ndc80 shows a very high divergence and Nuf2p a very low divergence across Drosophila species, but have not explored this further. Curiously, when other animal Ndc80/Nuf2p sequences are compared to Ndc80/Nuf2p from humans, it is the Ndc80 coil that is constrained relative to the Nuf2p [21]. The functional significance of these results is not clear at this time. Finally, tropomyosin coils are constrained in these insects but at different levels in Drosophila and mosquitoes.
In contrast to yeast, insect species including Drosophila and mosquitoes (Drosophila/Anopheles TLCA ≈250 MYA [56]) show a strong conservation of SMC1/3 coils relative to SMC2/4. This suggests that there is a mechanism constraining sequence divergence of cohesin's coiled coils, and this mechanism is found across all insect species. The nematode comparisons are less definitive but provide similar results. Their SMC1/3 coils are more conserved than are those of SMC2/4 and Ndc80/Nuf2p (C. elegans/C. briggsae TLCA ≈24 MYA (6 generations/year in soil) [55]), but the difference is not as striking as in Drosophila and mammals, perhaps reflecting their lack of CTCF [35]. Table 3 extends our previous analysis of mammals and vertebrates to include more distant members of the animal kingdom. The conservation of SMC1/3 coils is especially striking for mammals (0-0.6% divergence), but the SMC1/3 coils are also much more highly conserved than the SMC2/4 coils when comparing H. sapiens to other vertebrates. It is only when H. sapiens SMC1/3 are compared to invertebrates such as sea urchin, sea squirt, sea anemone, Drosophila, and C. elegans that high sequence divergences are observed. Nevertheless, these invertebrate SMC1/3 coils are still more conserved than SMC2/4 or Ndc80/Nuf2p coils. This constraint upon SMC1 coils in animals does not include SMC1b, whose coils show divergences consistent with a spacer rod function. This suggests that the special mechanism involving the coiled coils of SMC1/3 cohesin is found across the entire animal kingdom but is not associated with meiotic cohesin function.

Analysis of SMC1/SMC3 Coiled Coil Mutations in Cornelia de Lange Syndrome
Most CdLS mutations are not in SMC1/3, but in the accessory protein NIPBL (Nipped-B-Like) [57]. This protein is not a part of the cohesin complex but is involved in loading the cohesin complex onto chromosomes. As noted in the Introduction, however, recent studies have demonstrated that mutations in the coiled-coil domains of both SMC1 and SMC3 can cause CdLS as well [24,26]. Deardorff et al. [24] reported that the COILS program predicts that the mutations found in the coiled coil of SMC1 change the probability of the formation of the coiled-coil domains over localized segments of the protein. We repeated this analysis in detail, and in Table 4 summarize the COILS prediction regarding the potential impact of each of these SMC1/3 mutations. The majority (6/7) reduce the probability of coiled-coil formation over relatively small segments. Two mutations, R711W and D831E-Q832Del, are predicted to have the greatest impact. The R711W mutation reduces the probability of coiled-coil formation earlier in this segment, and COILS also predicts a shifting of the end of this segment of the coil by 15 amino acids towards the N-terminus of the protein. The D831E-Q832Del mutation would shift the helical groove over the remaining 104 amino acids of this segment of the coil, but is predicted to produce only a small, localized weakening of the coils. One mutation, R790Q, is predicted to increase the probability of coiled-coil formation of this segment by reducing the size of a small upstream interruption. In contrast to the SMC1 mutations, for the only SMC3 coil mutation reported for CdLS patients (E488Del), COILS predicts it will not impact the coil negatively. Overall, it seems that the deleterious effects of the mutations may not involve significant alteration of the coiled-coil structure.
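Preparing a mutant sequence for submission to a coiled-coil predictor can be sketched as follows. This is a hypothetical helper, not the procedure actually used: the function name `apply_mutation` is illustrative, and it assumes that a label such as R711W denotes a substitution at a 1-based position, that a label such as Q832Del denotes deletion of that residue, and that a compound label such as D831E-Q832Del applies both changes to the same protein.

```python
import re

_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")  # e.g. R711W
_DELETION = re.compile(r"^([A-Z])(\d+)Del$")          # e.g. Q832Del


def apply_mutation(sequence: str, mutation: str) -> str:
    """Apply a substitution and/or deletion label to a protein sequence."""
    edits = []
    for part in mutation.split("-"):
        match = _SUBSTITUTION.match(part)
        if match:
            ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
        else:
            match = _DELETION.match(part)
            if not match:
                raise ValueError(f"unrecognized mutation label: {part}")
            ref, pos, alt = match.group(1), int(match.group(2)), ""
        # Check the label against the wild-type residue before editing.
        if pos < 1 or pos > len(sequence) or sequence[pos - 1] != ref:
            raise ValueError(f"{part} does not match the sequence at position {pos}")
        edits.append((pos, alt))
    residues = list(sequence)
    for pos, alt in edits:
        residues[pos - 1] = alt  # an empty string removes the residue on join
    return "".join(residues)
```

Because each edit replaces a single list element, all positions refer to the wild-type numbering even when a deletion is combined with a substitution. The wild-type and mutant sequences can then be scored separately and the per-residue probabilities compared.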

Discussion
As noted in the Introduction, cohesin in yeast has been implicated in transcriptional silencing and in defining the boundaries of the silenced HMR locus [22,23], but no role for SMC1b/3 in regulating gene expression has been discovered yet. In animals, however, the CdLS mutations and Drosophila data are compelling evidence for a post-mitotic function for SMC1/3 cohesin in the nervous system. In Drosophila, SMC1 and Nipped-B are co-localized at multiple locations on the chromosomes, primarily at the promoter regions of active genes [58]. Curiously, reduced levels of Nipped-B reduced expression of the cut gene (a homeobox protein important in morphogenesis), while reduced SMC1 increased its expression [25,59,60]. Furthermore, the role of cohesin in gene regulation may not be limited to neurons and may not always require an intact cohesin complex. Ghiselli and Iozzo found that overexpressing SMC3 alone approximately 3-fold in NIH and Balb/c 3T3 cells caused them to adopt a transformed phenotype. They also found that SMC3 (SMC1 was not examined) was elevated in 70% of human colon carcinoma samples [61]. Human 293 cells stably transformed by SMC3 overexpression upregulated the expression of at least 65 genes [62]. Though the mechanism by which SMC3 overexpression causes these changes remains unknown, the data suggest further that SMC3 can modulate gene expression without being part of the cohesin complex. It would be interesting to test this for SMC1, and for the isolated coiled-coil segments of the two subunits.
Cohesin's role in regulating gene expression in post-mitotic cells has only recently been added to its list of biological functions. In neurons, the mechanism apparently involves the whole cohesin complex: cleavage of the Rad21 subunit in post-mitotic neurons causes severe defects in axon pruning and larval locomotion [29], and most CdLS defects are due to mutations in the loading factor Nipped-B [57]. In addition, SMC3 appears to have a gene regulatory function on its own, and our results in Table 3 show that SMC3's coiled coils are slightly more constrained than SMC1's from humans to zebrafish. Our work now suggests that the high conservation of the coiled coils of cohesin across the animal kingdom may be an important part of these mechanisms. It is interesting that five out of seven CdLS mutations in SMC1 change positively charged arginines to neutral residues. These arginines may be involved in binding to the negatively charged phosphates of DNA.

Conclusions
Our previous analysis showed that the coiled coils of cohesin are very highly conserved across vertebrates, which implied a function involving the entire length and circumference of the coiled coil. At that time we suggested that the surfaces of the coils would be involved in sister chromatid cohesion, which was the only known function for cohesin. A recent study has provided evidence for lateral interactions of yeast cohesins during that process, presumably along the length of their coiled coils [63]. This mechanism is apparently not one that requires extreme conservation of the coiled coil sequence, since we found that the coils are not highly conserved in Saccharomycotina. However, several recent studies have demonstrated that cohesin also has a separate function involving regulation of gene expression in post-mitotic neurons and other cells. We now suggest that the coiled coils of cohesin may play a key role in this second mechanism. The high conservation of the coiled coils of cohesin in metazoans supports the hypothesis that the entire surface of the coils may be involved in binding interactions, perhaps to the DNA of the various genes they regulate.