Structure and function of archaeal histones

The genomes of all organisms throughout the tree of life are compacted and organized in chromatin by association of chromatin proteins. Eukaryotic genomes encode histones, which are assembled on the genome into octamers, yielding nucleosomes. Post-translational modifications of the histones, which occur mostly on their N-terminal tails, define the functional state of chromatin. Like eukaryotes, most archaeal genomes encode histones, which are believed to be involved in the compaction and organization of their genomes. Instead of discrete multimers, in vivo data suggest assembly of “nucleosomes” of variable size, consisting of multiples of dimers, which are able to induce repression of transcription. Based on these data and a model derived from X-ray crystallography, it was recently proposed that archaeal histones assemble on DNA into “endless” hypernucleosomes. In this review, we discuss the amino acid determinants of hypernucleosome formation and highlight differences with the canonical eukaryotic octamer. We identify archaeal histones differing from the consensus, which are expected to be unable to assemble into hypernucleosomes. Finally, we identify atypical archaeal histones with short N- or C-terminal extensions and C-terminal tails similar to the tails of eukaryotic histones, which are subject to post-translational modification. Based on the expected characteristics of these archaeal histones, we discuss possibilities of involvement of histones in archaeal transcription regulation.


Introduction
Architectural chromatin proteins are found in every domain of life. Bacteria express DNA-bending and DNA-bridging proteins, such as histone-like protein from Escherichia coli strain U93 (HU) and histone-like nucleoid-structuring protein (H-NS), to structure and functionally organize the genome and to regulate genome activity [1,2]. In eukaryotes and most archaeal lineages, histones are responsible for packaging and compaction of the DNA (Table 1). Genomic comparisons demonstrate that the Bacteria and Archaea share a common ancestor; eukaryotes are to date classified as being part of the archaeal branch [3][4][5]. The archaeal domain comprises single-cellular organisms found in diverse habitats. Although Archaea and Bacteria have common features, such as a circular genome and the absence of a nucleus, at the genetic level, Archaea seem to be more related to eukaryotes. Amongst others, archaeal RNA polymerase, a key component of cellular life in all domains, is more similar to RNA polymerase from eukaryotes than bacterial RNA polymerase [6,7]. Archaeal ribosomes share their size and structural core with bacterial ribosomes but are more similar to eukaryotic ribosomes when it comes to protein and rRNA sequence and some specific domains [8][9][10]. Also, some cellular processes thought to be unique to eukaryotes, such as endosomal sorting and the ubiquitin system, have been identified in some archaea [11]. These observations raise the intriguing possibility that chromatin organization as we have come to understand in eukaryotes has evolved from that of the archaeal lineage. Before we describe our analysis, we briefly review current knowledge on chromatin organization in eukaryotes and Archaea and the current paradigms in the evolution of histones, the main chromatin organizing proteins.

Table 1. Phylogenetic subdivision of the archaeal domain. Columns: Superphylum, Phylum, Class, Histones.

The eukaryotic histone
In eukaryotes, octameric histone cores compact DNA by wrapping an approximately 150-bp unit twice around its surface, forming a nucleosome [12,13]. Nucleosomes interact with each other, yielding an additional level of DNA organization in the form of a fibre. Besides a role in compaction, histones also play roles in genome organization, replication, repair, and expression, which highlights the nucleosome as a very important complex affecting a vast array of cellular processes. Characteristic of core histone proteins of all different origins is a common "histone fold": two short and one long α-helix, separated by loops [14][15][16][17][18]. In eukaryotes, the histone core consists of two H2A-H2B dimers and a H3-H4 tetramer, around which approximately 146 bp of DNA is wrapped twice (Fig 1A). It has been suggested that smaller histone assemblies, such as tetrasomes (H3-H4 tetramers), hexasomes (H3-H4 tetramers plus one H2A/H2B dimer), and hemisomes (a H3-H4 dimer plus one H2A/H2B dimer), have functional roles as intermediate structures during, for example, transcription elongation [19][20][21][22]. The linker histone H1 (which lacks the characteristic histone fold) binds at the entry and exit points of the DNA wrapped around the octameric histone core [23,24]. The association of histone H1 constrains an additional 20 bp of DNA and allows for the formation of the 30-nm fibre, which results in tighter compaction [25,26]. Also, flexible N-terminal tails that protrude from eukaryotic histones contribute to tighter DNA packaging. These tails may interact with either the DNA or the histone surface on another nucleosome, which stabilizes the close association of nucleosomes [27][28][29]. 
Furthermore, post-translational modifications of amino acid residues in the N-terminal tails, such as acetylation, methylation, phosphorylation, ubiquitination, and biotinylation, are a key instrument for the cell to regulate gene expression, the DNA damage response, and many other processes [30][31][32]. For instance, while heterochromatin (tightly packed DNA) is typically devoid of acetylated lysines, euchromatic (lightly packed) regions typically contain histones with acetylated lysines. In general, euchromatin contains actively transcribed genes. Histone acetylation is believed to cause a locally less condensed chromatin structure in vivo, which is permissive to transcription. In particular the lysine-rich histone H4 tail seems to be crucial in the modulation of chromatin structure

Architectural DNA-binding proteins in Archaea
Archaeal genomes also encode proteins that are involved in shaping DNA architecture. Genes coding for histones are found in many species throughout the domain ( The histones found in Archaea are widespread throughout the domain but are absent in most Crenarchaeota. They have the same histone fold as eukaryotic histones, but N-terminal histone tails have not been identified (Fig 1B). Linker histones, homologous to eukaryotic H1, have not been found. Archaeal histones exist as dimers in solution, which have been shown to bend DNA [56,57]. These histone dimers can be homodimeric or heterodimeric [58], as many archaeal species express, or at least encode, more than one histone variant. In Methanothermus fervidus (class Methanobacteria), the two histone variants are expressed at different levels and ratios at different growth phases, suggesting a distinct function for both proteins [59]. In addition to binding as dimers, archaeal histones have been reported in vivo and in vitro to bind DNA as tetramers [60][61][62], wrapping the DNA once. However, micrococcal nuclease (MNase) digestion patterns of Thermococcus kodakarensis (class Thermococci) chromatin suggest that histone-DNA complexes consist of discrete multiples of a dimeric histone subunit (i.e., not limited to dimers and tetramers) in vivo without obvious dependence on the DNA sequence [63]. Based on the latter observations, it was proposed that histone dimers multimerize and wrap DNA into a filament of variable length [17,63]. The crystallography study of Luger and coworkers on histone HMfB from M. fervidus indicates that these histones assemble into an endless left-handed rod in vitro, which we propose to call a "hypernucleosome" (Fig 2). Note that these complexes were assembled on SELEX-optimized DNA previously shown to favor tetrameric nucleosome assembly [64]. 
The number of wraps in the hypernucleosome, where one wrap is a 360° turn of the DNA around the histone multimer, scales linearly with the number of histone subunits, resulting in a tight packaging of DNA. The authors also provide evidence that mutation-directed perturbation of hypernucleosome function in vivo alters the response to nutrient change in T. kodakarensis, suggesting a role in transcription. Both eukaryotes and Archaea encode histone proteins, which seem to be involved in the response to environmental cues through their role in transcription regulation.

Evolution of the histone protein class
It has been suggested that eukaryotic histones evolved from archaeal histones [65]. This hypothesis is supported by the high similarity at the amino acid sequence level and in secondary structure [66,67]. Suggestive of an archaeal origin of eukaryotic histones is also the dimeric nature of archaeal histones; archaeal histone complexes are built from dimers, but members of the archaeal class Halobacteria express a "tandem histone." In these tandem histones, the histone folds are linked end-to-end [68][69][70]. This implies that the histone folds always occupy the same position and role in the naturally linked dimer. This leads to the relaxation of evolutionary constraints in parts of the histone, an example of subfunctionalization [71,72]. According to this hypothesis, the histone folds further evolved in a divergent way, leading to an asymmetric dimer. This may have been an ancestor of H3-H4, which later separated to become two individual proteins and corresponding genes [66]. The eukaryotic H3-H4 tetramer resembles the tetramer found in Archaea, and it has been suggested that H2A and H2B have arisen from H3 and H4 later on in histone evolution [66]. Indeed, H3 and H4 are more similar to archaeal histones than H2A and H2B, supporting this hypothesis. From this point, eukaryotic histones have further evolved into histone variants, highly homologous substitutes of canonical eukaryotic histones, which often play a specialist role in a wide variety of cellular processes [73]. Unlike canonical histones, which are mainly expressed during DNA replication, histone variants are expressed in a replication-independent manner [74,75]. Histone variants of H2A and H3 are widely known and studied, whereas only a few examples have been found of diversified H2B and H4 [76]. The evolutionary pressure for the evolution of dimer-based histones to octameric histones and their subsequent variants was long believed to be DNA compaction [66]. 
The fact that eukaryotic cells undergo mitosis, in which chromosomes are highly compacted, together with the abundance of gene-poor regions may have favored a histone conformation that wraps DNA twice (eukaryotic octamer) instead of once (archaeal tetramer) and that via its N-terminal tails has the ability to compact DNA at a higher order. Open questions that remain are how histone evolution was driven and what the roles of archaeal histones and their variants are in genome packaging and regulation.
Here, we discuss the amino acid residues that are responsible for the formation of the hypernucleosome based on a sequence analysis of a subset of archaeal histones that includes histones from all phyla that contain genes coding for histones (Fig 3). Also, we analyze the ability of histones to form a hypernucleosome and the effects of N- or C-termini longer or shorter than the consensus on histone multimerization and transcription regulation. We emphasize the histones in species from recently discovered phyla, which are believed to be an evolutionary link to eukaryotes [11,77]. Based on elements that archaeal histones have in common and elements that differ from that consensus, we discuss some of the open questions regarding gene regulation by archaeal histones.

Histones are found in some newly discovered Archaea
With the widespread use of metagenomic sequencing, entire new branches within the archaeal domain have been discovered. Next to Euryarchaeota, the phylum that has been known since the establishment of Archaea as one of the domains of life [78], the superphyla Thaumarchaeota, Aigarchaeota, Crenarchaeota, Korarchaeota (TACK), Diapherotrites, Pacearchaeota, Aenigmarchaeota, Nanoarchaeota, Nanohaloarchaeota (DPANN), and Asgard Archaea are part of the most recent representation of the tree of life [79]. Genomes of the recently discovered archaeal superphylum Asgard Archaea and candidate phyla Bathyarchaeota, Woesearchaeota, Pacearchaeota, Aenigmarchaeota, Diapherotrites, Huberarchaea, and Micrarchaeota encode histones [11,77,[80][81][82] (Table 1). With the publication of the genome sequences of these organisms, we were able to scrutinize the sequence divergence of histones by comparing sequences of histones from Archaea throughout the domain (Fig 3). The selection of histones shown here is based on the presence of histone-coding genes in different phyla. Since many of those phyla were only discovered in the last three years, our selection includes a relatively large number of histones that have not yet been studied in vivo or in vitro.  We found that in genome LC_3 of the candidate phylum Heimdallarchaeota, 10 different histones are encoded, which is the highest number of histones found in one archaeal genome [83]. We have not found any histones in the genomes of candidate phyla Parvarchaeota, Geothermarchaeota, and Verstraetearchaeota, although it should be noted that abundance of available genomes and completeness of the genomes differs. The majority of available genomes from the phyla that do not seem to encode histones have an estimated completeness of between 70% and 99% [84][85][86][87]. This means that we cannot rule out the possibility that any of those genomes does contain one or more genes coding for histones. 
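The caveat about genome completeness can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only. It assumes that the estimated completeness of an assembly equals the probability that any given gene was recovered, and that assemblies are independent of each other. Neither assumption comes from the cited completeness estimates, and the function name is hypothetical.

```python
from math import prod


def prob_gene_missed_in_all(completeness_values):
    """Probability that a gene present in every organism is absent from all assemblies.

    Simplifying assumption: each assembly recovers a given gene with probability
    equal to its estimated completeness, independently of the other assemblies.
    """
    for c in completeness_values:
        if not 0.0 <= c <= 1.0:
            raise ValueError("completeness must be a fraction between 0 and 1")
    return prod(1.0 - c for c in completeness_values)


# One genome at the lower completeness bound quoted in the text (70%).
single = prob_gene_missed_in_all([0.70])

# Three hypothetical genomes from one phylum, each 70% complete.
triple = prob_gene_missed_in_all([0.70, 0.70, 0.70])
```

Under these assumptions, a single 70%-complete genome would miss a given histone gene about 30% of the time, whereas three such genomes would all miss it only about 2.7% of the time. Absence across several genomes of a phylum is therefore more informative than absence from a single assembly, but it still does not exclude that histone genes exist.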
The absence of histones suggests that other nucleoid-associated proteins (NAPs) may be involved in genome compaction. In that light, it is notable that in genomes from Candidatus Parvarchaeota, as well as in genomes from the candidate phyla from Asgard Archaea, Woesearchaeota, Bathyarchaeota, Pacearchaeota, Aenigmarchaeota, and Micrarchaeota, genes coding for the DNA-bridging protein Alba1 (and in some cases, Alba2) are present. Like histones, Alba (or Sso10b) proteins are likely involved in transcription repression. They are highly abundant in the nonhistone-coding Crenarchaeota [88], possibly taking the functional role of histones as found in other Archaea. Some Parvarchaeota genomes encode Alba but not histones, and their genomes may therefore be shaped or regulated in a similar way as in Crenarchaeota. For Geothermarchaeota and Verstraetearchaeota, we were not able to identify any protein that clearly resembles known chromatin proteins. Furthermore, we found that only Candidatus Thorarchaeota contains a gene for HU, a DNA-bending protein generally found only in Bacteria and in Archaea without histones [67]. The genome of Candidatus Huberarchaea encodes an MC1 homologue, which is a monomeric DNA-bending protein often found in organisms from the euryarchaeal class Halobacteria. Genes coding for other known archaeal NAPs [50,89,90] were not found.

Some archaeal histones have eukaryote-like N-terminal tails
Strikingly, the amino acid sequence comparison reveals that two histones from Candidatus Heimdallarchaeota archaeon LC_3 (only Histone A [HA] shown in Fig 3), one from Candidatus Huberarchaea archaeon CG_4_9_14_3_um_filter_31_125, and one from Candidatus Bathyarchaeota archaeon B23 contain an N-terminal tail, a feature that was previously thought to exist only in eukaryotic histones and was only recently reported for Heimdallarchaeota [64]. In eukaryotes, these tails stabilize a higher order of compaction by interacting with either the DNA or another nucleosome. The tails of the two histones from Heimdallarchaeota and Huberarchaea are of roughly the same length and sequence composition as eukaryotic H4 tails (see Fig 3). Prompted by the importance of the eukaryotic histone tails in modulating chromatin structure and function [27,32], we constructed a molecular model of a hypernucleosome formed by Histone A (HA) from Heimdallarchaeota LC_3 to investigate its potential function (see Methods section).
The model illustrates how three subsequent arginines (R17-R19) could facilitate passing of the tails through the DNA gyres (Fig 4). The tails exit the hypernucleosome through DNA minor grooves, similar to eukaryotic histone tails, and might position their lysine side chains to bind to the hypernucleosomal DNA or to other DNA close by, facilitating (long-range) genomic interactions in trans. Like the H4 tail that is subject to acetylation of lysines K5, K8, K12, and K16 [91], lysines in the Heimdallarchaeal histone tail may well be subject to acetylation. Archaeal genomes are known to have several candidate lysine acetyltransferase and deacetylase enzymes, including proteins belonging to the ELP3 superfamily, to which transcription elongation factor and histone acetyltransferase ELP3 belongs [92][93][94]. Searches using the ProSite database (http://prosite.expasy.org, [95]) and Protein Information Resource (http://pir.georgetown.edu, [96]) further reveal that the Heimdallarchaeota LC_3 genome contains multiple gene products containing the Gcn5-related N-acetyltransferase domain, which is present in many histone acetyltransferases [97]. Interestingly, a potential "reader" protein that binds modified lysines can also be identified. This protein, HeimC3_47440, contains a YEATS-domain, which has recently been shown to bind histone tails that carry acetylated or crotonylated lysines [98][99][100][101]. Comparison with the closest homolog of known 3D structure, YEATS2 (35% identity, PDB-id 5IQL, [102]), shows that the binding site for the modified lysine side chain is strictly conserved in the archaeal protein. Notably, only Candidatus Bathyarchaeota, which also features tailed histones, contains a detectable homolog of HeimC3_47440. 
The presence of lysine-containing N-terminal tails in combination with histone modification writers and readers suggests that Archaea use post-translational modifications in a similar way to Eukaryotes as modulators of genome compaction and gene activity. The tail of the Huberarchaea histone also contains lysine residues that are found at the same position as some of the lysines of the H4 tail. However, no proteins involved in post-translational modification of histone tails have been identified in this phylum.
Other histones, for example from Candidatus Lokiarchaeota CR_4, Candidatus Odinarchaeota LCB_4, Nanoarchaeum equitans, and Thermofilum pendens, contain a short N-terminal tail of 5-10 residues. Also, histones with a C-terminal tail have been found. The histone from the euryarchaeal species Methanocaldococcus jannaschii (class Methanococci) has a 28-residue tail, which seems to be unique among archaeal histones. Other C-terminal tails are up to 11 residues long (as compared to Methanothermus fervidus HMfB) and appear in Caldiarchaeum subterraneum, Candidatus Bathyarchaeota SMTZ-80, Candidatus Heimdallarchaeota LC_3, Candidatus Lokiarchaeota CR_4, and all histones found in Crenarchaeota. These short C-terminal tails are similar in length to the H4 C-terminal tail, which is reported to play a role in the promotion of histone octamer formation in eukaryotes [103]. The genomes of some archaeal species contain genes for histone truncates. The histone from Haloredivivus sp. G17, member of the candidate phylum Nanohaloarchaeota, and the histone from Candidatus Bathyarchaeota archaeon B24 both lack part of the N-terminal α-helix (α1), and one histone from Candidatus Lokiarchaeota GC14_75 is reduced in length at the C-terminus. The remainder of the C-terminal amino acids likely does not form a C-terminal helix (α3) in this histone from Candidatus Lokiarchaeota. Although histones of reduced length or containing tails lack part of the histone fold, they likely still possess DNA-binding properties. Therefore, they possibly have functional roles in the regulation of genes.

Multimerization of histones
Both eukaryotic histones and HMfB form dimers, a process that is driven by a hydrophobic core (involving residues A24, L28, L32, I39, and A43 in HMfB) as well as a crucial salt bridge for a stable histone fold (R52-D59 in HMfB) [14]. These hydrophobic residues and the salt bridge are conserved among Archaea. This indicates that archaeal histones have very similar tertiary structures [14,104]. Also, residues that play an important role in DNA binding are present in all examined histones, including the arginines that anchor archaeal histone dimers to the DNA minor grooves (R10 and R19 in HMfB) [14]. Both eukaryotic H3-H4-dimers and HMfB dimers can form tetramers by hydrogen bonding of H49 and D59 (HMfB) and additional hydrophobic interactions in the interface (L46 and L62 in HMfB) [105], pairs of residues that, too, are generally conserved among archaeal histones (Fig 3).
The HMfB-DNA cocrystal structure reveals left-handed wrapping of DNA around a histone-multimer core [64] (Fig 2). This structure supports the model in which HMfB dimers multimerize along DNA into an "infinite" hypernucleosome, thereby linearly compacting the DNA approximately ten-fold. It is likely that hypernucleosomes grow or shrink by association or dissociation of dimers at both ends. The resolution of the crystal structure allowed us to identify several interacting residues between layers of dimers that may be important for stabilizing the complex (Fig 5). Based on this structural information, the propensity of different archaeal histones to multimerize can be predicted.
In Table 2, we set out three criteria for hypernucleosome formation by archaeal histones. Firstly, conservation of residues in the dimer-dimer interface (L46, H49, D59, and L62 in HMfB) is required, as forming a tetramer is the first step in multimerization. Secondly, residue G16, which is positioned at the stacking interface of the hypernucleosome (Fig 5), is crucial in permitting formation of the hypernucleosome [64]. Bulkier residues at this position interfere with multimerization [64]. Lastly, favorable interactions between histone dimers i and i+2 and i+3, here termed stacking interactions, will contribute to stability of the compacted hypernucleosome. The HMfB hypernucleosome crystal structure shows three stacking interactions, hydrogen bonds from K30 to E61, E34 to R65, and R48 to D14 (Figs 3 and 5).
Scrutiny of histone sequences reveals that most archaeal histones meet these criteria and are thus likely to form hypernucleosomes (Table 2, marked +). We identified two to seven potential stacking interactions for this group of histones, which may affect hypernucleosome stability and compactness. Fewer interactions may allow for more "breathing" of the hypernucleosome structure, yielding hypernucleosomes that are more flexible or "floppy." We predict such structures to be formed also by a number of archaeal histones that do not fully meet our criteria (Table 2, marked ±). For example, Candidatus Heimdallarchaeota LC_3 HA and Candidatus Lokiarchaeota GC14_75 HLkE have H49N and D59S substitutions, respectively, which likely weakens the crucial hydrogen-bonding interaction at the dimer-dimer interface [105]. Similarly, substitution of the hydrophobic residues 46 and 62 for more hydrophilic or bulkier ones would lead to a less stable dimer-dimer interface, as for Candidatus Heimdallarchaeota LC_3 HC and Candidatus Bathyarchaeota B23. In the presence of the canonical dimer-dimer interface, bulky substitutions at position 16 likely also result in a more open hypernucleosome structure, as for Candidatus Odinarchaeota LCB_4.
Three archaeal histones fail multiple criteria in our analysis, indicating that they cannot form hypernucleosomes. These are the histones from Haloredivivus sp. G17, Nanosalina J07AB43 (HB), and the euryarchaeal species Methanococcoides methylutens (class Methanomicrobia), which all combine defects in the dimer-dimer interface with a bulky substitution at position 16 and few potential stacking interactions (Table 2, marked −). In particular, Nanosalina J07AB43 Histone B (HB) shows an H49D substitution and a glutamic acid at position 62, making the dimer surface highly negatively charged and thus very unlikely to interact with another dimer.
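The three criteria can be read as a simple decision procedure. The sketch below is a deliberately crude illustration, not the procedure used to build Table 2. It assumes that interface residues must be identical to those of HMfB (conservative substitutions are not scored), that the number of potential stacking interactions has already been determined separately, and that failing no, one, or several criteria maps to +, ± and −. The function name, the `min_stacking` threshold, and the two variant examples are hypothetical.

```python
# Residues in HMfB numbering, as named in the text.
DIMER_DIMER_INTERFACE = {46: "L", 49: "H", 59: "D", 62: "L"}
LOOP_POSITION = 16        # glycine in HMfB; bulkier residues hinder stacking
HMFB_STACKING_PAIRS = 3   # K30-E61, E34-R65, R48-D14


def classify_histone(residues, n_stacking, min_stacking=2):
    """Return '+', '+/-' or '-' for predicted hypernucleosome formation.

    residues: dict mapping HMfB position -> one-letter amino acid code.
    n_stacking: number of potential stacking interactions, determined
        separately (for example by structural modeling).
    min_stacking: hypothetical threshold, not a value given in the text.
    """
    interface_ok = all(
        residues.get(pos) == aa for pos, aa in DIMER_DIMER_INTERFACE.items()
    )
    loop_ok = residues.get(LOOP_POSITION) == "G"
    stacking_ok = n_stacking >= min_stacking
    n_failed = [interface_ok, loop_ok, stacking_ok].count(False)
    if n_failed == 0:
        return "+"
    if n_failed == 1:
        return "+/-"
    return "-"


hmfb = {16: "G", 46: "L", 49: "H", 59: "D", 62: "L"}
# Hypothetical variant carrying only an H49N substitution.
weak_interface = {**hmfb, 49: "N"}
# Hypothetical variant; K16 is a made-up bulky residue for illustration.
multiple_defects = {**hmfb, 16: "K", 49: "D", 62: "E"}

print(classify_histone(hmfb, HMFB_STACKING_PAIRS))  # +
print(classify_histone(weak_interface, 3))          # +/-
print(classify_histone(multiple_defects, 1))        # -
```

A real assessment would also weigh how conservative each substitution is and inspect the modeled structure, which this sketch does not attempt.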
It is remarkable that most of the histones having N- or C-terminal tails or N- or C-terminal truncations additionally have substitutions in the dimer-dimer and/or stacking interface that will affect hypernucleosome formation. Histones with reduced ability to form compact hypernucleosomes are expected to exhibit different roles in shaping the genome, like simple DNA bending or site-specific interference with histone multimerization. Interestingly, the genomes of several organisms encode histones that we predict are able to multimerize as well as histones that probably do not multimerize. This suggests that they may, in addition to directly binding to promoters, also be able to affect gene regulation by multimerization.

Histones in genome regulation
MNase-seq experiments have shown that histones position upstream and downstream of a promoter region [106]. This, in combination with knock-out studies showing both up- and down-regulation of transcription levels, leads to the hypothesis that histones are important for    and may play a similar role in other histone-coding phyla. The exact mechanisms by which histones act in regulation are at this moment largely unknown. What is the mechanistic role of histones in the regulation of gene expression? Is the hypernucleosome, with a mechanism analogous to that in bacterial gene repression, able to block promoter regions and other regulatory elements, thereby making them inaccessible to the transcription machinery [109][110][111][112]? In Bacteria, such a mechanism exists for H-NS and partition protein B (ParB) proteins, in which filaments laterally spread from a nucleation site, often a high-affinity DNA sequence [113][114][115][116]. Specific high-affinity sites have been identified both in vivo and in vitro in Archaea [61,106,117,118]. The role of such high-affinity sites may be to position the hypernucleosome on the genome and could be a key feature in archaeal genome regulation. In Archaea, cooperative lateral spreading of filaments has been reported for Alba proteins [40,42,119,120]. Also, promoter occlusion mechanisms and competitive binding of archaeal NAPs and transcription factors have been reported [45,121,122]. In addition, how dynamic are hypernucleosomes, and how does the cell control the size of the hypernucleosome in order for it to be functional? Is up- and down-regulation of histone expression important in fine tuning this process? Another option for control of hypernucleosome size is heteromerization of histone variants with different stacking propensity. Heteromerization of such histone variants, for instance HA and HB from Nanosalina J07AB43 (Table 2), could restrict hypernucleosome size to fewer subunits. 
Distinct expression patterns of histone variants at different growth phases or as a result of environmental cues such as osmolarity [59,107] may alter the composition and size of the hypernucleosome. However, so far, histone variants have been poorly studied in Archaea. The results of our predictions on hypernucleosome formation clearly point out the need for in vitro and in vivo studies explicitly addressing all of these questions.

Conclusion
Table 2 (legend). Dimer-dimer interactions in the tetrameric interface are expected to be essential for hypernucleosome formation. Absence of bulky residues in the first loop and a high number of potential hydrogen bonds in the stacking interface will enhance the compactness and stability of the hypernucleosome. Likely, uncertain and unlikely stacking ability is indicated with +, ±, and −, respectively. a Dimer-dimer interface includes residues at positions 46, 49, 59, and 62.

Histones from Archaea and eukaryotes are similar in tertiary but not in quaternary structure when bound to DNA. While eukaryotic histones form octamers on the DNA, archaeal histones form filaments of variable size: hypernucleosomes. Important residues responsible for DNA binding, dimer-dimer interactions, and stacking interactions are mostly conserved among Archaea, including Asgard Archaea, Bathyarchaeota, and other newly discovered Archaea. In these recently discovered Archaeal phyla, histone tails and truncated histone variants were also found. In terms of evolution, it appears that, based on fragmentary data derived from extant lineages, the hypernucleosome has progressively become more flexible as histones with N-terminal and C-terminal tails and additional terminal helices (like in H2A and H2B in the nucleosome) developed. Furthermore, the appearance of additional DNA-binding residues and positively charged N-terminal tails may have increased the affinity of histones for DNA [123]. These changes in dimer structure and DNA affinity may have stabilized octameric nucleosomes and disfavored multimerization. Specifically, the emergence of the eukaryotic H2A-H2B heterodimer blocked hypernucleosome formation since H2A lacks the dimer-dimer interface, and H2B contains an additional helix at its C-terminus that blocks the stacking interface. The histone tails from Candidatus Heimdallarchaeota are likely to function in similar ways as those of eukaryotic histones. 
They are lysine rich and potentially subject to post-translational modification, thereby possibly affecting the histone's interactions with other actors. Alternatively, they may provide stabilization of the hypernucleosome via interactions with DNA in cis or in trans. Since it is believed that eukaryotes share their latest common ancestor with Candidatus Heimdallarchaeota, eukaryotic histones may have evolved from the predecessors of the tail-containing Heimdallarchaeal histones. As some histone proteins that have an N-terminal tail (Candidatus Heimdallarchaeota LC_3 HA and Bathyarchaeota archaeon B23) seem to form less stable hypernucleosomes, these histones may represent an evolutionary transition towards a different mechanism of gene regulation, switching from regulation by multimerization and compaction toward regulation by histone tail modifications.
Although the hypernucleosome structure is suggestive of stacking interactions between dimers in adjacent turns, experimental evidence for such interactions is lacking. Also, the functional role of tails, as well as truncates, has yet to be proven experimentally. In vitro hypernucleosome reconstitution experiments and in vivo foot-printing assays of species expressing nonstandard histones combined with mutation of the residues proposed to be involved in stacking interactions could answer these questions. Lastly, the existence of post-translational modifications of residues in archaeal histone tails, as well as their effect on transcription regulation, remains to be discovered and would give an important insight into the evolution of transcription regulation and genome folding from Archaea to eukaryotes.

Selection and alignment of archaeal histone sequences
We have included histones from every histone-encoding (candidate) phylum within the archaeal domain in our analysis. We show different histones from the same organism if the predicted stacking properties are very dissimilar. Sequences were aligned with Clustal Omega [124] using default parameters, removing gaps.

Analysis of potential hypernucleosome formation
Structural analysis of the selected archaeal histones and assessment of potential hypernucleosome formation was done by inspecting the conservation of residues that are important for multimerization in the published HMfB hypernucleosome structure [64]. Comparative multichain modeling was performed in MODELLER [125] using default parameters to construct dimer models of the archaeal histones. These models were superimposed onto HMfB dimers in the hypernucleosome crystal structure to assess whether alternative or additional interactions were possible in the different archaeal histone complexes.
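Inspecting conservation in this way requires reading off, for each aligned histone, the residue that sits at a given HMfB position. A minimal sketch of that bookkeeping is given below. It assumes two gapped sequences taken from the same alignment, "-" as the gap character, and 1-based numbering of the ungapped reference. The function name is hypothetical, and the sequences in the example are toy strings, not real histone sequences.

```python
def residues_at_reference_positions(ref_aligned, query_aligned, positions):
    """Map reference positions (1-based, ungapped) to residues in the query.

    ref_aligned, query_aligned: rows of equal length from one alignment,
        with '-' marking gaps.
    positions: reference positions of interest, e.g. [16, 46, 49, 59, 62].
    Returns a dict {position: query residue}; the value is '-' where the
    query has a gap opposite the reference residue.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have the same length")
    wanted = set(positions)
    found = {}
    ref_pos = 0
    for ref_char, query_char in zip(ref_aligned, query_aligned):
        if ref_char == "-":
            continue  # insertion in the query; no reference position consumed
        ref_pos += 1
        if ref_pos in wanted:
            found[ref_pos] = query_char
    return found


# Toy example: the query has an insertion after position 2 and a gap at position 4.
example = residues_at_reference_positions("AC-DE", "ACGD-", [3, 4])
```

Here `example` is `{3: "D", 4: "-"}`: the insertion in the query does not shift the reference numbering, and the gap opposite position 4 is reported explicitly instead of being skipped.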

Model of Heimdall HA tails in hypernucleosome
The molecular model of the histone HA dimer from the Heimdallarchaeota LC_3 genome was constructed by multitemplate modeling in MODELLER [125] using otherwise default parameters. The HMfB dimer in the hypernucleosome [64] was used as a structural template for the histone fold and eukaryotic histone H3 and H4 as structural templates for the N-terminal tails. An initial model for the Heimdall HA hypernucleosome was obtained by superimposing the HA dimer model onto HMfB in the hypernucleosome crystal structure, with either an H3-like or an H4-like tail conformation. To optimize the path of the tails through the DNA gyres and remove major steric clashes, the HA dimer model and surrounding DNA were excised from the initial model and water refined separately using High-Ambiguity Driven Docking (HADDOCK) [126], imposing ambiguous interaction restraints between HA residues 14-19 and the surrounding 3-bp section of DNA, using otherwise default parameters.