A Unique HMG-Box Domain of Mouse Maelstrom Binds Structured RNA but Not Double Stranded DNA

Piwi-interacting RNAs (piRNAs) are a major and essential class of small RNAs in animal germ cells with a prominent role in transposon control. Efficient piRNA biogenesis and function require a cohort of proteins conserved throughout the animal kingdom. Here we studied Maelstrom (MAEL), which is essential for piRNA biogenesis and germ cell differentiation in flies and mice. MAEL contains a high mobility group (HMG)-box domain and a Maelstrom-specific domain with a presumptive RNase H-fold. We employed a combination of sequence analysis and structural and biochemical approaches to evaluate and compare nucleic acid binding of the mouse MAEL HMG-box to that of canonical HMG-box domain proteins (SRY and HMGB1a). The MAEL HMG-box failed to bind double-stranded (ds)DNA but bound to structured RNA. We also identified important roles of a novel cluster of arginine residues in the MAEL HMG-box in these interactions. Cumulatively, our results suggest that the MAEL HMG-box domain may contribute to MAEL function in selective processing of retrotransposon RNA into piRNAs. In this regard, the cellular role of the MAEL HMG-box domain is reminiscent of that of HMGB1 as a sentinel of immunogenic nucleic acids in the innate immune response.


Introduction
Integrity of the germ cell genome is central to sexual reproduction. Gamete development presents an ideal environment for the selfish propagation of transposable elements (TEs) such as LINE-1 (L1) [1][2][3][4][5]. In mammals, retrotransposon expression peaks during a period of genome-wide reprogramming of the embryonic germline but is subsequently extinguished in a sex-specific manner [6][7][8][9][10][11]. Retrotransposon dysregulation is associated with an accumulation of DNA damage, meiotic abnormalities, chromosome segregation errors and embryo lethality [12][13][14][15][16]. A prominent role in transposon control belongs to the piRNA pathway, which operates through a conserved group of primarily germline-restricted factors, including Piwi proteins, Tudor domain-containing proteins, and a large set of accessory proteins required for various aspects of piRNA biogenesis and function [17][18][19][20][21][22].
for 12-16 hours. The cells were collected the following day by centrifugation, washed once with 1x PBS and resuspended in lysis buffer (1x PBS, 5% glycerol, 1 mM PMSF, 1 mM TCEP, 1 mM MgCl2, protease inhibitors (Pierce)). The cell suspension was supplemented with lysozyme (Sigma) to a final concentration of 1-2 μg/ml and incubated on ice for 30 minutes with occasional mixing. The lysed material was then sonicated (4 repeats of 20 seconds at 50% duty; Misonix 3000) with a one-minute incubation on ice between repeats. The sonicated mixture was spun (4°C, SS-34 rotor, 30 minutes, 18000 rpm) and the supernatant was carefully transferred to a syringe and filtered through Millex HV filters (Millipore) to remove contaminants. The GST-fusion protein was purified by gravity flow on glutathione agarose resin (Sigma) at 4°C unless otherwise noted. The filtered lysate was bound to the glutathione resin and washed with 5 column volumes of low salt buffer (LSB: 1x PBS, 5% glycerol), 5 column volumes of high salt buffer (HSB: 1x PBS, 5% glycerol, 1 M NaCl) and again with 2 column volumes of LSB.
The protein was eluted with 10 mM reduced glutathione in LSB (pH ~8.5). To remove the GST tag, the eluate was supplemented with 1 mM EDTA, 1 mM TCEP and PreScission protease (GE) and incubated at 4°C for 12-16 hours. Phospho-cellulose (PC) columns (2.5 ml) were prepared from dry PC resin (Whatman P-11) following procedures provided by the Lorsch Lab (NIGMS). Briefly, 0.8 g of the resin was stirred into 125 ml of 0.5 N NaOH for 5 minutes. The resin was then washed with water until pH < 11, at which point 125 ml of 0.5 N HCl was added and the solution was stirred for another 5 minutes. The mixture was then washed with water until pH > 4, at which point the resin was poured into disposable columns and equilibrated with the desired buffer until the pH of the flow-through matched that of the input buffer. The PreScission-digested glutathione column eluate was diluted with B0 buffer (B0: 20 mM HEPES pH 7.4, 10% glycerol, 2 mM DTT, 0.1 mM EDTA) to lower the salt concentration below 75 mM. The diluted digest was then loaded onto 2.5 ml PC columns and allowed to bind by gravity flow. The column was washed with 5 column volumes of B100 buffer (B0 + 100 mM NaCl). The protein was eluted with B500 buffer (B0 + 0.5 M NaCl). The fractions with A280 > 0.05 were pooled and their buffer exchanged (PD-10 desalting columns) to storage buffer (SB: 1x PBS, 5% glycerol, 1 mM TCEP). To remove residual PreScission protease and uncleaved protein, the eluate was passed over a 0.5 ml glutathione/0.5 ml Ni-NTA column. The final eluate was concentrated to the desired protein concentration (>1 mg/ml) using Vivaspin 6 centrifugal concentrators (Sartorius) with a 3 kDa MWCO. The concentration was determined from A280 measured on a NanoDrop 2000c in 6 M guanidine-HCl, 20 mM sodium phosphate pH 7.5, using the calculated extinction coefficient and molecular weight (http://www.expasy.org). The final protein was aliquoted, flash frozen and stored at −80°C until use.
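As an illustration of the concentration measurement described above, the Beer-Lambert law (A = ε · c · l) converts an A280 reading into molar and mass concentrations. The following is a minimal sketch only: the function name, the absorbance, the extinction coefficient, the molecular weight and the path length are hypothetical placeholders, not values from this study.

```python
def protein_concentration(a280, ext_coeff_m1_cm1, mol_weight_da, path_cm=1.0):
    """Convert an A280 reading to (molar, mg/ml) using A = e * c * l.

    ext_coeff_m1_cm1: molar extinction coefficient in M^-1 cm^-1
    mol_weight_da:    molecular weight in Da (g/mol)
    path_cm:          optical path length in cm (assumed 1.0 here)
    """
    if ext_coeff_m1_cm1 <= 0 or path_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    molar = a280 / (ext_coeff_m1_cm1 * path_cm)
    mg_per_ml = molar * mol_weight_da  # mol/L * g/mol = g/L = mg/ml
    return molar, mg_per_ml


# Hypothetical example values, chosen only to show the arithmetic.
molar, mg_ml = protein_concentration(
    a280=0.5, ext_coeff_m1_cm1=10000.0, mol_weight_da=11000.0
)
```

With these placeholder inputs the reading corresponds to 50 μM protein, or 0.55 mg/ml.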

Circular dichroism (CD) spectroscopy
CD measurements were collected on an Aviv 420 instrument (Aviv Biomedical). Far-UV spectra were collected in a 0.1 cm cuvette at 25°C. All proteins were 94 residues long and were measured at a concentration of 0.1 mg/ml. The samples were in 1x PBS, 5% glycerol at room temperature. The data were processed in Numbers (iWork, Apple Inc.) as previously described [47].

Simple substrate preparation
The DNA oligonucleotides for each substrate were purchased desalted without purification (Operon) (S2 Table). The RNA oligonucleotides were ordered desalted with HPLC purification (Sigma Proligo) or prepared by in vitro transcription of PCR products carrying a T7 promoter using the HiScribe T7 high yield RNA synthesis kit (NEB) following the manufacturer's protocol (S3, S4, S5 Tables). Briefly, the synthesized RNA was purified with acid phenol:chloroform and precipitated with isopropanol. The precipitate was diluted in 1x CutSmart buffer (NEB), supplemented with RNAseIN inhibitors (Ambion) and de-phosphorylated with alkaline phosphatase (NEB). The precipitation was repeated. The precipitate was then resolved on TBE-urea polyacrylamide gels (Life Technologies) and the RNA was recovered by the crush-and-soak method [48]. Briefly, the band of the appropriate size was excised from the gel, crushed in the presence of PAGE elution buffer (0.3 M NaOAc, 10 mM Tris pH 8.0, 1 mM EDTA pH 8.0) and frozen at −80°C for 30 minutes. The RNA was eluted by shaking the mixture overnight at 37°C and precipitated with isopropanol. The molar concentration was calculated based on the A280 readings in 10 mM Tris pH 8.0. The RNA was stored at −80°C until use.

Complex substrate preparation
The oligonucleotides were designed with sufficient overlap and homology to anneal specifically. To create a four-way junction, 10 μl (100 μM) of each oligonucleotide was mixed in the annealing buffer (1x: 70 mM Tris pH 7.5, 10 mM MgCl2, 100 mM NaCl) to a final volume of 200 μl. The mixture was incubated in a 95°C water bath for 5 minutes and allowed to cool slowly to room temperature. The annealed substrate was precipitated with EtOH (DNA) or isopropanol (RNA). Approximately 10 μg of annealed substrate was diluted in binding buffer without protein (1x: 10 mM potassium phosphate pH 7.5, 50 mM KCl, 5% glycerol, 1 mM TCEP, 2.5 mM MgCl2), loaded into a single lane of a 12% native PAGE gel and run at 105 V for 1-2 hours. A lower concentration of acrylamide was used for RNA substrates > 75 bases. The bands were visualized using short-wavelength UV shadowing, and the appropriate bands were excised and purified following the crush-and-soak method described earlier. The molar concentration of each substrate was calculated using its molecular weight and A280 readings. The substrates were aliquoted at the desired concentration and stored at −80°C until use. All double-stranded (RNA, DNA) substrates were annealed and purified in the same fashion. The RNA oligonucleotides for hairpin substrates were ordered HPLC-purified (Sigma Proligo).
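The annealing mix above can be checked with simple dilution arithmetic (C1 · V1 = C2 · V2). The sketch below uses only the volumes and stock concentration stated in the text; the function name and the assumption of perfect 1:1:1:1 annealing for the upper bound are illustrative.

```python
def final_concentration_um(stock_um, stock_volume_ul, final_volume_ul):
    """Concentration after dilution, from C1 * V1 = C2 * V2 (units: uM, ul)."""
    if final_volume_ul <= 0:
        raise ValueError("final volume must be positive")
    return stock_um * stock_volume_ul / final_volume_ul


# 10 ul of a 100 uM oligonucleotide in a 200 ul annealing reaction.
per_strand_um = final_concentration_um(100.0, 10.0, 200.0)

# Upper bound on junction yield if all four strands anneal completely:
# uM * ul = pmol.
max_junction_pmol = per_strand_um * 200.0
```

Each strand is therefore present at 5 μM, so at most 1000 pmol (1 nmol) of four-way junction can form before losses during precipitation and gel purification.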

RNA substrate structural considerations
To simplify interpretation, all substrates made from ssRNA were designed with potential secondary structure in mind. The sequences were submitted to the Mfold server [49] using standard settings to identify thermodynamically favorable conformations. All structures with negative free energy (−ΔG) were considered likely to be present within the ensemble of the tested RNA. Structures with +ΔG were considered unlikely. This is based on the fact that base pairing contributes −ΔG to an RNA molecule, allowing spontaneous folding and secondary structure formation [50]. Therefore, in the Mfold analysis, sequences that produce only structures with +ΔG are considered single-stranded, whereas an ideal hairpin sequence would produce only a single structure with a large −ΔG. S3 Table contains the free energies and structures of the tested substrates identified by Mfold.
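The classification rule described above can be sketched as a small function. This is an illustration of the stated logic only: the function name, the category labels and the example ΔG values are hypothetical, and no numeric threshold for a "large" −ΔG is assumed because the text does not give one.

```python
def classify_substrate(delta_g_values):
    """Classify an RNA by the free energies (kcal/mol) of its predicted structures.

    - every structure has dG >= 0        -> treated as single-stranded
    - exactly one structure, with dG < 0 -> single favorable structure
    - otherwise                          -> ensemble of favorable structures
    """
    if not delta_g_values:
        raise ValueError("at least one predicted structure is required")
    favorable = [dg for dg in delta_g_values if dg < 0]
    if not favorable:
        return "single-stranded"
    if len(delta_g_values) == 1:
        return "single structure"
    return "ensemble"


# Hypothetical free energies, for illustration only.
labels = [
    classify_substrate([0.4, 1.2]),    # only positive dG
    classify_substrate([-8.5]),        # one strongly favorable fold
    classify_substrate([-3.1, -2.7]),  # several favorable folds
]
```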

Gel shift assays
The substrates were diluted to the desired concentrations and end-labeled with [γ-32P]ATP using PNK (NEB). To account for the number of ends, 5 μM of the hot ATP was used per 1 μM of DNA four-way junction. Unincorporated label was removed on P30 columns (Bio-Rad). To control for the loss of the shorter substrates on the P30 column, multiple substrates were labeled at the same time and their concentrations were normalized to the DNA 4WJ (the largest substrate) using relative incorporated scintillation counts. Substrates prepared in this way were stored at 4°C until use, unless folding was required. To fold, the RNA substrates were supplemented with salts (50 mM NaCl, 2.5 mM MgCl2), heated to either 55°C (< 50 bases) or 95°C (> 50 bases) for 3 minutes and allowed to cool slowly to room temperature. The folded RNA was stored at 4°C until use. The protein was thawed on ice and then serially diluted to the desired concentrations in water. The binding reaction was assembled by mixing the protein into binding buffer consisting of (1x) 10 mM potassium phosphate pH 7.5, 50 mM KCl, 5% glycerol, 1 mM TCEP, 0.1 mg/ml BSA, 2.5 mM MgCl2. The labeled substrate was added last to a concentration of ~1 nM in a 10 μl final volume. The reaction was then incubated at room temperature for 30-60 minutes to equilibrate. The 12% native polyacrylamide (29:1), 1 mm thick TBE (1x) mini-gels were pre-run with 0.5x TBE running buffer for 30 minutes at 105 V in an ice-water bath. The wells were briefly rinsed, and 5 μl of the binding reaction was then carefully loaded onto the running gels. The gels were run at a constant 105 V for long enough (1-4 hours) to achieve sufficient complex separation. At the end of the run, the gels were removed, rinsed, dried onto 3 mm Whatman paper at 80°C for 90 minutes, and exposed to a storage phosphor screen for 12-24 hours. The image was acquired using a Storm 860 molecular imager (Molecular Dynamics) at 100-micron resolution.
The large RNA substrates were treated the same way, but the complexes were resolved on large 6% native gels. In the competition experiments, the binding reactions were set up in the same manner as above but with the protein concentration held constant at a level sufficient to achieve 60-90% binding. Serially diluted unlabeled (cold) substrate was added up to 1 μM final concentration prior to addition of the radioactively labeled (hot) substrate.

Data analysis
The images obtained from the Storm 860 were analyzed in FIJI (GPL). The region of the gel was extracted, the pixels were inverted onto a black background, and the background was subtracted uniformly across all images. For each lane, the free and bound regions were selected using the gel analysis feature, and the area under the curve was quantified using the wand tool. When multiple complexes were present, all were included in the bound region. The fraction bound (Fb) was calculated using Equation (1), and the data were plotted as the fraction bound versus protein concentration using Prism6 software (GraphPad). To calculate the dissociation constant (KD), the data were fit to a modified Hill equation (Equation 2). The cold competition data were plotted as the fraction of bound hot substrate versus the concentration of the cold substrate using Equation (3), and the dissociation constant of the competitor (KC) was calculated with Equation (4). All parameters in Equations (2,3,4) were described previously [51].
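Equations (1)-(4) are not reproduced in this text, so the following is only a sketch of the kind of analysis described. It assumes the common definition of fraction bound (bound signal divided by bound plus free signal) and a generic Hill form, which may differ from the modified Hill equation of [51]. The log-spaced grid search is a simple stand-in for the nonlinear regression performed in Prism, and all function names and data points are hypothetical.

```python
def fraction_bound(bound_signal, free_signal):
    """Fraction of labeled substrate in complexes: bound / (bound + free)."""
    total = bound_signal + free_signal
    if total <= 0:
        raise ValueError("total lane signal must be positive")
    return bound_signal / total


def hill(protein_conc, kd, n=1.0, f_max=1.0):
    """Generic Hill binding curve (assumed form, not the paper's Equation 2)."""
    return f_max * protein_conc ** n / (kd ** n + protein_conc ** n)


def fit_kd(concs, fractions, n=1.0, f_max=1.0, kd_lo=1e-3, kd_hi=1e6, steps=2000):
    """Least-squares estimate of kd by scanning log-spaced candidate values."""
    best_kd, best_sse = None, float("inf")
    for i in range(steps + 1):
        kd = kd_lo * (kd_hi / kd_lo) ** (i / steps)
        sse = sum((hill(c, kd, n, f_max) - f) ** 2 for c, f in zip(concs, fractions))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd


# Synthetic, noise-free titration generated with kd = 12 (arbitrary units).
concs = [1.0, 3.0, 10.0, 30.0, 100.0, 300.0]
fractions = [hill(c, 12.0) for c in concs]
kd_estimate = fit_kd(concs, fractions)
```

On this synthetic curve the scan recovers a value close to the kd used to generate it. Real gel data would also need replicate handling and an estimate of the fitting error, which this sketch omits.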

HMG alignment and modeling
The full-length mouse Maelstrom sequence (434 residues) was submitted for tertiary structure prediction to the Robetta online server [52]. The .pdb files were retrieved and analyzed in PyMOL. The same process was followed for the Drosophila melanogaster Maelstrom HMG-box domain (residues 1-86). The .pdb files examined are provided in the supplement. Nucleotide sequences of the candidate sequence-specific (SRY, SOX) and non-sequence-specific (HMGB, Dsp1) HMG domains were obtained from NCBI, and an 86-residue region encompassing the HMG box was selected for the alignment. Each sequence ID indicates the protein name, species, start residue and number of consecutive residues extracted. Codon alignment was performed using the ClustalW algorithm built into the MEGA6 package, without changing the pre-set parameters. The aligned nucleotides were translated to protein using the standard genetic code, and the protein alignment was repeated using the built-in ClustalW algorithm. No changes to the codon alignment occurred. This alignment was manually refined using experimentally determined structural information to account for secondary structure features such as helices and loops. The following PDB structures were used: SRY-1j46 [36], HMGB1a-1ckt [41], MAEL HMG-2cto [45]. The final alignment was exported and the residues were colored according to the Taylor color scheme to reflect the biochemical characteristics of the residues [53]. The final alignment, along with annotation of the secondary structural elements, is shown in S1A Fig. The alignment was then used to generate a maximum likelihood tree using MEGA6 [54,55] built-in algorithms with the following settings: 1000 bootstrap replicates, Jones-Taylor-Thornton model of amino acid substitutions, uniform site rates, complete deletion of gaps and missing data, Subtree-Pruning-Regrafting-Extensive at level 5, very strong branch swap filter. The generated tree was visually adjusted in the built-in tree editor and is presented in Fig. 1D. The log likelihood of this tree is −1943.6, and each branch is annotated with the bootstrap value representing the percentage of trees in which the associated sequences clustered together. The tree branch scale represents the number of substitutions per site based on the 73 completely conserved positions considered among the 17 compared sequences.
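The "complete deletion of gaps and missing data" setting means that only alignment columns in which every sequence has a residue contribute to the tree. The sketch below illustrates that column filter on a toy alignment. It is not MEGA6's implementation, and the function name, gap symbols and sequences are hypothetical.

```python
def gap_free_columns(aligned_seqs, gap_chars="-?"):
    """Return indices of alignment columns with no gap or missing-data symbol."""
    lengths = {len(seq) for seq in aligned_seqs}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [
        i
        for i, column in enumerate(zip(*aligned_seqs))
        if not any(residue in gap_chars for residue in column)
    ]


# Toy alignment of three hypothetical sequences.
toy_alignment = [
    "MKR-PL",
    "MKRAPL",
    "M-RAPL",
]
kept = gap_free_columns(toy_alignment)
```

Here columns 1 and 3 each contain a gap in one sequence, so only four of the six columns would be retained.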

Large RNA structure determination
Previously described MAEL RIP-Seq data sets mapped to the mm9 assembly of the mouse genome, shown to be enriched in transposon RNA, were used for the identification of over-represented regions [30]. The sets corresponding to control IgG, MAEL_A RIP and MAEL_B were analyzed with MACS software (version 1.4.2) [56] with the standard settings to identify regions enriched in replicates A and B over IgG. The regions identified in the two replicates were pooled and intersected using bedtools (v2.20.1) [57] to retain only the common regions. All intervals were then annotated using the annotatePeaks program from the HOMER suite [58]. The regions annotated as LINE1 elements were extracted and their coordinates examined in IGV (Broad Institute), considering only the regions within annotated LINE1 elements. Multiple coordinates corresponding to regions with a peak at least 250 nucleotides wide were selected, and their nucleotide sequences were extracted from the UCSC genome browser. These were then aligned using ClustalW (EMBL-EBI) and the alignment was manually curated until a region of high sequence conservation was identified. The final alignment had 5 regions corresponding to LINE1 elements of the Md_F2 family that were located on different chromosomes (S5A Fig.). Coverage across each identified region was calculated using its coordinates and the bedtools multicov program [57]. The results were plotted in Numbers (iWork, Apple Inc.) (S5B Fig.). This alignment was used for determination of the secondary structure according to previously described methodology [59]. The covarying nucleotides used to constrain Mfold [49] are provided in S6 Table. The region with the lowest ΔG (chr10) was tested in gel shift assays.
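The step of keeping only regions common to both replicates, followed by the 250-nucleotide width criterion, can be illustrated with a small interval-overlap sketch. This is a conceptual stand-in, not the bedtools implementation: it assumes BED-style half-open coordinates, and the function name and peak coordinates are hypothetical.

```python
def intersect_intervals(peaks_a, peaks_b):
    """Return overlapping segments of two lists of (chrom, start, end) intervals."""
    by_chrom = {}
    for chrom, start, end in peaks_b:
        by_chrom.setdefault(chrom, []).append((start, end))

    overlaps = []
    for chrom, a_start, a_end in peaks_a:
        for b_start, b_end in by_chrom.get(chrom, []):
            lo, hi = max(a_start, b_start), min(a_end, b_end)
            if lo < hi:  # a non-empty overlap exists
                overlaps.append((chrom, lo, hi))
    return sorted(overlaps)


# Hypothetical peak calls from two replicates.
replicate_a = [("chr10", 100, 600), ("chr2", 50, 80), ("chr3", 10, 100)]
replicate_b = [("chr10", 300, 700), ("chr2", 90, 120), ("chr3", 50, 400)]

common = intersect_intervals(replicate_a, replicate_b)

# Keep only shared regions at least 250 nucleotides wide.
wide = [(c, s, e) for c, s, e in common if e - s >= 250]
```

In this toy input, the chr2 peaks do not overlap, the chr3 overlap is only 50 nucleotides wide, and the chr10 overlap of 300 nucleotides passes the width filter.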

Structural overview of Maelstrom
To gain insight into the function of the mouse Maelstrom (MAEL) protein, we first used the Robetta protein structure prediction server [52] to predict its tertiary structure. The server uses previously determined structures with sequence homology as parents for the structure prediction. De novo methods are used if these are not available. In accordance with the Maelstrom gene annotation (UniProt) and previous analysis [32,33], we have annotated the resulting structure with two domains: an N-terminal HMG-box and a MAEL-specific domain (MSD) (Fig. 1A). The predicted structure of the mouse MAEL HMG-box domain is based on a previously obtained H-NMR structure of the HMG-box domain of the human MAEL protein (PDBID: 2cto) [45]. The MSD has previously been computationally predicted to assume an RNase H-like fold [32,33]. In agreement with these studies, Robetta utilized an exonuclease structure (PDBID: 1zbh-chain A) as a parent molecule for this domain. The C-terminal sequence of the MAEL protein was modeled de novo as it appears unique. The predicted structure shows the HMG-box domain on the surface and not encapsulated by the rest of the molecule. Instead, it is connected to the MSD by an approximately 30-residue linker region that appears devoid of any secondary structural elements (Fig. 1A). Based on its sequence composition, this linker region is predicted to have a high propensity for intrinsic disorder, which could account for the insolubility that we encountered while attempting to purify recombinant full-length or truncated Maelstrom proteins. The fact that the MAEL HMG-box domain is not buried, but is instead connected to the rest of the protein by an unstructured linker, reaffirmed our interest in understanding its function.

The HMG-box domain of Maelstrom
MAEL is the only known HMG-box domain-containing protein in the piRNA pathway. Structurally, all HMG-box domains have a characteristic L-shape fold of three helices (Fig. 1B) [39]. Like the HMG-box domains of SRY and HMGB1 proteins, the mouse MAEL HMG-box domain also possesses this basic fold. However, it has acquired novel features, most prominently a distinguishable bend in helix-2 apparent from a simple structural alignment (Fig. 1B). This change of geometry gives helix-2 the appearance of a "hook". Because the equivalent region is known to be important for binding of canonical HMG-box domains [39], we predict this "hook" region to have functional consequences for the mouse MAEL HMG-box domain. Surface rendering of this domain shows that the region is bulky, containing three consecutive arginines at the C-terminus of helix-1 that form what looks like a "propeller" (Fig. 1C). Consecutive positively charged residues are also present in the canonical HMG-box domains (S1 Fig.). However, in sequence-specific (SS) binders these residues are located internally in helix-1, while in non-sequence-specific (NSS) binders these are not arginines (S1 Fig.). The presence of arginines is significant due to their ability to form multiple H-bonds with nucleic acid bases or the backbone [60,61]. In addition, arginine residues may also be involved in recognition of specific motifs within RNA [62]. Importantly, the "hook" and the "propeller" are specific to the mouse MAEL HMG-box, and could be of functional significance.
Fig. 1. [...] PDB ID numbers are shown. (B) Comparison of MAEL HMG-box and canonical HMG-box domains. Determined structures of candidate HMG-box proteins (SRY: 1j46, sequence-specific binding; HMGB1a: 1ckt, structure-specific binding; MAEL: 2cto, unknown binding) were visualized and the structures aligned in PyMOL. The MAEL HMG-box domain has a conserved canonical L-shape fold but with a bend in helix-2, creating a novel region termed the "hook" (red). (C) Distribution of charged residues of the mouse MAEL HMG-box domain. Positive residues (Arg, Lys, His) are blue; negative residues (Asp, Glu) are red. Charged residues are concentrated on side B. Unlike in other HMG-box domains, in MAEL three arginine (R) residues are concentrated at the end of helix-1 and protrude outwards, forming a "propeller"-like shape. (D) Phylogenic comparison of the MAEL HMG-box domain with well-studied candidate HMG-box domains from sequence-specific (single HMG-box, SRY, Sox) and non-sequence-specific (two HMG-boxes, HMGBs, Dsp1) groups. Mouse sequences were used unless otherwise noted (Dm: Drosophila melanogaster, Hs: Homo sapiens). The phylogenetic tree was generated using the maximum likelihood method in MEGA6 software. Values next to the branches describe the percentage of trees in which the associated sequences group together (n = 1000). The branch length scale is in substitutions per site. The MAEL HMG-box domain forms a new branch most closely related to domain A of the non-sequence-specific HMG-box proteins.
To infer the domain relationships, we performed multiple sequence alignment of candidate HMG-box domains from the SS and NSS groups with the mouse, human and Drosophila MAEL HMG-box domains (S1 Fig.). This analysis showed that the MAEL HMG-box domains form a separate branch on the phylogenic tree (Fig. 1D). In addition to the described structural differences specific to the mammalian MAEL HMG-box domain, this implies that there are other features common to the MAEL HMG-box domain homologues that may be important. The MAEL HMG-box domains are most closely related to domain A of non-sequence-specific binders, but differ from these in their distribution of charged residues (S1 Fig.). In the mammalian MAEL HMG-box domains, the loop connecting helices 1 and 2 does not contain charged residues, and helix-2 is devoid of the positively charged residues that are present in all other groups (S1 Fig.). While charged residues in other domains seem to alternate from helix-1 to helix-2, the mammalian MAEL HMG-box domain has concentrated positive residues, which form a novel region. The distribution of charged residues is indicative of an H-bond potential that, together with non-polar regions, can provide the biochemical basis for strong interactions with appropriate substrates. These features vary in the Drosophila Mael HMG-box domain, which is still distinct from canonical HMG-boxes (Fig. 1C, S1A Fig.). The HMG-box domain of the Drosophila Maelstrom protein has evolved features distinct from those of its mammalian homologues, which perhaps reflect species-specific specialization required for its functions. Nevertheless, all analyzed MAEL HMG-box domains, while related to the canonical HMG-box domains, have evolved characteristics that set them apart, and these are likely to influence their function.

Binding of MAEL HMG-box domain to DNA
To evaluate the biochemical activity and validate our previous analysis of the MAEL HMG-box domain, we expressed the HMG-box domains of SRY (SS) and HMGB1a (NSS), together with the mouse and Drosophila MAEL HMG-box domains, in bacteria (S2A Fig.). The recombinant proteins were purified to homogeneity and CD analysis was then used to confirm their secondary structure (S2B-D Fig.). To evaluate the ability of the HMG-box domains to bind nucleic acids, we used gel shift assays in which we titrated increasing amounts of the protein of interest against a known concentration of labeled (hot) substrate and determined the dissociation constant (KD). When appropriate, we used competition assays, in which the protein concentration (giving 60-90% total binding) and the concentration of the hot substrate were kept constant, and the unlabeled (cold) substrate was titrated in instead. This allowed us to estimate the competitor dissociation constant (KC), which is directly related to the dissociation constant obtained from the binding assay. Considered together, these two constants provide an estimate of the observed binding kinetics obtained by gel shifts.
We first evaluated DNA binding of the SRY HMG-box and HMGB1a recombinant proteins (Fig. 2A). Previous studies have determined that the SRY HMG-box domain binds dsDNA in a sequence-specific manner with the consensus sequence AACAAN [34]. The SRY HMG-box domain recognizes the sequence through a number of minor groove interactions and intercalates a residue at the beginning of its helix-1 between the bases, bending the helical backbone and allowing the rest of the helix to be accommodated in the minor groove [36]. Due to its specific residue composition, the SRY HMG-box domain is able to bend the DNA. Consistently, the recombinant SRY HMG-box domain bound its consensus sequence strongly (average KD = ~12 nM) and specifically, forming a single complex (Fig. 2A, S3A Fig.). In contrast, HMGB1a is known to bind pre-bent but not unperturbed dsDNA, as it lacks the residues required for DNA bending [38,39,41]. This is in accordance with our observation that HMGB1a does not bind the same dsDNA (Fig. 2B).
The MAEL HMG-box domains do not bind to single-stranded (ss) DNA (S3B Fig.-mouse), dsDNA (Fig. 2C-mouse, S3C Fig.-Drosophila), or dsDNA methylated at CpGs (S3D Fig.-mouse). We tested the mouse and Drosophila MAEL HMG-boxes with multiple dsDNA substrates containing canonical HMG-box motifs and non-canonical sequences; however, we failed to detect complex formation under our conditions (S2 Table). The likely reason for the lack of mouse MAEL HMG-box binding is the "hook" region, which prevents accommodation of the protein helices in the dsDNA grooves even when the dsDNA is modified or pre-bent (Fig. 2D). Even though the sequence and predicted structure of the Drosophila Mael HMG-box do not show a homologous "hook" region, it is still unable to bind dsDNA (S1 Fig., S3D Fig.). The presence of two arginines in the Drosophila domain suggests that an analogous feature may also be present (S1 Fig.). We have not tested all possible sequences for binding; however, additional sequence permutations would only produce dsDNA with the B-type helix. Therefore, it is highly probable that the structural characteristics of the MAEL HMG-box domain will prevent any significant interactions in a sequence-specific manner.
A commonly observed characteristic of the HMG-box domains is the ability to bind to DNA four-way junctions (4WJ) in their open conformation [40,63,64]. These junctions are composed of four double-stranded arms with a central junction where the strands turn sharply and the helical grooves widen, essentially providing a pre-bent and open site for binding. To determine whether the mouse MAEL HMG-box domain has retained this characteristic, we compared its binding with that of the SRY HMG-box and HMGB1a, both of which bound DNA 4WJ and readily formed multiple complexes (Fig. 2E-F). SRY formed five complexes with DNA 4WJ while HMGB1a formed two. However, the MAEL HMG-box domain was able to form only a single complex even at high protein concentrations (Fig. 2G, S3D Fig.). Binding of SRY and other HMG-box domains to DNA 4WJ, as well as the dynamics and structure of DNA 4WJ, are well described [40,63-78]. In accordance with these previous observations, the five SRY-DNA 4WJ complexes are likely the products of binding of SRY HMG-box domains to the open center of the 4WJ and to the AT-rich sites in the double-stranded arms that approximate the SRY recognition sequence (Fig. 2H). The two HMGB1a complexes likely represent two protein domains symmetrically accommodated at the irregular center of the junction. Only a pre-bent center can be bound, because HMGB1a lacks the intercalating residues required for bending of, and subsequent binding to, unperturbed dsDNA [41,68,79]. Like HMGB1a, the MAEL HMG-box domain does not bind to dsDNA, but unlike HMGB1a, it forms only a single complex with the DNA 4WJ (Fig. 2H). A possible explanation for this is that the "hook" and "propeller" regions are accommodated at the open center of the junction, but their bulkiness prevents accommodation of a second protein.
Taken together, the above experiments show that the MAEL HMG-box domain does not bind to dsDNA; however, it is able to form a single strong and specific complex with DNA 4WJ (average KD = ~14 nM, S3E Fig.). This mode of binding is different from that of the tested canonical HMG-box domains (Fig. 2E-F). Considering the unique structural characteristics of the MAEL HMG-box domain, we propose that the "hook" and "propeller" regions may play an important role in what appears to be a structure-specific rather than sequence-specific mode of binding.

Binding of MAEL HMG-box domain to RNA
MAEL is an important member of the piRNA pathway and specifically immunoprecipitates with piRNA precursor transcripts and transposon mRNAs [30]. Furthermore, the presence of an RNase H-like domain suggests that MAEL may operate in the RNA context [32,33]. Considering this together with the binding to structured DNA observed here, we wanted to probe MAEL HMG-box binding to RNA.
Cellular RNAs exist as single-stranded molecules that are capable of forming intricate secondary and tertiary structures [50]. Therefore, using Mfold, we identified conformational ensembles of RNAs to be tested for MAEL HMG-box binding (see Methods, S3 Table). While we did not observe any binding of the MAEL HMG-box domain to ssRNA (Fig. 3A), we did detect weak complex formation with dsRNA (Fig. 3B). The MAEL HMG-box domain did not bind to small hairpin structures (Fig. 3C-D), but formed weak complexes with larger RNA hairpins (Fig. 3E-G). Of these, the strongest binding was observed with the RNA hairpin that carried the longest continuous dsRNA stem (9 base pairs) and hairpin loop (7 bases) (Fig. 3F). Because only ~40% of this substrate was bound at a relatively high protein concentration, we were not able to calculate binding parameters. Lastly, we tested the RNA counterpart of the DNA 4WJ used previously (Fig. 2G, H), which has the same nucleotide sequence but with RNA bases (S3 Table). The MAEL HMG-box domain bound well to this substrate, with a single complex forming at the lowest tested concentrations and multiple complexes forming at the highest concentrations (Fig. 3H). Even though complete binding was not achieved, the binding strength was moderate (KD = 0.638 μM, S4A Fig.). The formation of the first complex resembled the interaction observed with DNA 4WJ, but never reached completion. Formation of the large complex was not observed previously with DNA 4WJ (Fig. 2G). To assess the specificity of the binding, we attempted a competition assay with the cold substrate, but the results were puzzling. Instead of the cold substrate titrating the protein away from the hot substrate, all of the hot substrate shifted to the large complex (S4B Fig.).
The above results suggest that the MAEL HMG-box domain prefers RNA hairpins with completely base-paired stems and adjacent loops larger than 4 bases over those with disrupted double-stranded regions and over smaller hairpins. The MAEL HMG-box domain interacts better with RNA 4WJ than with the other RNAs tested. However, unlike with DNA 4WJ, it forms a large complex (Fig. 3H). Furthermore, instead of being titrated away, this complex became predominant with additional RNA in the reaction (S4B Fig.). A description of an identical RNA 4WJ by others [80] suggests a possible explanation for the presence of these larger complexes. Unlike a DNA 4WJ, which primarily exists in the open conformation [63], the RNA counterpart is more dynamic, undergoing multiple structural transitions [63,80]. Therefore, its ensemble is largely composed of structures with dsRNA arms that are adjacent to each other in either parallel or antiparallel orientations (S4C Fig.). It is thus possible that RNA helices in proximity to each other form a structurally unique region that accommodates multiple MAEL HMG-box domains at once. This mode of binding is supported by the arginine-rich sequence and the structure of the MAEL HMG-box domain (Fig. 1, S1 Fig.). The positive residues in the "hook" and the "propeller" regions are distributed such that they span almost 270 degrees, providing sufficient rotational freedom for the rest of the domain to be accommodated in multiple ways. Arginine-rich peptides are enriched in other RNA-binding motifs and have previously been implicated in facilitating complex protein-RNA interactions through "arginine-fork" phenomena [62,81,82]. Therefore, formation of large complexes with RNA 4WJ may be due to the interaction of arginine-rich regions of the MAEL HMG-box domain with the closed portion of the 4WJ ensemble. In the presence of additional RNA 4WJ in the reaction, the ensemble of structures effectively changes to favor closed conformations (S4C Fig.).
HMGB1a binds RNA 4WJ similarly to its DNA counterpart, progressively forming larger complexes as more protein is bound (S4D Fig., Fig. 2F). In comparison, the MAEL HMG-box domain forms a single complex with DNA 4WJ. A similar complex is observed with RNA 4WJ, but an additional complex appears without apparent intermediate states (Figs. 2G and 3H, S3E and S4B Figs.). Additionally, the positive residues found within HMGB1a are all lysines, which are not capable of interactions equivalent to those of arginines despite their similar charge [62]. Taken together, the RNA binding data suggest that the MAEL HMG-box domain binds RNA in a complex manner, employing its arginine-rich "hook" and "propeller" regions to bind structured RNA.
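The hairpin features discussed in this section (length of the continuous stem and size of the hairpin loop) can be read directly from a secondary structure in dot-bracket notation, such as the output of a folding program. The sketch below is only an illustration: the function names and the example structures are hypothetical and are not the substrates listed in S3 Table, and the thresholds encode the qualitative preference stated in the text (continuous helices longer than 6 base pairs, loops larger than 4 bases) rather than a measured rule.

```python
def pair_table(db):
    """Map each paired position to its partner in a dot-bracket string."""
    stack, partner = [], {}
    for pos, ch in enumerate(db):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            i = stack.pop()
            partner[i], partner[pos] = pos, i
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return partner


def hairpins(db):
    """Return (continuous_stem_length, loop_size) for every hairpin loop."""
    partner = pair_table(db)
    found = []
    for i, j in sorted(partner.items()):
        # A hairpin is closed by a pair (i, j) enclosing only unpaired bases.
        if i < j and all(db[k] == "." for k in range(i + 1, j)):
            stem, a, b = 1, i - 1, j + 1
            # Walk outward while pairs stack without bulges or mismatches.
            while a >= 0 and b < len(db) and partner.get(a) == b:
                stem, a, b = stem + 1, a - 1, b + 1
            found.append((stem, j - i - 1))
    return found


def matches_preference(stem, loop, min_stem=7, min_loop=5):
    """Qualitative filter: stem longer than 6 bp and loop larger than 4 bases."""
    return stem >= min_stem and loop >= min_loop


# Hypothetical structures: a perfect 9-bp stem with a 7-base loop,
# and a stem interrupted by unpaired bases with a 4-base loop.
for structure in ("(((((((((.......)))))))))", "(((.(((....))).)))"):
    for stem, loop in hairpins(structure):
        print(structure, stem, loop, matches_preference(stem, loop))
```

Counting only the uninterrupted run of stacked pairs closest to the loop reflects the distinction drawn above between perfectly base-paired stems and stems with mismatches.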

MAEL HMG-box domain mutagenesis
To determine whether the arginines in the "hook" and "propeller" regions of the MAEL HMG-box domain are important for binding, we mutated the individual arginines to alanines. Additionally, we mutated the glutamine (Q16) along helix-1 and the arginine (R8) at the N-terminus of helix-1 to see whether polar residues within these regions are also important for binding (Fig. 4A). Mutation Q16A in the middle of helix-1 had no effect on binding to the DNA 4WJ, but changed the complexes formed with RNA 4WJ (Fig. 4B, B`). Differential binding to DNA versus RNA 4WJ could be due to differences in the ensemble composition of the two junctions, which accommodate the MAEL HMG-box domain in very different fashions despite identical sequences. The R8A mutation completely abolished binding to both DNA and RNA, indicating that this residue may be important (Fig. 4C, C`). However, upon further inspection, the secondary structure of R8A was found to be affected (S2B Fig.); therefore, the loss of binding might reflect changes in protein folding. In contrast, mutations R23A and R25A in the "propeller" region and R31A in the "hook" region had no effect on protein folding but completely abolished binding to DNA 4WJ (Fig. 4D-F, S2B Fig.). This result supports our previous hypothesis that the arginines in this region are distributed such that they form multiple contacts with the perturbed region of this substrate. Binding of the same three mutants (R23A, R25A, R31A) to RNA 4WJ was significantly decreased (Fig. 4D`-F`) compared to wild type (Fig. 3H). Mutating the "propeller" residues (R23A, R25A) still allowed formation of a large complex, but mutation of the "hook" arginine (R31A) abolished it. Instead, a small amount of a new complex at an intermediate position was observed (Fig. 4F`). These results also support our previous conclusions highlighting the importance of the arginines in the "hook" and the "propeller" regions.
It has been previously noted that even individual arginine residues are able to exert some degree of binding through the "arginine-fork" phenomenon [62]. Therefore, considering the structural differences and the ensemble complexity of RNA 4WJ (versus DNA), the small amount of complex observed in the single-mutant binding assays may result from the three remaining arginines being accommodated in one of many possible orientations.
Overall, the mutational analysis of the MAEL HMG-box domain revealed that the arginines in the "hook" and "propeller" regions are essential for binding, supporting our sequence and structure analyses and the interpretation of the previous binding experiments. Taken together, our results point towards structured RNA as the preferred substrate for the MAEL HMG-box domain.

MAEL HMG-box domain binding to large structured RNA
Fig. 3. C, D) MAEL HMG-box does not bind to small RNA hairpins. E) Only weak complex formation is observed with a hairpin that has mismatches in the stem. F) But when the stem is perfectly base-paired, the binding is stronger than that observed with dsRNA (B). G) Binding of MAEL HMG-box to a substrate with multiple short hairpins is weak. H) However, binding to RNA 4WJ (sequence identical to DNA 4WJ) is strong and two complexes are formed. These observations indicate that MAEL HMG-box prefers to bind to substrates with continuous dsRNA helices longer than 6 base-pairs located near unstructured or perturbed RNA regions.

Fig. 4. (C-F) Mutation of individual arginines (R) results in the complete loss of binding to DNA 4WJ, suggesting that they are essential for complex formation. The R8A mutant fold appears to be affected (S2B Fig.), which might be responsible for reduced binding. Therefore, only arginines in the "hook" and "propeller" seem to be necessary for MAEL HMG-box domain binding to non-canonical DNA. (B`-F`) Interactions of mutants with RNA 4WJ are significantly affected. All mutants show greatly decreased binding to RNA 4WJ as well as variability in the type of complex formed when compared to wild-type protein (Fig. 3H). Residual complex formation was still observed (except for R8A), likely due to the high number of configurations that RNA 4WJ can take to accommodate the MAEL HMG-box domain. Nevertheless, the arginines in the "hook" and "propeller" of MAEL HMG-box are important for binding to structured RNA.
doi:10.1371/journal.pone.0120268.g004

The apparent preference of the MAEL HMG-box domain for structured RNA is in agreement with the results of our analysis of MAEL immunoprecipitates (IP). MAEL protein complexes immunoprecipitated from adult mouse testis lysate are specifically enriched for fragments of piRNA precursor RNAs and retrotransposon mRNAs [30]. Therefore, we explored the possibility that the MAEL HMG-box domain binds endogenous long RNAs. We searched MAEL IP RNA-Seq data for enriched L1 sequences and identified a fragment of the L1_Md_F2 element (S5A-B Fig.). Repeated and structured regions in mouse L1 elements have been previously described, but no specific recognition signature has been defined [83,84]. Under the assumption that retrotransposon L1 RNA can be under positive selective pressure to retain some structural features, we determined a limited secondary structure of the identified regions of L1_Md_F2 using a combination of covariation and thermodynamic approaches. Such a secondary structure would allow us to examine various features that may be bound by the MAEL HMG-box domain. To do this, we followed the methodology applied previously to determine the structure of the yeast telomerase flexible scaffold [59]. Initial attempts to determine the structure of a 277-nucleotide (nt) long piece of L1_Md_F2 with Mfold [49] produced multiple distinct structures, making it impossible to identify their common structural features. After supplying the program with the identified covarying nucleotides, Mfold produced a single highly energetically stable structure for each region (S5C Fig.). It was reassuring to see that the structures originating from sequences located on different chromosomes resembled each other in both sequence and structure, with only minor variations. For further analysis, we selected the most energetically stable structure, corresponding to the sequence from chromosome 10 (Fig. 5A). The MAEL HMG-box domain bound this RNA substrate strongly, forming a single complex starting at 0.5 μM protein (Fig. 5B). Interestingly, the mobility of the complex in the gel was progressively more retarded with increasing amounts of protein in the reaction. The binding of the MAEL HMG-box domain to this long RNA does not occur with the same kinetics as with DNA 4WJ (Fig. 2G), where a single domain binds to a single region in a non-cooperative manner.
Instead, binding of long RNA appears to be highly cooperative, similar to the RNA 4WJ large complex (Fig. 3H, S4B Fig.), with consecutive molecules being bound after passing a certain concentration threshold of protein. In an attempt to identify the region that is recognized within this long RNA, we generated two shorter substrates, removing double-stranded segments of the stem, to create 200 and 149 nt long RNAs (Fig. 5A). While the full-length fragment was completely bound, only fractions of the shorter substrates were shifted even at the highest concentrations of protein (Fig. 5B).
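The contrast drawn above between non-cooperative binding and a threshold-like, cooperative response can be illustrated with the Hill equation. This is a generic textbook model, not an analysis performed in this study: no Hill coefficient was measured here, and the values of `n` and the half-saturation concentration below are hypothetical.

```python
def hill_fraction(protein_um, k_half_um, n):
    """Fraction of substrate bound under the Hill model.

    n = 1 reproduces a simple non-cooperative isotherm;
    n > 1 gives a steeper, threshold-like transition.
    """
    return protein_um ** n / (k_half_um ** n + protein_um ** n)


def saturation_span(n):
    """Fold increase in protein needed to go from 10% to 90% bound.

    Follows from the Hill equation: (P90 / P10) ** n = 9 / (1 / 9) = 81.
    """
    return 81 ** (1.0 / n)


# Hypothetical comparison at an assumed half-saturation of 1 uM.
for n in (1, 4):
    curve = [round(hill_fraction(p, 1.0, n), 3) for p in (0.25, 0.5, 1.0, 2.0, 4.0)]
    print(n, saturation_span(n), curve)
```

A non-cooperative site needs an 81-fold increase in protein to move from 10% to 90% bound, whereas a strongly cooperative system covers the same range over a few-fold increase, which is the kind of abrupt shift described for the long RNA and the RNA 4WJ large complex.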
These observations suggested that the removed stem contributes to binding. Perhaps the double-stranded region constrains the ends of the RNA molecule, allowing for unambiguous formation of the dual hairpin regions in the center. In this way, the ensemble of structures would be smaller, with a greater proportion of the preferred substrate. This would also explain the presence of the weak complexes and the lack of complete binding seen with the shorter RNAs. The presence of additional RNA in reactions with the RNA 4WJ led to full formation of large complexes, most likely also by affecting the ensemble of structures (S4B Fig.). Importantly, a long RNA from a region adjacent to those recovered from MAEL immunoprecipitates failed to show any appreciable binding (Fig. 5C). Our analysis of the transposon RNA is limited to a single structured fragment and its flanking region, and its implications should therefore be considered with caution. Nevertheless, together with the previous biochemical observations, it raises the possibility that the MAEL HMG-box domain contributes structure-specific RNA binding ability to the MAEL protein and in this way may aid in the selection of MAEL-immunoprecipitated RNAs.

Discussion
The aim of the study is to shed light onto the biochemical function of MAEL, a protein indispensable for the function of the piRNA pathway [24,28,30,85]. We focused on the N-terminal HMG-box domain, which is important for MAEL biological function [26]. Its classification implies DNA binding ability, which is the case for many canonical HMG-box domains [34][35][36].
However, MAEL has been almost exclusively linked to the piRNA pathway where it is essential for piRNA biogenesis [24,28,30] and localizes to cytoplasmic piP-bodies and chromatoid bodies, likely involved in retrotransposon mRNA and piRNA precursor RNA processing [24,30]. Additionally, retrotransposon RNAs are strongly enriched in MAEL immunoprecipitates [30]. Therefore, the evidence trail led us to hypothesize that the MAEL HMG-box domain is involved in RNA binding.
A plethora of non-sequence-specific RNA binding domains have been described [86], but very few of them are HMG-box domains [42,44]. Our sequence and structure analysis of the MAEL HMG-box domain indicates that it belongs to this exclusive group. The MAEL HMG-box domain does not bind single-stranded, double-stranded, or modified DNA molecules in vitro despite the presence of consensus HMG-box binding sites in the tested substrates. Of the DNA substrates, the MAEL HMG-box domain only binds DNA 4WJs, where it likely interacts with the structured center like many other HMG-box domains [63,64]. On the other hand, it readily binds to RNA hairpins and forms multiple complexes with RNA 4WJ. In contrast to DNA junctions, RNA junctions are far more prevalent in the cellular environment and are commonly found in large molecules [87,88]. Furthermore, we describe a case where the MAEL HMG-box domain preferentially binds to a large structured RNA molecule originating from MAEL immunoprecipitates. Based on these observations, the MAEL HMG-box domain could provide structure-specific RNA-binding capability to the full-length MAEL protein.
We have identified a region within the MAEL HMG-box domain rich in arginine residues that is responsible for complex formation with the structured nucleic acid substrates. The roles of arginine-rich protein motifs in RNA binding have previously been demonstrated [82]. In the MAEL HMG-box domain, the arginine residues form bulky "hook" and "propeller" regions, providing a charged surface that, as our mutational studies showed, is required for strong and specific interactions with nucleic acids, likely through formation of arginine-forks [62]. The fact that even a single arginine mutation significantly affects binding demonstrates that the composition and the architecture of these regions are important. Our work also suggests that the "hook" and "propeller" regions of the MAEL HMG-box domain set it apart from known HMG-box domains, contributing to the formation of a phylogenetically distinct group of MAEL HMG-boxes. Given the exclusivity of the MAEL HMG-box domain in the piRNA pathway, it is tempting to speculate that it has diverged and acquired the described features to accomplish a novel function, perhaps specific to the piRNA pathway. Such a function could involve discrimination of L1 and piRNA precursor RNAs from other transcripts. The in vitro preference of the MAEL HMG-box domain for structured nucleic acids, including RNA hairpins, four-way junctions, and large structured RNAs, is in agreement with this hypothesis. We believe that the combination of new in vitro (RNBS, [89]) and in vivo (HITS-CLIP, [90]) techniques will in the future reveal whether this hypothesis is correct.
Lastly, the biochemical activity of the MAEL HMG-box domain in vitro is reminiscent of that of HMGB1a in terms of structure-directed binding. Interestingly, in addition to their prominent structural role in the nucleus, HMGB proteins have been shown to function as sentinels of immunogenic nucleic acids in the innate cellular response [42,43]. A parallel presents itself in which the MAEL HMG-box may have diverged to aid in recognition of domesticated transposon RNAs. In this context, the piRNA pathway may be considered an ancient arm of the innate immune response that protects genomes against retroviruses [91].
Supporting Information
S1 Fig. Sequence and secondary structural comparison of HMG-box domains. An amino acid multiple sequence alignment of candidate HMG-box domains. The candidates were selected based on their substrate specificity (sequence vs. non-sequence specific) with at least four candidates per group, and with preferences for well-described and mouse HMG-box domains. Aligned codons (ClustalW) were translated to protein sequences (MEGA6), and then the alignment was adjusted to account for the secondary structural characteristics found in solved structures within each group (SRY HMG-box-1j46, HMGB1a-1ckt, MAEL HMG-box-2cto). The residues were pseudo-colored according to Taylor color scheme (JalView) to provide contrast to groups of residues. Conserved secondary structure characteristics and residues are shown below the alignment.