The evolutionary loss of the Eh1 motif in FoxE1 in the lineage of placental mammals

Forkhead box E1 (FoxE1) protein is a transcriptional regulator known to play a major role in the development of the thyroid gland. By performing sequence alignments, we detected a deletion in FoxE1, which occurred in the evolution of mammals, near the point of divergence of placental mammals. This deletion led to the loss of the majority of the Eh1 motif, which was important for interactions with transcriptional corepressors. To investigate a potential mechanism for this deletion, we analyzed replication through the deletion area in mammalian cells with two-dimensional gel electrophoresis, and in vitro, using a primer extension reaction. We demonstrated that the area of the deletion presented an obstacle for replication in both assays. The exact position of polymerization arrest in primer extension indicated that it was most likely caused by a quadruplex DNA structure. The quadruplex structure hypothesis is also consistent with the exact borders of the deletion. The exact roles of these evolutionary changes in FoxE1 family proteins are still to be determined.


Sequencing of FoxE1 genes
Genomic DNA samples of Macropus eugenii (Tammar wallaby) and Tachyglossus aculeatus (echidna) were kindly provided by Dr. Ke-Jun Wei (Comparative Genomics Group, Research School of Biological Sciences, Australian National University).PCR fragments of genomic DNA of FoxE1 of Macropus eugenii were generated using a set of PCR primers selected using conserved regions of the FoxE1 gene of Monodelphis domestica.A partial sequence of the FoxE1 gene of Tachyglossus aculeatus was determined by generating a set of overlapping PCR fragments using sets of PCR primers.PCR primers were designed to conserved regions, shared between the FoxE1 genes of Ornithorhynchus anatinus and Monodelphis domestica.In addition, a NotI fragment of the FoxE1 gene of Tachyglossus aculeatus was subcloned into the pCRII vector (Invitrogen) and sequenced with standard primers.Sequences of FoxE1 genes of Macropus eugenii and Tachyglossus aculeatus were deposited in the NCBI database under the accession numbers HM991738.1 and HQ111427.1.
The genomic DNA of Elephantulus rufescens was isolated from the blood of Elephantulus rufescens using a genomic DNA isolation kit (Qiagene).Blood of Elephantulus rufescens was kindly donated by the Philadelphia Zoo (philadelphiazoo.org).To amplify FoxE1 of Elephantulus rufescens, primers were designed based on conserved regions shared between the FoxE1 genes of Homo sapiens, Mus musculus and Loxodonta africana.PCR fragments obtained were sequenced by the automated Sanger method.FoxE1 gene sequences were assembled using software Accelrus gene 2.5 and Bio4Life 1.0.0.1.All designed primer sequences are available by request.

Sequence alignments and phylogenic analysis of Fox proteins
A phylogenic tree for the FoxE subclass was generated based on the forkhead DNA-binding domain sequences (100 residues) for FoxC, D, H, Q, E subclasses.Multiple sequence alignments were constructed using the software Clustal W [2] and T-COFFEE, version 7.7.1 (tcoffee.vital-it.ch/cgi-bin/Tcoffee/tcoffee_cgi/index.cgi)[3], and these sequences were converted into a cladogram using the software MEGA 3.1 (metameme.sdsc.edu).Sequence distances were calculated with Poisson correction, and a neighbor-joining method was used to construct the tree topology with bootstrap analysis of 1000 samples.

Plasmids and transfections
PucneoFox and PucneoFoxRev plasmids were obtained by cloning the hairpin-forming region of the echidna FoxE1 gene into the PucNeo plasmid, designed for DNA replication analysis in mammalian cells.PucNeo was obtained from pSV2neo plasmid (Stratagene) by inverting the region containing the ColE1 origin and ampicillin resistance gene [4].The sequence, corresponding to the hairpin-forming region of the echidna FoxE1 gene, accession number HQ111427.1 (5'-TTCAGCCTCAACGGGCTGGTGGCCGGGCCCCCGGGCCCGGCCGAGA TCCTGCAGCCGCCGTCGCAGCCC-3'), was first inserted into the blunt-ended Eco81I-site of the pYES-Bsg plasmid [5], followed by subcloning of FoxE1-containing HindIII fragment in two orientations into the blunt-ended AatII site of Pucneo.All cloning and DNA isolation steps were performed in E. coli XL1-blue strain (Stratagene).COS-1 monkey fibroblasts (ATCC CRL1650TM) were transfected with 5 μg of plasmid DNA using Lipofectin reagent (Invitrogen), according to manufacturer's instructions.

Analysis of single-stranded regions by chloroacetaldehyde modifications and primer extension reactions
PucneoFox and PucneoFoxRev plasmids containing the Fox region in two orientations were incubated in 100 mM KCl, 10 mM NaCl, 0.5 mM MgCl, and 20 mM Tris-HCl, pH 7, to mimic the intracellular environment.Each sample was then divided in two halves; one half was incubated with 0.5% chloroacetaldehyde at 37˚C for 8 minutes; while the second half was used as a control, which underwent the same protocol, except for the addition of chloroacetaldehyde.Chloroacetaldehyde modifies mostly single-stranded As and Cs [6].Reactions were stopped by the addition of 5 volumes of prechilled 0.3 M NaOAc pH 7, followed by two rounds of ethanol precipitation, and two washes with 70% ethanol.The control and chloroacetaldehyde samples were then linearized by ScaI-HF restriction endonuclease, followed by another round of precipitation, and finally dissolved in water, and subjected to 60% formic acid treatment for 5 min at room temperature to convert chloroacetaldehyde-modified sites into single-stranded breaks.The reactions were stopped by addition of 5 volumes of prechilled 0.3 M NaOAc pH 7, followed by ethanol precipitation and 70% ethanol wash.Upon drying, the samples were dissolved in TE buffer (10mM Tris-HCl, pH 8.0, 1mM EDTA), and heated at 65˚C for 10 min to minimize secondary structures.Next, primer extension reactions were performed with Taq polymerase (NEB) in a standard buffer with 2 mM Mg 2+ , and only a forward (no reverse) FWD primer, 5'-CAAATAGGGGTTCCCCCCACATTTCC-3', that annealed about 100 nt from the start of the cloned FoxE1 region.The primer extension reactions were performed in a thermo-cycler using the following cycling conditions: initial denaturing at 95˚C for 30 sec, then 60 cycles of denaturing 95˚C for 10 sec, annealing 58˚C for 30 sec, and extension at 72˚C for 30 seconds.Control primer extension reactions were performed using maxiprep samples, linearized with ScaI-HF, precipitated, and heated in TE, but not preincubated with ions, or subjected to chloroacetaldehyde and formic acid.The length controls were obtained with FWD and R1 5'-ATCGAATTTGAGGTCTGCACTCGAG-3', and FWD and R2 5'-GTTC AGCCTCAACGGGCT-3' primers combinations.R1 annealed right before the GC-rich area, and R2 annealed at the very end of FoxE1 region (Fig 2A).The reactions and controls were mixed with the same volumes of loading dye (95% formamide, 0.025% (w/v) Bromophenol blue, 0.025% (w/v) Xylene cyanol, 5 mM EDTA pH 8.0), denatured for 3 min at 70˚C, and separated in 6% denaturing polyacrylamide gel with urea.The sequencing reactions were performed with Thermo Sequenase™ Cycle Sequencing Kit (ThermoFisher Scientific)), using the same forward primer as in primer extension reactions, according to manufacturer's instructions.Afterwards, the gel was electro-transferred onto a Nytran Supercharge blotting membrane (Whatman).The membrane was hybridized with a biotinylated probe, obtained by PCR with Biotin-11 dUTP, using Fox-containing plasmid as a template, the same forward primer as used in primer extension, and the reverse R1 primer that annealed at the position immediately upstream from the cloned FoxE1 region.Hybridization was performed overnight in Church buffer, followed by two 10 min washes in 0.1xSSC, 0.1% SDS, at room temperature, and at 60˚C.Lastly, the detection was carried out by blocking the membrane in Blocking buffer (0.25 M Na 2 HPO 4 , pH 7.2, 5% (w/v) SDS), followed by incubation with streptavidin alkaline phosphatase conjugate dissolved in Blocking buffer, with subsequent steps using Blocking/Washing and Detection buffers from Biotin Chromogenic Detection kit (ThermoFisher Scientific),

Introduction
The Fox gene family is a large and functionally diverse family of forkhead-related transcriptional regulators, which are essential for embryogenesis and adult physiology.The majority of Fox family proteins contain an engrailed homology 1 (Eh1) motif positioned C-terminal to the forkhead-related DNA-binding domain [1].The Eh1 motif is a short eight amino acid sequence, FS(I/V)XXFFX, where F is a branched hydrophobic residue.Eh1 provides a specific binding surface on Fox proteins for recruitment of the Groucho/TLE protein co-repressor complex.The physical interaction of the Eh1 motif with Groucho/TLE co-repressors was shown for a variety of Fox family members, such as FoxA2, FoxG1, Slp2 (FoxG sub-family), FoxD3, and FoxH1 [1,[9][10][11][12][13].The Fox-Groucho/TLE complex can repress transcription of target genes either directly, or through reducing transactivation mediated by co-activators [9].Moreover, the recruitment of Groucho/TLE by FoxA2 generates a compact chromatin structure resulting in repression of target genes [14].
While the Eh1 motif is present in most members of the Fox family proteins, for individual Fox family genes, it was lost during evolution in certain species.In our previous study, we reported the absence of the Eh1 motif in some mammalian FoxE1 proteins [1].FoxE1 is a single-exon gene encoding a transcriptional regulator, required for the development of the thyroid gland, the palate, and for the hair morphogenesis [15].FoxE1 has been shown to be functionally indispensable in mice; FoxE1 null mice exhibit a cleft palate, and either a sublingual or a completely absent thyroid gland [15].Mutations in the human FoxE1 gene lead to Bamforth syndrome, which is characterized by thyroid agenesis, cleft palate, spiky hair, and choanal atresia [16,17].
Replication stalling has been demonstrated at trinucleotide repeats, such as (GAA) n , (CGG) n , and (CTG) n , which expand in trinucleotide repeats diseases.All these repeats have a potential to form unusual DNA conformations (reviewed in [20].For example, GAA repeats can form triplexes, while CTG and CGG repeats mostly form hairpins [19].The formations of these structures in the living cells have been directly demonstrated by multiple methods [31].Formation of quadruplexes has been revealed at stalled replication forks [32].Truncated DNA strands, resulting from replication stalling, can invade the neighboring DNA regions, triggering DNA repair.Alternatively, they can cause the fork reversal [33], resulting in a doublestranded DNA end, which can also initiate recombination.Repair and recombination can result in healing of the DNA damage.However, such healing increases the risk of DNA changes, such as deletions, duplications, or sequence rearrangements [20].Chromosomal fragility was observed at the GGC, AT, and other repeats that can form alternative DNA conformations in yeast [28,30,34].
Sequences that can form non-B structures are hot spots for genomic instability [35][36][37], associated with inherited disorders and cancer [20].They can also play a significant role in evolution [23,38].A study of microsatellite repeats in primate evolution showed increased mutagenicity associated with repeats having non-B DNA structure potential [39].An analysis of human-orangutan divergence showed an elevation in substitution frequencies corresponding to the single-stranded regions of non-B DNA structures [23].
Many sequences, especially those that are G-rich, have a potential to fold into several types of alternative conformations.DNA structure of the same sequence can differ in different cell types, and even within the same cell.For example, H-DNA mostly forms during S-phase, and is celltype specific.It has been shown to occur more frequently in transformed cell lines and differentiating cells, compared to iPSCs.The formation of non-B DNA structures can be affected by cellular processes, such as transcription and replication, since they can create temporarily singlestranded DNA regions that facilitate formation of alternative DNA structures [40].
In this study, we examined the timing of the loss of the Eh1 motif from the C-terminal domain of FoxE1 proteins.We have also investigated a potential molecular mechanism of this mutation and concluded that this loss likely occurred in a common ancestor of mammals due to the formation of a noncanonical DNA structure in the Eh1 region during DNA replication.We hypothesized that this deletion resulted in a loss of a co-repressor recruitment function, and possibly a gain of a novel function in transcriptional regulation by FoxE1.

Loss of the Eh1 motif in FoxE1 proteins of placental mammals
In our previous study, we identified a loss of the Eh1 motif in a subset of FoxE1 proteins of placental mammals [1].Here, we explore whether the Eh1 motif was lost in an ancestor of all mammals or later, in the evolution of placentals.To this end, the FoxE1 proteins of selected monotreme and marsupial mammals that diverged from placentals, were sequenced and examined for the presence of the Eh1 motif.Initial sequence analysis, performed in a previous study [1], identified the Eh1 motif in the FoxE1 protein sequences of the marsupial mammal Monodelphis domestica (opossum), and the monotreme mammal Ornithorhynchus anatinus (platypus).To confirm the presence of the Eh1 motif in FoxE1 proteins of other monotreme and marsupial mammals, FoxE1 of two additional species from both lineages, Tachyglossus aculeatus (echidna, monotreme), and Macropus eugenii (kangaroo, marsupial), were sequenced.The presence of the Eh1 motif in FoxE1 proteins of those marsupial and monotreme species indicated that the loss of the Eh1 motif occurred later, in the lineage of placental mammals (Table 1).
To determine the evolutionary timing of the Eh1 deletion in placental mammals, we performed an alignment of the C-terminal regions of 22 FoxE1 proteins, 12 of which were derived from species from different orders of placental mammals, such as Rodentia, Afrotherians, Xenarthra, Proboscidea, Primates, and Artiodactyla.The FoxE1 sequences were aligned, and the accuracy was scored using the T-coffee algorithm [3] (Figs 1A and S1).The analysis demonstrated the loss of the Eh1 motif in all 12 FoxE1 proteins of placental mammals from our dataset, while confirming its evolutionary conservation in cold-blooded vertebrates and nonplacental mammals.
To trace the evolution of Eh1 motif in FoxE proteins in metazoans, a phylogenetic tree of the FoxE subclass, along with the FoxC and FoxD outgroups, was constructed using a neighbor-joining method (Fig 1B).The topology of the phylogenetic tree indicated a close relatedness of all FoxE1 proteins examined, and a common ancestry of this subgroup.It also suggested that FoxE1 proteins from non-placental and placental mammals originated from a common ancestor, which contained the Eh1 motif.
Based on our phylogenetic analysis, we can estimate the timing of the deletion.Prior to the diversification of the placentals, the Eh1 motif was present in the FoxE1 proteins in non-placental mammals.The species we examined that split after placental divergence early in evolution forming the afrotheria clade, such as Elephantulus rufescens (elephant shrew), Loxodonta africana (African elephant), and Dasypus novemcinctus (nine-banded armadillo) did not include this motif.Thus, we concluded that the deletion already existed in the early evolution of the placentals.
Comparison of the Eh1 deletion sites across the placental mammals revealed that each had the same deletion borders.The resultant sequence lost four out of seven conserved residues of the Eh1 motif, along with 19 adjacent residues from the C-terminal side, while the other three conserved residues of the Eh1 motif remained in all of the analyzed placental mammalian sequences (in human FoxE1, residues 225-227, FGL).The sequence FGL V/M PERP was formed at the junction of the deletion borders (S2 Fig) .Due to incorporation of two proline residues in this sequence, the region was predicted to lose the α-helical structure, a The indicated sequence of FoxD3 binds to the XGrgr4 transcriptional co-repressor [12].b The indicated sequence of a Gsc homeodomain protein binds to the WD40 domain of the TLE1 transcriptional co-repressor [41].
https://doi.org/10.1371/journal.pone.0296176.t001characteristic of the Eh1 motif [41].This structural change, together with the loss of more than half of the conserved residues of the Eh1 motif, was predicted to inactivate Groucho/TLE recruitment activity in these FoxE1 proteins.Interestingly, the newly formed sequence was highly conserved across the analyzed placental mammals, which may indicate a potential functional significance.It is possible that the conserved sequence formed at the deletion junction site may confer a novel transcriptional function to these members of the FoxE subfamily.

Analysis of non-B DNA structures forming in the echidna FoxE1
To infer the mechanism of the Eh1 deletion, the area surrounding the deletion site was analyzed to assess the formation of non-B DNA structures.Non-B DNA structures, such as triplexes, hairpins, G-quartets, and Z-DNA, have been shown to be associated with genomic instability [21,22,24].We examined the structure of the Eh1 deletion region using the FoxE1 sequence of echidna (Tachyglossus aculeatus) (Fig 2A ), which belongs to the order Monotremata that diversified right before placental mammals.The region overlapping and flanking the Eh1 encoding sequence of the echidna FoxE1gene is about 80% G-C rich.A portion of it was predicted to form a stable DNA hairpin by Mfold algorithm [42] (H1 hairpin, Fig 2C) [42].
Additionally, several G quartets were predicted by QGRS mapper [42].However, in the G-rich regions, the formation of imperfect triplexes supported by G-C-G Hoogsteen triads [18] cannot be excluded.
In order to explore the DNA structure at the deletion site, we performed an analysis of the single-stranded regions by chemical modifications [6].Single-stranded DNA regions can be detected at the junctions of non-B structures with B DNA, and sometimes within DNA loops, which are often associated with non-B structures.To analyze the structures this DNA region can form, the deletion area of the echidna FoxE1 was subcloned into pUCneo in two orientations (Figs 2A and 3A), and supercoiled plasmids were incubated at the ionic concentrations, typical for the intracellular environment.Then, the plasmids were exposed to chloroacetaldehyde, which modifies A and C nucleotides in single-stranded regions, but not within the double-stranded helix.Formic acid treatment was performed to break the DNA strands at the chloroacetaldehyde modification positions.The breaks corresponding to the DNA singlestranded areas were detected using a primer extension assay that shows terminations at the template breakpoints.
In this assay, we took extra care to distinguish the bands corresponding to the breaks at chloroacetaldehyde modification sites from the bands caused by Taq pausing at DNA secondary structures that can form prior or during primer extension reaction.We minimized the formation of those structures by linearizing the plasmids, and then heating at 65˚C in low salt conditions with addition of EDTA.We also controlled for the structures that evaded this treatment, or still formed during primer extension, by performing the primer extension reaction on DNA that was not subjected to chloroacetaldehyde, but otherwise underwent the same procedure.Additionally, primer extensions were set with linearized templates that were preheated in TE to destroy pre-existing triplex structures, but have not been pre-incubated with ions or subjected to formic acid or chloroacetaldehyde treatments (conditions were indicated over the gel image in Fig 2B).Comparisons between samples helped understand if the bands resulted from inability of Taq to extend past the breaks or stalling by the structures that formed prior to or during the primer extension reactions.
The analysis indicated the presence of single-stranded areas at the borders of a potential extended hairpin H2 (Fig 2B , black arrows).Neither of the H1 or H2 hairpins predicted to form in this region could account for the observed Eh1 deletion since the endpoints for the deletion resided within the stem region of the hairpins (Fig 2C).However, the stalling point for the primer extension assay, which was not associated with hairpin formation (white arrow, Fig 2B), corresponded to the site of the deletion.Interestingly, the primer extension reaction stalled precisely at the border of the deletion, indicating that this position presented a roadblock for Taq replication (white arrow, Fig 2B).This stalling can be explained by a formation of the quadruplex structure Q1 (Fig 2C) right in front of the Taq polymerase; this quadruplex was predicted to be stable by QGRS mapper [42].Structures resulting in stalled replication could promote cleavage of DNA, followed by a repair process that could result in a deletion [43].The absence of primer extension breakpoints at this position in the sample subjected to chloroacetaldehyde suggested that Q1 structure was not forming without primer extension reaction, indicating that it only forms in a single-stranded DNA, which may exist temporarily during replication, but would not form within a double helix in non-replication conditions, even in the presence of supercoiling [44].Interestingly, the chloroacetaldehyde modification pattern also suggested a quadruplex formation in the other orientation of FoxE1 (Q2, S3 Fig) .However, the area of this quadruplex did not overlap the deletion, nor did it affect primer extension, likely due to disassembly of this quadruplex structure under the assay conditions.

The Eh1 region of the echidna FoxE1 gene causes replication stalling in mammalian cells
To determine whether the potential hairpin region affects replication fork progression in vivo, the replication of pUCneoFox and pUCneoFoxR was examined in COS-1 monkey fibroblasts cells.These plasmids can support a bidirectional replication initiated at the SV40 origin [45], and the predicted FoxE1 hairpin region was positioned in a manner to potentially interfere with the counterclockwise mammalian replication fork (Fig 3A).Replication progression through the FoxE1 region was examined by two-dimensional separation of replication intermediates (Fig 3B -3D).Replication intermediates form a Y arc if the replication origin is located outside or close to the end of a fragment [46].While the Y arc was smooth for the empty vector control plasmid (Fig 3D ), replication of the plasmids containing the FoxE1 insert in either orientation resulted in a Y-arc with a bulge (Fig 3B and 3C shown by an arrow).The bulges were located approximately one-third from the origin end, which corresponded to the position of the FoxE1 insert.The Y-arc bulge represented an accumulation of replication intermediates at the position of FoxE1 sequence, indicative of replication stalling in this area.The presence of the bulge was confirmed by quantitative analysis of the middle area of the arc (Fig 3B-3D bottom).While the two-dimensional analysis does not have sufficient resolution to define the structure that causes replication stalling, it is likely that the same structure is responsible for both the endpoint of primer extension in vitro and the replication fork stalling in cells.

Discussion
This study focused on a functional change in Fox family proteins that occurred during the evolutionary divergence of placental mammals: the loss of the Engrailed Homology 1 motif.Comparative sequence analysis demonstrated that the Eh1 motif was lost from FoxE1 proteins of all placental mammals, with the evolutionary timing of the loss roughly coinciding with the diversification of the placental mammalian lineage.Since the Eh1 motif mediates the recruitment of Groucho/TLE co-repressors [47], the loss of the functional Eh1 motif likely resulted in a loss of the Groucho/TLE-dependent repressive function for these FoxE1 proteins.We also identified a new conserved motif formed in FoxE1 at the junction of the deletion borders, which could potentially confer a new transcriptional function.Consistent with this suggestion of a change in FoxE1 function, FoxE1 is required for the chondrogenesis of the zebrafish pharyngeal skeleton [48], but not for the thyroid development, as has been observed for the FoxE1 of placental mammals [15].
A loss of the Eh1 motif has been observed for other transcriptional factor families.Two homeodomain-containing proteins, Msx2 and Msx3, lost the C-terminal Eh1 motifs following duplication in the vertebrate lineage.In contrast, a vertebrate Msx1 protein and an ancestral form of Msx protein found in amphioxus, contain both N-terminal and C-terminal Eh1 motifs [49].Moreover, both motifs are functional in the Msx protein, and mediate binding to the Groucho-related co-repressor, Grg1 [50].In our previous study, we have also reported a similar loss of the Eh1 motif in FoxE1 genes in the avian lineage [51].We identified two microdeletions that resulted in a loss of the Eh1 motif and another repressive domain, which indicated a significant divergence of the avian FoxE1 proteins from their amphibian and mammalian versions.
The modular structure of transcriptional regulatory proteins provides an efficient mechanism for the modification of protein structure and function during evolution.Gain and loss of cofactor interaction motifs in trans-regulatory domains has been linked to acquisition of novel regulatory functions [52].An example of a loss and a gain of cofactor interaction motifs was observed in an orthologous group of the pair rule transcription factor, Fushi tarazu (Ftz).The F/YPWM motif of the Schistocerca Ftz protein is required for the homeotic function and interaction with the transcriptional cofactor, extradenticle/Pdx.This motif is also present andappears to be necessary for a similar homeotic function in the ancestral form of the Ftz protein in Crustaceans [53].However, the Drosophila Ftz protein has lost this motif and the associated homeotic function, while acquiring a novel embryonic function in regulation of segmentation [54].This newly acquired segmentation function of Drosophila Ftz correlated with acquisition of a novel cofactor interaction motif, LXXLL [55].This short motif has been shown to be involved in the interaction of the Drosophila Ftz protein with the nuclear receptor F1-Ftz during segmentation [56,57].
In this study, we investigated the deletion mechanism that led to the loss of the Eh1 motif in FoxE1 genes.The deleted area is within a highly G-C rich region that can potentially adopt a number of non-B conformations.Analysis of sequences surrounding the deletion site predicted the potential to form perfect or extended imperfect hairpins, as well as quadruplexes.However, it is unclear which specific structures form in the sequence, and especially which structure formation is responsible for the observed microdeletion of the Eh1 motif.The primer extension analysis indicated that the deletion border corresponded precisely with the site of replication stalling, consistent with the formation of a quadruplex structure at this position.
Quadruplexes (G quartets or G4 DNA structures) are stabilized by Hoogsteen bonds between the tetrads of G nucleotides.Additionally, the adjacent tetrads are also involved in ππ interactions, so called stacking [58].Formation of quadruplexes requires the presence of four clusters of at least two adjacent Gs that form tetrads, separated by not more than 7 nucleotides that form the loops [59].The stability of quadruplexes depends on many parameters, including the composition and length of the loops and the composition of the adjacent regions [60].Quadruplexes were shown to interfere with DNA replication, in both in vitro and in vivo assays [61,62]; however, there are natural mechanisms in the cell that unwind quadruplex structures to allow the replication to proceed [63].The observed deletion is consistent with a proposed mechanism of Dna2 endonuclease cleavage in a single-stranded loop of the quadruplex [64], followed by excision of the structure by EXO1 exonuclease [64].Interestingly, the hairpin structure that formed independently of replication (detected by chloroacetaldehyde modification of non-replicating plasmid DNA), did not correspond to the deletion borders, eliminating this structure as the underlying cause of the deletion.This observation is consistent with previous studies showing that not all predicted quadruplexes would readily form in the supercoiled DNA, but require a single strand formation as a result of transcription or replication [44].
Our in vivo studies in mammalian cells confirmed the obstructed replication within the same region of the echidna FoxE1.However, the two-dimensional analysis of replication intermediates does not provide a single-nucleotide resolution needed to determine the exact position of replication stalling.We and others have previously reported replication stalling at a triplex-forming sequence, GAA repeat [33,65,66], similar to the one we observed here [66].We have also previously shown that d(G)n�d(C)n repeats block DNA replication progression in yeast [67], while this effect was further enhanced by stalled transcription caused by R-loops, stable complexes between DNA and RNA [68].
The sequence that the Taq polymerase stalled at is a perfect candidate for a quadruplex, however, it cannot be excluded that the same sequence can also adopt an imperfect triplex, based on its high G-C content.Recent studies demonstrated a major role of triplex structures in genome instability [40,69].Using the method of S1-END-seq, thousands of single-stranded regions were detected at sequences predicted to form triplexes [40].The formation of these structures was more pronounced in cycling cells and correlated with replication.The triplex structures were also shown to be hotspots of genomic instability in several studies.Intriguingly, the formation of triplex structures was shown to increase during differentiation of iPSCs, which may indicate that conditions existing during specific developmental windows can favor triplex formation and associated genomic instability [40].However, the formation of such structures may occur with low probability, and further selection would determine whether the triplex-associated mutations or deletions lead to development of beneficial properties.
A similar example of a loss of a DNA region affecting evolution has been demonstrated for the Stickleback fish that adapted to freshwater environment.The adaptation was associated with a deletion of an enhancer located inside an area prone to the formation of an alternative DNA structure, presumably Z-DNA forming within a long TG-repeat.The deletion area was also shown to cause replication-dependent fragility, which can explain the elevated mutation rate that was likely responsible for this evolutionary change [70].
Our results support the conclusion that in replication or non-replication conditions, different structures can form in the FoxE1 deletion region.In the context of non-replicating doublestranded supercoiled DNA, the deleted area displays a chloroacetaldehyde modification pattern likely corresponding to a hairpin formation, while in replication conditions, the position of replication stalling was consistent with the formation of a quadruplex.The quadruplex hypothesis is more consistent with the FoxE1 deletion borders.These insights into the cause and consequences of the loss of Eh1 motif in Fox family may promote a better understanding of the genomic drivers of the evolution of placental mammals.

Fig 1 .
Fig 1.The C-terminal domain of FoxE1 of placental mammals contains a microdeletion of the Eh1 motif and adjacent region, which resulted in the loss of the Eh1 motif.(A) A multiple sequence alignment of the C-terminal domain of the FoxE1 proteins of vertebrates.(B) A phylogenetic tree for proteins of the FoxE subclass, and the FoxC and FoxD outgroups.A neighbor-joining method was used to construct the tree topology and bootstrapping values are shown at each branch point (percentage of 1000 bootstrap samples) using MEGA 3.1.The distance scale below the tree represents the number of substitutions per site.The C and D Fox outgroups have been collapsed for better illustration.Protein sequences that lack an Eh1 motif are represented by blue triangles.Fox proteins and subclasses that contain an Eh1 motif are represented by red circles.https://doi.org/10.1371/journal.pone.0296176.g001

Fig 2 .
Fig 2. Analysis of secondary structures of the echidna FoxE1 G-C rich region in PucNeoFoxRev plasmid with and without replication.A. Schematic of the PucNeoFoxRev area used in primer extension experiments.The position of FWD primer used in primer extension and R1 and R2 reverse primes used for the length controls are shown.B. The denaturing polyacrylamide gel electrophoresis of the primer extension reactions and control to analyze the structure of the deleted area of FoxE1.The first three lanes correspond to the primer extension reactions of the PucNeoFoxRev subjected to cellular ionic concentrations, then chloroacetaldehyde (CAA), linearized by ScaI digest, treated with formic acid (FA), and heated in TE to destroy the preexisting structures.Chemically modified DNA is in the second lane, while the first and the third are controls with omitted CAA or both CAA and FA.The regions that were modified are shown with black arrows on the left; the area of Taq polymerase stalling is shown with the white arrow.The following two lanes contain the length controls corresponding to the distances from the FWD primer to the start of the hairpin areas, or to the end of the FoxE1 region, obtained by PCR with FWD and R1 or R2 primers.The last four lanes contain the Sanger sequencing reactions of PucNeoFoxRev plasmids performed with Thermo sequenase Cycle Sequencing Kit, based on Taq polymerase extension from FWD primer.The A, G, T, C lanes correspond to ddA, ddG, ddT, and ddA termination reactions of the synthesized strand; complementary nucleotides of the template strand are marked next to the termination bands.The following areas are shown in brackets next to the sequence: Δ, deleted area of FoxE1; H1, perfect hairpin; H2, extended imperfect hairpin; Q1, quadruplex area.The direction of Taq synthesis is from bottom to top (shown by a gray circle).C. The structures of hairpins H1 and H2, and quadruplex Q1 that can form in FoxE1 deleted area.https://doi.org/10.1371/journal.pone.0296176.g002

Fig 3 .
Fig 3.The echidna FoxE1 hairpin-forming region causes replication stalling in COS-1 cells.(A) A map of PucNeoFox and PucNeoFoxRev plasmids.Hybridization probe and sites of AflIII digest used in two-dimensional electrophoresis assay are shown.(B-D) Two-dimensional agarose gel electrophoresis of replication intermediates of PucNeoFox (B), PucNeoFoxRev (C), and control (D) plasmids, digested with AflIII.The bulges on Y-arcs, corresponding to the replication stalling at the echidna FoxE1 region are shown by arrows.Quantitative analyses of the middle areas of the Y-arcs containing the bulges are shown below.(E) A schematic representation of the two-dimensional gel electrophoresis.The top part shows the shapes of replication intermediates upon AflIII digest, ranging from 1X to 2X size; the bottom part represents their separation in two dimensions, resulting in the Y-arc.1X spot consists of the molecules that are not currently replicating.A small bubble-shape intermediate is not visible on the 2D gel since it is covered by the 1X spot.https://doi.org/10.1371/journal.pone.0296176.g003

S1Fig.
Multiple sequence alignment of the C-terminal segment of FoxE1 proteins from 22 vertebrate species.The color represents the scoring of the alignment.Multiple sequence alignments were constructed using T-COFFEE, version 7.7.1 [3].(TIFF) S2 Fig.A consensus of an evolutionary novel sequence formed as a result of the microdeletion of the Eh1 motif and adjacent region in the Fox proteins of placental mammals.The diagram was generated with the WebLogo program [71].(TIFF) S3 Fig. Analysis of secondary structures of the echidna FoxE1 G-C rich region in PucNeo-Fox plasmid with and without replication.A. Schematic of the PucNeoFox area used in primer extension experiments.The position of FWD primer used in primer extension is shown.B. The denaturing polyacrylamide gel electrophoresis of the primer extension reactions and control to analyze the structure of the deleted area of FoxE1.The first three lanes correspond to the primer extension reactions of the PucNeoFox subjected to cellular ionic concentrations, then chloroacetaldehyde (CAA), linearized by ScaI digest, treated with formic acid (FA), and heated in TE to destroy the preexisting structures.Chemically modified DNA is in the second lane, while the first and the third are controls with omitted CAA or both CAA and FA.The regions that were modified are shown with black arrows on the left; the area of Taq polymerase stalling is at the structure forming at the border of FoxE1 and plasmid regions, hence not indicated by the arrow.The last four lanes contain the Sanger sequencing reactions of PucNeoFox plasmids performed with Thermo sequenase cycling kit, based on Taq polymerase extension from FWD primer.The A, G, T, C lanes correspond to ddA, ddG, ddT, and ddA termination reactions of the synthesized strand; complementary nucleotides of the template strand are marked next to the termination bands.The following areas are shown in brackets next to the sequence: Δ, deleted area of FoxE1; Q2, quadruplex area.The direction of Taq synthesis is from bottom to top (shown by a gray circle).C. The structures of quadruplex Q2 that can form adjacent to the FoxE1 deletion region.(TIFF) S1 Raw image.(TIFF)