Evolution of the Auxin Response Factors from charophyte ancestors

Auxin is a major developmental regulator in plants, and the acquisition of a transcriptional response to auxin likely contributed to developmental innovations at the time of the water-to-land transition. Auxin Response Factors (ARFs) are the transcription factors (TFs) that mediate auxin-dependent transcriptional changes, and in land plants they are divided into three evolutionary classes: A, B and C. The origin and nature of the first ARF proteins in algae are still debated. Here, we identify CaARF, the most ‘ancient’ ARF homologue to date, in the early divergent charophyte alga Chlorokybus atmophyticus. Structural modelling combined with biochemical studies showed that CaARF already shares many features with modern ARFs. It can oligomerize, it interacts with the TOPLESS (TPL) co-repressor, and it specifically binds Auxin Response Elements (AuxREs) as a dimer. In addition, CaARF has a DNA-binding specificity that differs from that of class A and B ARFs and that was maintained in class C ARFs throughout plant evolution. Phylogenetic evidence, together with the biochemical properties of CaARF, indicates that the different classes of ARFs likely arose from an ancestral proto-ARF with class C-like features. Auxin signalling would thus have emerged from a pre-existing, hormone-independent form of transcriptional regulation, combined with the appearance of a functional hormone perception complex.

ARF class) as well as the TPL co-repressor [37,38]. Recent studies showed that charophytes have two ARF subfamilies, class C and class A/B [20]. However, they lack functional TIR1/AFB and Aux/IAA proteins, which suggests that a fully functional NAP did not exist before land plants [1,20,37,39-41]. How these ancestral components evolved to form the land plant NAP remains an open question. Through the structural, biochemical and phylogenetic characterisation of a proto-ARF from an early divergent charophyte, we propose a scenario in which the co-option of ancestral mechanisms of transcriptional control may have led to the evolution of hormone signalling pathways in plants.

Identification of proto-ARF and proto-RAV in early divergent charophytes
To understand the evolution of ARFs, we first characterized the biochemical properties of proto-ARFs and of the closely related proto-RAVs from early divergent charophytes. We searched for B3 homologues in charophyte transcript databases (OneKp and Marchantia.info) [11,44]. We then classified them as B3 RAV or B3 ARF according to the signature residues of their predicted DNA-binding domains (DBDs) (S1 Table) [45]. B3 RAV domains were frequently associated with an APETALA2 (AP2) domain and/or a PB1 domain in the basal charophyte Mesostigma viride and in all later clades (Fig 1; S2 Table). M. viride also has an ARF homologue (GBSK01006108.1) that lacks a PB1 domain [1]. Its DBD was reliably modelled as an ARF DBD (100% confidence with AtARF1 [46,47]). However, it lacks most of the residues involved in the interaction with AuxREs (S3 Fig), so it does not qualify as a functional ARF. C. atmophyticus is the earliest diverging charophyte alga with a proto-ARF that carries predicted functional B3 ARF and PB1 domains. ARF homologues were also present in all later diverging clades (Fig 1; S3 Fig).

Fig 1 (partial caption). … [9,42]. Although some publications placed Mesostigmatophyceae and Chlorokybophyceae in a single clade, large differences in sequences and morphological traits argue for two different clades [2,6,42,43]. The presence of class C, A/B, A or B ARFs is indicated by blue, orange, red or green circles, respectively, and yellow circles correspond to RAV proteins. The identities of the proteins shown in the figure are listed in S2 Table. https://doi.org/10.1371/journal.pgen.1008400.g001

DNA binding specificities and oligomerization potential of proto-RAV and proto-ARF
We determined the properties of "ancestral" RAV and ARF proteins, focusing on two examples. The first is the Klebsormidium nitens proto-RAV KnRAV (kfl00094_0070), which contains predicted AP2, B3 RAV and PB1 domains. The second is the C. atmophyticus proto-ARF CaARF (AZZW-2021616). The predicted B3 domains of KnRAV and CaARF display the signature residues typical of B3 RAV and B3 ARF domains, respectively (S1 Table; S3 and S4 Figs). This suggests that their divergent DNA-binding specificities were already established in charophytes. To test this hypothesis, we characterized the binding of their DBDs to the canonical DNA binding sites identified in angiosperms for ABI3, RAV and ARF TFs. KnRAV specifically bound the bipartite AP2/B3 RAV element described for Arabidopsis thaliana RAV TFs (Fig 2A) [48]. CaARF interacted strongly with double AuxRE sites, either direct repeats (DR) or everted repeats (ER) (Fig 2B), but not with a single AuxRE site. This suggests that the CaARF DBD binds DR and ER motifs as a dimer, without the help of the Middle Region (MR) or the PB1 domain. Altogether, these results indicate that the DNA-binding preferences of RAVs and ARFs were established in basal charophytes and maintained throughout evolution.
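The difference between single AuxRE sites and double DR or ER sites can be illustrated with a small motif scanner. The sketch below finds pairs of AuxRE half-sites and reports how they are arranged and how far apart they are. It is only an illustration and not the probe design used in this study. It rests on three assumptions:
- the half-site is the canonical TGTCTC;
- spacing is the number of base pairs between the two 6-bp half-sites;
- the arrangements follow the usual convention. An ER places the reverse-complemented half-site (GAGACA) upstream of a forward half-site, and a DR has both half-sites on the same strand. An inverted repeat (IR) is the mirror arrangement of an ER.

The helper names and the poly-A filler are hypothetical.

```python
# Sketch: classify pairs of canonical AuxRE half-sites as DR, ER or IR and
# report their spacing. Half-site, spacing definition and orientation
# conventions are assumptions, not taken from the study's probe list.

AUXRE = "TGTCTC"                          # canonical AuxRE half-site (assumed)
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


AUXRE_RC = revcomp(AUXRE)                 # "GAGACA", the half-site on the other strand


def build_repeat(kind: str, spacing: int, filler: str = "A") -> str:
    """Hypothetical toy probe: two half-sites separated by `spacing` filler bases."""
    left, right = {"DR": (AUXRE, AUXRE),
                   "ER": (AUXRE_RC, AUXRE),
                   "IR": (AUXRE, AUXRE_RC)}[kind]
    return left + filler * spacing + right


def find_half_sites(seq: str) -> list[tuple[int, str]]:
    """Start position and strand ('+' or '-') of every 6-bp half-site."""
    sites = []
    for i in range(len(seq) - len(AUXRE) + 1):
        window = seq[i:i + len(AUXRE)]
        if window == AUXRE:
            sites.append((i, "+"))
        elif window == AUXRE_RC:
            sites.append((i, "-"))
    return sites


def classify_pairs(seq: str, max_spacing: int = 20) -> list[tuple[str, int]]:
    """Return (arrangement, spacing) for each pair of non-overlapping half-sites."""
    sites = find_half_sites(seq)
    pairs = []
    for a, (i, strand_i) in enumerate(sites):
        for j, strand_j in sites[a + 1:]:
            spacing = j - (i + len(AUXRE))
            if not 0 <= spacing <= max_spacing:
                continue
            if strand_i == strand_j:
                kind = "DR"               # both half-sites on the same strand
            elif strand_i == "-":
                kind = "ER"               # GAGACA ... TGTCTC: half-sites point outwards
            else:
                kind = "IR"               # TGTCTC ... GAGACA: half-sites point inwards
            pairs.append((kind, spacing))
    return pairs
```

For example, `classify_pairs(build_repeat("ER", 7))` returns `[("ER", 7)]`. A sequence that contains a single half-site yields no pair, which mirrors the distinction between double and single AuxRE probes.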
Next, we studied the oligomerization capacity of their PB1 domains. Using the AtARF5 PB1 structure as a template [23,47], we modelled the PB1 domains of KnRAV and CaARF as type I/II PB1 domains with electrostatic oligomerization potential (Fig 2C and 2D). We then determined the molecular weights of KnRAV-PB1 and CaARF-PB1 by size exclusion chromatography coupled to multi-angle laser light scattering (SEC-MALLS). These measurements confirmed that both domains form oligomeric complexes (Fig 2E and 2F), although with a lower oligomerization potential than AtARF5-PB1 (S3 Table). Charophycean algae therefore appear to possess proto-RAV and proto-ARF proteins with oligomerization potential and diverging DNA-binding specificities (Fig 2A and 2B; S3 and S4 Figs).
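SEC-MALLS gives an absolute molar mass for each eluting species. An apparent oligomeric state can therefore be estimated as n = M_MALLS / M_monomer. The sketch below shows only that arithmetic and does not reproduce any value from S3 Table. It assumes average residue masses and a monomer sequence that matches the TEV-cleaved construct. The helper names are hypothetical.

```python
# Sketch: apparent oligomeric state from a SEC-MALLS molar mass.
# Average residue masses in Da (monoisotopic values would differ slightly).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the free N- and C-termini


def monomer_mass(sequence: str) -> float:
    """Average molar mass (Da) of a polypeptide from its one-letter sequence."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


def apparent_oligomer(measured_mass_da: float, sequence: str) -> float:
    """Ratio of the MALLS molar mass to the theoretical monomer mass."""
    return measured_mass_da / monomer_mass(sequence)


def dilution_series(masses_by_conc: dict[float, float],
                    sequence: str) -> list[tuple[float, float]]:
    """(loading concentration, apparent oligomeric state), most concentrated first."""
    return [(conc, apparent_oligomer(mass, sequence))
            for conc, mass in sorted(masses_by_conc.items(), reverse=True)]
```

Comparing n across the 5 to 0.625 mg/mL dilution series (Fig 2E and 2F) is one way to check whether the oligomers dissociate on dilution.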

Evolution of ARF DNA binding specificity from early divergent charophytes to land plants
To further characterize the biophysical properties of proto-ARFs, we modelled the structure of the CaARF DBD. It was reliably modelled (99% confidence; Phyre2) on the AtARF1 and AtARF5 DBDs [46,47], except for an additional disordered region within the dimerization domain (DD) of CaARF (Fig 3A). Similar disordered regions are a characteristic feature of all class C ARFs (Fig 3B and 3C; S3 and S7 Figs). Consistent with this, our phylogenetic analyses place CaARF within clade C (S5 Fig). Such insertions are expected to modify the DNA binding of class C ARFs compared with A and B ARFs. We tested this hypothesis using ER motifs with different spacings (ER4-ER9). Arabidopsis AtARF2 (class B) and AtARF5 (class A) largely prefer ER7/8 motifs (Fig 3D and 3E). In contrast, CaARF bound promiscuously to ER4-ER9 but did not interact with a single AuxRE motif (Fig 3D-3F), which confirms that it binds ER motifs as a dimer. The Arabidopsis class C ARF AtARF10 behaves similarly to CaARF (Fig 3G). CaARF therefore has a relaxed DNA specificity that allows it to bind ER sites with various distances between the two monomer binding sites, and this specificity was maintained in class C ARFs throughout plant evolution. The class C-specific disordered region in the DBD (Fig 3A-3C; S3 and S7 Figs) may contribute to this relaxed specificity, although this remains to be tested.

Fig 2 (partial caption). … Table. Proteins added at 0 and 0.5 μM. (C-D) Structure models of KnRAV-PB1 (C, green) and CaARF-PB1 (D, blue) superposed on the AtARF5-PB1 structure (cyan) (PDB code 4CHK [23]). The conserved residues shown are numbered according to the AtARF5-PB1 structure. Positive and negative signs indicate potential interaction surfaces for oligomerization. (E) SEC-MALLS molecular weight determination of KnRAV-PB1 at four protein concentrations (from dark to light green: 5 mg/mL, 2.5 mg/mL, 1.25 mg/mL and 0.625 mg/mL). (F) SEC-MALLS molecular weight determination of CaARF-PB1 (from dark to light blue: 5 mg/mL, 2.5 mg/mL, 1.25 mg/mL and 0.625 mg/mL).

Interaction with co-repressors in early divergent charophytes
As mentioned above, some land plant ARFs can interact with the TPL co-repressor, either directly or indirectly [35,36,50]. We wondered when in evolution this interaction was first established. In repressor ARFs, direct TPL recruitment usually involves two regions of the Middle Region (MR) [35,36]:
- the EAR motif (ERF-associated Amphiphilic Repression motif), with the LxLxL sequence or its LxLxPP variant;
- the BRD domain (B3 Repression Domain), with the K/RLFG sequence. The BRD domain is also found in RAV proteins.

The CaARF MR contains two potential repression regions, an EAR-like motif (LPLLPS, similar to LxLxPP) and a BRD domain (KLFG). The EAR-interacting region of TPL (the TPL N-terminal domain, TPL-N) is extremely conserved between charophytes and land plants [49,51] (Fig 3H; S8 Fig; S4 Table). We therefore used A. thaliana TPL-N (AtTPL202) to assay the TPL/CaARF interaction. CaARF interacted with AtTPL202 in co-purification assays. This interaction was lost with AtTPL202-F74A, which carries a mutation in the hydrophobic groove that binds EAR peptides (Fig 3I) [49]. Mutations in the CaARF KLFG motif (CaARF-L523S/F524S) or LPLLPS motif (CaARF-L492A/L493A) also weakened the interaction with AtTPL202, indicating that both sites might participate in TPL-N recruitment. This binding mode differs from that of A. thaliana RAV1, which interacts with the C-terminal part of TPL [52]. This suggests that the two protein families recruit TPL through different mechanisms. Proto-ARFs from other charophyte clades contain similar TPL-recruitment sequences (S5 Table), so they might also interact with TPL.
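The consensus sequences above lend themselves to a simple pattern search, of the kind used to list candidate TPL-recruitment sites in charophyte MRs (S5 Table). The sketch below is hypothetical, including the function name and the toy sequences. It uses only the strict LxLxL, LxLxPP and K/RLFG consensus sequences. It would therefore not report CaARF's LPLLPS, which is why that sequence is described as EAR-like. Variants of this kind need a relaxed pattern or manual inspection.

```python
import re

# Strict consensus patterns from the text; lookaheads allow overlapping hits.
MOTIFS = {
    "EAR (LxLxL)": re.compile(r"(?=(L.L.L))"),
    "EAR (LxLxPP)": re.compile(r"(?=(L.L.PP))"),
    "BRD (K/RLFG)": re.compile(r"(?=([KR]LFG))"),
}


def scan_repression_motifs(middle_region: str, offset: int = 1) -> list[tuple[str, int, str]]:
    """Return (motif, position, matched peptide) for each consensus hit.

    `offset` is the residue number of the first MR residue in the full
    protein, so reported positions can be read in full-length numbering.
    """
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(middle_region.upper()):
            hits.append((name, match.start() + offset, match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])
```

For example, the toy MR `"AAKLFGAALSLALAA"` gives a BRD hit (KLFG) at position 3 and an EAR hit (LSLAL) at position 9, whereas `"LPLLPS"` gives no hit.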

Discussion
The present biochemical characterization of CaARF, a proto-ARF from an "early divergent" charophyte, identifies this protein as a class C ARF, in agreement with our phylogenetic analyses (S5 Fig). Mutte et al. (2018) proposed that "late divergent" charophytes have two ARF classes, C and A/B [20]. In their scenario, both classes derive from a common ancestor that diverged in an ancient charophyte clade. Two further observations suggest a second, more parsimonious scenario. First, class C ARFs are present in "early divergent" charophytes (Klebsormidiophyceae [1] and Chlorokybophyceae, this work). Second, the "late divergent" C. orbicularis has both class C and class A/B ARFs. In this scenario, class A/B ARFs derive from an ancestral class C or class C-like proto-ARF that existed before the emergence of "late divergent" charophytes (S6 Fig). This hypothesis requires only a few losses of class C ARF genes, in some Klebsormidiophyceae, Coleochaetophyceae and Zygnematophyceae species. All these scenarios should nevertheless be treated with caution. They are based on transcriptomic datasets and could be challenged once genomic sequences become available.
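The parsimony argument amounts to counting gene losses on a species tree, assuming that a gene family was gained once and was only lost afterwards (a Dollo-like model). The sketch below counts the minimum number of losses implied by a presence/absence pattern, assuming the gene was present in the ancestor of the whole tree. It uses a placeholder tree and placeholder taxa rather than the charophyte data in S6 Fig. The function names are hypothetical, and the approach is a simplification of the scenarios discussed, not the method used in this study.

```python
# Sketch: minimum number of gene losses under a single-gain (Dollo-like) model.
# A tree is a leaf name (str) or a tuple of subtrees; taxa are placeholders.
from typing import Union

Tree = Union[str, tuple]


def leaves(tree: Tree) -> set[str]:
    """All leaf names below (and including) `tree`."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def min_losses(tree: Tree, present: set[str]) -> int:
    """Minimum losses within `tree` if its ancestor carried the gene.

    A subtree with no carrier costs one loss on the branch leading to it;
    otherwise the cost is the sum over its children.
    """
    if not leaves(tree) & present:
        return 1
    if isinstance(tree, str):
        return 0
    return sum(min_losses(child, present) for child in tree)
```

Running the same presence pattern under different assumed positions of the ancestral gene gives a rough, loss-only comparison between scenarios.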
When comparing clades C and A/B, we found a disordered region within the predicted DD of both ancestral and land plant clade C ARFs. This region is absent from clade A/B and from land plant clades A and B. We speculate that this disordered sequence was lost during the duplication event that gave rise to clade A/B from clade C. The DNA interaction experiments presented here suggest that this loss might have contributed to the more restricted specificity of class A and B ARFs for ER motifs.
CaARF and AtARF10 behave similarly when binding DNA, and ancestral clade C ARFs also already had PB1 oligomerization potential and could interact with the TPL co-repressor. The conservation of these properties throughout evolution is consistent with experiments in Marchantia, where class C AtARF10 partially complemented the MpARF3 loss-of-function mutant [40]. These biochemical data also shed light on several aspects of the evolution of the NAP in plants. First, proto-ARFs can interact with AuxREs. This supports the idea that the NAP co-opted sets of genes already regulated by ARFs in charophytes, as suggested in other studies [1,20,39,40]. In this context, the emergence of the A/B clade, with its different DNA-binding behaviour, could have allowed ARFs to target a more specific set of genes. Second, the interaction of a proto-ARF with TPL provides functional evidence that class C ARFs act as transcriptional repressors. Putative TPL interaction motifs are also present in proto-RAVs and in most proto-ARFs across charophytes, including class A/B ARFs. The capacity to recruit TPL co-repressors could thus be an ancestral property of RAV and ARF TFs.
From these observations, we propose that ARFs recruiting co-repressor complexes to AuxRE promoter elements is a primitive, conserved mechanism that predates the NAP. Because charophytes lack a functional TIR1/AFB-Aux/IAA co-receptor [1,20,41], this primitive system was likely auxin-independent. This is consistent with a series of experiments in Marchantia, which showed that auxin-responsive genes respond similarly at the transcriptional level in wild-type plants and MpARF3 mutants [20,40]. ARF DNA-binding specificity diversified, and in parallel the auxin perception complex emerged in the first land plants. Through the evolution of ARF-Aux/IAA-TIR1/AFB interactions, this complex turned ARF-regulated genes into auxin-responsive genes (Fig 4).
Our work thus supports a scenario in which a hormone signalling pathway arose from two changes: the evolution of the binding specificity of an ancestral TF and the emergence of a functional hormone perception complex. This scenario helps explain how hormone signalling pathways can evolve from pre-existing mechanisms of transcriptional regulation that are independent of any hormone.

Protein homologues search and classification
Potential homologues of the NAP components were identified by sequence homology to the corresponding NAP proteins of M. polymorpha. BLAST searches were run against three databases: OneKp, PlantTFDB and Marchantia.info. Because proteomic data are lacking for charophytes, we used tblastn, which searches translated nucleotide databases with protein queries. Each transcript hit was then translated with the Expasy Translate tool. The resulting sequences were classified using protein sequence alignments and phylogenetic analyses. Protein sequence alignments were performed with the MultAlin (http://multalin.toulouse.inra.fr/multalin/) and ESPript (http://espript.ibcp.fr/ESPript/ESPript/) online tools. Phylogenetic analyses used the predicted DBDs of charophyte proto-ARFs together with the DBDs of A. thaliana and M. polymorpha ARFs. Phylogenies were inferred with MEGA and Phylogeny.fr using the maximum likelihood method.
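tblastn hits are nucleotide transcripts, so each hit must be translated before its domains can be analysed. The sketch below stands in for the Expasy Translate step but does not reimplement that tool. It produces six-frame translations with the standard genetic code (NCBI table 1, an assumption for these nuclear transcripts). It then picks the longest stop-free stretch as a crude guess at the coding frame. The function names are hypothetical.

```python
from itertools import product

# NCBI standard genetic code (table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate one reading frame; '*' marks stops, 'X' ambiguous codons."""
    dna = dna.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame(dna: str) -> dict[str, str]:
    """Translations of the three forward and three reverse-complement frames."""
    dna = dna.upper().replace("U", "T")
    rc = dna.translate(_COMPLEMENT)[::-1]
    frames = {f"+{k + 1}": translate(dna[k:]) for k in range(3)}
    frames.update({f"-{k + 1}": translate(rc[k:]) for k in range(3)})
    return frames


def longest_open_stretch(protein: str) -> str:
    """Longest stop-free segment, a crude stand-in for choosing the coding frame."""
    return max(protein.split("*"), key=len)
```

For instance, `six_frame("ATGGCCTAA")` gives `"MA*"` in frame +1 and `"LGH"` in frame -1.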

3D structure modelling
Protein structures were modelled with the Phyre2 online tool [47]. Three-dimensional structures were visualized with PyMOL (www.pymol.org).

Protein expression and purification
All proteins were expressed in the Escherichia coli BL21 strain. Bacterial cultures were grown with the appropriate antibiotics at 37˚C to an OD600 of 0.6-0.9. Protein expression was induced overnight at 18˚C with isopropyl β-D-1-thiogalactopyranoside (IPTG) at a final concentration of 400 μM. Cultures were centrifuged, and the pellets were resuspended and sonicated in the buffers listed in S7 Table. After centrifugation, the soluble fractions of KnRAV, KnRAV-DBD, CaARF, CaARF-DBD and the CaARF mutants were loaded onto a Dextrin-Sepharose (GE Healthcare) column previously equilibrated in buffer A (S6 Table). After the column was washed, proteins were eluted in buffer A containing 10 mM maltose (S6 Table).
The PB1 domains of KnRAV and CaARF, as well as full-length AtARF2, AtARF5 and AtARF10, were purified on Nickel-Sepharose (GE Healthcare) columns previously equilibrated in the appropriate buffers (S6 Table). After protein binding, columns were washed with 30 mM imidazole to remove non-specifically bound proteins. Proteins were eluted in the corresponding buffer containing 300 mM imidazole (S6 Table). For SEC-MALLS experiments, the His-tags of the PB1 domains were cleaved with TEV protease (5% w/w) overnight at 4˚C, followed by incubation at 20˚C for 2 h.
AtTPL202 and its mutants were purified as described in Martin-Arevalillo et al., 2017 [49]. After purification, all proteins were dialyzed for 15 h at 4˚C in their purification buffers, frozen in liquid nitrogen and stored at -80˚C until use.

EMSA DNA binding tests
Synthetic DNA probes (Eurofins) were designed from the DNA binding site of each TF (S8 Table). The sense-strand oligonucleotides carried an extra 5' overhanging G that allows the DNA to be labelled (S8 Table). The oligonucleotides were annealed and the probes Cy5-labelled as described in Stigliani et al. (2019) [19]. Electrophoretic Mobility Shift Assays (EMSAs) were performed on native 2% agarose gels prepared with 0.5X TBE buffer. Gels were pre-run in 0.5X TBE at 90 V for 90 min at 4˚C. Each protein-DNA mix contained the following in the interaction buffer:
- salmon and herring sperm competitor DNA (final concentration 0.07 mg/mL);
- labelled DNA (final concentration 20 nM).

The interaction buffer was 20 mM HEPES pH 7.8, 50 mM KCl, 100 mM Tris-HCl pH 8.0, 2.5% glycerol and 1 mM DTT. Mixes were incubated in the dark for 30 min at 4˚C and then loaded onto the gels. Gels were run for 1 h at 90 V at 4˚C in 0.5X TBE. DNA-protein interactions were visualized with the Cy5 filter of a Bio-Rad ChemiDoc MP Imaging System.
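The probe layout described above can be sketched as follows. The sense oligo carries one extra 5' G, and the antisense oligo is the reverse complement of the rest, so the annealed duplex keeps a one-nucleotide 5' overhang. We assume that this overhang is labelled by a polymerase fill-in with a Cy5-labelled nucleotide opposite the G. This assumption is not taken from the protocol, which is described in Stigliani et al. (2019) [19]. The flanking sequences and function names are hypothetical and are not the S8 Table probes.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an ACGT sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_probe(site: str, flank5: str = "ACTGA", flank3: str = "TCAGT") -> tuple[str, str]:
    """Return (sense, antisense) oligos for a labelled EMSA probe.

    The flanks are hypothetical placeholders. The sense oligo gets one extra
    5' G; the antisense oligo pairs with everything except that G.
    """
    core = (flank5 + site + flank3).upper()
    return "G" + core, revcomp(core)


def overhang(sense: str, antisense: str) -> str:
    """Single-stranded 5' overhang of `sense` left after annealing."""
    paired = revcomp(antisense)
    if not sense.endswith(paired):
        raise ValueError("oligos are not complementary")
    return sense[: len(sense) - len(paired)]
```

With a single AuxRE core, `design_probe("TGTCTC")` gives the sense oligo `"GACTGATGTCTCTCAGT"`, and `overhang` on the resulting pair returns `"G"`.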

Co-purification protein-protein interaction tests
For protein-protein interaction analyses, complexes were first formed by mixing the potential partners for 1 h at 4˚C. MBP-tagged CaARF (wild type or mutants, 90 μg) was mixed with His-tagged AtTPL202 (wild type or mutants, 70 μg) in 20 mM CAPS pH 9.6, 100 mM Tris-HCl pH 8, 50 mM NaCl and 1 mM TCEP. The complexes were then immobilized through the MBP tag on Dextrin-Sepharose columns. These columns had been equilibrated in 20 mM CAPS pH 9.6, 100 mM Tris-HCl pH 8, 50 mM NaCl and 0.1 mM TCEP. The complexes were incubated with the Dextrin-Sepharose for 30 min at 4˚C, and nonspecific interactions were then removed by washing with the same buffer. Protein complexes were eluted with 200 μl of the same buffer containing 10 mM maltose. MBP alone was used as a control for nonspecific interactions. The eluted fractions were analysed by 12% SDS-PAGE.

Fig. Different evolutionary hypotheses. A. Class C and A/B ARFs originated from a common ancestor that had already diverged in an early charophyte. They then evolved independently in later clades, with subsequent losses in different clades or species. B. Class C ARF homologues are present in the first clades of the charophyte evolutionary line. This suggests that this subfamily, or a closely related one (C-like), is the common ancestor of current charophycean A/B and C ARFs. In both scenarios, the duplication of A/B into A and B happened in land plants. C. Phylogenetic tree generated by maximum likelihood (Phylogeny.fr [54,55]). The tree supports the charophyte C clade as the ancestor of the charophyte and land plant ARF subfamilies.

Table. TPL homologues found in charophyte organisms. Accession numbers for transcripts or proteins and the databases used for each search are indicated. Amino acid sequences were obtained by translating transcripts, except for the K. nitens RAV protein, which was obtained from PlantTFDB. Predicted domains are indicated with a tick.

S5 Table. Potential EAR motifs in charophyte ARFs. Accession numbers for transcripts or proteins and the databases used for each search are indicated. Potential EAR motifs were searched for in the Middle Region (MR) of each protein, defined as the sequence between the DBD and the PB1 domain. Candidate EAR motifs were identified as potential TPL-recruitment sites based on the EAR and EAR-like motifs described in TPL interactome publications [35,36].