Ribosomal DNA promoter recognition is determined in vivo by cooperation between UBTF1 and SL1 and is compromised in the UBTF-E210K neuroregression syndrome

Transcription of the ~200 mouse and human ribosomal RNA genes (rDNA) by RNA Polymerase I (RPI/PolR1) accounts for 80% of total cellular RNA, around 35% of all nuclear RNA synthesis, and determines the cytoplasmic ribosome complement. It is therefore a major factor controlling cell growth and its dysfunction has been implicated in hypertrophic and developmental disorders. Activation of each rDNA repeat requires nucleosome replacement by the architectural multi-HMGbox factor UBTF to create a 15.7 kbp nucleosome free region (NFR). Formation of this NFR is also essential for recruitment of the TBP-TAFI factor SL1 and for preinitiation complex (PIC) formation at the gene and enhancer-associated promoters of the rDNA. However, these promoters show little sequence commonality and neither UBTF nor SL1 displays significant DNA sequence binding specificity, making what drives PIC formation a mystery. Here we show that cooperation between SL1 and the longer UBTF1 splice variant generates the specificity required for rDNA promoter recognition in cell. We find that conditional deletion of the TAF1B subunit of SL1 causes a striking depletion of UBTF at both rDNA promoters but not elsewhere across the rDNA. We also find that while both UBTF1 and -2 variants bind throughout the rDNA NFR, only UBTF1 is present with SL1 at the promoters. The data strongly suggest an induced-fit model of RPI promoter recognition in which UBTF1 plays an architectural role. Interestingly, a recurrent UBTF-E210K mutation that causes a pediatric neurodegeneration syndrome provides indirect support for this model. E210K knock-in cells show enhanced levels of the UBTF1 splice variant and a concomitant increase in active rDNA copies. However, they also display reduced rDNA transcription and reduced promoter recruitment of SL1. We suggest the underlying cause of the UBTF-E210K syndrome is therefore a reduction in cooperative UBTF1-SL1 promoter recruitment that may be partially compensated by enhanced rDNA activation.


Introduction
The ribosomal RNA (rRNA) genes encode the catalytic and structural RNAs of the ribosome as a single 47S precursor. As such, transcription of these genes is a major determinant of cell growth, cell cycle progression and cell survival and an essential factor in the formation of hypertrophic diseases such as cancer [1]. Dysregulation of rRNA genes is also the cause of a large range of developmental and neurological disorders that are often also associated with cancer [2][3][4][5][6]. To develop treatment strategies for these diseases it is important to understand how these genes are transcribed and regulated. Transcription of the several hundred tandemly repeated and essentially identical [7] rRNA genes, the rDNA, is undertaken exclusively by RNA Polymerase I (RPI, Pol1, POLR1) and a set of basal transcription factors dedicated to this task. This strict correspondence of gene and polymerase has resulted in the rapid coevolution of rDNA promoters with basal factors, leading to a high degree of species specificity of the RPI transcription machinery [8][9][10]. The functional uniqueness of the RPI machinery provides an obvious target for novel therapeutic approaches [11]. However, what directs the RPI transcription machinery exclusively to the rDNA, and how it is specifically recruited to both the major 47S pre-rRNA promoter and the enhancer element despite these having little or no DNA sequence commonality, are still not understood. Here we show that despite having little or no inherent DNA sequence selectivity, the multi-HMGbox Upstream Binding Factor (UBF/UBTF) plays a crucial role in targeting RPI preinitiation complex formation to the rDNA promoters in vivo. Further, we show that this UBTF function is compromised by an E210K mutation recently linked to a recurrent human pediatric neuroregression syndrome [6][12][13][14].
The basal factors of the mammalian RPI transcription machinery include Selectivity Factor 1 (SL1), the multi-HMGbox factor UBTF/UBF and the RPI-associated initiation factor RRN3 [9,15,16] (Fig 1A). SL1 consists of the TATA-box Binding Protein (TBP) and the RPI specific TBP Associated Factors, TAFIA to D. Evolutionary variability of these TAFs determines their species-specific functions and that of RPI transcription; for example, human and mouse SL1 complexes are not functionally exchangeable [17][18][19]. In contrast, UBTF is a highly conserved essential factor that is functionally exchangeable between human, mouse and to some extent even Xenopus [19][20][21]. Our present understanding of how SL1 and UBTF function in rDNA transcription derives predominantly from cell-free studies. These have suggested sequential binding scenarios for the formation of the RPI preinitiation complex, whereby one or two dimers of UBTF bind across the rDNA promoter to provide a landing site for SL1, or may stabilize the initial binding of SL1 [22,23]. UBTF was found to bend and loop DNA, suggesting that it could bring together the two distal Upstream Promoter Element (UPE, aka UCE) and Core Elements of the RPI promoter to form such a landing site for SL1 [24]. However, UBTF by itself does not display any significant degree of sequence specific binding, making its role in targeting SL1 difficult to understand [10]. Indeed, UBTF binds continuously throughout the transcribed regions of the active rDNA genes, where it creates a 15.7 kbp long DNase I hypersensitive Nucleosome-Free Region (NFR) that is bounded upstream by CTCF and the enhancer associated Spacer Promoter, and flanked by the nucleosomal InterGenic Spacer (IGS) [25,26] (Fig 1B). Despite this, targeted gene inactivation has unequivocally shown UBTF to be also essential for recruitment of SL1 to both the major 47S promoter and the enhancer associated Spacer Promoter in mouse [25,27].
Here we extend recent studies of UBTF and RRN3 [25,27] to a conditional cell culture model for the TAF1B (TAF68) subunit of SL1, to the roles of the UBTF variants and to a UBTF-E210K mutation recently identified as the cause of a recurrent pediatric neurodegeneration syndrome. The resulting data provide the first in-cell test of the requirement for an RPI-specific TAF and a significant new insight into RPI preinitiation complex formation. The data resolve key questions surrounding rDNA promoter recognition and RPI preinitiation complex formation by showing that the UBTF1 splice variant displays a striking specificity for the RPI promoter sequences only when in the presence of a functional SL1. They further suggest an "induced-fit" model of promoter recognition in which UBTF plays an architectural role to model rDNA conformation to fit SL1 and hence catalyze its recruitment. Our findings further suggest that the fundamental cause of the UBTF-E210K pediatric neuroregression syndrome is a partial defect in SL1-UBTF cooperation leading to reduced RPI preinitiation complex formation.

Results
The TAF1B subunit of SL1 is essential for mouse development and rDNA transcription
Mouse lines carrying a targeted "Knockout First" insertion in the gene for TAF1B (TAF68) were established and crossed to generate lines carrying either conditional Taf1b flox or Taf1b Δ null-alleles (S1 Fig). Mice heterozygous for a Taf1b Δ allele were found to be both viable and fertile and the null-allele was propagated at near Mendelian frequency (S1 Table). However, no Taf1b Δ/Δ homozygous offspring (pups) were identified and genotyping of embryos detected no Taf1b Δ/Δ homozygotes at stages 6.5 and later. It was therefore concluded that Taf1b was essential for mouse development beyond blastula (see S1 Text for more detail).
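As a rough illustration of the genotype reasoning above, the following Python sketch gives the fractions expected from a heterozygote intercross and the chance of recovering no homozygous nulls if they were in fact viable. The assumption of a het x het cross with simple 1:2:1 segregation, the genotype labels and the function names are illustrative only and are not taken from the study.

```python
from fractions import Fraction


def het_cross_expectation():
    """Expected genotype fractions from a het x het cross (assumed 1:2:1)."""
    return {
        "wt/wt": Fraction(1, 4),
        "wt/del": Fraction(1, 2),
        "del/del": Fraction(1, 4),
    }


def survivors_if_null_lethal():
    """Expected fractions among survivors when del/del is lethal."""
    expected = het_cross_expectation()
    viable = {g: f for g, f in expected.items() if g != "del/del"}
    total = sum(viable.values())
    return {g: f / total for g, f in viable.items()}


def prob_no_homozygotes(n_offspring):
    """Chance of seeing zero del/del among n offspring if del/del were viable."""
    return float(Fraction(3, 4) ** n_offspring)
```

Under these assumptions, a lethal null allele shifts the surviving litter to one third wild type and two thirds heterozygous, and the probability of missing a viable homozygote by chance falls geometrically with the number of animals genotyped.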

Taf1b deletion induces nucleolar stress
Deletion of the Taf1b gene also led to a disruption of nucleolar structure characteristic of nucleolar stress. Prior to TAF1B depletion, immunofluorescence imaging (IF) of conditional MEFs revealed the typical punctate sub-nucleolar pattern of RPI and UBTF overlapped by fibrillarin (FBL) staining indicative of transcriptionally active rDNA gene units (0h post 4-HT, in Figs 1D and S3). During TAF1B depletion, RPI and UBTF staining collapsed into common intense foci that were often arranged in pairs around more central FBL staining (e.g. 96 and 120h post 4-HT in Figs 1D and S3). At later times, UBTF became highly condensed and partially segregated from RPI while FBL dispersed throughout the nucleus (120h and 144h post 4-HT in Figs 1D and S3). These changes were consistent with the nucleolar changes previously observed on inactivation of RPI transcription either by RRN3 gene deletion or CX-5461 drug inhibition [25,29]. Despite this, cells continued to slowly divide for at least 120h post 4-HT and remained >95% viable as determined by trypan blue exclusion and MTT tetrazolium reduction assays (S4 Fig).

Loss of TAF1B prevents SL1 recruitment but has only a small effect on "active" rDNA chromatin
Chromatin Immunoprecipitation (ChIP-qPCR) revealed that deletion of the TAF1B gene essentially eliminated TAF1B binding at the 47S and Spacer rDNA promoters in both MEFs and ESCs. It also prevented recruitment of the SL1 subunits TBP and TAF1C at both promoters (Figs 2B and S5). Thus, loss of TAF1B functionally inactivated SL1 and prevented preinitiation complex formation, explaining the suppression of rDNA transcription. In contrast, loss of TAF1B had only a limited effect on UBTF binding across the 47S gene body where it predominantly replaces histone-based chromatin [25,26] (see ETS and 28S amplicons in Figs 2B and S5). This was further confirmed by DNase-Seq analysis that showed little reduction in the pattern of rDNA hypersensitivity on TAF1B loss (S6 Fig).
Consistent with this, psoralen accessibility crosslinking (PAC) also indicated only a small reduction in the "active" form of rDNA chromatin previously shown to depend on UBTF [25][26][27] (Fig 2C). Nevertheless, an increase in the mobility of the "active" PAC band on TAF1B loss suggested some degree of rDNA chromatin compaction, most probably related to the concomitant loss of RPI loading. A similar observation was made when RRN3 was inactivated [25,30].
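To make the notion of an "active" rDNA fraction concrete, here is a minimal Python sketch that takes the intensities of the slower-migrating "active" and faster-migrating "inactive" PAC bands and returns the active share. This simple two-band ratio is an assumption made for illustration; the function name and inputs are hypothetical, and it is not the curve-fit procedure used in the study.

```python
def active_fraction(active_intensity, inactive_intensity):
    """Share of rDNA signal in the 'active' PAC band (simple two-band ratio)."""
    if active_intensity < 0 or inactive_intensity < 0:
        raise ValueError("band intensities must be non-negative")
    total = active_intensity + inactive_intensity
    if total == 0:
        raise ValueError("at least one band must have signal")
    return active_intensity / total
```

A change in band mobility, such as the compaction noted above, would not alter this ratio, which is why mobility and active fraction are reported as separate observations.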

The recruitment of UBTF and SL1 to the rDNA promoters is highly cooperative
Though TAF1B depletion and the loss of SL1 recruitment to the rDNA had only a small effect on UBTF binding across the gene body, inspection of the ChIP-qPCR mapping suggested that selective depletion did occur at both rDNA promoters (compare SpPr and T0/Pr with ETS and 28S amplicons in Figs 2B and S5). This was confirmed using the higher resolution of Deconvolution ChIP-Seq mapping (DChIP-Seq) [10,25]. Before TAF1B depletion, DChIP-Seq revealed overlapping peaks of SL1 (TAF1B) and UBTF at both rDNA promoters in conditional MEFs (Fig 3A). Subsequent TAF1B depletion essentially eliminated TAF1B from the promoters but also strongly suppressed the overlapping peak of UBTF, and this same effect was also seen in TAF1B conditional mESCs (S7A Fig). Despite the TAF1B-dependent loss of UBTF from the rDNA promoters, its binding profile elsewhere across the rDNA was unaffected, though a generalized 25 to 50% loss of UBTF respectively in mESCs and MEFs was observed. The dependence of UBTF binding on TAF1B, and hence on functional SL1, was most evident in DChIP-Seq difference maps, which showed strong suppression of UBTF specifically at 47S and Spacer promoters in both MEF and mESC cell types (Figs 3A and 3B and S7A, S7B).
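The idea behind a difference map can be sketched in a few lines of Python: two occupancy profiles over the same rDNA coordinates are brought to a common scale and subtracted position by position. Scaling each profile to unit total signal is an assumption chosen here so that a generalized loss cancels and only position-specific changes remain; the normalization actually used for the DChIP-Seq maps is not described in this section, and the function names are hypothetical.

```python
def normalize(profile):
    """Scale a coverage profile so that its values sum to 1."""
    total = sum(profile)
    if total <= 0:
        raise ValueError("profile must contain positive signal")
    return [value / total for value in profile]


def difference_map(depleted, control):
    """Position-wise difference (depleted minus control) of scaled profiles."""
    if len(depleted) != len(control):
        raise ValueError("profiles must cover the same positions")
    scaled_depleted = normalize(depleted)
    scaled_control = normalize(control)
    return [d - c for d, c in zip(scaled_depleted, scaled_control)]
```

With this choice, a uniform drop in occupancy gives a flat map at zero, whereas a selective loss at one site appears as a negative trough at that position.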
The generation of DChIP-Seq profiles at differing degrees of TAF1B depletion allowed a quantitative estimate of the interdependence of UBTF and SL1 binding at both rDNA promoters (see Materials and Methods and S8 Fig). The data revealed near linear relationships between SL1 and UBTF occupancy and confirmed that their binding was strongly interdependent at either promoter (Fig 3C). This was particularly striking given that these promoters share only 26% base sequence identity, no more than expected for two sequences chosen at random. Interestingly, the data (Fig 3C) potentially also suggested a 2-fold difference in the relative SL1:UBTF stoichiometries between the Spacer and the 47S promoter, possibly a factor in their differential promoter strengths and functionalities.
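The comparison with random sequences can be checked with a short calculation. For two independent sequences, the chance that an aligned position matches is the sum over the four bases of the product of their frequencies, which is 25% when all bases are equally frequent. The Python sketch below, with illustrative function names and an assumed uniform base composition, computes this expectation alongside the observed identity of two aligned sequences.

```python
def expected_identity(freqs_a, freqs_b=None):
    """Expected per-position identity of two independent random sequences."""
    if freqs_b is None:
        freqs_b = freqs_a
    return sum(freqs_a[base] * freqs_b.get(base, 0.0) for base in freqs_a)


def observed_identity(seq_a, seq_b):
    """Fraction of matching positions in two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return matches / len(seq_a)


# Uniform base composition is an assumption, not a property of the rDNA.
UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
```

Against the 25% expected under uniform composition, the reported 26% identity between the Spacer and 47S promoters is essentially indistinguishable from chance; a skewed base composition would raise the expectation slightly.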

UBTF1, the longer of the two Ubtf variants is selectively recruited to the rDNA promoters
The observation that UBTF recruitment depended on SL1 specifically at the rDNA promoters but not elsewhere across the rDNA repeat suggested that the UBTF variants might be important in this specificity. Mammals express two splice variants of UBTF. Both UBTF1 and UBTF2 encompass six tandem HMGbox DNA binding domain homologies but differ in HMGbox2, a central segment of which is deleted in UBTF2 (Fig 4A). While MEFs express both forms of UBTF, ESCs naturally express exclusively UBTF1 (S7C Fig). Promoter recruitment of UBTF1 in these cells was found to be strongly suppressed on depletion of functional SL1 (S7A and S7B Fig). However, this left open the question of whether UBTF2 could also cooperate, albeit less efficiently, with SL1 in preinitiation complex formation. To answer this question, we determined the distribution of each UBTF variant across the rDNA in MEFs.
Pools of NIH3T3 MEF clones expressing 3xFLAG-tagged UBTF1 or UBTF2 at sub-endogenous levels were selected and subjected to DChIP-Seq mapping (Figs 4B and S9A and S9B). The profiles of 3xFLAG-UBTF1 and -UBTF2 binding closely followed that of total endogenous UBTF across most of the rDNA but differed significantly at the Spacer and 47S promoters. Characteristic peaks of UBTF were present at both promoters in the UBTF1 profile but were absent in the UBTF2 profile. This differential promoter binding was most evident in UBTF1-UBTF2 difference maps (Figs 4B and 4C and S9B). Quantitative analysis of the variant UBTF occupancy profiles (Experimental Procedures and S10 Fig) showed greater than 4 times more UBTF1 than UBTF2 at the rDNA promoters (Fig 4D). However, our previous studies of the UBTF-DNA complex showed that a UBTF dimer contacts contiguously 130 to 140 bp of DNA, arguing that each 150-170bp rDNA promoter could interact with at most two dimers of UBTF [10,24]. Hence, the rDNA promoters appear to predominantly, if not exclusively, recruit UBTF1.
The combined data from MEFs and ESCs showed that formation of the RPI preinitiation complex in cell involves a cooperation between UBTF1 and SL1. Further, this same cooperation occurs at both Spacer and 47S promoters despite their unrelated base sequences. Since the only difference between UBTF1 and UBTF2 lies in the structure of HMGbox2, this domain must play an important role in UBTF-SL1 cooperation and RPI promoter recognition.

UBTF2 is not detected in the preinitiation complex either in the presence or absence of UBTF1
Given that the UBTF variants can heterodimerize [31], it was surprising that little or no UBTF2 was detected within the preinitiation complexes at either rDNA promoter. This left open the question of whether HMGbox2 splicing in UBTF2 eliminates or simply reduces the ability of this variant to cooperate in preinitiation complex formation. To test this, we investigated promoter recruitment of each of the UBTF variants separately. MEFs conditional for both UBTF forms (Ubtf fl/fl /ER-Cre +/+ /p53 -/-) and expressing 3xFLAG-tagged UBTF1 or UBTF2 at sub-endogenous levels were isolated and subjected to DChIP-Seq mapping before and after deletion of endogenous UBTF (Figs 5A, S11A and S11B). 3xFLAG-UBTF1 was recruited to both Spacer and 47S promoters and this recruitment was enhanced by the loss of endogenous UBTF, as might be expected if this variant were essential. In contrast, 3xFLAG-UBTF2 was not detected at the promoters either before or after UBTF deletion, despite 3xFLAG-UBTF2 displaying a typical binding profile elsewhere across the rDNA (S11B Fig). Quantitative analysis of the UBTF1 and -2 occupancy profiles in independent experiments suggested that, in the absence of endogenous UBTF, UBTF2 recruitment at the Spacer and 47S promoters was respectively less than 2% and 10% that of UBTF1 (Figs 5B and S12). UBTF1 therefore represents the predominant and very possibly the sole form of UBTF able to cooperate with SL1 in preinitiation complex formation.

An HMGbox2 mutation associated with neuroregression is unlikely to directly affect DNA interactions
An E>K mutation at residue 210 in HMGbox2 of UBTF was recently shown to be the cause of a recurrent human pediatric neuroregression syndrome [6][12][13][14]. The key role of HMGbox2 revealed by our study suggested that this mutation might affect the formation of the RPI preinitiation complex in vivo and possibly explain the origin of this syndrome. Unfortunately, the structure of HMGbox2 has not yet been determined experimentally. However, despite a high degree of primary sequence variability, HMGboxes display very similar tertiary structures and DNA contacts, making them accessible to molecular modelling (summarized in S13A Fig). Modelling of UBTF-HMGbox2 revealed a typical HMGB saddle structure with basic residues K198, 200 and 211 lining the DNA binding underside (S13B Fig). Significantly, the sidechain of residue K211, a highly conserved minor groove contact in other HMGboxes, was predicted to be correctly oriented towards the DNA. In contrast, the sidechain of the immediately adjacent E210 residue was predicted to point away from the DNA and lie on the seat of the HMGbox saddle. Furthermore, this predicted sidechain position was unaffected by the E210K mutation (S13C Fig). We concluded that the E210K mutation was extremely unlikely to affect HMGbox2 interactions with the DNA. However, the mutation would create a significant change in the electrostatic surface potential of the seat of HMGbox2 (S13D Fig), suggesting that it could have an effect on interactions with protein factors such as SL1. To date, though, we have been unable to detect such effects in UBTF Pull-Down and coIP assays.

The UBTF HMGbox2 E210K mutation suppresses 47S rRNA synthesis in a MEF model
Given that the sequences of human and mouse UBTF are 99% identical, we took advantage of a recently generated ubtf E210K mouse knock-in model. Mice homozygous for the E210K mutation are viable but exhibit behavioral abnormalities that worsen with increasing postnatal age (details will be described elsewhere). Ubtf E210K/E210K MEFs were isolated from these mice and found to proliferate somewhat more slowly than MEFs from isogenic wild type littermates, with doubling times of 35h and 31h respectively (Fig 6A). Metabolic RNA labelling also revealed a >40% lower rate of de novo 47S pre-rRNA synthesis in the mutant as compared to the wild type MEFs (Fig 6B); however, no overt rRNA processing defects were detected (S14A Fig). The mutant MEFs also contained 30% less total cellular RNA (~80% of which is rRNA) than wild type MEFs (Fig 6C). Thus, the E210K mutation in UBTF significantly reduced the capacity of

Fig 4 legend (fragment): Peak fit analysis of UBTF1 and UBTF2 occupancies over the Spacer and 47S promoter revealed that UBTF1 was at least 4 times more prevalent at either promoter. The data derive from two biological replicas and the SEM is shown. https://doi.org/10.1371/journal.pgen.1009644.g004

The E210K mutation also enhances UBTF1 levels and the fraction of active rDNA repeats
Unexpectedly, the ubtf E210K/E210K MEFs displayed a significant increase in the fraction of activated rDNA copies determined by PAC (Fig 6D and 6E), and this corresponded to an equally significant increase in the expression of the UBTF1 variant both at the protein and mRNA levels (Fig 6F and 6G). A similar bias towards UBTF1 expression was also observed in brain tissue of mutant mice (Fig 6H and 6I). This suggested the interesting possibility that the enhanced levels of UBTF1 in the mutant MEFs revealed an inherent feedback mechanism regulating splicing. In this way the cell might control the fraction of active rDNA copies and hence potentially also rRNA synthesis. However, it will first be necessary to determine whether the E210K mutation directly affected usage of the adjacent splice junctions (see S14B Fig). In either scenario, the increase in active rDNA copies would normally be expected to enhance rRNA synthesis and cell growth in the mutant MEFs. This was clearly not the case since the E210K mutant MEFs displayed both reduced rRNA synthesis and rRNA accumulation and proliferated more slowly than matched wild type MEFs (Fig 6A-6C).
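The size of the proliferation difference can be put in perspective with a short calculation from the doubling times reported for Fig 6A (35h for mutant and 31h for wild type MEFs). The Python sketch below assumes simple exponential growth, which is an idealization; the function names are illustrative.

```python
import math


def growth_rate(doubling_time_h):
    """Exponential growth rate (per hour) for a given doubling time."""
    if doubling_time_h <= 0:
        raise ValueError("doubling time must be positive")
    return math.log(2) / doubling_time_h


def fold_expansion(hours, doubling_time_h):
    """Fold increase in cell number after a given time, assuming exponential growth."""
    if doubling_time_h <= 0:
        raise ValueError("doubling time must be positive")
    return 2 ** (hours / doubling_time_h)


# Ratio of mutant to wild type growth rates using the reported doubling times.
relative_rate = growth_rate(35) / growth_rate(31)
```

Under this assumption the mutant growth rate is 31/35, or about 0.89, of the wild type rate. This is a smaller deficit than the >40% reduction in de novo 47S pre-rRNA synthesis, although the two measures are not directly comparable.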

The E210K mutation reduces RPI loading and SL1 and UBTF recruitment to the rDNA promoters
The spatial distribution of UBTF in the nuclei of E210K mutant MEFs closely resembled that in the matched wild type MEFs, with UBTF precisely colocalizing with RPI and Fibrillarin (S15 Fig). Hence the mutation had no overt effect on nucleolar topology and did not cause nucleolar stress, for example as seen on TAF1B loss (Figs 1D and S3). However, ChIP-qPCR analyses revealed that RPI loading across the rDNA was reduced by >40% in the ubtf E210K/E210K mutant MEFs, explaining the observed reduction in pre-rRNA synthesis in these cells (compare RPI loadings in Fig 7A with de novo rRNA synthesis levels in Fig 6B). Recruitment of TAF1B (SL1) and UBTF to both Spacer and 47S rDNA promoters was somewhat reduced in the mutant MEFs, though less than RPI loadings (Fig 7B). Thus, the E210K UBTF mutation most probably reduced preinitiation complex formation, consistent with it affecting UBTF-SL1 cooperation. The higher resolution of DChIP-Seq further showed that occupancy of UBTF at both 47S and Spacer promoters was selectively reduced by the E210K mutation (Fig 7C), again consistent with a reduced UBTF-SL1 cooperativity. The reduction of UBTF at the rDNA promoters was particularly apparent in difference maps between wild type and E210K mutant MEFs (Figs 7D and S16). The reduction in UBTF was especially strong at the Spacer promoter and corresponded with a similar reduction in TAF1B occupancy and in RPI recruitment (Fig 7D). The data strongly suggested that the E210K mutation causes a small but significant reduction in the ability of UBTF to cooperate with SL1 in the formation of the RPI preinitiation complex, and together point to a reduction in the efficiency of RPI transcription initiation as the fundamental cause of the UBTF-E210K neuroregression syndrome. Thus, these data indirectly support the central role of the SL1-UBTF1 cooperation in determining RPI preinitiation complex formation and efficient rDNA transcription in vivo.

Fig 6 legend (fragment): [...] MEF isolates. C) Per cell total cellular RNA content of ubtf E210K/E210K and wildtype MEFs. The data in A, B and C derive from two or more biological replicas in each of which a minimum of two independently isolated mutant and wild type MEF cultures were analyzed in parallel. D) PAC analysis of ubtf E210K/E210K and wild type MEFs. Upper panel shows an example of the active rDNA "a" and inactive "i" profiles for the 1.3kbp BamHI-BamHI 47S coding region fragment (see Fig 2A) and the lower panel the corresponding band intensities. E) Active rDNA fractions were estimated from the combined curve fit analysis of 1.3, 2.4 and 4.7kbp BamHI-BamHI rDNA fragment PAC profiles. The data derive from three independent Ubtf E210K/E210K and two wild type MEF isolates in two PAC biological replicas and are plotted to show median, upper and lower data quartiles and outliers. F) and G) Analysis of UBTF1 and 2 levels in Ubtf E210K/E210K and wild type MEF isolates. Panel F shows a typical Western analysis of UBTF variants in these MEFs and panel G quantitative estimates of relative UBTF1/UBTF2 protein and mRNA ratios in these MEFs. H) and I) Show similar estimates of relative UBTF1/UBTF2 protein and mRNA ratios in Cortex and Cerebellum tissue from matched Ubtf wt/wt, Ubtf wt/E210K and Ubtf E210K/E210K adult mice. Error bars throughout indicate SEM. https://doi.org/10.1371/journal.pgen.1009644.g006

Discussion
Formation of the RPI preinitiation complexes on the rRNA genes (rDNA) in mammals determines as much as 35% of total nuclear RNA synthesis but is still poorly understood. In particular, prior to our study it was unclear how, or indeed if, the multi-HMGbox factor UBTF played a role in targeting the TBP-TAFI complex SL1 to the rDNA promoters or if it simply acted as a general chromatin replacement protein. UBTF displays little or no DNA sequence binding specificity and binds throughout the 15kbp NFR of the mouse and human rDNA [26]. We previously showed that conditional deletion of the UBTF gene inactivated rDNA transcription and allowed the reformation of nucleosomes across the rDNA [25,27]. Hence, it appeared that UBTF may simply facilitate SL1 recruitment by eliminating the obstacles presented by nucleosomes. Genetic ablation of the SL1 subunit TAF1B has now revealed an unexpected sequence specific role for UBTF and has suggested a novel induced-fit model for RPI preinitiation complex formation.
Though it was assumed from cell-free studies that SL1 would be essential for cell and organism survival due to its role in rDNA transcription, this had not been directly tested. Our data show that homozygous deletion of the gene for the SL1 subunit TAF1B prevented mouse development beyond blastocyst while heterozygous deletants were viable and fertile. Hence, as observed for the other RPI basal factors UBTF and RRN3, TAF1B is an essential factor in mouse. Conditional deletion of Taf1b in MEF and mES cell culture was also found to arrest rDNA transcription and to cause severe disruption of nucleolar structure characteristic of nucleolar stress [25,29]. Depletion of TAF1B also prevented promoter recruitment of TAF1C and TBP subunits of SL1 and hence PIC formation at both the 47S pre-rRNA and the Enhancer-associated Spacer rDNA promoters. Quite unexpectedly, this also led to a loss of UBTF at both promoters, though not elsewhere across the rDNA NFR. ChIP-qPCR and high resolution DChIP-Seq showed that the loss of UBTF from the promoters was proportional to the loss of SL1, strongly arguing that binding of these two basal factors was cooperative. Conversely, we had previously shown that in cell loss of UBTF eliminated SL1 from the rDNA promoters [25,27], consistent with the cooperative recruitment of these factors. Data from early cell-free studies had suggested two possible scenarios for RPI preinitiation complex formation, either SL1 recruitment depended on pre-binding of UBTF or conversely that UBTF recruitment depended on pre-binding of SL1 [23,32]. Our data resolve this contradiction by showing that in cells UBTF and SL1 binding at the rDNA promoters is strongly interdependent, neither factor being recruited in the absence of the other. The lack of UBTF binding at the promoters in the absence of SL1 was particularly surprising, especially so since UBTF remained bound throughout the rest of the rDNA NFR and even at immediately promoter adjacent sites. 
Thus, in the absence of SL1 the RPI promoters, rather than being preferred sites of UBTF binding as usually assumed, are on the contrary sites of low affinity within an NFR continuum of higher affinity sites.

Figure legend (fragment): D and E show an enlargement of the rDNA promoter and Enhancer region and a difference map of UBTF occupancy (mutant-wild type MEFs). A full-width UBTF difference map is shown in S16 Fig.
Our data further revealed the key importance of the UBTF1 variant in the recruitment of SL1 to the rDNA promoters. Mouse and human cells express varying levels of the UBTF1 and UBTF2 splice variants that differ by a 37a.a. deletion in HMGbox2 of UBTF2 (Fig 4). By mapping these variants across the rDNA we found that UBTF1 was recruited to the rDNA promoters at least four times more often than UBTF2, though the data were also consistent with the exclusive recruitment of UBTF1 at the promoters. In contrast, UBTF1 and UBTF2 bound indistinguishably elsewhere across the rDNA. Since only UBTF1 is present in mESCs, deletion of Taf1b in these cells also clearly demonstrated that promoter recruitment of UBTF1 depended on SL1 (S5 and S7 Figs). Thus, formation of the RPI preinitiation complex is driven predominantly if not exclusively by a cooperation between SL1 and UBTF1. This provides the first mechanistic explanation for why UBTF1 is absolutely required for rDNA activity in vivo [33].
Recruitment of UBTF1 and SL1 was found to be cooperative not only at the major 47S rDNA promoter but also at the enhancer associated Spacer promoter. Since these promoters display little DNA sequence homology, this raised the question of what in fact defines an RPI promoter and how it is recognized. Our data clearly show that promoter recognition involves the cooperative recruitment of UBTF1 and SL1. Previous data showed that UBTF interacts with SL1 via its highly acidic C-terminal tail, an ~80 a.a. domain containing 65% Asp/Glu residues [34]. However, though this domain is required to enhance cell-free transcription [35] and has been shown to bind TBP [36], it is present in both UBTF variants. So, while it might play some role in bringing SL1 to the promoters, it cannot explain their selective binding of UBTF1. Co-immunoprecipitation also failed to detect a specific interaction between SL1 and the UBTF1 variant. Thus, it seems unlikely that the rDNA promoters are recognized by a pre-formed SL1-UBTF1 preinitiation complex. Rather we suggest that promoter recognition involves the transient imposition of a specific DNA conformation by UBTF1 that is in turn locked into place by SL1 (Fig 8). There is significant precedent for such a mechanism, since the HMGboxes of UBTF were shown to induce in-phase bending and looping of a DNA substrate. Indeed, it was suggested that such a looping could position UCE and Core promoter elements (Fig 1A) to facilitate their contact by SL1 [24,37,38]. Essentially, UBTF1 might transiently mould the promoter DNA to create an induced-fit for SL1, locking it in place. Since the 37 a.a. deletion in HMGbox2 of UBTF2 prevents this box from bending DNA [38], UBTF2 would mould the promoter DNA differently from UBTF1 and would therefore not induce the appropriate fit for SL1.
Though in the induced-fit model of rDNA promoter recognition direct UBTF1-SL1 contacts would not be essential, they may nonetheless play an important part in stabilizing the preinitiation complex. Indeed, our study of the UBTF-E210K recurrent pediatric neuroregression syndrome suggested that this was quite probably the case. Molecular modelling showed that while this E210K mutation in HMGbox2 of UBTF was very unlikely to affect interactions with DNA, it might possibly affect interactions with other proteins such as SL1. This said, we did not detect significant effects of the E210K mutation on UBTF1-SL1 interactions in pull-down assays. We found that introduction of the homozygous UBTF-E210K mutation in MEFs significantly reduced rDNA transcription rates, reduced total cellular RNA accumulation and slowed cell proliferation. In apparent contradiction to these effects, the E210K mutation enhanced expression of UBTF1 both in mutant MEFs and mouse tissues, and this led to an increase in the fraction of active rDNA copies, possibly as an attempt to compensate for reduced rDNA transcription. However, ChIP analyses further revealed that the UBTF-E210K mutation reduced the cooperative recruitment of SL1 and UBTF to the rDNA promoters. Thus, it appeared that the primary effect of the E210K-UBTF mutation was to limit PIC formation on the rDNA. This further emphasized the central importance of a functional cooperation between UBTF and SL1 in determining rDNA activity. It further suggested that the UBTF-E210K neurodegeneration syndrome was caused by a subtle defect in PIC formation on the rDNA.
In summary, our study identifies the parameters that determine RNA polymerase I promoter recognition and preinitiation complex formation in vivo. We reveal the central importance of a cooperative interaction between the RPI-specific TBP complex SL1 and the UBTF1 splice variant in promoter recognition. We propose an induced-fit model for preinitiation complex formation that explains the functional differences between the UBTF splice variants in terms of their abilities to induce specific conformational changes in the rDNA promoter sequences. Our data further suggest that the UBTF-E210K recurrent neurodegeneration syndrome is caused by a subtle reduction in UBTF-SL1 cooperativity that leads to reduced rDNA transcription.

Ethics statement
All animal care and animal experiments were conducted in accordance with the guidelines provided by the Canadian Council for Animal Protection, under the surveillance and authority of the institutional animal protection committees of Université Laval and the Centre hospitalier universitaire de Québec (CHU de Québec). The specific studies described were performed under protocols #2014-100 and 2014-101 examined and accepted by the "Comité de protection des animaux du CHU de Québec". This ensured that all aspects of the work were carried out following strict guidelines to ensure careful, consistent, and ethical handling of mice.

Primary antibodies for Immunofluorescence, ChIP and Western blotting
Rabbit polyclonal antibodies against mouse UBTF, RPI large subunit (RPA194/Polr1A), and TAF1B were generated in the laboratory and have been previously described [25]. Anti-TAF1C was a gift from I. Grummt.

Generation of Taf1b mutant mice
Taf1b+/- (targeted allele Taf1btm1a(EUCOMM)Hmgu) embryonic stem (ES) cells were obtained from EuMMCR and generated using the targeting vector PG00150_Z_6_E02. Two ES clones (HEPD0596_3_G02 and HEPD0596_3_H01) were each used to generate independent mouse lines using the services of the McGill Integrated Core for Animal Modeling (MICAM).

Embryo collection and genotyping
Heterozygous Taf1b Δ/wt mice were inter-crossed and embryos isolated, imaged and genotyped from pregnant females at E3.5, 6.5, 7.5, 8.5 and E9.5 as described in [25,27]. DNA from E3.5 embryos was amplified using the REPLI-g Mini kit (QIAGEN). Individual embryos were genotyped by PCR using the same primers as for mouse lines (S1A Fig).

Isolation and culturing of Taf1b conditional MEF and mES cells
Conditional TAF1B primary mouse embryonic fibroblasts (MEFs) were generated from E14.5 Taf1b fl/fl /ER-Cre +/+ /p53 -/- and wild type control ER-Cre +/+ /p53 -/- embryos as previously described [27,39], and were genotyped by PCR as described for mice, see S1A and S1B Fig. Mouse Embryonic Stem (mES) cells were derived from the inner cell mass of Taf1b fl/fl /ER-Cre +/+ and wild type control ER-Cre +/+ blastocysts essentially as published [40]. After establishment of the cells on feeder monolayers, they were adapted to feeder-independence on 2i/LIF N2B27 (ThermoFisher) serum-free medium [41] and subsequently maintained in this medium. The Taf1b fl/fl /ER-Cre +/+ and control mESCs were genotyped as for MEFs.
Primary MEFs were also generated from E14.5 Ubtf E210K/E210K and wild type control Ubtf wt/wt sibling embryos from three independent litters, immortalized by transfection with pBSV0.3T/t [27] and genotyped by base sequencing of PCR products generated using primers 5' CTGGGTGAAGTAGGCCTTGG and 5' CCAGGAGGGTAAGGTGGAGA flanking the mutation site. All MEFs were cultured in Dulbecco's modified Eagle medium (DMEM)-high glucose (Life Technologies), supplemented with 10% fetal bovine serum (Wisent, Life Technologies or other), L-glutamine (Life Technologies) and Antibiotic/Antimycotic (Life Technologies).

Inactivation of Taf1b or Ubtf in cell culture
Gene inactivation in MEFs and ESCs followed the previously described procedures [25,27]. Briefly, cells were plated in 6 cm petri dishes (0.8 x 10^6 cells each) and cultured for 18 hours in DMEM, high glucose, 10% fetal bovine serum or 2i/LIF N2B27 serum-free medium as appropriate. For Taf1b or Ubtf inactivation, 4-hydroxytamoxifen (4-HT) was added in parallel to conditional and wild type control cell cultures to a final concentration of 50 nM (the 0 h time point for analyses). After 4 h incubation the medium was replaced with fresh medium without 4-HT. Cell cultures were then maintained for the indicated times and systematically genotyped by PCR on harvesting.

Analysis of TAF1B and UBTF1/2 protein and mRNA levels
TAF1B and UBTF1/2 protein levels were monitored by Western blotting. At harvesting, cells were quickly rinsed in cold phosphate buffered saline (PBS), recovered by centrifugation (2 min, 2000 r.p.m.) and resuspended directly in SDS-polyacrylamide gel electrophoresis (SDS-PAGE) loading buffer [42]. After fractionation by SDS-PAGE, proteins were analysed by standard Western blotting procedures using an HRP conjugated secondary antibody and Immobilon chemiluminescence substrate (Millipore-Sigma). Membranes were imaged on an Amersham Imager 600 (Cytiva) and UBTF1/2 ratios were determined from lane scans using ImageJ [43] and Gaussian curve fitting using MagicPlot Pro (Magicplot Systems). Relative UBTF1/2 mRNA levels were determined by PCR on total cDNA using primers bracketing the spliced sequences (5'TGCCAAGAAGTCGGACATCC and 5'TCCGCACAGTACAGGGAGTA). Products were fractionated by electrophoresis on a 1.5 or 2% agarose EtBr-stained gel, photographed using the G:BOX acquisition system (Syngene) and UBTF1/2 mRNA ratios determined using ImageJ and Gaussian curve fitting as for proteins.
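
Once two Gaussian peaks have been fitted to a lane scan, the UBTF1/2 ratio can be taken as the ratio of the integrated peak areas. The following is a minimal sketch of that last step only, assuming the fitting software reports an amplitude and a standard deviation for each peak; the function names and the numerical values are hypothetical and are not taken from the study.

```python
import math


def gaussian_area(amplitude: float, sigma: float) -> float:
    """Area under amplitude * exp(-(x - mu)^2 / (2 * sigma^2)).

    The closed form is amplitude * sigma * sqrt(2 * pi); the peak
    position mu does not affect the area.
    """
    return amplitude * abs(sigma) * math.sqrt(2.0 * math.pi)


def peak_area_ratio(peak_1, peak_2) -> float:
    """Ratio of the areas of two fitted peaks given as (amplitude, sigma)."""
    return gaussian_area(*peak_1) / gaussian_area(*peak_2)


# Hypothetical fit results (amplitude, sigma) for the two bands of one lane.
ubtf1_peak = (120.0, 4.0)
ubtf2_peak = (80.0, 3.0)

ratio = peak_area_ratio(ubtf1_peak, ubtf2_peak)
print(f"UBTF1/UBTF2 area ratio: {ratio:.2f}")
```

Using areas rather than peak heights keeps the ratio meaningful when the two bands differ in width.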

Determination of rRNA synthesis rate
The rate of rRNA synthesis was determined by metabolic labelling immediately before cell harvesting. 10 μCi [3H]-uridine (PerkinElmer) was added per 1 ml of medium and cell cultures were incubated for a further 30 min to 3 h as indicated. RNA was recovered with 1 ml Trizol (Invitrogen) according to the manufacturer's protocol and resuspended in formamide (Invitrogen). One microgram of RNA was loaded onto a 1% formaldehyde/MOPS Buffer gel [44,45] or a 1% formaldehyde/TT Buffer gel [46]. The EtBr-stained gels were photographed using the G:BOX acquisition system (Syngene), irradiated in a UV cross-linker (Hoefer) for 5 min at maximum energy, and transferred to a Biodyne B membrane (Pall). The membrane was UV cross-linked at 70 J/cm2, washed in water, air dried and exposed to a Phosphor BAS-IP TR 2025 E Tritium Screen (Cytiva). The screen was then analyzed using a Typhoon imager (Cytiva) and quantified using the ImageQuant TL image analysis software.

Psoralen crosslinking accessibility and Southern blotting
The psoralen crosslinking accessibility assay and Southern blotting were performed on cells grown in 60 mm petri dishes and DNA was analyzed as previously described [47,48], using the 6.7 kb 47S rRNA gene EcoRI fragment (pMr100) [47]. The ratio of "active" to "inactive" genes was estimated by analyzing the intensity profile of low and high mobility bands revealed by phospho-imaging on an Amersham Typhoon (Cytiva) using a Gaussian peak fit generated with MagicPlot Pro (MagicPlot Systems LLC).

Indirect immunofluorescence (IF) microscopy
Cells were plated on poly-lysine treated coverslips and subjected to the standard 4-HT treatment to induce Taf1b deletion. At the indicated time points cells were rinsed with PBS, fixed in 4% PFA in PBS for 10 min and permeabilized with 0.5% Triton in PBS for 15 min. After a blocking step in PBS-N (PBS, 0.1% IGEPAL (Sigma)), 5% donkey serum, coverslips were incubated with primary antibodies in PBS-N, 5% donkey serum for ~16 h at 4°C. RPI was detected using a combination of mouse anti-A194 and A135 antibodies (#SC-293272, #SC-48385), fibrillarin with goat anti-FBL (#LS-C155047), and UBTF with rabbit anti-UBTF (in-house #8). Cells were incubated for ~2 h at room temperature with the appropriate AlexaFluor or Dylight 488/568/647 conjugated secondary antibodies (ThermoFisher / Jackson ImmunoResearch) and counterstained with DAPI. After mounting in Prolong Diamond (ThermoFisher), epifluorescent 3D image stacks were acquired using a Leica SP5 II scanning confocal microscope and LAS-AF (Leica Microsystems) and Volocity (Quorum Technologies) software.

Chromatin immunoprecipitation (ChIP)
Cells were fixed with 1% formaldehyde for 8 min at room temperature. Formaldehyde was quenched by addition of 125 mM Glycine and cells were harvested and washed in PBS. Nuclei were isolated using an ultrasound-based nuclei extraction method (NEXSON: Nuclei Extraction by SONication) [49] with some modifications. Briefly, for all cell types, 33 million cells were resuspended in 1.5 ml of Farnham lab buffer (5 mM PIPES pH 8.0, 85 mM KCl, 0.5% IGEPAL, protease inhibitors). Cell suspensions were sonicated in 15 ml polystyrene tubes (BD #352095) using 3 to 4 cycles of 15 sec on: 30 sec off at low intensity in a Bioruptor (Diagenode). After recovery of the NEXSON-isolated nuclei by centrifugation (1000g, 5 min), nuclei were resuspended in 1.5 ml of shearing buffer (10 mM Tris-HCl pH 8.0, 1 mM NaEDTA, 0.1% SDS, protease inhibitors) and sonicated for 25 min, 30 sec on: 30 sec off, at high intensity. Each immunoprecipitation (IP) was carried out using the equivalent of 16 x 10^6 cells as previously described [25]. To map UBTF1 and UBTF2 variants, cDNAs encompassing the complete coding regions (M61726 / M61725) were subcloned into pCDNA3Hygro-3xFLAG-C1 (N. Bisson) and verified by Sanger sequencing. The resulting pC3xFLAG-UBF1 and -UBF2 (lab. stocks #2072, #2073) constructs were used to transfect NIH3T3 or conditional Ubtf (Ubtf fl/fl /ER-Cre +/+ /p53 -/-) MEFs [25] and cultures were selected with Hygromycin. In the case of NIH3T3, pools of positive clones expressing UBTF1 or UBTF2 at sub-endogenous levels were then subjected to parallel ChIP for FLAG-UBTF1/2 (anti-FLAG) and total UBTF. Essentially the same procedure was used for the MEFs except that FLAG-UBF1 and 2 expressing cell clones were first isolated and these were used in ChIP experiments 5 days after Ubtf inactivation by 4-HT addition, to eliminate endogenous UBTF, or after mock inactivation.

ChIP-Seq and data analysis
ChIP DNA samples were quality controlled by qPCR before being sent for library preparation and 50 base single-end sequencing on an Illumina HiSeq 2500 or 4000 (McGill University and Genome Quebec Innovation Centre). Sequence alignment and deconvolution of factor binding profiles to remove sequencing biases (Deconvolution ChIP-Seq, DChIP-Seq) were carried out as previously described [10,25]. The manual for the deconvolution protocol and a corresponding Python script can be found at https://github.com/mariFelix/deconvoNorm. Gaussian curve fitting to transcription factor binding profiles was performed using MagicPlot Pro (Magicplot Systems) on data from the DChIP-Seq BedGraph files. The raw sequence files and the processed deconvolution BedGraphs have been submitted to ArrayExpress under accessions E-MTAB-10433 and E-MTAB-11385.
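
The fits reported here were made in MagicPlot Pro. Purely as an illustration of how a peak position and height can be extracted from a BedGraph profile, the sketch below fits a single Gaussian by least squares on the logarithm of the signal (a log-parabola fit). This is a substituted, simpler technique, not the procedure used in the study: it only suits one isolated peak with strictly positive, low-noise signal and does not reproduce the multi-peak fits. All function names and the synthetic test profile are hypothetical.

```python
import math


def parse_bedgraph(text):
    """Return (bin midpoint, signal) pairs from 4-column BedGraph text."""
    points = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, comments and track/browser header lines.
        if len(fields) != 4 or fields[0] in ("track", "browser") or line.startswith("#"):
            continue
        start, end, value = int(fields[1]), int(fields[2]), float(fields[3])
        points.append(((start + end) / 2.0, value))
    return points


def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x


def fit_gaussian(points):
    """Fit y = A * exp(-(x - mu)^2 / (2 * sigma^2)); return (A, mu, sigma).

    ln(y) is a parabola a + b*u + c*u^2 in the centred, rescaled
    coordinate u, so the fit reduces to linear least squares.
    """
    pts = [(x, y) for x, y in points if y > 0]  # ln(y) needs positive signal
    if len(pts) < 3:
        raise ValueError("need at least three positive data points")
    centre = sum(x for x, _ in pts) / len(pts)
    scale = max(abs(x - centre) for x, _ in pts) or 1.0
    s = [0.0] * 5  # sums of u^k, k = 0..4
    t = [0.0] * 3  # sums of u^k * ln(y), k = 0..2
    for x, y in pts:
        u = (x - centre) / scale
        log_y = math.log(y)
        for k in range(5):
            s[k] += u ** k
        for k in range(3):
            t[k] += (u ** k) * log_y
    a, b, c = solve3([[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]], t)
    if c >= 0:
        raise ValueError("profile is not peak-shaped")
    sigma = scale * math.sqrt(-1.0 / (2.0 * c))
    mu = centre - scale * b / (2.0 * c)
    amplitude = math.exp(a - b * b / (4.0 * c))
    return amplitude, mu, sigma


# Synthetic 10 bp bins drawn from an exact Gaussian (A=50, mu=1000, sigma=40).
demo = "\n".join(
    f"chrUn {s} {s + 10} {50.0 * math.exp(-((s + 5 - 1000) ** 2) / (2 * 40.0 ** 2))!r}"
    for s in range(800, 1200, 10)
)
amplitude, mu, sigma = fit_gaussian(parse_bedgraph(demo))
print(f"A={amplitude:.2f} mu={mu:.1f} sigma={sigma:.1f}")
```

Because the fit is made on the logarithm of the signal, low-signal bins in the tails carry as much weight as the peak. For noisy data a weighted or nonlinear fit would be preferable.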

DNase I accessibility mapping (DNase-Seq)
DNase-Seq analysis of MEFs was carried out following the published protocol [50], the only modifications being adjustment of digestion level to obtain the required fragmentation. DNA samples were quality controlled by qPCR before size selection following the published protocol.

Cell Proliferation assay
Taf1b fl/fl /ER-Cre +/+ /p53 -/- and isogenic control Taf1b wt/wt /ER-Cre +/+ /p53 -/- MEFs were continuously cultured for more than a week prior to assay. Cells were plated at ~500 per well in 96-well plates, treated as indicated above with 50 ng/ml 4-HT for just 4 h and cultured for a further six days. At each timepoint, duplicate wells were treated with Hoechst 33342 (Invitrogen, Thermo Fisher Scientific) for 45 min. Images were acquired using Cytation5 (Cell Imaging Multi-Mode Reader by BioTek) and cell counts for each clone were determined using the Gen5 software. Cells from three Ubtf E210K/E210K MEF clones (#1, #2, #3) and two isogenic Ubtf wt/wt MEF clones (#3, #4) were cultured and cell proliferation analyzed in the same way.
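
One common way to summarize such time-course counts, not necessarily the one used in this study, is a doubling time obtained from a straight-line fit of log2(cell count) against time. The sketch below averages duplicate wells and performs that fit; the counts are made up for illustration and the function name is hypothetical.

```python
import math


def doubling_time(days, counts):
    """Days per population doubling from a least-squares fit of log2(count) vs. time."""
    logs = [math.log2(c) for c in counts]
    n = len(days)
    mean_t = sum(days) / n
    mean_l = sum(logs) / n
    covariance = sum((t - mean_t) * (l - mean_l) for t, l in zip(days, logs))
    variance = sum((t - mean_t) ** 2 for t in days)
    # The slope (covariance / variance) is doublings per day; invert it.
    return variance / covariance


# Made-up duplicate-well counts keyed by day after plating.
wells = {0: (480, 520), 2: (1950, 2050), 4: (7900, 8100), 6: (31000, 33000)}
days = sorted(wells)
mean_counts = [sum(wells[d]) / len(wells[d]) for d in days]

td = doubling_time(days, mean_counts)
print(f"doubling time: {td:.2f} days")
```

The log-linear fit assumes exponential growth over the whole interval, so time points where cultures approach confluence or stop dividing should be excluded.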

Viability assay
Cell viability was assayed using the colorimetric MTT kit for cell survival and proliferation (Millipore-Sigma) and the Trypan Blue exclusion test (Gibco, Thermo Fisher Scientific). Taf1b fl/fl /ER-Cre +/+ /p53 -/- and isogenic control Taf1b wt/wt /ER-Cre +/+ /p53 -/- MEFs were cultured in 96-well plates at a density of 1 to 6 x 10^3 cells/well and were treated with 4-HT as described above. The assay was performed at different time points post 4-HT by adding MTT to culture wells following the manufacturer's instructions and incubating for 5 h at 37°C. MTT absorbance readings were then normalized to cell numbers. For the Trypan Blue exclusion test, quadruplicate cell cultures were stained and manually counted at different time points after 4-HT treatment using a hemacytometer.
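
The two read-outs reduce to simple arithmetic, sketched below under stated assumptions: viability is taken as the percentage of cells that exclude the dye, pooled over replicate counts, and the MTT signal is blank-subtracted before division by cell number. The blank subtraction, the function names and all values are assumptions made for illustration, not details from the study.

```python
def trypan_blue_viability(unstained_counts, stained_counts):
    """Percent viable cells pooled over replicate hemacytometer counts."""
    live = sum(unstained_counts)
    dead = sum(stained_counts)
    return 100.0 * live / (live + dead)


def mtt_per_cell(absorbance, blank, cell_number):
    """Blank-subtracted MTT absorbance normalized to the number of cells."""
    return (absorbance - blank) / cell_number


# Hypothetical quadruplicate counts of unstained (live) and stained (dead) cells.
viability = trypan_blue_viability([90, 95, 85, 90], [10, 5, 15, 10])

# Hypothetical well: absorbance 0.85, blank 0.05, 4000 cells.
signal = mtt_per_cell(0.85, 0.05, 4000)

print(f"viability: {viability:.1f}%  MTT per cell: {signal:.2e}")
```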

Total RNA Extraction and quantification
Cells were trypsinized, counted and total RNA was recovered from 3 x 10^6 cells using 1 ml of Trizol (Invitrogen, Thermo Fisher Scientific) according to the manufacturer's protocol. RNA yields were determined using Qubit RNA BR (Invitrogen, Thermo Fisher Scientific).
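
Because RNA is recovered from a counted number of cells, the yield can be expressed per cell. The following is a small sketch of that conversion, assuming the fluorometer reports a concentration in ng/µl and that the resuspension volume is known; the concentration and volume shown are hypothetical.

```python
def rna_per_cell_pg(concentration_ng_per_ul, volume_ul, cell_number):
    """Total RNA per cell in picograms from a measured concentration and volume."""
    total_ng = concentration_ng_per_ul * volume_ul
    return total_ng * 1000.0 / cell_number  # 1 ng = 1000 pg


# Hypothetical reading: 600 ng/µl in 50 µl, extracted from 3 x 10^6 cells.
per_cell = rna_per_cell_pg(600.0, 50.0, 3_000_000)
print(f"{per_cell:.1f} pg RNA per cell")
```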

Supporting information
S1 Text. The TAF1B gene is essential for mouse development beyond early blastula. (DOCX)
S1 Table. Numbers and genotypes of embryos and pups derived from mating of Taf1b+/- mice. (DOCX)
S1 Fig. Construction and phenotypic effects of the mutant Taf1b alleles. A) Organisation of the first 8 exons of the mouse Taf1b gene (Taf1bwt), and the "flox-neo" insertion, "floxed" and alleles indicating the position of inserted FRT and Lox sites and the inactivated (Taf1bΔ) allele after Lox site recombination to delete exons 4 and 5. The positions of genotyping primers A to D are also indicated. B) Examples of mouse PCR genotyping. C) Alignment of the N-terminal sequence of wild type TAF1B with the predicted residual TAF1B peptide encoded by the Taf1bΔ allele. D) Typical images of mouse embryos at 3.5 dpc derived from Taf1b wt/Δ mouse crosses. The corresponding genotypes and numbers of embryos in each class are indicated, see also S1 Table.
examples of UBTF and TAF1B enrichment profiles over the 47S promoter region before and after Taf1b inactivation are shown, (dark blue line), and the best Gaussian peak fits to these profiles (dashed red line). In the case of TAF1B the profile closely followed a single Gaussian peak from which both the position and relative occupancy were determined. Since UBTF was present not only at the promoter but also over the adjacent regions, curve fits were made using three Gaussian peaks, and the central one used to estimate relative occupancy.
and Flag-UBTF2 enrichment profiles respectively over the 47S and Spacer promoter regions, (dark blue line). The best Gaussian peak fits to these profiles are shown (dashed red line), as are the individual Gaussian peaks used to estimate relative promoter occupancy. Since UBTF was present not only at each promoter but also over adjacent regions, curve fits were made using three, or in the case of the Spacer promoter four, Gaussian peaks. (TIF)
S11 Fig. A) Expression of 3Flag-UBTF1 and 2 and endogenous UBTF in conditional MEFs before and after UBTF deletion as revealed by Western blot using anti-Flag and anti-UBTF antibodies. B) D-ChIP mapping of UBTF forms across the full rDNA transcription unit as in
Sequence alignment of the HMGboxes 1 and 2 of human and mouse UBTF1 (NM_014233-2, NP_035681) with HMGbox2 of Xenopus laevis UBTF1a (CAA42523.1) and the HMGbox of SOX2 (P48431). The positions of the predicted DNA intercalating residue, the E210K mutation and the adjacent conserved basic DNA contacting residue are indicated as are the positions of the α-helical segments. B) Comparative molecular modelling of UBTF HMGbox2 using as templates the structures 1k99 (human UBTF HMGbox1) and 6hb4 (human mitochondrial transcription factor A, TFAM). The two predicted structures were generated by SWISS-MODEL [51] and are shown individually and as an aligned overlay generated in ChimeraX-1.1.1 [52]. Comparison of these structures using the Matchmaker routine in ChimeraX-1.1.1 revealed an RMSD of 1.215 Å over 41 of 72 alpha-carbons, including those of helix 1 affected by the E210K mutation. C) The predicted positions and orientations of the E210 and K210 residues within the HMGbox2 of UBTF1 are shown relative to the adjacent conserved basic residue at position 211, which is a lysine in UBTF/UBTF. The likely other DNA minor groove contacting residues K198 and K200 are also shown. D) The predicted surface electrostatic potential of the wild type, left, and the mutant, right, HMGbox2. Blue indicates a positive and red a negative potential. Position of changes in surface potential due to the E210K mutation are enclosed by an ellipse. (TIF)
S14 Fig. A) RNA metabolic pulse labelling to reveal 47S pre-rRNA synthesis and processing products in Ubtf E210K/E210K knock-in and wild type Ubtf wt/wt MEFs. Gel fractionation of RNA after increasing labelling times is shown for two individual (numbered) MEF isolates. B) DNA base sequence of the differentially spliced region of mouse Ubtf gene showing coding exon 6, the differentially spliced coding exon 7 and coding exon 8 in black and the intervening introns in red (taken from GRCm38:11:102303960:102320342). The position of the G>A gene mutation, the cause of the E210K change in the UBTF protein, is indicated as are the potential splice branch sites in the intervening introns that most closely fit the yTnAy consensus [53].

SIB, University of Zürich) for making his computing facilities and advice available to us. We further wish to acknowledge the help provided by Colyn Crane-Robinson in interpreting the structural and functional implications of the E210K HMGbox mutation.

Author Contributions
Conceptualization: Tom Moss.