The transcription and export complex THO/TREX contributes to transcription termination in plants

Transcription termination has important regulatory functions, impacting mRNA stability, localization and translation potential. Failure to terminate transcription appropriately can also lead to read-through transcription and the synthesis of antisense RNAs, which can have a profound impact on gene expression. The Transcription-Export (THO/TREX) protein complex plays an important role in coupling transcription with splicing and export of mRNA. However, little is known about the role of the THO/TREX complex in the control of transcription termination. In this work, we show that two proteins of the THO/TREX complex, namely TREX COMPONENT 1 (TEX1 or THO3) and HYPER RECOMBINATION1 (HPR1 or THO1), contribute to correct transcription termination at several loci in Arabidopsis thaliana. We first demonstrate this by showing defective termination in tex1 and hpr1 mutants at the nopaline synthase (NOS) terminator present in a T-DNA inserted between exons 1 and 3 of the PHO1 locus in the pho1-7 mutant. Read-through transcription beyond the NOS terminator and splicing-out of the T-DNA resulted in the generation of a near full-length PHO1 mRNA (minus exon 2) in the tex1 pho1-7 and hpr1 pho1-7 double mutants, with enhanced production of a truncated PHO1 protein that retained phosphate export activity. Consequently, the strong reduction of shoot growth associated with the severe phosphate deficiency of the pho1-7 mutant was alleviated in the tex1 pho1-7 and hpr1 pho1-7 double mutants. Additionally, we show that RNA termination defects in tex1 and hpr1 mutants lead to 3'UTR extensions in several endogenous genes. These results demonstrate that the THO/TREX complex contributes to the regulation of transcription termination.


Introduction
In eukaryotes, mRNAs are generated by several dynamic and coordinated processes, including transcriptional initiation, elongation and termination, as well as splicing and nuclear export. Failure in any of these processes has a profound impact on the level and identity of transcripts [1][2][3]. These steps are sequentially orchestrated by a multitude of RNA-binding protein complexes that associate co-transcriptionally with the nascent RNA [4]. For example, the protein complex required for transcription termination cleaves the pre-mRNA close to RNA polymerase II (RNAPII) and adds a poly(A) tail to the 3' end of the released RNA [3]. Pre-mRNA cleavage also exposes the 5' end of the downstream RNA, which remains attached to RNAPII, to 5'-3' exonucleases; degradation of this RNA leads to transcription termination [5].
Cleavage and polyadenylation define the site of transcription termination at a given locus and have a decisive role in regulating gene expression, as they can influence the stability and translation potential of the RNA through the inclusion of regulatory sequence elements [6]. Moreover, transcription termination avoids interference with the transcription of downstream genes and facilitates RNAPII recycling [7]. It also prevents the synthesis of antisense RNAs, which can have a severe effect on RNA production and overall gene expression [8]. Transcription termination additionally regulates the synthesis of chimeric transcripts formed by the joining of two neighboring genes on the same chromosomal strand [8]. Despite its importance, the molecular mechanisms that regulate RNA termination remain relatively poorly understood.
After transcription termination, the nascent RNA is assembled into a ribonucleoprotein complex and delivered for export to the cytosol. As other steps in RNA biogenesis are closely coupled in a sequential manner, transcription termination is likely to be associated with the nuclear export of RNAs. The TREX (TRanscription-EXport) protein complex has emerged as an important component coupling transcription with RNA processing and export [9]. In metazoans, TREX consists of the THO core complex, which includes THO1/HPR1, THO2, THO3/TEX1, THO5, THO6 and THO7 [10]. The proteins that associate with the THO core components to form the TREX complex include the RNA helicase and splicing factor DDX39B (SUB2 in yeast) as well as the RNA export adaptor protein ALY (YRA1 in yeast) [10]. TREX is co-transcriptionally recruited to the nascent mRNA and regulates splicing, elongation and export [11]. Moreover, the THO/TREX complex is required for genetic stability, as it prevents the formation of DNA:RNA hybrids that impair transcription and are responsible for the genetic instability phenotypes observed in THO/TREX mutants [12]. The genes encoding the THO core components are conserved in plants, including the model plant Arabidopsis thaliana [13,14]. A. thaliana mutants defective in THO components show a wide range of phenotypes, from no obvious alteration or relatively mild phenotypes to lethality, suggesting overlapping but distinct functions of these components [14][15][16]. A. thaliana mutants in THO components, including HPR1, THO2 and TEX1, show defects in small RNA biogenesis, mRNA elongation, splicing and export, but no defect in mRNA termination has been reported [14][15][16][17][18][19][20].
In this work, we explored the regulatory function of two components of the THO/TREX complex, namely TEX1 and HPR1, in transcription termination in A. thaliana. We show that the tex1 and hpr1 mutants are defective in RNA termination at the Nopaline Synthase (NOS) terminator present on a T-DNA inserted in the PHO1 locus. Additionally, genome-wide analysis of mRNAs revealed RNAPII termination defects in tex1 and hpr1 mutants at several loci, leading to 3'UTR extensions.

A forward genetic screen using the T-DNA insertion mutant pho1-7 for reversion of the growth phenotype identified the TEX1 gene
The PHO1 gene encodes an inorganic phosphate (Pi) exporter involved in loading Pi into the root xylem for its transfer to the shoot [21][22][23][24]. Consequently, pho1 mutants in both A. thaliana and rice have reduced shoot Pi contents and show all the symptoms associated with Pi deficiency, including strongly reduced shoot growth and the expression of numerous genes associated with Pi deficiency [23,25,26]. However, it has previously been shown that low shoot Pi content can be dissociated from its major effects on growth and other responses normally associated with Pi deficiency through the modulation of PHO1 expression or activity [25,27]. We thus used the pho1 mutant as a tool in a forward genetic screen to identify mutants that restore pho1 shoot growth to wild-type (Col-0) levels while maintaining low shoot Pi contents. We performed ethyl methanesulfonate (EMS) mutagenesis on seeds of the pho1-7 mutant, derived from the SALK line 119520, which contains a single T-DNA inserted in the PHO1 gene between the first and third exons, with the second exon deleted from the genome (S1 Text). Screening of 10,300 mutagenized pho1-7 plants grown in soil for improved rosette growth led to the isolation of a suppressor mutant that had rosette growth similar to Col-0 while maintaining a low shoot Pi content similar to pho1-7 (Fig 1A and 1B) (see Materials and methods for further details). Both pho1-7 and the suppressor mutant maintained the kanamycin resistance associated with the T-DNA insertion in PHO1.
Mapping-by-sequencing revealed a C116T mutation in the TEX1 gene of the pho1-7 suppressor mutant, converting serine 39 to phenylalanine in the TEX1 protein. Transformation of the pho1-7 suppressor with the TEX1 gene restored a pho1-like phenotype, confirming that the mutation in TEX1 was the causal mutation for the restoration of pho1-7 shoot growth (Fig 1A and 1B). Furthermore, crossing the T-DNA allele tex1-4 (SALK_100012) to pho1-7 also suppressed the pho1-7 shoot growth phenotype (Fig 2A), further confirming that mutation of TEX1 is responsible for the suppression of the shoot growth phenotype in the pho1-7 suppressor mutant. We therefore named this new S39F allele of TEX1 tex1-6. In agreement with previous reports, the TEX1 protein was localized to the nucleus [19] (S1A Fig). A TEX1 promoter-GUS fusion showed that TEX1 is expressed in roots, cotyledons and rosettes (S1B Fig).
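The codon assignment of the C116T change can be checked with a few lines of arithmetic. The sketch below assumes that nucleotide 116 is numbered from the A of the TEX1 start codon (position 1); the helper names are illustrative, and the actual TEX1 codon at position 39 is not given in this section. Under that assumption, a C-to-T change at the second position of any serine codon (TCN) can only yield phenylalanine (TTT, TTC) or leucine (TTA, TTG), consistent with the serine-to-phenylalanine substitution; a proline codon (CCN) cannot arise from this change.

```python
# Standard genetic code in TCAG order (first, second, third codon base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def locate_codon(cds_position):
    """Return (codon number, position within codon), both 1-based."""
    return (cds_position - 1) // 3 + 1, (cds_position - 1) % 3 + 1


def substitution_outcomes(ref_base, alt_base, position_in_codon, ref_aa):
    """All (reference codon, mutant codon, mutant amino acid) for codons of
    ref_aa that carry ref_base at position_in_codon."""
    outcomes = []
    for codon, aa in CODON_TABLE.items():
        if aa != ref_aa or codon[position_in_codon - 1] != ref_base:
            continue
        mutant = codon[:position_in_codon - 1] + alt_base + codon[position_in_codon:]
        outcomes.append((codon, mutant, CODON_TABLE[mutant]))
    return outcomes


codon_number, position = locate_codon(116)
print(codon_number, position)  # 39 2
for ref, alt, aa in substitution_outcomes("C", "T", position, "S"):
    print(f"{ref} -> {alt}: S -> {aa}")
```

Enumerating every serine codon, rather than assuming one, keeps the check valid without knowing the TEX1 sequence.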
The pho1-7 tex1-6 mutant shows Col-0-like shoot growth while maintaining a low Pi content that is only slightly higher than that of the parental pho1-7 (Fig 1A and 1B). A key molecular response of pho1 mutants is the appearance in the shoots of gene expression and lipid profiles associated with strong Pi deficiency [25]. To determine whether the tex1-6 mutation can also suppress the induction of Pi starvation responses (PSR) in the rosettes of pho1-7 tex1-6, we measured the expression of PSR genes by quantitative RT-PCR (qRT-PCR). pho1-7 tex1-6 shoots showed an expression profile of PSR genes comparable to that of Pi-sufficient Col-0 plants (S2A Fig). Additionally, lipid analysis showed a decrease in phospholipids and an increase in galactolipids in pho1-7 mutants, as expected for Pi-deficient plants, whereas pho1-7 tex1-6 plants showed lipid profiles similar to Col-0 plants (S2B Fig). These results confirm that the tex1-6 mutation suppressed both the morphological and the molecular responses to Pi deficiency displayed by the pho1-7 mutant.

PLOS GENETICS
The THO/TREX complex participates in transcription termination

Mutation of TEX1 in pho1-7 resulted in the synthesis of a truncated PHO1 protein
To determine whether the tex1 mutation also suppresses the growth phenotype associated with other pho1 alleles generated by EMS mutagenesis, a pho1-4 tex1-4 double mutant was generated. Surprisingly, the pho1-4 tex1-4 double mutant showed only a minor improvement in shoot growth and maintained a low shoot Pi content (Fig 2A and 2B). To understand how the tex1 mutation can restore pho1-7 shoot growth, we performed a detailed analysis of the transcripts produced at the PHO1 locus in the pho1-7 tex1-4 mutant. Interestingly, we identified a truncated PHO1 Δ249-342 transcript that lacked only the second exon, suggesting that the T-DNA is spliced out of the mature mRNA (Fig 2C, S1 Text). This mRNA is in frame and results in the production of a truncated PHO1 Δ84-114 protein (S1 Text). Western blot experiments confirmed the presence of the truncated PHO1 Δ84-114 protein in both pho1-7 and pho1-7 tex1-4 roots, with a strong increase in the pho1-7 tex1-4 double mutant compared to pho1-7 (Fig 2D). This increase in protein quantity can be attributed, at least in part, to increased expression of the PHO1 Δ249-342 RNA in the pho1-7 tex1-4 double mutant (Fig 2E). We hypothesized that the PHO1 Δ84-114 protein variant was at least partially active as a Pi exporter and that its increased expression in pho1-7 tex1-4 partially restored PHO1 function, resulting in an improvement of the shoot growth phenotype. We tested this hypothesis by expressing the PHO1 Δ84-114 variant and wild-type PHO1, each fused to GFP, under the control of the PHO1 promoter in the pho1-4 null mutant. As expected, expression of wild-type PHO1 fully complemented the growth and Pi content of the pho1-4 mutant to Col-0 levels (Fig 3A and 3B). In contrast, plants expressing the PHO1 Δ84-114 variant restored only the shoot growth phenotype while maintaining low Pi contents comparable to pho1-7 tex1-4 plants (Fig 3A and 3B). Confocal analysis of roots showed that the PHO1 Δ84-114 variant was localized similarly to wild-type PHO1 (Fig 3C), which was previously shown to be primarily in the Golgi and trans-Golgi network (TGN) [21]. Furthermore, transient expression of the PHO1 Δ84-114-GFP fusion in Nicotiana benthamiana leaves led to Pi export to the apoplastic space, demonstrating that the protein was competent in Pi export (Fig 3D) [21]. Collectively, these results indicate that the increased expression of PHO1 Δ249-342 RNA is responsible for the improved shoot growth in pho1-7 tex1-4 double mutants.
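The relationship between the nucleotide and amino acid deletion labels can be checked in the same way. Assuming 1-based, inclusive coordinates counted from the first nucleotide of the PHO1 start codon, codons 84-114 (31 amino acids) correspond to CDS nucleotides 250-342 (93 nt), an in-frame deletion. A strictly inclusive reading of 249-342 would span 94 nt and shift the frame, so the transcript label presumably marks the boundary differently (for example, as the last retained nucleotide of exon 1); this convention is an assumption, not something stated in the text.

```python
def codons_to_cds_interval(first_codon, last_codon):
    """1-based, inclusive CDS nucleotide interval covered by codons first..last."""
    return 3 * (first_codon - 1) + 1, 3 * last_codon


def is_in_frame(start, end):
    """True if deleting nucleotides start..end (inclusive) keeps the reading frame."""
    return (end - start + 1) % 3 == 0


start, end = codons_to_cds_interval(84, 114)
print(start, end, end - start + 1)  # 250 342 93
print(is_in_frame(start, end))      # True
print(is_in_frame(249, 342))        # False (94 nt)
```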

Mutation in HPR1, another component of the THO/TREX complex, but not mutations affecting tasiRNA biogenesis, can also restore pho1-7 shoot growth
To investigate whether TEX1 exerts its function in the restoration of PHO1 expression via the THO/TREX complex, we crossed pho1-7 to hpr1-6, a mutant in another component of the THO/TREX core complex [14]. The pho1-7 hpr1-6 double mutant showed partially restored shoot growth while maintaining relatively low Pi contents, comparable to pho1-7 tex1-6 but slightly higher than pho1-7 (Fig 1B and Fig 4A and 4B).
The THO/TREX complex has previously been shown to participate in the biogenesis of trans-acting small interfering RNAs (tasiRNAs) and other small RNAs (siRNAs and miRNAs) that can affect levels of transcription through DNA methylation and other, as yet unknown, mechanisms. We explored the possibility that changes in tasiRNA biogenesis may be responsible for the growth phenotype associated with the pho1-7 tex1-4 and pho1-7 hpr1-6 mutants. We crossed pho1-7 with mutants in two genes encoding core components required for tasiRNA biogenesis, namely rdr6-11 and sgs3-1 [28,29]. The pho1-7 rdr6-11 and pho1-7 sgs3-1 double mutants had shoot growth similar to the parental pho1-7 (Fig 4C and 4D). Together, these results indicate that disruption of distinct genes of the THO/TREX complex, namely TEX1 and HPR1, can revert the growth phenotype of the pho1-7 mutant and that tasiRNA biogenesis is not implicated in this process.

Impaired mRNA termination at the NOS terminator restores expression of truncated PHO1 in pho1-7 tex1-4 mutant
To elucidate how mutations in TEX1 and HPR1 lead to changes in transcription at the PHO1 locus, we performed paired-end RNA sequencing on roots of Col-0, pho1-7, pho1-7 tex1-4 and pho1-7 hpr1-6. We first mapped the RNA reads of Col-0 and pho1-7 against the Col-0 genome and confirmed the absence of the second exon of PHO1 in the genome of pho1-7 (S3A Fig). To understand the transcription dynamics at the PHO1 locus in the various mutants derived from pho1-7, we mapped the RNA sequencing reads to the pho1-7 genomic configuration, with the T-DNA insertion and exon 2 deletion. Detailed analysis of mRNAs from the PHO1 locus in pho1-7 mutants indicated that transcription was initiated at the PHO1 promoter and terminated at two different locations, namely at the NOS terminator inside the T-DNA and at the endogenous PHO1 transcription termination site (Fig 5A-5C). Using RT-PCR with various primer combinations, we detected four types of transcripts in the pho1-7 mutant: one unspliced and two spliced mRNAs ending at the NOS terminator, and one long transcript ending at the endogenous PHO1 terminator in which PHO1 exons 1 and 3 were appropriately spliced, removing the T-DNA and generating the PHO1 Δ249-342 RNA variant (Fig 5A-5D). While the four transcripts were expressed at similarly low levels in the pho1-7 mutant, the PHO1 Δ249-342 RNA variant was the majority transcript in the pho1-7 tex1-4 and pho1-7 hpr1-6 double mutants (Fig 5C and 5D). PacBio sequencing of full-length mRNAs produced at the PHO1 locus in Col-0, pho1-7, pho1-7 tex1-4 and pho1-7 hpr1-6 supported these conclusions and showed that essentially two classes of transcripts are produced in the various mutants: transcripts that include the 5' portion of the T-DNA and end at the NOS terminator, and transcripts that end at the PHO1 terminator and exclude the complete T-DNA (S3B Fig).
While the majority of transcripts in the pho1-7 mutant were of the first type, the pho1-7 tex1-4 and pho1-7 hpr1-6 mutants mostly expressed the second type. This pattern of transcripts is not consistent with alternative splicing but rather indicates that transcription termination at the NOS terminator was suppressed in the pho1-7 tex1-4 and pho1-7 hpr1-6 mutants, enabling the transcription machinery to reach the PHO1 terminator and generate a transcript in which exon 1 was spliced to exon 3, removing the T-DNA. Chromatin immunoprecipitation using an antibody against elongating RNAPII (phosphorylated at Ser2 of the C-terminal domain) followed by qPCR showed that RNAPII occupancy in the region of the PHO1 locus downstream of the T-DNA insertion was significantly reduced in the pho1-7 mutant but increased in the pho1-7 tex1-4 mutant (Fig 5E), consistent with increased transcriptional read-through past the T-DNA in the pho1-7 tex1-4 double mutant (Fig 5D).
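The distinction between the two transcript classes can be sketched as a simple classifier over aligned full-length reads. This is a hypothetical illustration, not the analysis pipeline used in this study: each read is reduced to the 3' end coordinate of its alignment on the pho1-7 reference and a flag indicating whether any aligned block overlaps the T-DNA. The coordinates, the 50 nt tolerance window and the class labels are assumptions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Read:
    end: int          # 3' end of the alignment on the pho1-7 reference (hypothetical coordinate)
    spans_tdna: bool  # True if any aligned block overlaps the T-DNA interval


def classify(read, nos_site, pho1_site, window=50):
    """Assign a read to one of the two transcript classes, or 'other'."""
    if abs(read.end - nos_site) <= window:
        return "NOS-terminated"
    if abs(read.end - pho1_site) <= window and not read.spans_tdna:
        return "PHO1-terminated"
    return "other"


def class_fractions(reads, nos_site, pho1_site, window=50):
    """Fraction of reads falling into each class."""
    counts = Counter(classify(r, nos_site, pho1_site, window) for r in reads)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Toy example with hypothetical coordinates (not data from this study).
NOS_SITE, PHO1_SITE = 1_000, 9_000
reads = [Read(1_010, True), Read(8_990, False), Read(9_020, False), Read(5_000, True)]
print(class_fractions(reads, NOS_SITE, PHO1_SITE))
# {'NOS-terminated': 0.25, 'PHO1-terminated': 0.5, 'other': 0.25}
```

Reads that reach the PHO1 terminator but still contain T-DNA sequence fall into "other", which keeps read-through transcripts that were not spliced separate from the spliced PHO1 Δ249-342 class.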

THO/TREX complex is required for the correct termination of mRNAs in endogenous loci
To determine whether TEX1 and HPR1 contribute to the termination of RNA transcription at endogenous genes, RNA sequencing data generated from roots of Col-0, pho1-7, pho1-7 tex1-4 and pho1-7 hpr1-6 grown in soil for 3 weeks were analyzed for the presence of 3'UTR extensions. We observed significant changes in 3'UTR extensions in the pho1-7 tex1-4 and pho1-7 hpr1-6 mutants compared to both Col-0 and pho1-7. Two examples of such 3'UTR extensions are shown in Fig 6A. While 3'UTR extensions were observed in only 3 transcripts in pho1-7, 72 and 51 transcripts showed 3'UTR extensions in the pho1-7 tex1-4 and pho1-7 hpr1-6 mutants, respectively, but not in the pho1-7 parent, with a subset of 38 transcripts shared between pho1-7 tex1-4 and pho1-7 hpr1-6 but absent from pho1-7 (Fig 6B). These results indicate that, while a large proportion of the genes with 3'UTR extensions in the tex1-4 background respond similarly in the hpr1-6 background, the effects of these two mutations on RNA termination are not completely redundant.
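The overlap counts above can be expressed as proportions. Using only the numbers reported in the text (72 transcripts in pho1-7 tex1-4, 51 in pho1-7 hpr1-6, 38 shared), about 53% of the tex1-4 set and 75% of the hpr1-6 set are shared, out of a union of 85 transcripts. The function name is illustrative; a significance test of the overlap would also require the number of genes tested, which is not given here.

```python
def overlap_summary(n_a, n_b, n_shared):
    """Summarize the overlap between two gene sets from their sizes and shared count."""
    union = n_a + n_b - n_shared
    return {
        "union": union,
        "shared_fraction_of_a": n_shared / n_a,
        "shared_fraction_of_b": n_shared / n_b,
        "jaccard": n_shared / union,
    }


# Counts reported above: 72 (pho1-7 tex1-4), 51 (pho1-7 hpr1-6), 38 shared.
summary = overlap_summary(72, 51, 38)
for key, value in summary.items():
    print(f"{key}: {value:.3f}" if isinstance(value, float) else f"{key}: {value}")
```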
We reasoned that if the regulation of RNA termination by the THO/TREX complex is generic and robust, changes in 3'UTR extensions should be relatively insensitive to growth conditions. To assess the robustness of 3'UTR extensions, we analyzed an independent RNA sequencing dataset generated from roots of Col-0 and the tex1-4 mutant grown in vitro for 7 days on MS medium supplemented with sucrose. A total of 77 transcripts showed 3'UTR extensions in the tex1-4 mutant relative to Col-0, 48 of which were also found among the 3'UTR extensions of the pho1-7 tex1-4 mutant grown for 3 weeks in soil (S4 Fig), indicating that a large proportion of the transcripts with 3'UTR extensions in the tex1-4 genetic background are insensitive to major changes in growth conditions. 3'UTR extensions in a set of genes identified by RNA sequencing were first validated by qRT-PCR (Fig 6C). A transient assay was also developed in which the 500 bp sequence downstream of the stop codon (which includes the 3'UTR) of two genes, AT1G76560 and AT1G03160, was fused after the stop codon of the nano-luciferase gene. These constructs were expressed in Arabidopsis mesophyll protoplasts, and the proportion of transcripts with a 3' end extending beyond the main transcription termination site was determined by qRT-PCR 16 hours after transformation. For both constructs, the ratio of long-to-short transcripts increased by approximately 50-60% in the hpr1-6 and tex1-4 mutants compared to Col-0 (Fig 6D), further supporting the involvement of both TEX1 and HPR1 in mRNA termination.
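A long-to-short ratio from qRT-PCR, and its percentage change between genotypes, can be computed as below. This is a sketch under stated assumptions, not the quantification used in this study: it assumes one amplicon downstream of the main termination site (extended transcripts only) and one reference amplicon upstream of it, both amplifying with the same per-cycle efficiency. The Ct values are hypothetical and chosen only to illustrate an increase of roughly 50%.

```python
def extension_ratio(ct_extended, ct_reference, efficiency=2.0):
    """Abundance of the extended-3'-end amplicon relative to the reference amplicon.

    Assumes both primer pairs amplify with the same per-cycle efficiency
    (2.0 = perfect doubling), so abundance scales as efficiency ** -Ct.
    """
    return efficiency ** (ct_reference - ct_extended)


def percent_change(ratio_mutant, ratio_control):
    """Percentage change of the mutant ratio relative to the control ratio."""
    return 100.0 * (ratio_mutant / ratio_control - 1.0)


# Hypothetical Ct values for illustration only (not measurements from this study).
col0 = extension_ratio(ct_extended=28.0, ct_reference=22.0)
mutant = extension_ratio(ct_extended=27.4, ct_reference=22.0)
print(f"{percent_change(mutant, col0):.1f}% increase")  # 51.6% increase
```

If the two primer pairs differ in efficiency, each amplicon would need its own efficiency term (for example, estimated from a dilution series) rather than the shared value assumed here.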
GO term enrichment analysis showed that transcripts with impaired RNA termination in the tex1-4, pho1-7 tex1-4 and pho1-7 hpr1-6 mutants did not belong to any particular functional category (S5 Fig). We therefore examined the sequences of RNA termination sites for mechanistic clues to the impaired RNA termination and 3'UTR extension in tex1 and hpr1 mutants. RNA termination sites have well-defined characteristic motifs, including an A-rich region at approximately -20 nucleotides (position -1 being defined as the last nucleotide before the poly(A) tail), which includes the AAUAAA-like sequence. This is followed by a U-rich region at -7 nucleotides and a second U-rich peak at approximately +25 nucleotides [30][31][32]. The distribution of nucleotides upstream and downstream of the 3' cleavage site for genes showing a 3' extension in the tex1-4 and hpr1-6 mutant backgrounds did not differ significantly from this general distribution (Fig 7A). Additionally, we analyzed the differences in the -20 polyadenylation signal between genes with 3' extensions and all Arabidopsis genes. Although the difference was not statistically significant (p = 0.22), the canonical AAUAAA sequence appeared to be less represented in the group of genes with 3' extensions than in all genes (Fig 7B). We used the transient expression assay to test the effect of changing the single AAUGAA polyadenylation signal present in gene AT1G76560 to the canonical AAUAAA. While the optimized polyadenylation signal led to a small decrease in 3'UTR extensions relative to the wild-type sequence in Col-0, the ratio of long-to-short transcripts was still increased by approximately 50% in the hpr1-6 and tex1-4 mutants compared to Col-0 (Fig 7C). Altogether, these results indicate that, while HPR1 and TEX1 contribute to mRNA termination, they do not appear to act primarily via the nature of the -20 polyadenylation signal.
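The positional nucleotide analysis and the -20 polyadenylation signal check described above can be sketched as follows. The sketch assumes sense-strand sequences written 5' to 3' and a 0-based index pointing to position -1 of each cleavage site; there is no position 0, so the first downstream nucleotide is +1. The function names, the default 50 nt flanks and the -35 to -10 search window for the hexamer are illustrative choices, not the authors' pipeline.

```python
from collections import Counter


def relative_positions(upstream, downstream):
    """Position labels -upstream..-1, +1..+downstream (no position 0:
    -1 is the last nucleotide before the poly(A) tail)."""
    return list(range(-upstream, 0)) + list(range(1, downstream + 1))


def extract_window(seq, cleavage_index, upstream, downstream):
    """Slice of seq covering -upstream..+downstream around a cleavage site.

    cleavage_index is the 0-based index of position -1. Returns None if the
    window runs off either end of seq.
    """
    start = cleavage_index - upstream + 1
    end = cleavage_index + downstream + 1
    if start < 0 or end > len(seq):
        return None
    return seq[start:end]


def to_rna(seq):
    return seq.upper().replace("T", "U")


def nucleotide_profile(sites, upstream=50, downstream=50):
    """Per-position A/C/G/U frequencies over (sequence, cleavage_index) pairs."""
    positions = relative_positions(upstream, downstream)
    counts = {pos: Counter() for pos in positions}
    for seq, idx in sites:
        window = extract_window(to_rna(seq), idx, upstream, downstream)
        if window is None:
            continue  # skip sites too close to the sequence ends
        for pos, base in zip(positions, window):
            counts[pos][base] += 1
    profile = {}
    for pos, counter in counts.items():
        total = sum(counter.values())
        profile[pos] = {b: counter[b] / total if total else 0.0 for b in "ACGU"}
    return profile


def has_nue(seq, cleavage_index, hexamer="AAUAAA", window=(-35, -10)):
    """True if the hexamer lies entirely within the window of relative positions."""
    lo, hi = window  # both negative, lo < hi
    region = extract_window(to_rna(seq), cleavage_index, -lo, 0)  # positions lo..-1
    if region is None:
        return False
    return hexamer in region[: hi - lo + 1]  # keep positions lo..hi only


# Toy sequence: AATAAA at positions -20..-15, T-rich (U-rich) downstream.
seq = "C" * 20 + "AATAAA" + "G" * 14 + "T" * 10
site = 39  # 0-based index of position -1 (the last G)
print(has_nue(seq, site))                                         # True
print(nucleotide_profile([(seq, site)], upstream=5, downstream=3)[-1])
```

Profiles for a gene subset and for all genes can then be compared position by position, and `has_nue` can be run with alternative hexamers such as AAUGAA to tabulate signal variants.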

Discussion
The contribution of the THO/TREX complex to mRNA biogenesis has been studied mostly for splicing and export [9]. In contrast, the direct implication of the THO/TREX complex in mRNA termination is poorly defined and has been reported in only a few studies in human cells [33,34]. THO5 was shown to interact with both CPSF100 and CFIm, two proteins involved in 3' end processing and polyadenylation site choice, and differences in mRNA polyadenylation were observed in human cells depleted of THO5 [33,34]. Recruitment of the cyclin-dependent kinase CDK11 by the THO/TREX complex was shown to be essential for the phosphorylation of RNAPII and the proper 3' end processing of human immunodeficiency virus RNA, although it is unknown whether this interaction is also mediated via THO5 [35].
Although considerably less is known about mRNA biogenesis in plants than in yeast and metazoans, the proteins forming the THO core complex are also found in plants, suggesting a conserved mode of action [13]. Indeed, A. thaliana mutants in the HPR1 and TEX1 genes show defects in mRNA splicing and export [18,19,36]. However, most of our knowledge of the plant THO core complex relates to its role in small RNA biogenesis. A. thaliana mutants in HPR1, TEX1, THO2 or THO6 are defective in the synthesis of one or several classes of small RNAs, including miRNAs, siRNAs and tasiRNAs, although the mechanism behind this defect is currently unknown [14][15][16][17]. Some of the phenotypes associated with the tex1 and hpr1 mutants in A. thaliana, such as the repression of female germline specification or the reduction in scopolin biosynthesis under abiotic stress, are likely caused by reduced tasiRNA or miRNA production [37,38].

This work highlights the contribution of both TEX1 and HPR1 to mRNA termination. Mutations in these genes in the pho1-7 background, which carries a T-DNA insertion between PHO1 exons 1 and 3, led to the suppression of mRNA termination at the NOS terminator present in the T-DNA, followed by transcriptional read-through and splicing-out of the T-DNA, resulting in the generation of a near full-length PHO1 mRNA (minus exon 2). The truncated PHO1 protein generated from this mRNA retained some Pi export activity, resulting in a reversion of the growth phenotype associated with the severe Pi deficiency of the pho1-7 mutant.
Beyond their effect on the NOS terminator, mutations in the HPR1 and TEX1 genes also affected mRNA 3' processing at endogenous loci, leading to 3'UTR extensions. Interestingly, the majority of loci with impaired transcription termination were shared between tex1 and hpr1 mutants, suggesting that both proteins have overlapping functions in RNA termination. Analysis of the nucleotides surrounding the 3' cleavage site did not reveal a significant difference between genes showing a 3' extension in the tex1-4 and hpr1-6 mutants and the Arabidopsis genome as a whole. The best-defined sequence element involved in the control of the mRNA polyadenylation site is an A-rich region at approximately -20 nucleotides, defined as the near upstream element (NUE). Although the canonical NUE sequence AAUAAA is found very frequently in animal genomes, NUE sequences are more heterogeneous in plants [30,39]. An apparent, but not statistically significant, deviation from AAUAAA was observed in genes showing a 3'UTR extension in hpr1-6 and tex1-4. Furthermore, optimization of the polyadenylation signal of the gene AT1G76560 from AAGAAA to AAUAAA did not affect the extent of 3'UTR extension in the tex1-4 and hpr1-6 mutants relative to Col-0. The relatively low number of genes showing 3'UTR extensions likely limits our ability to identify, through a bioinformatic approach, the nucleotide features that are important for 3'UTR extension in the tex1-4 and hpr1-6 mutants. A more systematic analysis of the 3'UTRs of genes affected in the tex1-4 and hpr1-6 mutants, such as AT1G76560 and AT1G03160, using the transient assay described in this study may lead to the identification of the cis-elements involved.
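The text reports p = 0.22 for the lower representation of AAUAAA but does not name the statistical test. One way to frame such a comparison, shown here only as a sketch, is a one-sided Fisher's exact test on a 2x2 table: rows are genes with a 3' extension versus all other genes, and columns are genes lacking versus carrying the canonical hexamer in the -20 region. Comparing against genes outside the extended set, rather than all genes, keeps the two groups disjoint. No counts from this study are used; `a`, `b`, `c` and `d` are placeholders for a real hexamer classification.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X is the top-left count under the hypergeometric
    null with all row and column totals fixed. With rows = (3'-extended genes,
    other genes) and columns = (lacking AAUAAA, with AAUAAA), a small p-value
    indicates that extended genes lack the canonical hexamer more often.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denominator = comb(n, col1)
    numerator = 0
    for x in range(a, min(row1, col1) + 1):
        numerator += comb(row1, x) * comb(n - row1, col1 - x)
    return numerator / denominator


# Minimal check on a tiny table: all three "extended" genes lack the hexamer.
print(fisher_one_sided(3, 0, 0, 3))  # 0.05
```

Summing exact integer binomial coefficients before a single division avoids accumulating floating-point error on large gene counts.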
Analysis of the A. tumefaciens NOS transcript revealed two putative NUE sequences, namely AAUAAA and AAUAAU, at positions -135 and -50 nucleotides, which is much further upstream than the usual -20 nucleotides [40]. It is thus likely that other sequences within the NOS terminator play a determinant role in transcription termination. Furthermore, while the dinucleotide located immediately before the cleavage site is typically CA or UA, the NOS terminator carries a GA dinucleotide at this position [40]. Although no detailed functional analysis of the NOS polyadenylation signal has been reported, its unusual structure is likely related to the bacterial origin of the gene. While the NOS terminator is a common feature of many T-DNA vectors, several studies have shown that transgene expression can be considerably enhanced either when the NOS terminator is combined with a second terminator or when it is replaced by the terminator of an endogenous plant gene [41][42][43]. For example, replacing the NOS terminator with an extensin terminator was shown to reduce read-through transcription and improve transgene expression [44]. Altogether, these data suggest that mutations in the HPR1 and TEX1 genes may more prominently affect mRNA 3' processing for genes with unusual or weak polyadenylation signals, such as the one found in the NOS terminator.
The 3'UTRs of mRNAs have important regulatory functions, impacting mRNA stability, localization and translation potential via interactions with numerous RNA-binding proteins as well as miRNAs [45]. Extension of the 3'UTRs of genes in the hpr1-6 and tex1-4 genetic backgrounds may thus impact their expression in numerous ways. In some cases, 3'UTR extension may also disrupt the downstream gene, either through the formation of an antisense RNA with the potential to trigger siRNA-mediated gene silencing or through transcriptional interference via RNAPII collision [46]. The tex1 and hpr1 mutants are known to have multiple phenotypes, ranging from defects in vegetative and reproductive development [18,38] and in responses to both biotic and abiotic stress [36,37] to altered expression of genes encoding acid phosphatases [16] or an ethylene signaling pathway repressor [20]. It would be of interest to determine whether the genes affected by 3'UTR extensions contribute to some extent to these phenotypes.
Further work is necessary to gain detailed mechanistic insight into how mutations in tex1 and hpr1 lead to both the suppression of termination at the NOS terminator and changes in the 3'UTRs of endogenous genes. As part of the TREX complex associated with the mRNA transcription machinery, TEX1 and HPR1 could affect mRNA transcription termination through interactions with proteins more specifically involved in 3' end processing. This would be analogous to the role of THOC5, another component of the TREX complex, in mRNA 3' end processing in mammals via interactions with mRNA cleavage factors, including CPSF100 [33,34]. Since TEX1 and HPR1 have both been implicated in the generation of small RNAs, including tasiRNAs, siRNAs and miRNAs, the contribution of these pathways to mRNA termination should also be examined. Reversion of the pho1-7 growth phenotype could not be reproduced by mutations in the SGS3 and RDR6 genes involved in small RNA biogenesis, in particular of tasiRNAs, indicating that the effects of the hpr1 and tex1 mutations in pho1-7 are unlikely to be caused by changes in tasiRNA generation [14,15,17,18]. siRNAs are associated with DNA methylation, which in turn could impact RNA polymerase activity and mRNA processing, including splicing and termination [47,48]. Although the T-DNA present in the pho1-7 mutant is well transcribed and mediates kanamycin resistance, suggesting that it is unlikely to be strongly methylated, more subtle effects of siRNA-mediated methylation on transcription termination at the NOS terminator and at endogenous loci should be explored.
The N-terminal half of the PHO1 protein harbors an SPX domain that binds inositol polyphosphate with high affinity via conserved tyrosine and lysine residues [49,50]. A PHO1 protein with mutations in these conserved SPX residues is unable to complement the pho1-2 null mutant, indicating that the binding of inositol polyphosphate is important for PHO1 activity in planta [50]. The PHO1 Δ84-114 protein synthesized in the pho1-7 mutant retains the core of the SPX domain and carries only a small deletion at the N-terminal end of the second SPX subdomain (see S1 Text). Heterologous expression of the PHO1 Δ84-114 protein in tobacco leaves led to specific Pi export activity, indicating that the 31 amino acid deletion does not completely inactivate the protein. Considering that pho1-7 tex1-4 and pho1-7 hpr1-6, as well as the pho1-4 null mutant expressing the PHO1 Δ84-114 protein, have low shoot Pi, it is likely that PHO1 Δ84-114 retains some Pi export activity in root xylem parenchyma cells, but lower than that of the wild-type protein. The high level of PHO1 Δ84-114 protein in the pho1-7 tex1-4 mutant cannot simply be explained by the higher expression of the truncated PHO1 Δ249-342 mRNA in the double mutant relative to the pho1-7 parent, since the PHO1 Δ249-342 mRNA level remained lower than that of the full-length PHO1 mRNA in Col-0 plants. PHO1 is known to be degraded via PHO2, a key protein in the Pi-deficiency signaling pathway [51]. Whether the high level of PHO1 Δ84-114 accumulation reflects greater stability of the truncated protein and/or increased translation efficiency of the truncated mRNA remains to be determined.
Uncoupling low leaf Pi from its main effect on shoot growth has previously been reported for plants under-expressing PHO1 via silencing, indicating a role for high root Pi content and PHO1 in modulating the shoot response to Pi deficiency [25]. The improved shoot growth of pho1-7 tex1-4 and pho1-7 hpr1-6 compared to the pho1-7 parent is likely due to the increased expression of the hypomorphic PHO1 Δ84-114 protein and the resulting increase in Pi export activity. However, both hpr1 and tex1 mutants have low expression of the RTE1 gene, which is involved in the repression of the ethylene signaling pathway; this low expression has been linked to increased root-associated acid phosphatase activity and root hair elongation in these mutants, two characteristics that can positively impact Pi acquisition [16,20]. It is thus possible that a small part of the growth improvement observed in tex1-4 pho1-7 and hpr1-6 pho1-7 is also associated with reduced repression of the ethylene pathway.

Plant material and growth conditions
All Arabidopsis plants used in this study, including mutants and transgenic plants, were in the Columbia (Col-0) background. For in vitro experiments, plants were grown on half-strength Murashige and Skoog (MS) salts (Duchefa M0255) containing 1% (w/v) sucrose and 0.7% (w/v) agar. For Pi-deficient medium, MS salts without Pi (Caisson Labs, MSP11) and purified agar (Conda, 1806) were used. Pi buffer (pH 5.7; 93.5% KH2PO4 and 6.5% K2HPO4) was added to obtain different Pi concentrations. In the Pi-deficient media, the Pi buffer was replaced by equimolar amounts of KCl. Plants were also grown either in soil or, for the isolation of roots, in a clay-based substrate (Seramis) supplemented with half-strength MS. Growth chamber conditions were 22˚C, 60% humidity, and a 16 h light/8 h dark photoperiod with 100 μE m−2 s−1 of white light for long days, or a 10 h light/14 h dark photoperiod for short days. The pho1-2, pho1-4, pho1-7 (SALK_119529) and tex1-4 mutants have been described previously [15,22,52], and hpr1-6 (SAIL_1209_F10) is a T-DNA mutant from the SAIL collection. The rdr6-11 mutant is EMS-derived, whereas sgs3-1 is a T-DNA mutant from the SALK collection (SALK_001394); both have been described previously [28].
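As a worked example of the buffer arithmetic, the sketch below splits a target Pi concentration between the two phosphate salts of the pH 5.7 buffer. It assumes that the 93.5%/6.5% split is a molar fraction of total phosphate; the function name and the example concentrations are hypothetical.

```python
# Hypothetical helper: split a target total Pi concentration between the two
# salts of the pH 5.7 Pi buffer (93.5% KH2PO4, 6.5% K2HPO4).
# Assumption: the percentages are molar fractions of total phosphate.
KH2PO4_FRACTION = 0.935
K2HPO4_FRACTION = 0.065


def pi_buffer_components(total_pi_um: float) -> dict:
    """Return the micromolar concentration of each phosphate salt."""
    if total_pi_um < 0:
        raise ValueError("Pi concentration must be non-negative")
    return {
        "KH2PO4_uM": total_pi_um * KH2PO4_FRACTION,
        "K2HPO4_uM": total_pi_um * K2HPO4_FRACTION,
    }


# Example target concentrations (µM) chosen arbitrarily for illustration.
for target in (0, 100, 1000):
    print(target, pi_buffer_components(target))
```

Because both fractions sum to 1, the two salt concentrations always add back up to the requested total Pi.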

pho1 suppressor screen
Ethyl methanesulfonate (EMS) mutagenesis was performed on approximately 20,000 seeds of homozygous pho1-7. Seeds were treated with 0.2% (v/v) EMS in 100 mM phosphate buffer for 8 h and then rinsed 10 times with water. Approximately 10,000 individual M1 plants were grown, and the seeds of their progeny were collected in bulk. Approximately 10,300 M2 plants were grown in soil for 4 weeks under a 16 h/8 h day/night light cycle. Plants showing increased rosette size relative to the pho1-7 parent were identified and their seeds collected. In the next generation, plants retaining 100% kanamycin resistance, conferred by the T-DNA in PHO1, were sown in soil, and the Pi content of 3-week-old rosettes was determined using the molybdate assay [53] as previously described [27]. Plants combining increased rosette size with low shoot Pi similar to pho1-7 were then genotyped by PCR to confirm that the T-DNA in the PHO1 locus was homozygous.

Identification of mutant genes by next generation sequencing
The pho1-7 suppressor mutant was backcrossed to the parental pho1-7 line to test whether the mutation was dominant or recessive and to generate an isogenic mapping population. Approximately 40 plants with a pho1-7 suppressor phenotype were selected from the segregating F2 population. DNA was extracted from a pool of these 40 plants and sequenced on an Illumina HiSeq 2000 system, yielding an average coverage of 80- to 100-fold per sample. Sequencing reads were mapped to the TAIR10 release of the A. thaliana genome with the Burrows-Wheeler Aligner (BWA) version 0.5.9-r16 using default parameters. Alignments were converted to BAM format using the SAM (Sequence Alignment/Map) tools. SNPs were called with the Unified Genotyper tool of the Genome Analysis Toolkit (GATK) version v1.4-24-g6ec686b. SNPs present in the parental pho1-7 line were filtered out using BEDTools version v2.14.2, and the TAIR10_GFF3_genes_transposons.gff file was used to filter out SNPs located in transposons. The predicted effect of the remaining SNPs in coding regions was assessed with SnpEff version 2.0.4 RC1. SNP frequencies (the number of reads supporting a given SNP divided by the total number of reads covering the SNP position) were extracted with the Unix command awk and plotted with R 2.15.1.
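The SNP-frequency step can be illustrated with a minimal sketch. It assumes a recessive suppressor mutation, so that the causal SNP should be close to fixed (frequency near 1) in the pool of F2 plants with the suppressor phenotype, and it keeps only G>A or C>T changes, the transitions typically induced by EMS. The record layout, the 0.9 frequency cutoff and the example SNPs are hypothetical and are not values used in the study, where frequencies were extracted with awk from GATK output.

```python
from typing import List, NamedTuple


class Snp(NamedTuple):
    # Hypothetical record layout; real values would come from the filtered SNP calls.
    chrom: str
    pos: int
    ref: str
    alt: str
    alt_reads: int
    total_reads: int


def snp_frequency(snp: Snp) -> float:
    """Reads supporting the SNP divided by all reads covering its position."""
    return snp.alt_reads / snp.total_reads if snp.total_reads else 0.0


def is_ems_transition(snp: Snp) -> bool:
    """EMS mainly causes G:C -> A:T transitions (G>A or C>T on the reference strand)."""
    return (snp.ref, snp.alt) in {("G", "A"), ("C", "T")}


def candidate_snps(snps: List[Snp], min_freq: float = 0.9) -> List[Snp]:
    """Keep EMS-type SNPs that are near-homozygous in the mutant pool.
    min_freq = 0.9 is an illustrative threshold, not the study's value."""
    return [s for s in snps if is_ems_transition(s) and snp_frequency(s) >= min_freq]


example = [
    Snp("Chr5", 1_000_000, "G", "A", 95, 96),  # hypothetical: near-fixed EMS-type SNP
    Snp("Chr5", 3_000_000, "C", "T", 40, 90),  # hypothetical: segregating, unlinked
    Snp("Chr2", 500_000, "A", "G", 88, 90),    # hypothetical: not an EMS-type change
]
for s in candidate_snps(example):
    print(s.chrom, s.pos, round(snp_frequency(s), 2))
```

Plotting `snp_frequency` against chromosome position, as was done in R, shows a peak approaching 1 around the causal mutation, while unlinked SNPs stay near 0.5.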

Cloning and transgenic lines
For complementation of the pho1-7 suppressor with TEX1, the genomic sequence including a 2 kb promoter, the 5'UTR and the 3'UTR was amplified using primers TEX1-gen-F and TEX1-gen-R (sequences of all primers used are listed in S1 Table). The amplicon was cloned into the pENTR/D-TOPO vector (Invitrogen), and the entry vector was then shuttled into the binary vector pMDC99 [54] using Gateway technology (Invitrogen). For the pTEX1:TEX1:GFP fusion, the TEX1 promoter and gene were amplified using primers TEX1-gen-F and TEX1-gen-R-w/o-stop; the reverse primer was designed to remove the stop codon of TEX1. The amplicon was cloned into the pENTR/D-TOPO vector (Invitrogen), and the entry vector was then shuttled into the binary vector pMDC107, which provides a C-terminal GFP fusion [54]. For the TEX1 promoter:GUS fusion, the promoter was amplified using primers TEX1-Pro-infu-LP and TEX1-Pro-infu-RP. The amplicon was cloned into the Gateway entry vector pE2B using In-Fusion technology (Clontech), and the entry vector was then shuttled into the binary vector pMDC63 [54] using Gateway technology (Invitrogen). For cloning pPHO1:gPHO1 Δ84-114:GFP, the promoter and first exon were amplified using primers P2BJ pPHO1 1exon L and P2BJ pPHO1 1exon R (fragment 1), and the PHO1 gene from the third exon up to, but not including, the stop codon was amplified using primers P2BJ PHO1gene 3rd exon F and P2BJ PHO1gene 3rd exon R (fragment 2). The two fragments were combined in the Gateway entry vector pE2B using In-Fusion technology (Clontech), and the entry vector was then shuttled into the binary vector pMDC107 [54] using Gateway technology (Invitrogen). All binary vectors were transformed into Agrobacterium tumefaciens strain GV3101, and plants were transformed using the floral dip method [55].
For analysis of transcript termination by transient expression in Arabidopsis protoplasts, the 500 nucleotides located immediately downstream of the stop codon of AT1G76560 and AT1G03160 were fused downstream of the stop codon of the nano-luciferase (nLuc) gene. To achieve this, the 500 bp 3' sequences of AT1G76560 (wild-type and mutated) and AT1G03160, flanked by attR1 and attR2 sites, were synthesized and inserted into the pUC57 plasmid by GenScript. The insert was then shuttled by Gateway cloning into the dual-luciferase vector nLucFlucGW (GenBank MH552885) [56], modified to lack the original nLuc 3'UTR and terminator sequences. In the final constructs, the hybrid genes were expressed under the control of the ubiquitin promoter, and the constitutively expressed firefly luciferase gene (Fluc) served as a loading control.

Quantitative RT-PCR, phosphate measurement, lipid profile and Pi export assay
Quantitative RT-PCR and the Pi export assay were performed as previously described [27]. For transient expression of PHO1, Nicotiana benthamiana (tobacco) plants were infiltrated with A. tumefaciens as previously described [21]. Pi was measured using the molybdate-ascorbic acid method [53]. Leaf diacylglycerides were analyzed by the Kansas Lipidomics Research Center (www.k-state.edu/lipid/lipidomics/).
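Colorimetric Pi assays such as the molybdate-ascorbic acid method are usually quantified against a standard curve. The sketch below fits a least-squares line to Pi standards and inverts it to convert sample absorbance into Pi concentration. The standard concentrations, absorbance readings and dilution factor are invented for illustration, and this is a generic calculation rather than the exact procedure of [53].

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pi_from_absorbance(absorbance: float, slope: float, intercept: float,
                       dilution: float = 1.0) -> float:
    """Invert A = slope * [Pi] + intercept, then correct for sample dilution."""
    return (absorbance - intercept) / slope * dilution


# Hypothetical standards (µM Pi) and their absorbance readings.
std_pi_um = [0, 25, 50, 100, 200]
std_abs = [0.00, 0.05, 0.10, 0.20, 0.40]
slope, intercept = fit_line(std_pi_um, std_abs)

# A hypothetical sample read at A = 0.15 after a 10-fold dilution.
print(pi_from_absorbance(0.15, slope, intercept, dilution=10))
```

Values obtained this way would then typically be expressed per unit of tissue fresh weight.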

Confocal microscopy
Seedlings were incubated for 10 min in a solution of 15 mM propidium iodide (Sigma, P4170) and rinsed twice with water. GFP was excited at 488 nm and detected at 490-555 nm. Propidium iodide was excited at 555 nm and detected at 600-700 nm. All experiments were performed using a Zeiss LSM 700 confocal microscope.

Western blot analysis
Plants were grown on a clay-based substrate (Seramis) supplemented with half-strength MS liquid medium. Proteins were extracted at 4˚C from homogenized roots of 25-day-old plants in extraction buffer containing 10 mM phosphate buffer pH 7.4, 300 mM sucrose, 150 mM NaCl, 5 mM EDTA, 5 mM EGTA, 1 mM DTT, 20 mM NaF and 1× protease inhibitor (Roche EDTA-free cOmplete Mini tablet), and the extracts were sonicated for 10 min in an ice-cold water bath. Fifty micrograms of protein were separated by SDS-PAGE and transferred to an Amersham Hybond-P PVDF membrane (GE Healthcare). A rabbit polyclonal antibody against PHO1 [51] and goat anti-rabbit IgG-HRP (Santa Cruz Biotechnology) were used with the Western Bright Sirius HRP substrate (Advansta). Signal intensity was measured with a GE Healthcare ImageQuant RT ECL Imager.

Illumina RNA-sequencing data analysis
RNA was extracted from roots of plants grown for 3 weeks in pots containing a clay-based substrate (Seramis) or for 7 days on vertical agar plates containing half-strength MS medium with 1% (w/v) sucrose. Strand-specific libraries were prepared using the TruSeq Stranded Total RNA kit (Illumina). Poly(A)+ RNAs were selected according to the manufacturer's instructions, and the cDNA libraries were sequenced on an Illumina HiSeq 2500 sequencer. Reads were mapped against the TAIR10.31 reference genome using Hisat2 [57], and the read count for each gene was determined using HTSeq-count [58]. Read counts were normalized using DESeq2 [59]. Figures showing read density from RNA-seq data were generated using the Integrative Genomics Viewer (IGV) [60].
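By default, DESeq2 normalizes read counts with median-of-ratios size factors. The pure-Python sketch below reproduces that idea on a toy count table to show what a "normalized read count" means here. It is an illustration of the principle only, not a replacement for DESeq2, and the gene names and counts are made up.

```python
import math
from statistics import median
from typing import Dict, List

Counts = Dict[str, List[int]]  # gene -> raw read counts, one value per sample


def size_factors(counts: Counts) -> List[float]:
    """Median-of-ratios size factors, one per sample."""
    n_samples = len(next(iter(counts.values())))
    # Genes with a zero count in any sample have a zero geometric mean and are skipped.
    usable = {g: c for g, c in counts.items() if all(x > 0 for x in c)}
    log_geo_means = {g: sum(math.log(x) for x in c) / n_samples for g, c in usable.items()}
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(c[j]) - log_geo_means[g] for g, c in usable.items()]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts: Counts) -> Dict[str, List[float]]:
    """Divide each sample's raw counts by that sample's size factor."""
    sf = size_factors(counts)
    return {g: [x / s for x, s in zip(c, sf)] for g, c in counts.items()}


# Hypothetical table: sample 2 was simply sequenced twice as deeply as sample 1.
toy = {"geneA": [10, 20], "geneB": [100, 200], "geneC": [0, 5], "geneD": [50, 100]}
print(size_factors(toy))
print(normalize(toy))
```

Using the median of the per-gene ratios makes the size factors robust to the minority of genes that are genuinely differentially expressed between samples.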

Analysis of full-length mRNA using PacBio sequencing
One μg of total RNA was used to generate cDNA with the SMARTer PCR cDNA Synthesis kit (Clontech). Fifty μl of cDNA were amplified for 13 PCR cycles with the Kapa HiFi PCR kit (Kapa Biosystems), followed by size selection from 1.5 kb to 3.5 kb with a BluePippin system (Sage Science). Seventy ng of the size-selected fragments were further amplified with the Kapa HiFi PCR kit for 5 cycles with a 2-min extension time, and 750 ng were used to prepare a SMRTbell library with the PacBio SMRTbell Template Prep Kit 1 (Pacific Biosciences) according to the manufacturer's recommendations. The resulting library was sequenced with P4/C2 chemistry and MagBeads on a PacBio RS II system (Pacific Biosciences) with a 240-min movie length on one SMRT Cell v2. Bioinformatic analyses were performed with SMRT Analysis Server v2.3.0 using the RS_IsoSeq.1 workflow and TAIR10.31 as the reference genome.

Identification of 3'UTR extensions
3'UTR extensions were identified following a procedure adapted from Sun et al. 2017 [8]. Briefly, reads obtained by single- or paired-end poly(A)+ RNA-seq were mapped with Hisat2 [57] against the intergenic regions extracted from the TAIR10.31 annotation. Each intergenic region was divided into 10-nucleotide bins, and the normalized read count of each bin was determined with HTSeq-count [58] and DESeq2 [59]. 3' extensions were then assembled contiguously from the 5' end of each intergenic interval until a bin had a normalized read count < 1. Only extensions longer than 200 nucleotides were kept for further analyses.
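The bin-walking rule can be written as a short function. In this sketch, the input is assumed to be the list of normalized read counts for one intergenic interval, already ordered from its 5' end (immediately downstream of the annotated gene); strand handling and the upstream counting with HTSeq-count and DESeq2 are left out, and the example bin values are hypothetical.

```python
from typing import List, Optional

BIN_SIZE = 10      # nucleotides per bin
MIN_COUNT = 1.0    # stop at the first bin with normalized read count < 1
MIN_LENGTH = 200   # keep only extensions longer than 200 nt


def extension_length(bin_counts: List[float], bin_size: int = BIN_SIZE,
                     min_count: float = MIN_COUNT) -> int:
    """Length (nt) of the contiguous run of bins starting at the 5' end."""
    length = 0
    for count in bin_counts:
        if count < min_count:
            break
        length += bin_size
    return length


def call_extension(bin_counts: List[float],
                   min_length: int = MIN_LENGTH) -> Optional[int]:
    """Return the extension length, or None if it is not longer than min_length."""
    length = extension_length(bin_counts)
    return length if length > min_length else None


# Hypothetical intervals: 25 covered bins then a gap, and only 15 covered bins.
print(call_extension([5.2] * 25 + [0.4] + [3.0] * 10))
print(call_extension([2.0] * 15 + [0.0]))
```

Note that covered bins after the first gap (the last ten bins of the first example) are not added, because the extension must be contiguous with the gene end.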
The number of reads mapping to each TAIR10.31 gene and to each newly identified 3' extension was determined with HTSeq-count [58]. Differential expression analysis was performed with DESeq2 [59] to compare genotypes and identify extensions significantly up- or down-regulated independently of the expression level of the annotated TAIR10.31 gene body. An extension was considered significantly differentially expressed if its adjusted p-value (corrected for false discovery rate) was < 0.1 and if the fold change between two genotypes of the ratio (normalized 3' extension read count)/(normalized gene-body read count) was > 2.
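The selection criterion combines a DESeq2 adjusted p-value with a fold change computed on extension/gene-body ratios. A minimal sketch follows. It assumes that "fold change > 2" applies in both directions (> 2 for up-regulated, < 0.5 for down-regulated extensions), and the example read counts and p-values are hypothetical.

```python
def extension_ratio(ext_count: float, body_count: float) -> float:
    """Normalized 3' extension read count over normalized gene-body read count."""
    if body_count <= 0:
        raise ValueError("gene body must have a positive normalized read count")
    return ext_count / body_count


def is_differential(padj: float, ratio_a: float, ratio_b: float,
                    padj_cutoff: float = 0.1, min_fold: float = 2.0) -> bool:
    """Significant if padj < cutoff and the ratio changes more than min_fold-fold
    between genotype A and genotype B (either direction, an assumption)."""
    if padj >= padj_cutoff or ratio_a <= 0 or ratio_b <= 0:
        return False
    fold = ratio_b / ratio_a
    return fold > min_fold or fold < 1.0 / min_fold


# Hypothetical gene: Col-0 (A) versus a mutant (B).
col0 = extension_ratio(20, 1000)   # 2% of gene-body signal in the extension
mutant = extension_ratio(90, 900)  # 10% of gene-body signal in the extension
print(is_differential(0.01, col0, mutant))
```

Working with ratios rather than raw extension counts means that an extension is not called simply because its host gene is more highly expressed in one genotype.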
To analyze the polyadenylation signal in genes with and without 3'UTR extensions, the frequency of each nucleotide at the polyadenylation consensus sequence AAUAAA was calculated for each gene, and a chi-square test was used to assess statistical significance.
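One way to run such a test is a Pearson chi-square test on a 2 × 4 contingency table per position: rows are the two gene groups (with or without extension) and columns are the counts of A, C, G and U observed at that position of the AAUAAA signal. The sketch below computes the statistic in pure Python and uses the closed-form upper tail of the chi-square distribution with 3 degrees of freedom; this table layout is an assumption about how the test was set up, and the counts are hypothetical. With SciPy available, `scipy.stats.chi2_contingency` would give the same statistic.

```python
import math
from typing import List, Tuple


def chi_square_2xk(row_a: List[int], row_b: List[int]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for a 2 x k table."""
    col_totals = [a + b for a, b in zip(row_a, row_b)]
    if any(t == 0 for t in col_totals) or sum(row_a) == 0 or sum(row_b) == 0:
        raise ValueError("remove empty rows/columns before testing")
    total = sum(col_totals)
    stat = 0.0
    for row in (row_a, row_b):
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / total
            stat += (observed - expected) ** 2 / expected
    return stat, len(col_totals) - 1


def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability of the chi-square distribution with 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Hypothetical A, C, G, U counts at one position of the AAUAAA hexamer.
with_extension = [60, 12, 10, 18]
without_extension = [85, 5, 4, 6]
stat, df = chi_square_2xk(with_extension, without_extension)
print(stat, df, chi2_sf_df3(stat))
```

The closed-form p-value only holds for 3 degrees of freedom, i.e. when all four nucleotides are observed at the position being tested.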

Chromatin immunoprecipitation analyzed by qPCR
Leaves from 3-week-old A. thaliana seedlings of different genotypes were harvested and immediately incubated in 37 ml of pre-chilled fixation buffer (1% formaldehyde in 0.4 M sucrose, 10 mM Tris pH 8, 1 mM EDTA, 1 mM PMSF, 0.05% Triton X-100) for 10 min under vacuum. Glycine (2.5 ml of a 2.5 M solution) was added, and samples were incubated for an additional 5 min under vacuum, rinsed 3 times with water and frozen in liquid nitrogen. Frozen samples were ground, and the powder was resuspended in 30 ml of extraction buffer I (0.4 M sucrose, 10 mM HEPES pH 8, 5 mM ß-mercaptoethanol, 0.1 g/ml 4-(2-aminoethyl)benzenesulfonyl fluoride hydrochloride [AEBSF]). After 20 min of incubation at 4˚C, the mixture was filtered through Miracloth and centrifuged for 20 min at 3000 g at 4˚C. The pellet was resuspended in 300 μl of extraction buffer III (1.7 M sucrose, 10 mM HEPES pH 8, 0.15% Triton X-100, 2 mM MgCl2, 5 mM ß-mercaptoethanol, 0.1 g/ml AEBSF), loaded on top of a 300 μl layer of extraction buffer III and centrifuged for 1 h at 16000 g at 4˚C. The pellet was resuspended in 300 μl of Nuclei Lysis Buffer (50 mM HEPES pH 8, 10 mM EDTA, 1% SDS, 0.1 g/ml AEBSF) and incubated on ice for 30 min.
The chromatin solution was centrifuged twice for 10 min at 14000 g and 4˚C and incubated overnight with antibodies specific to Ser2-phosphorylated RNAPII (S2P-RNAPII). The mixture was then incubated with Protein A beads for 3 h at 4˚C. After washing, immune complexes were eluted twice with 50 μl of Elution Buffer (1% SDS, 0.1 M NaHCO3). To reverse cross-links, 4 μl of 5 M NaCl was added to 100 μl of eluate, and the mixture was incubated overnight at 65˚C. Two μl of 0.5 M EDTA, 1.5 μl of 3 M Tris-HCl pH 6.8 and 20 μg of proteinase K were then added, and the mixture was incubated for 3 h at 45˚C. DNA was then extracted using the NucleoSpin kit (Macherey-Nagel). DNA samples were diluted 10-fold, and 2 μl were used for quantification by qPCR using SYBR Select Master Mix (Applied Biosystems).

Transient expression in Arabidopsis protoplasts
Arabidopsis protoplasts were produced and transformed as previously described [61]. In brief, wild-type Col-0 and the hpr1-6 and tex1-4 mutants were grown under a long photoperiod (16 h light/8 h dark at 21˚C) for 4-5 weeks, and leaves were cut with razor blades into 0.5-1 mm strips. These were submerged in enzyme solution (1% cellulase, 0.25% macerozyme, 0.4 M mannitol, 20 mM KCl, 20 mM MES and 10 mM CaCl2), vacuum infiltrated and incubated at room temperature for 2 h. Protoplasts were harvested by centrifugation at 100 g for 3 min, washed with W5 solution (154 mM NaCl, 125 mM CaCl2, 5 mM KCl and 2 mM MES) and resuspended in MMG solution (4 mM MES pH 5.7, 0.4 M mannitol and 15 mM MgCl2) at 1 × 10⁶ protoplasts/ml. Protoplast transformation was performed by combining ~1.5 × 10⁵ protoplasts, 8 μg of plasmid and PEG solution (40% PEG4000, 0.2 M mannitol and 100 mM CaCl2). After the PEG solution was replaced with W5 solution by consecutive washes, protoplasts were kept in the dark for approximately 16 h at 21˚C. Transformed protoplasts were harvested by centrifugation at 6000 g for 1 min and resuspended in 1X Passive Lysis Buffer (Promega, E1941). The lysate was cleared by centrifugation, and RNA was extracted using an RNA purification kit according to the manufacturer's instructions (Jena Bioscience, PP-210), followed by DNase I treatment. cDNA was synthesized from 0.1 μg of RNA using M-MLV Reverse Transcriptase (Promega, M3681) with oligo d(T)15 as primer, following the manufacturer's instructions. qPCR analysis was performed using SYBR Select Master Mix (Applied Biosystems, 4472908) with primer pairs specific to the transcripts of interest and to firefly luciferase mRNA, which was used for data normalization. The long/short transcript ratio was calculated with the following formula: ΔCT