Direct targets of pSTAT5 signalling in erythropoiesis

Erythropoietin (EPO) acts through the dimeric erythropoietin receptor to stimulate proliferation, survival, differentiation and enucleation of erythroid progenitor cells. We undertook two complimentary approaches to find EPO-dependent pSTAT5 target genes in murine erythroid cells: RNA-seq of newly transcribed (4sU-labelled) RNA, and ChIP-seq for pSTAT5 30 minutes after EPO stimulation. We found 302 pSTAT5-occupied sites: ~15% of these reside in promoters while the rest reside within intronic enhancers or intergenic regions, some >100kb from the nearest TSS. The majority of pSTAT5 peaks contain a central palindromic GAS element, TTCYXRGAA. There was significant enrichment for GATA motifs and CACCC-box motifs within the neighbourhood of pSTAT5-bound peaks, and GATA1 and/or KLF1 co-occupancy at many sites. Using 4sU-RNA-seq we determined the EPO-induced transcriptome and validated differentially expressed genes using dynamic CAGE data and qRT-PCR. We identified known direct pSTAT5 target genes such as Bcl2l1, Pim1 and Cish, and many new targets likely to be involved in driving erythroid cell differentiation including those involved in mRNA splicing (Rbm25), epigenetic regulation (Suv420h2), and EpoR turnover (Clint1/EpsinR). Some of these new EpoR-JAK2-pSTAT5 target genes could be used as biomarkers for monitoring disease activity in polycythaemia vera, and for monitoring responses to JAK inhibitors.


Introduction
EPO is produced by specialised cells in the kidney in response to reduced oxygen tension. It binds to the erythropoietin receptor (EpoR) expressed on erythroid progenitor cells to induce proliferation, differentiation, survival and enucleation. Thus, an EPO-EpoR feedback loop reestablishes physiologically appropriate numbers of circulating mature red blood cells in response to need. Exogenous use of EPO, inherited mutations in the EpoR [1], and acquired gain-of-function mutations in JAK2 [2,3], induce supra-physiological production of red blood a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 cells. Conversely, loss-of-function experiments have shown both EPO and EpoR are essential for the proliferation and survival of erythroid cells [4,5].
There are differing models of EPO-EpoR signalling. One model suggests EPO induces dimerization of the EpoR and this is essential for signaling [6]. Others suggest EpoR exists as pre-assembled homodimer complexed with JAK2 bound to the box1-2 motif in the cytoplasmic membrane-proximal domain of each monomer. Binding of EPO causes a conformational twist in the orientation of the two EpoR monomers such that cross-inhibition of JAK2 kinase domains (via the pseudo-kinase domains) is released, resulting in JAK2 phosphorylation [7]. This mechanism of signal propagation applies to all type 1 dimeric cytokine receptors [8]. Activating mutations in the pseudo-kinase domain of JAK2, such as JAK2V617F, lead to ligand-independent phosphorylation of JAK2 and downstream activation of signalling pathways in patients with polycythemia vera (PV). STAT5 is recruited to pY343 of the EpoR via its SH2 domain [9], where it is phosphorylated by JAK2. This results in dimerization, translocation to the nucleus, and DNA binding to palindromic gamma-activated sequences or 'GAS 0 motifs (TTCYXRGAA) in promoters and enhancers; it binds as a dimer [10] or a tetramer [11]. Thus, EPO induces changes in gene expression through the JAK2-pSTAT5 pathway. There are duplicated STAT5 genes, Stat5a and Stat5b, which have 95% amino acid sequence identity [12]. They play specific roles in responses to various hormones in non-erythroid cells [13,14], but they play redundant roles in erythropoiesis; i.e. Stat5a-/-Stat5b-/-mice die in utero from anemia [15]. The phenotype is like Epo, EpoR and Jak2 gene knockout mice [4,5].
Activated pJAK2 phosphorylates additional tyrosine (pY) residues in the cytoplasmic tail of the EpoR which leads to engagement of other SH2 domain-containing cytoplasmic signalling molecules. For example, pY429/Y431 and pY479 bind the p85 subunit of PI3 kinase leading to subsequent engagement of p110 and phosphorylation of downstream transcription factors (TFs). Gain-of-function mutations in p85 lead to constitutive EpoR activity whereas p85 knockout mice display anemia [16]. The raf-MAPK and LYN kinase pathways are also activated in erythroid cells by EPO [17]. A truncated EpoR containing the binding site for pSTAT5 (Y343), but missing C-terminal Y residues, is sufficient to rescue responses to anemic stress in vivo [9], suggesting STAT5 engagement is critical.
The Bcl2l1 gene is a known direct target of pSTAT5 in erythroid cells [15], and it is required for a pro-survival signal in response to EPO [18]. It has a long second intron containing enhancers, some of which have been shown to bind STAT1 or STAT5 [15,19], and some of which bind other erythroid TFs such as GATA1 and KLF1 [20,21]. Furthermore, Bcl2l1 expression is dependent on both GATA1 and KLF1 [22,23]. The Bcl2l1 gene undergoes dynamic alternative splicing during erythroid differentiation leading to the mutually exclusive production of short (Bcl-x S ) and long (Bcl-x L ) protein isoforms. The long isoform is a pro-survival factor while the short isoform is pro-apoptotic [24].
Other well-studied targets of EpoR signalling include members of suppressor-of-cytokinesignalling (SOCS) gene family, particularly Socs3 and Cish [25,26]. The protein products of these genes are responsible for rapid down-regulation of the EPO-EpoR complex via engagement of ubiquitin ligase pathways, receptor internalisation and its degradation in the proteasome and lysosome pathways [27]. Thus, SOCS proteins rapidly dampen EPO-induced signals. There are likely many other direct targets of pSTAT5 in erythroid cells but positive identification has been hampered by the lack of ChIP-seq datasets. We have undertaken the first ChIP-seq for pSTAT5 in erythroid cells in response to EPO and found 302 robust sites of genome occupancy. While some pSTAT5 is bound at promoters, the majority occupies enhancers, often in concert with GATA1 and KLF1. Those sites not bound by these TFs tend to be bound by STAT5 in other cell types suggesting generic targets and functions for STAT5 in many cells. We used 4sU-RNA-labelling to determine rapidly induced genes and also examined the dynamics of gene induction in response to EPO using qRT-PCR and published dynamic CAGE data [28]. We found expected direct targets of EpoR-JAK2-pSTAT5 signalling such as Cish and Bcl2l1, but also several novel target genes with likely roles in EpoR turnover, chromatin compaction, alternative splicing and other processes critical to terminal erythroid differentiation. This work provides new insights into how EPO works and provides a list of possible biomarkers to monitor disease activity in PV.

Direct targets of pSTAT5 in erythroid cells
The murine erythroid cell line, J2E, expresses~1000 copies of the EpoR on the cell surface and undergoes terminal erythroid differentiation in response to EPO. J2E have been employed by the FANTOM5 consortium for high resolution mapping of dynamic changes in CAGE tags at erythroid promoters and enhancers in response EPO [28]. They are immortalised at the proerythroblast stage like MEL cells, G1-ER cells [23], and K1-ER cells [29]. Thus, data sets such as RNA-seq, histone ChIP-seq and TF ChIP-seq generated in these cells can be employed to interrogate the genomic landscape of J2E.
After EPO stimulation (10U/ml) for 30 min (S1A Fig), pSTAT5 was robustly induced in the nucleus (S1B Fig). We validated EPO-induced STAT5 enhancer occupancy using two predicted target genes: Bcl2l1 and Abcg2 [30] (see Methods). A pSTAT5 antibody which recognises both pSTAT5a and pSTAT5b (see Methods) was best able to enrich for DNA at the Bcl2l1 enhancer (S1C and S1D Fig). We also detected EPO-dependent pSTAT5 occupancy at one of two reported enhancers in the Abcg2 gene, a known target of STAT5 in response to prolactin in mammary epithelium (S1E and S1F Fig) [30]. Based on these pilot studies, we undertook ChIP-seq using a pool of five biological replicates and matched input DNA samples (see Methods). A total of 302 peaks were called by MACS2 [31]; 23% of these fall within promoters (<1kb from TSS), whereas most reside within introns or intergenic regions ( Fig 1A). The 50 peaks with highest enrichment are listed in Table 1 along with distance to the nearest TSS and gene feature. A full list of peaks with genome co-ordinates is available in S1 Table. We undertook de novo motif discovery using MEME [32] and found significant enrichment for a palindromic STAT binding site or GAS element, TTCYMRGAA, within the peak regions ( Fig 1B). This is consistent with similar GAS motif enrichment in other STAT5 data sets [30,33], with EMSA studies of EPO-induced GAS element binding in erythroid cells [15,26], and with detailed in vitro binding data for STAT5a and STAT5b [15,26,34,35]. The majority of our pSTAT5 peaks contained a central GAS element according to CentriMO [32], confirming specificity of the ChIP (S1G Fig). We also found significant enrichment of typical GATA (WGATAA) and CACCC-box (CCC-CNC-CCN) motifs within the neighbourhood of pSTAT5 peaks (Fig 1B), but no other motifs.

Genome co-occupancy by pSTAT5, GATA1 and KLF1
Based on these enriched PWMs, we asked whether there is co-occupancy with GATA1 and KLF1 (two critical TFs for erythroid maturation) using reported ChIP-seq data [36,37]. We found 67 of the 302 pSTAT5-occupied sites were bound by KLF1 and 147 were bound by GATA1 (Fig 1C and 1D). Interestingly, the pSTAT5 occupied sites not co-occupied by GATA1 are occupied by STAT5a and/or STAT5b in different cell types [38][39][40][41] (Fig 1D). This suggests pSTAT5 regulates a common subset of genes in most cell types in which it is expressed. Such genes include generic negative feedback regulators of cytokine signalling such as Cish and Socs3 and generic effectors of cell survival such as Bcl2l1 (S1D Fig). Thus, we suggest pSTAT5 undertakes two functions in erythroid cells; a generic function which is present in all cells where cytokine receptor signalling occurs, and an erythroid-specific function. In some cases this is likely to depend upon the activity of GATA1 and/or KLF1. This has been formally shown for STAT5-induced differentiation of human CD34+ cells into erythrocytes [42]. We hypothesised many pSTAT5-occupied regions are likely to be enhancers. For those sites which overlap KLF1 and GATA1-occupied sites this provides evidence for such a role [20,43], but we sought additional evidence from H3K4me1 and H3K4me3 ChIP-seq, and DNase1 HS in primary fetal liver cells (

Immediate EPO-induced transcriptome changes
To determine the immediate transcriptional responses to EPO we performed 4sU-labelling of newly transcribed RNA in murine J2E cells for 30 mins following 10 mins of induction with De novo motif discovery on all 302 peaks using MEME (see Methods) identified highly significant enrichment of a (i) palindromic GAS element (TTCYMRGAA), a (ii) GATA binding site (WGATAR), and a (iii) KLF binding site (CCMCRCCCN). No other significantly enriched motifs were found. (C) Venn diagram and of erythroid co-occupancy between pSTAT5, KLF1 and GATA1 at key erythroid enhancers and promoters. KLF1 ChIP-seq was generated in K1-ER cells and GATA1 ChIP-seq was generated in G1-ER4 cells after induction with tamoxifen. Cooccupancy defined as ChIP summits within 500 bp. (D) Density heat-map of ChIP signal centred on pSTAT5 peak summits from GATA1 and KLF1 as above (C) and additional STAT5 ChIP in other murine tissues. The Y-axis represents individual peak regions, and the X-axis represents 10 kb surrounding the summit. Read intensities were normalized to number of reads in dataset and hierarchically clustered according to intensity within 500 bp of peak centre.
There was an overlap between rapidly induced transcripts and nearby ChIP-seq peaks but this was not always the case (Fig 2 and S2 Table). In fact, we found three scenarios: (i) EPOinduced genes with a nearby pSTAT5 peak (likely direct targets), (ii) EPO-induced genes with  Direct targets of pSTAT5 signalling in erythropoiesis no nearby peak (likely STAT5-independent), and (iii) pSTAT5 peaks without any change in expression of the nearest gene (S1 Fig). For the second scenario, some DEGs may be dependent upon alternative EpoR-generated signals, such as PI3K-AKT or MAPK-ERK pathway signals, which induce gene expression changes via alternate transcription factors to STAT5. For the third scenario, a putative pSTAT5-regulated gene may not be the closest gene to the peak, but reside hundreds of kilobases away. Thus, we could miss-assign some true immediate target genes regulated by distant enhancers. This possibility is difficult to resolve without chromatin conformation capture data. Alternatively, there may have been a delay between pSTAT5 binding and induction of gene expression in some cases. Indeed, there is a dynamic transcriptional response to EPO stimulation [45,47]. To try to resolve these alternate possibilities, we integrated published dynamic CAGE data from J2E cells and also performed qRT-PCR at multiple time points up to 24 hours post EPO stimulation (S1 Fig).

Direct pSTAT5 target genes with immediate early induction
For many DEGs with a proximal ChIP-seq peak, there was rapid induction of primary transcription (within 30-60 mins) which was processed leading to a slower accumulation of mRNA. For example, Cish has a robust pSTAT5 ChIP-seq peak at its promoter (Fig 3A, yellow  track) and weaker peaks about 500bp upstream which overlap with GATA1 occupied regions (brown track). There are four typical palindromic GAS elements near the centre of the peak which respond to EPO in luciferase assays [26], and bind pSTAT5 by EMSA. There is induction of newly transcribed RNA over the gene body (red track) compared with unstimulated cells (blue). There is also a 4-fold upregulation of Cish primary transcripts by qRT-PCR at 30 mins, and later upregulation of mRNA and promoter CAGE tag counts at 60 mins (Fig 3B and  3C). Thus, these complimentary data sets provide a detailed and consistent account of the direct transcriptional response of Cish to pSTAT5. Similarly, the Bcl2l1 gene contains a strong pSTAT5 ChIP-seq peak towards the 3 0 end of a long second intron; this site overlaps with two GATA1 ChIP-seq peaks (purple bar, Fig 3D).  Some GATA1 peaks overlap with KLF1 peaks (orange) but the pSTAT5 peak does not. The primary transcripts for Bcl2l1 are slightly increased at 30 mins but are not maximally induced until~2 hours post EPO stimulation (Fig 3E). Processed mRNA induction peaks at~4 hours post-stimulation which corresponds to the timing of maximal accumulation of promoter and enhancer CAGE tags (Fig 3F and S4A Fig). So, Bcl2l1 is a direct target of EPO-pSTAT5 but the dynamics of mRNA accumulation are delayed compared to Cish.

Direct pSTAT5 target genes with delayed RNA induction dynamics
The Suv420h2 gene has a strong intronic pSTAT5 ChIP-seq peak and obvious 4sU RNA induction at 30 mins (Fig 3G, red versus blue tracks), but primary transcripts peak at 2 hours and remain high at 4 hours post EPO treatment (Fig 3H). Similarly, CAGE tags at the promoter and first intron both slowly accumulate in response to EPO (Fig 3I). Interestingly, CAGE tags accumulate more rapidly at the pSTAT5-occupied enhancer in intron 6 (purple bar in Fig 3I  and S4B Fig). There was significant upregulation of 4sU-RNA for Suv420h2 at 30 mins but induction was greater at later time points. We hypothesised additional direct pSTAT5 target genes with similarly delayed induction could be missed in the 30 min 4sU RNA sample. So, we undertook qRT-PCR and CAGE data analysis at multiple time points for 24 hours post EPO stimulation for candidate delayed target genes (Fig 4).
Gypc encodes Glycophorin C, a known direct target of KLF1 [22,48]. It has a large first intron with a robust central pSTAT5 peak (Table 1). There are multiple evenly spaced KLF1 and GATA1 peaks throughout the 5 0 part of intron 1 with the genomic architecture of a super enhancer (Fig 4A) [49,50]. The strongest KLF1/GATA1 co-occupied region is also bound by pSTAT5 whereas the lesser peaks and the promoter are not. There is some baseline expression of Gpyc in the absence of EPO (blue track) but limited induction of 4sU-labelled RNA by 30 mins (red), so Gpyc was not called as a DEG. However, qRT-PCR and promoter CAGE data clearly show significant induction by EPO (Fig 4B and 4C), albeit delayed (>30 mins). We found similar delayed induction of Podxl which has two strong upstream pSTAT5 bound regions (Table 1 and S5A Fig, ochre track), both of which are co-occupied by GATA1 (brown) and KLF1 (orange). Once again there is delayed induction of primary RNA, mRNA and CAGE tags (S5B and S5C Fig).
Lastly, there are some genes which have a robust pSTAT5 ChIP-seq peak nearby but display no obvious induction even at late time points. For example, there is a strong pSTAT5 ChIPseq peak at the Cdk5rap1 promoter (Table 1 and S5D Fig), but no evidence or induction of transcription by 4sU-RNA-seq, qRT-PCR or promoter CAGE data (S5E and S5F Fig). It is possible the promoter-bound pSTAT5 site is actually a distant enhancer for some other DEG, but this is not obvious upon inspection of the UCSC Browser. So, there appear to be some pSTAT5 bound sites which do not direct local gene expression. The function of pSTAT5 at these sites is uncertain.

EPO induced pSTAT5-independent genes
We also considered genes which showed rapid induction in response to EPO (called as DEGs), but which had no pSTAT5 ChIP-seq peak within 50kb of the TSS. For example Thbs, which  (Table 2) with obvious accumulation of 4sU RNA (Fig 4D, red). We confirmed gene induction by qRT-PCR and CAGE with similar dynamics to direct pSTAT5 target genes (Fig 4E and 4F). However, there is no pSTAT5 peak within 50 kb of the Thbs1 TSS (Fig 4D). The data for Furin is similar although there is a weak pSTAT5 peak downstream of the neighbouring gene, Fes, which could be a distant 3 0 enhancer for Furin (S5G-S5I Fig). Without 3C data this possibility remains speculative. In short, we found 38 EPO-induced genes which are not obvious direct pSTAT5 target genes (Fig 2 and  S1A Fig). One likely scenario is they are responding to a pSTAT5-independent signal, such as a PI3K or MAPK.

New EPO-responsive genes
We employed ChIP-seq to determine pSTAT5 occupancy and 4sU-RNA-seq to detect immediate transcriptional targets of EpoR signalling in murine erythroid cells. We found >300 pSTAT5-occupied regions in the erythroid genome. A reduction in the q value threshold from 0.01 to 0.2 only increased peak calls by about 25%, but it also reduced the quality of the peaks (i.e. central GAS sites fell to 50%), so the true number of pSTAT5-occupied sites is low compared with other erythroid TFs. It is important to remember we undertook the ChIP at one time point after EPO stimulation and in cell lines immortalised at the early erythroblast stage of differentiation. Additional pSTAT5 occupancy may occur at delayed times after EPO stimulation or in more differentiated cells. About 15% of the peaks occur at gene promoters but the majority occur at intronic or intergenic sites. These mostly have features characteristic of enhancers such as bi-directional CAGE tags, the H3K4me1 mark and DNase 1 hypersensitivity.
Of the 63 genes identified as EPO immediate early induced, some were unexpected. Manual curation of the encoded proteins places most of them within four functional categories ( Table 2) including proteins involved in: (i) receptor down regulation or degradation, (ii) signalling or proliferation responses, (iii) pro-survival pathways, and (iv) nuclear functions such as transcription, chromatin modification and RNA splicing. While 38 did not have obvious pSTAT5 ChIP-seq peaks at promoters or intronic regions, there may be more distant pSTA-T5-occupied enhancers. However, it is likely that many will be pSTAT5-independent EPO target genes. Small molecule inhibitors of STAT5-independent pathways and ChIP-seq for other transcriptional effectors of EPO signalling such as FOXO3 [51] might help to resolve this issue.

STAT5-dependent negative feedback loops down regulate the EpoR
We found some genes with known roles in down regulation of EpoR signalling are direct targets of pSTAT5; e.g. Cish and Socs3 (S6 Fig). CISH is recruited to activated pY401 in the EpoR [26]. Like many SOCS family members, CISH recruits a complex of proteins including elongin B, elongin C, cullin5 and Rbx2 which together provide E3 ubiquitin ligase activity to add ubiquitin to nearby substrates [52] leading to degradation in the proteasome. In this way CISH provides a negative feedback to dampen EPO-induced signalling (S6 Fig). SOCS3 functions in a similar way but it also contains an N-terminal KIR peptide which has been shown to directly influence STAT5 binding by JAK2 [53] (S6 Fig). The structure of the KIR-JAK2 complex has been solved in the context of the gp130 receptor and is presumably similar for the EpoR. Interestingly, the dynamics of upregulation of CISH and SOCS3 are slightly different [45], so they can provide distinct feedback dynamics. Clint1 (EpsinR) is involved in clathrin-mediated endocytosis (CME) [54], which is the mechanism by which erythroid progenitors procure iron via transferrin-iron binding to the transferrin receptor-1 (Tfr1); it has also been reported to be involved in EpoR internalisation [55]. The N-terminal ENTH domain of Clint1 binds cargos and the central domain binds clathrin and the AP-1 complex which mediates CME. So, we suggest EpoR downregulation at the cell surface may be facilitated by Clint1 which is likely to associate with clathrin-coated vesicles to enhance internalisation (Fig 5A).

EPO-induced changes in RNA splicing factor, RBM25
The Rbm25 gene is induced by EPO via binding of pSTAT5 to an intronic enhancer (Fig 5B). It encodes an RNA-binding protein which has been reported to be involved in alternative splicing. RBM25 binds the CGGGCA sequence in exon 2 of Bcl2l1 RNA to stabilize pre-mRNA-U1 snRNP binding, which favours generation of the Bcl-x(S)-coding in preference to the Bcl-x(L)-coding mRNA [56]. So, EPO could enhance red blood cell survival via regulation of Rbm25 and thereby splicing of Bcl2l1 (Fig 5B).

Nuclear compaction and H4K20 methylation
The erythroid nucleus must undergo compaction prior to enucleation. Setd8 (PR-Set7) is a lysine methyltransferase which induces mono-methylation at histone H4K20. It is essential for nuclear compaction and enucleation [57]. We found Suv420h2 is a direct target of pSTAT5 via a putative intronic enhancer (Figs 3G and 5C). Suv420h2 is a methyl transferase which adds methyl groups to mono-methylated K20 of histone H4, so its function is dependent upon the prior activity of Setd8. The biological function of the di-and tri-methyl marks on H4K20 is controversial. It has been reported to play roles in transcriptional pausing [58], chromatin condensation, and cell cycle control [59]. All of these activities are critical for terminal erythroid maturation. So, we postulate H4K20 tri-methylation by Suv420h2 is essential for terminal erythroid differentiation (Fig 5C).

Cell culture and treatment with erythropoietin
The J2E cell line [60] was maintained in 10% FBS, 1% GlutaMAX™ (ThermoFisher; #35050061), and 1% Penicillin-Streptomycin (ThermoFisher; #10378016). Cells were treated with erythropoietin (EPO) at 10 U/mL prior to harvesting for RNA and ChIP [45] (S1A Fig). qPCR primer design ChIP primers were designed to determine occupancy at predicted binding sites according to STAT5 ChIP performed in mammary tissue [30] (S3 Table). Primers for analysing 4sU-RNA  Table 1). Clint1 directly interacts with Clathrin to facilitate clathrin-mediated endocytosis (CME) of the EpoR and associated proteins. The EpoR can either be recycled from early endosomes to the cell surface for re-use, or degraded via late endosomes, multivesicular bodies and eventually lysosomes. (B) The Bcl2l1 gene encodes Bcl-x which is essential for terminal erythroid differentiation. pSTAT5 binds with KLF1 and GATA1 to upregulate its expression in response to EPO. pSTAT5 also binds the Rmb25 promoter with GATA1 to drive expression. The encoded protein, RBM25, is an RNA-binding protein which can bind into the second exon of the Bcl2l1 RNA via a CGGGCA element to induce preferential splicing to generate Bcl-x(S) isoform in preference to the Bcl-x(L) isoform. The former isoform is pro-apoptotic in some contexts but may play an independent role in erythroid maturation and enucleation. (C) The Suv420h2 gene is a direct target of pSTAT5 via an intronic enhancer. SUV4-20h2 is a histone methyltransferase which recognises H4K20me1 and adds two additional methyl groups. H4K20 is first methylated by Setd8/PR-Set7, a methyltransferase that is essential for erythropoiesis. H4K20me3 inhibits access to H4K16 by the MSL acetylase machinery. Acetylation at H4K16 is associated with active gene transcription whereas H4K20me2/3 is associated with transcriptional pausing. So, upregulation of Suv420h2 by EPO-pSTAT5 is likely to promote global pausing of erythroid gene transcription and reduced production of RNA, a feature of terminal erythroid cell maturation. H4K20me3 has also been associated with cell cycle arrest and chromatin compaction, two hallmarks of erythroid cell maturation. https://doi.org/10.1371/journal.pone.0180922.g005 Direct targets of pSTAT5 signalling in erythropoiesis enrichment and DEG validation (S4 Table) were designed to distinguish primary transcripts covering intronic and exonic regions, and mature RNA transcripts incorporating splice junctions.

Bioinformatics
Computational analysis of raw sequencing data was performed as previously described [37]. Heatmaps of signal intensity were generated using EaSeq [62] and regions were sorted according to hierarchical clustering using the nearest neighbour chain algorithm. Clustering was based on values from all datasets quantified from 200 bp surrounding the pSTAT5 peak summits. The area was divided in to 50 bins and values were log transformed and normalized to the maximum signal in the clustered areas.

Accession numbers
All sequencing data generated by this study have been deposited in the Gene Expression Omnibus (GEO) under the accession GSE94301. Supporting DNase I data were from GSE37074, ChIP-seq datasets were accessed from GSE31039 (H3K4me1 and H3K4me3), GSE51338 (GATA1), and additional STAT5 ChIP-seq were from GSE36890, GSE34986, GSE31578 and GSE40930.
Supporting information S1 Fig. (A) Overview of experimental design and results. The J2E murine erythroid cell line was stimulated with EPO (10 U/ml) for the indicated times. An analog of uracil, 4-thiouridine (4sU), was added after 10 minutes to label newly transcribed RNA. 4sU-labelled RNA was isolated after 30 minutes of labelling (see Methods) and used to generate sequencing libraries. DNA was cross-linked at 30 minutes post-EPO stimulation for pSTAT5 ChIP-seq. qRT-PCR samples were collected at indicated time points and CAGE libraries were generated by the FANTOM5 consortium as reported [28]. (B) Western blot for pSTAT5 in J2E cells pre-and 30 mins post-stimulation with EPO (10 U/ml). Cytoplasmic and nuclear extracts were loaded at 'cell equivalent' volumes. (C) EPO-induced pSTAT5 occupancy of a reported GAS element within the Bcl2l1 gene, from mammary epithelia. ChIP was performed on 5 replicate samples following 30 mins EPO induction with the following treatments serving as controls: pSTAT5 Ab (+) or IgG control (−), and treatment with EPO (+) or without (−) for 30 mins. Enrichment of bound DNA was determined by qPCR and expressed as a % of input DNA. (D) ChIP-seq in multiple cell types following 30 mins of EPO induction across the Bcl2l1 gene. Read density profiles for STAT5 across multiple non-erythroid tissues illustrates the basis for ChIP primer design (red tracks). Primers were designed to the previously reported GAS element, and to 1 kb upstream and downstream based upon the binding profiles. Enrichment of pSTAT5 observed in sequencing data (yellow) is consistent with qPCR (C) and prior studies. (E) EPOinduced pSTAT5 occupancy of the Abcg2 gene enhancer. ChIP qPCR was performed as in panel (C), however enrichment was only observed at GAS element 4. (F) Read density profiles of STAT5 ChIP at the Abcg2 gene as described in panel (D). Primers were designed to the previously reported GAS3 and GAS4 elements, and to 1 kb upstream and downstream based upon the binding profiles. Enrichment of pSTAT5 is observed only at GAS4 in sequencing data (yellow), and at a downstream erythroid specific site. (G) CentriMO analysis of discovered MEME motifs i-iii (see Fig 1) within pSTAT5-occupied regions +/-500 bp from the peak summits. The STAT5 motif (i) is significantly enriched around the summit highlighting the specificity and quality of the ChIP. Central enrichment is also observed to a lesser extent for the GATA (ii) and KLF (iii) motifs. There are two strong pSTAT5 peaks 25kb and 50kb upstream of the Podxl TSS which overlap with GATA1 and KLF1 ChIP-seq peaks. However, Podxl is not upregulated by 4sU-RNA-seq (red track) after 30 mins, but shows weak and delayed upregulation of expression which is not significant until 4 hours in qRT-PCR and CAGE. There is a strong pSTAT5 peak at the Cdk5rap1 promoter but no significant upregulation in 4sU-RNA-seq (red track), qRT-PCR or CAGE following EPO induction. Significant upregulation of Furin transcription (red track) can be seen following EPO stimulation. Only a weak pSTAT5 peak can be seen downstream of Furin near to the neighbouring gene, Fes, but there are no other pSTAT5 peaks at the Furin promoter or within 30 kb of the Furin TSS. qRT-PCR for Furin primary pre-spliced transcripts shows dynamic upregulation peaking at 2 hours post EPO stimulation but processed mRNA is not substantially upregulated. CAGE tags at the two alternative Furin promoters show basal expression and delayed gradual upregulation from the second promoter (light green bar) until 4 hours post -EPO stimulation. (TIF)

S6 Fig. The
Cish and Socs3 genes are both direct targets of pSTAT5 and are both rapidly upregulated by EPO. CISH binds activated pY401 in the EpoR via its SH2 domain. It recruits Elongin B via a C-terminal SOCS box and indirectly recruits Elongin C, which then recruits cullin-5 and Rbx1. This complex functions as an E3 ubiquitin ligase, to target EpoR and associated proteins for degradation in the proteasome. SOCS3 functions in a similar way via binding to pY401 but in addition it directly inhibits the kinase activity via competitive displacement of STAT5 binding in the active pocket of JAK2. It achieves this additional function via the N-terminal KIR domain which is not present in CISH. (TIF) S1

Author Contributions
Conceptualization: Andrew C. Perkins.