Correlated protein-RNA associations and a requirement for HNRNPU in the long-range recruitment of Polycomb Repressive Complexes by the lncRNAs Airn and Kcnq1ot1

  • McKenzie M. Murvin,

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Visualization, Writing – original draft, Writing – review & editing

    Current address: National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, Maryland, United States of America

    Affiliations Department of Pharmacology, University of North Carolina, Chapel Hill, North Carolina, United States of America, RNA Discovery Center, University of North Carolina, Chapel Hill, North Carolina, United States of America, Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina, United States of America, Curriculum in Mechanistic, Interdisciplinary Studies of Biological Systems, University of North Carolina, Chapel Hill, North Carolina, United States of America, Curriculum in Genetics and Molecular Biology, University of North Carolina, Chapel Hill, North Carolina, United States of America

  • Shuang Li,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – review & editing

    Affiliations Department of Pharmacology, University of North Carolina, Chapel Hill, North Carolina, United States of America, RNA Discovery Center, University of North Carolina, Chapel Hill, North Carolina, United States of America, Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina, United States of America

  • Elizabeth W. Abrash,

    Roles Investigation, Writing – review & editing

    Affiliations Department of Pharmacology, University of North Carolina, Chapel Hill, North Carolina, United States of America, RNA Discovery Center, University of North Carolina, Chapel Hill, North Carolina, United States of America, Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina, United States of America, Curriculum in Genetics and Molecular Biology, University of North Carolina, Chapel Hill, North Carolina, United States of America

  • Bridget A. Peck,

    Roles Investigation

    Affiliations Department of Pharmacology, University of North Carolina, Chapel Hill, North Carolina, United States of America, RNA Discovery Center, University of North Carolina, Chapel Hill, North Carolina, United States of America, Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina, United States of America

  • Samuel P. Boyson,

    Roles Investigation, Methodology

    Affiliations Department of Pharmacology, University of North Carolina, Chapel Hill, North Carolina, United States of America, RNA Discovery Center, University of North Carolina, Chapel Hill, North Carolina, United States of America, Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina, United States of America, Curriculum in Genetics and Molecular Biology, University of North Carolina, Chapel Hill, North Carolina, United States of America

  • Zhiyue Zhang,

    Roles Methodology, Software

    Current address: Computer Science Graduate Program, Princeton University, Princeton, New Jersey, United States of America

    Affiliation Department of Pharmacology, University of North Carolina, Chapel Hill, North Carolina, United States of America

  • Rachel E. Cherney,

    Roles Investigation

    Affiliations Department of Pharmacology, University of North Carolina, Chapel Hill, North Carolina, United States of America, RNA Discovery Center, University of North Carolina, Chapel Hill, North Carolina, United States of America, Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina, United States of America, Curriculum in Genetics and Molecular Biology, University of North Carolina, Chapel Hill, North Carolina, United States of America

  • J. Mauro Calabrese

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    jmcalabr@med.unc.edu

    Affiliations Department of Pharmacology, University of North Carolina, Chapel Hill, North Carolina, United States of America, RNA Discovery Center, University of North Carolina, Chapel Hill, North Carolina, United States of America, Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina, United States of America

Abstract

The lncRNAs Airn and Kcnq1ot1 recruit Polycomb Repressive Complexes (PRCs) and repress genes over multi-megabase genomic intervals, but how they interact with proteins to direct repression remains poorly understood. We conducted formaldehyde-based RNA-immunoprecipitations (RIPs) of 27 proteins from mouse trophoblast stem cells (TSCs), using a protocol whose signal-to-non-specific-signal and post-lysis reassociation ratios were similar to those of crosslinking immunoprecipitation (CLIP) and crosslinking affinity purification (CLAP). Patterns of protein associations across Airn and Kcnq1ot1 were more similar to each other than to nearly all other transcripts and partitioned to extents that mirrored the degree of repression each lncRNA induced, implying connections to mechanism. Indeed, HNRNPU, a factor essential for Xist’s localization to chromatin, was enriched over Airn and Kcnq1ot1 and required to maintain normal levels of PRC1- and PRC2-directed chromatin modifications across the Airn and Kcnq1ot1 target domains, yet was dispensable for both lncRNAs’ localization to chromatin and for their association with PRC1. HNRNPU depletion caused a greater reduction in PRC-directed chromatin modifications and gene repression across the inactive X and the ~15 Mb Airn target domain than across the ~3 Mb Kcnq1ot1 domain. Perhaps relatedly, HNRNPU depletion significantly reduced the overall levels of Xist and Airn but not Kcnq1ot1. Our study reports architectures of protein association along Airn and Kcnq1ot1 compared to the transcriptome at large, highlights shared and distinct features between the two lncRNAs, and provides new perspective on the role of HNRNPU in long-range chromatin regulation by lncRNAs.

Author summary

Mammalian genomes express thousands of long noncoding RNAs (lncRNAs). While most remain functionally uncharacterized, a handful are known to regulate genes via epigenetic pathways. Whether these known regulatory lncRNAs operate through shared or divergent mechanisms remains unclear. To gain insight, we compared RNA-protein associations across three known regulatory lncRNAs (Airn, Kcnq1ot1, and Xist) relative to the broader transcriptome, using a protocol to immunoprecipitate RNA from formaldehyde-crosslinked cells. Benchmarking of our RNA-immunoprecipitation protocol against methods called crosslinking immunoprecipitation (CLIP) and crosslinking affinity-purification (CLAP) revealed similar results and complementary strengths among methods. We discovered similarities in RNA-protein associations between Airn and Kcnq1ot1 that nominated them as mechanistically important. One associated protein, HNRNPU, was required for Airn and Kcnq1ot1 to coordinate long-range recruitment of Polycomb complexes, without giving the appearance of being required to tether either lncRNA to chromatin, as HNRNPU has been proposed to do for Xist. Our study identifies RNA-protein associations that correlate with long-range gene regulation by lncRNAs and provides new perspectives on the potential mechanisms through which HNRNPU participates in lncRNA-mediated epigenetic control. We also benchmark a protocol to recover RNA-protein interactions that is simple to execute and has complementary strengths relative to other methods.

Introduction

Many RNAs produced by mammalian genomes are long and have little potential to encode protein even after splicing (long noncoding RNAs; lncRNAs). While most lncRNAs are functionally unannotated, several play important roles in health by regulating gene expression in cis, on the same allele from which they are transcribed [1]. The sequence-based logic that lncRNAs employ to carry out regulatory functions remains poorly understood, leaving much unclear about mechanism and the extent to which uncharacterized lncRNAs regulate gene expression.

The best-studied and most potent cis-acting regulatory lncRNA is Xist, which silences gene expression over the 165 megabase (Mb) X-chromosome [2]. On autosomes, at least two lncRNAs also repress gene expression in cis: Airn and Kcnq1ot1. Both are imprinted and expressed from paternally inherited alleles, where they can repress genomic regions spanning ~15 Mb and ~3 Mb around their sites of transcription, respectively [3–10]. While Xist is 18 kilobases (kb) long, robustly spliced, and abundant, Airn and Kcnq1ot1 are upwards of 90 kb long, predominantly unspliced, and lower in abundance [4,11–16]. In our prior studies of mouse trophoblast stem cells (TSCs), a cell type derived from the polar trophectoderm of the developing blastocyst in which all three lncRNAs are expressed and active, we have observed that at the steady state, Xist accumulates to about 230 copies per cell, whereas Airn and Kcnq1ot1 each accumulate to only 7 or 8 copies, a 25–30-fold difference [4].

Despite their differences, Airn, Kcnq1ot1, and Xist require several of the same proteins to carry out repression. These include the histone-modifying enzymes Polycomb Repressive Complex 1 (PRC1), which catalyzes monoubiquitination of lysine 119 on Histone H2A; Polycomb Repressive Complex 2 (PRC2), which catalyzes trimethylation of lysine 27 on Histone H3; and G9a/EHMT2, which catalyzes the dimethylation of lysine 9 on Histone H3 [9,10,17–23]. Additionally, all three lncRNAs require the RNA-binding protein (RBP) HNRNPK to recruit both PRC1 and PRC2 to chromatin, possibly in a sequential manner, with PRC1-directed modifications being required for the ultimate recruitment of PRC2 [4,15,16,24,25]. Lastly, the SPEN-binding region of Kcnq1ot1 is important for maintaining silencing of its target genes in mouse embryonic stem cells (ESCs), suggesting that like Xist, Kcnq1ot1 may also require the RBP SPEN to induce gene silencing [26].

It remains unclear whether Airn, Kcnq1ot1, and Xist engage with additional shared cofactors or different ones to maintain repression. It also remains unclear whether the lncRNAs exhibit distinct properties in relation to the broader transcriptome or even to each other. Although protein interactions within Xist have been characterized [2,27–40], the RBPs that interact with Airn and Kcnq1ot1, their patterns of interaction within each lncRNA, and the functional relevance of these interactions are incompletely defined. In addition to providing insights into mechanism, addressing these unknowns could help identify predictive features of mammalian regulatory RNAs and develop a better understanding of the functions of lncRNA-associated RBPs, many of which are essential for health, such as HNRNPU [41–45].

Herein, we used a formaldehyde-based RNA immunoprecipitation-sequencing (RIP-seq) protocol [46], which we demonstrate exhibits similar signal-to-noise and post-lysis RNA association ratios as crosslinking immunoprecipitation (CLIP) and crosslinking affinity purification (CLAP; [37]), to examine protein associations across Airn, Kcnq1ot1, and Xist in TSCs. We found that protein associations within Airn and Kcnq1ot1 were more correlated to each other than to Xist and essentially all other chromatin-enriched RNAs, and that the extent to which the associations partitioned into communities in our network analysis correlated with the degree of repression induced by each lncRNA. HNRNPU, an RBP necessary for Xist’s localization to the X chromosome, exhibited enriched associations with Airn and Kcnq1ot1 and was required to maintain PRC1- and PRC2-directed modifications over their target domains, yet was not required for their proper localization. Our study highlights a RIP protocol that offers advantages and complementary insights relative to CLIP and CLAP, provides a roadmap for interrogation of RNA-protein associations across Airn, Kcnq1ot1, and other chromatin-associated RNAs, and offers new insights into how HNRNPU may modulate long-range gene regulation mediated by lncRNAs.

Results

A formaldehyde-based RIP protocol returns similar signal-to-non-specific-signal and post-lysis reassociation ratios as CLIP and CLAP

Xist directly or indirectly associates with hundreds of proteins, many of which bind RNA, are expressed at near-micromolar concentrations, and regulate many processes, including splicing, RNA export, and translation [27–36,47,48]. Prior analysis by formaldehyde-based RIP revealed that one such protein, HNRNPK, exhibited enriched associations not only within Xist but also within Airn and Kcnq1ot1, in patterns implicating HNRNPK and its RNA-bound regions in PRC recruitment [4,15]. We reasoned that other proteins important for biogenesis or repressive function of Airn, Kcnq1ot1, or Xist might also exhibit distinct patterns of enrichment within the lncRNAs.

To investigate, we performed RIP-seq from formaldehyde-crosslinked, female, F1-hybrid mouse TSCs examining a targeted panel of 25 different RBPs, two components of PRC1 (RING1B and RYBP), and six biological replicates of non-specific IgG control (Fig 1A and S1 Table; [27–36]). Airn and Kcnq1ot1 are ~25-fold lower in abundance than Xist [4], rendering proteomics-based approaches to identify RNA-protein interactomes more challenging to execute and less suitable for direct comparisons than RIP-seq [50]. In addition, RIP-seq enables mapping of patterns of protein association in Airn, Kcnq1ot1, and Xist relative to the transcriptome at large, including direct and protein-bridged associations that may underlie higher-order assemblies, which was a central goal of our study [15,46,51–53]. We focused our panel of RIPs on proteins previously found to be associated with Xist, because like Xist, Airn and Kcnq1ot1 are also chromatin-associated and might draw from a similar pool of proteins to enact repression. Moreover, among the many protein constituents of PRC1 and PRC2, we limited our RIP analyses to the variant PRC1 complex members RING1B and RYBP, for two reasons: prior data have implicated variant PRC1 in initiating the cascade of PRC1 and PRC2 recruitment directed by Xist [25,36], and, consistent with those data, our prior analyses of RIP-seq performed in TSCs identified RING1B and RYBP as the only two PRC1 or PRC2 complex members exhibiting robust enrichment over Airn, Kcnq1ot1, and Xist (out of eight PRC1 and PRC2 complex members profiled) [15]. We elected to perform our experiments in F1-hybrid female TSCs, because prior work has shown that in these cells, Airn, Kcnq1ot1, and Xist are expressed and are each required for long-range Polycomb recruitment and gene repression within multi-megabase target domains [16,54–56].
Moreover, Airn and Kcnq1ot1 have been shown to exhibit augmented activity in TSCs and extra-embryonic tissues compared to those derived from the embryo proper [3–10], making TSCs one of the few cell culture systems available that enable the study of Airn, Kcnq1ot1, and Xist simultaneously and in a setting in which the former two lncRNAs exhibit heightened effects on chromatin. Owing to their ex vivo maintenance of imprinted X-chromosome inactivation, TSCs express Xist exclusively from the paternally inherited X chromosome, providing a natural setting that enabled us to connect protein associations over Xist with underlying effects on the inactive X chromosome; likewise, the F1-hybrid nature of our TSC lines enabled us to make analogous connections within the autosomal imprinted regions repressed by Airn and Kcnq1ot1 [16,54–56].

Fig 1. A formaldehyde-based RIP protocol returns similar signal-to-non-specific-signal and post-lysis reassociation ratios as CLIP and CLAP.

(A) 27 proteins selected for RIP-seq in TSCs, along with classifications. “Xist Interaction,” Xist-associated protein; “Repeat A/B/C/E,” protein enriched over Xist-Repeat [27–36]. (B) Normalized median of ratios (MoR) values calculated from ERCC Spike-In control RNAs ([46,49]; “ERCC normalized MoR”) in RIP replicates relative to the same values averaged for non-specific IgG control. (C) FLAG westerns in ESCs expressing FLAG-tagged GFP, HNRNPK, and HNRNPU. Abbreviations used in panels (C-I): HK/K, HNRNPK; HU/U, HNRNPU. (D) Overview of comparison between the formaldehyde RIP protocol used in this study [46] and CLIP and CLAP from [37]. (E) IP-specific signal percentages by peak class. “Top10k/5k/2.5k/1k”, the top 10000/5000/2500/1000 peaks ranked by the product of IP_RPM_under_peak * IP-specific signal percentage. (F) Signal-under-peaks within Xist/XIST, Kcnq1ot1/KCNQ1OT1. %_sig, 100*IP_RPM/(IP_RPM + non-specific control_RPM) under peak regions. (G) ERCC normalized MoR for FLAG-RBP RIPs relative to FLAG-GFP. (H) Species-specific recovery percentages (percent of reads or RPKM aligning to tag-expressing vs. non-expressing genome). (I) Wiggle density profiles of RIP, CLIP, and CLAP data over Xist/XIST. Black bars, peaks called in sample. Red bars, Repeats B/D. RPM, reads per million uniquely aligned reads per 50nt bin. Max RPM values were set in different rows to enable visualization of relevant trends. Intensity can be compared across rows as a way to gauge the relative levels of enrichment for each factor over the genomic intervals being displayed. See also S1–S4 Figs, S1–S6 Tables.

https://doi.org/10.1371/journal.pgen.1012215.g001

Because formaldehyde captures direct and indirect (protein-bridged) RNA-protein interactions, we use the term “association” rather than “binding” to describe RNA-protein interactions detected by RIP. As with other antibody- or antisera-based approaches to detect RNA- or DNA-protein interactions, such as CLIP, CLAP, chromatin immunoprecipitation (ChIP), and Cleavage Under Targets and Release Using Nuclease (CUT&RUN), our formaldehyde-based RIPs are expected to recover both specific and non-specific associations, which we model below using non-specific IgG and other controls.

RIPs for all but two of 27 proteins (NXF1 and U2AF2) were performed in at least biological duplicate (S1 Table). To identify RNA regions enriched by RIP over non-specific IgG control, we employed a peak calling strategy that requires peaks to exhibit a > 2-fold increase in signal in ≥2 replicates over averaged IgG control (for the 25 RIPs replicated in this study). This threshold was empirically selected to reduce false positives due to noise and false negatives due to naturally high non-specific binding over certain regions, such as those associated with the splicing machinery [15,46,57–59]. Read densities suggest the minimum resolution of our protocol to be 200–300 nucleotides and we set our minimum peak size to be 200 nucleotides, accordingly [46]. As expected, read densities under peaks were highly correlated between replicates (S1 Table). Likewise, in 15 of 19 cases where motifs were previously defined, sequences under peaks were enriched for expected motifs (S1 Fig). In remaining cases, an estimate of antibody or antiserum specificity was made using western blot. While certain antibodies or antisera returned single bands of expected size, others did not, which is evidence for recovery of non-specific interactions in those instances, at least by western blot (S2 Fig). Expected patterns of enrichment for several RBPs were observed over known binding regions in Xist, additional evidence that the protocol can recover expected RNA-protein associations (S3 Fig; [2,37–40]).
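The enrichment criterion described above can be illustrated with a minimal Python sketch. It assumes that each candidate region has already been reduced to one signal value per RIP replicate and one per IgG replicate; the function name, argument names, and example values are hypothetical and are not taken from the peak caller used in the study.

```python
from statistics import mean

def passes_enrichment(ip_replicates, igg_replicates, region_length,
                      fold=2.0, min_reps=2, min_length=200):
    """Apply the thresholds stated in the text to one candidate region.

    ip_replicates  : signal under the region, one value per RIP replicate
    igg_replicates : signal under the region, one value per IgG replicate
    region_length  : region size in nucleotides
    """
    if region_length < min_length:
        return False
    igg_mean = mean(igg_replicates)  # "averaged IgG control"
    # Count replicates exceeding a > 2-fold increase over averaged IgG
    n_enriched = sum(1 for s in ip_replicates if s > fold * igg_mean)
    return n_enriched >= min_reps

# Hypothetical values for one region, for illustration only
igg = [1.0, 1.4, 0.9, 1.1, 1.2, 1.0]
print(passes_enrichment([5.1, 4.4], igg, region_length=350))
```

Requiring the threshold in at least two replicates, rather than in the replicate average, means that a single outlier replicate cannot by itself produce a peak.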

We next sought to estimate the amount of RNA enriched by each RIP relative to non-specific IgG control. As part of our standard protocol, we add ERCC Spike-In control RNAs to immunoprecipitated RNA prior to library preparation. After sequencing, ERCC-counts can be used to infer the amount of non-ribosomal RNA recovered by RIP; the lower the ERCC-counts, the greater the fraction of non-ribosomal RNA recovered [46,49]. Across all RIPs, we subjected ERCC-counts to median of ratios normalization and inverted values to facilitate comparisons to IgG (“normalized MoR”; higher values equate to more RNA recovered). 54 out of 59 individual RIPs returned MoR values greater than that of averaged IgG control (Fig 1B and S1 Table). Even between replicates of the same RIP that recovered more and less RNA than IgG control, respectively (e.g., Ciz1), signal patterns were still highly correlated (Pearson’s r > 0.86; S1 Table). These analyses underscore the general success of our protocol and document which RIPs returned the highest levels of overall signal relative to non-specific IgG control.
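One way to picture the normalized MoR calculation is the sketch below. It assumes a DESeq-style median of ratios (each sample's ERCC counts divided by the per-spike-in geometric mean across samples, then the median taken per sample), followed by inversion and scaling to the IgG average. The exact procedure is the one described in [46,49]; the function names and the input layout here are hypothetical.

```python
import math
from statistics import median

def size_factors(ercc_counts):
    """Median-of-ratios size factor per sample (assumed DESeq-style).

    ercc_counts: dict mapping sample name -> list of ERCC counts,
                 with spike-ins in the same order for every sample.
    """
    samples = list(ercc_counts)
    n = len(ercc_counts[samples[0]])
    # Keep spike-ins detected in every sample so the log is defined
    usable = [i for i in range(n)
              if all(ercc_counts[s][i] > 0 for s in samples)]
    geo_mean = {
        i: math.exp(sum(math.log(ercc_counts[s][i]) for s in samples)
                    / len(samples))
        for i in usable
    }
    return {s: median(ercc_counts[s][i] / geo_mean[i] for i in usable)
            for s in samples}

def normalized_mor(ercc_counts, igg_samples):
    """Invert size factors and express them relative to averaged IgG."""
    inverted = {s: 1.0 / f for s, f in size_factors(ercc_counts).items()}
    igg_avg = sum(inverted[s] for s in igg_samples) / len(igg_samples)
    return {s: v / igg_avg for s, v in inverted.items()}
```

Under these assumptions, a RIP whose ERCC counts are uniformly ten times lower than those of IgG receives a normalized MoR of 10, matching the interpretation that fewer spike-in reads imply more recovered non-ribosomal RNA.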

Recent work has suggested CLAP as an alternative to RIP and CLIP, noting that CLAP employs covalent-epitope-tagging and denaturing washes and is expected to reduce non-specific association in post-lysis protein-recovery steps [37]. It has also been suggested that at least one native RIP protocol can be confounded by post-lysis RNA-protein reassociations [60]. Therefore, as a way to benchmark our version of formaldehyde-based RIP [46] versus CLIP and CLAP, we sought to compare signal versus non-specific signal and post-lysis RNA-protein association ratios between all three methods. To this end, we performed a series of epitope-tagging experiments that were modeled after ones reported by Guo and colleagues, to facilitate comparisons between methods [37]. We stably introduced doxycycline-inducible 3xFLAG-tagged cDNAs of nuclear-localized GFP, HNRNPK, and HNRNPU (the latter two being Xist-associated RBPs studied in more depth below) into mouse embryonic stem cells (ESCs) that also express Xist [58,61]. We induced expression of Xist and FLAG-tagged proteins, formaldehyde-crosslinked the ESCs and 293T human cells, mixed them in a 1:1 ratio, then performed FLAG RIP (Fig 1C-1D). We also performed HNRNPK and HNRNPU RIP from unmixed 293T extracts using mouse monoclonal antibodies raised against the endogenous proteins, to identify 293T peaks (S4 Fig). We examined signal-to-noise and species-specific RNA recovery ratios in RIPs versus those from Halo-V5-tagged HNRNPU CLIP and CLAP performed in reciprocally mixed Xist-expressing mouse SM33-ESCs and human 293Ts (S4 Fig; [37]). Data analyzed can be viewed in S1 Table and (https://genome.ucsc.edu/s/mmurvin/mm10_12_18_25_RIP_reassoc_CLIP_CLAP_ChIP_RNA; https://genome.ucsc.edu/s/mmurvin/hg38_RIP_reassoc_01_08_25; https://data.mendeley.com/datasets/3b3xys2743/2).

We first analyzed ratios of signal versus non-specific signal. For all three methods, peaks were defined as above, requiring 2x greater signal in both replicates relative to non-specific control (when replicates were performed; [37]). FLAG-GFP served as non-specific control for FLAG RIPs, and tag-lacking datasets served as non-specific controls for CLIP and CLAP (S4 Fig; [37]). We used CLAP peaks to analyze CLIP data to simplify comparisons and because high variation between human CLIP replicates prevented clear peaks from being called [37]. We calculated RPM-normalized signal-under peak values for IPs and non-specific controls (S4 Fig), then calculated “IP-specific signal” percentages, defined as 100 * [IP_RPM_under_peak]/([IP_RPM_under_peak]+[control_RPM_under_peak]). We ranked peaks using the product of IP_RPM_under_peak * IP-specific signal. Lastly, we summed RPM-normalized signal-under peak values for IPs and non-specific controls to calculate aggregate IP-specific signal percentage for different classes of peaks. Across classes, IP-specific signal percentages were highest in human CLAP (92–94) and lower but similar between RIP, CLIP, and mouse CLAP (RIP/CLIP, 78–81; mouse CLAP, 81–84; Fig 1E).
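The IP-specific signal percentage, the peak ranking, and the aggregate percentage per peak class follow directly from the definitions above and can be sketched in Python as follows. The tuple layout and function names are hypothetical, and the handling of a region with zero total signal is an assumption made only so the sketch is well defined.

```python
def ip_specific_pct(ip_rpm, ctrl_rpm):
    """100 * IP_RPM_under_peak / (IP_RPM_under_peak + control_RPM_under_peak)."""
    total = ip_rpm + ctrl_rpm
    # Assumption: regions with no signal at all are scored as 0
    return 100.0 * ip_rpm / total if total > 0 else 0.0

def rank_peaks(peaks):
    """Sort peaks by IP_RPM_under_peak * IP-specific signal, highest first.

    peaks: list of (name, ip_rpm, ctrl_rpm) tuples.
    """
    return sorted(peaks,
                  key=lambda p: p[1] * ip_specific_pct(p[1], p[2]),
                  reverse=True)

def aggregate_pct(peaks):
    """Aggregate IP-specific signal: sum IP and control RPM, then take the ratio."""
    ip_sum = sum(p[1] for p in peaks)
    ctrl_sum = sum(p[2] for p in peaks)
    return ip_specific_pct(ip_sum, ctrl_sum)

# Hypothetical peaks; a "top N" class is the first N entries of the ranking
peaks = [("a", 8.0, 2.0), ("b", 30.0, 30.0), ("c", 1.0, 0.0)]
top2 = rank_peaks(peaks)[:2]
print([p[0] for p in top2], round(aggregate_pct(top2), 1))
```

Because the aggregate is computed from summed RPM rather than from an average of per-peak percentages, peaks with more reads contribute proportionally more to the value reported for each class.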

We next examined signal versus non-specific signal within Xist/XIST and Kcnq1ot1/KCNQ1OT1 because of their relevance to our study below. Airn is neither robustly expressed in ESCs nor conserved in humans, precluding its analysis here. We summed IP_RPM_under_peak and control_RPM_under_peak values over the length of Xist/XIST and Kcnq1ot1/KCNQ1OT1 and examined IP-specific signal. We dropped Xist from CLIP and CLAP analyses because it was not expressed in the mouse mixed with human-tagged samples that served as non-specific controls for mouse-tagged samples (Figs 1I and S4; [37]). Across methods, IP_RPM_under_peak values were ~10–100-fold higher in RIP versus CLIP and CLAP, perhaps reflecting increased crosslinking efficiency of formaldehyde relative to UV light (Fig 1F). Comparing our lncRNAs of interest, IP-specific signal percentages in RIP ranged between 85 and 94, covered a broad range in CLIP, and ranged from 92 to 100 in CLAP (Fig 1F). Normalized MoR values were 10-fold higher in FLAG-HNRNPK and FLAG-HNRNPU versus FLAG-GFP RIPs (Fig 1G), implying that the RBP-RIPs recovered 10 times as much non-ribosomal RNA as the non-specific control, and that their absolute IP-specific signal percentages may be higher than those calculated using the proportion-based RPM.

We next examined post-lysis reassociation in RIP, CLIP, and CLAP by considering read alignments within combined mouse-and-human genomes. We specify “species-specific recovery” to describe the percentage of reads uniquely aligning to the genome expressing the epitope-tagged protein over the sum of reads uniquely aligning to both genomes (epitope-tag-expressing plus not-expressing). Comparing all reads sequenced in each method (including those not within peaks), species-specific recovery percentages in FLAG-HNRNPK and FLAG-HNRNPU RIPs were 89, falling toward the upper range detected by HNRNPU CLIP and CLAP (53–97; Fig 1H). Across the same peak classes from Fig 1E, species-specific recovery percentages remained relatively constant in mouse CLIP and CLAP but gradually decreased in RIPs and human CLIP and CLAP (Fig 1H). The decrease was particularly apparent in RIP peaks within Xist/XIST, where FLAG-tagged HNRNPK expressed in mouse clearly recovered known HNRNPK-binding regions in human XIST (Repeats B, D [62]; Xist species-specific percentage, 64; Fig 1H-1I). In contrast, peaks within the more-lowly expressed Kcnq1ot1/KCNQ1OT1 exhibited species-specific recovery percentages of 95 or greater across all datasets except mouse CLIP (Figs 1H and S4D).
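The species-specific recovery metric defined above reduces to a single ratio, sketched here in Python. The function name and the example read counts are hypothetical; the counts are chosen only to illustrate the arithmetic and are not data from the study.

```python
def species_specific_recovery(tag_genome_reads, other_genome_reads):
    """Percent of uniquely aligned reads that map to the genome
    expressing the epitope-tagged protein.

    tag_genome_reads   : reads uniquely aligning to the tag-expressing genome
    other_genome_reads : reads uniquely aligning to the non-expressing genome
    """
    total = tag_genome_reads + other_genome_reads
    if total == 0:
        raise ValueError("no uniquely aligned reads in either genome")
    return 100.0 * tag_genome_reads / total

# Illustrative counts: 890 reads from the tagged species, 110 from the other
print(species_specific_recovery(890, 110))
```

A value of 100 would indicate no detectable post-lysis reassociation, whereas 50 would indicate that RNA from the non-expressing species was recovered as efficiently as RNA from the tag-expressing species.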

Thus, our formaldehyde-based RIP protocol recovers expected motifs across the transcriptome, known RNA-protein interactions in Xist, more non-ribosomal RNA than non-specific IgG controls, and by our analyses, signal-to-non-specific-signal and post-lysis reassociation ratios that are comparable with CLIP and CLAP. CLAP generally although not always recovers less non-specific signal, whereas RIP can recover higher signal intensity under peak regions, presumably due to efficiency of formaldehyde crosslinking. Across all methods, different RNA targets experienced varying levels of post-lysis reassociation, in a manner we presume scales with affinity and abundance of the RNA targets and cognate RBPs.

Protein association profiles across Airn and Kcnq1ot1 closely resemble each other and exhibit similarity to intron-containing transcripts from lncRNA and protein-coding genes

Having benchmarked our formaldehyde-linked RIP protocol [46], we next sought to determine the extent to which our selected proteins associated with Airn, Kcnq1ot1, and Xist above non-specific IgG control and relative to other chromatin-enriched transcripts. We defined a set of chromatin-enriched transcripts using a combination of total RNA-seq and fractionation RNA-seq previously performed in TSCs [15]. Transcripts that were expressed above a median kallisto-defined expression threshold of 0.125 TPM (counts from total RNA-seq datasets), and whose ratio of chromatin versus cytoplasmic counts from fractionated RNA-seq datasets was greater than 0.75 ([chromatin_RNA-seq_TPM]/([chromatin_RNA-seq_TPM]+[cytoplasmic_RNA-seq_TPM])), were defined as chromatin-enriched (S2, S3 Tables; [63]). To determine significance of enrichment for each protein across each chromatin-enriched transcript, we subtracted non-specific IgG signal underneath peaks for individual replicates of each protein over the length of each chromatin-associated transcript (referred to as “IgG-corrected signal-under-peak values”), summed IgG-corrected signal-under-peak values for each replicate, then used Wilcoxon rank testing to compare the corrected signals to six replicates of non-specific IgG control summed under the same sets of peaks (S2 Table). We also ranked the set of chromatin-enriched transcripts by overall levels of association with each protein, enabling transcriptome-wide comparisons (Fig 2A; [15]). Significant associations are denoted as purple-shaded boxes with black text in Fig 2A, and individual replicate values for select proteins and non-specific IgG control are shown in Fig 2B. Summed, IgG-corrected RIP signal-under-peak values were significantly higher than non-specific IgG control for 23, 24, and 16 proteins over Airn, Kcnq1ot1, and Xist, respectively (Fig 2A; S2 Table).
Relative to their own expression levels (assessed by Total and Chromatin-associated RNA-seq), Airn and Kcnq1ot1 exhibited enriched associations with many of the same proteins, some of which were also enriched within Xist (e.g., HNRNPU, HNRNPK, RBM15, RING1B, RYBP, YTHDC1; Fig 2A). Others exhibited enrichments specific to Airn and Kcnq1ot1 (HNRNPL, SAFB, XRN2; Fig 2A). Examining the PRC1 components RING1B and RYBP, Airn, Kcnq1ot1, and Xist exhibited among the most highly-ranked associations in TSCs (#2, #3, and #1 ranked associations with RING1B, respectively; #2, #6, and #1 ranked associations with RYBP, respectively; Fig 2A). Underneath the most highly enriched PRC1-associated regions within each lncRNA, read density exhibited strand specificity and did not co-localize with sites of RING1B and RYBP chromatin binding detected by ChIP [16], consistent with our prior development efforts that showed undetectable levels of DNA contamination upon RNA purification after typical RIPs — likely the combined result of multiple steps in our protocol that remove DNA contamination, including its limited use of sonication, its reliance on Trizol purification of RNA, and its application of DNase treatment after Trizol purification (S5 Fig; [46]).
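The chromatin-enrichment criterion defined at the start of this section can be written as a short Python sketch. It assumes that the expression threshold is applied to the median TPM across total RNA-seq datasets and that one chromatin and one cytoplasmic TPM value are available per transcript; the function names and that reading of "median" are assumptions, not details confirmed by the text.

```python
from statistics import median

def chromatin_fraction(chrom_tpm, cyto_tpm):
    """chromatin_TPM / (chromatin_TPM + cytoplasmic_TPM)."""
    total = chrom_tpm + cyto_tpm
    # Assumption: transcripts undetected in both fractions score 0
    return chrom_tpm / total if total > 0 else 0.0

def is_chromatin_enriched(total_tpms, chrom_tpm, cyto_tpm,
                          min_tpm=0.125, min_fraction=0.75):
    """Apply both thresholds stated in the text to one transcript.

    total_tpms : TPM values from the total RNA-seq datasets
    chrom_tpm  : TPM in the chromatin fraction
    cyto_tpm   : TPM in the cytoplasmic fraction
    """
    expressed = median(total_tpms) > min_tpm
    enriched = chromatin_fraction(chrom_tpm, cyto_tpm) > min_fraction
    return expressed and enriched

# Hypothetical transcript with 90% of its fractionated signal on chromatin
print(is_chromatin_enriched([0.5, 0.7], chrom_tpm=9.0, cyto_tpm=1.0))
```

Expressing the fractionation result as chromatin/(chromatin + cytoplasmic) bounds the value between 0 and 1, so the 0.75 cutoff corresponds to at least a 3:1 excess of chromatin over cytoplasmic signal.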

Fig 2. Protein association profiles across Airn and Kcnq1ot1 closely resemble each other and exhibit similarity to intron-containing transcripts from lncRNA and protein-coding genes.

(A) RIPs ranked by summed, IgG-corrected signal-under-peak values within Airn, Kcnq1ot1, and Xist relative to expressed and chromatin-associated TSC RNAs and other select regulatory lncRNAs. “Total RNA-seq”, “Chrom-assoc.”: rankings from total RNA-seq, chromatin-associated RNA-seq [15]. Black/grey numbers, significant/non-significant associations (p < 0.05 Wilcoxon rank). U2AF2, NXF1 RIPs were excluded from significance analyses (singly replicated). (B) Summed, IgG-corrected signal-under-peak values for select RIPs over Airn, Kcnq1ot1, Xist. “NA”, no peak in lncRNA. (C) Pearson’s r histograms comparing RIP protein association profiles of Airn, Kcnq1ot1, Xist vs. chromatin-enriched transcripts. Labels, percentile of similarity ranking. (D) RIP protein association profile similarity rankings between Airn, Kcnq1ot1, Xist and select regulatory lncRNAs (same data analyzed in panel (C) shown for select lncRNAs). (E) GENCODE genic biotype and intron status of the 1000 transcripts most similar to Airn, Kcnq1ot1, Xist [64]. See also S1–S3 and S5 Figs, S1–S3 and S6 Tables.

https://doi.org/10.1371/journal.pgen.1012215.g002

Many additional chromatin-enriched regulatory lncRNAs are expressed in TSCs, including but not limited to the speckle-associated lncRNA Malat1 [65,66]; the paraspeckle-associated lncRNA Neat1 [67]; the lncRNA Pvt1, which produces many different transcripts and has been observed to exert both locally activating and repressive functions on the adjacent protein-coding gene, c-MYC [68–76]; the lncRNA Tsix, which functions to repress transcription of Xist on the active X chromosome [77]; and the lncRNA Meg3, which functions to prevent expression of the protein-coding gene Dlk1 in cis [78,79]. These lncRNAs served as additional points of comparison to Airn, Kcnq1ot1, and Xist, and highlighted intriguing differences. Namely, while Malat1 exhibited high association with speckle-associated proteins such as SRSF1 and SAFB (#1 ranked transcript in TSCs), as would be expected due to Malat1’s known association with speckles, Malat1 did not associate above non-specific controls with proteins linked to the Polycomb system, namely HNRNPK, RING1B, and RYBP (Fig 2A). Likewise, while Neat1 exhibited a high association with the paraspeckle protein SFPQ (#4 ranked transcript in TSCs), it did not associate with HNRNPK, RING1B, or RYBP above non-specific controls (Fig 2A). Pvt1 exhibited a more highly ranked association with HNRNPK than Airn, Kcnq1ot1, or Xist (#1 ranked transcript in TSCs), yet it exhibited a lower ranking with the PRC1 component RYBP and did not associate with RING1B above non-specific controls (Fig 2A). Relatedly, Meg3 exhibited a more highly ranked association with HNRNPK than Airn or Kcnq1ot1 and did not associate with either RING1B or RYBP above non-specific controls (Fig 2A). Lastly, relative to Airn, Kcnq1ot1, and Xist, Tsix exhibited lower ranked associations for all factors profiled, although Tsix did associate with both RING1B and RYBP above non-specific controls, making it the only regulatory lncRNA discussed here, other than Airn, Kcnq1ot1, and Xist, that did so (Fig 2A).

We next sought to quantify how similar levels of protein associations across Airn, Kcnq1ot1, and Xist were relative to other chromatin-enriched transcripts. For each transcript, we defined its “protein association profile” as a 27-value vector comprising IgG-corrected RIP signal-under-peak values for that transcript across each protein profiled (S2 Table). We then compared association profiles between transcripts using Pearson’s correlation. Of the 19295 chromatin-enriched transcripts in TSCs, 18820 had non-zero IgG-corrected RIP signal-under-peak values for at least one protein examined and were included in comparisons below. Examining these transcripts, we found that the protein association profiles of Kcnq1ot1 and Xist were the 92nd and 6888th most similar to Airn (>99th and 63rd percentiles, respectively); the protein association profiles of Airn and Xist were the 270th and 9744th most similar to Kcnq1ot1 (99th and 48th percentiles, respectively); and the protein association profiles of Airn and Kcnq1ot1 were the 2352nd and 2601st most similar to Xist (88th and 86th percentiles, respectively; Fig 2C and S3 Table). To provide a frame of reference, the protein association profile rankings of Neat1, Malat1, Pvt1, Meg3, and Tsix are shown relative to Airn, Kcnq1ot1, and Xist in Fig 2D; none of these other regulatory lncRNAs exhibited protein association profiles rankings higher than 767 and 748 relative to Airn, Kcnq1ot1, or Xist (96th percentile; Tsix’s similarity rankings relative to Airn and Kcnq1ot1, respectively; Fig 2D). We observed no significant correlations between protein association profile rankings and RNA expression levels. We observed weak but significant positive or negative correlations between protein association profile rankings and RNA length (Pearson’s r values of 0.13, 0.28, and -0.11 comparing length and protein association profile rankings relative to Airn, Kcnq1ot1, and Xist, respectively).
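The profile comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names and the toy four-value profiles are hypothetical (the real profiles have 27 values), and the percentile is assumed to be 100 × (1 − rank/N), which reproduces the figures quoted in the text (for example, rank 6888 of 18820 gives roughly the 63rd percentile).

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a constant profile
    return cov / math.sqrt(var_x * var_y)

def percentile_from_rank(rank, total):
    """Percent of compared transcripts ranked below `rank` (rank 1 = most similar).
    Assumed definition; it matches the rank/percentile pairs quoted in the text."""
    return 100.0 * (1.0 - rank / total)

def rank_by_similarity(profiles, reference):
    """Rank every other transcript by Pearson's r against the reference profile."""
    ref = profiles[reference]
    scored = [(name, pearson_r(ref, vec))
              for name, vec in profiles.items() if name != reference]
    scored = [(name, r) for name, r in scored if not math.isnan(r)]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(name, r, rank) for rank, (name, r) in enumerate(scored, start=1)]

# Hypothetical toy profiles (4 values each instead of 27).
toy_profiles = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8.5],
    "C": [4, 3, 2, 1],
    "D": [1, 1, 2, 1],
}
ranking = rank_by_similarity(toy_profiles, "A")
```

Because Pearson's r is insensitive to overall scale, two transcripts with very different absolute RIP signal can still rank as highly similar if their relative enrichments across proteins match.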

Airn, Kcnq1ot1, and Xist are among the few well-documented cis-acting repressive lncRNAs, yet it remains unclear whether they exhibit preferential similarities to protein-coding versus lncRNA genes. To gain insight, we used Pearson’s correlation to identify the top 1,000 transcripts whose protein association profiles were most similar to each lncRNA. In these sets, we evaluated proportions of transcripts produced from lncRNA versus protein-coding genes and compared to proportions from all chromatin-associated RNAs. We also compared proportions of intron-excluding (presumably spliced) and intron-including (presumably nascent) transcripts. These analyses revealed that in terms of protein association profiles, transcripts similar to Airn, Kcnq1ot1, and Xist are mostly produced from protein-coding genes (Fig 2E; > 70% in each set). Whereas Kcnq1ot1- and Xist-similar transcripts exhibited a mild but significant enrichment originating from lncRNA genes, Airn-similar transcripts exhibited a mild but significant enrichment originating from protein-coding genes (Fig 2E; p-adj. < 0.01; Fisher’s exact). Airn- and Kcnq1ot1-similar transcripts exhibited significant enrichments for intron-including RNAs, whereas Xist-similar transcripts exhibited significant enrichment for intron-excluding RNAs (Fig 2E; p-adj. < 0.01; Fisher’s exact).
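The enrichment comparisons above rest on Fisher's exact test applied to 2×2 tables (for example, lncRNA versus protein-coding counts in a top-1,000 set versus the background). The following sketch shows how such a test works using only the standard library. The function name is hypothetical, the two-sided rule used here (sum the probabilities of all tables no more likely than the observed one) is one common convention and is an assumption about what was used, and no multiple-testing adjustment is applied, whereas the text reports adjusted p values.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]].

    Rows could be 'in the similar set' vs 'background'; columns could be
    'lncRNA' vs 'protein-coding'. Margins are held fixed and table
    probabilities follow the hypergeometric distribution.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of observing x in the top-left cell given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is as likely or less likely than the observed one;
    # the small tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)

# Classic small example table, used here only to exercise the function.
p_example = fisher_exact_two_sided(3, 1, 1, 3)
```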

Thus, the protein association profiles measured over Airn and Kcnq1ot1 are more similar to each other than to the vast majority of other chromatin-enriched transcripts and exhibit a more moderate degree of similarity to Xist. Relative to other known regulatory lncRNAs, Airn and Kcnq1ot1 exhibit strikingly enriched associations with the PRC1 components, RING1B and RYBP, among other more nuanced differences. Relative to other chromatin-enriched transcripts, the protein association profiles of Airn and Kcnq1ot1 most frequently resemble intron-containing RNAs produced from protein-coding genes. Xist more frequently resembles intron-excluding transcripts, the majority of which are also produced from protein-coding genes.

Protein association networks in Airn, Kcnq1ot1, and Xist exhibit above average similarities and a correlation between modularity and repressive potency

Our analyses revealed that Airn and Kcnq1ot1 exhibited significant associations with the majority of Xist-associated proteins profiled. We therefore sought to compare the spatial distribution patterns of protein association detected across Airn, Kcnq1ot1, and Xist. Because RIP-seq performed after formaldehyde crosslinking can capture both direct protein-RNA interactions as well as presumed indirect associations through protein-protein contacts, correlated enrichments of different proteins across transcripts might provide clues about coordinated assemblies or functional partnerships, even if not through direct binding. Indeed, discovery of correlated protein associations in different regions of Xist has revealed insights into its mechanism [15,27,39,51,57,80].

For these analyses, across each chromatin-enriched transcript, we subtracted non-specific IgG control from RIP-seq read densities in 25 nucleotide bins for each protein profiled (referred to as “IgG-corrected read densities”). Then, for all possible pairs of proteins in all chromatin-enriched transcripts, we compared IgG-corrected read densities using Pearson’s correlation. We included all RIP datasets in these analyses to standardize comparisons across the transcriptome. The correlations represent protein association networks in which edge lengths are inversely proportional to correlation coefficients, and node sizes are directly proportional to the number of connected edges. Within each transcript’s protein association network, communities (representing highly correlated nodes that cluster together) were defined using the Leiden algorithm [81]. Networks for Airn, Kcnq1ot1, and Xist are shown in Fig 3A, and IgG-corrected read densities for select proteins color-coded by community are shown in Fig 3B. Note that while networks were defined using 25 nucleotide bins, IgG-corrected read densities in Fig 3B are displaying data summarized in 50 nucleotide bins.
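As a rough illustration of the steps just described, the sketch below bins per-nucleotide signal, subtracts the IgG control, and keeps protein pairs whose IgG-corrected densities correlate positively (the Fig 3 legend draws edges for r > 0.0). All function names and the toy data are hypothetical. Two details are assumptions rather than facts from the text: bins are summed rather than averaged, and negative differences are left as they are rather than floored at zero. Community detection with the Leiden algorithm is not reproduced here.

```python
import math
from itertools import combinations

def bin_sum(per_nt_signal, bin_size=25):
    """Sum per-nucleotide signal into consecutive bins (last bin may be shorter)."""
    return [sum(per_nt_signal[i:i + bin_size])
            for i in range(0, len(per_nt_signal), bin_size)]

def igg_corrected(rip_bins, igg_bins):
    """Subtract the non-specific IgG control from RIP signal, bin by bin."""
    return [rip - igg for rip, igg in zip(rip_bins, igg_bins)]

def pearson_r(x, y):
    """Pearson's correlation coefficient; NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else float("nan")

def positive_edges(corrected_by_protein):
    """Return {(protein_a, protein_b): r} for every pair with r > 0."""
    edges = {}
    for a, b in combinations(sorted(corrected_by_protein), 2):
        r = pearson_r(corrected_by_protein[a], corrected_by_protein[b])
        if r > 0:  # NaN compares False, so undefined pairs are skipped
            edges[(a, b)] = r
    return edges

# Hypothetical IgG-corrected densities for three proteins over three bins.
toy_edges = positive_edges({
    "P1": [1, 2, 3],
    "P2": [2, 4, 6.5],
    "P3": [3, 2, 1],
})
```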

Fig 3. Protein association networks in Airn, Kcnq1ot1, and Xist exhibit above average similarities and a correlation between modularity and repressive potency.

(A) Network graphs of protein association in (i) Airn, (ii) Kcnq1ot1, (iii) Xist. Edges, nodes connected by r values >0.0. (B) Input, non-specific IgG, and IgG-corrected RIP-seq read densities for select proteins across different communities. Nodes and density profiles are colored by community. Grey text, non-enriched proteins (Fig 2A). (C) Pearson’s r histograms comparing protein association networks in Airn, Kcnq1ot1, Xist relative to other chromatin-enriched transcripts. (D) Protein association network similarity rankings in Airn, Kcnq1ot1, Xist relative to select regulatory lncRNAs. (E) Metrics quantifying separation between communities: (i) Distributions of inter- versus intra-community Pearson’s r; (ii) modularity; (iii) average silhouette distance. (F) Number (i) and percent (ii) of rare edges in Airn, Kcnq1ot1, Xist. See also S3 and S5 Figs; S3, S4 and S6 Tables.

https://doi.org/10.1371/journal.pgen.1012215.g003

We next quantified similarity of protein association networks in Airn, Kcnq1ot1, and Xist relative to all other chromatin-enriched RNAs. Of the 19295 chromatin-enriched RNA transcripts, 19223 had non-zero IgG-corrected read density for at least one protein profiled by RIP-seq and were included below. Relative to 19223 chromatin-enriched transcripts, the protein association network of Kcnq1ot1 ranked 6th in similarity to Airn, and that of Airn ranked 1st in similarity to Kcnq1ot1, indicating that patterns of protein association across the two lncRNAs are more similar to each other than to essentially any other chromatin-enriched transcript (>99.9th percentiles; Fig 3C and S3 Table). The protein association networks of Airn and Kcnq1ot1 were also more similar to Xist than to most other chromatin-enriched transcripts, ranking 292nd and 1464th, respectively (98th and 92nd percentiles; Fig 3C and S3 Table). Comparing protein association networks within the same five additional regulatory lncRNAs that served as points of comparison in Fig 2, we observed 98th percentile or higher rankings between Airn and Tsix, and between Kcnq1ot1 and Pvt1, Meg3, and Tsix (Fig 3D). We observed 95th percentile or higher rankings between Airn and Pvt1 and Meg3, and between Xist and Meg3 (Fig 3D).

We next used three orthogonal approaches to examine community structures of protein association networks. First, to quantify the average separation between communities, we examined two distributions within networks: the distribution of Pearson’s r values comparing all proteins within the same communities (intracommunity r values) and the distribution comparing all proteins between different communities (intercommunity r values). While average intracommunity r values were similar between all three lncRNAs, average intercommunity r values were lowest in Xist, followed by Airn, then Kcnq1ot1 (Fig 3E, panel (i); p < 2e-16, Wilcoxon rank test). This indicates that community structures within Xist are more distinct from each other than those within Airn and Kcnq1ot1. Next, we calculated modularity, a metric to assess how strongly a network partitions into communities relative to a null model. Xist’s network had the highest modularity, followed by Airn’s, then Kcnq1ot1’s, consistent with our intra- vs. inter-community r value analysis (Fig 3E, panel (ii); S3 Table). Finally, to assess the degree to which individual nodes (proteins) fit within their assigned community versus other communities, we calculated average silhouette coefficients of all nodes in each transcript’s network [82]. Higher silhouette coefficients indicate a clearer separation between a node-of-interest’s own community and other communities in the network. Average silhouette coefficients by node were highest in Xist, followed by Airn, followed by Kcnq1ot1 (Fig 3E panel (iii); S3 Table).
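Two of the three metrics above can be illustrated with a short sketch: the mean intra- versus inter-community correlation, and modularity. The modularity shown is Newman's weighted form, Q = (1/2m) Σ_ij [w_ij − k_i k_j / 2m] δ(c_i, c_j), with positive correlations used as edge weights; whether this matches the exact formulation used in the study is an assumption, and the function names and toy matrices are hypothetical.

```python
def intra_inter_means(r, community):
    """Mean Pearson's r for protein pairs within vs between communities.

    r: symmetric matrix (list of lists); community: label per node.
    """
    intra, inter = [], []
    n = len(r)
    for i in range(n):
        for j in range(i + 1, n):
            (intra if community[i] == community[j] else inter).append(r[i][j])

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(intra), mean(inter)

def modularity(r, community):
    """Newman's weighted modularity, using positive correlations as weights."""
    n = len(r)
    w = [[r[i][j] if i != j and r[i][j] > 0 else 0.0 for j in range(n)]
         for i in range(n)]
    k = [sum(row) for row in w]      # weighted degree of each node
    two_m = sum(k)                   # twice the total edge weight
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in range(n):
        for j in range(n):
            if community[i] == community[j]:
                q += w[i][j] - k[i] * k[j] / two_m
    return q / two_m

# Hypothetical 4-protein networks with two communities of two proteins each.
labels = [0, 0, 1, 1]
well_separated = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
less_separated = [[1, 1, 0.1, 0.1], [1, 1, 0.1, 0.1],
                  [0.1, 0.1, 1, 1], [0.1, 0.1, 1, 1]]
q_tight = modularity(well_separated, labels)
q_loose = modularity(less_separated, labels)
```

In this toy case, raising the inter-community correlations while leaving the intra-community ones unchanged lowers modularity, which is the same direction of effect described for Xist, Airn, and Kcnq1ot1.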

RBPs bind across the transcriptome, and it remains unclear whether correlated associations across Xist, Airn, and Kcnq1ot1 occur in other transcripts. To gain insight, we devised a method to quantify which intracommunity connections within Airn, Kcnq1ot1, and Xist represent prevalent versus rare events. Across all chromatin-enriched transcripts, we calculated p values describing the likelihood that each possible pair of proteins would be detected in the same community relative to randomized controls. Protein pairs detected more frequently than chance were classified as “prevalent,” and those detected less frequently than by chance, “rare”. Of 351 possible protein pairs, 145 and 191 were classified as prevalent and rare, respectively (p-adj < 0.05; S4 Table). Xist had a higher number of rare intracommunity edges than many other chromatin-associated RNAs (87th percentile), whereas Airn and Kcnq1ot1 had closer to average (58th) or lower (14th) percentile values, respectively (Fig 3F, panel (i)). About 30% (Xist and Airn) and 19% (Kcnq1ot1) of the edges within the three lncRNAs were classified as rare (Fig 3F, panel (ii)). Notably, edges connecting the PRC1 components RING1B and RYBP to each of the RBPs HNRNPK, HNRNPL, and HNRNPU were rare, whereas edges between any combination of HNRNPK, HNRNPL, and HNRNPU were prevalent, as were edges between PRC1 components and the exoribonuclease XRN2, which was found in communities adjacent to those containing RING1B and RYBP in Airn and Kcnq1ot1 (S4 Table).
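One way to picture the prevalent-versus-rare classification is as a permutation test on community co-membership. The sketch below is only an assumed version of such a procedure: the text does not specify the randomization scheme, so shuffling community labels within each transcript, the empirical p value formula, the unadjusted 0.05 cutoff, and every name in the code are illustrative choices. With 27 proteins there are C(27, 2) = 351 possible pairs, matching the count given above.

```python
import random
from itertools import combinations
from math import comb

N_PAIRS_27 = comb(27, 2)  # number of unordered pairs among 27 proteins

def co_membership_counts(partitions, proteins):
    """Count, per protein pair, the transcripts in which both share a community."""
    counts = {pair: 0 for pair in combinations(proteins, 2)}
    for part in partitions:
        for a, b in counts:
            if a in part and b in part and part[a] == part[b]:
                counts[(a, b)] += 1
    return counts

def classify_pairs(partitions, proteins, n_perm=200, alpha=0.05, seed=0):
    """Label each pair 'prevalent', 'rare', or 'neither' vs shuffled labels."""
    rng = random.Random(seed)
    observed = co_membership_counts(partitions, proteins)
    at_least = dict.fromkeys(observed, 0)
    at_most = dict.fromkeys(observed, 0)
    for _ in range(n_perm):
        shuffled = []
        for part in partitions:
            names = list(part)
            labels = [part[name] for name in names]
            rng.shuffle(labels)  # break protein-to-community assignment
            shuffled.append(dict(zip(names, labels)))
        null = co_membership_counts(shuffled, proteins)
        for pair in observed:
            at_least[pair] += null[pair] >= observed[pair]
            at_most[pair] += null[pair] <= observed[pair]
    calls = {}
    for pair in observed:
        p_high = (at_least[pair] + 1) / (n_perm + 1)
        p_low = (at_most[pair] + 1) / (n_perm + 1)
        if p_high < alpha:
            calls[pair] = "prevalent"
        elif p_low < alpha:
            calls[pair] = "rare"
        else:
            calls[pair] = "neither"
    return calls

# Hypothetical input: 60 transcripts that all group A with B and C with D.
toy_partitions = [{"A": 0, "B": 0, "C": 1, "D": 1} for _ in range(60)]
toy_calls = classify_pairs(toy_partitions, ["A", "B", "C", "D"])
```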

Thus, the protein association networks within Airn and Kcnq1ot1 were more similar to each other than to essentially all other chromatin-enriched transcripts, and likewise exhibited similarity to the network within Xist and several other known regulatory lncRNAs that ranked above the 90th percentile. In terms of network structure, communities of association within Xist were more separated from each other than those within Airn, which were in turn more separated from each other than those within Kcnq1ot1. This trend is a striking mirror of the genomic ranges over which each lncRNA represses genes and recruits PRCs; whereas Xist’s repressive range spans the entire X chromosome, Airn’s spans 15 Mb, and Kcnq1ot1’s spans 3 Mb [4]. Lastly, relative to other chromatin-enriched transcripts, protein association networks of Airn, Kcnq1ot1, and Xist contain both prevalent and rare connected edges, suggesting the lncRNAs engage with proteins in ways that are common in certain instances and uncommon in others. While HNRNPK, HNRNPL, and HNRNPU appear to commonly associate across chromatin-enriched transcripts, their correlated associations with PRC1 are relatively rare, suggesting a connection to repressive mechanism.

HNRNPU is required for maintenance of PRC-directed modifications induced by Airn and to a lesser degree, by Kcnq1ot1

Recognizing Polycomb recruitment as a core function of Airn and Kcnq1ot1, our attention was drawn to network edges that linked the PRC1 components RING1B and RYBP to HNRNPK, HNRNPU, and XRN2. All three RBPs exhibited enriched associations with Airn and Kcnq1ot1 by RIP-seq (Fig 2B). Whereas HNRNPK is known to be important for PRC1 and PRC2 recruitment by Airn and Kcnq1ot1 [4,15,36], HNRNPU and XRN2 have not been investigated with respect to Airn or Kcnq1ot1. However, prior data suggest that HNRNPU is required for Xist function, presumably by stabilizing its association with chromatin [83–89], and XRN2 has been proposed to promote H3K27me3 accumulation on autosomes [34,90,91]. An additional factor of interest for us was YTHDC1: although in the protein association networks we analyzed, YTHDC1 was not obviously positioned within or near the PRC1 communities, it nevertheless exhibited highly enriched association with Airn, Kcnq1ot1, and Xist (Fig 2A and 2B), has been linked to gene silencing by Xist, and has been proposed to interact with PRC2 [92–94]. PRC1 and PRC2 recruitment are tightly coupled within the Airn, Kcnq1ot1, and Xist target domains, and data suggest that PRC2 recruitment in these regions is dependent on PRC1-directed chromatin modifications [16,24,25]. Thus, it is conceivable that factors that have been proposed to promote PRC2 activity elsewhere in the genome, such as XRN2 and YTHDC1, may do the same in the regions targeted for repression by Airn and Kcnq1ot1. We sought to investigate that hypothesis below.

To determine whether HNRNPU, XRN2, or YTHDC1 were required for repression by Airn and Kcnq1ot1, we depleted each protein in TSCs using doxycycline-inducible CRISPR-Cas9 technology [56]. As a positive control, we depleted the PRC1-bridging factor HNRNPK [4,15]. As a negative control, we delivered a non-targeting (NTG) sgRNA along with the same doxycycline-inducible Cas9 construct [16]. First comparing HNRNPU to our positive control HNRNPK, we observed that levels of both proteins were substantially depleted following four days of Cas9 induction (“(+) dox”; Fig 4A and 4B). Next, to examine whether PRC1- and PRC2-directed modifications in the Airn-, Kcnq1ot1-, and Xist-target regions were altered by HNRNPK and HNRNPU depletion, we performed ChIP-seq for H2AK119ub and H3K27me3 (PRC1- and PRC2-directed chromatin modifications, respectively), in (-) and (+) dox TSCs, and analyzed SNP-overlapping reads. These analyses enabled us to distinguish effects on paternally-inherited “B6” alleles, on which the lncRNAs are expressed, versus maternally-inherited “CAST” alleles, on which the lncRNAs are silent [4,16,54].

Fig 4. HNRNPU is required for maintenance of PRC-directed modifications induced by Airn and to a lesser degree, by Kcnq1ot1.

(A, B) Western blots of HNRNPK, HNRNPU, and H3 after four days of doxycycline treatment (to induce Cas9) in TSCs expressing sgRNAs against the indicated genes. (C, D) Allelic H2AK119ub and H3K27me3 levels in (-) vs. (+) doxycycline conditions in TSCs expressing sgRNAs targeting HNRNPK (C), HNRNPU (D), over Airn (i, ii), Kcnq1ot1 (iii, iv) domains. Tiling density (left, averaged data) and box-and-whisker plots (right, data by replicate) of RPM-normalized ChIP-seq signal per 10 kb bin, normalized by number of informative SNPs per bin, as in [16]. Δvalues, %-fold change of median [B6 minus CAST] RPM-normalized allelic read counts per bin between (-) and (+) dox conditions within Airn and Kcnq1ot1 target domains (Airn domain: chr17, 3 Mb to 24 Mb; Kcnq1ot1 domain: chr7, 142 Mb to 144.4 Mb; both mm10). ****: p ≤ 0.0001; Student’s t-test comparing (-) vs. (+). See also S6-S8 Figs and S6 Table.

https://doi.org/10.1371/journal.pgen.1012215.g004

As expected, in cells expressing NTG sgRNA, H2AK119ub and H3K27me3 were unchanged or, in the case of H2AK119ub over the inactive X-chromosome, modestly changed (by ~3.3%; S6 Fig). Also as expected, in cells depleted of HNRNPK, H2AK119ub and H3K27me3 were significantly reduced in the Airn, Kcnq1ot1, and Xist target regions (Figs 4C and S7). Strikingly, depletion of HNRNPU significantly reduced H2AK119ub and H3K27me3 in the Airn, Kcnq1ot1, and Xist target regions, and in the Airn target domain, the changes exceeded those observed in the Kcnq1ot1 target domain (Figs 4D and S7).

In contrast, depletion of XRN2 or YTHDC1 did not reduce H2AK119ub or H3K27me3 in the Airn or Kcnq1ot1 target regions (S8 Fig). Over the inactive X-chromosome, depletion of XRN2 led to minimal but statistically significant changes, and depletion of YTHDC1 led to significant but more moderate changes than those observed after depletion of HNRNPK (S7 and S8 Figs).

We conclude that HNRNPU is required to maintain PRC-directed histone modifications not only over the X-chromosome [86], but also in genomic regions repressed by Airn and to a lesser degree, by Kcnq1ot1. XRN2 and YTHDC1 did not exhibit evidence of being required, possibly because those two proteins are not required for PRC recruitment by the lncRNAs, or because the sgRNAs used did not deplete XRN2 and YTHDC1 to levels needed to observe a requirement.

HNRNPU and HNRNPK depletions cause low-level derepression of genes in Airn and Xist but not Kcnq1ot1 target domains

To determine whether HNRNPU and HNRNPK are required for gene repression by Airn, Kcnq1ot1, and Xist, we performed RNA-seq in four biological replicates, comparing TSCs with and without doxycycline addition in the same HNRNPU and HNRNPK sgRNA-expressing cells used in Fig 4 above. Then, to examine the extent to which the paternal expression biases of their target genes were altered by HNRNPU and HNRNPK depletion, we performed allele-specific analyses of RNA-seq data (all three lncRNAs are expressed from paternal alleles in TSCs) [4,54,55]. We restricted our analyses to genes previously shown to be repressed by Airn and Kcnq1ot1 in TSCs [4], and in the case of Xist, to genes that are subject to, weakly escape, or strongly escape X-chromosome inactivation. We operationally defined these classes from our RNA-seq data as genes whose overall allelic expression values from the inactive X chromosome were less than 1% (inactivated), between 1 and 10% (weakly escaping X-chromosome inactivation), and above 10% (strongly escaping X-chromosome inactivation).
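The operational classes defined above amount to thresholding the fraction of allelic reads that derive from the inactive X, which in TSCs is the paternal (B6) allele. A minimal sketch follows; the function names are hypothetical, and the handling of values falling exactly on 1% or 10% is an assumption, since the text does not say which class the boundaries belong to.

```python
def paternal_fraction(b6_reads, cast_reads):
    """Fraction of allele-informative reads from the paternal (B6) allele."""
    total = b6_reads + cast_reads
    if total == 0:
        raise ValueError("no allele-informative reads for this gene")
    return b6_reads / total

def escape_class(xi_fraction):
    """Classify an X-linked gene by its expression from the inactive X.

    Boundary handling (exactly 1% or 10% counted as weak escape) is assumed.
    """
    if xi_fraction < 0.01:
        return "inactivated"
    if xi_fraction <= 0.10:
        return "weakly escaping"
    return "strongly escaping"

# Hypothetical read counts (B6, CAST) for three genes.
toy_classes = [escape_class(paternal_fraction(b6, cast))
               for b6, cast in [(1, 999), (5, 95), (30, 70)]]
```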

Depletion of HNRNPU led to a modest but significant de-repression of Airn and Xist target genes of all classes but did not affect repression by Kcnq1ot1 (Figs 5A-5C; Wilcoxon rank sum). We next examined changes induced by depletion of HNRNPK and HNRNPU on a gene-by-gene basis. To detect significant differences, we arcsine-transformed paternal expression ratios [4], and subsequently used Wilcoxon rank sum test to identify genes whose paternal expression ratios were significantly increased upon depletion of HNRNPK or HNRNPU. Examining the X chromosome, we found 47 genes whose paternal expression ratios increased significantly upon depletion of HNRNPK or HNRNPU (i.e., “de-repressed”; p < 0.05 Wilcoxon rank sum). Five genes, Utp14a, Flna, 5530601H04Rik, Kdm5c, and Zrsr2, were significantly de-repressed upon depletion of both HNRNPK and HNRNPU; 20 genes were de-repressed at the p < 0.05 level upon depletion of HNRNPK; and 22 genes were de-repressed at the p < 0.05 level upon depletion of HNRNPU (Fig 5D and S5 Table). In certain cases, such as Kdm6a or Eif2sx3, paternal expression ratios were significantly de-repressed upon depletion of one protein and trending towards significant de-repression upon depletion of the other (Fig 5D and S5 Table). In other cases, such as Ogt and Bclaf3, paternal expression ratios were significantly de-repressed upon depletion of one protein and exhibited little evidence of change upon depletion of the other (Fig 5D and S5 Table). Upon depletion of both HNRNPK and HNRNPU, a significantly higher number of strongly escaping genes were de-repressed than would have been expected by chance (p < 0.001; Fisher’s exact). Considering the expression of all significantly de-repressed genes, the average increase in percent of expression derived from the paternal allele was four and six percent upon depletion of HNRNPK and HNRNPU, respectively (Fig 5D and S5 Table).

Fig 5. HNRNPU and HNRNPK depletions cause low-level derepression of genes in Airn and Xist but not Kcnq1ot1 target domains.

(A, B, C) Average %_paternal_expression [(B6 reads)/(B6 reads plus CAST reads)] by gene within the Airn (A), Kcnq1ot1 (B), Xist (C) target regions in (-), (+) doxycycline conditions. Dots, average %_paternal_expression calculated from independent replicates of sgRNAs used in Fig 4A and 4B. *, **, ****: p ≤ 0.05, 0.01, 0.0001, respectively; one-sided Wilcoxon rank sum. For HNRNPK depletions, three biological replicates of RNA-seq were performed using sgRNAs3 + 4 and a single replicate was performed with sgRNA1; data were then analyzed together for statistical analyses. For HNRNPU depletions, three biological replicates of RNA-seq were performed using sgRNA5 and a single replicate was performed with sgRNAs1 + 3; data were then analyzed together for statistical analyses. (D) %_paternal_expression scaled by row for genes whose %_paternal_expression significantly increases upon depletion of HNRNPK (gene names in cyan), HNRNPU (gene names in orange), or both proteins (gene names in black). Significance was determined at the p < 0.05 level using Wilcoxon rank sum test. As a frame of reference, %_paternal_expression is also shown for Airn and Xist (gene names in grey; neither lncRNA exhibited a significant change upon HNRNPK or HNRNPU depletion). Genes are ordered by their genomic location, starting from the centromere at the top of each heatmap. (E) Average %-fold change of median [B6 minus CAST] RPM-normalized allelic read counts per bin between (-) and (+) dox conditions within lncRNA target domains (averaged Δvalues from Figs 4 and S7). For (A-D), gene dots or gene names are colored by whether the %_paternal_expression was significantly increased upon depletion of HNRNPK, HNRNPU, or both proteins. See also S5 Table.

https://doi.org/10.1371/journal.pgen.1012215.g005

In the Airn target domain, we found 11 genes whose paternal expression ratios increased significantly upon depletion of HNRNPK or HNRNPU (i.e., “de-repressed”; p < 0.05 Wilcoxon rank sum). Two genes, Rps6ka2 and Qk, were significantly de-repressed upon depletion of both HNRNPK and HNRNPU; three genes were significantly de-repressed upon depletion of HNRNPK only; and six genes were significantly de-repressed upon depletion of HNRNPU only (Fig 5D and S5 Table). Considering all significantly de-repressed genes in the Airn target domain, the average increase in percent of expression derived from the paternal allele was eight and seven percent upon depletion of HNRNPK and HNRNPU, respectively (Fig 5D and S5 Table). Connecting these results to a previous study, of the 14 genes whose paternal bias was significantly altered upon truncation of Airn in TSCs [4], nine were detected with high enough allelic coverage to be included in our analyses here; and of these nine genes, four exhibited significant levels of de-repression upon depletion of HNRNPK or HNRNPU or both (Qk, Arid1b, Map3k4, and Dact2), and five did not (Pde10a, Slc22a3, Igf2r, Mas1, and Tcp1). No genes were significantly de-repressed in the Kcnq1ot1 target domain upon depletion of HNRNPK or HNRNPU.

Thus, in TSCs, HNRNPK and HNRNPU are required for maintenance of wild-type levels of gene repression mediated by Airn and Xist, but not Kcnq1ot1. Partially overlapping sets of genes were de-repressed upon depletion of HNRNPK and HNRNPU in both the Airn and Xist target domains. The de-repression observed was weak, with the average level ranging from four to eight percent. No genes were significantly de-repressed within the Kcnq1ot1 target domain. These trends largely mirrored the changes in PRC-directed modifications observed upon knockdown of HNRNPU and HNRNPK, which could be classified as partial reductions that were generally larger in magnitude in the Airn and Xist than the Kcnq1ot1 target domains (Fig 5E).

HNRNPK but not HNRNPU depletion reduces associations between PRC1 and Airn, Kcnq1ot1, and Xist

HNRNPK is required to promote accumulation of PRC1- and PRC2-directed modifications within the Airn, Kcnq1ot1, and Xist target domains, presumably by bridging association between PRC1 and regions within each lncRNA [4,15,36]. We sought to determine whether the requirement for HNRNPU in maintenance of PRC-directed modifications in each lncRNA’s target region could also be explained by reduced associations with PRC1. We performed RIP followed by quantitative PCR (RIP-qPCR) and RIP-seq for RING1B from formaldehyde-crosslinked TSCs after HNRNPK and HNRNPU depletion, as well as in non-depleted controls. By RIP-qPCR, after normalizing for levels of each lncRNA in RNA input, we observed that HNRNPK depletion led to significant reductions in RING1B association with peaked regions in Airn and Xist (Fig 6A; [15]). In contrast, HNRNPU depletion did not significantly reduce input-normalized levels of association between RING1B and any lncRNA, although we observed a downward trend in Airn (Fig 6A). Western blots demonstrated that levels of HNRNPK and RING1B were unaltered by HNRNPU depletion (Fig 6B). HNRNPU depletion did not alter patterns of RING1B association in Airn, Kcnq1ot1, or Xist (Fig 6C). Finally, while patterns of association of HNRNPK and HNRNPU over Airn and Kcnq1ot1 were highly correlated, they were less correlated over Xist, where HNRNPK, but not HNRNPU, was enriched in peaks over the same regions that also exhibited peaked association with RING1B (Fig 6C). We conclude that HNRNPU is required for maintenance of PRC-directed modifications induced by Airn, Kcnq1ot1, and Xist via a mechanism at least partly distinct from that of HNRNPK.

Fig 6. HNRNPK but not HNRNPU depletion reduces associations between PRC1 and Airn, Kcnq1ot1, and Xist.

(A) Input-normalized RING1B RIP-qPCR in Airn (i), Kcnq1ot1 (ii), Xist (iii) peaks in HNRNPK-/HNRNPU-depleted and non-depleted controls. Specific sgRNAs used for these experiments are written under the bar graphs. Dots, averaged qPCR technical triplicates from each biological replicate. *, p ≤ 0.05; Student’s t-test. (B) Western blots of HNRNPK, HNRNPU, RING1B, H3. (C) RIP-seq density profiles of Input, HNRNPU, HNRNPK in WT TSCs (data from Figures 1/2) and HNRNPU-depleted TSCs (data from Fig 4A). “peak/RepC”, regions assessed in (A).

https://doi.org/10.1371/journal.pgen.1012215.g006

Divergent effects of HNRNPU depletion on localization and abundance of Airn, Kcnq1ot1, and Xist

HNRNPU promotes association between chromatin and Xist and other nuclear-retained RNAs [83–89,95]. We therefore sought to determine effects of HNRNPU depletion on localization, abundance, and half-lives of Airn, Kcnq1ot1, and Xist. We used single-molecule-sensitivity RNA FISH to examine lncRNA localization in HNRNPU-depleted versus non-depleted controls. Looking first at Xist, after HNRNPU depletion, the primary phenotype observed was a reduced ability to detect RNA FISH signal, consistent with observations from human cells [87,89]. After setting imaging thresholds to visualize the less-intense signal, we did observe dispersal of Xist in HNRNPU-depleted cells, consistent with prior observations (Figs 7A and S9; [83,85–88]). In contrast, while overall sizes of Airn and Kcnq1ot1 RNA FISH foci were qualitatively smaller in HNRNPU-depleted cells, we detected similar numbers of foci for both Airn and Kcnq1ot1 in HNRNPU-depleted and non-depleted cells and did not observe diffusion away from their sites of transcription (Figs 7A and S9). RNA-seq data revealed that HNRNPU and HNRNPK depletion significantly reduced the abundance of Airn and Xist but not Kcnq1ot1 (Fig 7C). RNA-seq read densities and half-lives of each lncRNA were unaffected by either protein’s depletion (Fig 7D and 7E).

Fig 7. Divergent effects of HNRNPU depletion on localization and abundance of Airn, Kcnq1ot1, and Xist.

(A, B) RNA FISH detecting (A) Xist and (B) Airn and Kcnq1ot1 (magenta and yellow, respectively) in HNRNPU-depleted and non-depleted TSCs. Signal thresholds set against non-depleted TSCs (“(-) dox thresh.”) or HNRNPU-depleted (“(+) dox thresh.”). Bar plots, lncRNA foci per nucleus in (-) and (+) dox. ****, p ≤ 0.0001; Chi-squared. (C,D,E) Expression (C), RNA-seq read density (D), and expression quantified by RT-qPCR after flavopiridol (E) of Xist (i), Airn (ii), Kcnq1ot1 (iii) in (-) dox, (+) dox conditions in TSCs expressing HNRNPK-, HNRNPU-targeting sgRNAs. Dots, individual biological replicates. ****, p ≤ 0.0001; Student’s t-test. Location of qPCR primers used in (E) labeled in (D). (F) Representative western blots. See also S9 Fig.

https://doi.org/10.1371/journal.pgen.1012215.g007

Thus, both HNRNPU and HNRNPK are required to maintain wild-type levels of Airn and Xist without altering their stability and are not required to maintain levels or stability of Kcnq1ot1. Moreover, HNRNPU depletion does not lead to obvious dispersal of Airn or Kcnq1ot1 from their sites of transcription. Therefore, although HNRNPU is required for the accumulation of PRC-directed modifications in genomic regions targeted by Airn, Kcnq1ot1, and Xist, there are divergent effects of its depletion on localization and abundance of each lncRNA.

Discussion

Examining 27 proteins previously shown to associate with Xist, we compared enrichments and patterns of protein association across Airn, Kcnq1ot1, and Xist, evolutionarily unrelated lncRNAs that serve as models to understand the regulatory functions of protein-RNA interactions in the nucleus. Despite differing linear sequence, we identified significant enrichments and networked relationships for several proteins across all three lncRNAs and within other chromatin-enriched transcripts, providing perspectives on protein interactions that may coincide with repressive function in RNA. Our work also provides insights into chromatin regulation by HNRNPU, an essential protein whose mutation causes neurodevelopmental disorders [41–45].

Central to our study was a formaldehyde-based RIP protocol that employs sonication and IP washes borrowed from ChIP. This protocol has been used in prior studies [4,15,57–59,96,97] and was recently outlined in depth [46]. Relative to native RIP protocols, our protocol differs in its use of formaldehyde, its sonication prior to IP, and its wash conditions [46,60]. Relative to the formaldehyde-linked RIP protocol upon which ours was originally based (fRIP; [98]), our protocol differs in its manner of crosslinking, its wash conditions, and its addition of ERCC Spike-In controls prior to library preparation [46,49]. Relative to CLIP and the denaturing-based CLAP, our protocol differs in its manner of crosslinking, post-IP washing, and library preparation [37].

However, in terms of experimental outcomes, it has remained unclear how most IP-based enrichment strategies directly compare. The benchmarking we report here against published CLIP and CLAP experiments [37] underscores the utility of our formaldehyde-based protocol and highlights strengths that are complementary to CLIP and CLAP. Using a standardized set of analysis procedures to compare all three approaches, we found that our RIP protocol returned higher signal-under-peak values, as well as signal-to-non-specific-signal and post-lysis reassociation ratios that were largely equivalent with CLIP and CLAP. While CLAP recovered less non-specific signal than CLIP or our version of RIP, as expected, CLAP still recovered non-specific signal and returned measurable levels of post-lysis reassociation (e.g., over Xist/XIST). In some instances, our version of RIP appeared to outperform CLIP and CLAP (e.g., post-lysis reassociation of HNRNPU RIP peaks vs. human CLIP and CLAP peaks; Fig 1H). While some differences may be epitope-tag related (3xFLAG for RIP versus Halo-V5 for CLIP/CLAP), the overall similarity in outcomes provides evidence that all three approaches can be used to study RNA-protein associations in vivo. This conclusion is supported by prior studies which demonstrated that our protocol exhibits robust and knockout-dependent recovery of RNA associated with endogenous protein targets [57–59]. Likewise, using our same RIP protocol, we arrived at conclusions similar to those made by Guo and colleagues using CLIP and CLAP and by a third study that developed an RNA-editing strategy to identify RNA-protein associations in situ, which collectively observed that direct interactions between RNA and PRC2, if they occur in vivo, are likely to be low frequency events that do not often rise above thresholds of noise [15,37,99].

Our protocol offers complementary strengths to CLAP in that it returns higher overall levels of signal, does not require epitope-tagging, and is simpler to execute, relying on a standard RNA-seq workflow. Its inclusion of ERCC Spike-In controls also enables estimates of post-IP RNA recovery and absolute scaling of RIPs relative to non-specific controls or across experimental conditions. Moreover, our approach to identify enriched regions uses an intuitive method that can be applied downstream of any peak-caller and defines peaks as those regions that are enriched above a minimum threshold value in multiple replicates relative to a non-specific control (typically two-fold above IgG, non-specific epitope-tag, or protein knockout in at least two replicates) [46]. Because our protocol uses formaldehyde crosslinking, it recovers both direct and protein-bridged interactions, presumably with minimal RNA sequence bias, again complementing UV- and editing-based approaches designed to recover direct protein-RNA interactions [37,99–101]. In studying mechanisms of protein engagement by lncRNAs such as Xist, Airn, and Kcnq1ot1, which presumably enact functions by assembling multilayered ribonucleoprotein complexes, parallel applications of methods that recover direct versus protein-bridged interactions could prove informative.
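The replicate-based peak definition described above (enrichment of at least two-fold over a non-specific control in at least two replicates) can be illustrated with a minimal Python sketch. It is not the pipeline from [46]; the function name `enriched_regions`, the dictionary layout, and the `control_floor` parameter (which guards against regions with near-zero control signal) are assumptions made only for illustration.

```python
from typing import Dict, List


def enriched_regions(
    ip_signal: Dict[str, List[float]],
    control_signal: Dict[str, float],
    min_fold: float = 2.0,
    min_replicates: int = 2,
    control_floor: float = 1.0,
) -> List[str]:
    """Return regions whose IP signal exceeds the control by `min_fold`
    in at least `min_replicates` replicates.

    ip_signal:      region -> per-replicate IP signal (hypothetical layout)
    control_signal: region -> non-specific control signal (e.g., IgG)
    control_floor:  assumed lower bound applied to the control signal
    """
    kept = []
    for region, replicates in ip_signal.items():
        control = max(control_signal.get(region, 0.0), control_floor)
        threshold = min_fold * control
        n_passing = sum(1 for value in replicates if value >= threshold)
        if n_passing >= min_replicates:
            kept.append(region)
    return kept


# Toy values: only region_a is two-fold over control in two replicates.
ip = {"region_a": [10.0, 12.0, 1.0], "region_b": [3.0, 1.0, 1.0], "region_c": [5.0, 5.0, 5.0]}
igg = {"region_a": 4.0, "region_b": 1.0, "region_c": 3.0}
peaks = enriched_regions(ip, igg)
```

Because the filter only needs per-region signal values, it can in principle be applied to candidate regions produced by any peak-caller, in line with the description above.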

Examining our data, we found that the spatial patterns of protein associations within Airn, Kcnq1ot1, and Xist were more similar to each other than most other chromatin-enriched transcripts (Fig 3C), supporting the notion that aspects of mechanism or biogenesis are shared among all three lncRNAs. The similarities were particularly striking between Airn and Kcnq1ot1, which across a 27-antibody panel, exhibited IgG-corrected protein association patterns that were more similar to each other than to ~19,000 other chromatin-enriched transcripts. The absolute levels of RING1B and RYBP detected in association with Airn and Kcnq1ot1 surpassed that which was observed in essentially all other chromatin-enriched transcripts save Xist, including surpassing the levels of association across a small cohort of known regulatory lncRNAs: Neat1, Malat1, Pvt1, Meg3, and Tsix. At the same time, there were differences between the lncRNAs. Namely, Airn and Kcnq1ot1 exhibited enrichments for proteins that were not enriched over Xist and vice versa. More strikingly, across our 27-antibody panel, modularity of protein association correlated with the genomic range over which Airn, Kcnq1ot1, and Xist recruit PRCs and repress genes. In TSCs, Xist exerts repression over ~165 Mb, Airn over ~15 Mb, and Kcnq1ot1 over ~3 Mb [4,16]. By several metrics, Xist partitioned protein associations to a greater degree than Airn, which partitioned to a greater degree than Kcnq1ot1. While some partitioning in Xist may derive from our selection of Xist-associated proteins for RIPs, we favor a model in which the partitioning directly reflects the local organization of RNA–protein assemblies across each lncRNA. Specifically, a higher degree of partitioning in Xist and Airn versus Kcnq1ot1 may be due to the presence of a greater number of domains in the former lncRNAs that coordinate interactions with multiple proteins simultaneously.
Indeed, tandem repeat domains in Xist are known protein interaction hubs [27], and Airn and Kcnq1ot1 are expressed at equivalent levels and have equivalent half-lives in TSCs [4], supporting the notion that the different modularity in the latter two lncRNAs is driven by RNA sequence and not by expression level.

Collectively, the RNA-protein associations we detected by RIP likely represent a combination of direct and protein-bridged interactions, some degree of non-specific signal and, in the case of Xist and presumably other RNAs harboring ultra-high-affinity RBP binding sites, post-lysis reassociations which may or may not reflect those that occur in vivo. Future work is needed to distinguish between these possibilities and determine the mechanisms and significance of protein engagement by Airn, Kcnq1ot1, and Xist.

To that end, based on its enriched association with Airn, Kcnq1ot1, and Xist, its association patterns that correlated with the PRC1 components RING1B and RYBP, and its requirement for Xist function [85–87], we examined the role of HNRNPU in repression by Airn and Kcnq1ot1. We observed that HNRNPU was required to maintain long-range PRC1- and PRC2-directed modifications induced by all three lncRNAs, although Xist and Airn exhibited a greater dependence on HNRNPU than Kcnq1ot1. We also found that HNRNPU (and HNRNPK) were required to maintain wild-type levels of gene repression by Airn and Xist but not Kcnq1ot1. The reductions in gene repression were weak compared to reductions in PRC-directed modifications, leading us to speculate that the former arises from the latter.

Reductions in PRC-directed modifications upon HNRNPU depletion occurred without an apparent loss of association between PRC1 and Airn, Kcnq1ot1, or Xist, suggesting that unlike HNRNPK, HNRNPU does not bridge the lncRNAs with PRC1 [15]. We likewise observed no role for HNRNPU in tethering Airn or Kcnq1ot1 to chromatin, as has been observed for Xist [85–87]. Considered together, our data provide new perspectives on the role of HNRNPU in long-range gene regulation mediated by lncRNAs. HNRNPU is thought to play an architectural role in the nucleus, helping to keep transcribed regions accessible by forming what has been proposed to be a mesh-like network in concert with chromatin-associated RNAs [83–89,95]. Thus, rather than or in addition to tethering lncRNAs to chromatin, HNRNPU may promote lncRNA-induced, long-range PRC-directed modifications by additional mechanisms. On the one hand, we found that HNRNPU is required to maintain normal levels of the Airn and Xist lncRNAs. Prior data have linked the abundance of Airn and Xist to their effects on chromatin [4,15], and it stands to reason that at least some if not all of the reductions in PRC-directed modifications observed upon depletion of HNRNPU are due to reduced abundance of the two lncRNAs. At the same time, we found that HNRNPU depletion did not reduce the half-life of Airn or Xist. Thus, the data suggest that HNRNPU promotes the abundance of Airn and Xist at the level of their transcription, perhaps by protecting the lncRNAs from repression by the HUSH complex [102], or by maintaining chromatin accessibility around the lncRNA genes in what is otherwise likely to be a naturally repressive genomic environment [83–86,88,95]. In an interesting contrast, HNRNPU depletion did not reduce the levels of Kcnq1ot1 and had a more modest effect on PRC-directed modifications in the Kcnq1ot1 target domain. However, by expression-normalized ranking, Kcnq1ot1 and Airn associated with HNRNPU to equivalent degrees (Fig 2A).
Thus, the difference in phenotype cannot be explained by a differential interaction between HNRNPU and the Airn and Kcnq1ot1 lncRNAs. We speculate instead that HNRNPU’s lack of requirement in maintaining Kcnq1ot1 levels may be because the protein plays less of an architectural role in rendering the Kcnq1ot1 locus permissive to expression of the lncRNA.

Along those same lines, it is also possible that HNRNPU may promote the propagation of PRC-directed chromatin modifications over large genomic intervals through its architectural role, by maintaining distal regions of chromatin accessible to cis-acting lncRNA-ribonucleoprotein assemblies. In addition to chromatin tethering, an architectural role of HNRNPU may help Xist to access chromatin across the inactive X-chromosome [83–86,88,95], and may likewise account for the differing effects of HNRNPU depletion in the 15 Mb Airn versus the 3 Mb Kcnq1ot1 target domain; the larger the genomic interval targeted by a lncRNA, the more sensitive it may be to disruptions in chromatin architecture. Additional research is needed to investigate these models and their potential relation to HNRNPU-mediated gene regulation in health and disease [41–45].

Lastly, our study provides perspectives on the possible spectrum of chromatin-enriched RNAs that might regulate genes or chromatin in cis. Xist has served as a model to understand mechanisms of RNA-mediated gene silencing since its discovery [103,104]. However, relative to other chromatin-enriched RNAs, the genomic range over which Xist functions, its nuclear abundance, and its stability are exceptional (Fig 2A; [4,15]). For those reasons, we might expect other cis-regulatory RNAs to share more features with the somewhat less exceptional Airn and Kcnq1ot1. In that regard, our observation that protein association profiles of Airn and Kcnq1ot1 most frequently resembled those of nascent transcripts produced from protein-coding genes may be important. Splicing can occur co-transcriptionally and often leads to rapid export of processed RNA [105]. However, post-transcriptional splicing and delayed export after splicing are common [105–114]. Thus, in addition to lncRNAs, RNAs produced from protein-coding genes might carry out lncRNA-like regulation prior to clearance from chromatin, and much of this regulation might be mediated by introns, given their aggregate length relative to exons. Indeed, the TTN pre-mRNA forms an architectural hub in the nucleus that regulates gene expression at multiple levels, providing precedent for this model [115].

Limitations of the study

Many RIPs in this study relied on antibodies raised against endogenous proteins, raising the possibility that non-specific antibody binding confounds interpretation of specific protein-RNA associations. Our sense is that this limitation is most relevant to low-affinity, weakly-enriched target sites, where non-specific binding is more likely to rival specific binding in affinity. Most antibodies from our work have been validated in different ways over many years, including by us in prior works and above, providing measures of confidence in their specificity. Still, orthogonal experiments, including epitope-tagging or genetic validation, are important for in-depth study of any RNA-protein interaction, especially those hovering near thresholds of specific over non-specific signal [15,37,59].

Another limitation is an inability to know with absolute certainty whether an interaction detected by RIP, CLIP, or CLAP occurs in vivo or purely in extracts, due to post-lysis reassociation. Intriguingly, we observed that across RIP, CLIP, and CLAP peaks, IP- and species-specific signal ratios were correlated, suggesting that post-lysis reassociation can occur owing to high on-target affinity (Fig 1 and S2 Table). In our view, high-affinity regions are more likely to reflect bona fide in vivo RNA-protein binding events than not, particularly when RNA and cognate RBP are present in the same cellular compartment (e.g., Xist, HNRNPK, and HNRNPU), and when IP regimes involve high quality antibodies, robust post-IP washes, and high dynamic range.

Methods

TSC culture

The C/B TSC line used in this study was derived in [54]. TSCs were cultured as in [116]. Briefly, TSCs were cultured on gelatin-coated, pre-plated irradiated mouse embryonic fibroblast (irMEF) feeder cells in TSC media (RPMI [Gibco 11875093], 20% qualified FBS [Gibco 26140079], 0.1 mM penicillin-streptomycin [Gibco 15140122], 1 mM sodium pyruvate [Gibco 11360070], 2 mM L-glutamine [Gibco 25030081], 100 μM β-mercaptoethanol [Sigma-Aldrich 63689]) supplemented with 25 ng/mL FGF4 (Gibco PHG0154) and 1 μg/mL Heparin (Sigma-Aldrich H3149) just before use, at 37°C in a humidified incubator at 5% CO2. At passage, TSCs were trypsinized with 0.125% trypsin-EDTA in PBS solution (Gibco 25200–072) for ~4 min at room temperature and gently dislodged from the plate with a sterile, cotton-plugged Pasteur pipette. To deplete irMEFs from TSCs prior to all harvests, TSCs were pre-plated for 45 min at 37°C on a nongelatinized plate, which was then rinsed twice with a serological pipette, and the solution was transferred to a fresh culture plate. MEF-depleted TSCs were then cultured for three to four days in 70% irMEF-conditioned TSC media supplemented with growth factors prior to harvesting for molecular or genomic assays.

ESC culture

B6/CAST F1-hybrid Rosa26-RMCE ESCs from [58] were grown on gelatin-coated plastic dishes in a humidified incubator at 37°C and under 5% CO2. Cells were grown in DMEM high glucose plus sodium pyruvate (Gibco 11995–065), 15% ESC-qualified fetal bovine serum (Gibco 26140–079), 0.1mM non-essential amino acids (Gibco 11140–050), 100 U/mL penicillin-streptomycin (Gibco 15140–122), 2 mM L-glutamine (Gibco 25030–081), 0.1 mM β-mercaptoethanol (Sigma-Aldrich 63689), and 1:500 LIF conditioned media produced from Lif-1Cα (COS) cells.

HEK293T cell culture

HEK293T human embryonic kidney cells were grown in DMEM (Gibco) with 10% Fetal Bovine Serum (Gibco), 1% Pen/Strep (Gibco), and 1% L-glutamine (Gibco). Cells were maintained in incubators set at 37°C and 5% CO2. Media was replaced every 3 days.

Generation of Cas9/sgRNA-expressing TSCs

sgRNAs targeting HNRNPK, HNRNPU, XRN2, and YTHDC1 were designed using the Benchling CRISPR Guide RNA Design tool then cloned into BbsI-digested rtTA-BsmbI piggyBac vector (Addgene plasmid #126028; [57]) using NEB Quick Ligase (NEB M2200S). HNRNPK sgRNAs are from [4]. The NTG sgRNA is from [16]. Plasmids were purified from High-Efficiency 5-alpha competent cells (NEB C2987H) using the PureLink HiPure Plasmid Midiprep Kit (Invitrogen K210004). DNA was quantified via Qubit 2.0 fluorometer (dsDNA broad-range kit, Thermo Fisher Q32853). Cloned sgRNA plasmids are deposited in Addgene. Individual or pooled sgRNA vectors were mixed with the TRE-Cas9-Cargo vector (Addgene plasmid #126029; [57]) and the piggyBac transposase vector from [117] at a 1:1:1 ratio in a total of 10 μL. The 10-μL (2.5 μg) plasmid mixtures were mixed with 1 million WT TSCs suspended in 100 μL Neon Buffer R (Invitrogen MPK10025). TSCs were then electroporated using a Neon Transfection System (Invitrogen) with two 30-ms pulses of 950 V (program 5) before seeding onto one well of a 6-well plate of gamma-irradiated drug-resistant (DR4) irMEF feeder cells (ATCC SCRC-1045) in TSC growth medium lacking penicillin-streptomycin. Media was changed the following morning with 70% conditioned media and growth factors. Starting 24 hours after electroporation, cells were selected with G418 (200 μg/mL; GIBCO) and Hygromycin B (150 μg/mL; GIBCO) for 9 days. TSCs were split every 2–3 days, and media was changed daily with 70% conditioned media, growth factors and antibiotics. Following 9 days of selection, TSCs were then split onto irMEFs with standard TSM and growth factors for expansion.

Western blots

TSCs were harvested by washing twice with 3 mL ice-cold PBS, scraping into 1 mL ice-cold PBS, and transferring to 1.7-mL tubes. Following centrifugation (1200 × g, 10 min, 4°C), supernatants were removed and cell pellets stored at -80°C. Cell pellets were thawed on ice and lysed by resuspension in 75–150 μL ice-cold RIPA buffer (10 mM Tris-HCl pH 7.5, 140 mM NaCl, 1 mM EDTA, 0.5 mM EGTA, 1% [v/v] IGEPAL CA-630, 0.1% [w/v] sodium deoxycholate, 0.1% [w/v] SDS) supplemented with fresh 1 mM PMSF and 1:100 protease inhibitor cocktail (Sigma P8340-5ML). Cells were thoroughly resuspended using 10 up-down strokes with a P1000 pipette, incubated with rocking for 30 minutes at 4°C, then sonicated on ice using a Sonics Vibracell VCX130 probe-tip sonicator at 30% amplitude (two 10-second pulses with 1-minute rest on ice). Cell debris was removed by centrifugation (16,100 × g, 15 min, 4°C), and cleared lysates were transferred to pre-chilled tubes. Protein concentrations were determined using the Bio-Rad DC Protein Assay (Bio-Rad 5000112) and a standard curve of BSA.

Lysates were diluted 3:1 with 4 × Laemmli sample buffer (Bio-Rad 1610747) containing 10% (v/v) β-mercaptoethanol. Equal amounts of protein (20 μg) were loaded onto pre-cast polyacrylamide SDS-PAGE gels (4–20% polyacrylamide, Bio-Rad) selected based on target protein molecular weights and run in Tris-glycine-SDS running buffer (2.5 mM Tris, 19.2 mM glycine, 0.01% [w/v] SDS, pH 8.3) at 130 V at room temperature. Proteins were transferred to methanol-activated Immobilon-P PVDF membranes (Millipore Sigma IPVH00010), in Tris-glycine-methanol-SDS transfer buffer (2.5 mM Tris, 19.2 mM glycine, 20% [v/v] methanol, 0.001% [w/v] SDS, pH 8.3) at 4°C, either for 1 hour at 100 V or overnight at 20 V. Membranes were blocked for 30 minutes at room temperature with orbital shaking in 1x TBS with 1% Casein blocking buffer (Bio-Rad #1610782). Primary antibodies or antisera were applied at concentrations listed in S6 Table and incubated overnight at 4°C with gentle rocking. After washing (one quick TBS-T rinse followed by three 10-minute washes in TBS-T with orbital shaking), membranes were incubated with HRP-conjugated secondary antibodies (1:20,000 dilution, goat anti-mouse HRP (Thermo Fisher A16072); or 1:100,000 dilution, goat anti-rabbit HRP (Thermo Fisher G-21234); or 1:100,000 dilution, rabbit anti-goat HRP (Thermo Fisher 31402)) for 45 minutes at room temperature. Following a final round of TBS-T washes, protein bands were visualized using SuperSignal West Femto ECL substrate (Thermo Fisher 34094) and imaged on a Bio-Rad ChemiDoc system in chemiluminescence mode. Western blots assessing the extent of depletion of protein targets were performed for each (-) and (+) dox replicate in this study.

RIP-seq

See [46] for a line-by-line description of the RIP protocol used in this study, along with many of its standard analytical procedures. Prior to RIP, TSCs were grown to 75–85% confluency for one passage off irMEFs, trypsinized, and counted. TSCs were washed twice in cold 1X PBS and then rotated for 30 min in 10 mL of 0.3% formaldehyde (1 mL 16% methanol-free formaldehyde (Pierce, #28906) in 49 mL 1X PBS) at 4°C. Formaldehyde was quenched with 1.2 mL of 2 M glycine for 5 min at room temperature with rotation. TSCs were washed 3x in cold 1xPBS, then resuspended in 1xPBS at 10 million cells per mL, aliquoted to 10 million cells per 1.7-mL tube and spun down. PBS was aspirated, and the pellets were snap frozen in a dry ice methanol bath and immediately transferred to -80°C. RIPs were performed as in [15]. Protein A/G agarose beads (25 μL; Santa Cruz sc-2003) were washed three times in blocking buffer (0.5% BSA in 1xPBS). The washed beads were then resuspended in 300 μl of blocking buffer, and 10 µg/µl of antibody or antisera was added (see S6 Table for exceptions). The beads and antibodies/antisera were then rotated overnight at 4°C. The next day, 10 million cells were resuspended in 500 μL RIPA Buffer (50 mM Tris-HCl, pH 8.0, 1% Triton X-100, 0.5% sodium deoxycholate, 0.1% SDS, 5 mM EDTA, 150 mM KCl) supplemented with 1:100 Protease Inhibitor Cocktail (PIC; Sigma Product #P8340), 2.5 μL SUPERase-In (Thermo Fisher Scientific AM2696) and 0.5 mM DTT and sonicated twice for 30 s on and 1 min off at 30% output using the Sonics Vibracell Sonicator (Model VCX130, Serial# 52223R). Cell lysates were then centrifuged at 15,000 × g for 15 min. Supernatants were transferred to a new tube and diluted 1:2 with 500 μL fRIP Buffer (25 mM Tris-HCl pH 7.5, 5 mM EDTA, 0.5% NP-40, 150 mM KCl) supplemented as above with PIC, SUPERase-In, and DTT. The total lysate (25 μL) was saved for input. Antibody/antisera-bound beads were washed three times with 1 mL fRIP Buffer.
To the washed beads, 500 μl of diluted cell lysate (5 million cell equivalents) was added. The beads and lysate were then rotated overnight at 4°C. The beads were then washed once with 1 mL fRIP Buffer and resuspended in 1 mL PolII ChIP Buffer (50 mM Tris-HCl pH 7.5, 140 mM NaCl, 1 mM EDTA, 1 mM EGTA, 1% Triton X-100, 0.1% sodium deoxycholate, 0.1% sodium dodecyl sulfate) before transferring to a new 1.7-mL tube. Samples were rotated at 4°C for five minutes, spun down at 1200 x g, and the supernatant was aspirated. Samples were washed twice more with 1 mL PolII ChIP Buffer, once with 1 mL High Salt ChIP Buffer (50 mM Tris-HCl pH 7.5, 500 mM NaCl, 1 mM EDTA, 1 mM EGTA, 0.1% sodium deoxycholate, 0.1% sodium dodecylsulfate, 1% Triton X-100), and once in 1 mL LiCl Buffer (20 mM Tris pH 8.0, 1 mM EDTA, 250 mM LiCl, 0.5% NP-40, 0.5% sodium deoxycholate); each wash included a five-minute rotation at 4°C. At the final wash, samples were transferred to a new 1.7-mL tube. After the final wash, inputs were thawed on ice and bead samples were resuspended in 100 μL of 1X reverse crosslinking buffer (1X PBS, 2% N-lauroylsarcosine, 10 mM EDTA) supplemented with 1 μL SUPERase-In, 20 μL Proteinase K, and 0.5 mM DTT. Samples were incubated for 1 h at 42°C, 1 h at 55°C, and 30 min at 65°C, and mixed by pipetting every 15 min. To the bead and buffer mixture, 1 mL TRIzol was added, vortexed, and 200 μL of CHCl3 was added. Samples were vortexed and spun at 12,000 × g for 15 min at 4°C. The aqueous phase was extracted, and one volume of 100% ethanol was added. Samples were vortexed and applied to Zymo-Spin IC Columns (Zymo #R1013) and spun for 30 s at top speed on a benchtop microcentrifuge. Next, 400 μL of RNA Wash Buffer (Zymo #R1013) was added, and the samples were spun at top speed for 30 s. For each sample, 5 μL DNase I and 35 μL of DNA Digestion Buffer (Zymo #R1013) was added directly to the column matrix and incubated at room temperature for 20 min. 
Next, 400 μL of RNA Prep Buffer (Zymo #R1013) was added, and the columns were spun at top speed for 30 s. Next, 700 μL RNA Wash Buffer (Zymo #R1013) was added, and the columns were spun at top speed for 30 s. Next, 400 μL RNA Wash Buffer was added, and the columns were spun at top speed for 30 s. The flow-through was discarded, and the columns were spun again for 2 min to remove all traces of the wash buffer. Columns were transferred to a clean 1.7-mL tube, 15 μL of ddH2O was added to each column, and after a five-minute incubation, samples were spun at top speed to elute. 9 μL from each RIP-seq replicate was mixed with 1 μL of a 1:250 dilution of ERCC RNA spike-in controls (Thermo Fisher 4456740) and subjected to strand-specific RNA-seq using the KAPA RNA HyperPrep Kit with RiboErase (HMR) (Roche 08098140702) and KAPA Unique Dual-Indexed Adapters (Roche 08861919702) or TruSeq adapters. RNA fragmentation was performed at 85°C for 5 min. Libraries were quantified using a Qubit 2.0 fluorometer (dsDNA broad sensitivity kit, Thermo Fisher Q32850), pooled, and sequenced on Illumina NextSeq500 or Illumina NextSeq1000 platforms.
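Because the same quantity of ERCC spike-in RNA is added to each RIP eluate before library preparation, spike-in read counts can serve as a common denominator when comparing samples. The Python sketch below illustrates that idea only; it is not the normalization procedure of [46], and the function names, the dictionary layout, and the toy read counts are assumptions.

```python
from typing import Dict


def per_spike_in(counts: Dict[str, float], ercc_reads: float) -> Dict[str, float]:
    """Express each transcript's read count per ERCC spike-in read.

    Assumes an equal amount of spike-in was added to every sample, so that
    dividing by ERCC reads places samples on a shared scale.
    """
    if ercc_reads <= 0:
        raise ValueError("ERCC read count must be positive")
    return {name: reads / ercc_reads for name, reads in counts.items()}


def enrichment_over_control(ip_scaled: Dict[str, float],
                            control_scaled: Dict[str, float]) -> Dict[str, float]:
    """Ratio of spike-in-scaled IP signal to spike-in-scaled control signal."""
    return {
        name: value / control_scaled[name]
        for name, value in ip_scaled.items()
        if control_scaled.get(name, 0.0) > 0
    }


# Toy numbers, not data from this study.
ip_scaled = per_spike_in({"Xist": 500.0}, ercc_reads=100.0)
igg_scaled = per_spike_in({"Xist": 50.0}, ercc_reads=200.0)
enrichment = enrichment_over_control(ip_scaled, igg_scaled)
```

In this toy case the IgG library contains proportionally more spike-in reads, which reflects lower RNA recovery and raises the apparent enrichment of the IP over a comparison based on raw read counts alone.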

Generation and crosslinking of FLAG-cDNA-expressing ESCs

FLAG-HNRNPU, -HNRNPK, and -GFP were all constructed as part of [58], and are available on Addgene. One day prior to transfection, ESCs generated in [58], which contain a doxycycline-inducible Xist in Rosa26 but have had the adjacent hygromycin B resistance gene removed by Flp recombinase, were seeded at 0.5 × 10^6 cells per well of a six-well plate. The following day, 625 ng of rtTA plasmid, 625 ng of cDNA plasmid, and 1.25 µg of piggyBac transposase from Kirk et al. (2018) (2.5 µg total DNA) were mixed with 5 µL P3000 reagent, 7.5 µL Lipofectamine 3000 reagent, and Opti-MEM media (Gibco #31985–070) to a final volume of 250 µL. The reagents were incubated for 10 min at room temperature before being added to cells with fresh media. After 24 h, cells underwent one week of selection by hygromycin B (50 µg/mL) and G418 (200 μg/mL).

Reassociation RIP-seq experiments

RIPs for the reassociation experiments described in Fig 1 were performed as above, with some exceptions. ESCs containing FLAG-expressing constructs were induced with 1000 ng/mL doxycycline for four days prior to crosslinking. Crosslinking for both ESCs and HEK293Ts was performed as above. 10 μg of FLAG antibody (Sigma F1804) was used for each RIP. Sonicated extracts made from 5 million human cells (293T) and 5 million mouse cells (ESCs) were combined at a 1:1 volumetric ratio for a total of 1 mL lysate (in 1:1 RIPA/fRIP buffer, as above). 50 μL of the 1:1 human-mouse lysate was saved for 5% input. Mixed lysate was added to washed, antibody-conjugated beads and RIP-seq was performed as above. HNRNPK and HNRNPU RIPs in 293T cells were performed as described above, using 5 µg of Rabbit IgG (Invitrogen #02–6102) and 5 µL of the same HNRNPK and HNRNPU monoclonal antibodies used for RIP in TSCs.

ChIP-seq

ChIP-seq experiments examining effects of protein depletion on chromatin modifications were performed after four days of addition of 1000 ng/mL of doxycycline (Sigma D9891) to sgRNA-expressing cells. (-) and (+) dox TSC cultures were crosslinked on the same day for each replicate performed. Prior to crosslinking for ChIP, TSCs were passaged once off irMEFs. Adhered TSCs were crosslinked with 0.6% formaldehyde (Fisher Scientific, cat #: BP531–500) in RPMI media with 10% FBS for 10 min at room temperature, then quenched with 125 mM glycine for 5 min at room temperature. Crosslinked TSCs were then washed twice with ice-cold PBS and scraped with ice-cold PBS containing 0.05% Tween (Fisher Scientific, cat #: EW-88065–31), PMSF, and PIC (Sigma Aldrich, cat #: P8340). TSCs were then spun at 3,000 × g for 5 min at 4°C to remove PBS, followed by resuspension in ice-cold PBS with PIC and PMSF and divided into 5-million cell aliquots. All ChIPs were performed using 5 million cells, 5 µL of antibody, and 30 μL of Protein A/G agarose beads (Santa Cruz, cat #: sc-2003). Antibody-conjugated beads were prepared by incubating antibody with beads in 300 μL Blocking Buffer (PBS, 0.5% BSA [Invitrogen, cat #: AM2616]) overnight at 4°C with rotation. Crosslinked TSCs were resuspended in 1 mL Lysis Buffer 1 (50 mM HEPES pH 7.5, 140 mM NaCl, 1 mM EDTA, 10% glycerol, 0.5% NP-40, 0.25% Triton X-100, PIC) and incubated with rotation for 10 min at 4°C. Cells were then resuspended in 1 mL Lysis Buffer 2 (10 mM Tris-HCl pH 8.0, 200 mM NaCl, 1 mM EDTA, 0.5 mM EGTA, PIC) for 10 min at room temperature. All buffer removal steps were performed with 5-min 1,200 × g spins at 4 °C. 
The extracted nuclei pellet was then resuspended and sonicated in 500 μL Lysis Buffer 3 (10 mM Tris-HCl pH 8.0, 100 mM NaCl, 1 mM EDTA, 0.5 mM EGTA, 0.1% sodium-deoxycholate, 0.5% N-lauroylsarcosine, PIC) using a Vibra-Cell VCX130 (Sonics) with the following parameters: 8–10 cycles of 30% intensity for 30 s with 1 min of rest on ice between cycles to obtain 100–500 bp fragments. Lysates were then spun for 30 min at max speed at 4°C. The detergent compatible protein assay kit (Biorad 5000113) was used along with BSA protein standards to determine protein quantity of lysates, as well as sonicated chromatin from HEK293T cells crosslinked as above with 0.6% formaldehyde. For each ChIP in each sgRNA genotype, equal amounts of TSC lysate by protein content comparing the (-) and (+) dox conditions were retained. 5% of the standardized TSC lysate amount was then removed, and an equivalent quantity of sonicated chromatin from HEK293T cells was added as a spike-in control. Lysate volumes were brought to 500 μl, Triton X-100 was then added to a final concentration of 1%, and 25 µL was removed to serve as input. Lysates were mixed with pre-conjugated antibody bead mixes and incubated overnight at 4°C with rotation. The next day, the beads were washed five times with 1 mL RIPA Buffer (50 mM HEPES pH 7.5, 500 mM LiCl, 1 mM EDTA, 1% NP-40, 0.7% sodium deoxycholate, PIC) and once with 1 mL TE, each for 5 min at 4°C with rotation and spun at 2,000 × g for 2 min for buffer removal. The washed beads were then resuspended in elution buffer (50 mM Tris-HCl pH 8.0, 10 mM EDTA, 1% SDS) and placed on a 65°C heat block for 17 min with frequent vortexing. ChIP DNA was then reverse crosslinked in 0.5% SDS and 100 mM NaCl overnight at 65°C, followed by a 1 h RNaseA (3 μL; Thermo Scientific, cat #: EN0531) treatment at 37°C and a 2.5 h Proteinase K (10 μL; Invitrogen, cat #: 25530015) treatment at 56°C.
DNA was then extracted with 1 volume of phenol:chloroform:isoamyl alcohol (Sigma-Aldrich, cat #: P3803) and precipitated with 2 volumes 100% ethanol, 1/10 volume 3M sodium-acetate pH 5.4, and 1/1000 volume linear acrylamide (Invitrogen, cat #: AM9520) overnight at -20°C. DNA was then pelleted with a 30 min max speed spin at 4°C, washed once with ice-cold 80% ethanol, and resuspended in TE buffer. ChIP-seq libraries were prepared with NEBNext End Repair Module (NEB, cat #: E6050S), A-tailing by Klenow Fragment (3′→5′ exo-; NEB, cat #: M0212S), and TruSeq 6-bp index adaptor ligation by Quick ligase (NEB, cat #: M2200S), and NEBNext High-Fidelity 2X PCR Master Mix (NEB, cat #: M0541S). All DNA size-selection purification steps were performed using AMPure XP beads (Beckman Coulter, cat #: A63880). Single-end, 75- and 100-bp sequencing was performed using Illumina NextSeq 500 and 1000 systems, respectively.
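The spike-in substitution described above (replacing 5% of the standardized TSC lysate with an equivalent quantity of HEK293T chromatin, then bringing the volume to 500 μL) amounts to simple arithmetic, sketched below in Python. The function name, the assumption that quantities are matched by protein mass, and the example concentrations are illustrative and are not taken from the study.

```python
from typing import Dict


def spike_in_volumes(
    standard_ug: float,
    tsc_ug_per_ul: float,
    hek_ug_per_ul: float,
    spike_fraction: float = 0.05,
    final_ul: float = 500.0,
) -> Dict[str, float]:
    """Volumes (in microliters) of TSC lysate, HEK293T chromatin, and
    top-up buffer for one ChIP.

    standard_ug:    standardized protein amount per ChIP (assumed input)
    spike_fraction: fraction of the TSC amount replaced by HEK293T chromatin
    """
    hek_ug = standard_ug * spike_fraction
    tsc_ug = standard_ug - hek_ug
    tsc_ul = tsc_ug / tsc_ug_per_ul
    hek_ul = hek_ug / hek_ug_per_ul
    buffer_ul = final_ul - tsc_ul - hek_ul
    if buffer_ul < 0:
        raise ValueError("lysates exceed the final volume; concentrate or use less")
    return {"tsc_ul": tsc_ul, "hek_ul": hek_ul, "buffer_ul": buffer_ul}


# Hypothetical example: 100 ug standardized amount, TSC lysate at 2 ug/uL,
# HEK293T chromatin at 1 ug/uL.
volumes = spike_in_volumes(100.0, tsc_ug_per_ul=2.0, hek_ug_per_ul=1.0)
```

Running the same calculation for the (-) and (+) dox lysates of a given sgRNA genotype keeps the TSC-to-HEK293T ratio constant between conditions, which is what allows the human reads to act as a scaling reference.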

Total RNA preparation

irMEF feeder-depleted TSCs were washed twice with ice-cold PBS and harvested with 1 mL TRIzol (Thermo Fisher 15596026). Total RNA was prepared via standard TRIzol-chloroform extraction, per manufacturer’s instructions, except for the addition of 4 μL 5 μg/μL linear acrylamide (Thermo Fisher AM9520) to promote precipitation in isopropanol. RNase-free water was added to dried RNA pellets, allowed to incubate overnight at 4°C, and resuspended. RNA concentrations were measured via Qubit 2.0 fluorometer (RNA high sensitivity kit, Thermo Fisher Q32852), and integrity of RNA was confirmed by visualizing rRNA bands on an agarose gel.

Total RNA-seq

9 μL of a 100 ng/μL solution of TSC RNA was mixed with 1 μL of a 1:250 dilution of ERCC RNA spike-in controls (Thermo Fisher 4456740), and subjected to strand-specific RNA-seq using the KAPA RNA HyperPrep Kit with RiboErase (HMR) (Roche 08098140702) and KAPA Unique Dual-Indexed Adapters (Roche 08861919702) or TruSeq adapters. RNA fragmentation was performed at 94°C for 6 min, diluted adapter stocks were 7 μM, and libraries were amplified using 11 cycles of PCR. Libraries were quantified using a Qubit 2.0 fluorometer (dsDNA broad range kit, Thermo Fisher Q32850), pooled, and sequenced on Illumina NextSeq500 or Illumina NextSeq1000 platforms.

Fractionated RNA-seq and definition of chromatin-associated RNAs

The fractionated RNA-seq data analyzed in this study were generated and first reported in [15]. We defined the set of chromatin-enriched transcripts using a combination of total RNA-seq and fractionation RNA-seq datasets previously collected in TSCs and reported in [15]. Transcripts that were expressed above a kallisto-detected threshold of 0.125 TPM in total RNA-seq datasets, and whose ratio of [chromatin RNA-seq]/([chromatin RNA-seq]+[cytoplasmic RNA-seq]) was greater than 0.75, were defined as chromatin-associated [63].
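The two cutoffs above can be sketched as a simple filter. This is an illustrative sketch only, not the code used in the study: the function name and argument names are hypothetical, and the handling of transcripts with no signal in either fraction is an assumption.

```python
def is_chromatin_associated(total_tpm, chromatin, cytoplasmic,
                            min_tpm=0.125, min_ratio=0.75):
    """Apply the expression cutoff and the chromatin-ratio cutoff to one transcript.

    total_tpm   : TPM in total RNA-seq
    chromatin   : abundance in the chromatin fraction
    cytoplasmic : abundance in the cytoplasmic fraction
    """
    if total_tpm <= min_tpm:
        return False
    denominator = chromatin + cytoplasmic
    if denominator == 0:
        # Assumption: a transcript with no signal in either fraction is not classified.
        return False
    return chromatin / denominator > min_ratio


# A transcript with 90% of its fractionated signal on chromatin passes;
# one at exactly 75% does not, because the cutoff is strict.
print(is_chromatin_associated(1.0, 9.0, 1.0))
print(is_chromatin_associated(1.0, 3.0, 1.0))
```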

RIP-qPCR

RIP-qPCR was performed as described above for RIP-seq, with some exceptions. Crosslinking was performed as described above for RIP-seq. Protein A/G agarose beads were prepared as described, except that all bead centrifugation steps in this RIP-qPCR protocol were 2000 × g for 1 min; 5 μL of 219 ng/μL anti-RING1B antibody (Cell Signaling 5694S) was used for each RIP. 10-million-cell pellets were thawed on ice, resuspended in 508 μL RIPA Buffer containing PIC, SUPERase-In, and DTT, and sonicated and centrifuged as described. A total of 510 μL of cleared lysate was transferred to new tubes on ice, to which 510 μL fRIP Buffer containing PIC, SUPERase-In, and DTT was added and mixed well. 25 μL of this mixture was removed and stored at -20°C for later processing as a 5% input sample. Antibody-bound beads were washed twice with 1 mL fRIP Buffer before adding 490 μL lysate mixture (~5 million cell equivalents) and 490 μL of a 1:1 mixture of RIPA Buffer and fRIP Buffer containing PIC, SUPERase-In, and DTT. Overnight bead-lysate incubation and wash steps were performed as previously described. The input-sample Reverse Crosslinking Buffer was prepared on ice by mixing, for each sample, 33 μL 3X reverse crosslinking buffer (3x PBS, 6% [w/v] N-lauroylsarcosine, 30 mM EDTA), 41 μL RNase-free water, 5 μL 100 mM DTT, 1 μL 20 U/μL SUPERase-In, and 10 μL 20 mg/mL proteinase K (Thermo Fisher 25530049). Bead-sample Reverse Crosslinking Buffer was prepared on ice by mixing, for each sample, 33 μL 3X reverse crosslinking buffer (3x PBS, 6% [w/v] N-lauroylsarcosine, 30 mM EDTA), 66 μL RNase-free water, 5 μL 100 mM DTT, 1 μL 20 U/μL SUPERase-In, and 10 μL 20 mg/mL proteinase K. The 25-μL 5% input samples were thawed on ice and mixed thoroughly by pipetting with 90 μL input-sample reverse crosslinking buffer. To washed beads, 115 μL bead-sample reverse crosslinking buffer was added forcefully to resuspend the beads, but the mixture was not otherwise mixed. 
Samples were heated as described to reverse crosslinks; bead-containing samples had their beads resuspended every 30 min by forcefully ejecting ~100 μL of the supernatant onto the settled beads to avoid losing beads to the inside of pipet tips. After heating, RNA was purified with TRIzol and the RNA Clean & Concentrator-5 kit (Zymo R1013) as described above, during which 580 μL of the TRIzol/chloroform aqueous phase was mixed with 580 μL of 100% ethanol.

Reverse transcription was performed using the Applied Biosystems High-Capacity cDNA Reverse Transcription Kit (Thermo Fisher 4368814) to assemble 20-μL random-primed reverse transcription reactions in 0.2-mL PCR strip-tubes on ice, each containing 5 μL of 5% input or 100% RING1B IP RNA sample and 0.5 μL 40 U/μL RNaseOUT RNase Inhibitor (Thermo Fisher 10777019). The reverse transcription was run with the following thermocycler parameters: 25°C for 10 min, 37°C for 120 min, 85°C for 5 min, hold at 4°C. A six-standard curve of 4-fold serial dilutions was constructed from each 5% (-) dox input sample: 8 μL of the previous standard was mixed with 24 μL of nuclease-free water. For each qPCR primer pair, a qPCR master mix was generated, containing for each 10-μL qPCR reaction: 5 μL iTaq Universal SYBR Green Supermix (Bio-Rad 1725124), 3 μL nuclease-free water, 0.5 μL 10 μM forward primer, and 0.5 μL 10 μM reverse primer (see S6 Table for oligonucleotide sequences). Technical-triplicate qPCR was performed with 9 μL qPCR master mix and 1 μL of standard-curve or 2-fold-diluted reverse-transcription product. A standard curve was run on every plate for each primer pair. Plates were firmly sealed with Microseal ‘B’ PCR Plate Sealing Film (Bio-Rad MSB1001), mixed several times by inversion and low intensity vortexing, and briefly pulsed down in a centrifuge. Reactions were run on a Bio-Rad C1000 Touch Thermal Cycler equipped with a CFX96 Real-Time System with the following program: 95°C for 10 min, 40 cycles of (95°C for 15 s, 60°C for 30 s, 72°C for 30 s, plateread). Using Bio-Rad CFX manager, first setting “Cq Determination Mode” to “Regression” and then using each primer pair’s standard curve (Cq vs. starting quantity) to convert Cq values to starting quantities for each technical replicate, the percent IP/input value for each RIP sample qPCR triplicate was calculated. 
qPCR technical triplicates were averaged for each biological replicate and plotted as a single dot in GraphPad Prism. Significance was determined via one-tailed t-test comparing the average of triplicate-averaged biological replicate values of (-) and (+) dox conditions.
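The conversion from Cq to percent IP/input can be illustrated with a short sketch. It assumes a log-linear standard curve fit by ordinary least squares and assumes that the input sample represents 5% of the material used for the IP; Bio-Rad CFX Manager performs the equivalent conversion internally, and the function names here are hypothetical.

```python
import math


def fit_standard_curve(quantities, cqs):
    """Least-squares fit of Cq = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(cqs) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cqs))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def starting_quantity(cq, slope, intercept):
    """Invert the standard curve to recover a starting quantity from a Cq."""
    return 10 ** ((cq - intercept) / slope)


def percent_input(ip_cq, input_cq, slope, intercept, input_fraction=0.05):
    """Percent IP/input, assuming the input is `input_fraction` of the IP material."""
    ip = starting_quantity(ip_cq, slope, intercept)
    inp = starting_quantity(input_cq, slope, intercept)
    return 100.0 * input_fraction * ip / inp


# Six 4-fold serial dilutions with idealized Cq values (2 cycles per 4-fold step).
standards = [4.0 ** -i for i in range(6)]
standard_cqs = [20.0 + 2.0 * i for i in range(6)]
slope, intercept = fit_standard_curve(standards, standard_cqs)
print(percent_input(ip_cq=22.0, input_cq=20.0, slope=slope, intercept=intercept))
```

In this idealized example the IP sample contains one quarter of the quantity in the 5% input, so the result is approximately 1.25% of input.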

Single-molecule RNA FISH

Stellaris RNA FISH probes were designed against Kcnq1ot1 utilizing the Stellaris RNA FISH Probe Designer (LGC, Biosearch Technologies, Petaluma, CA) available online at www.biosearchtech.com/stellarisdesigner (version 4.2; parameters: masking level 5; max number of 48 probes; oligo length 20 nt; min. spacing length 2 nt; Quasar 670). Kcnq1ot1 probe sequences are listed in S6 Table. Probes to Airn were designed in [15]. Probes to Xist were designed by Stellaris (product # SMF-3011–1, Quasar 570). TSCs were hybridized with the Stellaris RNA FISH probes following the manufacturer’s instructions for adherent cells, available online at www.biosearchtech.com/stellarisprotocols. Briefly, following four days of culture in 70% conditioned media with growth factors on coverslips, and with the addition of 1000 ng/mL of doxycycline (Sigma D9891), or no doxycycline, sgRNA-expressing cells were fixed with 3.7% formaldehyde and permeabilized overnight in 70% ethanol. Cells were then incubated in Wash Buffer A with formamide for five minutes at room temperature, before hybridization with 125 nM of Airn/Kcnq1ot1 or Xist probes in hybridization buffer at 37°C for 4 hours, then washed for 30 minutes at 37°C with Wash Buffer A. Cells were then incubated with Wash Buffer A containing 5 ng/mL DAPI for 30 minutes at 37°C. Cells were then washed for 5 minutes at room temperature once with Wash Buffer B, then mounted with ProLong Glass Antifade Mountant (Thermo Fisher P36982) for 48 hours at room temperature in the dark.

Two biological replicates per (-) dox and (+) dox condition were imaged. Images were taken with a Leica DMi8 inverted brightfield microscope, using Leica Application Suite X software version 3.7.5.24914 and the 63x/1.6 objective (pixel size 0.104 μm). Fields of cells were chosen by scanning for areas with flat, easily identifiable individual nuclei. Acquisition settings for each channel (DAPI, 570, 670) were optimized to avoid saturated pixels (low fluorescence intensity, and exposure time with log histogram signal not piling up on the right) using the (-) dox sample and kept consistent for the (+) dox sample of that replicate. For each condition, 3–5 Z-stack images with a Z-slice size of 0.2 μm were acquired for each channel sequentially, starting with the highest wavelength channel and moving to the lowest wavelength channel to avoid photobleaching. Z-stacks were set by identifying the limits of the Z position where cells went out of focus. Images were deconvolved with Huygens Essential version 20.04.0p3 64b (Scientific Volume Imaging, The Netherlands, http://svi.nl) using the standard deconvolution profile under batch express. For representative images, Z-series are displayed as maximum z-projections, and (-) dox and (+) dox images were simultaneously adjusted in FIJI (version 2.16.0/1.54p) to have the same threshold, brightness and contrast.

RIP-seq alignments and peak calling

Reads from individual RIP-seq replicates were aligned to mm10 using STAR [118] and filtered for MAPQ of ≥30 using SAMtools [119]. Code used for peak definition was the same code used in [15] and is outlined on https://github.com/CalabreseLab/Airn_Xist_manuscript/tree/main. For each protein profiled by RIP, after alignment and MAPQ of ≥30 filtering, sequencing data from all replicates were concatenated and split into two files using SAMtools [119], corresponding to alignments that mapped to the positive and negative strands of the genome, respectively. Using a custom perl script, the strand information within the positive and negative strand alignment files was randomized so as to better match the criteria of the MACS peak caller, which uses the average distance between positive and negative strand alignments to estimate the fragment length [120]. Putative peaks were called on strand-randomized positive and negative strand alignment files, respectively, using default MACS parameters and not providing a background file [120]. Peak bed files were converted to SAF format and reads under each putative peak were counted for each RIP replicate as well as for the concatenated set of MAPQ of ≥30 filtered reads from all IgG replicates using featureCounts [121]. Counts per peak per dataset were converted into RPM by dividing by the total number of sequenced reads per dataset and multiplying by 1 million. We defined as peaks those regions whose RPM-normalized values in the RIPs were > 2x those in IgG control in at least two RIP replicates. In cases in which peaks were defined from a single replicate, we required that RPM-normalized values in the single RIP were > 2x those in IgG control.
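The RPM conversion and the 2x-over-IgG filter can be sketched as follows. This is a minimal illustration rather than the code in the repository cited above; the function names are hypothetical, and treating a single-replicate experiment as requiring one supporting replicate follows the last sentence of the paragraph.

```python
def rpm(count, total_reads):
    """Reads per million: counts under a peak scaled by sequencing depth."""
    return count / total_reads * 1_000_000


def passes_peak_filter(rip_rpms, igg_rpm, fold=2.0, min_replicates=2):
    """Keep a putative peak if enough RIP replicates exceed `fold` x the IgG RPM.

    rip_rpms : RPM under the putative peak, one value per RIP replicate
    igg_rpm  : RPM under the same region in the concatenated IgG reads
    """
    if not rip_rpms:
        return False
    # With a single replicate, that one replicate must exceed the cutoff.
    required = min(min_replicates, len(rip_rpms))
    supporting = sum(1 for value in rip_rpms if value > fold * igg_rpm)
    return supporting >= required


print(passes_peak_filter([5.0, 5.0, 1.0], igg_rpm=2.0))  # two replicates above 4.0
print(passes_peak_filter([5.0, 1.0, 1.0], igg_rpm=2.0))  # only one replicate above 4.0
```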

Identification of motifs enriched under RIP-seq peaks

For each protein analyzed by RIP-seq, peaks were ranked by the replicate-averaged, RPM-normalized RIP signal minus the RPM-normalized signal from the IgG control. The top 1000 peaks most enriched in each RIP-seq were selected for motif analyses. The sequences underneath each peak were extracted from the mm10 genome using bedtools getfasta [122]. For each protein analyzed, a set of 1000 control sequences of lengths identical to those in the corresponding peak file was generated, but whose sequence content was randomly generated in Python using the mononucleotide frequency of the mm10 primary assembly. Peak sequences were then searched for motifs relative to control sequences using STREME and the options: --maxw 8 --minw 4 --nmotifs 10 --rna [123].
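Length-matched control sequences of this kind can be generated with the Python standard library, as in the sketch below. The base frequencies shown are illustrative placeholders, not the measured mononucleotide frequencies of the mm10 primary assembly, and the function name and seed are hypothetical.

```python
import random

# Placeholder frequencies for illustration only; substitute values measured from mm10.
ILLUSTRATIVE_FREQS = {"A": 0.29, "C": 0.21, "G": 0.21, "T": 0.29}


def control_sequences(peak_lengths, base_freqs, seed=0):
    """Return one random sequence per peak, matched in length to that peak.

    Each position is drawn independently from the mononucleotide frequencies.
    """
    rng = random.Random(seed)  # fixed seed so the control set is reproducible
    bases = list(base_freqs)
    weights = [base_freqs[base] for base in bases]
    return ["".join(rng.choices(bases, weights=weights, k=length))
            for length in peak_lengths]


for sequence in control_sequences([12, 20], ILLUSTRATIVE_FREQS):
    print(len(sequence), sequence)
```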

Motif comparisons

The motif comparison tool Tomtom [124] from the MEME suite was used to compare the query motifs generated by STREME in our RIP datasets (right side) and the motifs generated from prior studies (left side) in S1 Fig. To generate the target database to search, we first combined all motifs in the left-hand side of S1 Fig into a single file. The RIP-derived motifs on the right-hand side of the Figure were used as queries. Tomtom was run with the code: tomtom -oc output_folder_name query_file target_database. p-values and q-values are reported in the Figure.

Wiggle track generation

Wiggle tracks were generated using protocols and code from [125]. Sequencing reads were aligned to the mm10 genome using STAR (v2.7.10a; [126]). Replicates were downsampled and merged using SAMtools (v1.18; [119]). Merged BAM files were filtered for MAPQ ≥ 30 and then split by strand with SAMtools and a custom script. BAM files were converted to BED12 format using Bedtools (v2.29; [122]). Wiggle tracks were generated using a custom script which included normalizing wiggles to total aligned reads. WIG files were converted to BigWig format and then plotted over respective genome coordinates using plotgardener in RStudio [127].

ERCC control median of ratios analysis

ERCC sequence information (ERCC92.fa and ERCC92.gtf) downloaded from the github page associated with [128] were used to generate an index against which all relevant RIP samples were aligned using STAR. featureCounts was used to count the number of reads aligning to each ERCC sequence in each RIP. Medians of ratios were calculated in two groups, the first corresponding to RIPs reported in Fig 1B; and the second corresponding to RIPs reported in Fig 1G. A per-sample size factor for each group was generated based on raw counts using the median-of-ratios method. Specifically, for each ERCC control sequence, we calculated the ratio of its raw counts in each RIP replicate to the geometric mean across all RIP samples in the group. The median of these 92 ERCC ratios for each RIP replicate is the un-normalized size factor. Size factors were additionally normalized by calculating: (the arithmetic means of size factor across all Control replicates)/ (size factor of each RIP sample). The normalized values are shown as points in Fig 1B and 1G.
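The median-of-ratios calculation described above can be sketched in a few lines. This is an illustrative sketch with hypothetical function and sample names; restricting the calculation to spike-ins with nonzero counts in every sample is an assumption made here so that the geometric mean is defined.

```python
import math
from statistics import mean, median


def size_factors(counts):
    """Median-of-ratios size factor per sample.

    counts : {sample_name: [raw count per ERCC sequence]}, same ERCC order in every list
    """
    samples = list(counts)
    n_ercc = len(counts[samples[0]])
    # Assumption: only spike-ins detected in every sample contribute.
    usable = [i for i in range(n_ercc) if all(counts[s][i] > 0 for s in samples)]
    geo_mean = {i: math.exp(mean(math.log(counts[s][i]) for s in samples))
                for i in usable}
    return {s: median(counts[s][i] / geo_mean[i] for i in usable) for s in samples}


def normalized_values(factors, control_samples):
    """(mean size factor of the control replicates) / (size factor of each sample)."""
    control_mean = mean(factors[s] for s in control_samples)
    return {s: control_mean / factor for s, factor in factors.items()}


toy_counts = {"ctrl_1": [10, 20, 40], "ctrl_2": [10, 20, 40], "rip_1": [20, 40, 80]}
factors = size_factors(toy_counts)
print(normalized_values(factors, ["ctrl_1", "ctrl_2"]))
```

In the toy data the RIP sample recovers twice as many ERCC reads as either control, so its normalized value is approximately 0.5 and the controls are approximately 1.0.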

Reassociation analyses

Data from mixed human 293T and mouse SM33-ESC Halo-V5-HNRNPU CLIPs and CLAPs were downloaded from NCBI GEO (GSM8021138–44; [37]), and along with data from our own mixed human 293T and mouse ESC FLAG-GFP, -HNRNPU, and -HNRNPK RIPs, were aligned to concatenated mm10 and hg38 genomes using STAR and the --outFilterMultimapNmax 1 option to only retain uniquely aligned reads [126]. Mouse RIP peaks for FLAG-HNRNPK and FLAG-HNRNPU were called as described above, requiring that RPM-normalized signal in both replicates was > 2-fold higher than that of the averaged FLAG-GFP control in the same region of the mouse genome (i.e., putative peak). Human RIP peaks for HNRNPK and HNRNPU were called as described above, using IgG as a control and data collected from RIPs performed in pure 293T extracts (not mixed with mouse). Peaks in CLIP and CLAP datasets were also called as described above, using as the negative controls the alignments from the sample lacking the tagged construct but matched to the species expressing the tagged construct [37]. For example, to call CLIP or CLAP peaks in 293T cells, the RPM values for human alignments using the “minus-tag” samples were used as negative controls. Conversely, to call CLIP or CLAP peaks in SM33-ESC cells, mouse alignments using the “plus-tag” samples were used as negative controls (the “plus tag” and “minus tag” nomenclature from [37] is assigned using a human-centric perspective). See S4 Fig.

To rank peaks, we took the product of overall signal strength and the percent of signal represented by IP versus non-specific control, all using the RPM unit value. Specifically, for each peak, we subtracted the averaged RPM of the non-specific control (control_RPM_under_peak) within the peak from the averaged RPM of the IP within the peak, to derive IP_RPM_under_peak, then multiplied that value by the proportion of RPM signal within the peak calculated as [IP_RPM_under_peak]/[RPM_under_peak]. To calculate RIP/CLIP/CLAP signal over Xist/XIST and Kcnq1ot1/KCNQ1OT1, we used the approach described in the “RIP-seq signal per transcript” section below.
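One reading of the ranking score is sketched below. The text does not define RPM_under_peak explicitly; the sketch assumes it is the averaged IP RPM before control subtraction, so that the proportion term is the fraction of IP signal not explained by the control. That interpretation, the clamping of negative values to zero, and the function name are all assumptions.

```python
def peak_rank_score(ip_rpm, control_rpm):
    """Score = (control-subtracted signal) x (fraction of IP signal above control).

    ip_rpm      : averaged RPM of the IP under the peak
    control_rpm : averaged RPM of the non-specific control under the peak
    """
    if ip_rpm <= 0:
        return 0.0
    # Assumption: peaks with less IP than control signal score zero.
    specific = max(ip_rpm - control_rpm, 0.0)
    # Assumption: RPM_under_peak is the unsubtracted IP RPM.
    return specific * (specific / ip_rpm)


# Toy peaks ranked from highest to lowest score.
peaks = {"peak_a": (10.0, 2.0), "peak_b": (10.0, 8.0), "peak_c": (50.0, 45.0)}
ranked = sorted(peaks, key=lambda name: peak_rank_score(*peaks[name]), reverse=True)
print(ranked)
```

Under this reading a strong peak that is mostly background (peak_c) ranks below a weaker peak that is mostly specific (peak_a), which is the behavior the product is meant to capture.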

To calculate species-specific recovery ratios under peaks within each dataset, we calculated RPM values under each peak in the tag-expressing species counting unique alignments to the tag-expressing genome, and separately, calculated RPM values under each peak in the tag-lacking species using alignments to the tag-lacking genome from the same mixed (human plus mouse) alignments described in the first paragraph of this methods section. For each class of peaks analyzed in Fig 1H, we then summed the tag-expressing and tag-lacking RPM values for each peak within the peak class in question, and converted those sums to RPKM values, using the respective peak lengths of each peak class in question. The “species-specific recovery %” values plotted in Fig 1H were derived from the following calculations. In the case of analyzed peaks: a normalized version of ∑RPKM_tag_expressing/(∑RPKM_tag_expressing + ∑RPKM_tag_lacking). In the case of “All reads in IP”, we used the same equation except we replaced the sums of peak RPKM values with all uniquely aligned reads to the tag_expressing and tag_lacking genomes.

A normalization method (“norm_factor”) is needed in these analyses because mouse-to-human ratios as calculated from RNA Inputs did not equal exactly 1:1 in any RIP, CLIP, or CLAP experiment. Norm_factor was calculated using the RNA-seq data from Input samples, in the following manner:

For each RNA Input sample, we defined A as all uniquely mapped reads (or read pairs) in the tag_expressing species and we defined B as all uniquely mapped reads (or read pairs) in the tag_lacking species. For each CLIP,CLAP, or RIP sample, we defined C as the sum of uniquely aligning reads (in the “all reads in IP” comparisons) or the sum of RPKM values across the specific peak class being analyzed (for example, the top10k ranked peaks) in the tag_expressing species, and we defined D as the sum of the analogous peak class in the tag_lacking species. The normalization was then performed as . Raw mouse and human read values from all mixing experiments including CLIP and CLAP are reported in S1 Table.
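The un-normalized form of the recovery calculation, ∑RPKM_tag_expressing/(∑RPKM_tag_expressing + ∑RPKM_tag_lacking), can be sketched as below. The sketch deliberately leaves out the norm_factor adjustment, and it assumes that a peak class is converted to RPKM by dividing its summed RPM by its summed peak length in kilobases; the function names are hypothetical.

```python
def peak_class_rpkm(rpm_values, peak_lengths_nt):
    """Summed RPM of a peak class divided by the summed peak length in kb."""
    total_kb = sum(peak_lengths_nt) / 1000.0
    return sum(rpm_values) / total_kb


def recovery_percent(rpkm_tag_expressing, rpkm_tag_lacking):
    """Percent of peak-class signal recovered from the tag-expressing species."""
    total = rpkm_tag_expressing + rpkm_tag_lacking
    return 100.0 * rpkm_tag_expressing / total


# Toy peak classes: two peaks per species.
expressing = peak_class_rpkm([30.0, 60.0], [500, 500])  # 90 RPM over 1 kb
lacking = peak_class_rpkm([5.0, 5.0], [400, 600])       # 10 RPM over 1 kb
print(recovery_percent(expressing, lacking))
```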

RIP-seq signal per transcript

RIP-seq signal per transcript was calculated as in [15]. To rank each expressed chromatin-associated transcript by its level of association with each protein profiled by RIP, we first created a version of the GENCODE vM25 basic transcriptome that included one representative intron-containing (i.e., nascent) transcript for each gene that began at the gene’s first annotated transcription start and ended at the last annotated transcription end [129]. We used kallisto [63] with the options “--single -s 300 -l 100 --rf-stranded” to align RNA-seq data from total cellular RNA, as well as cytoplasmic and chromatin-associated fractions. We defined as expressed those transcripts whose median TPM expression value from 8 independent replicates of total RNA-seq was > 0.0625 [15]. We defined as chromatin-associated any expressed transcript for which [chromatin_TPM]/([chromatin_TPM] + [cytoplasmic_TPM]) was > 0.75. We excluded any transcript whose length was less than 500 nt. Total and fractionated data from TSCs are from [15]; total and fractionated data from ESCs are from [15,130].

To calculate RIP signal over each expressed transcript and generate the ranking table shown in Fig 2A, we selected only the subset of reads from each RIP-seq and IgG dataset that aligned under the peaks defined for each protein target of RIP. To do this for a given protein target, RIP-seq and IgG reads were aligned to mm10 using STAR and filtered for MAPQ ≥30 using SAMtools [118,119]. Samtools view was used to split alignments by strand. For each stranded alignment file, we used samtools view to select the subset of RIP-seq and IgG reads that aligned under each peak. Still using samtools view, we converted bam alignments back into fastq format. Fastq data were aligned with kallisto to the same GENCODE vM25 basic transcriptome used above, which also contained one representative nascent transcript for each gene, using the options [-l 200 -s 50 --rf-stranded] [63]. For each transcript, TPM counts from the IgG datasets were subtracted from TPM counts in RIP-seq datasets, and these IgG-corrected values were used to generate transcript rankings shown in Fig 2. The analogous approach was used to calculate signal and non-specific control data in Fig 1, using the corresponding RIP/CLIP/CLAP datasets and their non-specific controls.

RIP-seq network analyses

Bedtools multicov was used to calculate replicate-averaged, RPM-normalized read densities in 25 nt bins across the set of 19295 chromatin-associated transcripts in TSCs for each RIP-seq protein target along with IgG controls [122]. Signal in IgG was subtracted from signal in RIP; negative values were set to 0. Pairwise Pearson’s correlation values for all 27 proteins were calculated between all transcript rows in the resulting matrix. Pearson’s correlations were assigned the value “NA” if they involved signal values of “zero” across all bins in a given transcript. For transcripts that had non-zero signal values in at least one RIP, NA values were replaced with “0” values. Seventy-two transcripts had NA values for all positions across all RIPs and were removed from subsequent network comparisons. To assign RIP protein targets to communities within each transcript’s association network, we required that all RIPs had non-zero values in at least one 25 nt bin; this requirement removed 5464 transcripts and enabled our subsequent calculation of p values ascribing likelihood of the prevalence and rarity of edges between protein pairs across the chromatin-associated TSC transcriptome. Communities were assigned using the Leiden algorithm, using only non-negative edge weights/Pearson’s r values in the adjacency matrix. To make the plot in Fig 3D, panel i, we calculated the distribution of edge weights (Pearson’s r values) between all nodes within communities and across different communities. We calculated the modularity of each transcript’s network using the function modularity from the R package igraph [131] and calculated silhouette width using the function silhouette in the R package cluster [132].
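The per-transcript edge weights described above can be sketched with the standard library. This is an illustration of the IgG subtraction, the pairwise Pearson calculation, and the NA handling only; community assignment with the Leiden algorithm is not shown, and the function names are hypothetical.

```python
import math


def igg_corrected(rip_bins, igg_bins):
    """Subtract IgG signal from RIP signal per bin; negative values are set to 0."""
    return [max(rip - igg, 0.0) for rip, igg in zip(rip_bins, igg_bins)]


def pearson(x, y):
    """Pearson's r, or None when either vector has no variance (e.g. all zeros)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def transcript_edges(signals):
    """Pairwise edge weights for one transcript.

    signals : {protein: [IgG-corrected signal per 25-nt bin]}
    Returns None if every protein has zero signal in every bin (transcript removed);
    otherwise undefined correlations are replaced with 0.
    """
    if not any(any(bins) for bins in signals.values()):
        return None
    proteins = sorted(signals)
    edges = {}
    for i, first in enumerate(proteins):
        for second in proteins[i + 1:]:
            r = pearson(signals[first], signals[second])
            edges[(first, second)] = 0.0 if r is None else r
    return edges


toy = {"A": [0.0, 1.0, 2.0, 3.0], "B": [0.0, 2.0, 4.0, 6.0], "C": [0.0, 0.0, 0.0, 0.0]}
print(transcript_edges(toy))
```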

To determine whether protein pairs present within the same communities represented prevalent or rare events among the chromatin-associated TSC transcriptome, we analyzed networks from the 13831 chromatin-associated transcripts that had at least one non-zero bin value in each RIP. Within these networks, the 27 proteins profiled by RIP-seq could be present in 351 possible pairs (27*26/2). If the network structure is fixed, and node IDs are shuffled to simulate randomization, for each network, the probability that any two proteins fall in the same community (the intracommunity probability) can be estimated as the sum of intracommunity edges divided by the total number of edges in that specific network. Because each network has a different intracommunity probability (defined below as “prob”), we used an application of the Poisson Binomial Distribution and the cumulative distribution function in the poibin R package to determine whether an intracommunity protein pair that appeared x times among the 13831 networks was rare [ppoibin(x, prob)] or prevalent [1 - ppoibin(x - 1, prob)] [133]. Due to numerical instability/rounding errors in computing the Poisson binomial cumulative distribution function, p values of less than 1.0e-12 were set to <1.0e-12, before correcting for multiple testing using the Benjamini-Hochberg method.
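The two tail probabilities can be illustrated with an exact dynamic-programming calculation of the Poisson binomial distribution. This is a stand-in for the poibin R package, not a port of it: the function names are hypothetical, and the quadratic-time recursion is intended for small examples rather than for 13831 networks.

```python
def poisson_binomial_pmf(probs):
    """P(X = k) for X = number of successes among independent trials with given probs."""
    pmf = [1.0]
    for p in probs:
        updated = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            updated[k] += mass * (1.0 - p)  # this trial fails
            updated[k + 1] += mass * p      # this trial succeeds
        pmf = updated
    return pmf


def p_rare(x, probs):
    """P(X <= x): a pair seen x times is seen that rarely or more rarely by chance."""
    return sum(poisson_binomial_pmf(probs)[: x + 1])


def p_prevalent(x, probs):
    """P(X >= x) = 1 - P(X <= x - 1): a pair is seen at least x times by chance."""
    return 1.0 - sum(poisson_binomial_pmf(probs)[:x])


# Toy example: three networks with different intracommunity probabilities.
network_probs = [0.2, 0.5, 0.3]
print(p_rare(0, network_probs), p_prevalent(3, network_probs))
```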

ChIP-seq analysis

Although we spiked chromatin from human 293T cells into TSC samples prior to performing ChIPs, we elected to not utilize a spike-in normalization strategy [134]. We came to recognize that using total protein concentration to standardize across samples was potentially flawed; protein depletions may alter overall protein content per cell, meaning that spike-in amounts calculated from protein concentration may no longer accurately reflect chromatin input, and have the potential to distort cross-sample comparisons. Still, because human extracts were mixed with mouse extracts, we needed to select for reads that uniquely aligned to the mouse genome only. Hence, ChIP-seq and input reads were aligned to a custom genome consisting of concatenated mm10 and hg38 genomes, and independently, also aligned to a concatenated genome consisting of a version of mm10 that had been modified to incorporate CAST single-nucleotide polymorphisms (SNPs) downloaded from the Sanger Mouse Genomes Project on 7/30/2020 [135] as well as the hg38 genome. Alignments were performed using Bowtie2 with default parameters [136]. Aligned reads that had a MAPQ greater than or equal to 30 were extracted with SAMtools [119]. Reads that aligned uniquely to the mouse genome were retained. Allele-specific read retention (i.e., reads that overlap at least one B6 or CAST SNP) was performed as in [54,55] using a custom Perl script (intersect_reads_snps18.pl). For tiling density plots, B6- or CAST-specific reads were summed in 10 kb bins using a custom Perl script (ase_analyzer8_hDbed.pl). Binned counts of the ChIP were RPM-normalized and then divided by the number of B6/CAST SNPs detected in the bin genomic coordinates (i.e., SNP-norm RPM). Finally, bins were averaged every 9 bins in 1-bin increments. For tiling plots with multiple replicates, bins were then averaged among samples. 
For box plots showing the difference between B6 and CAST alleles, bins were averaged every 9 bins in 1-bin increments for each allele and CAST reads were subtracted from B6 reads in each bin. The B6-minus-CAST values for bins within the lncRNA target regions identified in [56] were then plotted as box plots for individual replicates. Statistical significance of differences for each replicate between the (-) and (+) dox conditions was determined using one-sided Student’s t-tests. All plots were generated using ggplot2 [137] in RStudio.
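The SNP normalization and the 9-bin sliding average can be sketched as follows. This is not the custom Perl code used in the study: the function names are hypothetical, and two choices are assumptions, namely that bins without any SNP are assigned a value of 0 and that only complete 9-bin windows are reported.

```python
def snp_norm_rpm(allelic_count, total_reads, snps_in_bin):
    """RPM-normalize an allele-specific bin count, then divide by SNPs in the bin."""
    if snps_in_bin == 0:
        return 0.0  # assumption: bins with no informative SNP carry no signal
    return allelic_count / total_reads * 1_000_000 / snps_in_bin


def sliding_average(values, window=9):
    """Average every `window` consecutive bins, advancing one bin at a time."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def b6_minus_cast(b6_bins, cast_bins, window=9):
    """Smooth each allele separately, then subtract CAST from B6 per bin."""
    smoothed_b6 = sliding_average(b6_bins, window)
    smoothed_cast = sliding_average(cast_bins, window)
    return [b6 - cast for b6, cast in zip(smoothed_b6, smoothed_cast)]


print(sliding_average(list(range(10))))      # two complete windows from ten bins
print(b6_minus_cast([2.0] * 9, [1.0] * 9))   # one window, B6 exceeds CAST by 1
```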

Allelic RNA-seq analysis

Allelic RNA-seq analysis was performed essentially as described [15]. RNA-seq reads were aligned to mm10, and independently, aligned to a version of mm10 [138] that had been modified to incorporate CAST single-nucleotide polymorphisms (SNPs) downloaded from the Sanger Mouse Genomes Project on 7/30/2020 [135]. Alignment was performed with STAR (v2.7.9a; [118]), using multi-sample two-pass mapping and the option “--outFilterMultimapNmax 1” to consider only uniquely mapping reads. Using a custom Perl script (intersect_reads_snps16.pl), aligned reads were parsed to identify reads clearly originating from either the B6 or CAST allele (i.e., reads that overlap at least one B6 or CAST SNP). Reads marked as either B6 or CAST were then assigned to genes using a custom Perl script (ase_analyzer10.pl) and the GTF file gencode.vM25.basic.annotation.gtf [64]. The ratio of B6/(B6 + CAST) reads was calculated for each gene from each sample. The Airn- and Kcnq1ot1-target genes analyzed in this study were reported in [4]. Lists of different X-linked genes analyzed in this study (inactivated, weak escapees, strong escapees) were defined as follows: we required that each gene was represented by an average of at least 5 (B6 + CAST) reads in each HNRNPK and HNRNPU (-) and (+) dox replicate (sixteen total replicates). From this list, B6/(B6 + CAST) ratios were calculated for genes in the (-) dox replicates from HNRNPK and HNRNPU and averaged for each gene. Genes whose average B6/(B6 + CAST) ratios were < 0.01 were defined as “inactivated”, those whose ratios were ≥ 0.01 but < 0.10 were defined as “weak escapees”, and those whose ratios were ≥ 0.1 were defined as “strong escapees”. Statistical significance of differences for each gene class between the (-) and (+) dox conditions was determined using average, within-genotype B6/(B6 + CAST) ratios and two-tailed Student’s t-tests.
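The three X-linked gene classes follow directly from the averaged B6/(B6 + CAST) ratio, as in this minimal sketch. The function names are hypothetical, and the read-coverage requirement described above is assumed to have been applied beforehand.

```python
def allelic_ratio(b6_reads, cast_reads):
    """Fraction of allele-informative reads that come from the B6 allele."""
    return b6_reads / (b6_reads + cast_reads)


def classify_x_linked(mean_ratio):
    """Assign a gene class from the (-) dox average B6/(B6 + CAST) ratio."""
    if mean_ratio < 0.01:
        return "inactivated"
    if mean_ratio < 0.10:
        return "weak escapee"
    return "strong escapee"


# Toy genes with (B6, CAST) read counts averaged over the (-) dox replicates.
toy_genes = {"gene_1": (1, 999), "gene_2": (5, 95), "gene_3": (30, 70)}
for name, (b6, cast) in toy_genes.items():
    print(name, classify_x_linked(allelic_ratio(b6, cast)))
```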

Quantitation of Airn, Kcnq1ot1, and Xist abundance by RNA-seq

For each sample, RNA-seq reads from each replicate were aligned with STAR (v2.7.9a; [118]) to the mouse GRCm38 (mm10; [138]) genome with reference to GTF file gencode.vM25.basic.annotation.gtf [64], using multi-sample two-pass alignment that incorporated novel splice junctions discovered among all samples in a given experiment. STAR was run with the option “--outFilterMultimapNmax 1” to consider only uniquely mapped reads. For each sample, reads mapping to Airn, Kcnq1ot1, and Xist were counted using featureCounts (Subread v2.0.4; [121]) and RPM normalized using the total number of reads uniquely mapped by STAR to the mm10 genome.

RNA FISH quantification

RNA FISH signal was quantified in FIJI (version 2.16.0/1.54p) as follows. First, channels were split and maximum z-projections were generated. Then, a threshold was set on the (-) dox projection and applied identically to the paired (+) dox image to create binary masks of RNA FISH foci (presence/absence as shown in S8A Fig). For each image, with only the DAPI channel visible, identifiable individual nuclei were manually outlined using the FIJI selection tool. These outlines were then saved as regions of interest (ROIs) as an overlay (example shown in S9A Fig, right). Then, the overlay was applied to the corresponding binary RNA FISH image and, as each nucleus was counted and labeled, the number of discrete foci (0, 1, or ≥ 2) for each nucleus was recorded. At least 3 fields per condition ((-) and (+) dox) were analyzed among two biological replicates, fixed and imaged on separate days, for a total of > 200 nuclei per condition. The fraction of nuclei in each foci category (0, 1, ≥ 2) was plotted in GraphPad Prism and statistical significance was assessed by chi-square test.

Half-life analysis

RNA half-lives were measured using a time course of flavopiridol (Alvocidib) (Selleck Chemicals S1230). MEF-depleted Cas9/sgRNA-expressing TSCs were induced for 3 days with 1000 ng/mL doxycycline prior to the start of the flavopiridol time course, for a total of 4 days. TSCs were then treated with 1 μM flavopiridol (24 μL of 50 μg/mL flavopiridol in DMSO added to 3 mL of media) for 30 minutes, 1 hour, 3 hours, 8 hours, (or without flavopiridol, “0 h”) and lysed with TRIzol. To measure levels of Airn, Kcnq1ot1, Xist, and Gapdh at each time point, RNA was extracted and subjected to RT-qPCR as above, with the following exceptions: cDNA was diluted 1:20 in nuclease-free water and qPCR was carried out with 4 µL of diluted cDNA and 6 µL of master mix, composed of 5 µL iTaq Universal SYBR Green Supermix (Bio-Rad 1725124) and 1 µL of a 5 µM primer mix per well (see S6 Table for oligonucleotide sequences). RNA levels were normalized to Gapdh at each time point and calculated as the percentage of RNA relative to the 0 h time point. For (-) dox HNRNPU depletion cells, only one biological replicate was used, as the 0 h sample was compromised. To estimate half-life, Gapdh-normalized data for each replicate were averaged and fit to a non-linear one-phase decay model in GraphPad Prism using the equation Y=(Y0) *exp(-K*X). Error bars represent the standard deviation between biological replicates.
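For the decay model Y = Y0*exp(-K*X), the half-life is ln(2)/K. The sketch below estimates K by ordinary least squares on log-transformed values, which is a simpler substitute for the non-linear fit performed in GraphPad Prism and can give slightly different estimates on noisy data; the function names and toy values are hypothetical.

```python
import math


def fit_one_phase_decay(times_h, percent_remaining):
    """Estimate Y0 and K in Y = Y0 * exp(-K * X) by log-linear least squares.

    Substitute for a non-linear fit; requires all values to be positive.
    """
    logs = [math.log(value) for value in percent_remaining]
    n = len(times_h)
    mean_t, mean_log = sum(times_h) / n, sum(logs) / n
    slope = (sum((t - mean_t) * (y - mean_log) for t, y in zip(times_h, logs))
             / sum((t - mean_t) ** 2 for t in times_h))
    y0 = math.exp(mean_log - slope * mean_t)
    return y0, -slope


def half_life(k):
    """Time for the signal to fall to half of its starting value."""
    return math.log(2) / k


# Toy time course matching the sampled time points (0, 0.5, 1, 3, and 8 h).
times = [0.0, 0.5, 1.0, 3.0, 8.0]
values = [100.0 * math.exp(-0.5 * t) for t in times]
y0, k = fit_one_phase_decay(times, values)
print(round(y0, 3), round(k, 3), round(half_life(k), 3))
```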

Data deposition

RIP-, ChIP-, and RNA-seq data from this study have been deposited in GEO, under the accession numbers: GSE299584, GSE299585, GSE299587.

Code has been posted in GitHub:

https://github.com/CalabreseLab/AKX_HNRNPU_manuscript.

Data in Figures have been deposited into Mendeley:

https://data.mendeley.com/datasets/3b3xys2743/2

Supporting information

S1 Fig. Motifs for RNA-binding proteins derived from RIP-seq in this study versus motifs derived from prior studies (in vitro binding assays, CLIP, or knockout validated RIP-seq).

Left-hand panels, motifs reported in CISBP-RNA [139], mCROSS [140], YTHDC1 CLIP [93], or in the case of SAFB and SPEN, motifs derived from studies that used the same antibodies or antisera as this study and derived motifs by comparing RIP-seq from wild-type cells versus cells in which the denoted protein was knocked out [57,58]. Prior CLIP-seq performed to map the RNA targets bound by ALYREF, NXF1, and NUDT21 (a.k.a. CPSF5/CFIm25) failed to identify strong consensus motifs [141,142]. Right-hand panels, up to the top 5 motifs derived from RIP-seq experiments in this study. Tomtom was used to compare previously identified motifs to those identified in this study [124]; motifs on the right are marked by the number of the motif on the left with which they share significant similarity. Numbers on the right that are colored in grey signify a p-value of motif similarity of <0.05, and numbers colored in black signify a q-value of <0.05. The p-value of similarity between MATR3 CisBP-RNA motif #1 and RIP motif #5 was 0.057. RING1B, RYBP, and SUPT16H were not included in these analyses.

https://doi.org/10.1371/journal.pgen.1012215.s001

(TIF)

S2 Fig. Western blots examining specificity of RIP antibodies and antisera raised against RIP-target proteins that lack previously reported consensus RNA interaction motifs or whose RIP-defined motifs lacked significant similarity to CLIP- or in vitro-defined motifs.

CIZ1 was only detectable by IP-Western; we used the CIZ1 antibody from Novus for RIP-seq in this study. S.C., CIZ1 antibody from Santa Cruz Biotechnology (sc-393021). IP-Western was performed using RIP-seq washes as described.

https://doi.org/10.1371/journal.pgen.1012215.s002

(TIF)

S3 Fig. Wiggle density tracks of RIP-seq data over Airn (A), Kcnq1ot1 (B), and Xist (C) genes. y-axes, RPM per 50 nt bin.

https://doi.org/10.1371/journal.pgen.1012215.s003

(TIF)

S4 Fig. Overview of peak definition and reassociation analyses in RIP, CLIP, and CLAP comparisons.

(A,B) Schematics demonstrating peak calling, signal over control, and reassociation strategies for CLIP and CLAP data from Guo and colleagues (A) [37] versus RIP experiments performed in this study (B). In all CLIP and CLAP samples: human 293T and mouse ESCs were mixed in 1:1 ratios. Peaks (shown as thick horizontal bars drawn above enriched regions) were called by comparing signal from the tag-expressing and tag-lacking genomes of the same species. For example, human peaks are called by comparing +Tag samples (which express the tagged RBP in human cells) to -Tag samples (which express the tagged RBP in mouse cells), while mouse peaks are called by comparing -Tag samples (which express the tagged RBP in mouse cells) to +Tag samples (which express the tagged RBP in human cells). Signal was assigned from the tag-expressing genome and non-specific signal was assigned from the tag-lacking genome of the same species. Reassociation analyses were conducted by calculating RPM signal under each peak in the tag-expressing species, counting unique alignments to the tag-expressing genome, and separately calculating RPM signal under each peak in the tag-lacking species, counting unique alignments to the tag-lacking genome. For example, in the mouse Xist HNRNPU CLAP reassociation analysis in Fig 1H, CLAP RPM signal under mouse peaks from the mouse-tagged-expressing ESCs (-Tag, third row, lefthand side of Fig 1I) was compared to CLAP RPM signal under human peaks in the tag-lacking human 293T cells that were mixed with the mouse-tagged-expressing ESCs (-Tag, third row, righthand side of Fig 1I). In FLAG RIP samples: FLAG-tag-expressing mouse ESCs were mixed with tag-lacking human 293T cells in 1:1 ratios. Peaks in mouse were called by comparing signal from the FLAG-RBP and FLAG-GFP samples. Peaks in human were called by comparing signal from RBP RIPs using the antibodies raised against endogenous RBPs versus IgG control.
Reassociation analyses were conducted by comparing RPM signal under peaks assigned from the tag-expressing mouse genome to RPM signal under peaks assigned from the tag-lacking human genome. For example, in the mouse Xist HNRNPU RIP analysis in Fig 1H, RIP RPM signal under mouse peaks from the mouse-tagged-expressing ESCs (tenth row, lefthand side of Fig 1I) was compared to RIP RPM signal under human peaks in the tag-lacking human 293T cells that were mixed with the mouse-tagged-expressing ESCs (tenth row, righthand side of Fig 1I). (C) Schematic representing our approach to calculate IP_RPM_under_peak and Control_RPM_under_peak. IP_RPM_under_peak = (RPM signal under peak from the IP minus RPM signal under peak from the non-specific control). (D) Wiggle density profiles of RIP, CLIP, and CLAP data over Kcnq1ot1/ KCNQ1OT1. Black bars, peaks called in sample. Same format as Xist/ XIST data in Fig 1I.

https://doi.org/10.1371/journal.pgen.1012215.s004

(TIF)

S5 Fig. RIP- and ChIP-seq density of RING1B and RYBP over Airn, Kcnq1ot1, and Xist in TSCs.

Wiggle density profiles of RIP and ChIP data over Xist, Airn, and Kcnq1ot1 regions. Strand of interest (-) or (+) for RIP-seq and lncRNA in pink, antisense strand in grey. RPM, reads per million uniquely aligned reads per 50 nt bin. Black bar above the Airn gene diagram marks the location of a co-localized RIP-seq peak for RING1B and RYBP whose genomic coordinates are proximal to but do not overlap the ChIP peak for the same two proteins in TSCs. Maximum RPM values displayed in different rows were set to enable visualization of relevant trends. Intensity can be compared across rows as a way to gauge the relative levels of enrichment for each factor over the genomic intervals being displayed.

https://doi.org/10.1371/journal.pgen.1012215.s005

(TIF)

S6 Fig. H2AK119ub and H3K27me3 ChIP-seq over Xist, Airn, and Kcnq1ot1 target domains in TSCs expressing non-targeting (NTG) control sgRNA.

RPM-normalized H2AK119ub (A) and H3K27me3 (B) ChIP-seq data from TSCs expressing non-targeting sgRNAs shown over Xist, Airn, and Kcnq1ot1 loci. H2AK119ub and H3K27me3 levels are shown in panels (i, iii, v) and (ii, iv, vi), respectively, over the Xist (i, ii), Airn (iii, iv), and Kcnq1ot1 (v, vi) target domains, on B6 and CAST alleles with and without Cas9 induction by doxycycline treatment. Tiling density plots and box-and-whisker plots of spike-in normalized H2AK119ub and H3K27me3 ChIP-seq signal per 10 kb bin are displayed on the left and right, respectively, in both (A) and (B). Δ values above box plots show the percent fold change of median B6 minus CAST values between (-) dox and (+) dox within each lncRNA’s target domain. *, **, ****: p ≤ 0.05, 0.01, and 0.0001, respectively; Student’s t-test.

https://doi.org/10.1371/journal.pgen.1012215.s006

(TIF)

S7 Fig. H2AK119ub and H3K27me3 over Xist target domain in TSCs expressing HNRNPK- and HNRNPU-targeting sgRNAs.

H2AK119ub and H3K27me3 ChIP-seq data in TSCs expressing HNRNPK-targeting (A) and HNRNPU-targeting (B) sgRNAs shown over Xist target domain. H2AK119ub and H3K27me3 levels are shown in panels (i) and (ii), respectively, on B6 and CAST alleles before and after four days of Cas9 induction with dox. Tiling density plots and box-and-whisker plots of spike-in normalized H2AK119ub and H3K27me3 ChIP-seq signal per 10 kb bin are displayed on the left and right, respectively, in both (A) and (B). Tiling density plots represent the data averaged between the two different sgRNA-expressing populations, and box-and-whisker plots show [B6 - CAST] values for each sgRNA experiment. Δ values above box plots show the percent fold change of median B6 minus CAST values between (-) dox and (+) dox for each replicate over the X chromosome. *, **, ****: p ≤ 0.05, 0.01, and 0.0001, respectively; Student’s t-test.

https://doi.org/10.1371/journal.pgen.1012215.s007

(TIF)

S8 Fig. H3K27me3 ChIP-seq over Xist, Airn, and Kcnq1ot1 target domains in TSCs expressing XRN2- and YTHDC1-targeting sgRNAs.

(A, D) Western blot of XRN2 and histone H3 in (-) and (+) dox conditions, in TSCs expressing Xrn2-targeting (A) and Ythdc1-targeting (D) sgRNAs. (B, E) RPM normalized H3K27me3 ChIP-seq data in TSCs expressing Xrn2-targeting (B) and Ythdc1-targeting (E) sgRNAs, shown over Xist, Airn, and Kcnq1ot1 target domains. H3K27me3 levels are shown on B6 and CAST alleles in (-) and (+) dox conditions (four days of Cas9 induction). Tiling density plots and box-and-whisker plots of RPM normalized H3K27me3 ChIP-seq signal per 10 kb bin are displayed on the left and right, respectively, in both (B) and (E). Δ values above box plots show the percent fold change of median B6 minus CAST values between (-) dox and (+) dox in each lncRNA’s target domain. *, **, ****: p ≤ 0.05, 0.01, and 0.0001, respectively; Student’s t-test. Data shown are from a single experiment. (C, F) Wiggle density plots showing RNA-seq data over Xist, Airn, and Kcnq1ot1 in (-) and (+) dox conditions in TSCs expressing Xrn2-targeting (C) and Ythdc1-targeting (F) sgRNAs.

https://doi.org/10.1371/journal.pgen.1012215.s008

(TIF)

S9 Fig. Overview of RNA FISH quantitation strategy and representative images.

(A) Overview of foci counting strategy. (B) Representative Xist (i), Airn (ii), and Kcnq1ot1 (iii) RNA FISH images thresholded by signal in the (-) dox condition (left-hand panels) and (+) dox conditions (right-hand panels).

https://doi.org/10.1371/journal.pgen.1012215.s009

(TIF)

S1 Table. Signal versus non-specific signal and reassociation analyses; RIP, CLIP, and CLAP.

Sheet 1; vet_peaks_27pr_5stats_251007. Column name, column description: Reads_per_rip, total number of sequenced reads in each replicate; Reads_aligned_per_rip_MAPQ>3, number of reads with MAPQ>30; Num_total_peaks, total number of putative peaks identified; Num_peaks>2igg_in>=2rips, final peak number; Corr_rpm_per_rip_in_peaks>2igg_in>=2rip, correlation of read density under peaks by replicate pair; IgGnorm_MoR, normalized MoR values relative to averaged IgG control. Sheet 2; Fig 1H, raw. Column name, column description: Methods, RIP, CLIP, or CLAP; TaggedSpecies, species expressing epitope-tagged protein; mouse or human; RBP, protein that was tagged; exps, feature being evaluated; peak_class, class of peak being evaluated; reps, replicate ID number; count_or_rpkmsum_TaggedSp, signal associated with the tag-expressing genome; count_or_rpkmsum_UntaggedSp, signal associated with the tag-lacking genome that was mixed with the tag-expressing genome; perct, raw percentage of signal associated with the tag-expressing genome; norm_perct, normalized percentage of signal associated with the tag-expressing genome. Sheet 3; hg38CLAP_10xk_rpkm. Column name, column description: ranking, peak ranking based on rank_rpm; Chr, peak chromosome number; Start, peak start genomic coordinates; End, peak end genomic coordinates; Strand, peak strand; Length, peak length in nt; SAFA_PlusTag_CLAP_hg38_HpeakHdata_Rep1_rpm, CLAP PlusTag Rep1 rpm under hg38 peaks (A); SAFA_PlusTag_CLAP_hg38_HpeakHdata_Rep2_rpm, CLAP PlusTag Rep2 rpm under hg38 peaks (B); SAFA_PlusTag_CLAP_hg38_HpeakHdata_meanRep12_rpm, mean of A and B (C); SAFA_PlusTag_CLAP_hg38_HpeakMdata_Control_rpm, CLAP MinusTag rpm under hg38 peaks (D); SAFA_PlusTag_CLAP_hg38_HpeakHdata_Rep1_rpm_lessControl, A – D; SAFA_PlusTag_CLAP_hg38_HpeakHdata_Rep2_rpm_lessControl, B – D; SAFA_PlusTag_CLAP_hg38_HpeakHdata_meanRep12_rpm_lessControl, C - D = E; rank_rpm, E x (E/ C). Sheet 4; mm10CLAP_10xk_rpkm. 
Column name, column description: ranking, peak ranking based on rank_rpm; Chr, peak chromosome number; Start, peak start genomic coordinates; End, peak end genomic coordinates; Strand, peak strand; Length, peak length in nt; SAFA_MinusTag_CLAP_mm10_MpeakMdata_rpm, CLAP MinusTag rpm under mm10 peaks (A); SAFA_MinusTag_CLAP_mm10_MpeakHdata_Control_rpm, merged CLAP PlusTag rpm under mm10 peaks (D); SAFA_MinusTag_CLAP_mm10_MpeakHdata_Control1_rpm, CLAP PlusTag Rep1 rpm under mm10 peaks; SAFA_MinusTag_CLAP_mm10_MpeakHdata_Control2_rpm, CLAP PlusTag Rep2 rpm under mm10 peaks; SAFA_MinusTag_CLAP_mm10_MpeakMdata_rpm_lessControl, A - D = E; rank_rpm, E x (E/ A). Sheet 5; hg38CLAP10xk_CLIP_rpkm. Column name, column description: ranking, peak ranking based on rank_rpm; Chr, peak chromosome number; Start, peak start genomic coordinates; End, peak end genomic coordinates; Strand, peak strand; Length, peak length in nt; hg38CLAP10k_CLIP_HpeakHdata_Rep1_rpm, CLIP PlusTag Rep1 rpm under hg38 peaks (A); hg38CLAP10k_CLIP_HpeakHdata_Rep2_rpm, CLIP PlusTag Rep2 rpm under hg38 peaks (B); hg38CLAP10k_CLIP_HpeakMdata_Control_rpm, CLIP MinusTag rpm under hg38 peaks (D); hg38CLAP10k_CLIP_HpeakHdata_Rep1_rpm_lessControl, A - D; hg38CLAP10k_CLIP_HpeakHdata_Rep2_rpm_lessControl, B – D. Sheet 5; mm10CLAP10xk_CLIP_rpkm. Column name, column description: ranking, peak ranking based on rank_rpm; Chr, peak chromosome number; Start, peak start genomic coordinates; End, peak end genomic coordinates; Strand, peak strand; Length, peak length in nt; mm10CLAP10k_CLIP_MpeakMdata_rpm, CLIP MinusTag rpm under mm10 peaks (A); mm10CLAP10k_CLIP_MpeakHdata_Control_rpm, merged CLIP PlusTag rpm under mm10 peaks (D); mm10CLAP10k_CLIP_MpeakHdata_Control1_rpm, CLIP PlusTag Rep1 rpm under mm10 peaks; mm10CLAP10k_CLIP_MpeakHdata_Control2_rpm, CLIP PlusTag Rep2 rpm under mm10 peaks; mm10CLAP10k_CLIP_MpeakMdata_rpm_lessControl, A – D. Sheet 6; RIP_hg38peak_HNRNPK_10xk_rpkm. 
Column name, column description: ranking, peak ranking based on rank_rpm; Chr, peak chromosome number; Start, peak start genomic coordinates; End, peak end genomic coordinates; Strand, peak strand; Length, peak length in nt; h_HNRNPK_HpeakHdata_Rep1_rpm, 293T HNRNPK RIP Rep1 rpm under hg38 peaks (A); h_HNRNPK_HpeakHdata_Rep2_rpm, 293T HNRNPK RIP Rep2 rpm under hg38 peaks (B); h_HNRNPK_HpeakHdata_meanRep12_rpm, mean of A and B (C); h_HNRNPK_HpeakHdata_Control_rpm, 293T IgG RIP rpm under hg38 peaks (D); h_HNRNPK_HpeakHdata_Rep1_rpm_lessControl, A - D; h_HNRNPK_HpeakHdata_Rep2_rpm_lessControl, B - D; h_HNRNPK_HpeakHdata_meanRep12_rpm_lessControl, C - D = E; rank_rpm, E x (E/ C). Sheet 7; RIP_hg38peak_HNRNPU_10xk_rpkm. Column name, column description: ranking, peak ranking based on rank_rpm; Chr, peak chromosome number; Start, peak start genomic coordinates; End, peak end genomic coordinates; Strand, peak strand; Length, peak length in nt; h_HNRNPU_HpeakHdata_Rep1_rpm, 293T HNRNPU RIP Rep1 rpm under hg38 peaks (A); h_HNRNPU_HpeakHdata_Rep2_rpm, 293T HNRNPU RIP Rep2 rpm under hg38 peaks (B); h_HNRNPU_HpeakHdata_meanRep12_rpm, mean of A and B (C); h_HNRNPU_HpeakHdata_Control_rpm, 293T IgG RIP rpm under hg38 peaks (D); h_HNRNPU_HpeakHdata_Rep1_rpm_lessControl, A - D; h_HNRNPU_HpeakHdata_Rep2_rpm_lessControl, B - D; h_HNRNPU_HpeakHdata_meanRep12_rpm_lessControl, C - D = E; rank_rpm, E x (E/ C); hm_HNRNPU_flag_hg38_HpeakMdata_Rep1_rpm, 293T+RMCE HNRNPU Flag Rep1 rpm under hg38 peaks (F); hm_HNRNPU_flag_hg38_HpeakMdata_Rep2_rpm, 293T+RMCE HNRNPU Flag Rep2 rpm under hg38 peaks (G); hm_HNRNPU_flag_hg38_HpeakMdata_Control_rpm, 293T+RMCE GFP Flag rpm under hg38 peaks (H); hm_HNRNPU_flag_hg38_HpeakMdata_Rep1_rpm_lessControl, F - H; hm_HNRNPU_flag_hg38_HpeakMdata_Rep2_rpm_lessControl, G – H. Sheet 8; RIP_mm10_HNRNPK_10xk_rpkm. 
Column name, column description: ranking, peak ranking based on rank_rpm; Chr, peak chromosome number; Start, peak start genomic coordinates; End, peak end genomic coordinates; Strand, peak strand; Length, peak length in nt; hm_HNRNPK_flag_MpeakMdata_Rep1_rpm, 293T+RMCE HNRNPK Flag Rep1 rpm under mm10 peaks (A); hm_HNRNPK_flag_MpeakMdata_Rep2_rpm, 293T+RMCE HNRNPK Flag Rep2 rpm under mm10 peaks (B); hm_HNRNPK_flag_MpeakMdata_meanRep12_rpm, mean of A and B (C); hm_HNRNPK_flag_MpeakMdata_Control_rpm, 293T+RMCE GFP Flag rpm under mm10 peaks (D); hm_HNRNPK_flag_MpeakMdata_Rep1_rpm_lessControl, A - D; hm_HNRNPK_flag_MpeakMdata_Rep2_rpm_lessControl, B - D; hm_HNRNPK_flag_MpeakMdata_meanRep12_rpm_lessControl, C - D = E; rank_rpm, E x (E/ C). Sheet 9; RIP_mm10_HNRNPU_10xk_rpkm. Column name, column description: ranking, peak ranking based on rank_rpm; Chr, peak chromosome number; Start, peak start genomic coordinates; End, peak end genomic coordinates; Strand, peak strand; Length, peak length in nt; hm_HNRNPU_flag_MpeakMdata_Rep1_rpm, 293T+RMCE HNRNPU Flag Rep1 rpm under mm10 peaks (A); hm_HNRNPU_flag_MpeakMdata_Rep2_rpm, 293T+RMCE HNRNPU Flag Rep2 rpm under mm10 peaks (B); hm_HNRNPU_flag_MpeakMdata_meanRep12_rpm, mean of A and B (C); hm_HNRNPU_flag_MpeakMdata_Control_rpm, 293T+RMCE GFP Flag rpm under mm10 peaks (D); hm_HNRNPU_flag_MpeakMdata_Rep1_rpm_lessControl, A - D; hm_HNRNPU_flag_MpeakMdata_Rep2_rpm_lessControl, B - D; hm_HNRNPU_flag_MpeakMdata_meanRep12_rpm_lessControl, C - D = E; rank_rpm, E x (E/ C).
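The control-subtracted and rank_rpm columns described above follow simple arithmetic, sketched here for the sheets in which rank_rpm = E x (E/ C). The function names and example values are hypothetical, and sorting peaks in descending order of rank_rpm is an assumption about how the ranking column was derived.

```python
def less_control(ip_rpm, control_rpm):
    """Control-subtracted signal under a peak (the *_lessControl columns)."""
    return ip_rpm - control_rpm

def rank_rpm(mean_ip_rpm, control_rpm):
    """rank_rpm = E x (E / C), where C is the mean IP rpm under the peak
    and E = C - D is that mean after subtracting the control rpm D."""
    e = less_control(mean_ip_rpm, control_rpm)
    return e * (e / mean_ip_rpm)

# Hypothetical peaks: (peak name, mean IP rpm C, control rpm D).
peaks = [
    ("peak_a", 10.0, 2.0),
    ("peak_b", 50.0, 45.0),
    ("peak_c", 8.0, 0.5),
]

# Assumed ordering: highest rank_rpm first.
ranked = sorted(peaks, key=lambda p: rank_rpm(p[1], p[2]), reverse=True)
ranked_names = [name for name, _, _ in ranked]
```

Multiplying E by the fraction E/C down-weights peaks whose signal is large in absolute terms but mostly accounted for by the control, as with the hypothetical peak_b above.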

https://doi.org/10.1371/journal.pgen.1012215.s010

(XLSX)

S2 Table. IgG-corrected, signal-under-peak values and p values describing significance of enrichment over non-specific IgG control for all RIPs over all chromatin-enriched transcripts in TSCs.

Column name, column description: gene_ID, unique identifier of chromatin-enriched transcript; length, real length; eff_length, effective length reported by kallisto; [protein_name]_rpm, signal-under-peak values for RIP in question; [protein_name]_over_igg, IgG-corrected signal-under-peak values for RIP in question; [protein_name]_pval, p value describing enrichment of IgG-corrected signal-under-peak values for RIP in question over non-specific IgG.

https://doi.org/10.1371/journal.pgen.1012215.s011

(CSV)

S3 Table. Chromatin-enriched transcript expression and RIP metadata.

Column name, column description: gene, unique identifier of chromatin-enriched transcript; genetype, GENCODE biotype of transcript; length, transcript length; totalRNA_tpm, TPM of transcript from total RNA-seq; chrom_tpm, TPM of transcript from chromatin fraction RNA-seq; cyto_tpm, TPM of transcript from cytoplasmic fraction RNA-seq; chrom_enrichment,; Xist_RIP_r, Pearson’s r value of transcript protein association profile relative to Xist; Airn_RIP_r, Pearson’s r value of transcript protein association profile relative to Airn; Kcnq1ot1_RIP_r, Pearson’s r value of transcript protein association profile relative to Kcnq1ot1; Xist_network_r, Pearson’s r value of transcript protein association network relative to Xist; Airn_network_r, Pearson’s r value of transcript protein association network relative to Airn; Kcnq1ot1_network_r, Pearson’s r value of transcript protein association network relative to Kcnq1ot1; total_intra, total number of intracommunity edges in transcript protein association network; rare, number of rare intracommunity edges in transcript protein association network; prev, number of prevalent intracommunity edges in transcript protein association network; rare_ratio, ratio of rare edges; mean_silhouette, mean silhouette value for communities within transcript; modularity, modularity of transcript protein association network.

https://doi.org/10.1371/journal.pgen.1012215.s012

(CSV)

S4 Table. Prevalence of connected protein pairs within communities.

Column name, column description: proset, the pair of proteins being examined; perct, the percent of networks among the set of TSC chromatin-associated transcripts in which the pair of proteins was placed within the same community; count, the number of networks among the set of TSC chromatin-associated transcripts in which the pair of proteins was placed within the same community; prev_pval, the p-value describing the likelihood that the protein pair would have been observed as prevalent by chance; prev_adjp, the prev_pval adjusted by Benjamini-Hochberg; rare_pval, the p-value describing the likelihood that the protein pair would have been observed as rare by chance; rare_adjp, the rare_pval adjusted by Benjamini-Hochberg; Xist, whether the protein pair was detected within a community within the RIP association network of Xist; Airn, whether the protein pair was detected within a community within the RIP association network of Airn; Kcnq1ot1, whether the protein pair was detected within a community within the RIP association network of Kcnq1ot1.

https://doi.org/10.1371/journal.pgen.1012215.s013

(CSV)

S5 Table. Allelic expression values of genes in Airn, Kcnq1ot1, and Xist target domains.

Column name, column description: gene, gene ID; chr, chromosome of gene; start, genomic start of gene; end, genomic end of gene; gene_class, classification of gene relative to Airn, Kcnq1ot1, or Xist target domains; [HK|HU]_[experiment_ID]_[CAST|B6]_count, counts of allele-specific reads by gene by experiment; [HK|HU]_[experiment_ID]_allelic_pct, paternal expression ratio of gene by experiment; pval_HK, p value describing difference between (-) and (+) dox conditions across HNRNPK depletion experiments; pval_HU, p value describing difference between (-) and (+) dox conditions across HNRNPU depletion experiments.

https://doi.org/10.1371/journal.pgen.1012215.s014

(CSV)

S6 Table. Antibodies, antisera, oligonucleotides, and FISH probes used in this study.

Sheet 1; RIP_antibodies_antisera. Column name, column description: Antibody/Antisera, the protein target of the Antibody/Antisera used; Company, from what entity the antibody/antisera was purchased; Product #, catalogue # of the product; ul amount, microliter amount used in assay; Type, species and monoclonal or polyclonal. Sheet 2; Western_antibodies_antisera. Column name, column description: Antibody/Antisera, the protein target of the Antibody/Antisera used; Company, from what entity the antibody/antisera was purchased; Product #, catalogue # of the product; Dilution, dilution used in assay; Type, species and monoclonal or polyclonal. Sheet 3; ChIP_antibodies_antisera. Column name, column description: Antibody/Antisera, the protein target of the Antibody/Antisera used; Company, from what entity the antibody/antisera was purchased; Product #, catalogue # of the product; ul amount, microliter amount used in assay; Type, species and monoclonal or polyclonal. Sheet 4; Oligos. Column name, column description: Oligo ID, internal lab ID of oligo; Description, secondary description from lab; Sequence, oligo sequence; Use, how oligo was used; Location in paper, location in paper. Sheet 5; Kcnq1ot1_RNA_FISH_probe_oligos. Sequence of probes used to detect Kcnq1ot1 by RNA FISH.

https://doi.org/10.1371/journal.pgen.1012215.s015

(XLSX)

Acknowledgments

We thank Daniel Dominguez for critical reading and helpful comments.

References

  1. 1. Mattick JS, Amaral PP, Carninci P, Carpenter S, Chang HY, Chen L-L, et al. Long non-coding RNAs: definitions, functions, challenges and recommendations. Nat Rev Mol Cell Biol. 2023;24(6):430–47. pmid:36596869
  2. 2. Trotman JB, Braceros KCA, Cherney RE, Murvin MM, Calabrese JM. The control of polycomb repressive complexes by long noncoding RNAs. Wiley Interdiscip Rev RNA. 2021;12(6):e1657. pmid:33861025
  3. 3. Pandey RR, Mondal T, Mohammad F, Enroth S, Redrup L, Komorowski J, et al. Kcnq1ot1 antisense noncoding RNA mediates lineage-specific transcriptional silencing through chromatin-level regulation. Mol Cell. 2008;32(2):232–46. pmid:18951091
  4. 4. Schertzer MD, Braceros KCA, Starmer J, Cherney RE, Lee DM, Salazar G, et al. lncRNA-Induced Spread of Polycomb Controlled by Genome Architecture, RNA Abundance, and CpG Island DNA. Mol Cell. 2019;75(3):523-537.e10. pmid:31256989
  5. 5. Lewis A, Green K, Dawson C, Redrup L, Huynh KD, Lee JT, et al. Epigenetic dynamics of the Kcnq1 imprinted domain in the early embryo. Development. 2006;133(21):4203–10. pmid:17021040
  6. 6. Umlauf D, Goto Y, Cao R, Cerqueira F, Wagschal A, Zhang Y, et al. Imprinting along the Kcnq1 domain on mouse chromosome 7 involves repressive histone methylation and recruitment of Polycomb group complexes. Nat Genet. 2004;36(12):1296–300. pmid:15516932
  7. 7. Andergassen D, Dotter CP, Wenzel D, Sigl V, Bammer PC, Muckenhuber M, et al. Mapping the mouse Allelome reveals tissue-specific regulation of allelic expression. Elife. 2017;6:e25125. pmid:28806168
  8. 8. Andergassen D, Muckenhuber M, Bammer PC, Kulinski TM, Theussl H-C, Shimizu T, et al. The Airn lncRNA does not require any DNA elements within its locus to silence distant imprinted genes. PLoS Genet. 2019;15(7):e1008268. pmid:31329595
  9. 9. Lewis A, Mitsuya K, Umlauf D, Smith P, Dean W, Walter J, et al. Imprinting on distal chromosome 7 in the placenta involves repressive histone methylation independent of DNA methylation. Nat Genet. 2004;36(12):1291–5. pmid:15516931
  10. 10. Nagano T, Mitchell JA, Sanz LA, Pauler FM, Ferguson-Smith AC, Feil R, et al. The Air noncoding RNA epigenetically silences transcription by targeting G9a to chromatin. Science. 2008;322(5908):1717–20. pmid:18988810
  11. 11. Mitsuya K, Meguro M, Lee MP, Katoh M, Schulz TC, Kugoh H, et al. LIT1, an imprinted antisense RNA in the human KvLQT1 locus identified by screening for differentially expressed transcripts using monochromosomal hybrids. Hum Mol Genet. 1999;8(7):1209–17. pmid:10369866
  12. 12. Smilinich NJ, Day CD, Fitzpatrick GV, Caldwell GM, Lossie AC, Cooper PR, et al. A maternally methylated CpG island in KvLQT1 is associated with an antisense paternal transcript and loss of imprinting in Beckwith-Wiedemann syndrome. Proc Natl Acad Sci U S A. 1999;96(14):8064–9. pmid:10393948
  13. 13. Mancini-Dinardo D, Steele SJS, Levorse JM, Ingram RS, Tilghman SM. Elongation of the Kcnq1ot1 transcript is required for genomic imprinting of neighboring genes. Genes Dev. 2006;20(10):1268–82. pmid:16702402
  14. 14. Seidl CIM, Stricker SH, Barlow DP. The imprinted Air ncRNA is an atypical RNAPII transcript that evades splicing and escapes nuclear export. EMBO J. 2006;25(15):3565–75. pmid:16874305
  15. 15. Trotman JB, Abrash EW, Murvin MM, Braceros AK, Li S, Boyson SP, et al. Isogenic comparison of Airn and Xist reveals core principles of Polycomb recruitment by lncRNAs. Mol Cell. 2025;85(6):1117-1133.e14. pmid:40118040
  16. 16. Braceros AK, Schertzer MD, Omer A, Trotman JB, Davis ES, Dowen JM, et al. Proximity-dependent recruitment of Polycomb repressive complexes by the lncRNA Airn. Cell Rep. 2023;42(7):112803. pmid:37436897
  17. 17. Terranova R, Yokobayashi S, Stadler MB, Otte AP, van Lohuizen M, Orkin SH, et al. Polycomb group proteins Ezh2 and Rnf2 direct genomic contraction and imprinted repression in early mouse embryos. Dev Cell. 2008;15(5):668–79. pmid:18848501
  18. 18. Sachani SS, Landschoot LS, Zhang L, White CR, MacDonald WA, Golding MC, et al. Nucleoporin 107, 62 and 153 mediate Kcnq1ot1 imprinted domain regulation in extraembryonic endoderm stem cells. Nat Commun. 2018;9(1):2795. pmid:30022050
  19. 19. Andergassen D, Smith ZD, Kretzmer H, Rinn JL, Meissner A. Diverse epigenetic mechanisms maintain parental imprints within the embryonic and extraembryonic lineages. Dev Cell. 2021;56(21):2995-3005.e4. pmid:34752748
  20. 20. Mager J, Montgomery ND, de Villena FP-M, Magnuson T. Genome imprinting regulated by the mouse Polycomb group protein Eed. Nat Genet. 2003;33(4):502–7. pmid:12627233
  21. 21. Wagschal A, Sutherland HG, Woodfine K, Henckel A, Chebli K, Schulz R, et al. G9a histone methyltransferase contributes to imprinting in the mouse placenta. Mol Cell Biol. 2008;28(3):1104–13. pmid:18039842
  22. 22. Szanto A, Aguilar R, Kesner B, Blum R, Wang D, Cifuentes-Rojas C, et al. A disproportionate impact of G9a methyltransferase deficiency on the X chromosome. Genes Dev. 2021;35(13–14):1035–54. pmid:34168040
  23. 23. Masui O, Corbel C, Nagao K, Endo TA, Kezuka F, Diabangouaya P, et al. Polycomb repressive complexes 1 and 2 are each essential for maintenance of X inactivation in extra-embryonic lineages. Nat Cell Biol. 2023;25(1):134–44. pmid:36635505
  24. 24. Żylicz JJ, Bousard A, Žumer K, Dossin F, Mohammad E, da Rocha ST, et al. The Implication of Early Chromatin Changes in X Chromosome Inactivation. Cell. 2019;176(1–2):182-197.e23. pmid:30595450
  25. 25. Almeida M, Pintacuda G, Masui O, Koseki Y, Gdula M, Cerase A, et al. PCGF3/5-PRC1 initiates Polycomb recruitment in X chromosome inactivation. Science. 2017;356(6342):1081–4. pmid:28596365
  26. 26. Quinodoz SA, Jachowicz JW, Bhat P, Ollikainen N, Banerjee AK, Goronzy IN, et al. RNA promotes the formation of spatial compartments in the nucleus. Cell. 2021;184(23):5775-5790.e30. pmid:34739832
  27. 27. Weidmann CA, Mustoe AM, Jariwala PB, Calabrese JM, Weeks KM. Analysis of RNA-protein networks with RNP-MaP defines functional hubs on RNA. Nat Biotechnol. 2021;39(3):347–56. pmid:33077962
  28. 28. Smola MJ, Christy TW, Inoue K, Nicholson CO, Friedersdorf M, Keene JD, et al. SHAPE reveals transcript-wide interactions, complex structural domains, and protein interactions across the Xist lncRNA in living cells. Proc Natl Acad Sci U S A. 2016;113(37):10322–7. pmid:27578869
  29. 29. McHugh CA, Chen C-K, Chow A, Surka CF, Tran C, McDonel P, et al. The Xist lncRNA interacts directly with SHARP to silence transcription through HDAC3. Nature. 2015;521(7551):232–6. pmid:25915022
  30. 30. Chu C, Zhang QC, da Rocha ST, Flynn RA, Bharadwaj M, Calabrese JM, et al. Systematic discovery of Xist RNA binding proteins. Cell. 2015;161(2):404–16. pmid:25843628
  31. 31. Minajigi A, Froberg J, Wei C, Sunwoo H, Kesner B, Colognori D, et al. Chromosomes. A comprehensive Xist interactome reveals cohesin repulsion and an RNA-directed chromosome conformation. Science. 2015;349(6245):10.1126/science.aab2276 aab2276. pmid:26089354
  32. 32. Tsue AF, Kania EE, Lei DQ, Fields R, McGann CD, Marciniak DM, et al. Multiomic characterization of RNA microenvironments by oligonucleotide-mediated proximity-interactome mapping. Nat Methods. 2024;21(11):2058–71. pmid:39468212
  33. 33. Yu B, Qi Y, Li R, Shi Q, Satpathy AT, Chang HY. B cell-specific XIST complex enforces X-inactivation and restrains atypical B cells. Cell. 2021;184:1790-1803 e1717.
  34. 34. Bousard A, Raposo AC, Żylicz JJ, Picard C, Pires VB, Qi Y, et al. The role of Xist-mediated Polycomb recruitment in the initiation of X-chromosome inactivation. EMBO Rep. 2019;20(10):e48019. pmid:31456285
  35. 35. Graindorge A, Pinheiro I, Nawrocka A, Mallory AC, Tsvetkov P, Gil N, et al. In-cell identification and measurement of RNA-protein interactions. Nat Commun. 2019;10(1):5317. pmid:31757954
  36. 36. Pintacuda G, Wei G, Roustan C, Kirmizitas BA, Solcan N, Cerase A, et al. hnRNPK Recruits PCGF3/5-PRC1 to the Xist RNA B-Repeat to Establish Polycomb-Mediated Chromosomal Silencing. Mol Cell. 2017;68(5):955-969.e10. pmid:29220657
  37. 37. Guo JK, Blanco MR, Walkup WG, Bonesteele G, Urbinati CR, Banerjee AK, et al. Denaturing purifications demonstrate that PRC2 and other widely reported chromatin proteins do not appear to bind directly to RNA in vivo. Mol Cell 2024;84:1271–89 e1212.
  38. Cirillo D, Blanco M, Armaos A, Buness A, Avner P, Guttman M, et al. Quantitative predictions of protein interactions with long noncoding RNAs. Nat Methods. 2016;14(1):5–6. pmid:28032625
  39. Lu Z, Guo JK, Wei Y, Dou DR, Zarnegar B, Ma Q, et al. Structural modularity of the XIST ribonucleoprotein complex. Nat Commun. 2020;11(1):6163. pmid:33268787
  40. Van Nostrand EL, Pratt GA, Shishkin AA, Gelboin-Burkhart C, Fang MY, Sundararaman B, et al. Robust transcriptome-wide discovery of RNA-binding protein binding sites with enhanced CLIP (eCLIP). Nat Methods. 2016;13(6):508–14. pmid:27018577
  41. Bramswig NC, Lüdecke H-J, Hamdan FF, Altmüller J, Beleggia F, Elcioglu NH, et al. Heterozygous HNRNPU variants cause early onset epilepsy and severe intellectual disability. Hum Genet. 2017;136(7):821–34. pmid:28393272
  42. Gillentine MA, Wang T, Hoekzema K, Rosenfeld J, Liu P, Guo H, et al. Rare deleterious mutations of HNRNP genes result in shared neurodevelopmental disorders. Genome Med. 2021;13(1):63. pmid:33874999
  43. Wang T, Hoekzema K, Vecchio D, Wu H, Sulovari A, Coe BP, et al. Author Correction: Large-scale targeted sequencing identifies risk genes for neurodevelopmental disorders. Nat Commun. 2020;11(1):5398. pmid:33087701
  44. Yates TM, Vasudevan PC, Chandler KE, Donnelly DE, Stark Z, Sadedin S, et al. De novo mutations in HNRNPU result in a neurodevelopmental syndrome. Am J Med Genet A. 2017;173(11):3003–12. pmid:28944577
  45. Lim ET, Uddin M, De Rubeis S, Chan Y, Kamumbu AS, Zhang X, et al. Rates, distribution and implications of postzygotic mosaic mutations in autism spectrum disorder. Nat Neurosci. 2017;20(9):1217–24. pmid:28714951
  46. Trotman JB, Li S, Eberhard QE, Zhang Z, Calabrese JM. Protocol for evaluating RNA-protein associations in mammalian cells with RIP-seq and RIP-qPCR. STAR Protoc. 2026;7(1):104298. pmid:41455104
  47. Hein MY, Hubner NC, Poser I, Cox J, Nagaraj N, Toyoda Y, et al. A human interactome in three quantitative dimensions organized by stoichiometries and abundances. Cell. 2015;163(3):712–23. pmid:26496610
  48. Cho NH, Cheveralls KC, Brunner A-D, Kim K, Michaelis AC, Raghavan P, et al. OpenCell: Endogenous tagging for the cartography of human cellular organization. Science. 2022;375(6585):eabi6983. pmid:35271311
  49. Jiang L, Schlesinger F, Davis CA, Zhang Y, Li R, Salit M, et al. Synthetic spike-in standards for RNA-seq experiments. Genome Res. 2011;21(9):1543–51. pmid:21816910
  50. Mardakheh FK, Shechner DM. A molecular cartographer’s toolkit for mapping RNA’s uncharted realms. Cell Rep. 2025;44(7):115877. pmid:40540402
  51. Markaki Y, Gan Chong J, Wang Y, Jacobson EC, Luong C, Tan SYX, et al. Xist nucleates local protein gradients to propagate silencing across the X chromosome. Cell. 2021;184(25):6174-6192.e32. pmid:34813726
  52. Jachowicz JW, Strehle M, Banerjee AK, Blanco MR, Thai J, Guttman M. Xist spatially amplifies SHARP/SPEN recruitment to balance chromosome-wide silencing and specificity to the X chromosome. Nat Struct Mol Biol. 2022;29(3):239–49. pmid:35301492
  53. Solomon MJ, Varshavsky A. Formaldehyde-mediated DNA-protein crosslinking: a probe for in vivo chromatin structures. Proc Natl Acad Sci U S A. 1985;82(19):6470–4. pmid:2995966
  54. Calabrese JM, Sun W, Song L, Mugford JW, Williams L, Yee D, et al. Site-specific silencing of regulatory elements as a mechanism of X inactivation. Cell. 2012;151(5):951–63. pmid:23178118
  55. Calabrese JM, Starmer J, Schertzer MD, Yee D, Magnuson T. A survey of imprinted gene expression in mouse trophoblast stem cells. G3 (Bethesda). 2015;5(5):751–9. pmid:25711832
  56. Schertzer MD, Thulson E, Braceros KCA, Lee DM, Hinkle ER, Murphy RM, et al. A piggyBac-based toolkit for inducible genome editing in mammalian cells. RNA. 2019;25(8):1047–58. pmid:31101683
  57. Trotman JB, Porrello A, Schactler SA, DeLeon LE, Eberhard QE, Boyson SP, et al. Xist Repeat A coordinates an assembly of SR proteins to recruit SPEN and induce gene silencing. bioRxiv. 2025.
  58. Cherney RE, Eberhard QE, Giri G, Mills CA, Porrello A, Zhang Z, et al. SAFB associates with nascent RNAs and can promote gene expression in mouse embryonic stem cells. RNA. 2023;29(10):1535–56. pmid:37468167
  59. Cherney RE, Mills CA, Herring LE, Braceros AK, Calabrese JM. A monoclonal antibody raised against human EZH2 cross-reacts with the RNA-binding protein SAFB. Biol Open. 2023;12(6):bio059955. pmid:37283223
  60. Mili S, Steitz JA. Evidence for reassociation of RNA-binding proteins after cell lysis: implications for the interpretation of immunoprecipitation analyses. RNA. 2004;10(11):1692–4. pmid:15388877
  61. Trotman JB, Lee DM, Cherney RE, Kim SO, Inoue K, Schertzer MD, et al. Elements at the 5’ end of Xist harbor SPEN-independent transcriptional antiterminator activity. Nucleic Acids Res. 2020;48(18):10500–17. pmid:32986830
  62. Van Nostrand EL, Freese P, Pratt GA, Wang X, Wei X, Xiao R, et al. A large-scale binding and functional map of human RNA-binding proteins. Nature. 2020;583(7818):711–9. pmid:32728246
  63. Bray NL, Pimentel H, Melsted P, Pachter L. Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol. 2016;34(5):525–7. pmid:27043002
  64. Mudge JM, Carbonell-Sala S, Diekhans M, Martinez JG, Hunt T, Jungreis I, et al. GENCODE 2025: reference gene annotation for human and mouse. Nucleic Acids Res. 2025;53(D1):D966–75. pmid:39565199
  65. Fei J, Jadaliha M, Harmon TS, Li ITS, Hua B, Hao Q, et al. Quantitative analysis of multilayer organization of proteins and RNA in nuclear speckles at super resolution. J Cell Sci. 2017;130(24):4180–92. pmid:29133588
  66. Hutchinson JN, Ensminger AW, Clemson CM, Lynch CR, Lawrence JB, Chess A. A screen for nuclear transcripts identifies two linked noncoding RNAs associated with SC35 splicing domains. BMC Genomics. 2007;8:39. pmid:17270048
  67. Hirose T, Yamazaki T, Nakagawa S. Molecular anatomy of the architectural NEAT1 noncoding RNA: The domains, interactors, and biogenesis pathway required to build phase-separated nuclear paraspeckles. Wiley Interdiscip Rev RNA. 2019;10(6):e1545. pmid:31044562
  68. Hung KL, Yost KE, Xie L, Shi Q, Helmsauer K, Luebeck J, et al. ecDNA hubs drive cooperative intermolecular oncogene expression. Nature. 2021;600(7890):731–6. pmid:34819668
  69. Liang Z, Gilbreath C, Liu W, Wang Y, Zhang MQ, Zhang DE, et al. Chromatin-associated RNA Dictates the ecDNA Interactome in the Nucleus. bioRxiv. 2023.
  70. Nichols A, Choi Y, Norman RX, Chen Y, Striepen J, Salataj E, et al. Chromosomal tethering and mitotic transcription promote ecDNA nuclear inheritance. Mol Cell. 2025;85(15):2839-2853.e8. pmid:40614723
  71. Werner MS, Sullivan MA, Shah RN, Nadadur RD, Grzybowski AT, Galat V, et al. Chromatin-enriched lncRNAs can act as cell-type specific activators of proximal gene transcription. Nat Struct Mol Biol. 2017;24(7):596–603. pmid:28628087
  72. Tseng Y-Y, Moriarity BS, Gong W, Akiyama R, Tiwari A, Kawakami H, et al. PVT1 dependence in cancer with MYC copy-number increase. Nature. 2014;512(7512):82–6. pmid:25043044
  73. Cho SW, Xu J, Sun R, Mumbach MR, Carter AC, Chen YG, et al. Promoter of lncRNA Gene PVT1 Is a Tumor-Suppressor DNA Boundary Element. Cell. 2018;173(6):1398-1412.e22. pmid:29731168
  74. Li Q, Dilsavor C, Gahramanov V, Floyd E, Ding J, Glynn JJ, et al. Long noncoding RNA-dependent control of Myc transcriptional bursting. Cell Rep. 2025;44(10):116439. pmid:41105511
  75. Li Q, Olivero CE, Floyd E, Ding J, Dangelmaier E, Knight J, et al. Activation of Pvt1b isoform contributes to local Pvt1 abundance to repress Myc during stress. PLoS Genet. 2025;21(7):e1011790. pmid:40743223
  76. Olivero CE, Martínez-Terroba E, Zimmer J, Liao C, Tesfaye E, Hooshdaran N, et al. p53 Activates the Long Noncoding RNA Pvt1b to Inhibit Myc and Suppress Tumorigenesis. Mol Cell. 2020;77(4):761-774.e8. pmid:31973890
  77. Gayen S, Maclary E, Buttigieg E, Hinten M, Kalantry S. A Primary Role for the Tsix lncRNA in Maintaining Random X-Chromosome Inactivation. Cell Rep. 2015;11(8):1251–65. pmid:25981039
  78. Farhadova S, Ghousein A, Charon F, Surcis C, Gomez-Velazques M, Roidor C, et al. The long non-coding RNA Meg3 mediates imprinted gene expression during stem cell differentiation. Nucleic Acids Res. 2024;52(11):6183–200. pmid:38613389
  79. Sanli I, Lalevée S, Cammisa M, Perrin A, Rage F, Llères D, et al. Meg3 Non-coding RNA Expression Controls Imprinting by Preventing Transcriptional Upregulation in cis. Cell Rep. 2018;23(2):337–48. pmid:29641995
  80. Pandya-Jones A, Markaki Y, Serizay J, Chitiashvili T, Mancia Leon WR, Damianov A, et al. A protein assembly mediates Xist localization and gene silencing. Nature. 2020;587(7832):145–51. pmid:32908311
  81. Traag VA, Waltman L, van Eck NJ. From Louvain to Leiden: guaranteeing well-connected communities. Sci Rep. 2019;9(1):5233. pmid:30914743
  82. Rousseeuw PJ. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math. 1987;20:53–65.
  83. Kolpa HJ, Creamer KM, Hall LL, Lawrence JB. SAF-A mutants disrupt chromatin structure through dominant negative effects on RNAs associated with chromatin. Mamm Genome. 2022;33(2):366–81. pmid:34859278
  84. Creamer KM, Kolpa HJ, Lawrence JB. Nascent RNA scaffolds contribute to chromosome territory architecture and counter chromatin compaction. Mol Cell. 2021;81(17):3509-3525.e5. pmid:34320406
  85. Sakaguchi T, Hasegawa Y, Brockdorff N, Tsutsui K, Tsutsui KM, Sado T, et al. Control of Chromosomal Localization of Xist by hnRNP U Family Molecules. Dev Cell. 2016;39(1):11–2. pmid:27728779
  86. Hasegawa Y, Brockdorff N, Kawano S, Tsutui K, Tsutui K, Nakagawa S. The matrix protein hnRNP U is required for chromosomal localization of Xist RNA. Dev Cell. 2010;19(3):469–76. pmid:20833368
  87. Kolpa HJ, Fackelmayer FO, Lawrence JB. SAF-A Requirement in Anchoring XIST RNA to Chromatin Varies in Transformed and Primary Cells. Dev Cell. 2016;39(1):9–10. pmid:27728783
  88. Sharp JA, Sparago E, Thomas R, Alimenti K, Wang W, Blower MD. Role of the SAF-A/HNRNPU SAP domain in X chromosome inactivation, nuclear dynamics, transcription, splicing, and cell proliferation. PLoS Genet. 2025;21(6):e1011719. pmid:40493679
  89. Thayer M, Heskett MB, Smith LG, Spellman PT, Yates PA. ASAR lncRNAs control DNA replication timing through interactions with multiple hnRNP/RNA binding proteins. Elife. 2024;13:RP95898. pmid:38896448
  90. Zhou H, Stein CB, Shafiq TA, Shipkovenska G, Kalocsay M, Paulo JA, et al. Rixosomal RNA degradation contributes to silencing of Polycomb target genes. Nature. 2022;604(7904):167–74. pmid:35355014
  91. Mattout A, Gaidatzis D, Padeken J, Schmid CD, Aeschimann F, Kalck V, et al. LSM2-8 and XRN-2 contribute to the silencing of H3K27me3-marked genes through targeted RNA decay. Nat Cell Biol. 2020;22(5):579–90. pmid:32251399
  92. Dou X, Xiao Y, Shen C, Wang K, Wu T, Liu C, et al. RBFOX2 recognizes N6-methyladenosine to suppress transcription and block myeloid leukaemia differentiation. Nat Cell Biol. 2023;25(9):1359–68. pmid:37640841
  93. Patil DP, Chen C-K, Pickering BF, Chow A, Jackson C, Guttman M, et al. m(6)A RNA methylation promotes XIST-mediated transcriptional repression. Nature. 2016;537(7620):369–73. pmid:27602518
  94. Wei G, Coker H, Rodermund L, Almeida M, Roach HL, Nesterova TB, et al. m6A and the NEXT complex direct Xist RNA turnover and X-inactivation dynamics. Nat Struct Mol Biol. 2025;32(11):2242–51. pmid:40926104
  95. Nozawa R-S, Boteva L, Soares DC, Naughton C, Dun AR, Buckle A, et al. SAF-A Regulates Interphase Chromosome Structure through Oligomerization with Chromatin-Associated RNAs. Cell. 2017;169(7):1214-1227.e18. pmid:28622508
  96. Raab JR, Smith KN, Spear CC, Manner CJ, Calabrese JM, Magnuson T. SWI/SNF remains localized to chromatin in the presence of SCHLAP1. Nat Genet. 2019;51(1):26–9. pmid:30510238
  97. Saha D, Animireddy S, Lee J, Thommen A, Murvin MM, Lu Y, et al. Enhancer switching in cell lineage priming is linked to eRNA, Brg1’s AT-hook, and SWI/SNF recruitment. Mol Cell. 2024;84(10):1855-1869.e5. pmid:38593804
  98. G Hendrickson D, Kelley DR, Tenen D, Bernstein B, Rinn JL. Widespread RNA binding by chromatin-associated proteins. Genome Biol. 2016;17:28. pmid:26883116
  99. Cheng Q-X, Xie G, Zhang X, Wang J, Ding S, Wu Y-X, et al. Co-profiling of in situ RNA-protein interactions and transcriptome in single cells and tissues. Nat Methods. 2025;22(9):1824–35. pmid:40784921
  100. Xiang JS, Schafer DM, Rothamel KL, Yeo GW. Decoding protein-RNA interactions using CLIP-based methodologies. Nat Rev Genet. 2024;25(12):879–95. pmid:38982239
  101. Cech TR, Davidovich C, Jenner RG. PRC2-RNA interactions: viewpoint from Tom Cech, Chen Davidovich, and Richard Jenner. Mol Cell. 2024;84:3593–5.
  102. Seczynska M, Bloor S, Cuesta SM, Lehner PJ. Genome surveillance by HUSH-mediated silencing of intronless mobile elements. Nature. 2022;601(7893):440–5. pmid:34794168
  103. Brown CJ, Ballabio A, Rupert JL, Lafreniere RG, Grompe M, Tonlorenzi R, et al. A gene from the region of the human X inactivation centre is expressed exclusively from the inactive X chromosome. Nature. 1991;349(6304):38–44. pmid:1985261
  104. Brockdorff N, Ashworth A, Kay GF, Cooper P, Smith S, McCabe VM, et al. Conservation of position and exclusive expression of mouse Xist from the inactive X chromosome. Nature. 1991;351(6324):329–31. pmid:2034279
  105. Merens HE, Choquet K, Baxter-Koenigs AR, Churchman LS. Timing is everything: advances in quantifying splicing kinetics. Trends Cell Biol. 2024;34(11):968–81. pmid:38777664
  106. Mukherjee N, Calviello L, Hirsekorn A, de Pretis S, Pelizzola M, Ohler U. Integrative classification of human coding and noncoding genes through RNA metabolism profiles. Nat Struct Mol Biol. 2017;24(1):86–96. pmid:27870833
  107. Reimer KA, Mimoso CA, Adelman K, Neugebauer KM. Co-transcriptional splicing regulates 3’ end cleavage during mammalian erythropoiesis. Mol Cell. 2021;81(5):998-1012.e7. pmid:33440169
  108. Ntini E, Budach S, Vang Ørom UA, Marsico A. Genome-wide measurement of RNA dissociation from chromatin classifies transcripts by their dynamics and reveals rapid dissociation of enhancer lncRNAs. Cell Syst. 2023;14(10):906-922.e6. pmid:37857083
  109. Ietswaart R, Smalec BM, Xu A, Choquet K, McShane E, Jowhar ZM, et al. Genome-wide quantification of RNA flow across subcellular compartments reveals determinants of the mammalian transcript life cycle. Mol Cell. 2024;84(14):2765-2784.e16. pmid:38964322
  111. Yeom K-H, Pan Z, Lin C-H, Lim HY, Xiao W, Xing Y, et al. Tracking pre-mRNA maturation across subcellular compartments identifies developmental gene regulation through intron retention and nuclear anchoring. Genome Res. 2021;31(6):1106–19. pmid:33832989
  112. Choquet K, Patop IL, Churchman LS. The regulation and function of post-transcriptional RNA splicing. Nat Rev Genet. 2025;26(6):378–94. pmid:40217094
  113. Bedi K, Magnuson B, Narayanan IV, Paulsen MT, Wilson TE, Ljungman M. Cotranscriptional splicing efficiencies differ within genes and between cell types. RNA. 2021;27(7):829–40. pmid:33975916
  114. Neugebauer KM. Nascent RNA and the Coordination of Splicing with Transcription. Cold Spring Harb Perspect Biol. 2019;11(8):a032227. pmid:31371351
  115. Kania EE, Fenix A, Marciniak DM, Lin Q, Bianchi S, Hristov B, et al. Nascent transcript O-MAP reveals the molecular architecture of a single-locus subnuclear compartment built by RBM20 and the TTN RNA. bioRxiv. 2024.
  116. Quinn J, Kunath T, Rossant J. Mouse trophoblast stem cells. Methods Mol Med. 2006;121:125–48. pmid:16251740
  117. Kirk JM, Kim SO, Inoue K, Smola MJ, Lee DM, Schertzer MD, et al. Functional classification of long non-coding RNAs by k-mer content. Nat Genet. 2018;50(10):1474–82. pmid:30224646
  118. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29(1):15–21. pmid:23104886
  119. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078–9.
  120. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9(9):R137. pmid:18798982
  121. Liao Y, Smyth GK, Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014;30(7):923–30. pmid:24227677
  122. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841–2. pmid:20110278
  123. Bailey TL. STREME: accurate and versatile sequence motif discovery. Bioinformatics. 2021;37(18):2834–40. pmid:33760053
  124. Gupta S, Stamatoyannopoulos JA, Bailey TL, Noble WS. Quantifying similarity between motifs. Genome Biol. 2007;8(2):R24. pmid:17324271
  125. Li S, Eberhard QE, Ni L, Calabrese JM. Improved functions for nonlinear sequence comparison using SEEKR. RNA. 2024;30(11):1408–21. pmid:39187382
  126. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25. pmid:19261174
  127. Kramer NE, Davis ES, Wenger CD, Deoudes EM, Parker SM, Love MI, et al. Plotgardener: cultivating precise multi-panel figures in R. Bioinformatics. 2022;38(7):2042–5. pmid:35134826
  128. Schertzer MD, Murvin MM, Calabrese JM. Using RNA Sequencing and Spike-in RNAs to Measure Intracellular Abundance of lncRNAs and mRNAs. Bio Protoc. 2020;10(19):e3772. pmid:33204768
  129. Frankish A, Carbonell-Sala S, Diekhans M, Jungreis I, Loveland JE, Mudge JM, et al. GENCODE: reference annotation for the human and mouse genomes in 2023. Nucleic Acids Res. 2023;51(D1):D942–9. pmid:36420896
  130. Engreitz JM, Haines JE, Perez EM, Munson G, Chen J, Kane M, et al. Local regulation of gene expression by lncRNA promoters, transcription and splicing. Nature. 2016;539(7629):452–5. pmid:27783602
  131. Csardi G, Nepusz T, Muller K, Horvat S, Traag V, Zanini F, et al. igraph for R: R interface of the igraph library for graph theory and network analysis. 2025.
  132. Maechler M, Rousseeuw P, Struyf A, Hubert M, Hornik K. Cluster Analysis Basics and Extensions. 2025.
  133. Hong Y, R Core Team. Poibin: The Poisson Binomial Distribution. 2024. https://CRAN.R-project.org/package=poibin
  134. Hu B, Petela N, Kurze A, Chan K-L, Chapard C, Nasmyth K. Biological chromodynamics: a general method for measuring protein occupancy across the genome by calibrating ChIP-seq. Nucleic Acids Res. 2015;43(20):e132. pmid:26130708
  135. Keane TM, Goodstadt L, Danecek P, White MA, Wong K, Yalcin B, et al. Mouse genomic variation and its effect on phenotypes and gene regulation. Nature. 2011;477(7364):289–94. pmid:21921910
  136. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357–9. pmid:22388286
  137. Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag.
  138. Perez G, Barber GP, Benet-Pages A, Casper J, Clawson H, Diekhans M, et al. The UCSC Genome Browser database: 2025 update. Nucleic Acids Res. 2025;53(D1):D1243–9. pmid:39460617
  139. Ray D, Kazan H, Cook KB, Weirauch MT, Najafabadi HS, Li X, et al. A compendium of RNA-binding motifs for decoding gene regulation. Nature. 2013;499(7457):172–7. pmid:23846655
  140. Feng H, Bao S, Rahman MA, Weyn-Vanhentenryck SM, Khan A, Wong J, et al. Modeling RNA-Binding Protein Specificity In Vivo by Precisely Registering Protein-RNA Crosslink Sites. Mol Cell. 2019;74:1189-1204.e6.
  141. Martin G, Gruber AR, Keller W, Zavolan M. Genome-wide analysis of pre-mRNA 3’ end processing reveals a decisive role of human cleavage factor I in the regulation of 3’ UTR length. Cell Rep. 2012;1(6):753–63. pmid:22813749
  142. Viphakone N, Sudbery I, Griffith L, Heath CG, Sims D, Wilson SA. Co-transcriptional Loading of RNA Export Factors Shapes the Human Transcriptome. Mol Cell. 2019;75(2):310-323.e8. pmid:31104896