Parent of origin gene expression in a founder population identifies two new imprinted genes at known imprinted regions

Genomic imprinting is the phenomenon that leads to silencing of one copy of a gene inherited from a specific parent. Mutations in imprinted regions have been implicated in diseases showing parent of origin effects. Identifying genes with evidence of parent of origin expression patterns in family studies allows the detection of more subtle imprinting. Here, we use allele specific expression in lymphoblastoid cell lines from 306 Hutterites related in a single pedigree to provide formal evidence for parent of origin effects. We take advantage of phased genotype data to assign parent of origin to RNA-seq reads in individuals with gene expression data. Our approach identified known imprinted genes, two putative novel imprinted genes, and 14 genes with asymmetrical parent of origin gene expression. We used gene expression in peripheral blood leukocytes (PBLs) to validate our findings, and then confirmed imprinting control regions (ICRs) using DNA methylation levels in the PBLs.

Author Summary

Large scale gene expression studies have identified known and novel imprinted genes through allele specific expression without knowing the parental origins of each allele. Here, we take advantage of phased genotype data to assign parent of origin to RNA-seq reads in 306 individuals with gene expression data. We identified known imprinted genes as well as two novel imprinted genes in lymphoblastoid cell line gene expression. We used gene expression in PBLs to validate our findings, and DNA methylation levels in PBLs to confirm previously characterized imprinting control regions that could regulate these imprinted genes.


Imprinted genes have one allele silenced in a parent of origin specific manner. In humans, approximately 105 imprinted loci have been identified, many of which play important roles in [...] across individuals as evidence for imprinting, they identified 42 imprinted genes, both known and novel, and used family studies to confirm imprinting of 5 novel imprinted genes [14]. Santoni et al. identified nine novel imprinted genes using single-cell allele-specific gene expression and identifying genes with mono-allelic expression in fibroblasts from 3 unrelated individuals and probands of 2 family trios, and then used the trios to confirm parent of origin of the alleles [15].

Here, we perform a parent of origin ASE study in a large pedigree to characterize parent of origin specific gene expression in the Hutterites, a founder population of European descent, for which we have phased genotype data [16]. We use RNA-seq from lymphoblastoid cell lines (LCLs) to map transcripts to parental haplotypes and identify known and two not previously

Mapping transcripts to parental haplotypes
For each of 306 individuals, the total number of transcripts at each gene was assigned as maternally inherited, paternally inherited, or unknown parent of origin. The last group included transcripts without heterozygous SNPs or transcripts with SNPs without parent of origin information. Transcripts were assigned to the parentally inherited categories by matching the SNP alleles in the reads to the known maternally or paternally inherited alleles. All the genes analyzed had some transcripts of unknown origin (average 97.8%, range 8.3-100%). For each gene we assigned parental origin to an average of 1.8% of transcripts (range: 0-34.7%), and for each individual we assigned parental origin to an average of 1.4% of transcripts (range: 0-1.7%). On average, about 40 SNPs per gene were used to assign the transcripts of a gene to a parent (range 1-1839 SNPs).

[...] only expressed on the maternally-inherited allele in at least one individual and not on the paternally inherited allele in any individuals (S1 Table).
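The assignment rule described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (which uses a custom script modified from WASP); the function name `assign_parent`, the data layout, and the choice to label reads with conflicting evidence as unknown are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layout: phased genotype per SNP as (maternal_allele, paternal_allele);
# None means no parent-of-origin information is available for that SNP.
PhasedGenotype = Optional[Tuple[str, str]]


def assign_parent(read_alleles: Dict[str, str],
                  phased: Dict[str, PhasedGenotype]) -> str:
    """Classify one read as 'maternal', 'paternal', or 'unknown'."""
    evidence = set()
    for snp_id, allele in read_alleles.items():
        genotype = phased.get(snp_id)
        if genotype is None:
            continue  # SNP without parent-of-origin information
        maternal, paternal = genotype
        if maternal == paternal:
            continue  # homozygous site: cannot distinguish the parents
        if allele == maternal:
            evidence.add("maternal")
        elif allele == paternal:
            evidence.add("paternal")
    # Assumption: a read is assigned only when all informative SNPs agree.
    return evidence.pop() if len(evidence) == 1 else "unknown"
```

For example, a read carrying the maternal allele at a heterozygous, phased SNP is counted as maternal, while a read that overlaps only homozygous or unphased SNPs falls into the unknown category that dominates the totals reported above.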

Imprinted Genes in Lymphoblastoid Cell Lines (LCLs)
Among the 139 genes with only paternally inherited expression or only maternally inherited expression, there are three known imprinted genes (CDKN1C, NDN, SNRPN) and one previously predicted to be imprinted (IFITM1) [17]. CDKN1C showed patterns opposite of what has been reported [18,19], which could be due to the small sample (only three individuals showed expression from one parent) or to the different cell types used here (LCLs) and in previous studies (developing brain and embryonal tumors for CDKN1C).

We expect some imprinted genes to have 'leaky' expression, such that there is some expression from the parental chromosome that is mostly silenced. To detect these genes, we used a binomial test to find patterns of gene expression asymmetry by parental transcript levels. This analysis identified 28 genes with an FDR <5% ( [...] Fig 1A and 1B, respectively. We identified two additional genes that showed asymmetry in parental expression from mostly one parent (PXDC1, PWAR6), which we consider potentially new imprinted genes. The remaining fourteen genes showed significant patterns of asymmetry but had expression from both maternal and paternal chromosomes. These genes are likely not imprinted but could have asymmetry in expression due to an expression quantitative trait locus (eQTL).
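As a rough illustration of the asymmetry test described above, the following Python sketch runs an exact two-sided binomial test per gene against the null of equal maternal and paternal expression, then applies a Benjamini-Hochberg adjustment. This is a stand-in under stated assumptions: the study reports a binomial Z-score whose exact form is not given here, and the pooling of read counts per gene, the function names, and the choice of Benjamini-Hochberg as the FDR procedure are all hypothetical.

```python
from math import comb
from typing import List


def binomial_two_sided(maternal: int, paternal: int) -> float:
    """Exact two-sided binomial p-value for H0: P(maternal read) = 0.5."""
    n = maternal + paternal
    if n == 0:
        return 1.0  # no informative reads, no evidence of asymmetry
    k = min(maternal, paternal)
    # With p = 0.5 the distribution is symmetric, so double the lower tail.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * lower_tail)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted
```

Genes whose adjusted p-value falls below 0.05 would correspond to the FDR <5% threshold used in the text. A significant result alone does not distinguish imprinting from an eQTL effect, which is why the direction and near-exclusivity of parental expression are examined separately.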

Two genes showed gene expression signatures consistent with imprinting but have not previously been recognized as imprinted genes. The first potentially new imprinted gene is PXDC1, which is in the same region and next to (<100kb) a known imprinted gene, FAM50B. The second potentially novel imprinted gene is PWAR6, or Prader Willi Angelman Region RNA6, a gene encoding a regulatory class of RNA. Although this gene is located within the intron of a known imprinted gene, SNHG14, this noncoding RNA has not previously been recognized as having parent of origin specific expression (Fig 1C).

The remaining fourteen genes show significant asymmetry using the binomial test but do not have expression from mostly one parental chromosome. One of these genes, SNHG17, is a noncoding RNA. Another gene with parent of origin asymmetry, ZNF813, is next to a known [...] the remaining genes that show parent of origin asymmetry but not with a pattern consistent with imprinting (S1 Figure).

Validation of Imprinted Genes in PBLs
Using the same methods described above, we assigned parent of origin to transcripts in PBLs from 99 Hutterite individuals not included in the LCL studies. Maternal and paternal expression in PBLs for all 28 genes identified in LCLs showed similar trends of asymmetry as in LCLs (Fig 2).

Methylation at Imprinting Control Regions.
One of the mechanisms underlying parent of origin effects on expression at imprinted loci is differential methylation at cis-acting imprinting control regions (ICRs). DNA methylation from the Illumina HumanMethylation 450K array was available in PBLs from the same individuals included in the validation study described above. To determine the expected patterns of methylation at known imprinted loci, we first looked at previously characterized methylated [...]

The two potentially novel imprinted genes identified in this study, PXDC1 and PWAR6, lie in or near known imprinted regions that contain previously characterized ICRs. These previously characterized ICRs show about 50% methylation (beta value between 0.25 and 0.75) in our DNA methylation data, which likely reflects methylation at only one parental chromosome in all the cells in the sample. Methylation patterns in PBLs at these two ICRs fall within this hemi-methylation range, further suggesting that these two genes are indeed imprinted (Fig 3).

[...] We also characterized methylation patterns near genes showing asymmetry. Using results from studies that had previously characterized ICRs in patients with uniparental disomy at many imprinted regions [31,32], we estimated regions for defining hemi-methylation near the genes identified in our study. Using this approach, we were able to provide additional data supporting the two potentially new imprinted genes as true imprinted genes regulated by previously characterized ICRs.

Although our study is the largest pedigree-based study to date to search genome-wide for imprinted genes, it has limitations. First, we were able to determine the parent of origin for many transcripts in the Hutterites, but we could not assign every RNA sequencing read to a parent due to lack of heterozygous sites or missing parent of origin information for alleles. Second, we conducted these studies in lymphoblastoid cell lines, and therefore could only study genes imprinted in this cell type and would miss the many imprinted genes that are tissue-specific and/or developmentally regulated [33]. Third, while we can verify previously characterized ICRs, our study is not designed to identify novel ICRs because DNA methylation values from an array cannot be assigned to parental haplotype. Lastly, although we characterized the gene expression and methylation patterns for two potentially novel imprinted genes, replication of these genes in

RNA-seq in Lymphoblastoid Cell Lines (LCLs).
RNA-seq was performed in LCLs as previously described [34]. For this study, sequencing reads were reprocessed as follows. Reads were trimmed for adaptors using Cutadapt (reads shorter than 5 bp were discarded) and then remapped to hg19 using STAR indexed with GENCODE version 19 gene annotations [35,36]. To remove mapping bias, reads were processed and duplicate reads removed using WASP [37]. We used a custom script modified from WASP to separate reads that overlap maternal alleles or paternal alleles. Reads without informative SNPs (homozygous, or no parent of origin information) were categorized as unknown, so that the unknown, maternal, and paternal categories together make up the total gene expression. Gene counts were quantified using STAR for each category. VerifyBamID was used to identify sample swaps [38]. Genes mapping to the X and Y chromosomes were removed; genes with a log-transformed CPM value less than 1 in fewer than 20 individuals were also removed.
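The expression filter at the end of this paragraph can be sketched in Python as follows. The filter as worded is ambiguous, so the sketch assumes one reading: a gene is kept when its log2 CPM is at least 1 in at least 20 individuals. The function names, the log base, and the pseudocount of 1 are all assumptions for illustration and are not stated in the text.

```python
import math
from typing import List


def log2_cpm(count: float, library_size: float, pseudocount: float = 1.0) -> float:
    """Log2 counts per million; the pseudocount (an assumption) avoids log(0)."""
    return math.log2(count / library_size * 1e6 + pseudocount)


def keep_gene(counts: List[float],
              library_sizes: List[float],
              min_log_cpm: float = 1.0,
              min_individuals: int = 20) -> bool:
    """Assumed reading: keep a gene expressed (log2 CPM >= 1) in >= 20 individuals."""
    expressed = sum(
        log2_cpm(c, n) >= min_log_cpm
        for c, n in zip(counts, library_sizes)
    )
    return expressed >= min_individuals
```

Under this reading, a gene with appreciable counts in only a handful of individuals is dropped, which limits the number of binomial tests run on genes with too little data to show asymmetry.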

RNA-seq in Peripheral Blood Leukocytes (PBLs)
RNA-seq was performed in whole blood as previously described [39]. For this study, sequencing reads were reprocessed as described above for the studies in LCLs. For these analyses, we excluded 32 individuals who were also in the LCL study.

Identifying Imprinted Genes
We used a binomial test to detect asymmetry in parent of origin gene expression. Using the paternally and maternally assigned reads, we generated a binomial Z-score for each [...]

Genome-wide parent-of-origin DNA methylation analysis reveals the intricacies of human imprinting and suggests a germline methylation-independent mechanism of establishment.

[Figure axis labels: mean methylation at CpGs; mean methylation across individuals (beta)]