A variably imprinted epiallele impacts seed development

The contribution of epigenetic variation to phenotypic variation is unclear. Imprinted genes, because of their strong association with epigenetic modifications, represent an opportunity for the discovery of such phenomena. In mammals and flowering plants, a subset of genes is expressed from only one parental allele in a process called gene imprinting. Imprinting is associated with differential DNA methylation and chromatin modifications between parental alleles. In flowering plants, imprinting occurs in a seed tissue, the endosperm. Proper endosperm development is essential for the production of viable seeds. We previously showed that in Arabidopsis thaliana intraspecific imprinting variation is correlated with naturally occurring DNA methylation polymorphisms. Here, we investigated the mechanisms and function of allele-specific imprinting of the class IV homeodomain leucine zipper (HD-ZIP) transcription factor HDG3. In imprinted strains, HDG3 is expressed primarily from the methylated paternally inherited allele. We manipulated the methylation state of endogenous HDG3 in a non-imprinted strain and demonstrated that methylation of a proximal transposable element is sufficient to promote HDG3 expression and imprinting. Gain of HDG3 imprinting was associated with earlier endosperm cellularization and changes in seed weight. These results indicate that epigenetic variation alone is sufficient to explain imprinting variation and demonstrate that epialleles can underlie variation in seed development phenotypes.



The contribution of genetic variation to phenotypic variation is well-established. By contrast, it is unknown how frequently epigenetic variation causes differences in organismal phenotypes. Epigenetic information is closely associated with but not encoded in the DNA sequence. In practice, it is challenging to disentangle genetic variation from epigenetic variation, as what appears to be epigenetic variation might have an underlying genetic basis. DNA methylation is one form of epigenetic information. HDG3 encodes an endosperm-specific transcription factor that exists in two states in A. thaliana natural populations: methylated and expressed, or hypomethylated and repressed. We show that pure epigenetic variation is sufficient to explain expression variation of HDG3: a naturally lowly expressed allele can be switched to a higher

Introduction
DNA methylation is a heritable epigenetic mark that can, on occasion, affect gene transcription and influence development. DNA methylation is a particularly influential regulator of gene expression in endosperm, a triploid extraembryonic seed tissue that supports embryo development. In endosperm, developmentally programmed DNA demethylation causes maternally inherited endosperm genomes to be hypomethylated compared to the paternally inherited endosperm genome (1-3). Methylation differences between maternal and paternal alleles identify their parent-of-origin and establish imprinting, an epigenetic phenomenon in which a gene is expressed primarily from one parental allele (4). Imprinting is theorized to have evolved because of conflict between maternally and paternally inherited alleles in offspring over the extent of maternal investment (5,6). Under the kinship theory, silencing of the maternally inherited allele and expression of the paternally inherited allele is predicted to ultimately result for genes where the paternally inherited allele's optimum expression level in offspring is higher than the maternally inherited allele's (7). Comparison of imprinting between species in the Arabidopsis genus has provided empirical support for this hypothesis (8,9).
Recent genomic approaches have revealed extensive natural DNA methylation variation within Arabidopsis thaliana (10,11). Whereas the contribution of genetic variation to phenotypic diversity is well-established, the impact of epigenetic variation, or epialleles, on phenotype is only beginning to be understood (12,13). Processes affected by epialleles include patterns of floral development, sex determination, fruit ripening and nutritional content, and senescence, among others (14-19). We previously demonstrated that natural variation in DNA methylation is associated with imprinting variation, with as many as 10% of imprinted genes estimated to be variably imprinted within A. thaliana and maize (20,21). Seed development varies extensively among Arabidopsis accessions and has previously been shown to be influenced by parent-of-origin effects (20,22), thus raising the possibility that variation in imprinting could influence seed phenotypes. One

The activity of HDG3 alleles is correlated with DNA methylation. In endosperm of imprinted strains, the highly expressed paternal HDG3 allele is methylated and the lowly expressed maternal allele is hypomethylated over a Helitron TE sequence 5' of the transcriptional start site (2). Maternally inherited endosperm alleles are demethylated by the 5-methylcytosine DNA glycosylase DME; in dme mutants, maternal alleles retain their methylation and are also expressed (2,29). Of 927 Arabidopsis accessions with sufficient methylation data (11), 32 (3.5%) have no methylation in the HDG3 5' region and 871 (94%) have greater than 50% methylation. When strains where HDG3 methylation is low, such as Cvi or Kz_9, are the paternal parent in crosses with Col, there is no methylation difference between maternal and paternal alleles in endosperm and HDG3 is biallelically expressed (20). Together, these data suggest that (1) DNA demethylation promotes repression of the maternally inherited HDG3 allele whereas DNA methylation promotes expression (or inhibits repression) of the paternal HDG3 allele and that (2) imprinting variation is due to cis epigenetic variation at HDG3 (20). However, a cis or trans genetic contribution to imprinting variation cannot be excluded because of DNA sequence polymorphisms between the strains and alleles that do and do not exhibit imprinting.
Here, we show that a naturally occurring epiallele can contribute to variation in seed phenotypes in Arabidopsis. We tested whether cis epigenetic variation is sufficient to explain imprinting variation by generating a methylated HDG3 Cvi allele that mimicked a methylated HDG3 Col allele. We found that the HDG3 Cvi allele switched from a hypomethylated, non-imprinted, repressed state to an imprinted, paternally biased, expressed state. Additionally, gain of HDG3 imprinting altered endosperm development and final seed size. These data indicate that naturally occurring epialleles can have phenotypic consequences in endosperm, a tissue where methylation is dynamic as a programmed part of development.

Natural variation in HDG3 imprinting is associated with gene expression differences
We previously showed that several genes that are imprinted in endosperm when Col is the paternal parent are not imprinted when Cvi is the paternal parent (20). To further examine naturally occurring endosperm gene expression variation, we sequenced the transcriptomes of endosperm from Col x Col and Col x Cvi F1 seeds. Comparison of these transcriptomes identified 957 genes whose expression was at least two-fold higher in Col x Col endosperm and 1187 whose expression was at least two-fold higher in Col x Cvi endosperm (Fig 1A; S1 Table). The gene with the lowest expression in Col x Cvi relative to Col x Col is HDG3, which is expressed 64-fold lower in Col x Cvi endosperm (Fig 1A).

We previously reported that HDG3 is a paternally expressed imprinted gene (PEG) in Cvi x Col crosses but is biallelically expressed in Col x Cvi (20). To further explore the expression variation of HDG3, we performed in situ hybridization on developing seeds (Fig 1B-C; S1 Fig). In Col x Col seeds, HDG3 is expressed specifically in the micropylar, peripheral, and chalazal endosperm, with the highest expression at the heart stage of development (Fig 1C). The same pattern was observed in Cvi x Col (Fig 1C). Whereas HDG3 expression was detected by in situ hybridization in F1 endosperm when Col was the paternal parent, it was not detected in endosperm when Cvi was the paternal parent (Fig 1C). Additionally,
Expression in Col x Col and Cvi x Col was approximately 10-fold higher than in Cvi x Cvi or Col x Cvi, indicating that HDG3 expression is higher when it is imprinted (Fig 1D), consistent with the mRNA-seq (Fig 1A) and in situ data (Fig 1C). Thus, although expression is from both maternally and paternally inherited alleles in Col x Cvi crosses (and presumably Cvi x Cvi crosses) as detected by mRNA-seq (20), the total expression in those crosses is lower than when HDG3 is imprinted. As we previously showed that the Cvi allele is naturally hypomethylated (20), together these results suggest that DNA methylation of the HDG3 5' region promotes HDG3 expression (Fig 1E).
There is also evidence for imprinting variation of HDG3 in other species. In Arabidopsis lyrata, expression of HDG3 is also specific to the endosperm but levels

Reduced HDG3 expression affects seed development
To examine whether HDG3 influences endosperm development, we compared seeds from hdg3 mutant plants and segregating wild-type siblings in the Col background. We confirmed predominantly paternal expression of HDG3 (2,20) by reciprocal crosses between wild type and hdg3-1 mutants (Fig 2A). When hdg3 was crossed as a female to a wild-type sibling male, expression of HDG3 was detected in endosperm in a similar manner as in Col x Col (Fig 2A). In contrast, when wild-type females were crossed to hdg3-1 mutant males, the accumulation of HDG3 transcript in endosperm was dramatically affected, with no transcript detected in most cases, despite the presence of a wild-type maternally inherited allele (Fig 2A). We assessed embryo stage and the extent of endosperm cellularization for sectioned wild-type and hdg3 seeds at 5 days after pollination. Embryo development was more variable in hdg3, although this difference was not statistically significant, but endosperm cellularization was significantly delayed compared to wild-type seeds (Fig 2B, S2 and S3 Table). Reciprocal crosses between wild-type and hdg3 mutant plants indicated that the endosperm cellularization phenotype was dependent on paternal genotype, consistent with HDG3 function being primarily supplied from the paternally inherited allele (S2 and S3 Table). Additionally, the weight and area of hdg3 seeds were slightly reduced compared to Col, suggesting that in the Col background HDG3 promotes seed growth or filling (Fig 2C-D). Several PEGs have been shown to influence seed abortion phenotypes in interploidy crosses (30,31), but we found no effect of hdg3 on this process (S3 Fig).
To understand the potential molecular consequences of the loss of hdg3, we profiled endosperm gene expression in wild-type Col and hdg3-1 by RNA-seq at 7 days after pollination (DAP) (Fig 3). In hdg3 mutant endosperm, 150 genes had at least two-fold higher expression and 238 genes had at least two-fold lower expression (Fig 3, S4 Table). Differentially expressed genes included developmental regulators such as Homeobox 3 (WOX9) and gibberellin oxidases, which affect the level of a key phytohormone necessary for typical seed development (32) (Fig 3). The loss of hdg3 also impacted the expression of ten imprinted genes, including the maternally expressed imprinted gene (MEG) HDG9 (Fig 3). We hypothesized that the endosperm gene expression phenotypes associated with low expression of HDG3 from Cvi paternal alleles might in some respects mimic hdg3 mutants. Indeed, of the 238 genes that are down-regulated in hdg3 mutants, 100 are also down-regulated in Col x Cvi crosses, where HDG3 expression is also low (Fig 3). This is a highly significant overlap (hypergeometric test in R, p = 6.079e-69) (Fig 3).
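The overlap test reported above can be illustrated with a minimal sketch. The text gives the overlap (100 genes) and the size of the hdg3 down-regulated set (238 genes), but not the number of genes down-regulated in Col x Cvi that entered the test or the size of the gene universe. Both are therefore placeholders here, and the function name is ours rather than the authors'. The calculation is the upper tail of the hypergeometric distribution, written in Python with exact integer arithmetic instead of the R routine the authors used.

```python
from fractions import Fraction
from math import comb


def hypergeom_upper_tail(overlap, set_a, set_b, universe):
    """Return P(X >= overlap) for a hypergeometric draw.

    set_b genes are drawn without replacement from `universe` genes,
    of which set_a are "marked"; X counts the marked genes drawn.
    """
    if not (0 <= set_a <= universe and 0 <= set_b <= universe):
        raise ValueError("set sizes must lie between 0 and the universe size")
    largest_overlap = min(set_a, set_b)
    total = comb(universe, set_b)
    tail = sum(
        comb(set_a, k) * comb(universe - set_a, set_b - k)
        for k in range(max(overlap, 0), largest_overlap + 1)
    )
    return Fraction(tail, total)


# 100 and 238 come from the text; the other two values are
# hypothetical placeholders, so the result is illustrative only.
ASSUMED_DOWN_IN_COL_X_CVI = 1000   # assumption, not from the paper
ASSUMED_GENE_UNIVERSE = 20000      # assumption, not from the paper

p_value = hypergeom_upper_tail(
    100, 238, ASSUMED_DOWN_IN_COL_X_CVI, ASSUMED_GENE_UNIVERSE
)
print(f"illustrative p-value: {float(p_value):.3e}")
```

Because the two placeholder values are not the authors' inputs, the printed number should not be expected to match the reported p-value; the sketch only shows which four quantities such a test needs.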

These data suggest that the Cvi HDG3 allele, in its hypomethylated and relatively transcriptionally repressed state, could be important for some of the accession-specific

An inverted repeat induces methylation in the region 5' of HDG3 in Cvi
To distinguish the importance of genetic variation from epigenetic variation for

Methylation of the HDG3 5' region is sufficient to promote expression and imprinting
Having established a methylated Cvi HDG3 allele, we tested whether paternal allele methylation was sufficient to switch HDG3 from a non-imprinted, repressed state to an imprinted, more active state. In two independent lines, in situ hybridization of F1 seeds from Col x Cvi HDG3 IR crosses indicated the presence of HDG3 transcript in endosperm, in contrast to Col x Cvi endosperm (Fig 5A). Hybridization signal was primarily detected in uncellularized endosperm on the chalazal side of the peripheral endosperm (Fig 5A). However, the penetrance of Cvi HDG3 expression was variable, with about half of the seeds exhibiting HDG3 expression detectable by in situ hybridization (Fig 5A).

in Col x Cvi HDG3 IR endosperm is consistent with HDG3 being more highly expressed when imprinted (Fig 1). Thus, to measure allele-specific expression of HDG3, Col and Cvi alleles were distinguished using TaqMan probes in an RT-qPCR assay. In crosses between Col females and three independent Cvi HDG3 IR lines, the fraction of transcript derived from the Cvi allele increased compared to control crosses between Col females and Cvi males. In Col x Cvi, the Cvi allele accounts for 23% of the transcripts by this assay, in good agreement with prior allele-specific mRNA-seq results (20). In Col x Cvi HDG3 IR lines, the Cvi fraction was between 50% and 60%, indicating paternal allele bias (the expectation for non-imprinted genes is 33% paternal, because endosperm carries two maternal genome copies and one paternal copy) (Fig 5C). This is slightly less than the fraction of paternal allele expression in Cvi x Col crosses by mRNA-seq (79%) (20). Together, these data indicate that the naturally occurring methylation variation at HDG3 is sufficient to explain imprinting variation. We conclude that the methylated Cvi

Table). Whereas endosperm development appeared accelerated, embryo development was significantly delayed (Fig 6B, S2 and S3 Table). The effect on endosperm cellularization was also observed in Cvi x Cvi HDG3 IR F1 seeds, although to a lesser extent (S2, S3 Table). Mature selfed seeds from Cvi HDG3 IR plants weighed significantly less than selfed seeds from Cvi and had reduced area (Fig 6C-D). This is consistent with known correlations between early endosperm cellularization and the

understood. An exception to this is in the endosperm, where active DNA demethylation in the female gamete before fertilization establishes differential DNA methylation after fertilization, a step that is essential for normal seed development (38). We thus hypothesized that the phenotypic impact of naturally occurring epialleles might be particularly evident in the endosperm, because the differential methylation between maternal and paternal alleles that is required for gene imprinting could be variable across accessions (20). We have shown that HDG3 represents a case study of this proposed phenomenon. By placing a methylation trigger in Cvi (the HDG3 IR transgene), we were able to convert the Cvi HDG3 allele from a hypomethylated to a methylated state. This switch in methylation was sufficient to promote expression of the paternally inherited Cvi HDG3 allele in endosperm to 3-fold higher levels. Because we altered methylation at the endogenous HDG3 Cvi locus, which retains all DNA sequence polymorphisms, we have shown that methylation variation alone is sufficient to cause expression, and thus imprinting, variation. However, our results also show that it is unlikely that methylation of the proximal TE accounts for all of the expression differences

Although the effects on final seed size are seemingly contradictory and the physiological basis remains incompletely understood, these results are predicted by the kinship theory (7). The theory predicts that PEGs promote maternal investment in offspring, which is consistent with the effects of the hdg3 mutation in Col (i.e., less maternal investment results in smaller seeds). Our results suggest that this effect is specific to a Col seed developmental program. In Cvi endosperm, expression of HDG3 is seemingly maladaptive, leading to the production of smaller seeds. Cvi naturally produces much larger seeds than Col or Ler, although fewer in number (20,22,33) (Figs 2 and 6). Our results suggest that the loss of HDG3 expression in Cvi was an important part of the process that resulted in these phenotypic differences.

In summary, we have demonstrated that seed phenotypic differences can be caused by methylation differences at single genes. This study provides further evidence that epigenetic differences underlie developmental adaptations in plants. We have previously shown that the imprinting status of many genes varies between accessions; our current study argues that intraspecific variation in imprinting is an important determinant of seed developmental variation.

Plant material
The SALK insertion mutant was obtained from the Arabidopsis Biological Resource

Table)

individual seeds at 5 DAP were scored first for embryo stage and then for respective endosperm stage. Endosperm stage was given a numerical score (-3 to +5) depending on the relative stage of endosperm cellularization compared to the expected endosperm cellularization stage given the embryo stage. Individual seeds with matching embryogenesis and endosperm cellularization stages were scored "normal" and ranged from 0 to 1; seeds that were scored "early" were defined as being +1.5 to +5 stages further along in the cellularization process compared to normal. Seeds that were scored "delayed" were defined as being -1 to -3 stages behind in the cellularization process compared to normal. To determine whether any developmental differences in endosperm cellularization or embryogenesis were statistically significant, we implemented the asymptotic generalized Pearson chi-squared test from the coin package (45) in R with default scoring weights. Developmental stage was treated as an ordinal variable, while cross genotype was treated as a non-ordinal, nominal variable. Pairwise comparisons were carried out with the R function pairwiseOrdinalIndependence from the rcompanion package. For all tests, embryo development data were collapsed into three categories: young (pre-globular to globular), middle (late globular to early heart), and older (heart to torpedo). Detailed endosperm cellularization data were collapsed into the categories delayed, normal, and early.
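The collapse of numerical endosperm scores into the three categories described above can be sketched as follows. This is a minimal illustration in Python, not the authors' code: the function names and the example scores are hypothetical. The text does not say how a score falling between the stated ranges (for example 1.25 or -0.5) would be handled, so the sketch assumes such values do not occur and rejects them. The ordinal independence tests themselves (coin and rcompanion in R) are not reproduced here.

```python
from collections import Counter


def classify_endosperm(score):
    """Collapse a relative cellularization score into a category.

    Ranges follow the text: normal 0 to 1, early +1.5 to +5,
    delayed -1 to -3. Anything else is rejected (our assumption).
    """
    if 0 <= score <= 1:
        return "normal"
    if 1.5 <= score <= 5:
        return "early"
    if -3 <= score <= -1:
        return "delayed"
    raise ValueError(f"score {score} is outside the ranges defined in the text")


def tally_categories(scores):
    """Count seeds per category, in ordinal order, for one cross."""
    counts = Counter(classify_endosperm(s) for s in scores)
    return {name: counts.get(name, 0) for name in ("delayed", "normal", "early")}


# Hypothetical scores for six seeds from a single cross.
example_scores = [-2, 0, 0.5, 1, 1.5, 3]
print(tally_categories(example_scores))
```

Per-cross tallies of this kind form the rows of the genotype-by-category contingency table on which an ordinal independence test operates.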

Inverted repeat transgene
The 450 bp sequence 5' of HDG3 corresponding to a fragment of AT2TE60490 from Chr2: 13740010-13740460 was amplified from Cvi (S5 Table)