Perturbation of Parentally Biased Gene Expression during Interspecific Hybridization

Interspecific hybridization often induces epigenetic remodeling that leads to transposon activation, gene expression changes, and loss of imprinting. These genomic changes can be deleterious and contribute to postzygotic hybrid incompatibility. In Arabidopsis, loss of genomic imprinting of PHERES1 and presumed failure of Polycomb Repressive Complex contributes to seed inviability observed in A. thaliana X A. arenosa interspecific hybrids. We used this species pair to further analyze the relationship between parentally biased gene expression and postzygotic hybrid incompatibility using two A. thaliana accessions, Col-0 and C24, with differential seed survival. We found that parentally biased expression was perturbed to a similar degree in both A. thaliana hybrids for PHERES1, HDG3, and six other normally paternally expressed genes. We propose that early genome remodeling and loss of imprinting of seed development genes induces lethality in both compatible and incompatible hybrids.


INTRODUCTION
Sexual reproduction in plants results in the formation of two zygotes, the embryo and the endosperm, which together with maternal tissue form the seed. Sexual incompatibility between individuals of different species is commonly manifested by seed death, which is thought to depend on endosperm failure because the embryo can be rescued after microdissection [1]. Proper endosperm development in most angiosperms requires a 2:1 maternal:paternal contribution entailing multiple differentially contributed factors [2] including those resulting from genomic imprinting. Imprinting, the parent-of-origin dependent regulation of genes, ensues when one allele is preferentially expressed while the other allele is preferentially repressed. Expression bias manifested as down-regulation or complete suppression is associated with differential DNA methylation established by the coordinated action of methylating and demethylating pathways [3][4][5][6] such as is the case with imprinted gene MEDEA [7]. Differentially Methylated Regions (DMR) depending on parent of origin can be found in endosperm nuclear DNA, but not in embryo DNA, and are associated with genes that are preferentially expressed according to parent of origin. Another epigenetic pathway contributing to imprinting involves Polycomb Repressive Complex 2 (PRC2) and is exemplified by regulation of PHERES1 [8,9], a paternally We consider alternative explanations for the general derepression of paternally biased, maternally imprinted genes and activation of novel paternal genes.

Growth conditions
Wild-type Col-0 (CS6673) and C24 (CS22620), as well as male sterile lines that were hemizygous for a male sterility construct [30], were grown in 16 hours of light at 21°C and 8 hours of dark at 18°C. The male-sterile A. thaliana lines (ColA9 and C24A9) were either pollinated by A. arenosa Strecno (from M. Lysak and M. Koch) or wild-type Col-0 or C24. Several types of crosses with two biological replicates for each condition were used in these experiments: Intraspecific hybrids ColA9 X C24 and C24A9 X Col-0 and interspecific hybrids ColA9 X Aa and C24A9 X Aa plus A. arenosa controls [31]. No emasculation was needed because all egg donors either had a sterility construct or were self-incompatible (A. arenosa) [32,33].  [31]. At 3 days after pollination, fresh siliques were harvested (~50-80 siliques per replicate) and seeds were dissected from each silique and frozen on dry ice for RNA isolation using Plant RNA Reagent (Invitrogen, Carlsbad, California). All RNA was harvested from seed tissue. Libraries were constructed using homemade version of the Illumina RNA-Seq kit, using 10 μg of poly-A containing total RNA. After mRNA was isolated using poly-A purification, cDNA was synthesized and adaptors were ligated (S1 Table) in preparation for enrichment (12 cycles, Thermo Scientific Phusion High-Fidelity DNA Polymerase) and sequencing (Illumina GAIIx, 80 b paired end reads).

Plant material and RNA sequencing
To obtain greater sequence depth of low-to moderate-abundance RNA, such as PEGs, we treated replicate 2 of Col-0 X C24 and C24 X Col-0 with the crab duplex-specific nuclease (see for rationale and methods [34]) and reamplified with 8-10 cycles of PCR prior to Illumina sequencing.
For independent validation of candidate MEGs and PEGs, three crosses of ColA9 X C24 and two crosses of ColA9 X Aa were performed as described above. Seed from approximately 40 siliques were harvested for each condition and RNA was isolated with Plant RNA Reagent. Approximately 1 μg of total RNA was obtained and treated with DNase 1 (New England Biolabs), according to manufacturer's specifications. After RNAse treatment, cDNA was obtained with SuperScript III (Life Technologies). Select genes were amplified (see S2 Table for gene name and primer sequence) and pooled in equal quantities to 500 ng total using SYBR Green1 dye. After pooling, adaptors (S1 Table) were ligated with NEB Quick Ligase. Both biological replicates of Col-0 X Aa were technically replicated to ensure reproducibility of library preparation. KAPA Library Quant Kit (KAPA Biosystems) was then used to assess ligation efficiency and sequencing potential. PCR products were then sequenced on Illumina HiSeq 2000 (100 b paired end reads).

Computational analysis
Sequence Preprocessing. Sequences were divided according to barcodes (S1 Table) and barcodes were removed. The Illumina 1.5+ format (fastq) reads were trimmed for adaptor contamination and bases with a quality score lower than Phred 20 were also excluded using a custom Python script. Reads that were shorter than 27 bases (for 80 b paired end) or 39 bases (for 100 b paired end). The Illumina quality scores were converted to Sanger scores, which are compatible with most aligners. Biological replicates were pooled for greater sequence depth. All sequence data for intraspecific control crosses and validation experiment are available for download at GEO (ID GSE56675) and for interspecific hybrids and A. arenosa (GSE42957).
Processed and pooled sequences were aligned to A. thaliana TAIR10 cDNA all gene models (available for download at ftp://ftp.arabidopsis.org//home/tair/Sequences//blast_datasets/ TAIR10_blastsets/TAIR10_cdna_20101214_updated) using Burrows-Wheeler Aligner (BWA) version 0.5.8c [35]. Default settings were used with the addition of a trim quality of 20. SAM files and parsed pileups were generated using Samtools version 0.1.7 [36] as well as custom Python scripts.
SNP detection. For an overview of parent-of-origin expression detection, see S1 Fig. To detect parentally biased gene expression in Arabidopsis hybrids, we first sequenced Col-0, C24, and A. arenosa RNA using the Illumina sequencing platform to identify high-probability parent-specific SNPs. We removed reads with the same start position to ensure SNP coverage was not inflated. Because A. arenosa is an obligate outcrosser and displays frequent heterozygosity [33], we were concerned about ambiguity in SNP analysis. For SNP identification, we selected positions in all controls where 95% of reads contained a SNP at a coverage greater or equal to five reads. The mean percent SNP represents the sum of all positions within a single gene where a SNP was identified divided by the number of SNP identified for each gene; the mean coverage represents the sum of reads across all positions for which a SNP was observed divided by the number of positions. In RNA extracted from C24 seed derived by selfed crosses, we identified 84,255 SNP (mean percent SNP = 99.9% ± S.D. 0.5% and mean coverage = 27 reads ± S.D. 23 reads), which covered 13,376 genes (S1 File). In A. arenosa RNA, we identified 340,813 SNPs (mean percent SNP = 99.9% ± S.D. 0.4% and mean coverage = 16 reads ± S.D. 16 reads) covering 16,476 A. arenosa genes (S2 File). We then filtered out positions that had the same SNP base in both C24 and Aa. This provided us with 329,858 A. arenosa SNPs mapping to 16,422 genes (S2 Fig.). Approximately 15K Col-0 SNPs were also identified and removed from further analysis. Reciprocal control crosses and interspecific hybrids were then aligned to TAIR10 cDNA and SNPs identified in a manner similar to that used in the parents (coverage ! five reads with any SNP percentage).
Identification of parent of origin. Chi-square analysis using JMP 10.0 (SAS Institute) was used to identify differential maternal and paternal contributions for the following comparisons: 1) Col-0 X A. arenosa versus C24 X A. arenosa, 2) Col-0 X A. arenosa versus Col-0 X C24, 3) Col-0 X C24 versus C24 X Col-0, and 4) C24 X A. arenosa versus C24 X Col-0. First, the mean percent SNP and mean coverage for each library was calculated gene-by-gene. Then, for each gene, the approximate number of maternal reads or paternal reads was calculated using the following formulas: Where N m and N p are the number of maternal or paternal reads, respectively. C is the total read count. P m and P p are the maternal or paternal allelic fraction, respectively, expressed as %.
To avoid division by zero, pseudocounts (adding one to the coverage for each library) were employed. Chi-square was determined in JMP and P-values were ranked and adjusted following the Benjamini-Hochberg correction [37]. Genes that in a pairwise comparison exceeded the allelic ratio of 2m:1p (Padj < 0.05) were considered MEGs. Genes that were below the allelic ratio of 1m:1p (Padj < 0.05) were considered to be PEGs. Both were reported in S3-S10 Files. Additional comparisons between hybrids and controls were also conducted using Chisquared tests. Biological significance of outliers was assessed using "The Database for Annotation, Visualization and Integrated Discovery" (DAVID) [38]. Genes with known parental biases (from literature or The Arabidopsis Information Resource, TAIR) were used to confirm whether differential gene expression between hybrids and controls could be attributed to loss of imprinting.
Read counts. Read were mapped to TAIR10 cDNA (all gene models) as described above. Normalized read counts (normalized to smallest library) were generated using R package DESeq [39].

Identification of a high reliability set of endosperm specific genes
We used an available dataset of tissue specific expression during seed development for 23,592 genes (Harada-Goldberg LCM dataset [40]). We focused on the preglobular and globular stages, which correspond to the timing of our analysis. For each gene, we derived a maximum value for all endosperm expression values and another maximum for the non-endosperm tissues (seed coat, suspensor, embryo). A log ratio of the maxima was calculated by the following formula: log max endosperm value À 2:3 max nonendosperm value À 2:3 where 2.3 is a minimum threshold of expression that can be taken as the 0 in the LCM dataset. A log sum of the expression was by the formula: logðendosperm value þ nonendosperm valueÞ: Low expressors were filtered by taking a minimum log sum of 1.8 resulting in 15,352 well expressed genes. For this set, we defined the top outliers of the distribution of the log ratio of the maxima using a Tukey outlier box plot where outliers (i.e. genes displaying high expression in endosperm and low expression in seed coat and embryo) are defined as those that exceed the 3rd quartile + 1.5 Ã (interquartile range) of the distribution. This resulted in 940 genes (S11 File).

Parental bias in the interspecific transcriptome
We selected two A. thaliana accessions, C24 and Col-0, for their differential compatibility, respectively, 17% and 0% live seed, when hybridized to the tester, A. arenosa accession Strecno ( Fig. 1-A). To investigate whether parental expression bias contributed to incompatibility we compared early seed ( Fig. 1-B) transcriptomes at 3 day after pollination (DAP) from the two hybrid crosses [31] hereby referred to as the "hybrids", as well as reciprocal intraspecific crosses between A. thaliana Col-0 and C24, referred to as the "controls". The A. arenosa genome displays 5% nucleotide coding region divergence from the well characterized A. thaliana genome. We elected to map sequenced cDNA to the A. thaliana reference genes using an optimized C24 (S1 File, SNPs derived from C24 mRNA-seq libraries [31]) and A. arenosa (S2 File, SNPs derived from A. arenosa accessions Care and Strecno) SNP set to score allelic contribution (S1A Fig.). Control mappings indicated that this entailed an acceptable technical bias: genomic DNA and leaf cDNA from the natural allopolyploid (functionally a hybrid) Arabidopsis suecica representing 50:50 (thaliana:arenosa) sequence mixes yielded a~60% thaliana SNP bias (see Methods for details). We set different SNP thresholds for categories of parental gene expression in intra-and interspecific hybrids: maternal at > 66% reflecting the expectation of 2:1 maternal:paternal genome ratio in endosperm, and paternal arbitrarily at > 50% SNP (both at Chi-square Padj < 0.05; see methods for further explanations) (S1B Fig.). A similar method was also used by Wolff et al. to test allele-specific expression [12].
In the intraspecific control crosses we could determine parental origin for the mRNAs of 10,800 genes ( Table 1, S2 Fig., and S3-S4, S7-S8 Files). About 49% of the genes had a maternal SNP pattern, 50% were biparental, and 1% were paternal (Col-0 X C24, Table 1). This is consistent with seed composition during early development, which is biased toward maternal tissue and, to a lesser amount, endosperm. The trend found in control crosses was magnified in interspecific crosses where out of~15,000 genes whose mRNAs could be assigned to parental genomes (S2 Fig. and S5-S6, S9-S10 Files),~66% were classified as maternal,~33% biparental, and less than 1% were paternal (Table 1).
To compare patterns of parental bias across all experiments we selected 7,198 genes that were represented by a mean of five or more reads in each cross. Using this gene set, the mean paternal bias was slightly higher in the hybrids (4.18 and 6.04 for Col-0 x Aa and C24 x Aa, respectively) compared to controls (3.55 and 6.01 for Col-0 x C24 and C24 x Col-0, respectively), more markedly when C24 was the maternal parent ( Fig. 2). While all means differed significantly according to Wilcoxon signed rank test, the differences were small. We concluded that overall paternal bias was slightly increased in the hybrids, but not grossly altered. Natural variation of parental regulation is mainly evident in the pollen parent We focused on candidate MEGs and PEGs identified in control and hybrid crosses (thereafter in this section referred to as MEGs and PEGs) to determine how parental genotype affected their behavior. In the control crosses, Col-0 X C24 and C24 X Col-0, over 2000 genes with parental bias displayed accession-dependent effects (S3 Fig.). For example, of the 5,311 MEGs in Col-0 X C24, only two-thirds were classified as MEGs in C24 X Col-0 (S3A Fig.). Of the 78 PEGs in Col-0 X C24, only 11 were scored as PEGs in C24 X Col-0, with an additional 74 unique PEGs in C24 X Col-0 (S3B Fig.). In Col-0 X Aa, 51 of 80 PEGs (64%) were scored as PEGs in C24 X Aa (S3B Fig.). Because a number of genes displayed accession-dependent effects, we set out to determine if parentally biased gene expression was preserved across species barriers. Our findings are presented in S4 Fig. The majority of candidate MEGs identified in the intraspecific control crosses were also maternal in the interspecific comparison (S4A Fig.). The 433 MEGs unique to Col-0 X C24 were enriched for cell killing (Padj < 0.05), but only relative to seed genes [31], not TAIR10 (S3 Table). A similar number of MEGs were identified in C24 X Col-0 versus C24 X A. arenosa, with no significant gene ontology enrichment, perhaps due to the small number of unique MEGs. For hybrid crosses, 6,021 genes were classified as candidate MEGs in Col-0 X Aa, but not Col-0 X C24, including six AGAMOUS-LIKE proteins (AGL5, 13,27,30,31,32), known MEGs TRANSPARENT TESTA protein 1 and 2, and many other genes relating to intracellular transport (S3 Table). In C24 X Aa, we also identified TTG1 and AGL13 as MEGs.
We were concerned that the sets of candidate maternally-expressed genes were biased for maternally-derived seed coat proteins and that these genes were impairing our ability to detect differences between accessions. After filtering for endosperm specificity (see Methods, S11 File) [40], we defined approximately 200 candidate MEGs for each cross type, with greater than 50% of MEGs shared between hybrids and controls of the same maternal accession (S4A Fig.).
PEGs displayed differential regulation in the intraspecific comparison. Less than 5% of all PEGs in both Col-0 and C24 hybrid were identified as PEGs in intraspecific control crosses (S4B Fig.). Since the PEGs in the above analysis displayed accession-dependent parental bias, we wanted to extend the analysis to a larger gene set and used the 7,198 genes defined above that are expressed in both control and hybrid crosses. Intraspecific and interspecific crosses share the same maternal seed parent: Col-0 and C24. They differ, however, in type and number of paternal seed parents: intraspecific crosses use two different parents, while interspecific crosses use a common pollen seed parent. The effect of divergent versus identical pollen seed parent can thus be compared and, as evidenced in Fig. 3, a better correlation of paternal bias is seen in the latter (R squared 0.54 vs 0.17, P < 0.001).
In summary, comparison of Col-0 and C24 reciprocal crosses revealed that PEG expression was not conserved (S4 Fig. and Fig. 3). Substantial conservation, however, was evident when the two accessions of A. thaliana were used as seed parent and mated to the same pollen parent, even if the latter belonged to a different species. Taken together these results suggested that paternal expression was largely dependent on the pollen parent. This is consistent with variation described when comparing imprinted gene identified with different combination of A. thaliana accessions [12]. We decided to focus on PEGs, which should not be biased by maternallyderived tissue contamination.

The fate of PEGs during interspecific hybridization
To understand the status of paternal gene expression post-hybridization, a literature search was performed to identify previously characterized imprinted genes. We found references to 200 MEGs and PEGs, with approximately 65 genes with predicted paternal biased expression in the endosperm and embryo [3,7,[10][11][12]16,[41][42][43][44] (S4 Table). Using alignments to A. thaliana cDNA and data from our de novo SNP analysis, we were able to determine that some of these genes were expressed in hybrid and control crosses (S4 Table). Since reciprocal interspecific crosses were not possible because of unilateral incompatibility between A. thaliana and A.
The analysis of paternally-biased genes confirmed the disruption of paternal expression observed in the overall analysis above. Of 65 previously described PEGs (S12 File), 52 were expressed with sufficient coverage for SNP detection. None were paternally biased in Col-0 X Aa ( Table 2, Fig. 4, and S9 and S12 Files). We observed maternal bias for 17 genes ( Table 2, Fig. 4, and S5 File) including SU(VAR)3-9 HOMOLOG 9 (SUVH9), ATFXG1, CHROMATIN RE-MODELING 34. In the case of characterized PEG PHERES1, expression was entirely dependent on the maternal allele as all reads matched the Col-0 sequence perfectly ( Table 2, S5 Fig., and S13 File). We set out to validate the observed loss of paternity with additional independent replicates from 3 DAP seed. Our findings are presented in Table 3. We identified 11 previously described PEGs that were PEGs or biparental in Col-0 intraspecific crosses, but we could only identify SNPs between Col-0 and C24 for eight of these genes. Of the genes with diagnostic SNPs, 8/8 were confirmed to be paternal with > 63% C24 SNP in Col-0 X C24 hybrids (ADM, PHE1, and ATFXG1 to name a few). In contrast, these genes manifested a maternal footprint in Col-0 interspecific hybrids at 3 DAP. Of the 7/11 PEGs with A. arenosa SNPs, 5/7 were maternal (<15% A. arenosa allele), including PHE1, ATFXG1, and SUVH7. AT2G40520 and YUCCA10 were biparental in Col-0 hybrid, paternal expression with an average of~71% A. arenosa across nine parent-specific SNPs. The remaining four genes had sufficient read coverage for SNP detection, but we were unable to assign reads to either parent. Interestingly, most of these A. thaliana PEGs, with the exception of alpha-fucosidase AtFXG1 (AT1G67830), displayed traces or no detectable expression in A. arenosa x A. arenosa crosses. The lack of expression is surprising since two of these genes, PHERES1 and ADMETOS, have been attributed a function [8,45]. In the case of PHERES1, expression assays based on RT-PCR on A. arenosa seed cDNA with genome-specific primers failed to yield positive data in the time range tested (2-8 DAP, results not shown). It is thus possible that these proteins have no function in early seed development of A. arenosa.

PEG candidates of A. arenosa
Our analysis found a category of potential PEGs that were unreported as such in Arabidopsis thaliana (Fig. 4, S4 Fig., S14 File). Failure to identify biased parental expression of these genes in the control crosses has at least three explanations. Firstly, these genes could be similarly biased in A. thaliana intraspecific crosses, but escaped detection because no SNPs were available to identify them. However, while most of these genes displayed SNP in C24, they were poorly or not expressed in our 3 DAP seed data. Second, they could be induced paternally in response to hybridization and represent a set of genes not normally expressed in seed, but sensitive to a hypothetical regulatory disruption. Third, their paternal expression could be specific to A. arenosa, but not A. thaliana seed development. Consistent with the last explanation, expression of these genes was low in A. thaliana, but much higher in A. arenosa x A. arenosa crosses (normalized mean read count A. arenosa = 2161 reads, S.D. = 4764; mean read count Col-0 X C24 = 59 reads, S.D. = 238 reads). Closer examination of the PEG candidates was interesting. Categorical classification of the predicted protein products was consistent with that of genes expressed during seed development as they were enriched for cell wall modifying activities, secretion, and ribosome constituents. Of the 77, 62 were represented in the AtGenExpress Affymetrix array set and 40 had sufficient expression values to be assessed for endosperm-specific expression. By requiring high expression in endosperm and low expression in embryo and maternal tissue (see Methods), we defined 940 high-confidence endosperm specific genes in the Harada seed development set [40]. When this conservative set was compared to the A.arenosa PEG candidates, 13 were common to both sets (expected mean = 2.46, hypergeometric P = 4.12E-7), indicating by a stringent test that 32.5% of the PEG candidates (13/40) displayed endosperm-specific expression in A. thaliana (see sample in Table 4). An unusual regulatory pattern involved gene AT5G59845: expression of the paternal allele was associated with suppression of the maternal allele, which was well expressed in the controls. This response was unexpected because this gene, encoding a gibberellin responsive factor, is predominantly expressed in the seed coat in A. thaliana [40]. In the interspecific hybrid, however, this normally highly expressed gene was abated and the lower-level paternal transcript was predominant. Another regulatory pattern involved maintenance of biparental or paternal expression and is exemplified by gene AT2G28680, encoding a cupin family protein, which is expressed predominantly in the embryo suspensor and micropylar endosperm. It displayed a paternal or biparental pattern in all crosses. The most common regulatory pattern, however, involved strong induction in the hybrid compared to the A. thaliana parent, but moderate induction with respect to the A. arenosa parent.
In summary, our results reveal widespread changes in PEG regulation during wide hybridization, consistent with the earlier observation on PHERES1 [27]. The early, large-scale PEG remodeling occurring at a critical time of seed development could be a major contributor to interspecific incompatibility.

DISCUSSION
We examined the pattern of parentally regulated genes during interspecific hybridization. We compared two polymorphic A. thaliana accessions crossed to each other or to the species A. arenosa, a cross in which the chosen accessions display differential incompatibility. Candidate imprinted genes are detected by genotyping allelic parental contribution to the mRNA of progeny. In our case, the interspecific comparison was facilitated because polymorphisms are at least 10 times as frequent as in the intraspecific comparison between A. thaliana accessions. We chose to assay an early developmental stage, globular embryo at 3 DAP, because incompatibility manifests first at this time [31]. Each seed was sampled whole due to the challenge of separating the 90 to 120 endosperm nuclei from embryo and maternal seed integument tissue. Endosperm mRNA was thus obtained mixed with embryonic and maternal RNA. Studies employing torpedo-stage seed (7-8 DAP) [10,11], could more easily separate endosperm from maternal tissues. In our sampling, we expected that maternal contribution should appear enhanced and paternal contribution diluted for genes expressed in maternal tissue. Sampling of embryo mRNAs, which is currently believed to be maternally biased [17,18] or express a 50-50 mix of parental alleles [16], could also alter the measured parental bias. Nonetheless, mRNAs representing paternal alleles was readily identified by their divergence, and could only result from zygotic tissues. Large paternal bias mRNAs should most likely derive from endosperm because imprinting is very rare in embryos. As a result, we could ascertain the effect of interspecific parental bias for paternally expressed genes including characterized imprinted genes. Our analysis revealed that paternally expressed genes were perturbed in response to hybridization, but that the degree of perturbation did not vary between crosses where the seed mother was incompatible accession Col-0 and compatible C24. In addition to PHERES1, at least nine known PEGs become maternally-biased during interspecific hybridization ( Table 2 and  Table 3). Increased maternal expression could derive from maternal tissue, which is the same as in the intraspecific crosses, or by expression of the maternal allele in the endosperm, a hypothesis which we favor because it is more parsimonious. Such conversion of PEGs into MEGs would be unexpected in the context of our current understanding of imprinting mechanisms. PEG regulation involves expression of the paternal allele with concurrent suppression of the maternal one. Suppression of the female allele is thought to be the result of an epigenetic state determined in the female lineage. Derepression of maternal alleles in these reverted loci indicates that this repressed epigenetic state is not hard set. Rather, post-fertilization factors provided directly or indirectly by the male must be required to either maintain or determine the state. Interestingly, expression in A. arenosa sib-crosses could not be documented for most of the A. thaliana known PEGs that were found reverted in this study. This could be a technical problem: for unknown reasons, seed and seedling stage RNA-seq libraries of A. arenosa are challenging to make and were produced only through multiple attempts. As a result, there was lower overall sequencing coverage in this genotype and several of these genes might be expressed at low level. In the case of PHERES1, however, reverse transcription PCR with A. arenosaspecific primers failed to demonstrate any expression during early A. arenosa seed development. At least in the case of this gene, it is thus possible that the factor required for maternal PHERES1 suppression may be connected to the paternal PHERES1 transcript itself. Increasing the ploidy of the seed parent A. thaliana relieves the maternal induction of PHERES1 even though paternal PHERES1 mRNA is not observed [27], indicating that absence of the cognate paternal mRNA is not sufficient for maternal allele derepression. Under incompatible conditions, therefore, a mechanism resulting in apparent overcompensation activates the maternal allele.
Our observations can be interpreted in two ways. According to the first explanation, it is possible that interspecific hybridization causes a systematic failure of PEG regulation leading to both the ectopic expression as PEGs of genes that are PEGs in either parental species, and to the conversion of most PEGs to MEGs. Many of the changes observed during interspecific hybridization are consistent with failure of the Polycomb Repressive Complex 2 (PRC2) [31], which plays an important role in the regulation of imprinting [8,9,46]. Failure of PRC2 could thus result in the syndrome described here.
A second, alternative and perhaps additive explanation is based on rapid evolution of imprinted gene regulation [26,47,48]. Three PEG-candidates behaved as MEGs in our control intraspecific crosses: AT3G03980, AT5G39260, and AT5G59845. While conversion of PEGs to MEGs was not widespread in crosses between A. thaliana accessions, it is plausible that further divergence between A. thaliana and A. arenosa might contribute to the frequent conversion observed in interspecific crosses. Further, Fig. 3-A illustrates the low correlation in paternal bias during reciprocal intraspecific crosses. Surprisingly, a better correlation is observed when different seed-parents are mated to the same pollen-parent of a different species (Fig. 3-B). This suggests an important role for pollen parents in determining PEGs. The eight failed PEGs, could thus be explained by the fact that these genes are not PEGs in A. arenosa.
Seventy-seven additional genes behaved as PEGs in interspecific hybridization, although they were not detectable as PEGs in A. thaliana. We define these as A. arenosa PEG candidates. Verification of their pattern of parental origin in A. arenosa x A. arenosa crosses was not attempted because this analysis is complicated by high heterozygosity in individuals of this species. The set of A. arenosa PEG candidates are enriched for endosperm genes, including genes predicted to encode proteins whose categorical classification is consistent with that of imprinted genes, being significantly enriched for proteins involved in cell wall metabolism and cell growth (S5 Table) [10]. Is the number of these potential PEGs too high, given that relatively few PEGs have been discovered in prior plant studies? Perhaps, PEG numbers have been underestimated. Under rapidly evolving imprinting [26,48], reciprocal crosses of two diverged accessions could fail to identify genes that are imprinted in one and not in the other. Under this scenario of rapid evolution, one might expect to find similar variation in MEGs. We did not focus on MEG regulation because of the presence of maternal tissue in our sample. Nevertheless, conversion of MEGs to PEGs would have been readily detected by the appearance of characteristically A. arenosa alleles and we can thus feel confident that such conversion was not common. The instability of PEG regulation could result from differences in the evolutionary speed of regulatory innovation between PEGs and MEGs.
In conclusion, the observed instability in paternally expressed genes during interspecific hybridization indicates either regulatory disruption or rapid evolution of imprinting. While this is likely to represent a major hybridization barrier between diverging populations, the different phenotype displayed by A. thaliana accessions Col-0 and C24 during hybridization with A. arenosa could not be explained by differences in imprinted gene regulation.
Supporting Information S1 Fig. Experimental design: parent-of-origin expression detection. (A) High probability parent-specific single nucleotide polymorphisms (SNPs) (95% SNP at 5 read coverage) were identified for A. thaliana accession C24 (S1 File) and A. arenosa accession Strecno (S2 File) to determine parental contribution in intraspecific (Col-0 X C24 and C24 X Col-0) and interspecific (Col-0 X A. arenosa and C24 X A. arenosa) hybrids. (B) Genes that were expressed in both intraspecific and interspecific hybrids were classified based on paternal contribution (S3-S6 Files). Chi-square analysis in JMP was used to test if paternal contribution differed within species (Col-0 X C24 versus C24 X Col-0) and between species (Col-0 X C24 versus Col-0 X A. arenosa). (C) Paternal contributions in (B) were then tested against a set of genes with known paternal contributions. (EPS) S2 Fig. Identification of parent-specific single nucleotide polymorphisms. Single nucleotide polymorphisms (SNPs) were detected relative to A. thaliana cDNA for C24 and A. arenosa (Aa) at 3 days after pollination. These SNPs were observed in intraspecific hybrid crosses (Col-0 X C24 and C24 X Col-0) and interspecific hybrid crosses (Col-0 X Aa and C24 X Aa). The number of genes with detectable SNPs at coverage of ! 5 reads per SNP are given by colored circles. (EPS) S3 Fig. Analysis of accession-specific parental contributions. A Chi-square analysis of SNP ratio was used to classify genes parent-of-origin. Candidate maternally expressed genes (MEGs) exhibited more than 66% maternal contribution while paternally expressed genes (PEGs) manifested as more than 50% paternal (Padj < 0.05). The Venn diagrams above depict genes with differential paternal contributions from (A) mom and (B) dad between accessions Col-0 and C24 in intra-and interspecific crosses. Aa; A. arenosa. (EPS) S4 Fig. Intra-and interspecific hybrids vary in paternal expressed genes. A) Comparison of maternally biased genes between intraspecific (either Col-0 X C24 or C24 X Col-0) and interspecific hybrids. The majority of genes that were identified as maternal in Col-0 X C24 were also expressed from A. thaliana Col-0 in the interspecific hybrid. B) Comparison of paternally biased genes (genes with < 50% SNP). While paternally expressed genes were identified in all crosses, there was little overlap between genotypes. To confirm the parent of origin of imprinted gene PHERES1 in interspecific hybrids, RNA-seq reads (red, forward read or blue, reverse read) from Col-0 X A. arenosa (Aa) were aligned to TAIR10 cDNA and visualized using Integrative Genomics Viewer. Vertical lines within reads indicate the presence of single nucleotide polymorphisms (from variation in A. arenosa or sequencing errors). No SNPs are apparent for PHERES1 (left). PHERES2 (right) is included as control to ensure that the above observed maternal gene expression of PHERES1 is not due to mismapping of paternal reads to a close homolog PHERES2. (EPS) S1