Host-Parasite Interactions and Purifying Selection in a Microsporidian Parasite of Honey Bees

To clarify the mechanisms of Nosema ceranae parasitism, we deep-sequenced both honey bee host and parasite mRNAs throughout a complete 6-day infection cycle. By time-series analysis, 1122 parasite genes were significantly differently expressed during the reproduction cycle, clustering into 4 expression patterns. We found reactive mitochondrial oxygen species modulator 1 of the host to be significantly down regulated during the entire infection period. Our data support the hypothesis that apoptosis of honey bee cells was suppressed during infection. We further analyzed genome-wide genetic diversity of this parasite by comparing samples collected from the same site in 2007 and 2013. The number of SNP positions per gene and the proportion of non-synonymous substitutions per gene were significantly reduced over this time period, suggesting purifying selection on the parasite genome and supporting the hypothesis that a subset of N. ceranae strains might be dominating infection.


Introduction
Microsporidia are unicellular spore-forming intracellular parasites within the fungal lineage, parasitizing a broad range of vertebrate and invertebrate hosts [1]. The outer proteinaceous and inner chitin structured cell wall layers protect the spores against environmental stress for long term survival [2]. Spores germinate in response to proper pH and ionic condition in the midgut and inject the infective sporoplasm into the host cells with extruded polar filament. Cell proliferation starts from meronts, leading to daughter cells (sporonts) and finally the formation of mature spores [1][2][3][4][5]. Infected cells are filled with spores later during infection and the cell may burst to release the spores. The spores could infect neighboring epithelia cells or be expelled through feces to infect new individuals. Microsporidia parasites generally have compact genomes [1,[6][7][8]. Classic mitochondria are absent save for tiny mitochondrially derived organelles called mitosomes, suggesting that the parasite fuels its own fast proliferation with energy from the host [9]. At the same time, the parasite needs to complete at least one replication cycle, before killing the infected host [10,11] Nosema ceranae is an emerging microsporidian parasite of the European honey bee, Apis mellifera. After becoming infected by N. ceranae, honey bees consume more sugar solution, which suggests energetic parasitism [12]. Infection is also reported to reduce life span and suppress the immune response [13,14]. A survey of selected set of genes in N. ceranae suggests that this parasite is highly diverse [15] and the parasite is hypothesized to suppress apoptosis of infected cells for its own reproductive advantage [10]. While prior work has measured consequences of Nosema infection on the host [16], the mechanisms used by the parasite to manipulate host cells to establish infections are unclear. In order to study selection on N. ceranae protein-coding genes and to identify putative "virulence genes", we infected the honey bees with N. ceranae spores and deep sequenced transcripts of both host and parasite at 24 hour intervals for the complete infective cycle (6 days post infection). Through a simultaneous analysis of both host and parasite responses during distinct infection stages and genetic diversity analysis at the genome level, we are able to provide new insights into mechanisms of microsporidia pathogenesis.

Ethics statement
The apiaries for bee sample collection are the property of the USDA-ARS Bee Research Laboratory, Beltsville, Maryland, USA. No specific permits were required for the described studies. Studies involved the European honey bee (Apis mellifera), which is neither an endangered nor protected species.
Infection and sampling N. ceranae spores were isolated from worker honey bees in heavily infected colonies in 2013. The midguts of infected workers were homogenized in distilled water, filtered through Whatman filtering paper and centrifuged at 3220 g for 10 min. The pellet was purified using a Percoll gradient procedure [7]. Spores were counted using a Fuchs-Rosenthal haemocytometer and the N. ceranae species status was verified by a standard PCR protocol [17]. One hundred fifty freshly emerged workers were individually fed with 2 μl 50% sucrose solution containing 10 5 N. ceranae spores. An additional 150 freshly emerged workers were individually fed with 2 μl 50% sucrose solution without spores, as a control. Each set of 50 workers were housed in a sterile plastic cup at 34 ± 1°C, 60% relative humidity [18]. In total, three cups of infected workers and three cups of control workers were constructed respectively. Five workers were sampled from each cup at 24 hour intervals for six days post-infection. After being anesthetized with CO 2 , the midguts of all sampled workers were used to extract total RNA with TRIzol immediately after the sampling. Total RNA of 15 infected and 15 control workers were pooled separately daily. Six mRNA libraries were constructed for the infected workers (one mRNA sequence library for each post infection day). Additionally, six mRNA libraries were constructed for the pooled control workers. Since N. ceranae targets the epithelial cells of the midgut, we could quantify the expression level of transcripts of both host and parasite from infected workers.
Genomics Workbench [19]. After trimming adaptors, two mismatches are allowed with deletion cost of three, insertion cost of three, length fraction of 0.8, similarity fraction of 0.8 and maximum 10 hits for a read. For the control workers, reads were mapped to honey bee assembly Amel 4.5 [20]. For the infected workers, reads were aligned to both honey bee assembly Amel4.5 and N. ceranae assembly ASM18298v1 simultaneously. In order to assess the misassignment of the reads between the host and parasite, we re-aligned the reads that previously mapped to the host back to the parasite genome. On average, 10 out of 1 million reads were mapped to both host and parasite. The impact of these shared reads on quantifying gene expression is miniscule, and these reads were excluded from different gene expression statics. For the honey bee host, the impact of infection was analyzed by comparing gene expression levels between uninfected and infected bees for each time interval. The original counts were normalized with weighted trimmed mean of M-values (TMM) to calculate relative expression levels of mRNA to calculate statistically differentially expressed transcripts edgeR Package [21]. As there is no biological replicate, the common dispersion of 0.02 was used. For the parasite, maSigpro package was used to analyze time series gene expression patterns cross the reproduction cycle [22].

N. ceranae SNP position identification and annotation
We combined all the reads aligned to N. ceranae genome from of all six libraries to search expression variations (> 100 coverage). Single nucleotide polymorphism (SNP) positions were identified by PICARD-GATK-SNPEFF pipeline [23]. In order to compare the selective pressure on the parasite genetic diversity, we identified SNP positions with the parasite collected from the same apiary in 2007. Raw reads (454 sequencing) used to assemble the N. ceranae genome were downloaded from NCBI [7]. The identified SNP were annotated by snpEff package [24]. We were able to measure pairwise differences in SNP counts for 1479 predicted genes in the N. ceranae genome for which conservative SNP positions had been identified in both 2007 and 2013. Mean SNP counts were estimated, along with means for non-synonymous and synonymous SNP positions. Paired analysis of the same sets of 1479 genes were used to avoid biases due to gene length, i.e., the gene space used for SNP analyses in both current and historic samples was identical. In addition, for those gene models (n = 465) containing five or more SNP positions in both 2007 and 2013, we regressed total SNP count, non-synonymous, and synonymous counts between samples. We also estimated the ratio of non-synonymous to total SNP positions for each of the 1479 gene models to assess selection on proteins [25].

Verification gene expression levels with quantitative real time PCR
Since antimicrobial peptides (AMPs) have been reported to respond towards Nosema infection, albeit in contrasting ways across different studies [13,17], two host AMPs, Apidaecin and Hymenoptaecin, were selected alongside two parasite genes, Dicer and PIWI to verify the sequencing data by RT-qPCR. We chose Nosema Dicer and PIWI, since they are involved in the synthesis of small interfering RNAs, arguably candidates for determining whether N. ceranae has a functional RNAi pathway. Five infected and uninfected honey bees were pooled as one replicate respectively. Three replicates of each time point were used for quantitative real time PCR assays from 1 dpi to 6 dpi. The relative gene expression level of the host was standardized to GAPDH which is known to remain stable during Nosema infection [14]. The relative Ct value of two parasite genes were used for the validation of the gene expression of the two techniques. The amplification efficiency of the primers were calculated by qpcR package [26]. T-tests were used to determine the significant difference of relative Ct values.

Time series expression of parasite genes
We detected 1988 different genes from the total predicted N. ceranae gene set (n = 2060), across all libraries during the experimental period. Expressed parasite transcripts increased from 28.9% of the predicted genes at 1 dpi to 96.4% at 6 dpi (Fig 1). Total reads mapped to the N. ceranae genome increased from 5 Ã 10 3 at day 1 dpi to 5 Ã 10 6 at 6 dpi, confirming fast proliferation at later phases of N. ceranae infection.
In total, 1122 parasite genes were significantly differentially expressed at some point during the infection period (S1 File), clustering into 4 different gene expression patterns based on the normalized gene expression level (Fig 2). There were 595 parasite genes expressed as early as 1 dpi, including RNA polymerase, DNA polymerase, ribosomal proteins, APTase, tRNA synthesis, and transporters. Once the sporoplasm of the parasite enters the host cytoplasm, spore replication is initiated. Actin is potentially important to support the structure and vesicle traffic of the membrane. N. ceranae actins (NCER_101664, NCER_101382, NCER101064, NCER_100589, p < 0.01) were significantly differently expressed over the infection period, particularly for the early infection phase (cluster 1 and cluster 2). The Nosema cell differentiation regulator (NCER_101467, cluster 4, p < 0.001) and histones (NCER_101385, cluster 4, p < 0.001) were highly expressed during the period of highest cell division. Interestingly, thioredoxin (NCER_101950, cluster 3, p < 0.001) and heat shock protein 70 (NCER_101994, cluster 1, p < 0.001) with a predicted signal peptide were also found highly expressed at the cell-division phase.
The expression of core enzymes catalyzing glycolysis was detected as early as 1 dpi. Hexokinase (NCER_101108, cluster 1, p < 0.001) was expressed from 2 dpi onward and this Over 28% of N. ceranae transcripts were detected to express as early as 1 dpi. The expressed transcripts increased dramatically from 42.5% at 2 dpi to 91.5% at 3 dpi, followed by a steady increase to 96.4% at 6 dpi. The largest increase in the expressed transcripts was between 2 dpi and 3 dpi, which might reflect an increase in meront reproduction and sporonts formation.

Putative N. ceranae virulence genes
The apoptosis inhibitor gene (NCER_100918, cluster 4, p < 0.001) is interesting because it could inhibit apoptosis by binding activated caspase. Expression of this apoptosis inhibitor was detected from 2 dpi onward. The highest expression level occurred on 2 dpi and then decreased during the experimental period. In addition, genes for two proteins involved in small regulatory RNA production, Argonaut (NCER_101240, cluster 2, p < 0.001) and Dicer (NCER_100079, p > 0.05), were expressed as early as 1 dpi. The toxin ricin (NCER_101673) was expressed from 2 dpi onward.
Single Nucleotide Polymorphisms (SNPs) and loss of SNPs in N. ceranae genome Neither introns nor alternative splicing were detected, confirming the N. ceranae genome is compact and genes generally consist of a single exon. We found substantial variation across the reads at the nucleotide level. 6525 SNP positions were identified from coding DNA sequence (CDS) regions based on our RNA sequencing reads (S2 File). In order to determine the selection direction, we identified the SNP positions from N. ceranae collected in 2007 from the same apiary. Its DNA sequencing (454 Life Science Roche) data has been used to assemble N. ceranae genome and downloaded from NCBI ( (Fig 3B), while the non-synonymous SNP level decreased from 4.58 ± 0.11to 1.85 ± 0.59 (Fig 3C). Accordingly the ratio of non-synonymous to total SNP dropped from 0.50 ± 0.0068 (Mean±SE) to 0. 43

Expression of host during infection
After N. ceranae infection, expression levels of 206, 121, 126, 201, 242, and 313 honey bee transcripts were significantly differently expressed between infected bees and controls at time points from 1 dpi to 6 dpi respectively (S3 File). Out of those, 13 transcripts were significantly regulated in the same direction during the entire infection period (Fig 4).
Besides the AMPs, the melanization agent prophenoloxidase subunit A3 (PPO) was significantly down-regulated in infected bees at early infection (1 dpi, p < 0.01) and then showed a similar expression with the control bees. Thiolester-containing protein 7 (TEP7) is an antimicrobial protein of JAK/STAT pathway. The expression level of TEP7 tended to be down-regulated from 1 to 3 dpi and then up-regulated from 4 to 6 dpi, although statistical significance was only detected at 5 dpi (p < 0.01) ( Table 1).

RNA interference pathway responses of the host
The expression level of transcripts for the honey bee Dicer protein (Dcr-1) tended to be higher for infected bees than controls at 1 dpi. The R2D2 orthologue (PRM1) showed significant upregulation at 1 dpi for infected bees (p < 0.01) after which expression levels were similar between infected and control bees, a pattern shared by Argonaute 2 (AGO2). The expression  level of helicases (Melt, bel, DDX17, LOC726768, LOC100576828) involved in maintaining the structure of the RNA induced silencing complex was consistently high, but did not significantly differ between the infected and control bees. The expression level of Aubergine was down-regulated in infected bees when compared to control bees, at two days post-infection, after which expression levels for this protein were similar between infected and control bees.

Apoptosis pathway responses of the host
Apoptosis can be activated by either blood cells or mitochondria. As there is no report suggesting the existence of hemocytes in the honey bee midgut, mitochondria oriented apoptosis is presumably the mechanism to respond to N. ceranae infection, if there is a response. The key genes involved in apoptosis were expressed from 1 dpi onward. Anti-apoptotic (LOC409189, LOC100579020) as well as pro-apoptotic (LOC411613) genes in BCL-2 family showed similar expression levels between infected and control honey bees. Caspases (LOC411381, LOC724390, LOC412235, LOC10057648), inhibitor of apoptosis (LOC726172, LOC726899, lap2, Det), tumor suppress protein P53-like (LOC10057712, LOC10057853) and cytochrome c (CytC) all had no significant differences between infected and control bees. However the expression level of reactive mitochondrial oxygen species modulator (LOC727012) was significantly down regulated for all six days post infection (p < 0.01) ( Table 1).

Quantitative real-time PCR verification
The expression levels of selected host and parasite genes were significantly correlated between the two techniques over 6 days post infection (p < 0.05) (Tables A and B in S6 File). Apidaecin and Hymenoptaecin both showed an up regulation at 1 dpi in the infected bees compared with control bees. The expression levels decreased at 2 dpi, followed by up regulation at 3 dpi and 4 dpi (p < 0.01). The expression level was down regulated again at 5 dpi and up regulated at 6 dpi (p < 0.01). These patterns based on quantitative real time PCR matched RNA sequencing analyses well (Fig 5). The parasite genes PIWI and Dicer were not detected from control bees yet the expression of PIWI was detected as early as 1 dpi in infected bees. The relative expression level between PIWI and Dicer is consistent between RNA-seq and RT-qPCR results (Fig 5).

Discussion
We provide the first time-series gene expression analysis of N. ceranae parasite during infection, showing over 95% of the gene models predicted for this species are active at some point during infection. Among the 72 undetected N. ceranae gene models, only 5 have predicted functions based on protein domains, while the remaining 67 are hypothetical with unknown function. These undetected transcripts models might be expressed soon after infection, i.e., prior to our first sampling time point. Alternatively, these might reflect inaccurate gene models, or pseudo-genes that no longer carry a function in this parasite. Bee responses in relation to N. ceranae infection were strong during the initial (1 dpi) and late infection phase (6 dpi). In a previous study, midgut expression of 336 honey bee genes was significantly altered by N. ceranae infection at 7 dpi with genes involved in oxidation reduction being significantly over expressed in the infected bees [27]. Genes involved in transportation and metabolism shown by microarray analyses of fat body and midgut tissues were significantly altered in response to Nosema infection [28]. Our data also showed that genes involved in oxidation-reduction processes as well as transporters were significantly up-regulated by infection, which might promote host survival and fuel the parasite reproduction respectively. Moreover, we found that venom proteins were significantly up-regulated at 3 dpi and 4 dpi, followed by a down regulation at 5 dpi and then up regulation again at 6 dpi. Surprisingly, the antimicrobial peptides showed a similar expression pattern. It is unclear why both AMPs and Fig 5. (A) Count ratios derived from mRNA sequencing for parasitic PIWI and Dicer. Y = Log((C PIWI /C DICER ), 2), where C PIWI represents normalized counts of parasitic gene PIWI and C DICER represents normalized counts of parasitic gene DICER. The expression of Dicer was only detected starting at 2 dpi, hence data for 1 dpi is missing. (B) Real time quantitative PCR originated expression value of the ratio between parasitic PIWI and Dicer. Y = Log((Ct DICER / Ct PIWI ), 2), where Ct DICER represents the Ct value of gene DICER and Ct PIWI represents the Ct value of gene PIWI. The expression level of PIWI is consistently higher than Dicer with the same expression pattern as sequencing data, which confirmed the validity of the sequencing data. (C) mRNA sequencing originated counts of immune gene Apidaecin, which shows significant change over the infection time with a similar expression pattern of qPCR results showed in (D). (E) The mRNA sequencing data of Hymenoptaecin again showed significant change over the infection time and shared the same expression pattern as the qPCR results in (F). The consistent expression pattern between the sequencing data and the qPCR results validated the accuracy of the sequencing data. Error bar represents standard error. * represents the significant level at 0.05. ** represents the significant level at 0.01. venom proteins showed a general suppression at 5 dpi. As one possible explanation, the reproduction cycle of N. ceranae is approximately four days. The suppressed expression level therefore might result from the parasite-mediated death of infected honey bee cells. As Nosema daughter spores re-enter the midgut lumen, AMPs might be activated again, explaining why AMPs were significantly up-regulated at 1 dpi and 6 dpi. As the RNA was extracted from midgut tissues, part of the expression of genes related with immunity could have been missed after discarding the fat body.
As an intracellular parasite, the nutrients used for Nosema replication and growth fueled by ATP come mainly from the host. Glycolysis and oxidative phosphorylation are the main catalytic reactions involved in energy generation. Initially, we expected an up-regulation of honey bee genes involved in glycolysis and oxidative phosphorylation as a way to counter the energetic burden of infection. Surprisingly, we did not find significant regulation of transcripts in either glycolysis or oxidative phosphorylation pathways, despite the constant consumption of ATP by the parasite. The genes encoding core enzymes catalyzing glycolysis were constantly highly expressed for the entire infection period including hexokinase (LOC551005), phosphofructokinase (LOC724724) and pyruvate kinase (LOC552007). The genes encoding the core enzymes that catalyze oxidative phosphorylation were highly expressed as well, specifically citrate synthase (LOC410059), isocitrate dehydrogenase (LOC551276) and succinate dehydrogenase (LOC725566). In the Encephalitozoon microsporidia parasite, the total ATP concentration within infected cells was surprisingly not significantly different from uninfected cells in spite of the energy demands imposed by the parasite [29]. ATP concentration regulates the catalytic activity of phosphofructokinase and pyruvate kinase. High ATP concentrations could inhibit glycolysis and oxidative phosphorylation and vice versa. However infected bees do consume more sugar water than control bees to maintain a constant ATP concentration, which indicates the infected bees are indeed energetically parasitized [12]. The variation of catalytic activity of the core enzymes of glycolysis oxidative phosphorylation may not be manifested from the expression level but rather by ATP concentration. Furthermore, the expression levels for all ATP/ADP transporters, ABC transporters and glycolysis enzymes decreased at later infection phases (6 dpi), suggesting that less energy is required during the spore-forming phase In order to maximize reproduction, N. ceranae must complete at least one reproductive cycle before killing the infected host. Genes involved in reducing oxidative damage, protein folding and promoting cell growth were detected early during infection. Indeed, in spite of abundant daughter spores filling the cytoplasm, infection does not induce either apoptosis or necrosis before completing the reproductive cycle [10,30,31]. N. ceranae most likely regulates host cell apoptosis at two levels: by blocking the activated caspase enzymes and by blocking the releasing of cytochrome c from the mitochondria. Caspases play central roles in apoptosis. Even though caspases are activated, apoptosis still could be regulated by inhibitors of apoptosis (IAPs), which could bind and inactivate the caspases to suppress the apoptosis [32]. The N. ceranae genome contains one IAPs gene (NCER_100918), which was expressed in infected bees as early as 2 dpi. The Bcl2 proteins, a family containing both anti-apoptotic and pro-apoptotic proteins, regulate the release of cytochrome c and other inter-membrane proteins. In principle, the anti-apoptotic proteins of Nosema could bind to host pro-apoptotic proteins to balance levels of apoptosis, i.e., once host genes coding pro-apoptotic proteins are over-expressed, apoptosis is triggered [33]. From our data, we did not find significant differences among Bcl2 proteins. However, the reactive oxygen species modulator 1-like gene (ROS) was significantly down-regulated in infected bees compared with non-infected bees for all six dpi. Over expression of ROS is known to activate the pro-apoptotic Bcl-2, thereby inducing apoptosis. On the other hand, reduced ROS could promote cell proliferation and protect the integrity of mitochondrial membranes, limiting apoptosis [34]. Maintaining functionality of host mitochondria is vital for parasite energetics and reproductive success. Therefore, protecting host mitochondrial membrane integrity to prevent the release of cytochrome c is a likely method used by the parasite to regulate host apoptosis.
Ricin is a cytotoxic protein with a lectin domain, which could inhibit protein synthesis by interfering with ribosomes [35,36]. As we noticed, a Ricin-type beta-trefoil lectin protein-coding gene was expressed from the parasite transcript as early as 2 dpi. Moreover, this ricin protein has a predicted signal peptide, which suggests it might be secreted into the host's cytoplasm to inhibit host protein synthesis to slow down the infected cell metabolism. In a fish-infecting microsporidian parasite, Spraguea lophii, a secreted ricin protein has been identified as a virulence protein [37]. From our data, the expression levels of the ribosomal proteins of the host were generally significantly suppressed in the infected bees compared with the control bees. The parasite may secrete the toxic ricin into the bee's cytoplasm to reduce the host's protein synthesis, which could be a general parasitism method used by microsporidia [1,7,37,38].
N. ceranae has become a wide spread parasite in the honey bee A. mellifera since the first report in 2006 [39,40]. After a worldwide expansion into this new host, N. ceranae still shows a high level of genetic diversity, which was even higher within colonies than among colonies [15,41]. However, it is unclear how selection pressures during this dispersal have affected N. ceranae. By comparing the N. ceranae spores collected in 2013 with those collected in 2007, we confirm that N. ceranae in our research apiary maintains substantial genetic variation. Still, we found that both total number of SNP positions cross all homologous regions and the number of SNP positions per CDS were significantly reduced. A disproportionate fraction of this reduction occurred for non-synonymous SNP positions, suggesting purifying selection on N. ceranae protein-coding genes, which was also suggested from gene analysis [15,42]. A similar phenomenon has also been observed for Deformed wing virus (DWV) in honey bees. The diversity of this virus was significantly reduced after establishing infection and in particular after population growth driven by a biological vector, the parasitic mite Varroa destructor [43]. In this case, selection is not based on de novo adaption of protein sequences but rather the tremendous success of particular existing strains that were provided a competitive advantage with the arrival of mite vectors. As genetic diversity for N. ceranae is unexpectedly high within colonies as opposed to among colonies, this reduced genetic diversity might reflect a bottleneck effect due to honey bee colony losses, whereby Nosema strains representing a small fraction of that existing variation do especially well under current selective pressures. The gene flow of N. ceranae is not stopping, which could still substantially changed the current N. ceranae diversity, as N. ceranae diversity from Asian honeybees (Apis cerana and Apis florea) still showed significant differentiation from those obtained from Apis mellifera [44]. In conclusion, surviving honey bees and their parasites might be under intense selection for tolerance, since surviving colonies are predominantly the ones spreading N. ceranae. Alternatively, a few N. ceranae strains might be dominating infection due to direct selection on this parasite as it co-evolves with its new host species. These hypotheses can be further tested with the SNPs described here and by Pellin et al (2015) [41] and with an explicit understanding of the effects of sequence changes on amino acid sequences for proteins involved in host interactions. S6 File. Total SNP distribution along the longest parasite contig (Fig A). Regression between Synonymous and Non-synonymous SNP positions between parasites collected in 2007 and 2013 (Fig B). Ct value of Apidaecin, Hymenoptaecin, Dicer and Piwi (Table A). Normalized counts of Apidaecin, Hymenoptaecin, Dicer and Piwi (Table B). (DOCX)