CpG Distribution and Methylation Pattern in Porcine Parvovirus

Based on GC content and the observed/expected CpG ratio (oCpGr), we found three major groups among the members of subfamily Parvovirinae: Group I parvoviruses with low GC content and low oCpGr values, Group II with low GC content and high oCpGr values, and Group III with high GC content and high oCpGr values. Porcine parvovirus (PPV) belongs to Group I and, like the majority of parvoviruses, features an ascendant CpG distribution by position in its coding regions. The entire PPV genome remains hypomethylated during the viral life cycle, independently of the tissue of origin. In vitro CpG methylation of the genome has a modest inhibitory effect on PPV replication. The in vitro hypermethylation disappears from the replicating PPV genome, suggesting that, besides the maintenance methyltransferase DNMT1, the de novo DNA methyltransferases DNMT3a and DNMT3b cannot methylate replicating PPV DNA effectively either, even though PPV infection does not seem to influence the expression, translation or localization of the DNA methylases. SNP analysis revealed high mutability of the CpG sites in the PPV genome, while introduction of 29 extra CpG sites into the genome has no significant biological effect on PPV replication in vitro. These experiments raise the possibility that, beyond natural selection, mutational pressure may also contribute significantly to the low level of CpG sites in the PPV genome.


Introduction
DNA methylation is the primary form of epigenetic modification of the eukaryotic genome. In vertebrate cells, methylation occurs almost exclusively at the 5th carbon atom of cytosine within CpG dinucleotides. Methylation has a significant impact on chromatin structure modulation, genomic imprinting and X chromosome inactivation. It can inhibit transcription by preventing the binding of transcription factors or by recruiting methyl-binding proteins and histone deacetylases, leading to the formation of a condensed chromatin structure [1].
In mammals approximately 60-90% of CpGs comprise methylated cytosine bases [2]. The 5' ends of housekeeping genes are often associated with a GC-rich stretch of DNA containing high amounts of CpG dinucleotides, so-called CpG islands, which are free of methylation [3]. DNA methyltransferases (DNMTs) are responsible for the conversion of cytosines to 5-methylcytosines. The DNMTs are divided into two groups: maintenance (DNMT1) and de novo (DNMT3a and DNMT3b) methyltransferases. The mechanism of site-specific CpG methylation and the regulation of DNMTs to develop specific patterns are presently not well understood [4].
CpGs are observed at only one-fourth to one-third of their expected frequency [5,6] in most vertebrate genomes. Several explanations have been proposed to account for this discrepancy: deamination of methylated cytosines and their conversion to thymine, avoidance of the higher stacking energy of CpG dinucleotides during replication, and prevention of autoimmune reactions, among others.
Unmethylated CpGs (UCpGs), as a signature of invading bacterial and viral organisms, are immunostimulatory in mammals even on short oligonucleotides. The immune response is triggered by the binding of UCpGs to TLR9, a member of the Toll-like receptor family on the surface of dendritic cells [7]. Therefore CpG methylation is not only important in the regulation of the host's life processes, but also plays a key role in the detection of microbial and viral pathogens and in the inactivation of integrated foreign DNA [8,9]; consequently, it has a major influence on the life cycles of DNA viruses and retroviruses as well. The role of methylation in viral regulation is less well understood than in mammals. Integrated adenoviruses and papovaviruses are generally hypermethylated, while the actively replicating viral DNA is hypomethylated with methylated sites in specific regions of the viral genome [9]. EBV is highly methylated during latency and becomes demethylated during active replication. It uses methylation-induced gene silencing to evade host immunity [10]. In contrast, ranid herpesviruses are heavily methylated during replication and probably encode their own DNA cytosine-5 methyltransferases [11]. CpG dinucleotides are underrepresented in most of the small DNA viruses. This pattern is thought to be established by evolutionary pressure to avoid CpG-mediated immune responses and to decrease the direct interference of methylation with the transcription of viral RNAs and with viral replication [9].
Parvoviruses are small single-stranded DNA viruses with a linear genome of approximately 4-6 kb. Despite their small genome and limited coding capacity, parvoviruses are surprisingly successful at invading a wide variety of host organisms, from insects to mammals, and constitute a large, diverse virus family [12]. Their diversity manifests not only in the large number of parvoviral species, but also in the complexity of their life cycle. Besides lytic infection, some parvoviruses are able to infect their respective hosts persistently [13,14]. Adeno-associated viruses are able to insert their genome into the host genome to establish latent infection, and are subsequently capable of parasitizing the transcription machinery of other viruses and reactivating their own replication mechanism during helper virus infection [15].
There are some well-established connections between persistent viral infection and CpG methylation in other virus families [24,25] but not much is known about the role of epigenetic modifications in parvoviruses. In this paper our aim is to expand our knowledge about the effect of CpG methylation in the life cycle of parvoviruses focusing on PPV and to reveal whether methylation has any direct influence on the evolution of the CpG poor PPV genome.
Our in silico analysis of the CpG pattern of parvoviruses revealed that parvoviral genomes are more heterogeneous in their CpG content than previously recognized, and that a group of parvoviruses exists in which CpGs are not depleted even though their genomes are AT rich. PPV DNA was found to be hypomethylated independently of its tissue of origin. In vitro methylation of PPV DNA or the introduction of additional CpG sites into the PPV genome had no significant effect on PPV replication in vitro. These data indicate that CpG methylation has no regulating role in the PPV life cycle and, together with the recently published finding that parvoviruses do not induce a TLR9 activated immune response [26], suggest that CpG depletion in the genome of PPV and other parvoviruses is most probably the consequence of evolutionary forces other than CpG methylation.

Computing CpG distribution in coding positions and SNP analysis
Parvovirus sequences have been collected from the NCBI nucleotide databank (Table S1a). The contiguous protein coding sequences for Drosophila melanogaster were downloaded in FASTA format from the FTP site of FlyBase (release r5.24) [27]. The core set of human coding sequences are from The Consensus CDS (CCDS) project [28]. The protein coding regions of the human mRNAs were downloaded in FASTA format from the CCDS database [29]. Dinucleotide frequencies as a function of coding sequence position were calculated using custom Bash scripts (available upon request). Single nucleotide polymorphism (SNP) was calculated from 68 PPV sequences containing the full coding regions or the complete NS or VP genes (Table S1b) with a custom made program. The algorithm counts the polymorphism sites (distinguishing the transition events at C, G, CpG and GC nucleotides) in a multiple alignment where a consensus base is only considered with 75% confidence or above (C++ source code available upon request).
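The SNP counting step described above can be illustrated with a short Python sketch. It is not the authors' C++ program: the function names, the per-nucleotide counting, and the rule that a base belonging to both a CpG and a GpC is assigned to CpG are assumptions made for illustration only. The only element taken from the text is the 75% consensus threshold and the focus on transitions at C and G.

```python
from collections import Counter

def consensus(alignment, threshold=0.75):
    """Consensus of equal-length aligned sequences; columns whose most
    frequent base falls below the threshold are reported as 'N'."""
    cons = []
    for column in zip(*alignment):
        counts = Counter(b for b in column if b in "ACGT")
        total = sum(counts.values())
        if total == 0:
            cons.append("N")
            continue
        base, n = counts.most_common(1)[0]
        cons.append(base if n / total >= threshold else "N")
    return "".join(cons)

def count_transition_sites(alignment, threshold=0.75):
    """Count consensus C/G positions and those showing a transition
    (C->T or G->A) in at least one sequence, split by context."""
    cons = consensus(alignment, threshold)
    stats = {ctx: {"sites": 0, "polymorphic": 0}
             for ctx in ("CpG", "GpC", "other")}
    transition_of = {"C": "T", "G": "A"}
    for i, base in enumerate(cons):
        if base not in transition_of:
            continue
        prev_base = cons[i - 1] if i > 0 else ""
        next_base = cons[i + 1] if i + 1 < len(cons) else ""
        # Assumption: CpG context takes priority over GpC context.
        if (base == "C" and next_base == "G") or (base == "G" and prev_base == "C"):
            context = "CpG"
        elif (base == "G" and next_base == "C") or (base == "C" and prev_base == "G"):
            context = "GpC"
        else:
            context = "other"
        stats[context]["sites"] += 1
        if any(seq[i] == transition_of[base] for seq in alignment):
            stats[context]["polymorphic"] += 1
    return stats

if __name__ == "__main__":
    toy_alignment = ["ACGTA", "ATGTA", "ACGTA", "ACGTA"]  # hypothetical data
    print(consensus(toy_alignment))
    print(count_transition_sites(toy_alignment))
```

In the toy alignment the C of the CpG is supported by three of four sequences, which just meets the 75% threshold, so the C to T change in the second sequence is counted as a transition at a CpG site.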

Viral DNA extraction and determination of the methylation pattern
The packaged form of the viral DNA was extracted from 1 ml tissue supernatant by the High Pure Viral Nucleic Acid Kit (Roche, Basel, Switzerland) according to the manufacturer's recommendations. The replicative PPV DNA was purified by using modified Hirt extraction [32] as it is described by Molitor et al. [33]. The methylation status of the viral DNAs was determined by bisulfite PCR, cloning and sequencing. For the bisulfite conversion the EpiTect Bisulfite Kit (Qiagen, Venlo, Netherlands) was used according to the manufacturer's instructions. The modified CpG containing DNA fragments of the positive and negative strands were amplified by PCR primers (Table 1) which were designed using MethPrimer program [34]. PCR reactions were executed by DreamTaq DNA Polymerase (Thermo Fisher Scientific, Waltham, MA, USA). The DNA amplifications were carried out by initial denaturation for 5 min at 95 °C, followed by 30 cycles at 95 °C for 20 s, 52 °C for 20 s, and 72 °C for 20 s. The amplified DNA fragments were cloned into pJET1.2 blunt cloning vector (Thermo Fisher Scientific) and sequenced by using BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, Foster City, CA, USA) following the instructions of the manufacturers.
Deep sequencing was performed on an Ion Torrent sequencer using the IonXpress barcode set and the 316D chip kit, after DNA library preparation from the equimolarly pooled bisulfite PCR fragments with the NEBNext® Fast DNA Fragmentation & Library Prep Set for Ion Torrent (New England BioLabs, Ipswich, MA, USA). Torrent Suite software version 2.2 was used for data processing. To obtain quality data, reads with an average Phred score under 15 were omitted. CLC Genomics Workbench 5.5 was used for data analysis. High confidence of the evaluation was ensured by excluding short reads (<20 nucleotides) and setting the Length fraction and Similarity fraction parameters to 0.9.
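The two read-level filters mentioned above (average Phred score of at least 15, read length of at least 20 nucleotides) can be expressed as a minimal Python sketch. In the study the filtering was done in Torrent Suite and CLC Genomics Workbench, so the code below is only an illustration; the function names and the FASTQ quality encoding with an ASCII offset of 33 are assumptions.

```python
def mean_phred(quality_string, offset=33):
    """Average Phred score of a FASTQ quality string.
    Assumption: Sanger/Illumina 1.8+ encoding (ASCII offset 33)."""
    if not quality_string:
        return 0.0
    return sum(ord(c) - offset for c in quality_string) / len(quality_string)

def keep_read(sequence, quality_string, min_mean_phred=15, min_length=20):
    """Keep a read only if it is long enough and its average quality
    is not below the cut-off."""
    if len(sequence) < min_length:
        return False
    return mean_phred(quality_string) >= min_mean_phred

if __name__ == "__main__":
    # Hypothetical reads: 'I' encodes Phred 40, '+' encodes Phred 10.
    print(keep_read("A" * 25, "I" * 25))
    print(keep_read("A" * 25, "+" * 25))
```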

Creation of the mutant viruses
Seven mutant viruses containing extra CpGs were rescued: three single (M1, M2, M3), three double (M12, M23, M13) and one triple (M123) mutant. To introduce CpG mutations into the NcoI-SacI (3473-3935) fragment of the NADL-2 strain by joining PCR, three pairs of overlapping mutational primers (M1-3F and M1-3R), two external (mut_exF, mut_exR) and two internal (mut_intF, mut_intR) primers were designed (Table 2). For the upstream fragments of the joining PCRs the mut_exF and M1-3R primers were used, and for the downstream fragments the mut_exR and M1-3F primers. For single mutants the pN2 infectious clone of the NADL-2 strain [35], for double mutants the clones of the M1 and M2 mutant viruses, and for the triple mutant the clone of M12 served as templates. The PCR reactions included 5 µl of 5X HF Buffer, 0.5 µl of 10 mM dNTP, 1 µl of each overlapping and external primers (  °C for 20 s. The amplified 1005 nucleotide-long products were digested with NcoI and SacI (New England BioLabs) restriction enzymes and cloned into the same sites of the pN2 infectious clone. Mutant sequences were deposited in GenBank under the accession numbers KF913345-KF913351.

Transfection, viral stocks titration and quantification
To rescue the mutant viruses, the infectious clones were transfected into PT cells with Turbofect (Thermo Fisher Scientific) reagent according to the supplier's recommendations. After 48 hours, 50 µl of culture medium was transferred from the transfected cells to a 24-well plate seeded with PT cells and the viruses were allowed to multiply for 96 hours. This was followed by the inoculation of semi-confluent PT cells with 0.5 ml of viral supernatant on 75 cm² plates. After 96 hours the supernatants were collected and titrated by three parallel, independent dilutions with an immunofluorescence detection technique on PT and Cos7 cells as described previously [36]. Briefly, virus samples were serially diluted (10x) and cells on a 96-well plate were infected with 10 µl of the viral dilutions. PT and Cos7 cells were fixed after 20 and 48 hours, respectively (to exclude the detection of progeny viruses emerging from the cells used for titering), with 3% formaldehyde and permeabilized with 1% Triton X-100. The 3C9 (CRL-17; ATCC) anti-PPV antibody and Alexa Fluor® 488 donkey anti-mouse IgG (Life Technologies, Carlsbad, CA, USA) as secondary antibody were used for visualization of the infected nuclei (IN). Titer was calculated by multiplying the number of IN by the dilution factors, and values are given in fluorescent nuclei count (FNC)/ml.
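The titer arithmetic can be sketched as follows. The text states only that the number of infected nuclei is multiplied by the dilution factors; scaling the 10 µl inoculum to 1 ml and averaging the three parallel dilutions are assumptions of this sketch, as are the function names and the example counts.

```python
from statistics import mean

def titer_fnc_per_ml(infected_nuclei, dilution_factor, inoculum_ul=10.0):
    """Fluorescent nuclei count per ml for one well.
    Assumption: the count is scaled from the inoculum volume to 1 ml."""
    return infected_nuclei * dilution_factor * 1000.0 / inoculum_ul

def mean_titer(wells, inoculum_ul=10.0):
    """Average titer over parallel dilutions.
    wells: iterable of (infected_nuclei, dilution_factor) pairs."""
    return mean(titer_fnc_per_ml(n, d, inoculum_ul) for n, d in wells)

if __name__ == "__main__":
    # Hypothetical counts from three parallel 1000-fold dilutions.
    print(mean_titer([(37, 1000), (41, 1000), (39, 1000)]))
```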
For qPCR quantification of the viral production initiated by differently methylated DNAs the transfected cells were washed three times to ensure minimal contamination of the viral stocks by plasmid originated viral DNA.
To monitor sequence stability, viral stocks were passaged 10 times on a 24-well plate containing PT cells by transferring 5 µl of the virus-containing supernatant after 48 hours to freshly seeded cells covered by 2 ml of medium. After the last passage new stocks were prepared on 75 cm² plates as described above. Following DNA extraction, the mutated regions were amplified with the mut_intF and mut_intR primers (Table 2) and sequenced.

Table 2.
Primers for creating mutants with extra CpGs.

Preparation of bacterially methylated and nonmethylated DNA for transfection
To obtain unmethylated viral genome, a PCR was performed using pN2 as template and the N2F (5'-GGGTTATTGTCTCATGAGCGGATACATA-3') and N2R (5'-CAATTTCACACAGG AAACAGCTATGACC-3') primers. The PCR reaction included 20 µl of 5X GC Buffer, 2 µl of 10 mM dNTP, 6 µl of DMSO, 0.6 µl of each 100 pmol/µl primer, 1 µl of 2 U/µl Phusion Hot Start II DNA Polymerase (Thermo Fisher Scientific) and distilled water to a final volume of 100 µl. The DNA amplification started with initial denaturation for 3 min at 98 °C, followed by 25 cycles at 98 °C for 15 s, 66 °C for 20 s, and 72 °C for 4 min 30 s. To obtain bacterially DAM and DCM methylated viral genome, pN2 was digested with the KpnI and BamHI restriction enzymes (Thermo Fisher Scientific), which cut the NADL-2 virus from the vector. In each case the viral genomes were isolated from a 0.7% agarose gel with the Nucleospin Extract II kit (Macherey-Nagel, Dueren, Germany).

In vitro CpG methylation
In each reaction 3 µg of viral DNA was methylated with CpG methylase (Zymo Research, Irvine, CA, USA) according to the manufacturer's instructions. The reaction was stopped by ethanol precipitation, and the DNA was washed with 70% ethanol, dried and dissolved in 20 µl of distilled water. The success of hypermethylation was checked by digestion of an aliquot with the methylation-sensitive SsiI (Thermo Fisher Scientific) restriction enzyme in a 20 µl final volume.

Determination of expression levels of DNA methyltransferases in infected and non-infected tissues
The expression levels of porcine DNMT1, DNMT3a and DNMT3b were determined by real-time PCR and normalized to those of the GAPDH gene. RNAs were purified from PPV infected (MOI 3) and mock infected PT cells grown in 75 cm² flasks 24 hours post infection (HPI) with RiboZol™ RNA Extraction Reagent (Amresco, Solon, OH, USA) according to the manufacturer's instructions. The RNA was dissolved in 15 µl of DEPC-treated water. First-strand cDNA synthesis and amplification were performed using the One-Step RT-PCR Kit (Qiagen). The reactions included 5 µl of 5X QIAGEN OneStep RT-PCR Buffer, 1 µl of dNTP mix, 1 µl of each 15 pmol/µl primers (
The peroxidase was revealed using the TMB colorimetric substrate (MIKROGEN, Neuried, Germany) according to the supplier's recommendations. Bands were quantified with the ImageJ program [38].
For immunofluorescence detection, DNMT1 (H-300) and CF488A labeled goat anti-rabbit antibodies were used in 50- and 300-fold dilutions, respectively, and the samples were examined with a Zeiss Axio Observer D1 inverted fluorescence research microscope.

GC and CpG content of Parvoviruses
To better understand the common and distinguishing features of PPV genome organization among parvoviruses, the GC content and CpG density of 32 parvoviruses from the Parvovirinae subfamily were calculated and compared. The GC content of parvoviral genomes ranges between 35% and 63%. In general, self-replicating parvoviruses have AT-rich genomes (GC content < 50%), while most of the adeno-associated viruses have GC-rich genomes (GC content > 50%) (Figure 1).
In self-replicating viruses the observed/expected CpG ratio (oCpGr) is variable: it can take very low values (like in the case of PPV), or high values (as it can be seen in the CaMiV genome). Dependoviruses are more uniform regarding CpG content: in each case the oCpGr values stay above 60% with the exception of MDPV (which is an autonomous member of the Dependovirus genus) and can reach more than 100%.
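The two quantities compared in this section can be computed with a few lines of Python. The text does not give the formula used for oCpGr; the sketch below assumes the commonly used definition, observed CpG count times sequence length divided by the product of the C and G counts, and the function names are illustrative only.

```python
def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def observed_expected_cpg(seq):
    """Observed/expected CpG ratio.
    Assumption: expected CpG count = (C count * G count) / length."""
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    # 'CG' cannot overlap itself, so str.count gives the exact number.
    return seq.count("CG") * len(seq) / (c_count * g_count)

if __name__ == "__main__":
    toy = "ATGCGTTAACGTATTTAGCA"  # hypothetical sequence, not a viral genome
    print(f"GC content: {gc_content(toy):.2f}")
    print(f"oCpGr: {observed_expected_cpg(toy):.2f}")
```

Values are returned as fractions, so an oCpGr of 1.0 corresponds to the 100% mentioned in the text.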
Considering the GC content and the oCpGr values, three major categories can be distinguished among the members of Parvovirinae subfamily: parvoviruses with low GC content (< 50%) and low oCpGr values (< 50%) (Group I), viruses with low GC content (< 50%) and high oCpGr values (> 50%) (Group II) and viruses with high GC (> 50%) content and high oCpGr (>

CpG distribution in PPV and in the coding regions of parvoviruses
Not only the number of CpGs but their distribution in the genome is different among parvoviruses. Very few CpG islands can be found in members of the first group and usually they are restricted to the terminal regions, while several potential CpG islands can be plotted in every member of the latter two groups scattered throughout the genome including both the coding and non-coding regions (Figure 2).
The NADL-2 strain of PPV contains 60 CpGs: 13 in the left and 7 in the right non-coding regions, and 40 in the protein coding sequences. The distribution of CpGs in the coding frames is not random. Only three CpGs can be found in the first coding position (CGX), in arginine codons, 13 in the second position (XCG) (in serine, proline, threonine and alanine codons) and 24 in the third position (XXC GXX), affecting two amino acid codons (Figure 3a). A similar ascendant tendency of CpG distribution by position can be observed in the majority of the parvoviruses, but also in the viral hosts, for example in Homo sapiens (Figure 3b). Since the distribution is independent of the absolute number of CpGs in the viral genomes and a similar distribution can be detected in eukaryotic organisms that do not have CpG methylation (e.g. Drosophila melanogaster) [39], the particular pattern of CpG distribution is more probably the effect of coding bias and coding preference than of some unknown evolutionary effect of the methylation machinery of the different organisms.
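The positional classification used above (CGX, XCG, XXC GXX) can be reproduced with a small Python function. This is a sketch, not the custom scripts used in the study; it assumes an in-frame coding sequence starting at the first base of a codon, and the function name is illustrative.

```python
def cpg_by_codon_position(cds):
    """Count CpG dinucleotides by the codon position of their cytosine:
    1 = CGX, 2 = XCG, 3 = XXC GXX (CpG spanning two codons).
    Assumption: cds is in frame, starting at codon position 1."""
    cds = cds.upper()
    counts = {1: 0, 2: 0, 3: 0}
    for i in range(len(cds) - 1):
        if cds[i:i + 2] == "CG":
            counts[i % 3 + 1] += 1
    return counts

if __name__ == "__main__":
    # Hypothetical sequence with one CpG in each position:
    # CGA (first), ACG (second), AAC GAA (third).
    print(cpg_by_codon_position("CGAACGAACGAA"))
```

For the NADL-2 counts reported above, the three positions sum to 3 + 13 + 24 = 40, the number of CpGs in the protein coding sequences.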

Biological effects of additional CpGs in the PPV genome
The small number of CpGs in the PPV genome implies an evolutionary pressure against this dinucleotide. To study the biological effects of the elevated CpG ratio, seven mutants were created in which new CpG sites were inserted into three CpG free regions of the VP2 gene by site-specific silent mutations of the pN2 infectious clone. The number of the new

Methylation status of PPV
To establish the methylation status of the PPV genome, a bisulfite conversion based PCR protocol was used, which allows independent analysis of the methylation of the negative and the positive strands. To cover all CpG sites on the full genome, two sets of PCR primers (13 pairs for the negative strand and 11 pairs for the positive strand) were designed. First, the encapsidated negative strand was analyzed from virions originating from permissive (PT) cells at 20 HPI, semi-permissive (Cos7) cells at 96 HPI, and aborted pig embryos. In each case the investigated DNAs were highly hypomethylated (Table 4, Table S2). PPV encapsidates only the negative strand, and the positive strand is found almost exclusively in the replicative form of PPV DNA [40]. To exclude the possibility that the hypomethylation pattern of the viral DNA results from specific encapsidation of unmethylated DNA, the methylation pattern of the cellular PPV DNA purified from PPV infected PT cells (20 HPI) was also determined (Table 4). Like the encapsidated negative strand, the positive strand proved to be hypomethylated, suggesting that PPV DNA remains hypomethylated during the entire life cycle of the virus, including replication and packaging. About 1-2% of the CpG sites on the cloned bisulfite treated PPV fragments remained resistant to C to T conversion, indicating a rare occurrence of methylation on the PPV DNA. To gain more accurate data about the methylation level, the bisulfite treated PCR fragments were deep sequenced. Around 168000 PCR fragments were analyzed with 0-22619 coverage of the CpG sites. Most of the CpG sites had more than 92% conversion frequency, while around 10% of the CpG sites had less than 92% conversion rate, indicating that low level CpG methylation occurs on replicating PPV DNA (Figure 5).
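The per-site conversion frequency referred to above can be illustrated with the following Python sketch. It assumes that, for each CpG cytosine, the number of reads showing T (converted, unmethylated) and C (unconverted, putatively methylated) has already been extracted from the read mapping; the data layout, the function names and the example counts are hypothetical. Only the 92% threshold comes from the text.

```python
def conversion_frequency(converted_reads, unconverted_reads):
    """Fraction of reads in which the CpG cytosine was read as T.
    Returns None for sites without coverage."""
    total = converted_reads + unconverted_reads
    if total == 0:
        return None
    return converted_reads / total

def sites_below_threshold(site_counts, threshold=0.92):
    """Positions whose conversion frequency is below the threshold.
    site_counts: dict mapping position -> (T reads, C reads)."""
    flagged = []
    for position in sorted(site_counts):
        freq = conversion_frequency(*site_counts[position])
        if freq is not None and freq < threshold:
            flagged.append(position)
    return flagged

if __name__ == "__main__":
    # Hypothetical read counts for three CpG sites.
    counts = {100: (950, 50), 200: (800, 200), 300: (0, 0)}
    print(sites_below_threshold(counts))
```

Sites with zero coverage are skipped rather than counted as unconverted, since the reported coverage range starts at 0.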
Most probably there is no immediate effect of such a low level methylation on PPV replication, however, even low level of methylation can drive the purge of CpGs from the PPV genome during a long period of time, since methylated cytosines are mutational hotspots [41,42].
In fact, analysis of single nucleotide polymorphisms (SNPs) of the PPV genome based on the available sequences of the DNA databank supports the high mutability of CpG sites in the PPV genome. On the investigated coding region, the ratio of SNPs in the CpG sites (17 mutations, 38 CpG sites)    Figure 6). These data, together with our observation that extra CpG sites in PPV do not interfere with PPV replication, raise the possibility that not only natural selection but also mutational pressure might contribute significantly to the low level of CpG sites in the PPV genome.

The effect of methylation on PPV replication
To investigate the effect of CpG methylation on the PPV replication, PPV genomes were in vitro methylated, transfected and their replication initiation capability was compared to that of the bacterially cloned (DAM DCM methylated) and PCR amplified (non-methylated) PPV genomes. In vitro methylation was executed by M.SssI CpG methylase and in each case almost complete CpG methylation (>95%) of the PPV genomes could be achieved, shown by the digestion of the treated DNAs with the SsiI methylation sensitive restriction endonuclease (data not shown). The CpG methylated DNAs and their non CpG methylated counterparts were transfected into PT cells and their virus replication initiation capability was monitored by IF assay at 24 hours post-transfection using 3C9 PPV specific monoclonal antibody. In each case the CpG methylated DNA induced around 62% less viral infection in PT cells than the non CpG methylated control DNAs, indicating that CpG methylation has a relatively modest inhibitory effect on PPV replication (Figure 7a-b). This result was confirmed by qPCR analysis of the supernatant of the transfected cells (Figure 7c). Interestingly, non-methylated PPV dsDNA (PCR amplified) was less effective to initiate viral replication than the bacterially DAM/DCM methylated dsDNA.
The progeny viruses from cells transfected with CpG methylated PPV DNA were collected, and the methylation pattern of their DNA was examined. The in vitro hypermethylated status of the transfected PPV genome could not be detected on the investigated fragments of the genome of the progeny viruses; instead, they proved to be hypomethylated, similarly to the genome of the native virus (Table S2d).
To gather additional evidence on the sustainability of the hypomethylated status of the PPV genome, the clone of the M123 mutant was also transfected into PT cells and the methylation pattern of the progeny virus was examined. The 29 newly introduced CpG sites also remained hypomethylated, similarly to the "wild type" CpG sites of the native virus (Table S2e).
These experiments strongly suggest that, besides the maintenance DNMT1, the de novo DNMT3a and DNMT3b methylases cannot methylate replicating PPV DNA effectively either, and that hypomethylation is not restricted to the existing CpG sites: it is a generalized process which extends to the full PPV genome.

Influence of the PPV infection on the transcription, translation, and cellular distribution of DNMT proteins
Hypomethylation of PPV DNA must involve a decreased activity of the cellular DNMTs on PPV DNA and/or a weak susceptibility of the PPV DNA to their action. Absence, inhibition or elimination of DNMTs in infected cells, different compartmentalization of methylases and viral DNA, or the inability of methylases to recognize PPV DNA as a substrate can, singly or synergistically, lead to hypomethylation of the viral genome. To study the direct reasons behind the hypomethylated status of PPV DNA we have investigated the    (Figure 8b-c).
The effect of overexpression of DNMT3a on PPV methylation was also investigated. To ensure the coexpression of the PPV genome and the DNMT3a protein, pN2 and the human DNMT3a expressing pcDNA3/Myc-DNMT3A [43,44] plasmid were co-transfected into PT cells. Co-transfection of the two plasmids resulted in around three times fewer infectious foci (data not shown) than the co-transfection of pN2 and the pDSREDmonomer-N1 plasmid, which expressed the dsRED protein as a control, indicating an inhibitory effect of DNMT3a on PPV replication. Overexpression of DNMTs has been shown to change the pattern of gene expression in cells and to influence the cell cycle [45,46]. It therefore cannot be excluded that the viral inhibition by the overexpressed DNMT3a is due to an indirect effect on host cell regulation rather than to direct methylase activity on the viral DNA, especially because bisulfite sequencing of the rescued viruses that emerged from pcDNA3/Myc-DNMT3A and pN2 co-transfection revealed a hypomethylated status of the viral DNA (Table S2f), demonstrating that DNMT3a, even when present in excess, cannot effectively methylate replicating PPV DNA.

Discussion
Investigation of the GC and CpG contents of 32 parvoviral genomes revealed three distinct groups. Dependoviruses represent a group with high GC and high oCpGr values. As recognized earlier, the relatively CpG-rich sequence of dependoviruses makes them an out-group among similarly sized DNA viruses [9,47]. Just like large DNA viruses, they are much less biased against CpG dinucleotides than the small DNA viruses.
The majority of the known autonomously replicating parvoviruses belong to a different group with opposing characteristics manifesting low GC and low oCpGr values. However, based on our findings, several recently described autonomous parvoviruses seem to differ significantly from this group and feature relatively high oCpGr values combined with low GC content.
This finding somewhat qualifies the conclusion which emerged from earlier analyses of viral sequences, namely that small DNA viruses, parvoviruses among them, are extremely biased against the CpG dinucleotide and that only adeno-associated viruses are exceptions to this rule [9,47].
The methylation patterns of a few DNA viruses have been studied in detail. The hypomethylated state of the replicating PPV genome fits very well with what has been described for replicating adenoviruses [48,49] and papilloma and polyoma viruses [50,51,52,53], and reinforces the emerging view that the genome of small DNA viruses remains hypomethylated during replication. One possible explanation for this phenomenon is the association of the viral DNA with host and viral proteins during rapid viral replication and encapsidation, which may simply prevent interactions with DNMTs. However, the lack of (so far unknown) cis signals for de novo methylation or the presence of chromatin insulator sequences [54,55,56] might also contribute to the maintenance of hypomethylation in the PPV genome. This notion is supported by the fact that while some episomal adenoviral constructs become rapidly de novo methylated [57], others, like episomal recombinant AAV and even some bacterial plasmid constructs, remain hypomethylated for a long time after introduction into mammalian cells, despite being replication incompetent [58,59]. The inverted terminal repeat (ITR) of AAV is suspected to be an insulator [60] and most probably has a role in keeping AAV recombinant constructs hypomethylated and transcriptionally active [61]. However, whether the ITRs of PPV or any other parvovirus function similarly remains to be investigated.
Interestingly, complete in vitro CpG methylation of PPV genome has a moderate inhibitory effect (around 62% decrease) on PPV replication initiation. The decrease was found very reproducible in several experiments and independent of the original methylation status of the treated DNA. These experiments indicate a moderate sensitivity of the PPV genome to methylation. This result is somewhat contradictory to human parvovirus B19 studies, since B19 promoter and replication seem to be very sensitive to CpG methylation [62]. However, B19 has a CpG island and 42 CpGs in its terminal 520 bp promoter/enhancer region overlapping with Sp1, Sp3 and viral NS protein binding elements, while PPV has no CpG island and has only 12 CpGs in its 190 bp long P4 promoter/enhancer region and no CpGs at all in its 140-bp-long P40 promoter region. Recent investigations revealed that the relationship between the methylation status and transcriptional activity of a gene is more complicated than it was originally thought [4,63,64] and different promoters react differently to methylation [63,65]. The different sensitivity of the two parvoviruses to methylation can be explained by the different CpG content of their promoters since the methylation of CpG-poor promoters frequently does not preclude transcription [63] while the methylation of CpG island promoters usually suppresses their activity [4,65].
In contrast to viruses from other viral families (e.g. hepatitis B virus, Marek's disease virus, Kaposi's sarcoma-associated herpesvirus and EBV) [66,67,68,69], PPV infection does not influence the mRNA or the protein level of the DNA methylases and does not seem to change the localization of DNMT3a. It therefore appears that replicating PPV DNA is a weak substrate of the host's DNA methylases and that even overexpressed DNMT3a cannot raise the PPV methylation level significantly.
The genomes of the majority of the autonomous parvoviruses, PPV among them, are highly GC and CpG depleted, similarly to those of their hosts. It is widely assumed that the main reason behind CpG depletion in small DNA viruses and RNA viruses is natural selection arising from replicative advantage and/or immune escape [9,47]. However, our findings that the introduction of additional CpGs into the PPV genome has no measurable biological effect (no disadvantage) and that in vitro hypermethylation does not significantly inhibit replication initiation of PPV argue against a replicative advantage of CpG depletion. A recent publication about the failure of members of the Parvoviridae family to elicit a TLR9 activated interferon response in plasmacytoid dendritic cells [26] questions the existence of immunological pressure against CpGs in parvoviral genomes. The ascendant distribution of CpGs by position does not support the presence of immunological pressure against CpGs in parvoviruses either, because under such pressure the number of first position CpGs would be expected to exceed the number of second or third position CpGs (the degenerate code makes it easy to change the C and G of the CpGs in third and second positions without amino acid changes, while point mutations of first position CpGs inevitably come with amino acid changes which could be harmful to the virus).
These data together with our observation that CpG sites are more mutable than GC or C and G sites in the PPV genome suggest that mutational pressure could have a more significant role in the formation of PPV genome than selective forces.
In vertebrate genomes CpG is the most mutable dinucleotide and its occurrence is about 20% of the expected frequency [5,6]. The high mutability of CpGs and their frequent conversion to TG and CA are largely attributed to mutational pressure caused by the spontaneous deamination of 5-methylcytosine. The low level methylation of the PPV genome, combined with the high mutation rate of parvoviruses [70,71] and the single-stranded DNA genome, which is very susceptible to deamination [72,73], may explain the higher mutability of CpGs and their loss from the PPV genome through countless generations. In support of this argument, not only the transition but also the transversion rates of the CpG sites are a few times higher in mammals [74,75,76], and non-methylated CpGs still have an approximately three times higher overall mutation rate in the human genome than non-CpG nucleotides [77]. These observations suggest a deamination-independent intrinsic mutability of CpGs in the mammalian genome, and raise the possibility that mutational pressure originating from the host replicative machinery may have a significant influence on CpG suppression, at least in small DNA viruses which lack their own DNA polymerases.
Similarly to other organisms, mutational pressure, genetic drift and natural selection together shape the genome of parvoviruses. Our in vitro investigations of different CpG mutants of PPV were not able to reveal any significant evolutionary advantage of CpG depletion in viral replication at the cellular level. Future in vivo experiments and monitoring of our extra CpG mutants at the organism level may help to clarify how much of the loss of CpGs from the PPV genome is the consequence of selective or other evolutionary forces.

Table S1. Accession numbers of the viral sequences. 1a, Accession numbers and abbreviations of the investigated parvoviral sequences. 1b, Accession numbers of the PPV sequences. (DOC)