The Genome of Chelonid Herpesvirus 5 Harbors Atypical Genes

The Chelonid fibropapilloma-associated herpesvirus (CFPHV; ChHV5) is believed to be the causative agent of fibropapillomatosis (FP), a neoplastic disease of marine turtles. While clinical signs and pathology of FP are well known, research on ChHV5 has been impeded because no cell culture system for its propagation exists. We have cloned a BAC containing ChHV5 in pTARBAC2.1 and determined its nucleotide sequence. Accordingly, ChHV5 has a type D genome and its predominant gene order is typical for the varicellovirus genus within the alphaherpesvirinae. However, at least four genes that are atypical for an alphaherpesvirus genome were also detected, i.e. two members of the C-type lectin-like domain superfamily (F-lec1, F-lec2), an orthologue to the mouse cytomegalovirus M04 (F-M04) and a viral sialyltransferase (F-sial). Four lines of evidence suggest that these atypical genes are truly part of the ChHV5 genome: (1) the pTARBAC insertion interrupted the UL52 ORF, leaving parts of the gene to either side of the insertion and suggesting that an intact molecule had been cloned. (2) Using FP-associated UL52 (F-UL52) as an anchor and the BAC-derived sequences as a means to generate primers, overlapping PCR was performed with tumor-derived DNA as template, which confirmed the presence of the same stretch of “atypical” DNA in independent FP cases. (3) Pyrosequencing of DNA from independent tumors did not reveal previously undetected viral sequences, suggesting that no apparent loss of viral sequence had happened due to the cloning strategy. (4) The simultaneous presence of previously known ChHV5 sequences and F-sial as well as F-M04 sequences was also confirmed in geographically distinct Australian cases of FP. Finally, transcripts of F-sial and F-M04 but not transcripts of lytic viral genes were detected in tumors from Hawaiian FP-cases. Therefore, we suggest that F-sial and F-M04 may play a role in FP pathogenesis.


Introduction
The Chelonid fibropapilloma-associated herpesvirus (CFPHV; Chelonid herpesvirus 5, ChHV5) is strongly associated with fibropapillomatosis (FP), a neoplastic disease of marine turtles [1]. Based on previous partial sequencing of its genome, ChHV5 has been taxonomically assigned to the subfamily alphaherpesvirinae [1,2,3,4]. FP was first described in the 1930s in green sea turtles (Chelonia mydas) from Florida [5]. Since then, FP has reached epizootic proportions, affecting at least five more marine turtle species worldwide, including the east and west coast of both Northern and Southern America, the Caribbean, Australia, Asia, and the Hawaiian Islands (reviewed in [6,7]). In Florida, the prevalence of FP ranges from 11 to 52% [8]. FP is characterized by the formation of fibroepithelial tumors growing in the skin, in periorbital and orbital tissues, and in Hawaii, the oral cavity [7,9]. Furthermore, tumors may grow in multiple visceral sites, including lung, liver, kidney, heart and gastrointestinal tract [10]. Tumor size may vary from 0.1 to more than 30 cm in diameter [10]. As a result of indirect effects of tumor growth, affected turtles become emaciated, bacteremic, immunosuppressed, and anemic [11,12]. However, severe debilitation has also been observed with turtles carrying only few external or internal tumors [10]. Histologically, skin tumors are characterized as fibropapillomas [13], whereas internal tumors are characterized as fibromas, myxofibromas or fibrosarcomas of low grade malignancy [10]. All these tumor-types are ChHV5 DNA-positive and have extensive collagen deposition in the extracellular matrix with myxofibromas having deposition of sulfated proteoglycans between collagen fibers [10]. At present, it remains unclear, whether these alterations to the extracellular matrix are being caused by altered metabolism of the fibroblasts by transformation or directly by the function of unidentified viral enzymes that may be secreted from infected cells. 
Rarely, amphophilic intranuclear inclusion bodies, compatible with herpesvirus, are noted in the epidermal layers of tumors [14] suggesting that viral replication is restricted to certain areas of the tumor and, probably, to certain cell types. Furthermore, perivascular mononuclear cell infiltration has been observed frequently in the dermal layers of tumors [13].
While descriptions of the clinical signs, pathology, and pathogenesis of FP are numerous, research on the FP-associated herpesvirus itself has been impeded by the fact that no cell culture system for propagation of ChHV5 exists [15] and only a fraction of its genome has been sequenced, with all known sequences confined to the putative Unique Long (UL) fragment of the genome [1,3,4,7]. Koch's postulates have not yet been fulfilled to identify ChHV5 conclusively as the one and only causative agent of FP, although transmission of FP by using cell-free tumor material has been reported [16]. Having access to more sequence information about ChHV5 would be valuable in that it may open the door to elucidation of molecular mechanisms of pathogenesis as well as to the development of diagnostic tests, antiviral treatments, and vaccines.
The purpose of the research described here was to (1) generate a Bacterial Artificial Chromosome (BAC) [17,18,19] of ChHV5 and (2) expand the knowledge on ChHV5 genomic sequences [1]. Indeed, we found that the ChHV5 genome is largely collinear with the genomes of typical alphaherpesviruses but that it also comprises a number of unexpected features.

Identification of a BAC containing known ChHV5 sequences
A BAC library was created from the glottis tumor of a Hawaiian green turtle with FP as described in Materials and Methods. With an estimated average insert size of 130 kbp (not counting the pTARBAC sequences), the library covered approximately two turtle genome sizes. Previously published sequences of ChHV5 (AF035003 [3], AY395516 [4], AF355149, AF355148, EF555201, AY644454 [1]) were used to generate three different hybridization probes for screening within the library for clones containing ChHV5 sequences. As a result, probes generated from UL30, UL22, and UL12 identified one single positive clone, which was designated CH-651-60O9. Presence of additional known ChHV5 sequences on the same clone was also confirmed by PCR, where primers for amplification of fragments from UL28 and UL10 were used (Table 1). A shotgun library was established from the BAC and subjected to DNA sequencing. The resulting sequences suggested that BAC CH-651-60O9 most probably contained the entire genome of ChHV5.

Sequence of BAC CH-651-60O9
Apart from the pTARBAC sequences, BAC CH-651-60O9 comprised a total of 132,233 bp of DNA, which could be divided into a unique long sequence (UL; 101,152 bp), a unique short sequence (US; 13,319 bp), and inverted repeat sequences (IRS; 8,831 bp each), which flanked the US sequence. Thus, the cloned ChHV5 sequences presented themselves in the configuration of a herpesvirus Type D genome, as is found in the genus varicellovirus of the alphaherpesvirinae (Figure 1). The entire viral sequence was deposited as GenBank HQ878327. Since the entire sequence had been derived from a circular molecule, the numbering of the nucleotides was pragmatically started at one of the repeats and continued over US to the second (thus internal) repeat sequence, and finally along the UL sequence. This order is also maintained below to briefly describe some of the most interesting features of the sequence. Open reading frames (ORFs) that showed a recognizable similarity to known herpesvirus proteins by BLAST analysis were named according to the current herpesvirus gene nomenclature and given the prefix "F" for fibropapilloma.

Repeats
Analysis of the repeated sequences revealed 12 ORFs with more than 40 codons, which encoded 15 different potential proteins (Table 2). Six ORFs were oriented in each direction. Three loci consisted of overlapping ORFs. Three translated ORFs showed restricted homology to proteins encoded in the repeats of other herpesviruses. One of them showed a remote similarity to ICP0, and the second exhibited a limited resemblance to ICP4. Both of these exhibited a signature sequence that has been associated with nucleotidylylation of the corresponding proteins [20]. Interestingly, the third one showed some similarity to the latency-associated nuclear antigen (LANA) encoded in gammaherpesvirus genomes. Four ORFs (HP6, HP7, HP11, HP13) extended into the US sequence, thus encoding two pairs of putative proteins with identical N-termini and differing C-termini. One ORF (HP12) started from within the US, whereas its counterpart (HP8) was contained within the repeats but shared C-terminal identity with HP12.
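The ORF criterion used above (more than 40 codons) can be illustrated with a minimal six-frame scan. This is only a sketch under stated assumptions: an ORF is taken to run from the first ATG in a frame to the next in-frame stop codon, and the function names and toy sequence are hypothetical. It is not the annotation procedure used for the BAC sequence.

```python
# Minimal six-frame ORF scan (illustrative; not the authors' annotation pipeline).
# Assumption: an ORF starts at the first ATG in a frame and ends at the next in-frame stop.
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=40):
    """Return (strand, start, end, n_codons) for ORFs with more than min_codons codons.

    Coordinates are 0-based, half-open, and refer to the strand being scanned.
    n_codons counts the start codon but not the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    n_codons = (i - start) // 3
                    if n_codons > min_codons:
                        orfs.append((strand, start, i + 3, n_codons))
                    start = None
    return orfs


# Toy example: one 46-codon ORF on the forward strand.
toy = "ATG" + "GCT" * 45 + "TAA"
print(find_orfs(toy))
```

Overlapping ORFs in alternative frames, as described for three loci in the repeats, are reported separately by such a scan because each frame is examined independently.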

Unique Short sequence
Eleven ORFs were identified within the US sequence, 5 in one direction and 6 in the opposite direction ( Table 3). Several of the predicted proteins did not show close relationship to known proteins of other herpesviruses. However, there were homologs to US10, US8 and US3 of other herpesviruses. The F-US10 protein was predicted according to its similarity to the virion protein US10 of a waterbird herpesvirus (Anatid herpesvirus 1). The protein encoded by F-US8 is apparently related to several glycoprotein E (gE) variants of different herpesviruses, including equine herpesvirus 9 as well as bovine and phocid herpesviruses. Interestingly, two different genes in the US of ChHV5 showed a relatively strong homology to the herpesvirus protein kinase US3; they were designated F-US3A and F-US3B. Moreover, the gene designated F-US4 encodes for a glycoprotein that shares a conserved C-11n-C-10n-C motif with the gG, gD, gI family of glycoproteins mapping in the US of various herpesviruses [21].
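A search for the conserved cysteine spacing mentioned above can be sketched with a regular expression. The reading of "C-11n-C-10n-C" as a cysteine, 11 arbitrary residues, a cysteine, 10 arbitrary residues, and a cysteine is an assumption made here for illustration, and the toy peptide is invented; it is not the F-US4 sequence.

```python
import re

# Assumption: "C-11n-C-10n-C" = Cys, 11 arbitrary residues, Cys, 10 arbitrary residues, Cys.
# A lookahead is used so that overlapping occurrences are all reported.
CYS_MOTIF = re.compile(r"(?=C.{11}C.{10}C)")


def find_cys_motif(protein):
    """Return 0-based start positions of the assumed C-11n-C-10n-C motif."""
    return [m.start() for m in CYS_MOTIF.finditer(protein.upper())]


# Hypothetical toy peptide containing one motif starting at position 2.
toy_peptide = "MK" + "C" + "A" * 11 + "C" + "G" * 10 + "C" + "LL"
print(find_cys_motif(toy_peptide))
```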
Although none of the predicted proteins showed direct homology to the immediate early proteins 22 or 27, several candidates were detected, for which nuclear localization was predicted. For example, an N-terminal RNA-binding domain and similarity to RNA-binding splicing factor was predicted for F-US12, which might, therefore, contribute similar functions as either ICP27 or ICP22 of other alphaherpesviruses.

Unique long sequence
The UL sequence was mostly collinear with the genomes of typical alphaherpesviruses. A total of 76 ORFs were identified within the UL sequence, 40 in the forward direction and 36 in the opposite direction (Table 4). Of these, 46 ORFs showed strong similarities to UL genes of other herpesviruses. However, homologs to some UL ORFs of other herpesviruses were missing, i.e. UL13, UL40, UL44 through UL51, and UL54 through UL56. Interestingly, each one of those "missing genes" has been reported to be non-essential for viral viability in one or another comparable herpesvirus [22]. In addition, at least four genes were detected that are not typically associated with alphaherpesviruses but may be found in beta- or gammaherpesviruses. Moreover, 26 hypothetical genes were identified.
Are the newly detected unconventional genes truly part of the viral genome?
According to the sequence determination, the pTARBAC sequences had inserted into an EcoRI site within the F-UL52 open reading frame. With parts of UL52 flanking the insertion, it was assumed that the BAC comprised the entire viral genome. However, while conventional UL genes extended on one side of the UL52 ORF, we detected on the other side of the same gene a row of ORFs that were rather unexpected in the context of an alphaherpesvirus genome (Table 4; Figure 2A). The first one was similar to cytomegalovirus M04 (F-M04, 270 aa), the second and third one showed similarity to the C-type lectin domain family (F-lec1, 178 aa; F-lec2, 176 aa), and the fourth one carried the signatures of a sialyltransferase (F-sial, 320 aa). Thus, the question arose, whether or not the newly detected unconventional genes were truly part of the ChHV5 genome.
In a first approach to address this, overlapping PCR, starting from within UL52, was performed. DNA from tumor materials of independent FP cases was used as template, while BAC CH-651-60O9 DNA and DNA from unaffected tissue of the same turtles served as controls. The primers used are given in Table 5, while the PCR strategy together with the expected map locations and sizes of the amplification products are indicated in Figure 2B. The primer pair P1F and P4R gave a product of the expected 396 bp size solely when DNA extracted from tumors was used as template (Figure 3A). Due to the pTARBAC insert, the amplification product with the BAC DNA template would have been larger than 10,000 bp, which exceeded the limits of our PCR conditions and did not result in a visible product (not shown). Due to lack of template, no PCR product arose when DNA from normal tissue was used (not shown). However, the other primer pairs gave products of the expected sizes with either BAC CH-651-60O9 DNA (not shown) or DNA extracted from tumors (Figure 3A) but not with DNA from normal tissue (not shown). To confirm the suitability of the primers P1F and P4R in the context of our BAC DNA, they were paired with P2R and P3F, respectively, which provided amplification products of the expected size with either DNA extracted from tumors (not shown) or BAC CH-651-60O9 DNA (Figure 3B) but not with DNA from normal tissue as template (not shown). Together, these data strongly suggest that these unconventional genes are actually part of the ChHV5 genome. They also support the notion that the BAC CH-651-60O9 comprises the entire genomic DNA of ChHV5.
The F-sial and F-M04 genes are expressed in tumor tissue
Tumor as well as matching normal tissue from fresh cases of FP was collected and RNA was extracted for RT-PCR as described in Materials and Methods. The primers used for this experiment, designed to target either the F-sial or the F-M04 gene, are listed in Table 6. Using this approach, the corresponding RNAs for both F-sial (Figure 4A, B) and F-M04 (Figure 4C) were detected in different tumors but not in the corresponding normal tissue that had been collected just outside of the tumor area of the same animal. Upon omission of the RT step, no product was generated, whereas the polymerase used was shown to amplify DNA template (Figure 4B, C). Subsequently, the amplification products were cloned by recombination into a Gateway donor vector and the nucleotide sequence of each insert was determined. Each sequence matched the corresponding prediction exactly (data not shown). In contrast to F-sial and F-M04, the question regarding transcription products for F-UL52, F-lec1, or F-lec2 was not addressed. However, various RNAs extracted from different tumors were also analyzed for transcripts of the envelope glycoprotein gB (F-UL27), the capsid protein VP26 (F-UL30), the capsid portal protein (F-UL6), and the viral thymidine kinase (F-UL23), all with negative results (data not shown). Thus, this series of experiments indicated that both F-sial and F-M04 were expressed in FP tumors, while genes for components related to viral replication were not expressed. Moreover, the sequences of these two newly discovered unconventional viral genes matched exactly the sequences determined from the BAC CH-651-60O9 DNA.
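The reasoning behind the expected amplicon sizes can be sketched as a simple in-silico PCR. The primer and template sequences below are invented toy strings (the real primers are listed in Table 5), and the function names and the size limit are hypothetical. The sketch only illustrates why an insertion between two primer sites lengthens, and can effectively abolish, the product, and why a template lacking the primer sites yields none.

```python
# Toy in-silico PCR (illustrative only; primer and template sequences are invented).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon_size(template, fwd_primer, rev_primer):
    """Return the expected product length in bp, or None if a primer site is absent."""
    template = template.upper()
    fwd = fwd_primer.upper()
    rev_site = reverse_complement(rev_primer.upper())
    start = template.find(fwd)
    if start < 0:
        return None
    end = template.find(rev_site, start + len(fwd))
    if end < 0:
        return None
    return end + len(rev_site) - start


def visible_product(size, max_len):
    """A product is expected only if both sites exist and the size is within max_len."""
    return size is not None and size <= max_len


fwd, rev = "ATGCCGTA", "TTGACCGA"            # hypothetical primers
site = reverse_complement(rev)                # where the reverse primer anneals
tumor_like = "GG" + fwd + "A" * 20 + site + "CC"
with_insert = "GG" + fwd + "A" * 10 + "T" * 50 + "A" * 10 + site + "CC"
normal_like = "GGGGCCCCAAAATTTT"              # no primer sites

for name, template in [("tumor-like", tumor_like),
                       ("with insert", with_insert),
                       ("normal-like", normal_like)]:
    size = amplicon_size(template, fwd, rev)
    print(name, size, visible_product(size, max_len=60))
```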

Discussion
While it is almost standard procedure to clone BACs from replication-competent herpesviruses, this is, to our knowledge, the first time that a genomic herpesvirus BAC has been created directly from infected tissue and in the absence of any means to propagate the agent in culture [7,18,19,23,24,25]. Total DNA was extracted from PCR-positive turtle tumors, partially digested, size selected and cloned into a BAC vector [26]. One out of 18,000 BAC clones was identified as carrying known ChHV5 sequences, both by hybridization with probes against UL30, UL22, and UL12 and by PCR with primers for detection of UL28, UL27 (gB), and UL10 (data not shown).
The nucleotide sequence determination of the BAC revealed that the overall structure of the cloned molecule corresponded in its size (132,233 bp) and configuration (TRS-US-IRS-UL) to a Type D genome, which is typical for members of the varicellovirus genus of the alphaherpesvirinae. Indeed, the largest part of the molecule encoded proteins that are known to be typical for those viruses (Tables 2, 3, 4). The sequences of several fragments of the ChHV5 UL sequence were previously deposited by others and us in GenBank and were available for comparison, revealing conservation in the range of 98 to 100%. In addition, corresponding homologues were easily detected by BLAST analysis. However, 38 potential coding sequences were also found that did not readily provide an alphaherpesvirus-related counterpart upon BLAST analysis. With a few exceptions (addressed further below), those ORFs were designated as potentially coding for hypothetical proteins (HP). The sequence analysis also revealed that pTARBAC had indeed inserted into the UL52 gene of ChHV5 (F-UL52), a gene that belongs to the helicase/primase complex, which is essential for replication of herpesviruses [27,28].
Figure 2 (caption, truncated in source). [...] Table 5) listed on the left was expected to yield the products listed on the right. Double-arrows indicate the putative map location of the PCR product; the numbers refer to the expected sizes of the amplicons. doi:10.1371/journal.pone.0046623.g002

Comparison of the predicted F-UL52 protein to its herpesvirus orthologs
A P-BLAST search of the predicted F-UL52 translation product suggested a relationship to UL52 proteins from various animal alphaherpesviruses, including equid (EHV1), bovid (BoHV1), suid (SuHV1), meleagrid (MeHV1), and gallid (GaHV1, 2, 3) herpesviruses. A more distant relationship to betaherpesviruses was also revealed, for example to human (HCMV) and chimpanzee (PaHV2) cytomegaloviruses.
Upon alignment and tree formation of the UL52 aa sequences using the neighbor-joining algorithm, the newly discovered F-UL52 species emerged on the same branch as bovid and suid alphaherpesviruses and cytomegaloviruses, whereas avian alphaherpesviruses clustered on a different branch (Figure S1). Upon pairwise comparison of the individual aa sequences (Table S1), the level of aa identity among the different herpesvirus UL52 sequences varied mostly between 31 and 43%. The highest identity (82%) was found between the closely related bovid herpesviruses 1 and 5, followed by the avian herpesviruses (~60%), F-UL52 (22 and 25%) and betaherpesvirus (7 to 10%).
There is considerable variation in the aa length of herpesvirus UL52 proteins. Alphaherpesvirus UL52s range from 962 to 1124 aa, whereas betaherpesviruses have much smaller UL52 homologs (640 to 668 aa). With a predicted size of 957 aa, F-UL52 was closer in size to the UL52s of the alphaherpesviruses, which may explain in part the topology of the UL52 phylogenetic tree (Figure S1). Interestingly, we occasionally obtained larger than expected PCR products with primers P1F and P4R, which we attribute to repeat sequences within the targeted F-UL52 gene, although size variation might also occur in F-UL52 of different ChHV5 isolates (data not shown).
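The pairwise identity values quoted above are percentages of identical residues between two aligned sequences. A minimal sketch of such a calculation follows. The choice of denominator (aligned columns without a gap in either sequence) is an assumption made here, since the convention behind Table S1 is not stated in the text, and the toy sequences are invented.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identical residues between two pre-aligned sequences.

    Assumption: columns with a gap ('-') in either sequence are excluded
    from the denominator. Other conventions give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Invented toy alignment: 4 identical residues out of 5 ungapped columns.
print(percent_identity("MKT-LV", "MRTALV"))
```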
In summary, F-UL52 has a distant but distinct relationship to UL52 of the alphaherpesvirinae subfamily but does not have an especially close relationship to avian UL52. Since the gene itself extended to both sides of the pTARBAC-insertion, the possibility that a fragment of the ChHV5 genome had been lost due to the cloning procedure may be regarded as diminished, although not entirely excluded.

Unique short sequence and its flanking repeats
To our knowledge, this is the first time that sequences extending to the US fragment of ChHV5 and its flanking repeats are reported. Two very interesting features were observed in the sequence of the ChHV5 US fragment. First, two individual genes (F-US3A and F-US3B), sharing no apparent sequence homology with each other, were detected, both of which appeared to encode a herpesvirus protein kinase. Conventionally, the US3 gene encodes a protein kinase (PK) that is characteristic for the Alphaherpesvirinae subfamily of the herpesviruses [29]. However, the US3 PK is known to fulfill various functions, which may include effects on the cytoskeleton of the infected cell, anti-apoptotic activity, and function(s) in the egress of viral particles [29,30,31]. It is tempting to speculate that each one of these two newly discovered PKs may fulfill separate functions: one that might be important for viral replication, the other (anti-apoptotic) that may play a role in the pathogenesis of tumor formation. The total lack of sequence homology between F-US3A and F-US3B, on both the nucleotide and the amino acid sequence level, argues against emergence of the two genes by gene duplication. The second interesting feature of ChHV5 US relates to the number of encoded glycoprotein genes. In herpes simplex virus type 1 (HSV-1), the US sequence harbors a row of 5 genes, all in the same orientation, which encode viral glycoproteins (US4 gG, US5 gJ, US6 gD, US7 gI, US8 gE). However, in ChHV5 US we detected only two such genes, designated here F-US4 and F-US8. By BLAST analysis, F-US8 was readily identified as a gE homologue. In contrast, the F-US4 protein only remotely resembled herpesvirus glycoproteins.
It is well known that members of the genus varicellovirus, such as its prototype member varicella-zoster virus (VZV) or the bovine herpesviruses types 1 and 5 (BoHV1, BoHV5), may feature a smaller number of glycoprotein genes in their US sequence [32,33,34,35]. Importantly, at least one of the encoded proteins supplies the major receptor-binding function. In most alphaherpesviruses, this property is provided by gD (US6 protein). In VZV, the protein encoded by gene 68 (corresponding to US8, gE) has evolved to execute this function in the absence of a US6 (gD) homolog. In the absence of a ChHV5 that replicates in cell culture, it is presently difficult to determine which one of the two ChHV5 US-encoded glycoproteins actually confers the receptor-binding function. Since the receptor-binding protein needs to interact with the virus's fusion machinery, consisting of gB, gH, and gL, it is doubtful that a simple transfer of the candidate receptor-binding proteins to another gD-deficient alphaherpesvirus will be able to identify the corresponding property. However, such an approach has been successful in the case of the two very closely related viruses BoHV1 and BoHV5 [36]. Based on the presence of a conserved C-11n-C-10n-C motif in gG, gD, and gI, others have speculated that the corresponding glycoprotein genes may have evolved through processes of duplication and subsequent divergence [21]. Since this exact motif is apparently conserved in the F-US4-encoded protein, it may be speculated that this protein represents a progenitor form of the gG, gD, gI family of modern alphaherpesviruses. While in our sequence determination the TRS and IRS sequences per se were perfectly mirrored, they showed limited similarity to repeat sequences known from other alphaherpesviruses. The largest ORF within those repeats typically encodes a member of the ICP4 family, an immediate early regulator of viral gene expression.
Indeed, such an ORF was detected among the ChHV5 sequences (F-ICP4a), whose translation product carried a nuclear localization signal and also exhibited the consensus nucleotidylylation sequence that has been annotated for ICP4 [20]. Both features are consistent with a potential functional homolog to ICP4, although proof will have to be provided by future experiments. Interestingly, the F-ICP4a ORF was overlapped in an alternative frame by an even slightly larger coding sequence, which exhibited similarity to the LF3 gene of Cercopithecine herpesvirus 15 (therefore designated LF3-like). It will be interesting to find out whether only one or both of these ORFs are functional in ChHV5 replication. Alternatively, F-ICP4a may be part of a spliced transcription product that includes other coding sequences. Candidates for such processes were detected in the UL sequence (designated HP38 ICP4b and HP37 ICP4c). However, those genes would only be available for transcription starting from the F-ICP4a gene if the viral genome circularized. Alternative splicing of transcripts starting from within the repeated sequences and extending into UL, thus giving rise to a product (circ) that may be produced only upon circularization of the genome, has been reported for example in BoHV1 [37].
Table 6. PCR primers used for detection of F-M04 and F-sial RNA.

Atypical genes
Interestingly, a series of genes was present in our BAC-cloned DNA molecule that may be atypical for alphaherpesviruses, but each one of them has well-defined homologues in the genomes of beta- or gammaherpesviruses. None of these gene products is known to have an essential role in viral replication. However, each one apparently plays a biologically relevant role in either pathogenesis or immune deviation. One of these (F-M04) has only been described in betaherpesviruses, while another (F-sial) has been found in a gammaherpesvirus (BoHV4), in other virus families like poxviruses and baculoviruses, and also in host cells [38]. Orthologues to F-lec1 and F-lec2 have been detected in the genome of rat CMV as well as in several cytoplasmic DNA viruses, i.e. poxviruses and asfarviruses [39,40,41].
Overall, these observations suggest that ChHV5 may combine features of not only the alphaherpesvirinae but also of the beta- and gammaherpesvirinae. Recent analyses of the available genomic sequences of reptilian herpesviruses suggested that ChHV5 as well as the lung-eye-trachea disease-associated herpesvirus of green turtles (ChHV6) [42,43] should be counted among the alphaherpesviruses [2,44,45]. Interestingly, the existence of atypical genes has been reported among the mardiviruses, which comprise tumorigenic avian alphaherpesviruses [2,45,46,47]. It is therefore interesting to note that ChHV5 shares the property of harboring host-related genes not only with the beta- and gammaherpesviruses but also with the mardiviruses.

Features of the predicted F-lec proteins
According to the predicted amino acid (aa) sequences, both of these putative proteins are type II membrane proteins and carry the signature of the C-type lectin-like domain superfamily (reviewed in [48]). The superfamily consists of at least 16 different groups. Group V, which seems to be closest to the present F-lec proteins, comprises relatively small (~20 kDa), non-calcium-binding type II membrane proteins, which are associated with either activation or inhibition of natural killer (NK) cells. Most of those proteins have protein ligands but some are also multivalent, binding not only proteins but also carbohydrates. The C-type lectin protein orthologue in rat CMV (designated RCMV lectin) originates from a multiply spliced gene [41]. In contrast, the poxviruses (e.g. vaccinia virus A40R) and ASFV (8CR) carry unspliced variants of a similar gene [39,40]. Upon alignment of these viral lectins with F-lec1 and F-lec2 as well as the equine CD69 protein (as a presumed outlier), only very limited similarity was observed among the individual aa sequences. Indeed, upon pairwise comparison, the percentage of aa identity varied between 12 and 23% (Table S2). Surprisingly, the highest identity percentages of F-lec1 and F-lec2 were detected neither between the two themselves (19.8% aa identity) nor with the other viral orthologs (12 to 15%) but with the CD69-like protein of the horse (Equus caballus) (22% and 23%). Unfortunately, the corresponding sequences from Chelonia mydas, the virus's host, are presently not known.
To our knowledge, ChHV5 is the first virus detected to carry at least two separate genes coding for this type of host-like protein. Notably, a TATA box was observed in the presumed promoter region of F-lec1 (less than 50 bp upstream of the ATG). In contrast, the F-lec2 gene was preceded by a GATA box in its presumed promoter region. Interestingly, the GATA box has been reported as a functional feature within the promoter of the RCMV lectin gene [41].

Features of the predicted F-M04 protein
According to the predicted aa sequence, the putative ChHV5 M04 (F-M04) protein is a type I membrane protein with a signal sequence (aa 1-24), an extracellular domain (aa 25-230), a transmembrane region (aa 231-253), and a cytoplasmic tail (aa 254-270). Furthermore, at least one N-glycosylation site (N53) and an O-glycosylation site (S37) can be predicted on its ectodomain. In addition, a motif scan revealed a COX2 domain (aa 54-63), an Ig and MHC signature (aa 180-186), and a BPD-transp-1 domain (aa 227-262; binding-protein-dependent transport system inner membrane component). Many of those features are typically found on M04 proteins. Moreover, the predicted size of F-M04 (270 aa) fits well within the typical M04 size range of 256 to 271 aa. However, alignment of the F-M04 aa sequence (Table S3) with members of the M04 family specified by various murid cytomegaloviruses revealed only a minor relationship. MCMV members of the M04 family showed an identity level of approximately 40 to 90% among each other. In contrast, F-M04 shared between 13 and 18% identity with other members of the family.
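As a small consistency check on the predicted topology, the four segments quoted above should tile the 270-aa protein without gaps or overlaps. The sketch below uses only the coordinates given in the text; the variable and function names are hypothetical.

```python
# Predicted F-M04 topology as quoted in the text (1-based, inclusive coordinates).
F_M04_LENGTH = 270
F_M04_DOMAINS = [
    ("signal sequence", 1, 24),
    ("extracellular domain", 25, 230),
    ("transmembrane region", 231, 253),
    ("cytoplasmic tail", 254, 270),
]


def tiles_protein(domains, length):
    """Return True if the domains cover residues 1..length contiguously, in order."""
    expected_start = 1
    for _name, start, end in domains:
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return expected_start == length + 1


def domain_lengths(domains):
    """Return {name: number of residues} for each domain."""
    return {name: end - start + 1 for name, start, end in domains}


print(tiles_protein(F_M04_DOMAINS, F_M04_LENGTH))
print(domain_lengths(F_M04_DOMAINS))
```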
In MCMV, M04 has been shown to associate with MHC-I molecules and to travel in their company to the cell surface, where the complex may serve as a decoy for preventing effective NK activity [49,50]. However, the M04 gene, which belongs to the m02 family of MCMV genes, is subject to considerable variation in nature, which might affect the function of its corresponding gene product [51]. It will be most interesting to test in the future how this applies for the product of F-M04.

Features of the predicted F-sial protein
According to the predicted aa sequence, the putative F-sial (Figure S2) is a type II membrane protein, comprising 320 aa, with a transmembrane region close to its N-terminus (aa 7-26) and with extensive similarity to the conserved domains of a typical sialyltransferase, i.e. domains L (aa 105-152), S (aa 246-268), III (aa 281-284), and VS (aa 296-301) [52,53].
Sialyltransferases are important in viral pathogenesis (reviewed in [38]). Alignment of the amino acid sequence of F-sial with selected members of the protein family (Table S4) revealed three pairs that were clearly related to each other, i.e. (1) the acetylglucosamine transferases of Bovine herpesvirus 4 (BoHV4) and its host (Bos taurus) (94% aa identity), (2) the sialyltransferases of myxoma virus and rabbit fibroma virus (79%), and (3) the sialyltransferases IV of humans (Homo sapiens) and chicken (Gallus gallus) (75%). Interestingly, the protein specified by Deerpox virus shared 39% identity with the proteins of the other poxviruses, whereas F-sial was closest to the human and chicken proteins (36% identity) but still shared about 27% identity with the poxvirus proteins, which also shared >30% identity with the human and chicken proteins. All other proteins analyzed shared only 12% or less identity among each other and with F-sial. The stretches of highest homology were found in the central region of the molecules as well as towards the C-terminus, i.e. in the context of the proteins' predicted functional domains.
Viral glycosyltransferases have been reported to glycosylate not only proteins (poxviruses and bovine gammaherpesvirus 4, BoHV-4) but also ecdysteroid hormones (baculoviruses) and DNA (bacteriophages). In each of those examples, the viral glycosyltransferase seems to play an important biological role. Bacteriophages are able to switch their serotype as a result of the viral glycosyltransferase activity, whereas insect molting and pupation are inhibited by the baculovirus-encoded glycosyltransferase. The only herpesvirus that is currently known to encode such an enzyme, BoHV-4, apparently gains a great survival advantage by possessing this gene in its repertoire. Descendants of this relatively modern virus are found worldwide, whereas the glycosyltransferase-less progenitor has become extinct [38,54,55]. Notably, myxoma virus and other leporipoxviruses that possess sialyltransferases are able to give rise to the growth of localized fibromas in rabbits, hares, and squirrels [56]. Although the respective sialyltransferases are not exclusively responsible for the fibromas, they are considered important virulence factors of the corresponding viruses. The morphological similarity between those syndromes and FP suggests that parallels in the underlying pathogenesis may exist.
Are the unconventional genes truly part of the ChHV5 genome?
One may argue that our approach to generating the ChHV5 BAC clone has potential drawbacks. Due to restriction enzyme digestion and ligation, fragment(s) of the viral genome may be lost and non-viral sequences may be integrated into the final construct. Depending on the site of the pTARBAC insertion, there is also the possibility that the resulting BAC clone may not be infectious. Indeed, this third caveat came true: pTARBAC was inserted within the coding sequence of F-UL52, which encodes the viral helicase that is essential for viral replication. Thus, infectious virus could not be reconstituted from our BAC (data not shown). Yet, this deficiency may be corrected in future experiments, since the present sequence knowledge provides ample information to plan the transfer of the pTARBAC cassette to a more desirable location. However, the insertion within a coding sequence, with parts of the interrupted gene to either side of the insert, also has its advantages for the present purpose. Indeed, it argues strongly against a loss of viral sequences during the cloning process. Moreover, it contributes a first line of evidence that the unconventional genes are actually part of the viral genome. A second line of evidence came from overlapping PCR using DNA from unrelated cases of FP as template and starting the series of PCRs from the far side of the pTARBAC insertion locus. Moreover, none of the unconventional sequences were amplified when the DNA template was extracted from unaffected tissue of the same animals. A third line of evidence was contributed by the results of 454 pyrosequencing of DNA extracts from independent tumors and normal skin (data not shown). Importantly, no additional herpesvirus-like sequences were detected, while the previously determined ChHV5 sequences were confirmed, including those of the unconventional genes.
Finally, DNA from several FP tumors of Australian origin (kindly provided by Drs. Graham Burgess and Ellen Ariel) was tested by PCR. The presence of F-UL27, F-M04, and F-sial sequences in these extracts was confirmed (data not shown). Yet, the Australian UL27 nucleotide sequence differed from our Hawaiian sequence by a characteristic CGACTC insert that has been identified previously [4]. Moreover, sequence analysis of the amplification products obtained from the Australian templates revealed at least two consistent base changes in the F-M04 gene and three within the F-sial gene (data not shown).
In conclusion, we provide new sequence information about ChHV5, which likely covers the entire viral genome. Interestingly, a series of genes was detected in this work that are very uncommon for a candidate member of the alphaherpesviruses. While transcripts of selected viral genes that are known to be active during herpesvirus replication were not detected in RNA extracts from tumor tissue, at least two of the newly detected ChHV5 genes, i.e. F-M04 and F-sial, were not merely present but also expressed in the tumors. This observation suggests that they may indeed play a role in the pathogenesis of FP. Based on their predicted biological activities, it may be speculated that F-sial is involved in tumor formation, whereas F-M04 might protect the infected cells from NK cell activity. The latter notion might imply that MHC-I presentation is also affected in the tumors, which would then interfere with the successful development of anti-tumor vaccines.

Samples from green sea turtle and nucleic acid extraction
All samples were taken post mortem from animals with terminal FP. DNA for BAC cloning was extracted from a glottis tumor of a green turtle (Chelonia mydas) that stranded in October 2003 on Oahu, Hawaii. Additional DNA was extracted from skin tumors of three green turtles from Oahu that also stranded there in October 2003. DNA for overlapping PCR as well as RNA extracts originated from tissues (normal skin and skin tumor, normal lung and lung tumor, normal heart and heart tumor) of a green turtle on 30 May 2007. DNA was extracted using Qiagen genomic tips (Qiagen, Valencia, CA) according to standard procedures and stored at −80 °C until further use. For RNA extraction, samples of non-tumored tissues and corresponding tumors were cut into small pieces, ground and homogenized in RLT solution (Qiagen) before being extracted using the RNeasy kit (Qiagen). RNA was resuspended in DEPC-treated water and stored at −80 °C until further use.

Detection of ChHV5 DNA
To determine the viral DNA load in turtle tissues, the method of Quackenbush [57] was used without modification. The primers and probes used are listed in Table 1.

Generation of BAC library
A sample from the frozen glottis tumor of a green turtle was ground under liquid nitrogen using mortar and pestle, resuspended in CIB buffer (20 mM NaCl, 80 mM KCl, 15 mM Tris-HCl, 0.5 mM EGTA, 2 mM EDTA, 0.2 mM spermine, 0.5 mM spermidine, 18 mM β-mercaptoethanol, pH 7.2) and homogenized using a Dounce homogenizer. The resulting cell and tissue suspension was mixed with 1% InCert agarose (Cambrex) for high molecular weight DNA extraction. The agarose-embedded DNA was partially digested with a combination of EcoRI restriction enzyme and EcoRI methylase before being size-fractionated by pulsed-field electrophoresis. DNA fragments from the appropriate size fraction were cloned into the pTARBAC2.1 vector [58] between the two EcoRI sites. The ligation products were transformed into DH10B (T1 resistant) electro-competent cells (Invitrogen) [26]. The library was arrayed into ninety-six 384-well microtiter dishes and subsequently gridded onto two 22 × 22 cm nylon high-density filters for screening by probe hybridization. Each hybridization membrane represented more than 18,000 distinct BAC clones, stamped in duplicate. Samples of random clones were digested with the rare-cutting NotI restriction enzyme (New England BioLabs) to estimate the average insert size.
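The library dimensions given above can be cross-checked with simple arithmetic. The following sketch assumes that each well holds one clone and that the ninety-six plates were divided evenly between the two filters; neither assumption is stated explicitly in the text.

```python
# Figures taken from the library description.
PLATES = 96
WELLS_PER_PLATE = 384
FILTERS = 2
REPLICATES_PER_CLONE = 2  # each clone stamped in duplicate

# Assumption: one clone per well, plates split evenly across filters.
total_wells = PLATES * WELLS_PER_PLATE
clones_per_filter = total_wells // FILTERS
spots_per_filter = clones_per_filter * REPLICATES_PER_CLONE

print(f"wells in library:  {total_wells}")
print(f"clones per filter: {clones_per_filter}")
print(f"spots per filter:  {spots_per_filter}")
```

Under these assumptions each filter carries 18,432 clone positions, which is consistent with the statement that each membrane represented more than 18,000 distinct BAC clones.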

Identification of BAC CH-651-60O9
Overlapping oligonucleotides UL12-OVa and UL12-OVb (Table 1) were annealed to each other, the ends were filled in with Klenow polymerase fragment, and 32P-ATP and 32P-CTP were added to the reaction to produce the 40-base-pair 32P-labeled double-stranded (ds) UL12 probe. Likewise, oligos UL22-OVa/OVb were used to produce the 32P-dsUL22 probe, whereas oligos UL30-OVa/OVb were used to produce the 32P-dsUL30 probe. All three probes were purified separately on MicroSpin™ G-50 columns (Amersham), mixed, denatured, and hybridized to the nylon membranes with the BAC clones arrayed in duplicate. Two pairs of PCR primers (Table 1), UL10-F/UL10-R (expected product size 388 bp) and UL28-F/UL28-R (expected product size 395 bp), were used to confirm the presence of herpesvirus DNA in clones identified as positive by hybridization.
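The fill-in principle behind these probes can be sketched in a few lines: two oligonucleotides whose 3' ends are complementary anneal, and the polymerase extends each 3' end along the partner strand, giving a duplex as long as both oligos minus the overlap. The sequences, the 24-nt oligo length, and the 8-nt overlap below are hypothetical choices for illustration. They are not the UL12, UL22, or UL30 oligos of Table 1, whose overlap lengths are not given in the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def fill_in(oligo_a: str, oligo_b: str, min_overlap: int = 8) -> str:
    """Top strand of the duplex obtained after annealing and 3' extension.

    Both oligos are written 5'->3'. The 3' end of oligo_a must pair with
    the 3' end of oligo_b, i.e. a suffix of oligo_a equals a prefix of the
    reverse complement of oligo_b.
    """
    b_rc = reverse_complement(oligo_b)
    longest = min(len(oligo_a), len(b_rc))
    for n in range(longest, min_overlap - 1, -1):
        if oligo_a[-n:] == b_rc[:n]:
            return oligo_a + b_rc[n:]
    raise ValueError("no 3' overlap of at least min_overlap bases")


# Hypothetical 40-nt target split into two 24-mers sharing 8 nt.
target = "ATGGCTAGCTTAGGCCATCGATCGGATTACGCTAGCTAAC"
ov_a = target[:24]                       # top-strand oligo
ov_b = reverse_complement(target[16:])   # bottom-strand oligo

probe = fill_in(ov_a, ov_b)
print(len(probe), probe == target)
```

With two 24-mers and an 8-nt overlap, the filled-in product is 24 + 24 - 8 = 40 bp, matching the probe length reported above. Other combinations of oligo length and overlap would give the same 40-bp result.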

Sequencing of BAC CH-651-60O9
DNA sequencing was performed using the dideoxy chain termination method with a shotgun library approach using pCR4bluntTopo (Invitrogen). Plasmid subclones were cycle-sequenced with Big-Dye terminator version 1.0 reagents (Applied Biosystems) and analyzed on a MegaBACE 1000 sequencer (Amersham Biotech) or an ABI 377 sequencer (Applied Biosystems). Computer-assisted assembly was done with Lasergene SeqMan (DNASTAR Inc.).

Overlapping PCR
Between 80 and 160 ng of DNA were used per PCR reaction. DNA was amplified with the primers listed in Table 3 in the presence of 10 nm dNTP, DMSO, 5x Phusion GC buffer and 2 U of Phusion polymerase (New England Biolabs, Ipswich, MA). For amplification, the samples were denatured at 98 °C for 1 min, followed by 34 cycles at 98 °C for 10 sec, 60 °C for 20 sec, 72 °C for 50 sec, and a final extension at 72 °C for 5 min. Synthetic oligonucleotides were obtained from Microsynth (Balgach, Switzerland).
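The cycling conditions above can be written down as a small data structure, which makes it easy to check the programmed hold time. This is a sketch only: the `Step` class and variable names are illustrative, and ramp times between temperatures are instrument-dependent and are ignored here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int    # block temperature in degrees Celsius
    seconds: int   # programmed hold time


# Values taken from the protocol described in the text.
INITIAL_DENATURATION = Step(98, 60)
CYCLE = (Step(98, 10), Step(60, 20), Step(72, 50))
N_CYCLES = 34
FINAL_EXTENSION = Step(72, 5 * 60)


def total_hold_seconds() -> int:
    """Sum of all programmed holds, excluding temperature ramps."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return (INITIAL_DENATURATION.seconds
            + N_CYCLES * per_cycle
            + FINAL_EXTENSION.seconds)


total = total_hold_seconds()
minutes, seconds = divmod(total, 60)
print(f"programmed hold time: {minutes} min {seconds} s")
```

The programmed holds add up to 3,080 s, i.e. 51 min 20 s. The actual run time on a given thermocycler will be longer because of ramping.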

RT-PCR
Between 100 and 200 ng of RNA was used as template for the BioScript OneStep RT-PCR (Bioline, Staunton, MA). The RT-PCR mix contained the primers (Table 4) and 25 µl 2x OneStep buffer, 2 µl OneStep enzyme mix, 5 µl RNase inhibitor and DEPC-treated water up to a volume of 50 µl. After cDNA synthesis at 50 °C for 60 min and inactivation of the RT activity at 95 °C for 10 min, the samples were subjected to the same PCR protocol as described above. Negative controls included DEPC-treated water without RNA and normal amounts of RNA but with the OneStep enzyme mix replaced by Phusion DNA polymerase, which was considered to lack RT activity. In a parallel reaction, the Phusion enzyme was used with a DNA template to control for its DNA polymerase activity.

Figure S1. UL52 alignment tree. The CLC Main Workbench 5 software was used for this purpose. A pBLAST search, using the F-UL52 aa sequence as query, provided a list of viruses specifying related proteins. The viruses emerging at the top of this list were used to create the progressive multiple alignments underlying this tree. Taxonomic status of these viruses and Swiss-Prot accession numbers: Alphaherpesviruses: Equid herpesvirus 1 (EHV1; P28962.1), Bovid herpesviruses 1 and 5 (BoHV1, Q65817; BoHV5, Q6X264), Suid herpesvirus 1 (SuHV1, pseudorabies virus, PRV, Q5PP97; SuHVK, Kaplan strain of PRV, Q85228), Cercopithecine herpesvirus 16 (CeHV16; also known as papiine herpesvirus 2 of baboons, which belongs to the genus Simplexvirus, Q2QBC3), Infectious laryngotracheitis virus (ILTV; Iltovirus of chickens, Q9YZA1), Meleagrid herpesvirus 1 (MeHV1; herpesvirus 1 of turkeys, Q9DGY9), Gallid herpesvirus type 1 (GaHV1, Q9IBS6), Gallid herpesvirus type 2 (GaHV2, Q9E6M4), Gallid herpesvirus type 3 (GaHV3, Q782P1), and Anatid herpesvirus 1 (AnHV1; alphaherpesvirus infecting ducks, geese and swans, B4XS04).