Expression and high levels of insertional polymorphism of an endogenous gammaretrovirus lineage in dogs

Despite the absence of a confirmed exogenously replicating retrovirus in Canis lupus familiaris (C. familiaris), past retroviral infections are evident in the genomes of living animals via the presence of endogenous retroviruses (ERVs). Although gammaretrovirus-like transcripts and enzyme activities were previously reported to be present in canine leukemias and lymphomas, those findings were not further explored. Initial analysis of the C. familiaris reference genome revealed a minor subset of one ERV lineage, classified as CfERV-Fc1(a), or Fc1(a) here, with features characteristic of recent integration, including the presence of ORFs and identical or nearly identical LTRs. Our previous analysis of whole genome sequence data from extant Canidae revealed a burst of past infections in Canis ancestors resulting in numerous young, polymorphic, and highly intact loci now segregating in dogs. Here, we demonstrate the expression of full-length Fc1(a) proviruses in tissues collected from healthy animals and from animals with cancer. We observed significantly higher expression in samples of dogs with various cancer diagnoses when compared to samples from healthy dogs. Genotyping of insertionally polymorphic Fc1(a) loci identified candidate expressed proviruses and delineated distributions over sample groups. Collectively, the data show that Fc1(a) proviruses retain biological activity in the domestic dog and provide a means to examine potential genetic links with disease states in this species.


Introduction
The replication cycle of retroviruses is unique among mammalian viruses in the obligatory step of integrating a chromosomal copy of the infecting viral genome as a provirus. Consequently, infection of germline DNA may lead to a provirus that is transmitted to the host's offspring, referred to as an endogenous retrovirus (ERV) [1]. Following integration, a canonical full-length ERV contains the viral open reading frames (ORFs; gag, pro/pol, and env) flanked by regulatory segments referred to as long terminal repeats (LTRs; 5'LTR, 3'LTR) that are identical upon integration and contain regulatory elements that direct proviral transcription [2][3][4]. The vast majority of ERVs have lost infectious capacity due to accumulated changes, indels, or truncation (the latter referred to as 'near' full-length), or from recombination between the proviral LTRs resulting in a solo-LTR. ERVs have contributed to abundant proportions of many species' genomes [1,5]. For example, over 8% of the human genome is recognizably of retroviral origin. The genomes of several species, including human, are known to harbor ERV lineages with evidence of recent or ongoing germline invasion [5]. Members of these lineages appear more intact and include 'young' insertions, as evidenced by insertional polymorphism (i.e., the presence of unfixed loci of 'insertion' as well as 'empty' alleles in the sampled population), high sequence identity between integrants, the presence of one or more ORFs, or encoded function(s) [6], raising the possibility of retained viral gene functions or LTR-derived regulatory properties.
Previous analyses of tumors and affected tissues of immunosuppressed representatives of the domestic dog (Canis lupus familiaris) led to several reports of retrovirus-like products. For example, retrovirus-derived RNAs and reverse transcriptase enzyme activities were isolated from tumors, and supernatant filtrates that contained particles of C-type morphology consistent with a γ-like retrovirus were observed in dogs with cancers including leukemia [7][8][9][10][11], lymphoma [12,13], as well as other malignancies or severe immunosuppression [14][15][16]. An explanation stemming from these works was the existence of a pathogenic canine retrovirus capable of contributing to cancers or immune suppression, as was known to occur in other species [14]. However, possible contamination of tissues by infectious retroviruses from other species was not excluded following those reports, and there has paradoxically never been a confirmed infectious retrovirus in dogs. An alternative explanation for the presence of γ-like retroviral products could be the deregulation of ERVs, resulting in their transcription or subsequent translation of the products they encode. In this regard, several mammalian species harbor ERV lineages that include members with retained biological activities or even the ability to produce infectious virions, including but not limited to humans [1], mice [17], and cats [18].
Based on initial analysis of the C. familiaris reference genome assembly from a female boxer breed dog, 'Tasha' (first released in 2005) [19], ~3.6% of the genome was estimated to be ERV-derived [19], later estimated to include ~0.15% from full-length or near full-length proviruses [20]. Analysis of that genome build revealed the majority of ERVs were inactivated and estimated to have colonized dog ancestors in times ranging from roughly 12.5 to 25 million years ago (mya). However, two γ-like full-length ERV-Fc-related proviruses (so-named for inferred use of a tRNA(Phe) to prime reverse transcription) were identified with putative ORFs and low LTR-LTR divergence and were therefore presumed to be considerably younger [20]. The significance of these observations was previously uncovered in a comparative analysis of available whole genome data of domestic and wild canids against the CanFam3.1 reference build, leading to discoveries of numerous young loci and recent bursts of germline infections from members of an ERV-Fc related lineage, 'CfERV-Fc1(a)' (Fig 1) [21]. For brevity we refer to this lineage as 'Fc1(a)' throughout this text. The Fc1(a) ERV lineage first entered the canid germline prior to the split from the red fox (Vulpes vulpes) ~12 mya [22] and continued to infect dog ancestors until within the last ~400 ky [21]. The invading exogenous form was a recombinant virus derived from the ERV-Fc (5' LTR, gag, pro/pol, 3'LTR) and distantly related ERV-W (env; human syncytin-like) γ-like groups, likely having originated from co-packaging of RNAs from unrelated source proviruses in the infecting virion [22]. The Fc1(a) ERV-W env is predicted as belonging to the RD114-and-D-type-retrovirus (RDR) interference group [23]; its acquisition likely contributed to altered pathogenic properties of the Fc1(a) progenitor, thus facilitating transmission to canid ancestors [22].
Our previous characterization of Fc1(a) loci in the genomes of modern breeds, as well as orthologous comparisons of representative genomes of all living Canis spp., revealed that past spread of this lineage has contributed to numerous polymorphic loci in contemporary animals (Fig 1). For example, nearly half of 145 genotyped loci, or 46.2%, were found to be insertionally polymorphic across Canis spp., including 40 loci that were deemed as variably present in modern breeds [21]. These insertions tended to have short branch lengths and low group divergence (S1 Fig). Nineteen full-length or near full-length proviruses are currently annotated and include full-length members predicted to have at least one ORF, as well as multiple insertions of identical sequences between 5' and 3' LTRs [21]. Proviruses with the shortest branch lengths tended to possess an uninterrupted env ORF (bold envelope symbols in S1 Fig), implying its function in Fc1(a) spread. The LTRs of the youngest full-length insertions also tended to pair on sister branches, and subsets shared identical LTR sequences with solo-LTRs. These collective lines of evidence point to recent infectious capacity and raise the possibility that replication-competent Fc1(a) viruses recently existed (and may presently exist) within living animals, or have the capability to facilitate 'breakout' of an infectious recombinant [24].
ERVs are known to impact both genome structure and physiological functions of the host. In this regard, the most recent integrants, or those not yet fixed within a host species, should have the most potential for biological impact, having been recently introduced to the genome and not yet subject to long-term evolutionary pressures. Thus, their highly variable presence in, and recent addition to, the domestic dog genome makes Fc1(a) loci candidates for interfering with normal gene regulation or causing aberrant transcriptional effects due to their integration within 'new' genomic contexts, noting many are situated within genes or regions proximal to predicted dog gene models [21]. For example, of the annotated repertoire of Fc1(a) insertions, roughly 44% (69 loci) are within 25 kb of an Ensembl dog gene model; roughly 18% are located within introns of genes (27 loci) and one is present in the 3' untranslated region of a gene exon [21].
Given the relatively recent formation, the presence of viral ORFs, and 'intactness' of members of this lineage, we therefore asked if Fc1(a) retained the potential for biological activity. In this regard, deregulation of the Fc1(a) lineage would make it a likely candidate for the previously reported γ-like retroviral activities in canine leukemias, lymphomas, or other malignancies. In this study, we provide evidence for expression of Fc1(a)-derived sequences in total RNAs from canine cell lines as well as total RNAs isolated from distinct tissues from healthy dogs. By focusing on env-derived transcripts, we show that expression from Fc1(a) proviruses is also detected in tumor tissues from dogs diagnosed with chronic and acute leukemias or lymphomas, as well as melanomas, osteosarcomas, and others. The detection of sequences corresponding to Fc1(a) env was significantly elevated in tumor tissues as compared to blood samples from healthy dogs. Genotyping of Fc1(a) insertions in dogs of modern breeds, including near or full-length proviruses and solo-LTRs, indicates candidate source proviruses of this expression, and delineates the prevalence of insertionally polymorphic Fc1(a) loci across samples as well as their presence in the context of genic locale. As ERV presence and biological activities in other species are known to impact genomic variation and disease, this work highlights the possibility of similar consequences in dogs.

[Fig 1 caption, partial] ... [21]. Gene density, karyotype, and marker files from NCBI [19] were used to generate gene karyotype and density using a custom Python script. The output files were input to RIdeogram [37] to visualize gene density over individual chromosomes, excluding unplaced contigs (chrUn) from analysis. Dark blue triangles show placement of insertions that are fixed among Canis spp.; dark red squares denote insertions previously deemed as insertionally polymorphic among modern breeds; orange squares mark additional insertions that are insertionally polymorphic in wild canids; teal circles show loci for which a provirus allele has been confirmed. Gene density is represented by a heatmap with gene poor regions in blue and gene dense regions in red.

Results
The env gene is most present among annotated Fc1(a) proviruses
Of the proviruses, 11 are present within the CanFam3.1 reference and 8 additional insertions were obtained in non-reference discoveries briefly described above [21] (S1 Table). A nucleotide alignment of the 19 proviral integrants characterized in Canis spp. yielded an inferred consensus, Fc1(a) CON [21,25] (S2 Fig). Briefly, it contains complete ORFs for gag, pro/pol and env, putatively intact structural regions and motifs that would be necessary for replication competency, a tRNA(Phe) primer binding site, and identical 5' and 3' LTR segments. Thus, in principle, few changes would be necessary to generate a putatively replication competent virus.
We assessed the 19 currently known Fc1(a) proviruses for the presence of reading frames as well as motifs therein that would be predicted as necessary for viral gene product function (summarized in S1 Table). The env reading frame was the most frequently present of the viral genes among the Fc1(a) full-length integrants; seven of the 19 possessed clear env ORFs, whereas six displayed a putative pro/pol ORF and none possessed an intact gag. The seven env genes ranged from 98.9% (insertion at chr6:47,934,941) to 99.8% (chr12:869,873) nucleotide identity to the Fc1(a) CON env; the translated Env products are highly similar in sequence to the Fc1(a) CON, sharing from 98.4% (9 changes, insertion at chrX:50,661,637) to identical (chr12:869,873) amino acid content. Aside from the chrX provirus, all but one possessed apparent unaltered sequence motifs required for function (chr5:10,128,780; RRKR➔WRKR in its furin cleavage site), including identical RDR-like motifs predicted to be involved in receptor interactions, and at most 3 changes in the translated amino acid sequence (chr13:32,380,539); the chrX Env has incurred changes in predicted ISD and TM motifs [21]. Due to its conservation and apparent intactness in the majority of the most recently integrated proviruses, we therefore examined expression of Fc1(a) env in canine cell lines and tissues.

Fc1(a) transcripts are detected in canine cell line RNAs
To first establish our approach for detecting and measuring expression, we investigated the presence of env-containing transcripts in total RNAs isolated from canine cell lines of A72 fibrosarcoma, DH82 histiocytic sarcoma, D17 osteosarcoma, and MDCK derived from healthy kidney. Utilizing conserved regions within the Fc1(a) CON env ORF, primers were designed to amplify a 300 bp region within the SU domain (see Methods, S2 Fig). The selected region is absent from 10 of the 19 annotated proviruses, which instead either possess a common deletion of 1073 bp (env Δ1073bp; eight of the 10) or another env deletion disrupting env (two of the 10; also refer to S1 Table). Thus, our analysis was limited to an expressed env region present in annotated and not-yet annotated Fc1(a) proviruses.
Gel electrophoresis of PCR amplicons with the env-directed primers confirmed expressed Fc1(a) in cDNAs synthesized from total RNAs isolated from the A72, DH82, and D17 cell lines (Fig 2A). The products were confirmed by Sanger sequencing. Despite multiple attempts, env amplicons were not visible by gel electrophoresis from MDCK cDNA. To further characterize env transcripts present in cell line cDNAs, we performed quantitative PCR (qPCR) using primers directed to the Fc1(a) env gene sequence and GAPDH as an endogenous control, with each sample averaged over triplicate runs (Fig 2B). By the comparative Ct method, env transcripts were detected at moderate levels in DH82 cells (2^-ΔΔCt of 0.63-fold), followed by A72 and D17 cells (2^-ΔΔCt of 0.254-fold and 0.249-fold, respectively). Consistent with PCR attempts, quantitative levels of env present in MDCK cDNA were several-fold decreased compared to the tumor-derived cell lines (2^-ΔΔCt of 0.036-fold).
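The comparative Ct calculation behind the fold values above can be sketched as follows. This is a minimal illustration only: the function name, the choice of calibrator, and all Ct values are hypothetical placeholders and are not taken from the study, which does not state its calibrator sample in this section.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Return the 2^-ddCt fold value of a sample relative to a calibrator.

    ct_target / ct_reference: mean Ct of the target (e.g. env) and of the
    endogenous control (e.g. GAPDH) in the sample of interest.
    ct_target_cal / ct_reference_cal: the same two values in the calibrator.
    """
    delta_sample = ct_target - ct_reference                # dCt of the sample
    delta_calibrator = ct_target_cal - ct_reference_cal    # dCt of the calibrator
    delta_delta = delta_sample - delta_calibrator          # ddCt
    return 2.0 ** (-delta_delta)


def mean(values):
    """Average technical replicates (e.g. triplicate qPCR runs)."""
    return sum(values) / len(values)


# Hypothetical triplicate Ct values, for illustration only.
env_ct = [27.1, 27.0, 26.9]
gapdh_ct = [18.0, 18.1, 17.9]
calibrator_env_ct = [25.0, 25.0, 25.0]
calibrator_gapdh_ct = [18.0, 18.0, 18.0]

fold = relative_expression(
    mean(env_ct), mean(gapdh_ct),
    mean(calibrator_env_ct), mean(calibrator_gapdh_ct),
)
print(f"2^-ddCt = {fold:.3f}")
```

A value below 1 indicates lower normalized env signal than in the calibrator, and a value above 1 indicates higher signal.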
Due to the high number of insertionally polymorphic loci known to be present in domestic dogs [21], the observed differences in env expression could possibly be explained by variability in presence of Fc1(a) proviruses in each cell line. To discriminate such possible source proviruses contributing to expression, we genotyped the allelic presence of env-possessing proviruses from genomic DNA isolated from each cell line. For these purposes, two PCRs were run per locus. The first PCR included a locus-specific primer designed to target the flanking genomic region that was paired with an Fc1(a)-specific internal primer to infer the presence of a full-length insertion; a second PCR included primers flanking each LTR-genome junction to detect the presence of the solo-LTR or unoccupied allele. Representative products for each insertion were sequenced and aligned to the CanFam3.1 genome build to confirm the correct amplified products and flanking genomic region for each locus. The obtained genotypes were tabulated and plotted for assessment (Fig 2C). As anticipated, the chrX:50,661,637 provirus was detected as a provirus in all cell lines, and none possessed the unoccupied allele or solo-LTR at that locus. The remaining insertions were variably present, with the highest representation in the A72 cell line (another five loci had at least one env-possessing provirus present), followed by DH82 (another four loci), and D17 and MDCK (another three loci). Of note, the genotypes obtained for D17 and MDCK were in close agreement, the only difference being the chr12:869,873 locus, genotyped as provirus/solo-LTR in D17 but present in the homozygous state in MDCK. Given the similarity in env presence between the two cell lines, this latter observation suggests differences in env expression due to cell line-specific modulations. Also, as immortalized cells there are likely chromosomal aberrations or other alterations present that could consequently influence Fc1(a) expression, for example effects on gene regulation such as genomic methylation patterns or varying rates of cellular division [26].
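The two-PCR genotyping logic described above can be summarized in a short sketch. The function name, the input encoding, and the rule that a single detected allele is reported as homozygous are assumptions made for illustration; the authors' actual scoring criteria are not given in this section.

```python
def call_genotype(internal_pcr_positive, junction_products):
    """Combine the results of the two PCRs run at one locus.

    internal_pcr_positive: True if the PCR pairing a flanking primer with an
        Fc1(a)-internal primer gave a product (provirus allele present).
    junction_products: set of products seen in the PCR with primers flanking
        the insertion site, drawn from {"solo-LTR", "empty"}.
    """
    alleles = []
    if internal_pcr_positive:
        alleles.append("provirus")
    for product in ("solo-LTR", "empty"):
        if product in junction_products:
            alleles.append(product)

    if not alleles:
        return "no call"
    if len(alleles) == 1:
        # Assumption: a single detected allele is scored as homozygous.
        return alleles[0] + "/" + alleles[0]
    # Two or more detected alleles are reported together.
    return "/".join(alleles)


print(call_genotype(True, set()))            # provirus only
print(call_genotype(True, {"solo-LTR"}))     # provirus plus solo-LTR
print(call_genotype(False, {"empty"}))       # unoccupied site only
```

More than two alleles can be returned for a locus, which leaves room for samples such as tumors or immortalized lines in which copy number may deviate from diploid.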

Fc1(a) env RNAs are expressed in healthy adult canine tissues
The expression of several ERV-derived lineages in humans and non-human animal models has been shown to differ between tissues associated with disease or cancers, as well as within tissues corresponding to normal physiological states. Given our detection of Fc1(a) env transcripts in cultured cell lines, we next examined env expression in tissues from healthy dogs using the same approach. To assess putative patterns of in vivo transcription of env, we performed qPCRs of Fc1(a) env in cDNAs synthesized from a panel of 14 tissue types from adult beagle dogs. Individual qPCRs were run for each tissue type targeting env and the obtained values calibrated using the corresponding GAPDH values as an internal control. We observed variable levels of env across tissue types, with relative expression varying from 2^-ΔΔCt of 0.006-fold (pancreas) to 2^-ΔΔCt of 2.63-fold (spinal cord) (Fig 3). Between these extremes there was wide variability in env expression across tissue types relative to GAPDH, with two tissue types exceeding 2-fold (spinal cord, cerebellum). Collectively these results confirm that Fc1(a) env is expressed within the total RNAs of canine tissues and suggest Fc1(a) promoters may be active in distinct tissue types.

Fc1(a) env expression tends to be elevated in canine tumors
In principle, because younger ERV integrants result from germline invasion of a relatively recently infectious source, they are likely candidates to have retained the potential to be biologically active. Previously, transcripts, retroviral enzymes and other products were characterized in cancer-associated tissues sampled from dogs with lymphoma or leukemia or severe immunosuppression [7][8][9][10][11][12][13][14][15]. We reasoned that expressed Fc1(a) insertions could be a contributing source of these observations, given the recent germline colonization and intactness of the group. We therefore expanded our study to measure Fc1(a) expressed env levels in tumors sampled from 19 dogs with various cancer diagnoses (S2 Table).
Our analysis confirmed that cDNAs corresponding to env were detectable at variable levels across all samples (Fig 4). Expression in blood samples from healthy dogs ranged from 2^-ΔΔCt of 0.53-fold (DHTHY-1401) to 1.38-fold (DHTHY-0301) relative to the endogenous control. In contrast, relative env levels in tumor samples were highly variable and tended to be elevated compared to levels from healthy samples. For example, increased fold-differences in expression were observed in 15 of the 19 tumor samples compared to the average of healthy blood samples. Nine of 19 tumor samples were over 5-fold increased relative to background and six of those nine were >10-fold increased. Most of these samples were from lymphoma affected tissues, though it should be noted these were the most abundant cancer type examined (seven of the 19). The highest increased fold differences in env expression were observed in samples from dogs diagnosed with chronic myelomonocytic leukemia, melanoma, and osteosarcoma, with 2^-ΔΔCt values of 22.02-fold (DCML-0401), 48.51-fold (DMEL-1101), and 1,402.29-fold (DTM-1401), respectively. Treating the lattermost fold expression from DTM-1401 as an outlier and excluding it from further calculations, the overall average env expression within tumor samples was 2^-ΔΔCt of 8.53-fold. The relative expression of env was significantly higher in the cancer samples than in the blood samples from healthy dogs using a t-test (p<0.002), again after excluding DTM-1401 (Fig 5). Collectively, these data indicate Fc1(a) env expression was elevated but variable in presence and relative level in malignant tissues compared to blood or other tissues from healthy dogs.
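The group comparison described above, in which one sample is excluded as an outlier before a t-test, can be sketched as below. This is an illustration under stated assumptions: the text does not say which t-test variant was used, so Welch's unequal-variance form is assumed here, and the sample labels and fold values are hypothetical placeholders rather than the study's data. Only the t statistic and degrees of freedom are computed, since the Python standard library has no t distribution for the p-value.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Squared standard errors from the sample (n - 1) variances.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t_stat = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, df


# Hypothetical fold values keyed by placeholder sample labels.
tumor_fold = {"T1": 2.0, "T2": 6.0, "T3": 10.0, "T4": 1400.0}
healthy_fold = [0.5, 1.0, 1.5]

# Exclude the sample treated as an outlier before computing group statistics.
excluded = {"T4"}
tumor_kept = [v for k, v in tumor_fold.items() if k not in excluded]

t_stat, df = welch_t(tumor_kept, healthy_fold)
print(f"mean tumor fold (outlier excluded) = {statistics.mean(tumor_kept):.2f}")
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

Fold values from qPCR are often right-skewed, so a comparison on log-transformed values may also be worth considering; whether the authors did so is not stated here.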

Fc1(a) proviruses are insertionally polymorphic in healthy and tumor tissues of dogs
The observed variability in env transcript levels among our sample set could be due to differential contributions of expression among insertionally polymorphic Fc1(a) loci. To assess whether the total candidate source Fc1(a) elements in each sample correlated with total env expression, we therefore sought to genotype the presence of the env-possessing full-length elements from genomic DNAs of the same samples, as well as extend the analysis to additional full-length elements to obtain a snapshot of prevalence across samples (S1 Table). We obtained genomic DNA from cryopreserved cells or tumor tissues from 13 of the 19 samples collected from dogs with cancer, as well as from blood from three healthy canines. These samples were genotyped at 14 Fc1(a) provirus loci. Consistent with previous findings of Fc1(a) presence as detected in modern breeds [21], five of the 14 insertions were fixed as provirus alleles in all samples, one of which (the chrX:50,661,637 integrant) represented the only Canis-fixed provirus with a predicted env ORF. As expected, we observed variability in the presence/absence of the remaining nine Fc1(a) proviruses between samples, including the remaining loci with clear env reading frames, as well as variable frequencies across all samples screened. Most of the Fc1(a) loci were heterozygous for the provirus and empty alleles, with few instances of samples homozygous for the provirus insertion (e.g., chr12:869,873: DHTHY-0301, DLYM-0911), or samples with variable presence of the insertion allele and/or empty site (e.g., chr4:22,610,555: DCML-0901). Aside from the provirus on chrX, there were 10 samples with at least one additional provirus predicted to possess an env reading frame. All proviruses were present in at least one sample and considerable allele frequency variation was observed for the nine unfixed loci. No one sample possessed all seven Fc1(a) proviruses with putatively intact env. We note that while the presence of env RNAs in these samples can be accounted for by LTR activation, the possibility of their presence as part of a host mRNA is not excluded (see Discussion). Further insight into the transcriptional regulation of the Fc1(a) insertions will require more comprehensive analyses.

Members of the Fc1(a) lineage exhibit high levels of insertional polymorphism in dogs
A provirus contains regulatory sequences for its own transcription within its LTRs. Since the LTR sequence is predicted to be preserved during recombination resulting in solo-LTR formation, the recombinant allele likewise maintains the same transcriptional potential. As such, LTRs have been shown to impact genomic function in humans and other animal models by altering local or long-range gene regulation via introduced promoter, enhancer, or other regulatory activities (recently reviewed in [27]). The genomic distribution of Fc1(a) includes members that lie within or near genes and that display varying allelic frequencies in the dataset [21], thus presenting the opportunity for a preliminary examination of putative relationships with disease or phenotype.
Of the 157 Fc1(a) insertions currently annotated in Canis spp., 145 insertions were previously deemed as having resolvable integration sites and were genotyped in 332 genomes of dogs and wild canids, including 137 modern breeds [21]. In that study, genotypes were inferred in silico by re-mapping Illumina reads over recreated alleles corresponding to the insertion or unoccupied states, permitting estimates of prevalence for individual Fc1(a) insertions. Occupied alleles for a total of 40 insertionally polymorphic Fc1(a) loci were detected in the genomes of modern breeds (Fig 1), with allele frequencies ranging from less than 0.01 (single copy detected in 137 dogs) to 0.85 [21]. We examined these 40 Fc1(a) insertions in an expanded PCR-based genotyping analysis to assess the distribution of Fc1(a) alleles across samples in our present study, and to draw comparison to previous estimates among modern breeds, reasoning that this comparison was most representative of expectations. We added a subset of loci previously deemed as either absent from or fixed in modern breeds by in silico genotyping or that possessed a provirus allele; this resulted in a total of 54 sites examined (S1 Table). For screening, genomic DNA from tumors of an additional 13 diagnoses was included for a total of 29 samples examined (S2 Table), thus providing an extended view of Fc1(a) genotypes among samples from dogs with distinct cancers. Raw genotypes (S1 Data) were scored and plotted for assessment after ordering by inferred in silico allele frequencies for each Fc1(a) locus (Fig 7).
The overall frequencies observed for each Fc1(a) locus were generally consistent with previous in silico estimates [21]. For example, three annotated Fc1(a) that were previously identified in wild Canis spp. only (e.g., wolves, coyotes, and jackals; orange squares in Fig 1) were likewise absent in our screens (e.g., chr7:6,132,731, chr9:29,215,798, chr13:10,866,647). Also consistent with previous findings were eight sites fixed among Canis spp., with the exception of a solo-LTR (chr11:71,939,089) which was heterozygous in four of the samples. As with our initial screens, the 14 sites confirmed to have a provirus allele were variably present and no individual possessed all seven Fc1(a) proviruses with a putatively intact env. Treating either insertion allele as presence (i.e., provirus, solo-LTR) and the unoccupied as absent (i.e., empty), overall allele frequencies ranged from 0.017 (chr5:80,814,713, single copy) to 0.91 (chr28:721,579) across the loci genotyped. Thus, the majority of Fc1(a) insertions were detected in frequencies consistent with expectations [21]. Some putative trends were observed by genotyped presence corresponding to sample group as well as deviation from expectations as inferred from previous estimates in modern breed individuals. Among putative deviations were two proviruses not detected in our previous study, but that were genotyped here as heterozygous (provirus, empty) in multiple samples: chr5:10,128,780 and chr33:22,146,581. Inconsistencies with previous estimates were also observed for loci having presence or absence of a solo-LTR; examples include but are not limited to chr17:30,368,796, chr22:45,483,070, and chr20:16,677,142 (also refer to Fig 7; see Discussion). No additional locus was found to possess a provirus allele among the genotyped samples.
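The allele frequency convention used above (provirus and solo-LTR both counted as presence, empty as absence) can be sketched as follows. The function name and the genotype encoding are hypothetical, and the sketch assumes diploid genotypes with every sample successfully genotyped at the locus. Under those assumptions, a single insertion copy among 29 samples gives 1/58, which appears consistent with the lowest frequency of 0.017 reported in the text.

```python
INSERTION_ALLELES = {"provirus", "solo-LTR"}  # both count as "presence"


def insertion_allele_frequency(genotypes):
    """Return the insertion allele frequency at one locus.

    genotypes: list of (allele_1, allele_2) tuples, one per sample, with
    alleles drawn from {"provirus", "solo-LTR", "empty"}.
    """
    total_alleles = 2 * len(genotypes)  # assumes diploid samples
    present = sum(
        1
        for genotype in genotypes
        for allele in genotype
        if allele in INSERTION_ALLELES
    )
    return present / total_alleles


# Illustration: 29 samples, one heterozygous carrier (a single insertion copy).
genotypes = [("provirus", "empty")] + [("empty", "empty")] * 28
freq = insertion_allele_frequency(genotypes)
print(round(freq, 3))
```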
Our genotyping strategy was two-fold in that it permitted a preliminary assessment of locus prevalence between groups as well as examination of the placement of Fc1(a) loci within or near gene locales in the clinical context.A relationship may be drawn by the genetic association of the presence of an inherited provirus or solo-LTR to a particular phenotype or disease state, as has been characterized in other animals, for example in mice [28].Given the observed variability in presence, high levels of sequence similarity, and putative functional conservation, the Fc1(a) lineage presents candidates for such a scenario in the domestic dog.Excluding loci fixed in all samples, of the polymorphic insertions present in modern breeds, eight are located within introns of predicted dog gene models.From a predicted gene model start, another two, six, and six are present within 5kb, 10kb, and 25kb, respectively (S1 Table; also see Discussion).Though preliminary, these data collectively show that ERV-derived elements, including proviruses that we find to be capable of contributing to biological expression, have impacted structural genome variation in dogs and provide an unexplored means to examine potential genetic links with disease states in this species.

Discussion
ERVs have the potential to generate viral products, or to alter normal host gene expression through the contribution of a promoter sequence from an LTR. In this regard, lineages having 'young' members are of interest given inferred promoter activities as well as presence within new genomic contexts. With few exceptions (for example, see [29,30] and as discussed below), ERV biological activities in dogs have been relatively understudied in comparison to other animals, and there is a deep gap in our understanding of such processes in this model species. We previously characterized the emergence and expansion patterns of the γ-like Fc1(a) ERV lineage, revealing numerous loci with insertional polymorphism in extant Canidae including domestic dogs [21]. An inferred Fc1(a) progenitor was intact, and sequence comparisons revealed that the youngest proviruses should require few mutations to restore the lineage consensus. Expression from such relatively recent ERV integrants has been linked to disease in humans and other animals (reviewed in [1,31]). In this regard, expression of Fc1(a) members would be consistent with activation contributing, at least in part, to previous reports of γ-like retroviral products in tissues of canine cancers, thus motivating our study. Our findings provide a precedent for further examination of the biological impact and co-evolutionary dynamics between retrovirus and host in the underexplored system of the dog.
The env gene is most frequently present among young Fc1(a) proviruses, with seven possessing an uninterrupted ORF of which a majority are without obvious changes that would alter function [21]. We examined env expression in cDNA synthesized from canine cell lines derived from various sources. While this approach was thus focused on proviruses that possess an env gene (S2 Fig), the possibility of expression from proviruses with an env-disrupting deletion is not excluded (refer to S1 Table). Moderate levels of env transcripts were detected in cell lines, of which MDCK consistently had the lowest level of expressed env observed. To interrogate the presence of source candidates, we genotyped Fc1(a) proviruses predicted to possess an env ORF using genomic DNA from each cell line. Multiple env-possessing Fc1(a) proviruses were confirmed as present and genotypes varied between samples. A72 and DH82 possessed proviruses at six and five loci, respectively, and D17 and MDCK shared the same four. Particularly given the similarity in provirus content between these latter two cell lines, we interpret the differences in env expression to indicate differences in contributing factors to the relative env transcript levels. Examples include inherent differences in the tissue type from which each line was derived, increased proliferation with shortened cell cycle times, possible genomic rearrangement, or malignant phenotype. The potential contribution of env transcripts from not-yet annotated copies cannot be ruled out. Expanding our analysis to examine Fc1(a) env expressed in healthy adult dog tissues revealed variable levels of expression across tissue types, with increased fold differences in tissues from spinal cord, cerebellum, skin, liver, and lung. The highest increased fold differences of expressed env in our analysis were observed in neurological-related tissues of the spinal cord and cerebellum. Though the consequence is unclear, elevated levels of ERV expression in the nervous system have been reported in other animals [32,33].

[Fig 7 caption, partial] ... squares) or solo-LTR only in addition to the empty, or unoccupied, allele. Insertion coordinates are shown at left and correspond to the leftmost insertion breakpoint in CanFam3.1.
Deregulation of ERV proviruses is well known to occur in cancers and other disease states in humans and other animals [1]. To examine whether similar patterns would be observed for Fc1(a) proviruses, we analyzed Fc1(a) env presence in total RNAs of healthy tissues as well as tumor samples of diagnosed cancers in canines. We observed variable patterns of expression that tended to be elevated in tumors (p<0.02). Five of the seven lymphomas, the most represented cancer examined, had increased env of 5- to 10-fold difference compared to cDNAs of healthy tissues. Greater fold differences in env expression were observed in a metastatic melanoma (DMEL-1101) and a chronic myelomonocytic leukemia (DCML-0401). The latter sample, DCML-0401, had increased env relative to other samples with the same diagnosis and was in blast crisis; this most progressed phase is typically accompanied by cell fusions and large-scale chromosomal aberrations, thus likely impacting gene regulation [26,34]. The highest increased fold difference of env expression was measured from an osteoblastic giant cell tumor (DTM-1401). Genotyping inferred candidate sources of at least one Fc1(a) provirus with a predicted env ORF for each sample. Though there was variability in the presence of insertionally polymorphic Fc1(a) proviruses, the genotyped DCML samples shared three such env-possessing proviruses (also see below).
Our findings are supported in part by a previous study by Cornelis et al. [29] in which env ORFs, including from Fc-related and unrelated ERV groups, were assessed for expression as candidate genes for a carnivore syncytin, leading to the identification of an ERV-R env derived gene, 'syncytin-Car-1', for roles in placentation. In that study, variable expression between distinct tissue types was observed in quantitative RT-PCRs over a similar composite tissue panel including the candidate chrX:50,661,637 Fc1(a) env ('canis-env3' in that report). An in silico examination of the primers used in that study suggests specificity to the chrX env target alone, such that other Fc1(a) proviruses were most likely excluded from detection. We suggest expression of the chrX provirus may thus contribute, at least in part, to the Fc1(a) env transcripts we observe in tissues. Other support for Fc1(a) expression comes from the recent mapping of CfERV cDNAs of oral malignant melanoma (OMM) transcriptomes, in which Kitao et al. identified source contributing proviruses [30]. Subsets of these OMM transcripts mapped to Fc1(a) internal sequences, including the abovementioned chrX env, the chr11:12,752,994 gag region (premature stop predicted in CanFam3.1) and pol ORF, as well as the chr3:82,194,219 gag and env regions (also premature stops). As mapping of cDNAs was limited to proviruses present in CanFam builds [30], any contribution to total OMM RNAs from nonreference proviruses was not further explored.
While the presence of Fc1(a) RNAs both here and elsewhere is presumed to be at least in part accounted for by activation of LTR promoter functions, functional analyses of promoter activities of CfERV LTRs in general are currently lacking, and the possibility of intronic transcription as part of a host gene for at least some ERV loci is not excluded. In this regard, one provirus that has predicted pol and env ORFs (chr26:35,982,438; one of four Fc1(a) with both pol and env ORFs) sits within the dog gene model for PRKG1 (also discussed below). This gene is predicted to encode three isoforms of cyclic GMP-dependent protein kinase with roles in signal transduction. The provirus was genotyped as present in a subset of samples in heterozygous state (i.e., provirus/empty), thus offering a candidate for such a scenario of passive expression. Additional Fc1(a) solo-LTR insertions within genes include, but are not limited to, homologs involved in tumor suppression (EXT1) and immune functions (WDFY4) (S1 Table). Further insight into the transcriptional regulation and functional context of the Fc1(a) insertions, for example by determining the presence of unspliced or spliced transcripts and pinpointing all source loci, will require more comprehensive analyses based on expression. Functional analyses in this context are ongoing.
As LTRs are known to contain regulatory sequences for host transcriptional machinery, gene-proximal insertionally polymorphic Fc1(a) loci are candidates for alteration of normal gene regulation. One example is the abovementioned chr26:35,982,438 provirus. The insertion is present in 17 of the 26 samples from dogs with cancer and its allelic presence is consistent with the detection of proviruses genotyped at chr5:10,128,780 and chr5:78,331,579 among all samples analyzed from chronic myelomonocytic, chronic lymphocytic, or acute lymphocytic leukemias (DCML, DCLL, and DALL, respectively), indicating possible trends in Fc1(a) presence per sample group. A similar example, among others, is the chr17:30,368,769 solo-LTR that was conspicuously present by genotyping among all but two tumor samples in predominantly homozygous states, though the significance of this observation is unclear. Other solo-LTRs are proximal to genes with functions involved in gene regulation (EED, BUD13, TCF19), cytoskeletal dynamics (PLEKHG4), and others (S1 Table). In this regard, understanding the properties of Fc1(a) LTRs as transcriptional promoters will shed light not only on their function as drivers of proviral expression but also on their impact on genes located nearby, particularly given their conservation in sequence between loci. The high levels of insertional polymorphism of this young ERV lineage are highly suggestive of a potential for biological impact. We thus present a preliminary view of the Fc1(a) landscape in a disease context and offer justification for future focused analysis.
In summary, our study confirms expression of Fc1(a) env in canine tissues and finds this expression is increased in tumors. We suggest the transcriptional deregulation of this ERV lineage is responsible, at least in part, for previous observations of γ-like retroviral products in cancers. By genotyping, we link these findings to candidate source loci. Several proviruses are predicted to have one or more ORFs, raising the possibility of expressed products that retain putative function. Though additional analyses are necessary to begin to draw any biological association or impact of Fc1(a) presence in this species, these findings raise questions about the relationship of this expression therein, the mechanisms of activation, and the potential for formation of infectious virus. Though no Fc1(a) provirus identified thus far is predicted to possess ORFs for all genes, sequence analysis implies very few changes would be required to restore intactness, and this scenario could, based on our preliminary analysis of existing loci, be resolved via template switch of co-packaged viral RNAs sourced from distinct loci. Indeed, the Fc1(a) progenitor itself arose as a viral recombinant from distinct source ERV groups that placed an RDR-group env in the infecting virus. Its subsequent spread in dog ancestors also resulted in multiple unique recombinant proviruses now present in the dog ERV fossil record. The possibility that rare but intact proviruses are presently segregating in dogs or related extant canids remains.

Ethics statement
All of the samples from dogs with cancer that were used for this study were obtained as part of medically necessary diagnostic procedures with written consent of the owners. No animals were harmed in the process of obtaining samples and in all cases, the disease occurred naturally and was not induced or in any way accelerated by the diagnostic procedures. The samples were collected over a period spanning more than 20 years at three institutions including the American Medical Center (AMC) Cancer Research Center, the University of Colorado, and the University of Minnesota, with approved protocols that were renewed every three years through the AMC Cancer Research Center Institutional Review Board (IRB), the AMC Cancer Research Center Institutional Animal Care and Use Committee (IACUC), the Colorado Multiple IRB (COMIRB), and the University of Minnesota IACUC. Oversight protocols included

RNA extraction
To obtain RNA from cell lines, cultured monolayers were initially harvested in 0.25% Trypsin-EDTA (Gibco) and washed twice in 1x PBS with centrifugation at 1,000 rpm for 4 minutes (mins). To obtain RNA from tissue samples, the tumors were sectioned and homogenized prior to lysis; for cryopreserved cells, 500 μL of each sample was utilized. RNA extraction was performed for all sources using a NucleoZol RNA extraction kit (Macherey-Nagel) according to the manufacturer's protocols. Briefly, each cell pellet was homogenized by vortexing in 500 μL NucleoZol and 200 μL RNase-free water, and the mixture was incubated for 15 mins at room temperature and then centrifuged for 15 mins at 12,000 g. The supernatant was collected, from which RNA was purified using the NucleoSpin RNA Set for NucleoZol (Macherey-Nagel) according to the manufacturer's instructions. Complete removal of DNA was verified through GAPDH PCR of templates consisting of extracted RNA and synthesized cDNA for all samples. All remaining RNA was stored at -80°C.

cDNA synthesis
RNA was eluted from a NucleoSpin RNA column in 60 μL RNase-free water and immediately subjected to reverse transcription using random hexamer primers with the M-MuLV Reverse Transcriptase kit (New England Biolabs) following the manufacturer's protocols. All remaining RNA was stored at -80°C and the reverse transcribed cDNA was stored at -20°C. A Taq PCR reaction was run on the extracted RNA and reverse transcribed cDNA from each sample using previously published GAPDH primers [36] to confirm the quality of cDNA following reverse transcription and the absence of genomic DNA in the extracted RNA samples. PCR reactions were performed using 0.125 U Taq polymerase (Invitrogen) in 10x buffer, 2.5 μM dNTPs, 10 μM each primer, and 2.5 μM MgCl2 under the conditions of initial denaturation at 95°C for 2 mins followed by 35 cycles of 95°C for 30 secs, 59°C for 30 secs, and 72°C for 1:15 mins, and a final extension at 72°C for 3 mins. A 10 μL aliquot of the PCR reaction was visually assessed by electrophoresis in 1% agarose and 1x TBE.

PCR amplification
Synthesized cDNA was utilized as a template in PCRs to detect env expressed segments. PCR reactions were run with 1 μL cDNA in 10x buffer, 2.5 μM dNTPs, 10 μM each primer (as described above), 2.5 μM MgCl2, and 0.125 U Taq polymerase (Invitrogen). Reactions were performed in an Eppendorf Mastercycler under conditions of initial denaturation at 95°C for 2 mins followed by 35 cycles of 95°C for 30 secs, 59°C for 30 secs, and 72°C for 1:15 mins with a final extension for 3 mins at 72°C. Amplified products were assessed by gel electrophoresis in 1% agarose in 1x TBE to confirm presence and amplicon size. The remaining products were purified using a NucleoSpin Gel and PCR Clean-up Kit (Macherey-Nagel) according to the manufacturer's protocol. Yield was assessed using a NanoDrop Lite (ThermoFisher).

Quantitative PCR
cDNA products successfully amplified from nucleic acids from cell lines, tumor tissues, and lymph tissues were subjected to quantitative PCR (qPCR). Reactions were run using 1 μL cDNA in PowerSYBR Green PCR Master Mix (ThermoFisher) and 0.2 pM of each primer. Reactions were performed in a StepOne Real-Time System (ThermoFisher) under conditions of initial denaturation at 95°C for 10 mins followed by 40 cycles of 95°C for 15 secs and 60°C for 1 min. Ct values from the triplicate runs were averaged and analyzed using the 2^-ΔΔCt method.
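As an illustration of the 2^-ΔΔCt calculation, the following is a minimal sketch in Python. It assumes GAPDH as the endogenous reference (as stated for Fig 2B) and an arbitrary calibrator sample; the function name `fold_change` and all Ct values are hypothetical and are not data from this study.

```python
from statistics import mean


def fold_change(target_ct, reference_ct, calib_target_ct, calib_reference_ct):
    """Relative expression by the 2^-ΔΔCt method from replicate Ct values.

    target_ct / reference_ct: replicate Ct values of the gene of interest
    (e.g. env) and the endogenous control (e.g. GAPDH) in the test sample.
    calib_*: the same two measurements in the calibrator sample
    (the choice of calibrator is an assumption of this sketch).
    """
    # ΔCt normalizes the target to the endogenous control within a sample.
    delta_ct_sample = mean(target_ct) - mean(reference_ct)
    delta_ct_calib = mean(calib_target_ct) - mean(calib_reference_ct)
    # ΔΔCt compares the test sample with the calibrator.
    delta_delta_ct = delta_ct_sample - delta_ct_calib
    return 2.0 ** (-delta_delta_ct)


# Hypothetical triplicate Ct values, for illustration only.
sample_env = [24.0, 24.2, 23.8]
sample_gapdh = [18.0, 18.1, 17.9]
calib_env = [27.0, 27.1, 26.9]
calib_gapdh = [18.0, 18.0, 18.0]

# ΔΔCt = (24 - 18) - (27 - 18) = -3, i.e. roughly an 8-fold increase.
print(fold_change(sample_env, sample_gapdh, calib_env, calib_gapdh))
```

Because each unit of ΔΔCt corresponds to one PCR cycle, the method assumes approximately equal and near-100% amplification efficiency for the target and reference amplicons.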

Fig 1. Genomic distribution of the Fc1(a) ERV lineage. Chromosome representation of Fc1(a) insertion loci and gene density mapped to the CanFam3.1 boxer genome build. Coordinates corresponding to Fc1(a) insertions and status of insertional polymorphism are as previously identified by Halo and colleagues [21]. Gene density, karyotype, and marker files from NCBI [19] were used to generate gene karyotype and density using a custom Python script. The output files were passed to RIdeogram [37] to visualize gene density over individual chromosomes, excluding unplaced contigs (chrUn) from analysis. Dark blue triangles show placement of insertions that are fixed among Canis spp.; dark red squares denote insertions previously deemed as insertionally polymorphic among modern breeds; orange squares mark additional insertions that are insertionally polymorphic in wild canids; teal circles show loci for which a provirus allele has been confirmed. Gene density is represented by a heatmap with gene-poor regions in blue and gene-dense regions in red. https://doi.org/10.1371/journal.pgen.1011083.g001
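The script used for Fig 1 is not reproduced here. As a rough sketch of how per-window gene density could be tabulated, the Python below counts gene start positions in fixed windows and skips unplaced contigs. The function name `gene_density`, the 1 Mb window size, and the four-column output (chromosome, start, end, count) are assumptions made for illustration, not details taken from the study.

```python
from collections import Counter


def gene_density(genes, chrom_lengths, window=1_000_000):
    """Count genes per fixed-size window on each chromosome.

    genes: iterable of (chromosome, gene_start) with 1-based coordinates.
    chrom_lengths: dict mapping chromosome name to its length in bp.
    Returns rows of (chromosome, window_start, window_end, gene_count).
    """
    counts = Counter()
    for chrom, start in genes:
        # Unplaced contigs (chrUn) and unknown chromosomes are excluded.
        if chrom.startswith("chrUn") or chrom not in chrom_lengths:
            continue
        counts[(chrom, (start - 1) // window)] += 1

    rows = []
    for chrom, length in chrom_lengths.items():
        n_windows = (length + window - 1) // window  # ceiling division
        for i in range(n_windows):
            w_start = i * window + 1
            w_end = min((i + 1) * window, length)
            rows.append((chrom, w_start, w_end, counts[(chrom, i)]))
    return rows


# Hypothetical toy input, for illustration only.
toy_genes = [("chr1", 100), ("chr1", 999_999), ("chr1", 1_000_001), ("chrUn_x", 5)]
toy_lengths = {"chr1": 2_500_000}
for row in gene_density(toy_genes, toy_lengths):
    print(*row, sep="\t")
```

Windows with no genes are still emitted with a count of zero so that a heatmap covers each chromosome end to end.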

Fig 2. Expression of Fc1(a) env RNAs in canine cell lines. (A) Expression of env assessed by PCR of cDNA synthesized from total RNAs isolated from each cell line and visualized by gel electrophoresis. Reactions are shown for cDNA reactions performed with reverse transcriptase (+RT) and without (-RT) as controls. (B) Quantitative PCR of relative expression of the env calculated as mean fold-difference compared to endogenous control (GAPDH). The relative expression for each tissue sample was analyzed using the 2^-ΔΔCt method and the average shown for each. (C) Strategy for allele detection from isolated genomic DNA. Genotyping for env presence of the Fc1(a) loci confirmed to have an env-present provirus allele was performed via a two-step PCR screening on isolated genomic DNA. Primers were used to amplify occupied (i.e., full-length insertion, solo-LTR) or unoccupied (i.e., empty site) alleles. Dark blue indicates the presence of insertion alleles for loci on autosomes or the chrX pseudoautosomal region (X-PAR) [38]; mid blue indicates heterozygous representation; light blue indicates absence of the insertion only. For insertion presence of sites with variable insertion alleles, a dash indicates a full-length copy is present and a dot is indicative of a solo-LTR. All other provirus sites for which an insertion was detected reflect the presence of the full-length allele. https://doi.org/10.1371/journal.pgen.1011083.g002
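The genotype categories used in the panels can be summarized as a simple decision rule. The sketch below is in Python and assumes each locus yields a yes/no result for an occupied-allele product and for an empty-site product. The function name `call_genotype`, its parameters, and the returned labels are hypothetical, and the handling of male chrX loci outside the X-PAR follows the description in the legends for Figs 6 and 7.

```python
def call_genotype(occupied_product, empty_product, hemizygous=False):
    """Classify a locus from two PCR outcomes (illustrative sketch).

    occupied_product: True if the insertion (full-length or solo-LTR)
        allele amplified.
    empty_product: True if the unoccupied (empty site) allele amplified.
    hemizygous: True for a chrX locus outside the X-PAR in a male,
        where only one allele can be present.
    """
    if occupied_product and empty_product:
        return "heterozygous"          # shown as mid blue
    if occupied_product:
        # A single insertion allele in males at non-PAR chrX loci is also
        # drawn in mid blue, per the Fig 6 and Fig 7 legends.
        return "hemizygous insertion" if hemizygous else "insertion only"
    if empty_product:
        return "empty only"            # shown as light blue
    return "no call"                   # neither product amplified


print(call_genotype(True, True))
print(call_genotype(True, False))
print(call_genotype(True, False, hemizygous=True))
print(call_genotype(False, True))
```

A locus with neither product is reported as "no call" rather than as empty, since a failed reaction cannot be distinguished from absence without a positive control.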

Fig 3. Fc1(a) env expression in tissues from healthy dogs. Samples were purchased from the biobank Zyagen, Inc., and consisted of a panel of cDNAs obtained from a total of 15 beagle dog tissues. The expression of the Fc1(a) env gene in tissues from healthy beagle dogs was assessed by quantitative PCR. Fold changes were calculated using the 2^-ΔΔCt method. https://doi.org/10.1371/journal.pgen.1011083.g003

Fig 5. Fc1(a) env expression in healthy and tumor tissues. Comparison of env expression between healthy samples and diseased samples. Each dot represents the individual sample average fold-difference value corresponding to total healthy (adult healthy tissues as well as DHTHY samples; n = 18) or disease samples (n = 19). For both the healthy and diseased groups, the overall average (black bar) and standard error are shown. The * symbol signifies a statistically significant difference (p<0.02) between sample groups as calculated by an unpaired t-test. https://doi.org/10.1371/journal.pgen.1011083.g005

Fig 6. Fc1(a) provirus prevalence among cancer samples. Genotypes were obtained as in Fig 2C from genomic DNA isolated from 17 tissues corresponding to tissues from 14 tumors as well as blood from 3 healthy animals. Dark blue indicates two insertion alleles on autosomes or the X-PAR; mid-blue indicates heterozygous representation for the site on autosomes, or (*) an insertion allele on chrX outside the X-PAR in males; light blue indicates presence of the empty allele only. For insertion presence of sites with variably detected insertion alleles, a dash indicates a full-length copy and a dot the solo-LTR. All other provirus sites for which an insertion was detected reflect the presence of the full-length allele. At left, gold indicates absence of the insertion from CanFam3.1; orange indicates presence in CanFam3.1. Insertion coordinates are shown at left and correspond to the leftmost insertion breakpoint in CanFam3.1. Sample names are indicated at the top according to clinical group. Samples are from healthy animals (DHTHY, n = 3) and tumors from canines diagnosed with lymphoma (DYLM, n = 4), leukemias including chronic lymphocytic leukemia (DCLL, n = 1), chronic myelomonocytic leukemia (DCML, n = 3), acute lymphocytic leukemia (DALL, n = 2), and acute myelomonocytic leukemia (DAML, n = 1), melanoma (DMEL, n = 1), osteosarcoma (DOS, n = 1; DTM, n = 1).

Fig 7. Insertional polymorphism of Fc1(a) insertions. Genotypes were obtained using the strategy as in Fig 2C for 54 Fc1(a) loci utilizing genomic DNA isolated from blood, canine tumor tissues, and cell lines A72, D17, DH82, and MDCK. Sample names are indicated at the top according to clinical group. Samples were from healthy animals (DHTHY, n = 3) and tissues from canines diagnosed with lymphoma (DYLM, n = 8), chronic lymphocytic leukemia (DCLL, n = 3), chronic myelomonocytic leukemia (DCML, n = 3), acute lymphocytic leukemia (DALL, n = 3), acute myelomonocytic leukemia (DAML, n = 1), hemangiosarcoma (DHSA, n = 3), melanoma (DMEL, n = 2), osteosarcoma (DTM, n = 1; DOS, n = 2). At left, gold indicates 'non-reference', or absence of the insertion from CanFam3.1; orange indicates 'reference' insertion, or presence in CanFam3.1. Dark blue indicates insertion alleles on autosomes or the X-PAR; mid blue indicates heterozygous representation for the site on autosomes, or (*) an insertion at a chrX locus outside the X-PAR in males; light blue indicates absence of the insertion. For insertion presence of sites with variably detected insertion alleles, a dash indicates a full-length copy and a dot the solo-LTR. All other provirus sites for which an insertion was detected reflect the presence of the full-length allele. Genotypes are shown for sites found to have a full-length copy of the insertion allele (indicated by green squares) or solo-LTR only in addition to the empty, or unoccupied, allele. Insertion coordinates are shown at left and correspond to the leftmost insertion breakpoint in CanFam3.1. https://doi.org/10.1371/journal.pgen.1011083.g007