Aging-associated patterns in the expression of human endogenous retroviruses

Human endogenous retroviruses (HERV) are relics of ancient retroviral infections in our genome. Most of them have lost their coding capacity, but proviral RNA or protein have been observed in several disease states (e.g. in inflammatory and autoimmune diseases and malignancies). However, their clinical significance as well as their mechanisms of action have still remained elusive. As human aging is associated with several biological characteristics of these diseases, we now analyzed the aging-associated expression of the individual proviruses of two HERV families, HERV-K (91 proviruses) and HERV-W (213 proviruses) using genome-wide RNA-sequencing (RNA-seq). RNA was purified from blood cells derived from healthy young individuals (n = 7) and from nonagenarians (n = 7). The data indicated that in the case of HERV-K (HML-2) 33 proviruses had a detectable expression but in only 3 of those the expression levels were significantly different between the young and old individuals. In the HERV-W family expression was observed in 45 loci and only in one case the young/old difference was significant. However, applying hierarchical clustering on the HERV expression data resulted in the formation of two distinct clusters, one containing the young individuals and another the nonagenarians. This suggests, that even though the aging-associated differences in the expression levels of individual proviruses are minor, there seems to be some underlying aging-related pattern. These data indicate that aging does not have a strong effect on the expression of individual HERV proviruses, but instead several proviruses are affected moderately, leading to age-dependent expression profiles.


Introduction
During mammalian evolution, integration of retroviral RNA into a germ line cell may have led into a formation of a provirus that is transmitted vertically and inherited in a Mendelian manner. In humans, these endogenous retroviruses (HERV) comprise ca. 8% of our genome [1]. While it is known that some retroelements of the human genome are still capable of a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 retrotransposition, DNA sequences of the HERVs have accumulated mutations to the point where retrotransposition or formation of viral particles is not taking place anymore [2]. Despite this mutation-driven functional inactivation, there are hundreds of publications demonstrating associations between HERV expression and various disease states (malignancies, infections, neurological and autoimmune diseases), however, the causal relationship has remained enigmatic [3][4][5][6].
Since the mechanism of action cannot be explained by de novo insertional mutagenesis nor with the formation of viral particles, it has been proposed, that potential pathogenicity of the HERVs could simply underlie in the presence of proviral DNA, acting as a transcriptional regulatory sequence, modifying the expression of neighboring and even more distant genes. HERVs can do this for example by acting as transcription factor binding sites. From this hypothesis it naturally follows, that potential effects of the HERVs would be restricted in some genomic window around the primary proviral insertion site. However, there is also evidence supporting more global mode of action as HERVs have been shown to activate immune and inflammatory responses of the body directly. For example, their RNA could be recognized as a pathogen-associated molecular pattern (PAMP) by Toll-like receptors and this would induce type I interferon production contributing to the pathogenesis of autoinflammatory diseases [7]. Some HERVs are still able to encode an intact envelope protein (Env) and its presence has been observed in some viral infections or in autoimmune diseases [3][4][5][6]. It has been proposed that the mechanism of action of Env is based on the antigenicity of the molecule, possibly causing a polyclonal activation of lymphocytes, i.e. functioning as a "superantigen" [8].
As the diseases, where HERV-associations have been observed, demonstrate some of the fundamental and characteristic aspects of aging, e.g. increased level of inflammation and changes in the proportions of the various lymphocyte subsets [9,10], we now quantitated the RNA levels of all previously characterized proviruses of HERV-K (HML-2) and HERV-W families in peripheral blood mononuclear cells (PBMC) derived from young and 90-year old individuals. Aging-associated increase in the expression of several HERV families has been reported previously using quantitative PCR [11]. However, qPCR approach utilizes degenerate primers for each HERV family, thus missing the information regarding individual proviruses. RNA-sequencing possesses the capability to obtain this crucial data and hence it was the method of choice.
The most recent entrants to our genome are represented by HERV-K (HML-2) family (ca. 0.2-2 million years ago), of which Subramanian et al. have identified 91 full-length proviral sequences [12]. HERV-W represents an older group of HERVs (primary infection ca. 40 million years ago) and it contains 213 full-length or near full-length elements [13].

Study populations
Two populations, representing young and elderly individuals, were used. The young ones consisted of healthy laboratory personnel, all female, aged 26 to 32 years (n = 7, median age 28) who did not have any medically diagnosed chronic illnesses, were non-smokers and had not had any infections or received any vaccinations within the two weeks prior to blood sample collection. The elderly individuals (n = 7) were selected among relatively healthy, community living, non-frail, nonagenarian females, without any severe aging-associated diseases, that were participants in The Vitality 90+ study. The nonagenarians were born in 1920 and the samples were collected in 2014. The recruitment and characterization of participants were performed as has been reported previously [14]. The study participants provided their written informed consent. This study was conducted according to the principles expressed in the declaration of Helsinki, and the study protocol was approved by the ethics committee of the city of Tampere (1592/403/1996).

Sample collection
Blood samples were collected by a trained laboratory technician in the laboratory facilities. All blood samples were drawn between 8 am and 12 am and collected into EDTA containing tubes. Samples were directly subjected to leucocyte separation on a Ficoll-Paque density gradient (Ficoll-Paque Premium, cat. no. 17-5442-03, GE Healthcare Bio-Sciences AB, Uppsala, Sweden). The PBMC layer was collected and cells used for RNA extraction were suspended in 150 μl of RNAlater solution (Ambion Inc., Austin, TX, USA). Nonagenarian and control samples were collected at the same time.

RNA extraction
RNA used for RNA sequencing was purified using a miRNeasy mini kit (Qiagen, CA, USA) and the RNA used for PCR analysis using RNeasy mini kit (Qiagen, CA, USA) according to manufacturer's protocol with on-column DNA digestion (Qiagen). The concentration and quality of the RNA was assessed with a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA).

RNA sequencing
Agilent Bioanalyzer RNA nano chips (Agilent) were used to evaluate the integrity of total RNA and Qubit RNA-kit (Life Technologies) to quantitate RNA in samples. 1 μg of total RNA was used for ScriptSeq Complete Gold System (Epicentre) to ribodeplete rRNA and further for RNA-seq library preparation. SPRI beads (Agencourt AMPure XP, Beckman Coulter) were used for purification of RNAseq libraries. The library QC was evaluated on High Sensitivity chips by Agilent Bioanalyzer (Agilent). Paired-end sequencing of RNAseq libraries was done using Illumina HiSeq technology with a minimum of 60 million 2x100bp paired-end reads per sample.

Data preprocessing and analysis
Raw reads were aligned to human genome reference build hg19 using TopHat v2.0.13 [15] with the default parameters. Only uniquely mapped reads were considered in the transcript abundance estimation and to this end SAMtools [16] was used to filter out reads mapping to multiple regions of the genome. The downstream analyses were all conducted using the tools in cufflinks2 v. 2.2.1 [17,18]. The raw expression estimates were calculated using cuffquant and the expression were normalized using cuffnorm, which gives the normalized read counts and the fragments per kilobase per million values (FPKM) for each gene as an output. The geometric normalization method was used which scales the read counts as well as the FPKM values according to procedure described in [19].
The annotation data for HERV-K (HML-2) was from Subramanian et al. [12] and that for HERV-W from Grandi et al [13]. To ensure the robustness of the normalization the expressions of HERV elements were quantified and normalized together with ENSEMBL v. 82 gene reference set [20,21]. For each individual study subject, a given HERV element was considered significantly expressed when the individual expression level exceeded normalized read count of 16 [22].

Cluster analysis
Hierarchical clustering of the samples based on normalized read counts was done separately for both HERV-K (HML-2) and HERV-W. Spearman correlation was used as the distance metric, which is robust against outliers and non-Gaussian distributions, and can capture nonlinear relationships [23,24]. Ward's minimum increase of sum-of-squares was used as the linkage method, which has been reported to perform better with gene expression data than the more traditional methods of average and complete linkage [23]. Multistep-multiscale bootstrap resampling was done to evaluate the uncertainty involved in the clustering [25]. Thousands of samples of varying sizes are randomly created from the data and then clustered. An approximately unbiased (AU) p-value is obtained, which indicates the bias corrected percentage of dendrogram variants where the specific cluster was observed.

Results
The results of the RNA-seq analysis indicated that 33 HERV-K (HML-2) loci out of 91 had a detectable expression, but often at low level and not in all individuals. The expression levels of PBMCs derived from young and elderly individuals were generally similar. Only at three loci (1q22, 10p14 and 12q24.33) the difference was statistically significant as shown in Table 1.
In the case of HERV-W, the results were similar, in 45 proviruses out of 213 the read count was >16 at least in one individual, and in the case of Xp11.21 the difference in expression levels between the young and old was significant as shown in Table 2.
Hierarchical clustering of the samples based on normalized provirus read counts was done to investigate expression patterns. Clustering of samples based on HERV-K (HML-2) expression resulted in two groups separated along the age group lines (Fig 1A). There were two deviations from this, with one nonagenarian in the predominantly young sample cluster and one young sample in the nonagenarian cluster. Heatmaps of the clustering of the HERV-K (HML-2) and HERV-W provirus expression levels are shown in Figs 2 and 3, respectively.
Bootstrap resampling of the clustering was done to quantify the certainty of the clustering. Both clusters have an approximately unbiased (AU) p-value of 97, which is the bias corrected percentage of resampling dendrogram variants where the specific cluster was observed. AU pvalue of 97 is equivalent to a p-value of 0.03, indicating statistical significance. The same significant clusters resulted even if the significantly differentially expressed 1q22, 10p14 and 12q24.33 were excluded from clustering.
HERV-W expression based clustering of samples resulted in one statistically significant cluster (p-value of 0.04), which contains the same six nonagenarian samples that are grouped together in the HERV-K (HML-2) based clustering (Fig 1B).

Conclusions
The RNA levels of individual proviruses varied considerably between samples. It was not the case that some individuals would have been more active producers than others, but instead different proviruses seemed to be expressing non-systematically within and between individuals. A total of eight HERV-K proviruses and nine HERV-W proviruses were found to be expressed in all 14 samples and consequently these proviruses were expressed with highest RNA levels. This suggests that some individual proviruses could be less restricted in terms of their expression potential, that is brought by the regulation machinery of the cell. Several proviruses were expressed only in small part of individuals and it is tempting to think that these could be the ones behind potential adverse effects, especially if they would be mainly expressed in nonagenarians, as they are probably silenced for a reason. There were no proviruses that were expressed exclusively in nonagenarians, but for example HERV-K 8p23.1a was expressed in 6 nonagenarians and only in 1 young individual. Furthermore, only 4 aging-associated differentially expressed proviruses were identified (in HERV-K (HML-2) 1q22 and 10p14 having a higher and 12q24.33 a lower expression in the elderly and in HERV-W Xp11.21 a lower expression). Putting all this together, it seems to be the case, that aging has only moderate effect on the expression levels of individual proviruses.
However, the hierarchical clustering of the expression data indicated that the expression profiles of the young and elderly subpopulations were different. The simplest way to achieve this kind of difference would be if, for example, all the proviruses were expressed systematically Table 1

. Median expression levels (normalized read counts) of HERV-K (HML-2) proviruses.
Proviruses were deemed expressed if exhibiting a read count of 16 or more [22]. Known aliases are derived from Subramanian et al. [12].  slightly up or down in one of the groups. This kind of behavior could be attributed to some kind of common regulator that has only one simple mode of action. However, this was not the case, as different proviruses were up-and downregulated equally in the nonagenarians. This requires more complex regulation and is possibly reflecting multilayered epigenetic regulation machinery involving, among other things, DNA methylation and histone modifications, and inducing distinguishable aging-associated expression profile. Due to Spearman correlation based distance metric in the clustering, each provirus has identical weight in the clustering  Aging-associated patterns in the expression of human endogenous retroviruses result, regardless of level of expression. Therefore this result would indicate that there are differences between the age groups that are revealed when the proviral expression profiles are examined as a whole. The underlying cause behind observed expression profile difference thus has to affect the expression of many different proviruses. Understanding what causes this difference could increase knowledge of HERV expression associated disease states and of agerelated decline. Since the same nonagenarian samples are clustered by both HERV-K (HML-2) and HERV-W expression, this phenomenon may not be limited to these families, and could be present in other HERV families as well. It is noteworthy, that our analysis is only limited to HERV-K (HML-2) and HERV-W families. Previous studies have indicated that upregulation of some other HERV subclasses might also have implications in tumor immunity [26][27][28]. Therefore, it is possible that these HERVs could contribute to aging more than HERV-K (HML-2) and HERV-W. This remains to be explored in future studies. There is a general agreement that the expression of HERVs should be under a strict control, i.e. allowing their expression in the germ line but silencing in most somatic cells, where their activity could disrupt normal gene expression or transcript processing. Several of these control mechanisms have been characterized in detail [29,30]. As human aging is associated with dramatic epigenetic changes, e.g. DNA methylation [31], it is maybe surprising that expression levels between the young and old individuals were not strikingly different. However, it is possible that this epigenetic regulation is responsible for the observed differences and the expression profiles would be due to differential sensitivity of the individual proviruses to these aging-associated epigenetic changes.

HERV-K locus Aliases Median expression level (normalized read count) in nonagenarians/young controls
The general expression profile of HERV-K (HML-2) in the resting blood cells used here, was dominated by a few loci, i.e. 3q12. 3, 19q13.12b and 1q22., resembling the situation in in vitro pre-activated lymphocytes [32], suggesting that the proliferative state of the cells has probably only a minor effect. This far, no similar data in the case of HERV-W is available.
In conclusion, transcriptional regulation of the proviruses belonging to HERV-K (HML-2) and HERV-W families appears to be two-dimensional in the PBMCs; a subset of HERVs are expressed constantly in age-independent manner having only slight aging-associated differences in the expression levels. These differences might be explained by a fine-tuning of transcriptional regulation that is brought by DNA methylation and is known to be heavily altered in aging. On the other hand, proviruses in another subset of HERVs were characterized by total lack of expression in some individuals. This could be the result of some more drastic mode of regulation such as that of H3K4me3, that is also known to be altered in aging [33]. Aging-dependent HERV profile found with clustering might reflect this aspect of regulations and it is also possible that adverse effects of HERVs are driven by those proviruses that undergo more radical transcriptional relaxation or restriction, that is not necessarily seen in the median RNA levels (for example HERV-K 8p23.1a in Fig 2 and HERV-W 11q14.2 in Fig 3). This finding might have some practical consequences. In the clinical studies demonstrating associations with HERV-expression the expression of only one or a few proviruses have been used as the indicator. In these studies, analysis of the whole HERV profile would help in finding the true pathogenic provirus. Small number of samples is a limitation of this study, and more comprehensive studies with bigger sample populations are needed for confident evaluation of the RNA levels.