Larger Mammalian Body Size Leads to Lower Retroviral Activity

Retroviruses have been infecting mammals for at least 100 million years, leaving descendants in host genomes known as endogenous retroviruses (ERVs). The abundance of ERVs is partly determined by their mode of replication, but it has also been suggested that host life history traits could enhance or suppress their activity. We show that larger bodied species have lower levels of ERV activity by reconstructing the rate of ERV integration across 38 mammalian species. Body size explains 37% of the variance in ERV integration rate over the last 10 million years, controlling for the effect of confounding due to other life history traits. Furthermore, 68% of the variance in the mean age of ERVs per genome can also be explained by body size. These results indicate that body size limits the number of recently replicating ERVs due to their detrimental effects on their host. To comprehend the possible mechanistic links between body size and ERV integration we built a mathematical model, which shows that ERV abundance is favored by lower body size and higher horizontal transmission rates. We argue that because retroviral integration is tumorigenic, the negative correlation between body size and ERV numbers results from the necessity to reduce the risk of cancer, under the assumption that this risk scales positively with body size. Our model also fits the empirical observation that the lifetime risk of cancer is relatively invariant among mammals regardless of their body size, known as Peto's paradox, and indicates that larger bodied mammals may have evolved mechanisms to limit ERV activity.


Introduction
Mammalian genomes contain large numbers of endogenous retroviruses (ERVs), derived from multiple independent germline invasions over evolutionary time. The human genome contains 31-40 such ERV invasions, termed 'families', each derived from a distinct ancestral exogenous retrovirus [1,2]. These ERVs can continue proliferating after the initial germline invasion until they are inactivated, either through the acquisition of substitutions that occur at the host background level ($\sim 10^{-3}$ per base per my) or by recombinational deletion [3,4]. Most ERV families proliferate by reinfection, although some ERVs occasionally switch from reinfecting germline cells to an entirely intracellular life cycle, and this switch can lead to an increase in the size of the ERV family [5]. As a result of these processes, ERVs have come to occupy ~5-10% of their hosts' genomes [6,7].
The fixation of a new ERV insertion is influenced by its fitness consequences to the host and by other population genetic parameters [8]; for example, a neutral ERV could fix by drift, and a slightly deleterious insertion may hitchhike or fix during a population bottleneck [9]. A small number of ERVs have been exapted and have beneficial functions in their host [10-12], but the integration of retroviruses into or near host genes can have highly deleterious effects, as the consequent disruption or alteration of gene expression can lead to malignant transformation [13]. Furthermore, illegitimate recombination between ERVs at different loci can also have deleterious effects, as can the expression of viral proteins. The uncontrolled proliferation of ERVs would therefore be extremely detrimental to their host [14], and this process must be limited either by cessation of replication activity or by host mediated suppression [15]. Vertebrate genomes have evolved a range of responses that act at various stages of the viral life cycle to limit retroviral replication and its associated tumorigenic potential [16,17].
The diversity and activity of ERVs across mammalian genomes has not been systematically assessed, and it remains unclear what factors have determined ERV abundance in their hosts. Mice and humans, the first two mammalian species to have their genomes sequenced, show strikingly different patterns of ERV activity: most human endogenous retroviruses are inactive, with a marked deceleration in activity over the last 25 million years [7]. In contrast, the mouse genome shows no sign of deceleration in ERV activity, and a large number of murine ERVs are active and unfixed in the mouse population [6]. This difference is also reflected in the proportion of catalogued mutant alleles that are due to ERV insertions; ~10% of mutant alleles can be attributed to ERVs in mice, whereas no such alleles can be attributed to ERVs in humans [18]. It has been suggested that the markedly different ERV activity in human and mouse genomes can be explained by systematic factors in the biology of these hosts [14]. We explore the hypothesis that differences in ERV activity across mammals are determined by differences in host life history, with smaller bodied animals expected to have higher levels of ERV activity. We compare body size with ERV numbers using data from a diverse set of 38 mammals in a multivariate analysis, controlling for confounding variables such as life history traits. We also explore the effect of body size and horizontal transmission on ERV dynamics through a mathematical model. Finally, we discuss the association between body size and ERV replication in the light of evolutionary theory and cancer biology.

Results
Body size has a negative correlation with ERV abundance across mammals
By analysing 38 mammalian genomes over approximately the last 10 my, we find a negative relationship between the number of integrated ERVs and body size (Figure 1b, Figure 2a). The correlation is robust if, instead of present day body size, we use the reconstructed body size at 5 million years ago (P = 0.0069, $R^2$ = 0.31), and remains significant if we use a single substitution rate for all mammals (P = 0.01, $R^2$ = 0.25). The correlation depends on the age of the integrations in the genome and is no longer significant when we consider ERVs that are older than 10 my (Figure 1c, Figure 1d). If we exclude ERVs that belong to our previously defined megafamilies [5] with mean divergence <10%, namely the IAP family from Cavia porcellus, the IAP family from Dipodomys ordii, a Class I family from Felis catus, and the IAP and ERV-L families from Mus musculus, the correlation remains (P = 0.0042, $R^2$ = 0.30). If we split ERVs into their traditional classes, the correlation is significant only for the Class II ERVs (Figure 3). There are no data to suggest systematic differences in the biology of retroviruses from different classes, given that the majority of ERVs are derived from extinct retroviral lineages. The three classes differ in their age distribution; Class II ERVs are much younger (Figure 4), with the majority of insertions falling within the 10 my period that we use to define the young age category. Thus the observed relationship between body size and Class II ERVs is likely due to their recent replication and not to some other difference in their biology, such as higher pathogenicity.

The correlation of body size with ERV abundance is not confounded by another life history trait
Since life history traits are correlated with each other, it is possible that the inferred correlation of ERVs with body size could be confounded by another trait such as reproductive output (for which gestation period is a proxy) or timing (age at sexual maturity) [19]. The number of mates or the type of placenta might also influence ERV abundance via an increased risk of horizontal or vertical retroviral transmission, respectively. To clarify whether the number of sexual partners has played a role in determining the number of ERVs per genome, we use testis size as a proxy, as it is known to correlate with the number of mates and the strength of sexual selection in mammals [20,21]. To assess the effect of placental type we modeled placental invasiveness as a semiquantitative parameter (i.e. marsupials = 1, epitheliochorial = 2, endotheliochorial = 3 and hemochorial = 4) [22]. We evaluated the correlation between ERV integration rate and potential confounders with multivariate models and standard stepwise forward model selection. We included in turn the following confounding variables: time to sexual maturity, gestation period, life span, testis size and placental invasiveness. Body size remained the only significant variable, confirming that it is the only significant predictor of ERV integration rate over the last 10 my (Table 1). The models remain significant when we account for phylogenetic non-independence [23], reconstruct ancestral mass and/or incorporate a body mass dependent substitution rate. Thus, unlike substitution rate [24,25], ERV integration rate is not a result of shorter generation time. We do not find a significant correlation with testis size, either as an additional predictor variable (P = 0.3) with body size included in the model, or as an interaction term with body size (P = 0.2). Thus, the number of mates does not appear to have played a significant role in determining the number of ERVs per genome.
Another possible confounder is the effective population size of the host [26]: species with higher effective population sizes are expected to be more efficient at purging slightly deleterious mutations such as those incurred by ERV proliferation [27]. As a result, since larger bodied animals have smaller effective population sizes [19,28], we would expect them to have more, not fewer ERVs. Thus, confounding due to effective population size would lead to a correlation in the opposite direction to what we observe, indicating that the observed correlation between body size and ERV numbers is robust against variations in effective population size.

Author Summary
Retroviruses have been invading mammalian genomes for over 100 million years, leaving traces known as endogenous retroviruses (ERVs). Early genome sequencing studies revealed a marked difference in the activity of retroviruses among species, with humans largely containing inactive lineages of ERVs, while the mouse contains numerous lineages of active ERVs. We explore the hypothesis that life history traits determine the activity of ERVs in mammalian genomes, and show that larger mammals have fewer ERV copies over recent evolutionary time (the last 10 million years) compared to smaller mammals. This association is determined by body size independently of any confounding variables. We build a mathematical model that shows that ERV abundance in genomes decreases with larger body size and increases with horizontal transmission. Retroviral integration can cause cancer, and our analysis suggests that larger bodied animals control ERV replication in order to postpone cancer until a post-reproductive age. This is in line with a long-standing observation that cancer rates do not fluctuate among mammals of different body size, a phenomenon known as Peto's paradox, and opens up the possibility that larger animals have evolved mechanisms to limit ERV activity.

Relationship between ERV abundance and body size can be explained by a mathematical model of retrovirus-host dynamics
To explore the possible mechanistic links between body size, integration rate and transmission route, we constructed a compartmental mathematical model (Figure S1) using a system of ordinary differential equations to describe the epidemiological dynamics of exogenous and endogenous retroviral infections. There are two broad classes of individuals that need to be considered, the susceptible population ($S$) and the infected population. To gain a more detailed picture of the latter compartment, namely to elucidate the interconnected roles of exogenous and endogenous retroviral infections, we further distinguish three infected sub-populations: individuals infected with an exogenous retrovirus ($I_{RV}$), those infected with a single integrated copy of the retrovirus through the process of endogenisation ($I_{ERV}$), and lastly, those infected with an endogenous retrovirus that has undergone amplification ($I_{AERV}$). The overall level of retroviral activity is directly related to the copy number of endogenised retroviruses in the infected population. Since the vast majority of endogenous retroviruses present in the host population persist in the pool of AERV-infected individuals, the level of retroviral activity can be represented by the size of this compartment. We first explore the roles of three key factors: body size ($B$), the rate of retroviral endogenisation ($\sigma$), which governs vertical transmission of the retrovirus, and the force of infection ($\lambda$), which determines the rate of horizontal transmission of the retrovirus. As shown in Figure S2, increased body size results in a lower number of individuals harbouring amplified endogenous retrovirus when the system reaches equilibrium (Figure S2, upper plot), while the rate of retroviral endogenisation ($\sigma$) and the force of infection ($\lambda$) display the opposite relationship (Figure S2, lower plot).
Our model demonstrates that horizontal and vertical transmission are both crucial for the eventual endogenisation and amplification of the retrovirus. If there is no horizontal transmission (i.e.
We explored in more detail the impact of body size on the structure of the population at equilibrium in relation to the extent of retroviral endogenisation $\sigma$ and the force of infection $\lambda$ (Figure 5). The results in Figure 5 illustrate that higher rates of horizontal transmission, represented by the force of infection ($\lambda$), lead to a higher proportion of AERV infections for a given body size, and furthermore highlight our finding that larger body size ($B$) is associated with a lower extent of AERV activity, with all other parameters fixed (Figure S2, lower plot). We also observe from Figure 5 that for sufficiently high rates of horizontal transmission, the proportion of AERV infections plateaus with respect to increasing body size ($B$). In this case, larger body size is associated with a greater proportion of exogenous infections, as new (exogenous) infections would arise through horizontal transmission at a faster rate than endogenisation via vertical transmission. To explore the possibility that the number of horizontal transmissions confounds the number of elements per genome, we tested whether the number of families per genome is correlated with body size, and found no significant correlation (P = 0.15, $R^2$ = 0.08).

Discussion
We find that the number of ERV integrations in mammals is negatively correlated with body size. This correlation can explain 37% of the variance in the number of ERV integrations over the past 10 my. We have controlled for confounding variables such as life history and sexual selection, and also confirmed robustness to variation in effective population size. Nevertheless, body size can be influenced by other parameters, and it is possible that other factors (e.g. environmental, dietary) contribute to both body size and ERV abundance, thereby explaining part of the remaining variance; for example, they might account for the residual variance of outliers (e.g. Dasypus novemcinctus and Canis familiaris). Interestingly, Microcebus murinus, whose life history evolved rapidly due to its isolation in Madagascar [29], might be expected to be a significant outlier in the correlation, but is very close to the regression line. Perhaps the global distribution and geographic isolation of a species are further determinants of the variance in ERV abundance.
We also see that 68% of the variance observed in the mean age of ERV integrations in a genome (a proxy for recent replication) is explained by body size (Figure 1a), with the number of young (i.e. recently replicating) insertions correlating inversely with body size (Figure 1b). These recently active ERVs may retain some level of virulence, and therefore still have the potential for malignant transformation [13]. In line with this prediction of intermediate virulence, reconstructed ERVs [32] or recently established present day ERVs [33,34] have low but detectable viral loads. The presence of pathogenic ERVs in a genome after such a long period of time may appear surprising. It could, however, be explained by analogy to models of the transmissibility of pathogens within the context of host-parasite co-evolutionary dynamics [35,36]. Such models incorporate the effects of both transmissibility and virulence on the reproductive success of a parasite, and show that parasites do not necessarily evolve to be harmless; in some empirical datasets reproductive success is maximised at intermediate levels of both of these parameters [35]. In other words, a pathogen can continue to be virulent despite selection imposed by the host for a more benign infection.
We have modelled the spread of retroviruses among hosts and within genomes, distinguishing between exogenous and endogenous retroviruses, and taking into account vertical and horizontal modes of transmission. A key aspect of our model is the assumption that the deleterious effects of a retrovirus in a genome scale with body size. The model shows that as body size increases, the proportion of individuals in the host population that carry ERVs drops (Figure S2). Elevated rates of either endogenisation or horizontal transmission lead to higher ERV abundance and accelerate the rate at which ERV abundance increases with body size. For any given rate of horizontal transmission, however, the overall relationship between body size and ERV abundance is maintained (Figure 5). In our model, the body size-associated pathogenic effect of an ERV in a genome is equivalent whether it has been generated by vertical or horizontal transmission.
Horizontal transmission of an ERV would require somatic expression and replication of the virus in order to propagate effectively, which may in turn increase the mortality of the host as a direct result of retroviral infection. Experimental evidence suggests that infections with replication competent retroviruses are more pathogenic when retroviral replication is high (e.g. HIV, or the recently endogenised Koala retrovirus [34,37]). One way in which the pathogenicity of an ERV can be reduced while its replicative capacity is maintained is through epigenetic regulation in somatic cells. During genomic reprogramming of the germline, transposable elements are expressed and can replicate before being silenced [38,39], resulting in lower levels of expression in somatic tissues and hence lower transmissibility. Thus, low levels of replication in somatic cells may be favorable for an ERV, enabling it to maximize its own success via vertical transmission while minimizing harm to the host. The association between ERV abundance and body size indicates that somatic replication cannot be completely suppressed and that the pathogenic effects of ERVs cannot be dissociated from their copy number.
On a macroevolutionary timescale, ERV copy number will be determined both by the number of cross species transmissions and by the subsequent proliferation of ERV families. The number of families per genome is orders of magnitude lower than the number of ERVs (mean number of families = 23, mean number of elements = 1073), and most ERVs within a genome come from a small number of families, the so-called superspreaders (or megafamilies) [5]. In line with this uneven distribution of ERVs among families, we do not see a correlation between the number of families within a genome and body size (P = NS, $R^2$ = 0.08). Furthermore, ERVs that belong to megafamilies lack the env gene that is required for horizontal transmission, highlighting the importance of vertical transmission in determining ERV abundance, despite the ability of ERVs to cross species on timescales spanning millions of years.
Crucially, according to our model the selective cost of an ERV is determined by the body size of the host. Larger bodied animals would be expected to have a higher lifetime risk of cancer as a consequence of having both more dividing cells and longer lifespans. No such association is observed in nature, with relatively invariable risks of cancer in animals with differing body sizes, a phenomenon known as Peto's Paradox [40,41]. Under our model, the risk of retrovirally induced cancer also scales similarly with body size. The observed negative correlation between body size and ERV integration rate suggests that larger mammals attain a lower ERV virulence cost per body size unit by reducing the number of ERVs in their genome. This should therefore enable them to postpone the onset of cancer until after their reproductive age.
Our results indicate that larger animals exert greater control over ERV proliferation. This could be due to the evolution of mechanisms capable of limiting retroviral activity and consequently limiting the incorporation of ERVs in the genome. Such mechanisms could involve the enhancement of innate or adaptive responses to retroviruses [16,17], or perhaps epigenetic regulation [42] is more potent in larger mammals. An intriguing alternative is that the effect is indirect, via improved immune surveillance: some genes involved in pattern recognition for defence against pathogens such as viruses are also involved in controlling cancers [43]. Antiviral genes are the result of a continuous and ancient arms race between viruses and their hosts [15], and elucidating their roles in controlling cancer across animals of different body size could provide insights into cancer susceptibility.

ERV mining and dating of insertions
Our mining of the 38 mammal genomes has been described previously [5,44,45]. We estimated age based on the divergence from the most similar other ERV insertion in the same genome ("nearest neighbour"). We favour this approach over cruder metrics that are based on divergence from a consensus sequence, as it takes into account the phylogeny of the ERVs, and over approaches based on divergence between paired LTRs, due to the variable quality of the genomes being analysed, most of which do not contain contigs that are long enough to include complete proviral elements. We first calculated nucleotide divergence from the most similar other ERV insertion in the same genome as described in Magiorkinis et al. [5], and then converted this to an integration date assuming a mean nucleotide substitution rate at neutral nuclear protein coding sites in mammals of $2.2 \times 10^{-9}$ per site per year [46], and corrected for multiple hits using the Jukes-Cantor model. To calculate the average age of ERVs in each genome we took into account the known effect of body size on substitution rate by using a regression of rate against mass with a slope of $-0.09$, i.e. $\log(\text{adjusted rate}) = 0.09 \times (\log(\text{mean mass}) - \log(\text{mass})) + \log(\text{unadjusted rate})$ [47]. We also repeated the correlation between body size and ERV number with a single substitution rate for all mammals.
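The dating steps described above can be illustrated with a minimal Python sketch. The substitution rate and regression slope are the values quoted in the text; the function names are ours, and the `branch_factor` parameter is an assumption, since the text does not state whether the nearest-neighbour divergence is treated as accumulating along one branch or along two.

```python
import math

BASE_RATE = 2.2e-9  # substitutions per site per year, as quoted in the text
SLOPE = 0.09        # magnitude of the rate-versus-mass regression slope quoted in the text


def jukes_cantor(p):
    """Correct an observed proportion of differing sites p for multiple hits."""
    if not 0.0 <= p < 0.75:
        raise ValueError("Jukes-Cantor correction requires 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def adjusted_rate(mass, mean_mass, base_rate=BASE_RATE, slope=SLOPE):
    """Body-mass-adjusted rate.

    Equivalent to log(adjusted) = slope * (log(mean_mass) - log(mass)) + log(base_rate),
    so the result does not depend on the base of the logarithm.
    """
    return base_rate * (mean_mass / mass) ** slope


def integration_age(p, rate, branch_factor=2.0):
    """Age in years from nearest-neighbour divergence p.

    branch_factor is an assumption of this sketch: 2.0 treats the divergence as
    accumulated on both copies since duplication, 1.0 as accumulated on one branch.
    """
    return jukes_cantor(p) / (branch_factor * rate)
```

For example, `integration_age(0.02, adjusted_rate(mass=50.0, mean_mass=5.0))` would date an insertion that differs from its nearest neighbour at 2% of sites in a host ten times heavier than the average; the masses here are purely illustrative.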

Incorporating ancestral body mass
Using the data above we reconstructed ancestral body masses assuming a Brownian motion model of trait evolution as implemented in the package GEIGER in the R language [48]. This program returns the estimated body mass at nodes in the tree, and from these we calculated values at the mid-points of our time intervals (averaging where necessary). We then manually pruned our trees to this point and repeated the regression between number of ERVs and body mass for each time interval, taking the phylogeny into account (Table 2). Our regressions were performed with both present day body size and the reconstructed body size at the mid-point of our time intervals (e.g. body size at 5 million years ago for regression against activity during the last 10 million years).

Multivariate analysis
Life history traits correlate with each other; for example larger bodied animals tend to live longer and have smaller effective population size [19,28]. Therefore body size could in principle be a surrogate measure of a different life history trait, as has been previously shown for substitution rate [24]. Mammalian life history data was taken from [49] and the phylogenetic tree from [50]. We collected the testis size for 24 out of 38 species in our study (Table S1). We used the Generalized Least Squares (GLS) approach as implemented by the Analysis of Phylogenetics and Evolution (APE) package [51] in R. We used standard model selection to identify significant confounders of ERV numbers per genome (Table 1).

A mathematical model of ERV persistence and evolution
Model (1) captures the fundamental dynamics of retroviral infections including the processes of retroviral endogenisation and amplification. The key interactions of the model are illustrated schematically in Figure S1.  In model (1), we consider both vertical and horizontal routes of transmission. We also distinguish between exogenous and endogenous retroviral infections. Whereas horizontal transmission can only lead to infection with an exogenous retrovirus (i.e. RV compartment), vertical transmission can result in retroviral endogenisation (i.e. ERV compartment) and subsequent amplification (i.e. AERV compartment).
There are two ways in which new (exogenous) retroviral infections may arise horizontally in an initially susceptible individual, either through contact with an individual infected with an exogenous retrovirus (i.e. RV compartment), or alternatively via exposure to an individual infected with an amplified endogenous retrovirus (i.e. AERV compartment). We assume that individuals infected with only a single integrated copy of the retrovirus (i.e. ERV compartment) are unable to transmit the infection horizontally between hosts. The force of infection $\lambda$ is composed of two terms, $\lambda = \lambda_1 + \lambda_2$, and thus reflects the dual modes of horizontal transmission. There are various functional forms for the force of infection, and we choose the commonly used form $\lambda = \beta_1 I_{RV}/N + \beta_2 I_{AERV}/N$, where $\beta_1$ and $\beta_2$ are the respective coefficients of infectious transmissibility for RV-infected and AERV-infected individuals, and $N$ is the total population, which is assumed to be constant.
A small proportion $\sigma$, where $0 \le \sigma \le 1$, of births from individuals who are infected by an exogenous retrovirus acquire an integrated endogenous copy of the retrovirus, thereby entering the ERV compartment. Meanwhile, individuals infected with an integrated endogenous retrovirus (in the ERV compartment) undergo retroviral amplification at a rate $\alpha(B)$, which is dependent on body size ($B$). A consequence of retroviral amplification is a greater number of endogenous retroviruses; therefore the size of the compartment of individuals harbouring amplified, endogenised retroviruses is an indirect measure of the overall extent of retroviral activity. Births arising from infected individuals with amplified, endogenous retrovirus (i.e. AERV compartment) themselves harbour amplified, endogenous retroviruses.
To investigate the system without unnecessarily over-complicating the dynamical behaviour of the model, we consider a population that is maintained at a fixed size ($N > 0$) so that $S + I_{RV} + I_{ERV} + I_{AERV} = N$. The pool of susceptible individuals is maintained by the birth of new susceptible individuals and is encapsulated in the term $b$ in the $S$ equation, which includes new births of susceptibles from all other compartments as well as a term to balance the in- and out-flux of individuals in the system and ensure that the total population remains constant. We assume that background birth and death rates in each compartment are equal at a constant rate $\mu$. Additional mortality due to the detrimental effects of amplified, endogenous retroviral infection, such as the development of cancer, is reflected in the parameter $\mu_{AERV}(B)$, which depends on body size ($B$). Excess mortality as a consequence of cancer is fed back into the susceptible pool, so that the birth of susceptible individuals can be encapsulated by the term $b$, where $b = \mu [N - \sigma I_{RV} - I_{AERV}] + \mu_{AERV}(B) I_{AERV}$.
The above discussion highlights an important trade-off between retroviral amplification $\alpha(B)$, which is beneficial to the long-term persistence of the retrovirus, and increased mortality $\mu_{AERV}(B)$ in excess of background death rates as a consequence of the detrimental effects associated with increased retroviral activity. These two factors both depend on body size ($B$), but in opposing ways. Whereas larger body size means increased retroviral amplification, it also results in greater mortality, so that both $\alpha(B)$ and $\mu_{AERV}(B)$ are increasing functions of $B$. We therefore investigate the role of body size ($B$) in the outcome of infection. Two additional parameters of significance are the force of infection $\lambda$ and the rate of retroviral endogenisation $\sigma$, and we examine how varying body size influences the dynamical behaviour of the infection with respect to each according to model (1). For the former, we explore how body size can affect the system when differences between the force of infection ($\lambda$) of individuals infected with exogenous retrovirus (i.e. the $I_{RV}$ compartment) versus those carrying amplified, endogenous retrovirus (i.e. the $I_{AERV}$ compartment) are taken into account. In terms of the latter, it is expected that a higher rate of endogenisation would result in a greater proportion of individuals with integrated endogenous retroviruses, and we are interested in determining the role of body size with respect to differences in endogenisation rates. Because we have assumed that the total population remains constant, it is sensible to investigate the dynamics of the model with respect to proportions of the total population rather than in terms of the sizes of each compartment.

Figure S1. A schematic diagram of the model representing the interactions among four distinct subpopulations: susceptibles ($S$), infected with (exogenous) retrovirus ($I_{RV}$), infected with integrated (endogenous) retrovirus ($I_{ERV}$), and infected with amplified integrated (endogenous) retrovirus ($I_{AERV}$). (EPS)
Figure S2. The results of model (1) show that the proportion of the population infected with amplified, endogenous retrovirus (i.e. the AERV compartment) is associated with a larger body size ($B$), and lower rates of endogenisation ($\sigma$) and force of infection ($\lambda$). The model also predicts that a higher rate of retroviral endogenisation ($\sigma$) and a greater force of infection ($\lambda$) are both linked to a shorter time to reach the endemic steady state. (EPS)
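The equations of model (1) are not reproduced in this text, so the following Python sketch is only one system of ordinary differential equations that is consistent with the verbal description and with the stated birth term; it should not be read as the authors' exact formulation. The linear forms assumed for the amplification rate and the excess mortality, all parameter values, and the names `Params`, `derivatives` and `simulate` are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Params:
    N: float = 1.0        # constant total population (1.0 gives proportions)
    mu: float = 0.02      # background birth and death rate (hypothetical value)
    sigma: float = 0.1    # fraction of births from RV-infected hosts that are endogenised
    beta1: float = 0.3    # transmissibility of RV-infected hosts (hypothetical value)
    beta2: float = 0.1    # transmissibility of AERV-infected hosts (hypothetical value)
    alpha0: float = 0.05  # assumption: amplification rate alpha(B) = alpha0 * B
    m0: float = 0.01      # assumption: excess mortality mu_AERV(B) = m0 * B
    B: float = 1.0        # body size (arbitrary units)


def derivatives(state, p):
    """Right-hand side for (S, I_RV, I_ERV, I_AERV) under the assumptions above."""
    S, I_rv, I_erv, I_aerv = state
    alpha = p.alpha0 * p.B
    mu_aerv = p.m0 * p.B
    lam = p.beta1 * I_rv / p.N + p.beta2 * I_aerv / p.N          # force of infection
    b = p.mu * (p.N - p.sigma * I_rv - I_aerv) + mu_aerv * I_aerv  # birth of susceptibles
    dS = b - lam * S - p.mu * S
    dI_rv = lam * S - p.mu * I_rv
    dI_erv = p.sigma * p.mu * I_rv - (alpha + p.mu) * I_erv
    # Births from AERV hosts are AERV, so background birth and death cancel here.
    dI_aerv = alpha * I_erv - mu_aerv * I_aerv
    return (dS, dI_rv, dI_erv, dI_aerv)


def rk4_step(state, p, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return tuple(x + h * dx for x, dx in zip(state, k))

    k1 = derivatives(state, p)
    k2 = derivatives(shifted(k1, dt / 2), p)
    k3 = derivatives(shifted(k2, dt / 2), p)
    k4 = derivatives(shifted(k3, dt), p)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(p, state, t_end=500.0, dt=0.1):
    """Integrate from the initial state (which must sum to p.N) up to t_end."""
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, p, dt)
    return state
```

Calling `simulate(Params(B=2.0), (0.99, 0.01, 0.0, 0.0))` returns the four compartment sizes at the end of the run; repeating the call over a range of `B`, `sigma` or the transmissibility coefficients gives a rough way to explore how the AERV proportion responds under these assumed functional forms. The four derivatives sum to zero whenever the state sums to `N`, which is the constant-population property described in the text.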