Association and Host Selectivity in Multi-Host Pathogens

The distribution of multi-host pathogens over their host range conditions their population dynamics and structure. Also, host co-infection by different pathogens may have important consequences for the evolution of hosts and pathogens, and host-pathogen co-evolution. Hence it is of interest to know if the distribution of pathogens over their host range is random, or if there are associations between hosts and pathogens, or between pathogens sharing a host. To analyse these issues we propose indices for the observed patterns of host infection by pathogens, and for the observed patterns of co-infection, and tests to analyse if these patterns conform to randomness or reflect associations. Applying these tests to the prevalence of five plant viruses on 21 wild plant species evidenced host-virus associations: most hosts and viruses were selective for viruses and hosts, respectively. Interestingly, the more host-selective viruses were the more prevalent ones, suggesting that host specialisation is a successful strategy for multi-host pathogens. Analyses also showed that viruses tended to associate positively in co-infected hosts. The developed indices and tests provide the tools to analyse how strong and common are these associations among different groups of pathogens, which will help to understand and model the population biology of multi-host pathogens.


INTRODUCTION
Pathogens have highly variable host ranges: in natural conditions some infect only one or a few related species (i.e., specialist pathogens) while others can infect a wide range of hosts belonging to different taxonomic groups (i.e., multi-host or generalist pathogens). A large fraction of described pathogens of humans, animals and plants are generalists [1][2][3]. The ability to infect different hosts conditions the epidemiology and pathogenicity of generalist pathogens and, therefore, is highly relevant for pathogen management and disease control [1,4]. The distribution of multi-host pathogens over their host range, i.e. the frequency of infection in the various host species within an ecosystem, may vary largely, which could determine the population dynamics and structure of the pathogen. The distribution of a pathogen species over its host range may also determine important aspects of its biology in hosts significant from an anthropocentric viewpoint (i.e. target hosts), such as reservoirs and inoculum sources, emergence and reemergence, population thresholds for disease invasion or critical community size for disease persistence [e.g., [1][4][5][6][7]].
Animal or plant species may be hosts for a range of pathogens, and most host populations encounter a large number of different pathogen species [8]. For significant host species, there is abundant evidence of differences in the infection frequency of the various pathogen species present in an ecosystem. The distribution of pathogens over their hosts, and the distribution of different pathogens within a host species, will affect the frequency of multiple infection of an individual host by different pathogens. Multiple infection may have important consequences for the infected hosts, for the pathogens, and for host-pathogen co-evolution [8,9]. In the host, frequent co-infections may lead to heterozygote superiority against multiple pathogens and contribute to the persistence in host populations of alleles conferring susceptibility to disease [10]. In multiply infected hosts, pathogens can cooperate or can compete for host resources, which will affect each other's fitness. Hence, multiple infections will be a factor in pathogen evolution. Theoretical analyses predict that the within-host dynamics of microparasites in multiply infected hosts may have important consequences in the evolution of their virulence [11][12][13][14], and there is evidence that multiple infection may result in either increased or reduced virulence [e.g., [15][16][17]]. Multiple infection of a host may also directly affect the genetic diversity of the pathogen population, as co-infection is a prerequisite for genetic exchange between different pathogen species or strains. Also, infection by one pathogen may result in an increased host susceptibility to a second pathogen, a common phenomenon named facilitation or predisposition by animal and plant pathologists, respectively [8,18].
In spite of their potential impact on pathogenicity, evolution, epidemiology and control, the distribution of pathogens over their host range and the occurrence of co-infections have been largely overlooked, and most research on pathogen ecology and epidemiology has dealt with specific pathogen-host interactions [8]. To our knowledge, it has not been analysed whether the distribution of pathogens over their host range is random or, alternatively, associations between pathogens and hosts occur; nor has it been addressed whether host co-infection by different pathogens is random or associations between pathogens occur in particular hosts. Here we address these issues.
First, we propose indices for the observed patterns of host infection by different pathogens, and the observed patterns of co-infection, and tests to analyse if they conform to the null hypothesis of randomness or reflect associations. Second, we apply these tests to data on the prevalence of five insect-borne virus species in wild plant species within an agroecosystem in Central Spain. Results of these analyses uncover patterns that, if general, would be highly relevant to understand the ecology and evolution of pathogens.
Except for TSWV, which has a single-stranded RNA genome of negative and ambisense polarity, all other viruses have single-stranded RNA genomes of messenger polarity. AMV, CMV and WMV are transmitted by aphids in a non-persistent manner, i.e. the virus is retained in the distal structures of the aphid mouth parts for a short period of time. BWYV is transmitted in a circulative, non-propagative manner, i.e., the virus penetrates through the gut wall into the haemocoel of the insect vector, and circulates with the haemolymph to reach the salivary glands, from where it is inoculated into new plants. TSWV follows a similar path within the thrips body, but infects and multiplies in the insect cells [19]. All five viruses cause important diseases in vegetable crops worldwide, including the studied region in Central Spain, but infection in the analysed wild hosts was asymptomatic. Table 1 shows the number of samples analysed and the number of plants infected by each of these five virus species, in single or multiple infection, in the 21 most frequently found plant species in three monitored habitats (see Methods) for the analysed period.

Association between viruses and hosts, and among viruses, in weeds in Central Spain
Tests for association between hosts and pathogens (see Methods) were applied to this data set. The index of selectivity of pathogen (ISP), and its significance, is shown in Table 2 for the five viruses. The distribution of three of the five analysed viruses over their hosts was significantly non-random, i.e. some of the available hosts were preferentially infected. Fig. 1 shows the relationship between prevalence and the ISP for the five viruses. A positive correlation was found between both parameters (r = 0.9347, P = 0.0189 in a Spearman rank correlation test), i.e., the more host-selective viruses were those with the highest prevalence in the analysed ecosystem. Similarly, the index of selectivity of the host (ISH), and its significance, was calculated for the 21 host plant species in Table 1, and values are shown in Table 3. For about half (9/21) of the analysed hosts (Amaranthus spp., Cirsium arvense, Convolvulus arvensis, Diplotaxis erucoides, Lactuca serriola, Medicago sativa, Portulaca oleracea, Solanum nigrum and Taraxacum spp.) differences in the prevalence of the five viruses departed significantly from random. Fig. 2 shows the relationship between virus prevalence and the ISH for the 21 host species. Again, a positive correlation between both parameters was found (r = 0.5161, P = 0.0166, in a Spearman rank correlation test), i.e., the more virus-selective hosts were those with a higher prevalence of virus infection. The relationships between prevalence and selectivity for viruses and hosts were not due to a coincidence in the frequency of infection among hosts by different viruses, as shown by a contingency analysis of counts of infected hosts by the different viruses (P < 10⁻⁴).
For 16 of the 21 plant species in Table 1, co-infection with more than one of the five viruses occurred. For these 16 plant species, 102 plants were infected by at least one virus out of 1060 analysed plants (Table 4). The above described test of association between pathogens was applied to this set. The data in Table 4 showed a tendency of the analysed viruses to associate positively: the distribution of the association index (AI) was skewed towards positive values (Fig. 3) so that out of 68 AIs computed for the five viruses in 16 plant species, 47/68 (more than two thirds) were positive and 21/68 were negative. Moreover, there was a conspicuous tendency of the positive AI values to have smaller probabilities (r = −0.6575, P < 10⁻⁴, in a Spearman rank correlation test). When the pooled sample from the sixteen plant species was considered, the AI was positive and significantly different from zero for each of the five viruses, i.e. each of the five viruses was found in co-infection with a frequency significantly higher than expected from the null hypothesis of independence of infection. However, this was not so when the data for each of the sixteen plant species were analysed separately. Hence the association analysis uncovered two patterns that were not obvious: i) a general tendency of the analysed viruses to associate positively, ii) association depended on both the plant and the virus species.

DISCUSSION
Most efforts to understand the population biology of pathogens have focussed on specialist pathogens, and population biologists have successfully developed a formal understanding of the dynamics and evolution of single-host pathogens. However, most pathogens of humans, animals and plants are multi-host pathogens [1][2][3][20]. As stated by Woolhouse et al. [1], "understanding the more complex population biology of multi-host pathogens will be one major challenge in the 21st century". There is evidence that within an ecosystem the prevalence of multi-host pathogens may differ largely for the different species of their host range [e.g., [21][22][23][24][25]]. Similarly, there is evidence of large differences in the prevalence on a host species of the various pathogens that are able to infect it [e.g., [26][27][28][29]]. However, no attempt has been made, to our knowledge, to analyse if differences in the distribution of multi-host pathogens over their hosts are random or if there are associations between hosts and pathogens. The uncovering of associations between hosts and pathogens would be highly relevant to understand and model the population biology of multi-host pathogens, and for understanding the phenomenon of generalism itself.
We present here indices and tests to analyse if there is association between multi-host pathogens and their hosts. The proposed indices of selectivity for the pathogen and for the host measure the degree of association between hosts and pathogens.
The tests analyse the homogeneity of distribution of a pathogen over different host species or populations, and of different pathogens on a host, and analyse how significantly the values of the indices depart from zero (i.e. no association). The literature on pathogen ecology does not abound with data on the prevalence of various pathogens on various hosts. Hence, we have applied these indices to our unpublished data on the prevalence of five insect-borne plant viruses on 21 species of wild plants in an agroecosystem in central Spain over a three-year period.
The analysis of the prevalence of the different viruses in each host species by the homogeneity test that we propose shows that about half of the analysed plant species had an index of selectivity of the host (ISH) significantly different from zero. The distribution of the host species showing virus selectivity was not related to taxonomy, habitat (fallow fields, edges or wastelands), seasonality or vegetative cycle (annual vs. perennial) (not shown). Interestingly, there was a positive correlation between the ISH and the average virus prevalence for these 21 host plant species, showing that the more selective hosts are more prone to be virus-infected, obviously by the virus(es) that infect them best. This phenomenon suggests that, although each host encounters a wide array of pathogens, mechanisms of escape and/or resistance [30] to some of them would operate, which could explain their selectivity. In fact, contingency analysis of counts of infected hosts by different viruses suggests that different viruses specialise on different hosts. The analysis of the homogeneity of prevalence of a virus over its host species showed that for three of the five analysed viruses there was a significant host association, i.e., the value of the index of selectivity for the pathogen (ISP) significantly departed from zero. One major and unexpected finding of the analysis was that there was a positive and highly significant correlation between the value of the ISP and the prevalence of the viruses. The value of the ISP was not conditioned by the number of host plant species infected by each virus, as there was no correlation (r = 0.60, P = 0.173 in a Spearman rank correlation test) between ISP and the number of plant species that each virus infected in the analysed system, i.e., the more selective viruses were not those infecting a smaller number of plant species. Thus, the more host-selective viruses were those that did best in the analysed ecosystem.
This result could be highly relevant for understanding the evolution of generalism in pathogens. Although most described pathogens are generalists, the advantages of generalism are poorly understood. A generalist strategy provides the pathogen with more opportunities for transmission and survival, but it is predicted that evolution would favour specialism, because pathogen-host co-evolution could result in functional trade-offs that would limit the generalist's fitness in any one host [1][31][32][33][34]. Our results are compatible with the hypothesis that specialism is advantageous for pathogens, as host selectivity is the rule for the analysed set of generalist viruses, and the more host-selective the virus, the more successful its strategy. Hence, our results could suggest that for generalist pathogens a degree of host specialisation, i.e. host selectivity as defined here, is a successful strategy. Host specialisation in generalist pathogens would also be relevant for important issues of host and pathogen biology, as host specialisation will affect host-pathogen co-evolution and co-speciation, would reduce the opportunities for host switches and jumps, thus constraining the evolution of host expansion, and may result in spatial heterogeneity of hosts, thus favouring the stable maintenance of pathogen and host diversity [6][35][36][37]. In addition, host specialisation may affect the opportunity for different pathogens to share a host and, thus, the consequences of multiple infection for pathogen and host evolution, as discussed below. We also propose here a simple procedure to estimate association among pathogens, which yields an association index whose significance can be tested against the null assumption of independence of infections that follow a binomial distribution.
The test was applied to the same data set as above, and the second major contribution of our analysis is the finding that co-infection was mostly non-random and that associations among the five analysed viruses were mostly positive. This result is relevant because co-infection of different pathogens may have important consequences for the pathogens, the infected hosts, and for host-pathogen co-evolution [8,9,14]. For viruses, co-infection of a host may result in the generation of new genotypes by recombination or by reassortment of genomic segments between different viral species or strains, often with dramatic changes in host range or pathogenicity. The classical example is the reassortment of avian and human strains of influenza A resulting in novel viruses with pandemic potential [38][39][40][41], but examples abound for both animal and plant viruses [e.g., [3][42][43][44][45][46][47][48]]. In the individual host, co-infection may lead to aggravated disease, often resulting from extracellular cooperativity of independently replicating viruses, by which one virus modulates the host response to infection to the benefit of the other [49,50]. In addition, direct interactions of different viruses in co-infected cells may result in complementation of highly pathogenic defective genotypes, in increased virus replication or in modified cell and tissue tropisms [e.g., [51][52][53][54][55][56][57]]. Alternatively, there is also evidence that mixed infections of pathogens result in reduced pathogenicity and less severe disease [17]. Examples from viruses include mixed infection with satellite or with defective interfering nucleic acids [58]. In our data set, association between viruses depended on each particular virus-host system. Hence, the data suggest that in some hosts, but not in all, co-infection would be advantageous for some viruses, though the underlying mechanism remains to be analysed.
The analysis reported here of plant virus infection on weeds has uncovered two major features that should be relevant to understand the population biology of viruses: i) the more host-selective viruses do better in the analysed ecosystem, ii) viruses tend to associate positively in co-infected hosts. It would be of high interest to know how general these features are and in which types of pathogens they occur. The indices and tests that we propose here could be of general use in the analysis of the ecology of pathogens, and we hope that our results will prompt research on the ecology of pathogen-host and pathogen-pathogen associations, as these analyses might uncover pathogen properties relevant to the formal understanding of the population biology of multi-host pathogens.

Indices and tests
We study two factors relating to the distribution of pathogens in different hosts (i.e. different host populations, genotypes, species, etc.): whether there are associations between pathogens and their hosts, and whether there are associations among pathogens. To analyse these two factors we propose the following tests and indices.
Association between pathogens and their hosts
Let us call $N_k$ the number of analysed individuals in host $k$ ($k = 1, 2, \ldots, n_k$) and $X_{ik}$ the number of these individuals that are infected by pathogen $i$ ($i = 1, 2, \ldots, n_i$). The prevalence of pathogen $i$ in host $k$ will be the ratio $P_{ik} = X_{ik}/N_k$.
The average prevalence of pathogen $i$ over hosts will be $P_{i\cdot} = \sum_k X_{ik} / \sum_k N_k$. Conversely, the average prevalence of the different pathogens in host $k$ can be defined as $P_{\cdot k} = \sum_i X_{ik} / (n_i N_k)$. Homogeneity of the prevalence of a pathogen among hosts can be tested by means of a $2 \times n_k$ contingency table with elements $X_{ik}$ and $(N_k - X_{ik})$ [59]. Different proportions (i.e. lack of homogeneity) will indicate a property of the pathogen that we will call selectivity. Selectivity will be measured by Cramér's coefficient of contingency [59] of the contingency table. If $\chi^2_i$ is the chi-squared of the $2 \times n_k$ table, the index of selectivity of the pathogen will be $ISP_i = \sqrt{\chi^2_i / \sum_k N_k}$. The index of selectivity of the host, $ISH_k$, is obtained in the same way from the $2 \times n_i$ table with elements $X_{ik}$ and $(N_k - X_{ik})$ for the $n_i$ pathogens in host $k$. Both of these indices range from zero to one, with zero meaning equal prevalence of the pathogen over hosts, or of pathogens over the same host, i.e. no selectivity for the pathogen or the host.
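As an illustration of the selectivity index, the following minimal sketch computes the prevalence of one pathogen in each host and a Cramér-type coefficient for the 2 × n table of infected and non-infected individuals. It assumes that the coefficient takes the form sqrt(chi-squared / total sample size), which is Cramér's V for a table with two rows; the function names and the counts in the usage note are hypothetical and are not taken from Table 1.

```python
from math import sqrt


def prevalences(infected, sampled):
    """Prevalence X/N of one pathogen in each host."""
    return [x / n for x, n in zip(infected, sampled)]


def chi_squared_2xn(infected, sampled):
    """Pearson chi-squared of the 2 x n table (infected, not infected) per host."""
    total = sum(sampled)
    total_infected = sum(infected)
    total_healthy = total - total_infected
    chi2 = 0.0
    for x, n in zip(infected, sampled):
        expected_infected = n * total_infected / total
        expected_healthy = n * total_healthy / total
        # Cells with zero expectation contribute nothing to the statistic.
        if expected_infected > 0:
            chi2 += (x - expected_infected) ** 2 / expected_infected
        if expected_healthy > 0:
            chi2 += ((n - x) - expected_healthy) ** 2 / expected_healthy
    return chi2


def selectivity_index(infected, sampled):
    """Assumed form of the index: Cramér's V = sqrt(chi2 / N) for a 2 x n table."""
    return sqrt(chi_squared_2xn(infected, sampled) / sum(sampled))
```

With hypothetical counts, `selectivity_index([10, 10], [100, 100])` is zero (equal prevalence in both hosts), whereas `selectivity_index([50, 0], [50, 50])` reaches one (the pathogen infects every individual of one host and none of the other). The same function can be applied to the counts of different pathogens in one host to obtain the corresponding index for the host.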
Association between different pathogens
Let us call $Xs_{ik}$ and $Xa_{ik}$ the number of analysed individuals of host $k$ that are infected only by pathogen $i$ (single infections) and by pathogen $i$ and at least another one (associated infections), respectively ($X_{ik} = Xs_{ik} + Xa_{ik}$).
The frequency of pathogen $i$ in host $k$ can be estimated as $P_{ik} = (Xs_{ik} + Xa_{ik})/N_k$, which equals the above defined prevalence. Under the null hypothesis of independence of infection by different pathogens, the probability of a sampled host individual being infected only by pathogen $i$ is $P_{ik} \prod_{j \neq i} (1 - P_{jk})$. The conditional probability of non-infection by any other pathogen given the presence of pathogen $i$ is $ps_{ik} = \prod_{j \neq i} (1 - P_{jk})$, and the conditional probability for the observed multiple infections given the presence of $i$ is $pa_{ik} = 1 - ps_{ik}$. So, under the hypothesis of independence of infection by different pathogens (non-association between pathogens), $Xs_{ik}$ will be distributed as a binomial with $X_{ik}$ trials and probability $ps_{ik}$. We define the association index (AI) for pathogen $i$ in host $k$ as the difference between the proportion of samples infected by pathogen $i$ that are also infected by at least one other pathogen ($Xa_{ik}/X_{ik}$) and the expectation of this proportion under the null hypothesis ($pa_{ik}$), i.e. $AI_{ik} = Xa_{ik}/X_{ik} - pa_{ik}$. This index has a range from one to minus one and an expected value, under the null hypothesis, of zero. The significance of the observation can be estimated as a one-tailed test from the binomial above.
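The association index and its binomial test can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure used for Table 4: the function name and arguments are hypothetical, and the direction of the one-tailed test (lower tail of the number of single infections when the index is non-negative, upper tail otherwise) is our assumption about how the tail is chosen.

```python
from math import comb, prod


def association_index(single, associated, other_prevalences):
    """Association index of pathogen i in host k and a one-tailed binomial P.

    single            -- Xs: individuals infected only by pathogen i
    associated        -- Xa: individuals infected by i and at least one other
    other_prevalences -- prevalences P_jk of every other pathogen j in host k
    """
    infected = single + associated  # X = Xs + Xa, must be > 0
    # Probability of no other pathogen, given independence of infections.
    ps = prod(1.0 - p for p in other_prevalences)
    pa = 1.0 - ps  # expected proportion of co-infection given pathogen i
    ai = associated / infected - pa

    def binomial_pmf(s):
        return comb(infected, s) * ps ** s * (1.0 - ps) ** (infected - s)

    # Under the null hypothesis Xs ~ Binomial(X, ps).
    if ai >= 0:
        # Assumed tail: as few or fewer single infections than observed.
        p_value = sum(binomial_pmf(s) for s in range(0, single + 1))
    else:
        # Assumed tail: as many or more single infections than observed.
        p_value = sum(binomial_pmf(s) for s in range(single, infected + 1))
    return ai, p_value
```

For example, with four infected plants all of them co-infected and a single other pathogen with prevalence 0.5, `association_index(0, 4, [0.5])` gives an index of 0.5 and a probability of 1/16.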
To test for association of different pathogens within a host, or for a given pathogen across different hosts, we follow the same process, as the expectation of a sum of observations will be equal to the sum of their expectations, and the corresponding sums of observations will be binomially distributed given the X ik .
To single out significant tests in a group, raw significance probabilities were corrected by the sequential Bonferroni method for multiple independent tests as indicated in [60].
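A minimal sketch of a sequential Bonferroni correction is given below. It assumes the usual step-down form of the method, in which the smallest probability is compared with alpha/m, the next with alpha/(m - 1), and so on until the first non-significant test; the function name and the default alpha of 0.05 are hypothetical choices for illustration.

```python
def sequential_bonferroni(p_values, alpha=0.05):
    """Flag which tests remain significant after a step-down Bonferroni correction."""
    m = len(p_values)
    # Indices of the tests, from the smallest to the largest raw probability.
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, index in enumerate(order):
        # The test of a given rank is compared with alpha / (m - rank).
        if p_values[index] <= alpha / (m - rank):
            significant[index] = True
        else:
            # Once one test fails, all tests with larger probabilities fail too.
            break
    return significant
```

With the hypothetical probabilities `[0.01, 0.04, 0.03, 0.005]`, the thresholds are 0.0125, 0.0167, 0.025 and 0.05, so only the first and last tests remain significant.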

Analyses of virus prevalence in wild plants
Plants were sampled monthly for three years in a horticultural area in central Spain within three habitats characterised by different degrees of human intervention: fallow fields, edges between fields, and wastelands. Plants were sampled systematically along fixed itineraries, with no consideration of symptom expression, as described in Sacristán et al. [21]. Infection by AMV, BWYV, CMV, WMV and TSWV in the sampled plants was analysed by double-antibody sandwich enzyme-linked immunosorbent assay (DAS-ELISA), using commercial antisera (Bio-Rad, Marnes-La-Coquette, France), according to the manufacturer's instructions.
The distribution of the host species showing virus selectivity according to taxonomy, habitat (fallow fields, edges or wastelands), seasonality or vegetative cycle (annual vs. perennial) was analysed by chi-squared tests of 2 × N contingency tables, and their significance was assessed, as for the other tests in this work, by simulation following model III.
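The following sketch shows one way in which the significance of a chi-squared statistic can be assessed by simulation. It rests on an assumption that the text does not confirm: that "model III" refers to sampling with both margins of the table fixed, which is implemented here by randomly reallocating the infected individuals among groups of fixed size. The function names, the number of simulations and the seed are hypothetical.

```python
import random


def chi_squared_2xn(infected, sampled):
    """Pearson chi-squared of the 2 x n table (infected, not infected) per group."""
    total = sum(sampled)
    total_infected = sum(infected)
    chi2 = 0.0
    for x, n in zip(infected, sampled):
        for observed, margin in ((x, total_infected), (n - x, total - total_infected)):
            expected = n * margin / total
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return chi2


def simulated_p_value(infected, sampled, n_sim=2000, seed=0):
    """Monte Carlo P of the observed chi-squared with both margins fixed (assumed)."""
    rng = random.Random(seed)
    observed = chi_squared_2xn(infected, sampled)
    # One label per individual: 1 = infected, 0 = not infected.
    labels = [1] * sum(infected) + [0] * (sum(sampled) - sum(infected))
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(labels)
        simulated, start = [], 0
        for n in sampled:
            simulated.append(sum(labels[start:start + n]))
            start += n
        # Small tolerance guards against floating-point ties.
        if chi_squared_2xn(simulated, sampled) >= observed - 1e-12:
            hits += 1
    # The observed table counts as one of the possible outcomes.
    return (hits + 1) / (n_sim + 1)
```

A perfectly homogeneous table such as `simulated_p_value([10, 10], [100, 100])` returns a probability of one, since every simulated table is at least as heterogeneous as the observed one.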