## Abstract

There is a long-standing interest in understanding host-parasite coevolutionary dynamics and associated fitness effects. Increasing amounts of genomic data for both interacting species offer a promising source to identify candidate loci and to infer the main parameters of the past coevolutionary history. However, so far no method exists to perform the latter. By coupling a gene-for-gene model with coalescent simulations, we first show that three types of biological costs, namely, resistance, infectivity and infection, define the allele frequencies at the internal equilibrium point of the coevolution model. These in turn determine the strength of selective signatures at the coevolving host and parasite loci. We apply an Approximate Bayesian Computation (ABC) approach on simulated datasets to infer these costs by jointly integrating host and parasite polymorphism data at the coevolving loci. To control for the effect of genetic drift on coevolutionary dynamics, we assume that 10 or 30 repetitions are available from controlled experiments or several natural populations. We study two scenarios: 1) the cost of infection and population sizes (host and parasite) are unknown while costs of infectivity and resistance are known, and 2) all three costs are unknown while population sizes are known. Using the ABC model choice procedure, we show that for both scenarios, we can distinguish with high accuracy pairs of coevolving host and parasite loci from pairs of neutrally evolving loci, though the statistical power decreases with higher cost of infection. The accuracy of parameter inference is high under both scenarios, especially when using both host and parasite data, because parasite polymorphism data do inform on costs applying to the host and vice-versa. As the false positive rate to detect pairs of genes under coevolution is small, we suggest that our method complements recently developed methods to identify host and parasite candidate loci for functional studies.

## Author summary

It is of importance for agriculture and medicine to understand host-parasite antagonistic coevolutionary dynamics and the associated deleterious fitness effects, as well as to reveal the genes underpinning these interactions. The increasing amounts of genomic data for hosts and parasites offer a promising source to identify such candidate loci, but also to use statistical inference methods to reconstruct the past coevolutionary history. In our study we attempt to draw inference of the past coevolutionary history at key host and parasite loci using sequence data from several individuals and across several experimental replicates. We demonstrate that using a Bayesian statistical method, it is possible to estimate the parameters driving the interaction of hosts and parasites at these loci for thousands of generations. The main parameter that can be estimated is the fitness loss by hosts upon infection. Our method and results can be applied to experimental coevolution data with sequences at the key candidate loci, provided that enough repetitions and large enough population sizes are available. As a proof of principle, our results open the door to reconstructing past coevolutionary dynamics using sequence data of interacting species.

**Citation:** Märkle H, Tellier A (2020) Inference of coevolutionary dynamics and parameters from host and parasite polymorphism data of repeated experiments. PLoS Comput Biol 16(3): e1007668. https://doi.org/10.1371/journal.pcbi.1007668

**Editor:** Samuel Alizon, CNRS, FRANCE

**Received:** June 5, 2019; **Accepted:** January 19, 2020; **Published:** March 23, 2020

**Copyright:** © 2020 Märkle, Tellier. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability:** Codes and pipelines associated with the article can be found at: https://gitlab.lrz.de/tellier/abc_coevolution_onepop.

**Funding:** This work was supported by the Deutsche Forschungsgemeinschaft (https://www.dfg.de/) (TE809/3-1 and TE809/4-1 within the DFG Priority Program SPP1819 "Rapid evolutionary adaptation: Potential and constraints" awarded to AT). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests:** The authors have declared that no competing interests exist.

## Introduction

Host-parasite coevolution is a ubiquitous process and has been demonstrated in terrestrial [1], limnological [2] and marine environments [3]. It describes the process of parasites and hosts exerting reciprocal selective pressures on one another. Therefore, coevolutionary dynamics are expected to substantially interact with and shape neutral nucleotide diversity linked to the coevolving sites. The latter can be single or multiple SNPs in coding or non-coding parts of genes [4, 5], insertions/deletions [6] or distributed across a gene network [7]. Accordingly, the polymorphism patterns at the coevolving loci, referred to as the genetic signatures, are expected to be distinct from those at loci not involved in the coevolutionary interaction. Therefore, host and parasite genomic data are not only expected to be a valuable source to identify loci under coevolution but also to understand the past coevolutionary history.

On the one hand, signatures of positive selection, which are characterized by lower genetic diversity compared to the genome-wide average and increased levels of linkage disequilibrium [8], are expected to arise at the coevolving loci under so-called arms-race dynamics [9, 10]. In arms-race dynamics, frequencies of new beneficial alleles (such as new resistance or infectivity alleles) arising by *de novo* mutations increase towards fixation in both interacting partners. Accordingly, alleles are short lived and recurrently replaced and thus, allelic polymorphism is only transient [9, 10]. On the other hand, signatures of balancing selection characterized by higher than average diversity [11] are expected to be the result of so-called trench-warfare dynamics (also referred to as Red Queen dynamics) [6, 9]. In this type of dynamics, several alleles are stably maintained over large time periods in both coevolving species. Hereby, allele frequencies either converge towards a stable equilibrium or they fluctuate persistently over time. Based on these classic expectations, genomic studies have unravelled positive and balancing selection signatures at various resistance genes [4–6, 12–16] and effector genes [17, 18].

However, in reality there is a continuum between arms-race and trench-warfare dynamics. The dynamics are in fact strongly affected by the type and strength of various forms of selection (negative indirect frequency-dependent selection, negative direct frequency-dependent selection, overdominant selection) and their interplay with genetic drift [19, 20] and mutation [21, 22]. Under negative frequency-dependent selection (nFDS) the fitness of a particular allele is either inversely proportional to its own frequency (direct, ndFDS) or to allele frequencies in the interacting partner (indirect, niFDS) [23, 24]. If only niFDS is acting in single-locus host-parasite coevolutionary interactions, internal equilibrium points where several host and parasite alleles coexist may exist, but they are always unstable. In such systems overdominant selection or some form of ndFDS is a necessary but not always sufficient condition for trench-warfare dynamics to take place [21, 24]. Even if some form of ndFDS is acting, arms-race dynamics can take place if either the strength of ndFDS compared to niFDS is weak or genetic drift is causing random loss of alleles (see S1 Fig).

The exact nature of the dynamics, such as the equilibrium frequencies of alleles and the period and amplitude of coevolutionary cycles, are further affected by the way host and parasite genotypes interact at the molecular level and the fitness costs associated with the coevolutionary interaction. The interaction at the molecular level is captured by the infection matrix which stores the specificity and the level of infection in all possible pairwise interactions between host and parasite genotypes [25]. One well studied type of interaction is the gene-for-gene (GFG) interaction which presents one endpoint of a continuum of infection matrices [26, 27]. GFG-interactions are characterized by one universally infective parasite genotype and one universally susceptible host type and for example have been found in the Flax-*Melampsora lini* system [28].

A fitness cost which has been shown to crucially affect the coevolutionary dynamics is the loss in host fitness due to infection [19, 24], henceforward called the cost of infection (*s*). This fitness loss could be for example a reduced fertility or an increased mortality. In addition, costs of resistance (*c*_{H}) such as reduced competitive ability or fertility in absence of the parasite [29–31] and costs of infectivity (*c*_{P}) such as reduced spore production of highly infective pathogens [32] compared to pathogens with a more narrow infection range can further alter the dynamics. These costs also determine the equilibrium frequencies of the coevolutionary system [33, 34] at which one or several alleles are maintained or around which allele frequencies cycle. An important result from previous theoretical investigations [33, 34] is that the equilibrium frequencies in the host population depend on fitness costs applying to the parasite (cost of infectivity). Conversely, the equilibrium frequencies in the parasite population depend on host fitness costs (cost of resistance and cost of infection). Strictly speaking, coevolutionary dynamics in a GFG-interaction occur only when *s* > *c*_{H}. Coevolution is stronger with an increasing difference between these two parameters, the strength being measured by faster cycles and lesser sensitivity to the effect of genetic drift [19].

Given this continuum of coevolutionary dynamics, it is necessary to gain a deeper and refined understanding on how the interaction between allele frequency dynamics at the coevolving loci, genetic drift and mutation shapes the resulting genetic signatures at the coevolutionary loci and linked neutral sites. This is an important step for the development and application of methods designed to draw inference on the coevolutionary history. A previous study has investigated this link for two distinct coevolutionary models [19]. Focusing on a small set of summary statistics, the signatures at the coevolving loci cannot be necessarily distinguished from neutrality when considering host or parasite data in isolation. Moreover, the strength of genetic signatures at the coevolving loci depends on the host and parasite population sizes and varies with changing costs of infection, resistance and infectivity [19].

The first aim of the present paper is to extend this approach [19] by including additional summary statistics in order to get a more refined understanding of the resulting genetic signatures. Based on this extended set of summary statistics, our major aim is to jointly infer several of the above-mentioned parameters as a proof-of-principle by using an Approximate Bayesian Computation approach [35–37]. We thus specifically seek to understand how much information about parameters governing the past coevolutionary history is contained in, and can be inferred from, the polymorphism data at pairs of coevolving loci.

We base our inference on average summary statistics from *r* = 10 and *r* = 30 repeatedly simulated coevolutionary histories. We envision data from repeated experiments for the following two reasons. First, previous studies have shown [19] that drift substantially interacts with the coevolutionary dynamics and the resulting genetic signatures. Second, microcosm experiments offer the possibility to perform repeated coevolutionary experiments with the same initial conditions.

We test this approach on two different scenarios. In scenario 1, we aim to infer simultaneously the cost of infection (*s*), the host population size (*N*_{H}) and the parasite population size (*N*_{P}) assuming that we know the true cost of resistance (*c*_{H}) for resistant hosts and the true cost of infectivity (*c*_{P}) for infective parasites. This scenario mimics systems where experimental measures of the costs of resistance or infectivity have been performed [32, 38] and thus, these parameters can be assumed as known. In scenario 2, our goal is to infer simultaneously the cost of infection (*s*), the cost of infectivity (*c*_{P}) and the cost of resistance (*c*_{H}) assuming that the true host (*N*_{H}) and parasite population sizes (*N*_{P}) are known. Scenario 2 is motivated by the assumption that an independent estimate of the effective population size can be obtained by using full-genome data of loci unlinked to the coevolutionary locus. For each scenario we perform the ABC model choice to distinguish pairs of coevolutionary from neutral loci and subsequently infer the model parameters.

## Materials and methods

### General outline of the approach

Approximate Bayesian computation (ABC) is an inference method which can be used in situations where likelihood calculations are intractable, as is the case for the coevolutionary models [39]. The principle of ABC methods is to perform a large number of simulations covering the parameter space for each of several possible models which are expected to reflect the past evolutionary history of the population(s) of concern and thus to have given rise to the observed data. The values of the different parameters of each model are drawn from prior distributions based on current knowledge. The observed data and each simulation are summarised by the same set of summary statistics to reduce their dimensionality. In a rejection step the best set of simulations, *i.e*. the simulations with the smallest distance to the summary statistics of the observed data, can be selected. Based on these retained simulations, a model choice can be applied to obtain a posterior probability for each competing model. Under the model with the highest posterior probability, an additional regression step can be used to generate the posterior distribution of each parameter. In this paper we do not use real observed sequence data, but study the power of our approach using so-called pseudo-observed (simulated) datasets (PODs).

In more detail the workflow in our paper is as follows:

- We compare a model of coevolution between a single host and single parasite locus to a neutral model of independently evolving (non-interacting) pairs of host and parasite loci. Under each model, we simulate polymorphism data for *n* = 50 haploid host individuals and *n* = 50 haploid parasite individuals.
- We simulate *r* replicates of these data, corresponding to repeating the coevolutionary history *r* times. Such repetitions can be obtained in controlled laboratory set-ups using for example microcosm/chemostat experiments with several replicates, or from several independent natural populations of the same host-parasite system with similar environmental conditions.
- We summarise the obtained SNP data by a set of 17 statistics (Table 1) for each of the *r* replicates.
- We calculate the mean for each of the 17 statistics across the *r* replicates. These average values are used as summary statistics in the ABC. Therefore, the average statistics from one set of *r* replicates define a given pseudo-observed dataset (POD).
- We first perform a model choice between the coevolution model and the neutral model based on our PODs. For each POD, we select the 1% closest simulations based on the set of summary statistics. Based on these retained simulations we compute the posterior probability for both models.
- In a second step, we estimate the posterior distribution of the coevolutionary parameters for the PODs. We apply a post-sampling adjustment (regression) based on the 1% best simulations under the coevolutionary model.
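The averaging step of this workflow can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `average_over_replicates` and the toy numbers are assumptions, and each replicate is simply represented as a list of already computed statistics.

```python
from statistics import fmean


def average_over_replicates(replicate_stats):
    """Return the per-statistic mean across r replicates.

    replicate_stats: list of r lists, each holding the same statistics
    (17 in the text; any common length works here) for one replicate.
    """
    n_stats = len(replicate_stats[0])
    if any(len(row) != n_stats for row in replicate_stats):
        raise ValueError("all replicates must report the same statistics")
    # Column-wise mean: one averaged value per statistic.
    return [fmean(row[j] for row in replicate_stats) for j in range(n_stats)]


# Toy example (hypothetical values): r = 3 replicates, 2 statistics each.
pod = average_over_replicates([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
```

The resulting vector plays the role of one POD (or of one simulated dataset in the ABC reference table).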

### Simulation of SNP data at the coevolutionary loci

SNP data at the coevolutionary loci are simulated by using a forward-backward approach as outlined in [19].

#### Forward in time coevolution model.

We model coevolution between a single haploid host and a single haploid parasite species. The coevolutionary interaction in both species is driven by a single bi-allelic functional site (SNP, indel, …). This functional site is located in the coevolutionary locus which encompasses several other linked neutral sites. Hosts are either resistant (*RES*) or susceptible (*res*) and parasites are either non-infective (*ninf*) or infective (*INF*). Thus, the model follows a gene-for-gene interaction with the following infection matrix:
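$$
\begin{array}{c|cc}
 & \mathit{ninf} & \mathit{INF} \\ \hline
\mathit{RES} & 0 & 1 \\
\mathit{res} & 1 & 1
\end{array} \tag{1}
$$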
where each entry gives the probability that a given host genotype is infected by a given parasite genotype. Hence, a 1-entry in the infection matrix indicates that the parasite always infects the host and a 0-entry indicates that the host is fully resistant towards the parasite.

We denote the frequency of resistant hosts (susceptible hosts) by *R* (*r*) and the frequency of infective parasites (non-infective parasites) by *a* (*A*). The coevolution model is based on the polycyclic auto-infection model in [24]. This population genetics model (*sensu* [40]) assumes host and parasite population sizes to be constant regardless of the disease prevalence and is based on non-overlapping host and parasite generations. As such it is probably most suited to describe plant-parasite or invertebrate-parasite systems.

Polycyclic diseases are characterized by more than one infection cycle per season. For simplicity, the model is based on *T* = 2 infection cycles per discrete host generation *g*, each caused by a single discrete parasite generation *t* (*t* ∈ {1, 2}). An auto-infection refers to an infection where a parasite re-infects the host individual on which it was produced. Therefore, resistant (*R*_{g}) and susceptible hosts (*r*_{g}) which are infected by infective parasites (*a*_{g,1}) in the first infection cycle (*t* = 1) stay infected by infective parasites in the second infection cycle (*t* = 2). This causes a fitness reduction *s*_{1} = *s*, where *s* is called the cost of infection and refers to the host fitness loss resulting from infection such as increased levels of mortality or reduced fertility. The same applies to susceptible hosts (*r*_{g}) infected by non-infective parasites (*A*_{g,1}) in the first infection cycle (*t* = 1). Resistant hosts which are attacked by non-infective parasites in the first infection cycle (*t* = 1) resist infection. In the second infection cycle (*t* = 2), this fraction of resistant hosts (*R*_{g} ⋅ *A*_{g,1}) either receives a non-infective parasite (*A*_{g,2}), resulting in no fitness loss, or an infective parasite (*a*_{g,2}), resulting in a reduced cost of infection *s*_{2} = *s*/2. Therefore, the cost of infection (*s*) is different from a classic selection coefficient as its effect depends on the infection matrix and the frequencies of the different parasite genotypes. The resistant (*RES*) allele in the host comes at cost *c*_{H} (cost of resistance) and the infectivity (*INF*) allele in the parasite comes at cost *c*_{P} (cost of infectivity).

The allele frequencies of resistant hosts (*R*_{g}), susceptible hosts (*r*_{g}), non-infective parasites (*A*_{g,t}) and infective parasites (*a*_{g,t}) are given by the following recursive equations:
(2a)
(2b)
(2c)
with *A*_{g,t} = 1 − *a*_{g,t} and *r*_{g} = 1 − *R*_{g}. The equilibrium frequencies [24] at the internal, non-trivial equilibrium point are approximately given by:
(3)
Note that in this model coevolution, which we define as the occurrence of coevolutionary cycles, only takes place if the cost of resistance (*c*_{H}) is smaller than the cost of infection (*s*). Otherwise the susceptible allele in the host always fixes immediately, irrespective of the initial frequencies of alleles.

In the forward part, we obtain the frequencies of the different alleles at the beginning of each discrete host generation *g* in three steps:

- Using the discrete-time gene-for-gene coevolution model from Eq (2), we compute the expected allele frequencies in the next generation (under the infinite population size assumption).
- Genetic drift is incorporated by performing a binomial sampling based on the frequency of the *RES*-allele (*INF*-allele) after selection (Eq (2)) and the finite and fixed haploid host population size *N*_{H} (parasite population size *N*_{P}) as in [19] (see also [41]).
- Recurrent allele mutations take place and change genotypes from *RES* to *res* at rate *μ*_{Rtor} or *res* to *RES* at rate *μ*_{rtoR} in the host and from *ninf* to *INF* at rate *μ*_{ntoI} and from *INF* to *ninf* at rate *μ*_{Iton} in the parasite. Henceforward, such mutations are referred to as functional mutations. In the remainder of this manuscript we set all functional mutation rates to *μ*_{Rtor} = *μ*_{ntoI} = *μ*_{rtoR} = *μ*_{Iton} = 10^{−5} (for a discussion on these values see [19, 41]).

Repeating this procedure for *g*_{max} host generations, we obtain the so called frequency path, which summarizes the allele frequencies at both loci forward in time.
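The three-step loop above (selection, drift, mutation) can be sketched in a few lines. This is a hedged illustration under stated assumptions, not the published implementation: the selection recursion of Eq (2) is not reproduced and is passed in as a user-supplied function `selection_step` (a hypothetical name), mutation is applied as its expected effect on the frequency, and all function names are invented for this example.

```python
import random


def binomial_draw(n, p, rng):
    """Number of successes in n Bernoulli(p) trials (stdlib only)."""
    return sum(rng.random() < p for _ in range(n))


def drift_and_mutation(freq_after_selection, pop_size, mu_loss, mu_gain, rng):
    """Binomial drift followed by recurrent mutation for a focal allele.

    mu_loss: rate at which the focal allele mutates to the other allele.
    mu_gain: rate at which the other allele mutates to the focal allele.
    Assumption: mutation is applied deterministically (expected change).
    """
    freq = binomial_draw(pop_size, freq_after_selection, rng) / pop_size
    return freq * (1.0 - mu_loss) + (1.0 - freq) * mu_gain


def frequency_path(selection_step, R0, a0, N_H, N_P, g_max, mu=1e-5, seed=1):
    """Frequencies of RES (R) and INF (a) at the start of each host generation."""
    rng = random.Random(seed)
    R, a = R0, a0
    path = [(R, a)]
    for _ in range(g_max):
        # Placeholder for the deterministic recursion of Eq (2).
        R_sel, a_sel = selection_step(R, a)
        R = drift_and_mutation(R_sel, N_H, mu, mu, rng)
        a = drift_and_mutation(a_sel, N_P, mu, mu, rng)
        path.append((R, a))
    return path


# Usage with a neutral stand-in for Eq (2) and deliberately small populations.
path = frequency_path(lambda R, a: (R, a), R0=0.2, a0=0.2, N_H=200, N_P=200, g_max=50)
```

Replacing the neutral stand-in with the model recursion would give a frequency path of the kind passed on to the coalescent step.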

#### Backward in time coalescent.

To obtain polymorphism data at the coevolutionary loci we combine the obtained frequency paths, which include genetic drift and recurrent mutations, with coalescent simulations separately for the host and the parasite. The host and parasite frequency paths are used separately as input for a modified version of msms [19, 42], after scaling time appropriately in units of the respective population sizes (for more information see S1 File). Based on the allele frequency in a species at present, a coalescent tree is built backward in time using msms. A sample of size *n*_{H} (*n*_{P}) is drawn at random from the host (parasite) population consisting of *RES* and *res*-alleles (*ninf* and *INF*-alleles) [19]. The tree shape and length are conditioned on the changes in allele frequencies, including fixation or loss [19]. To clarify the forward-backward correspondence, let us describe the case of recurrent selective sweeps in the parasite population. In a monomorphic parasite population of allele *INF*, a functional mutation with rate *μ*_{Iton} can reintroduce forward in time a mutant *ninf*. This allele reaches fixation and the population is then monomorphic for allele *ninf*. Backward in time, this is equivalent, in msms, to the decrease of the *ninf* allele population size until only one last individual exhibits this allele. This last *ninf* coalescent lineage then migrates to the population of allele *INF*. The forward frequency path and the backward msms simulations are thus coupled for the re-introduction of new alleles due to functional mutations in analogy to gene flow in a structured coalescent with two demes [43].

The forward in time coevolution model is run for *g*_{max} = max(3*N*_{H}, 3*N*_{P}) generations assuming a small initial frequency of *RES* (*R*_{0} = 0.2) and *INF* (*a*_{0} = 0.2) alleles. This length of simulation time was previously found to be sufficient to observe signatures of selective sweeps and balancing selection in host or parasite [19]. In msms, the backward simulations conditioned on the frequency paths are run for the same amount of time. If after *g*_{max} generations several coalescent lineages remain and/or the most recent common ancestor of both functional alleles has not been reached, a neutral Kingman coalescent process is built until a common ancestor of all remaining lineages is found. Note that in this last temporal phase of the simulation, *i.e*. older than *g*_{max} generations in the past, the functional alleles in hosts (*RES* and *res*) and in parasites (*INF* and *ninf*) have the same fitness (and are exchangeable within species). We therefore simulate a coevolution history of *g*_{max} generations.

We set the sample size to *n*_{H} = 50 for the host (*n*_{P} = 50 for the parasite), which is adequate to capture balancing selection if one of the alleles occurs in low frequency at the present time of sampling [19]. For both species we assume a coevolutionary locus of length 2500 bp without recombination and a per site neutral mutation rate of 10^{−7}. Accordingly, the neutral population mutation rate is *θ*_{H} = 2 ⋅ *N*_{H} ⋅ 2500 ⋅ 10^{−7} for the host (*θ*_{P} = 2 ⋅ *N*_{P} ⋅ 2500 ⋅ 10^{−7} for the parasite), defining the number of mutations found on the host and parasite coalescent trees (and in the polymorphism data).
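As a quick arithmetic check of this formula, the following sketch evaluates the population mutation rate for a haploid population. The function name `population_mutation_rate` is an assumption introduced for illustration; the defaults are the locus length and per-site mutation rate stated above.

```python
import math


def population_mutation_rate(pop_size, locus_length=2500, mu_per_site=1e-7):
    """theta = 2 * N * L * mu for a haploid population of size N."""
    return 2 * pop_size * locus_length * mu_per_site


# For N = 10,000 the locus-wide population mutation rate is approximately 5.
theta_H = population_mutation_rate(10_000)
assert math.isclose(theta_H, 5.0)
```

With *N*_{H} = *N*_{P} = 10,000 this reproduces *θ*_{H} = *θ*_{P} = 5, the value used for the pseudo-observed datasets.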

#### Calculating statistics for the SNP-data.

For each msms-output we calculate eight statistics for each species which are based on the site frequency spectrum (SFS) of the respective coevolving locus (Table 1). We only use statistics based on the unfolded SFS, as it can be hard to obtain unbiased haplotype statistics depending on the sequencing method. In addition to these 16 statistics we calculate the PMD (**P**airwise **M**anhattan **D**istance), which is based on comparing the host and parasite site frequency spectra (see S2 File).
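To make the notion of SFS-based statistics concrete, the sketch below computes an unfolded SFS from a 0/1 haplotype matrix and derives two standard estimators from it, Watterson's θ and the mean number of pairwise differences. These two are chosen purely as familiar examples; the sketch does not claim to reproduce the statistics of Table 1, nor the PMD, whose definition is given in S2 File. All function names are assumptions.

```python
import math


def unfolded_sfs(haplotypes):
    """Unfolded SFS from a list of equal-length 0/1 sequences (1 = derived).

    Returns a list sfs of length n where sfs[i] is the number of sites at
    which the derived allele is carried by i of the n sequences (index 0
    is unused and stays 0).
    """
    n = len(haplotypes)
    sfs = [0] * n
    for site in zip(*haplotypes):
        k = sum(site)
        if 0 < k < n:  # keep segregating sites only
            sfs[k] += 1
    return sfs


def watterson_theta(sfs):
    """Watterson's estimator: S divided by the harmonic number a_1."""
    n = len(sfs)
    a1 = sum(1.0 / i for i in range(1, n))
    return sum(sfs[1:]) / a1


def pairwise_diversity(sfs):
    """Mean number of pairwise differences computed from the SFS."""
    n = len(sfs)
    n_pairs = n * (n - 1) / 2
    return sum(i * (n - i) * sfs[i] for i in range(1, n)) / n_pairs


# Toy sample: 4 sequences, 3 segregating sites.
sample = [[1, 0, 1], [1, 0, 0], [0, 0, 0], [0, 1, 0]]
sfs = unfolded_sfs(sample)
theta_w = watterson_theta(sfs)
pi = pairwise_diversity(sfs)
```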

#### Additional coevolutionary models tested.

Additionally, we study two extensions, **B** and **C**, of the model from Eq (2) (Model **A**), in order to check for the generality of our results. In model **B**, we extend the described model to include more than two parasite generations (*T* > 2) per host generation *g* (see S1 File). In model **C**, we keep *T* = 2 but allow for allo-infection to take place at rate (1 − *ψ*) in the second parasite generation (*t* = 2) within host generation *g* (see S1 File). Based on the equations (S1 File), we generate forward in time simulations with genetic drift and functional mutations (as described above and in S1 File) and the expected coevolutionary signatures at the coevolving loci. We study how the values of the different statistics obtained under these two more realistic but more complex models differ from those of the main model from Eq (2).

### ABC inference

In the following section, we lay out the two scenarios to be investigated, the simulations for obtaining the PODs, and the prior distributions for the coevolutionary and neutral models. Finally, the ABC model choice and parameter estimation procedures are described.

#### Inference scenarios.

We focus on two scenarios. In scenario 1, we aim to infer the cost of infection (*s*), the host population size (*N*_{H}) and the parasite population size (*N*_{P}). Therefore, the cost of resistance (*c*_{H}) and the cost of infectivity (*c*_{P}) are assumed to be known. In scenario 2 the goal is to infer the cost of infection (*s*), the cost of resistance (*c*_{H}) and the cost of infectivity (*c*_{P}), assuming that the host (*N*_{H}) and parasite (*N*_{P}) population sizes are known.

#### Generating pseudo-observed data sets.

Each pseudo-observed dataset (POD) is composed of *r* = 30 (or *r* = 10 in SI figures) repetitions of the coevolutionary history under a particular combination of parameters (*s*, *c*_{P}, *c*_{H}) while fixing the haploid population sizes to *N*_{H} = *N*_{P} = 10,000 and the population mutation rates to *θ*_{H} = *θ*_{P} = 5.

For scenario 1, we simulate PODs for values of cost of infection (*s*) ranging from *s* = 0.15 to *s* = 0.85 (in steps of size 0.05) while fixing the cost of resistance to *c*_{H} = 0.05 and the cost of infectivity to *c*_{P} = 0.1. For each value of *s*, 30 independent PODs are simulated.

For scenario 2, we generate PODs for the 60 possible combinations of *c*_{H} ∈ {0.05, 0.1}, *c*_{P} ∈ {0.1, 0.3} and *s* from 0.15 to 0.85 (in steps of size 0.05). For each of these combinations, 15 PODs are generated.
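The parameter grids described for the two scenarios can be enumerated as follows. This is only a bookkeeping sketch; the variable names are assumptions.

```python
from itertools import product

# Cost of infection: 0.15, 0.20, ..., 0.85 (15 values).
s_values = [round(0.15 + 0.05 * i, 2) for i in range(15)]

# Scenario 1: c_H and c_P are fixed, only s varies.
scenario1_grid = [(0.05, 0.1, s) for s in s_values]

# Scenario 2: all combinations of c_H, c_P and s.
c_H_values = [0.05, 0.1]
c_P_values = [0.1, 0.3]
scenario2_grid = list(product(c_H_values, c_P_values, s_values))

n_pods_scenario1 = len(scenario1_grid) * 30  # 30 PODs per value of s
n_pods_scenario2 = len(scenario2_grid) * 15  # 15 PODs per combination
```

Every combination in both grids satisfies *s* > *c*_{H}, the condition stated above for coevolutionary cycles to occur.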

#### ABC sampling: Priors of the coevolutionary model.

For both scenarios, between 95,000 and 100,000 datasets are generated from the coevolutionary model based on the following priors (defined with the ABCsampler from ABCtoolbox, Version 1.0, [44]).

In scenario 1, defined with *c*_{H} = 0.05 and *c*_{P} = 0.1, the cost of infection is drawn from a uniform prior, and the host and parasite population sizes are drawn from log-uniform distributions (prior ranges: see Table 2). The population mutation rates are calculated as *θ*_{H} = 2*N*_{H} ⋅ 2500 ⋅ 10^{−7} and *θ*_{P} = 2*N*_{P} ⋅ 2500 ⋅ 10^{−7}.

Table 2. Settings for the ABC simulations under scenario 1.

In scenario 2, defined by *N*_{H} = *N*_{P} = 10,000 and *θ*_{H} = *θ*_{P} = 5, the cost of infection, the cost of resistance and the cost of infectivity are each drawn from uniform distributions (prior ranges: see Table 3).

Table 3. Settings for the ABC simulations under scenario 2.

#### ABC sampling: Priors of the neutral model.

As for the coevolution model, we obtain between 95,000 and 100,000 datasets for a corresponding neutral model for each scenario. These neutral simulations are generated by coalescent simulations with msms [42] for a non-recombining host and parasite locus with the same length (2500 bp) as in the coevolutionary model. To mimic data obtained from the same repeated evolutionary history, we generate *r* = 30 repetitions of the neutral coalescent process. For each replicate we calculate the same 17 statistics as under the coevolution model, which are defined in Table 1. The summary statistics used in the ABC consist of the average value of each statistic over the *r* replicates.

In scenario 1, the neutral simulations are based on the same priors for the host and parasite population sizes as in the corresponding coevolutionary model. Accordingly, both the host and the parasite population size are drawn from log-uniform distributions with the same ranges as under the coevolutionary model. The population mutation rates are calculated as *θ*_{H} = 2*N*_{H} ⋅ 2500 ⋅ 10^{−7} and *θ*_{P} = 2*N*_{P} ⋅ 2500 ⋅ 10^{−7}.

In scenario 2, we simulate neutral datasets for constant host and parasite population sizes (*N*_{H} = *N*_{P} = 10,000) and thus, the population mutation rates are *θ*_{H} = *θ*_{P} = 5.

#### ABC model choice.

The ABC model choice procedure is used to test whether a pair of coevolving loci can be discriminated from pairs of neutral loci based on our set of summary statistics and within the range of priors for our outlined scenarios. To find genes under coevolution, we wish to assess the False Positive Rate (FPR) and the False Negative Rate (FNR). These rates are derived from what is referred to as the confusion matrix in the ABC literature. Under the hypothesis that two genes (one from the host and one from the parasite) are coevolving, the FPR is the percentage of pairs of truly neutral loci which have a higher posterior probability in support of the coevolution model than of the neutral model. Thus, these loci would be incorrectly identified as coevolving although in fact they are not. Conversely, the FNR is defined as the percentage of truly coevolving pairs of loci which have a higher posterior probability in support of the neutral model than of the coevolution model. These loci would be considered as neutral although they are in fact coevolving. To assess the FPR and FNR, we first perform a leave-one-out cross-validation by running the function cv4postpr of the abc R package (version 2.1, [45]) for each of scenarios 1 and 2. The cross-validation is based on the rejection algorithm and works as follows. For a given scenario (1 or 2), a dataset, called the validation simulation, is chosen at random from all simulations which have been performed under one of the two models (coevolution or neutral) with their respective priors. The summary statistics of all simulations are standardised by their median absolute deviation. Based on these normalised summary statistics, the Euclidean distance between the summary statistics of the validation simulation and all other simulations from both models is calculated. The one percent of the simulations with the smallest Euclidean distance to the validation simulation are retained and all other simulations are rejected [45]. Based on these retained simulations, the posterior probability for each of the two models is calculated for this given validation simulation. This procedure is repeated for 500 validation simulations for each model within each scenario. The FPR and FNR are thus computed for each scenario.
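The rejection-based cross-validation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code of the abc package: the function names (`mad`, `model_posterior`, `cross_validate`), the labels `"coev"` and `"neutral"` and the toy Gaussian summary statistics are all hypothetical. The sketch takes the posterior probability of a model to be its share among the retained simulations.

```python
import math
import random
import statistics


def mad(values):
    """Median absolute deviation of a list of numbers."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def model_posterior(target, sims, labels, tol=0.01):
    """Rejection-based posterior model probabilities for one target dataset.

    target: summary statistics of the validation simulation
    sims:   summary statistics of all other simulations (list of lists)
    labels: model label of each simulation
    tol:    fraction of simulations retained (0.01 = closest 1%)
    """
    n_stats = len(target)
    # Standardise each statistic by its median absolute deviation.
    scales = [mad([s[j] for s in sims]) or 1.0 for j in range(n_stats)]
    dists = [
        math.sqrt(sum(((s[j] - target[j]) / scales[j]) ** 2
                      for j in range(n_stats)))
        for s in sims
    ]
    n_keep = max(1, int(tol * len(sims)))
    kept = sorted(range(len(sims)), key=dists.__getitem__)[:n_keep]
    # Posterior probability of a model = its share among retained simulations.
    return {m: sum(labels[i] == m for i in kept) / n_keep for m in set(labels)}


def cross_validate(sims, labels, n_val=50, tol=0.01, seed=1):
    """Leave-one-out cross-validation: misclassification rate per true model."""
    rng = random.Random(seed)
    rates = {}
    for model in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == model]
        wrong = 0
        for i in rng.sample(idx, n_val):
            rest = [k for k in range(len(sims)) if k != i]
            post = model_posterior(sims[i], [sims[k] for k in rest],
                                   [labels[k] for k in rest], tol)
            wrong += max(post, key=post.get) != model
        rates[model] = wrong / n_val
    return rates


# Toy data (assumption): two well-separated clouds of two summary statistics.
_rng = random.Random(0)
toy_sims = ([[_rng.gauss(0, 1), _rng.gauss(0, 1)] for _ in range(1000)]
            + [[_rng.gauss(5, 1), _rng.gauss(5, 1)] for _ in range(1000)])
toy_labels = ["neutral"] * 1000 + ["coev"] * 1000
rates = cross_validate(toy_sims, toy_labels, n_val=20)
```

In this sketch, `rates["neutral"]` plays the role of the FPR (truly neutral pairs assigned to the coevolution model) and `rates["coev"]` that of the FNR.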

After the cross-validation, we perform a model choice for each of the PODs to investigate the effect of specific coevolutionary parameters on the accuracy of model choice. For each scenario, we use the same settings and simulations for the coevolution model and the neutral model as for the cross-validation. For each POD, we retain the 1% best simulations and report the posterior probability for the coevolution model.

#### ABC parameter estimation.

The inference of the coevolution model parameters is obtained using the ABCestimator within the ABCtoolbox (Version 1.0, [44]). We retain the 1,000 simulations with the smallest Euclidean distance (without summary statistics normalisation) to the respective POD (rejection step). The standard ABCestimator applies a Gaussian kernel smoothing for each parameter (width of Dirac peak set to 0.01) followed by a post sampling adjustment via a general linear model [44, 46]. We report the median of the posterior marginal density distribution for each parameter. For each POD we perform the parameter estimation based on a) host and parasite summary statistics, b) host summary statistics only and c) parasite summary statistics only.
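The rejection step and the reported posterior median can be illustrated with a short Python sketch. It is a simplification under stated assumptions: it omits the Gaussian kernel smoothing and the general linear model adjustment applied by ABCestimator, and the function name `abc_rejection_median` and the toy simulator (one statistic equal to the parameter plus Gaussian noise) are hypothetical.

```python
import math
import random
import statistics


def abc_rejection_median(observed, sim_stats, sim_params, n_keep=1000):
    """Posterior median of each parameter from the n_keep closest simulations.

    Distances are raw Euclidean distances, i.e. without normalising
    the summary statistics.
    """
    dists = [math.dist(observed, s) for s in sim_stats]
    kept = sorted(range(len(sim_stats)), key=dists.__getitem__)[:n_keep]
    n_params = len(sim_params[0])
    return [statistics.median([sim_params[i][j] for i in kept])
            for j in range(n_params)]


# Toy example (assumption): one parameter s with a uniform prior on [0, 1]
# and one summary statistic equal to s plus Gaussian noise.
_rng = random.Random(3)
true_s = 0.3
prior_draws = [_rng.random() for _ in range(20000)]
sim_params = [[s] for s in prior_draws]
sim_stats = [[s + _rng.gauss(0, 0.05)] for s in prior_draws]
estimate = abc_rejection_median([true_s], sim_stats, sim_params, n_keep=200)
```

Restricting `observed` and `sim_stats` to the host-only or parasite-only statistics corresponds to variants b) and c) of the estimation.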

## Results

### Link between coevolutionary dynamics and sequence data

Previous work has dealt with understanding the coevolutionary dynamics under a gene-for-gene coevolution model [24] and the resulting genetic signatures [19]. We provide a short summary of these results here to help the reader to gain an intuition regarding the ABC results. A classic coevolutionary cycle in this Gene-For-Gene model consists of four phases (see S1 and S2 Figs, [24, 47]):

- The frequency of resistant (*RES*) hosts increases when infective (*INF*) parasites are in low frequency.
- As a response to the increasing frequency of *RES*-hosts, the frequency of *INF*-parasites increases very quickly and the parasite population almost reaches fixation for the *INF*-allele.
- This results in a decrease of the frequency of *RES*-hosts due to the cost of resistance.
- Once *RES*-hosts are in low frequency, the frequency of non-infective (*ninf*) parasites increases due to the cost of infectivity (*c*_{P}).

Depending on the combination of cost of infection (*s*), cost of resistance (*c*_{H}) and cost of infectivity (*c*_{P}), the model exhibits either trench-warfare dynamics or arms-race dynamics (for an explanation see S1 Fig). Trench-warfare dynamics mainly take place for small to intermediate costs of infection. The dynamics switch to arms-race for high costs of infection (S2 and S3 Figs), irrespective of *c*_{H} and *c*_{P}. When arms-race dynamics take place, the parasite population always exhibits fixation of the *INF*-allele within the chosen range of parameters. The speed of the subsequent fixation of the *res*-allele in the host depends on the cost of resistance and is faster for higher values of *c*_{H}.

The internal equilibrium frequencies under trench-warfare dynamics are affected as follows. The frequency of *RES*-hosts mainly increases with increasing cost of infectivity (*c*_{P}) (S3a+S3b Fig vs. S3c+S3d Fig), increases very slightly with increasing cost of infection (*s*) and remains almost unaffected by changing costs of resistance (*c*_{H}) (S3a+S3c Fig vs. S3b+S3d Fig). The opposite is true for the parasite. Here, the equilibrium frequency of the infective (*INF*) parasite rises mainly with increasing cost of infection (*s*) (S3 Fig). Higher costs of resistance (*c*_{H}) decrease the equilibrium frequency of *INF*-parasites (S3a+S3c Fig vs. S3b+S3d Fig) for a given value of *s*. In contrast to the host, the equilibrium frequencies in the parasite are almost unaffected by changes in the cost of infectivity (*c*_{P}). Whenever allele frequencies are close to the boundaries, alleles can be lost at random due to genetic drift (see S1 Fig).

The changes in equilibrium frequencies with changing cost of infection (*s*), cost of resistance (*c*_{H}) and cost of infectivity (*c*_{P}) are reflected by the resulting genetic signatures at the coevolving loci (S10 Fig). We summarize the genetic signatures of coevolution chiefly by Tajima’s D, as its behaviour under selective sweeps and balancing selection is well known (S10 and S4 Figs). Generally, the strongest signatures of balancing selection, indicated by high Tajima’s D values, can be observed when the equilibrium frequencies of *INF*-parasites or *RES*-hosts are close to 0.5 (see S3 and S10 Figs). The strength of the signatures declines the further the equilibrium frequencies move away from 0.5.
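To make the behaviour of Tajima’s D concrete, the following Python sketch computes it from the derived-allele counts at the segregating sites of a sample, using the standard Tajima (1989) constants. The function name `tajimas_d` and the input format are assumptions made for illustration and do not reflect the simulation pipeline used in this study.

```python
import math


def tajimas_d(derived_counts, n):
    """Tajima's D for a sample of n sequences.

    derived_counts: for each segregating site, the number of sampled
                    sequences carrying the derived allele (1 .. n-1).
    """
    S = len(derived_counts)
    if S == 0:
        return math.nan  # D is undefined without segregating sites

    # Nucleotide diversity pi: mean number of pairwise differences.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in derived_counts)

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # Difference between pi and Watterson's estimator, scaled by its
    # estimated standard deviation.
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


# Ten sites at intermediate frequency (as under balancing selection)
# versus ten singletons (as after a recent sweep), for n = 10 sequences.
d_balanced = tajimas_d([5] * 10, n=10)
d_sweep = tajimas_d([1] * 10, n=10)
```

An excess of intermediate-frequency variants gives a positive value, whereas an excess of rare variants gives a negative one, which matches the contrast between balancing selection and selective sweeps described above.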

The genetic signature in the parasite changes strongly with changing cost of infection (*s*), irrespective of *c*_{H} and *c*_{P} (S10 Fig). Further, the resulting genetic signatures in the parasites for a given cost of infection *s* are distinguishable for different costs of resistance but not for different costs of infectivity. The genetic signature in the host is mainly indicative of the cost of infectivity (*c*_{P}), a cost which affects parasite fitness, whereas the signature in the parasite is mainly informative about the costs of resistance (*c*_{H}) and infection (*s*), parameters with a direct fitness effect on the host (S10 Fig).

The qualitative changes of the genetic signatures for changing costs of infection remain similar even when population sizes differ in both interacting partners (S4 Fig). However, their strength is affected by the population sizes.

### Inference of coevolutionary dynamics from polymorphism data

#### Scenario 1: Model choice.

Under scenario 1 and *r* = 30, the model choice procedure is suited to distinguish a coevolutionary model in which the cost of infection (*s*), host population size (*N*_{H}) and parasite population size (*N*_{P}) are unknown from a neutral model where the host and parasite population sizes are unknown. The cross-validation reveals (values for *r* = 10 in brackets) that 482 (441) out of 500 coevolution simulations are correctly identified, while 18 (59) are misclassified as neutrally evolving pairs of loci, yielding an FNR of 3.6% (respectively 11.8%). In addition, 498 (495) neutral simulated pairs of loci are correctly identified as evolving neutrally, yielding an FPR of 0.4% (respectively 1% for *r* = 10) (see S5 Fig for *r* = 30, S6 Fig for *r* = 10). When analysing results for the PODs, the accuracy of model choice is very high for low costs of infection but becomes worse when *s* > 0.6 (Fig 1 for *r* = 30, S7 Fig for *r* = 10). It is apparent from Fig 2 that for Tajima’s D and PMD all PODs with intermediate to high *s* are in the cloud of neutral simulations (see S8 Fig for *r* = 10). For high values of *s*, dynamics are indeed switching to arms-race, generating fast recurrent selective sweeps. Hence, the values of these statistics become similar to neutral expectations under small host and parasite population sizes. Additional simulations show that for small costs of infection ranging from *s* = 0.01 to *s* = 0.09 the signatures are indistinguishable from neutral signatures as long as *s* < *c*_{H}, because the susceptible allele is always fixed immediately. As soon as *s* >= *c*_{H}, the signatures can be clearly distinguished from neutral signatures as coevolutionary dynamics are taking place (see S20 Fig for *r* = 30 and S21 Fig for *r* = 10).
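The error rates quoted above follow directly from the cross-validation counts. As a small worked check, using only the counts reported in the text (the helper name `error_rate` is hypothetical):

```python
def error_rate(n_misclassified, n_total=500):
    """Percentage of validation simulations assigned to the wrong model."""
    return 100.0 * n_misclassified / n_total


# Scenario 1, coevolution simulations misclassified as neutral (FNR).
fnr_r30 = error_rate(500 - 482)   # 18 of 500
fnr_r10 = error_rate(500 - 441)   # 59 of 500

# Scenario 1, neutral simulations misclassified as coevolving (FPR).
fpr_r30 = error_rate(500 - 498)   # 2 of 500
fpr_r10 = error_rate(500 - 495)   # 5 of 500
```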

Results shown for 30 repetitions and 30 PODs per value of the cost of infection (*s*). Results for single PODs are shown as dots. Model choice distinguishing a coevolution model with unknown costs of infection (*s*), host population size (*N*_{H}) and parasite population size (*N*_{P}) from a neutral model with unknown host and parasite population sizes. Note that for these points we added some jitter to the x-values in order to increase the readability of the plots.

Pairwise Manhattan distance (x-axis) and the difference between Tajima’s D of the host and of the parasite (y-axis) for the PODs used for inference in Scenario 1 and the 100,000 neutral simulations run for this scenario. Under the neutral model, host and parasite population sizes vary. Simulations under the neutral model are shown as grey open circles, and a bivariate normal kernel estimation has been applied to obtain a probability density of the summary statistic combinations. The PODs for scenario 1 are shown as diamonds and are colour-coded based on the true cost of infection (*s*).

#### Scenario 1: Parameter estimation.

Our results indicate that it is possible to jointly infer the cost of infection (*s*), the host population size (*N*_{H}) and the parasite population size (*N*_{P}) using polymorphism data from the host and parasite (Fig 3 for *r* = 30, S9 Fig for *r* = 10). Generally, the accuracy of inference mainly depends on 1) the true value of the cost of infection and 2) the type of polymorphism data being used (host and parasite together, only host or only parasite).

Median of the posterior distribution (y-axis) for the cost of infection *s* (top, a-c), host population size (*N*_{H}) (middle, d-f) and parasite population size (*N*_{P}) (bottom, g-i) when inference is based on host and parasite summary statistics (left), only host summary statistics (middle) or only parasite summary statistics (right). The median of the posterior distribution (after post-rejection adjustment) is plotted for each POD. The true cost of infection for each POD is shown on the x-axis with jitter added to increase the readability. The *R*^{2}-value of a corresponding linear regression model is shown in each panel.

Inferences of the cost of infection and of the population sizes are the most accurate if both host and parasite polymorphism data are used (Fig 3, S9 Fig). Using only parasite polymorphism data is also quite accurate for inferring small to intermediate values of the cost of infection (*s* < 0.6) (Fig 3c, S9c Fig) where trench-warfare dynamics take place. In contrast, using only host polymorphism data shows markedly less accuracy in the same parameter range (Fig 3b, S9b Fig). Overall the inference results become less accurate when decreasing the number of repetitions to *r* = 10 (S9 Fig).

#### Scenario 2: Model choice.

Under scenario 2 and *r* = 30, model choice is suited to discriminate between coevolution and neutral evolution. Out of the 500 coevolution validation simulations, 470 (417 for *r* = 10) are correctly classified as coevolving pairs of loci whereas 30 (83) are classified as neutrally evolving pairs, yielding an FNR of 6% (respectively 16.6% for *r* = 10). In addition, 489 (495 for *r* = 10) neutral simulated pairs of loci are correctly identified as evolving neutrally, yielding an FPR of 2.2% (respectively 1% for *r* = 10) (see S11 Fig for *r* = 30, S12 Fig for *r* = 10). When analysing results for the PODs, the accuracy of model choice is very high under a higher cost of infectivity (*c*_{P} = 0.3). For a lower value of *c*_{P} = 0.1, the model choice becomes less accurate when *s* increases, especially when *c*_{H} = 0.1 (Fig 4 for *r* = 30, S13 Fig for *r* = 10). As for scenario 1, it is also apparent from Fig 5 that some PODs are found within the cloud of neutral simulations (see S14 Fig for *r* = 10). For high values of *s*, dynamics are indeed switching to arms-race, generating fast recurrent selective sweeps. Note, however, the interesting case of *c*_{H} = *c*_{P} = 0.1, which displays the worst accuracy for high values of *s*. This is explained by very fast recurrent selective sweeps along with very fast coevolutionary cycles due to the combination of high cost of resistance and low cost of infectivity. Note that such fast cycles affect other statistics (in particular the nucleotide diversity) more strongly than the three we present in Fig 5 (Tajima’s D of host and parasite, and PMD), thus highlighting the need to include a larger number of summary statistics in the ABC procedure. As for scenario 1, additional simulations show that when the cost of infection is chosen to be smaller than the cost of resistance, signatures are not distinct from those of neutral loci, as the susceptible allele fixes immediately and thus the *INF*-allele in the parasite always bears a fitness disadvantage. Once the cost of infection is larger than the cost of resistance, trench-warfare dynamics are taking place, and therefore pairs of candidate loci are distinguishable from pairs of neutral loci (see S22 Fig for *r* = 30 and S23 Fig for *r* = 10).

Results are shown for *r* = 30 and 15 PODs per boxplot. The posterior density in support of the coevolution model (y-axis) is shown for PODs with varying cost of infection (*s*). The different panels reflect the combination of *c*_{H} and *c*_{P} for the respective PODs (left: *c*_{H} = 0.05, right: *c*_{H} = 0.1, top: *c*_{P} = 0.1, bottom: *c*_{P} = 0.3). Model choice has been run to distinguish a coevolution model with unknown cost of infection (*s*), cost of resistance (*c*_{H}) and cost of infectivity (*c*_{P}) from a neutral model with constant host and parasite population size (*N*_{H} = *N*_{P} = 10,000). Results for single PODs are shown as dots, with jitter added to the x-values to increase the readability.

Pairwise Manhattan distance (x-axis) and the difference between Tajima’s D of the host and of the parasite (y-axis) for the PODs used for inference in Scenario 2 and 100,000 neutral simulations. Simulations under the neutral model are shown as grey open circles. A bivariate normal kernel estimation has been applied to obtain a probability density of the different summary statistic combinations. The PODs for scenario 2 are shown in color. Colors reflect the true cost of infection (*s*) for a particular POD (see legend) and shapes indicate the combination of *c*_{H} and *c*_{P} (diamonds: *c*_{H} = 0.05, *c*_{P} = 0.1; circles: *c*_{H} = 0.05, *c*_{P} = 0.3; crosses: *c*_{H} = 0.1, *c*_{P} = 0.1; stars: *c*_{H} = 0.1, *c*_{P} = 0.3) for the respective POD.

#### Scenario 2: Parameter estimation.

As for scenario 1, the accuracy of inference for scenario 2 is best if data from both the host and the parasite are available (Figs 6 and 7 for *r* = 30, S15 and S16 Figs for *r* = 10). However, inference of the cost of infection *s* is less accurate compared to scenario 1. The most accurate inference results are obtained for intermediate costs of infection. This is due to the fact that signatures in the host and the parasite are differentially affected by the various costs (S10 Fig).

Median of the posterior distribution (y-axis) for the cost of infection *s* (top, a-c), cost of resistance (*c*_{H}) (middle, d-f) and cost of infectivity (*c*_{P}) (bottom, g-i) when inference is based on host and parasite summary statistics (left), only host summary statistics (middle) or only parasite summary statistics (right). The median of the posterior distribution (after post-rejection adjustment) is plotted for each POD. The true cost of infection for each POD is shown on the x-axis with jitter added to increase the readability. The *R*^{2}-value of a corresponding linear regression model is shown in each panel.

Median of the posterior distribution (y-axis) for the cost of infection *s* (top, a-c), cost of resistance (*c*_{H}) (middle, d-f) and cost of infectivity (*c*_{P}) (bottom, g-i) when inference is based on host and parasite summary statistics (left), only host summary statistics (middle) or only parasite summary statistics (right). The median of the posterior distribution (after post-rejection adjustment) is plotted for each POD. The true cost of infection for each POD is shown on the x-axis with jitter added to increase the readability. The *R*^{2}-value of a corresponding linear regression model is shown in each panel.

Inference of the cost of resistance (*c*_{H}) works reasonably well if polymorphism data only from the parasite are available. However, this comes at the cost of less accurate inference of the cost of infection (*s*), as both parameters affect the equilibrium frequency in the parasite (S3 Fig, Figs 6 and 7 for *r* = 30, S15 and S16 Figs for *r* = 10). This effect is especially pronounced when the cost of infection (*s*) is low and only the parasite polymorphism data are available (see Fig 6c+6f). The inference of the cost of infectivity (*c*_{P}) is reasonably accurate if polymorphism data only from the host are available (Figs 6h and 7q). This is due to the fact that the cost of infectivity (*c*_{P}) mainly affects the equilibrium frequency in the host but not in the parasite (S3 Fig). Therefore, inference of this parameter does not work if only parasite polymorphism data are available (Figs 6i and 7r).

## Discussion

In the present study, we make explicit the link between coevolutionary dynamics (S3 Fig), the resulting genetic signatures (S4 and S10 Figs) and the subsequent amount of information which can be extracted from genetic signatures at the coevolving loci (Figs 3, 6 and 7). Our results indicate that under trench-warfare dynamics the allele frequencies at the non-trivial internal equilibrium point affect the strength of genetic signatures at the coevolving loci in both the host and the parasite. Furthermore, pairs of coevolving loci are well discriminated from pairs of neutral loci by ABC model choice (Figs 1 and 4, S7 and S13 Figs). However, the accuracy decreases for higher costs of infection. We further show as a proof of principle that it is possible to infer the parameters underlying the coevolutionary interaction from polymorphism data at the loci under coevolution if some relevant parameters such as diverse costs (Fig 3) or population sizes (Figs 6 and 7) are known. The inference is accurate if polymorphism data from both the host and the parasite are available from at least ten repetitions of the coevolutionary history (S9, S15 and S16 Figs).

### Coevolutionary dynamics and inference

As already shown in [19] there is a continuum of genetic signatures which can arise at the loci under coevolution. This contrasts to the often postulated dichotomy that arms-race dynamics result in strong selective sweep signatures and trench-warfare dynamics in strong balancing selection signatures.

In general, the strength of the selective signatures under trench-warfare dynamics is a result of the internal equilibrium frequencies, the fluctuations around these equilibrium frequencies, the amount of genetic drift in both partners and the proximity of these equilibrium frequencies to the fixation boundaries. When equilibrium frequencies are close to the boundaries, alleles can easily be lost by drift and thus arms-race dynamics take place although trench-warfare dynamics would be predicted based on the deterministic model (see S1 Fig). Moreover, if the costs are very small (for example, *c*_{H} = *c*_{P} = 0.01), the equilibrium frequencies may also be close to the boundaries, and coevolutionary dynamics are strongly affected by genetic drift. This generates a discrepancy between the expected deterministic behaviour and the observed behaviour under finite population size [19, 41]. In such cases, the polymorphism signature at the coevolutionary loci would appear as neutral and such loci would inflate the rate of false negatives. Like most studies of coevolution, we thus rely on the costs of resistance and infectivity not being too small (*e.g*. [6, 30, 32, 33, 38, 47]).

The strong link between equilibrium frequencies under trench-warfare dynamics and resulting genetic signatures can be explained in terms of the underlying approximated structured coalescent tree with two alleles in each species (*RES* and *res* for the host and *INF* and *ninf* for the parasite). This model is analogous to a two-demes model with gene flow [43]. When the frequencies of both alleles are fairly similar they have equal contributions to the sample, and the underlying coalescent tree is well balanced. Accordingly, we observe an excess of intermediate frequency variants in the SFS [11]. As the equilibrium frequencies move away from 0.5, the average sample configuration changes and the coalescent tree becomes less balanced (see S2 Fig). Therefore, the number of SNPs at intermediate frequencies drops and Tajima’s D decreases (S10 Fig). This link can be also observed when we modify our model to more realistic but complex models by either a) extending the model to more than two parasite generations per host generation (**Model B**, S17 and S19 Figs a+b) or b) allowing for allo-infections at rate 1 − *ψ* in the second parasite generation within host generation *g* (**Model C**, S18 and S19 Figs c+d).

There are three sources of stochasticity affecting the polymorphism data at the coevolutionary loci: 1) the effect of genetic drift on the allele frequency trajectory under coevolution (S1 Fig), 2) the stochasticity in the coalescent process for a given allele frequency trajectory and 3) the stochasticity in the neutral mutation process on top of the coalescent process. As the first type of stochasticity affects the ‘population’ sizes of the functional alleles in the host (in the parasite) over time, it also has a subsequent effect on the other two sources of stochasticity. Using data from several repetitions allows us to better handle and to average out the effect of genetic drift on the variability of the allele frequency path and its subsequent effect on the observed summary statistics. Therefore, we use the average of the summary statistics over several repetitions of the same coevolutionary history (*i.e*. *r* frequency paths) in our ABC. In future, such repeated data could potentially be obtained, for example, from microcosm experiments or repeated experiments such as those performed by [48–51]. However, we acknowledge that for species with long generation times it might be extremely hard to perform the corresponding experiments. Nevertheless, the fact that we rely on data from repeated experiments also provides the useful insight that signatures under coevolution can be quite variable even if the parameters underlying the coevolutionary history are the same. This underlines the need for taking the effect of genetic drift into account when analysing host and parasite polymorphism data, be it to detect loci under coevolution or to infer parameters of the past coevolutionary history.
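The benefit of averaging a summary statistic over *r* repetitions can be illustrated with a toy Python sketch. It assumes, purely for illustration, that the statistic of each repetition is an independent Gaussian draw around a common expectation; this is a stand-in for the three sources of stochasticity listed above, not the coevolution model itself, and the function name `spread_of_mean` is hypothetical.

```python
import random
import statistics


def spread_of_mean(r, n_trials=2000, sd=1.0, seed=42):
    """Standard deviation of the mean of r noisy repetitions.

    Each trial draws r independent values of a summary statistic
    (Gaussian noise around 0, an assumption) and averages them.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.gauss(0.0, sd) for _ in range(r)])
        for _ in range(n_trials)
    ]
    return statistics.stdev(means)


spread_r10 = spread_of_mean(10)
spread_r30 = spread_of_mean(30)
```

Under this independence assumption, the spread of the averaged statistic shrinks roughly in proportion to the inverse square root of *r*, which is consistent with inference being less accurate for *r* = 10 than for *r* = 30.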

Stemming from theoretical considerations [52], we envision two further possibilities to deal with the variability in allele frequency trajectories if data from repeated experiments cannot be obtained: a) using data from several populations or b) using data from several time points. As whole-genome sequencing data are becoming less expensive, the information contained in whole-genome data from several host and parasite populations could be used to establish the demographic history, such as migration rates and population sizes, before inferring the coevolutionary parameters. Taking the demographic history into account, the coevolutionary parameters in each population could be inferred in a second step following our approach. We follow here the expectation from [52] that several populations give insights into the coevolutionary dynamics, as these are found to be at different points of the coevolutionary cycle. However, this approach would rely on migration rates not being too high and on the environmental conditions and host-parasite interactions being similar across the different populations. If the migration rates are too high, the independence between the different populations would be violated and the allele frequencies would become synchronized between the different populations [52]. In a similar spirit, whole-genome data obtained at different time points in a population [49, 50] would allow for estimating the amount of genetic drift between time points (see [53]) and thereafter the coevolutionary parameters using our method with known prior on population sizes.

### Accuracy of inference

We first perform a model choice procedure for each scenario to assess the possibility to distinguish pairs of loci which are coevolving from pairs evolving independently from one another (in our case neutrally in each species). Therefore, we envision that the gene dataset can be divided into two categories of genes in hosts and parasites *a priori*: pairs of candidate loci possibly under coevolution, and pairs of other randomly selected genes. For example, the candidates can be resistance genes in the host plant [4–6, 12–16] and the corresponding predicted effectors in the parasite [17, 18]. The second category can be composed of genes involved in processes such as housekeeping, abiotic stress responses or photosynthesis in plants, and housekeeping genes and/or degrading enzymes with non-specific activities in parasites.

It is encouraging that our results show very good accuracy and low false positive rates. Interestingly, the model choice accuracy is very good for low values of the infection parameter *s*, and thus we are more likely to identify pairs of loci which are coevolving under trench-warfare than under arms-race dynamics. We thus show that, in contrast to the somewhat pessimistic view in [19] based on few statistics, extending the number of summary statistics does help to distinguish neutral from coevolving loci.

Regarding parameter inference, we show that estimation of the parameters governing the coevolutionary dynamics is possible if they substantially shift the equilibrium frequencies and/or the dynamics and thus the resulting genetic signatures. However, equilibrium frequencies can be shifted along the same axis by different parameter combinations. In such circumstances, it is only possible to infer a compound parameter if there is no *a priori* information on any of the parameters available. This identifiability problem is illustrated by the inference results for scenario 2, especially when only parasite polymorphism data are available (Figs 6 and 7). Here, both the cost of infection (*s*) and the cost of resistance (*c*_{H}) are overestimated. If, however, some parameter values are known *a priori* from experiments, such as the cost of resistance in scenario 1, the other parameters (here the cost of infection) can be inferred conditional on this information. Whenever the parameters of interest have different effects on the equilibrium frequencies in the host and parasite, inference of both parameters is possible. This explains why inferences are usually the most accurate when host and parasite statistics are jointly used.

Our approach of jointly using host and parasite information is in line with recent method developments [54–56] which also show the value of analysing hosts and parasites in a joint framework. These methods can also be promising complementary approaches to our ABC in order to identify and to reduce the number of candidate loci.

### Additional demographic changes

An important assumption of our model is the absence of intra-locus recombination at the coevolutionary loci. Nevertheless, recombination does occur along the genomes of the host and the parasite, so that the coevolutionary loci evolve independently from other unlinked loci (for example on different chromosomes).

In such circumstances, it is possible to estimate past population size fluctuations based on whole-genome data of both species. Population size changes in host-parasite coevolution can be either independent of the coevolutionary interaction or arise as an immediate result of the coevolutionary interaction, e.g. from epidemiological feedback or any other form of eco-evolutionary feedback. Independently of the particular source, demographic changes always affect all loci in the genome simultaneously. The genomic resolution of the latter type of population size changes has been shown to depend on the amplitude and time-scales of the population size fluctuations [53]. These authors have demonstrated that population size fluctuations only leave a signature in the genome-wide parasite site frequency spectrum if they happen at a slow enough time scale. Irrespective of whether the demographic changes can be resolved from genome-wide data, the resulting genetic signatures at the coevolving loci will always be the result of the underlying allele frequency path, which is always confined to a 2d-plane for a bi-allelic locus. Further studies should therefore focus on the specific effect of eco-evolutionary feedback on the variability of the allele frequency path and the resulting effect of the population size changes on mutation supply at the coevolving loci. Doing so will help to refine our understanding of how much information can likely be inferred under such circumstances.

### Scope, implications and applications of the presented approach

Based on the genetic signatures found for our two model extensions (S19 Fig) [24], we suggest that our findings are generally valid and are not restricted to the coevolution GFG-model used in the main text. We acknowledge that we assume the simplest type of coevolutionary interaction possible and strongly rely on data from repeated experiments. However, understanding possible links between dynamics, signatures and resulting accuracy of inference for this simple scenario is a useful starting point to develop further inference methods where several major loci [7] or quantitative traits [39] are involved. We further hope that our results stimulate further thinking on how genomic data from the host and parasite, in combination with other sources of data such as phenotypic data, data from several populations or several time points, could be used to gain a refined understanding of host-parasite coevolution. In addition, our approach should be applicable to several pairs of host and parasite coevolving loci as long as the coevolutionary dynamics are driven by few major loci without any epistatic and/or pleiotropic effect. These pairs could for example involve resistance genes from a single host species, each co-evolving independently with effectors from different parasite species (bacteria, fungi, …). If quantitative traits [7, 39] are involved in coevolution, we expect the signatures to be weaker than in our model (see theory on polygenic selection and polymorphism signatures, [57]).

For many host-parasite models (including the one used here) it has been shown that the equilibrium frequencies in the host are substantially or exclusively affected by fitness penalties applying to the parasite and vice-versa [24, 33, 34]. Thus, generally speaking, the strength of genetic signatures in either species is presumably most indicative of processes affecting the coevolving partner. We therefore speculate that the balancing selection signatures which have been found at R-genes in *Arabidopsis thaliana* [6, 12, 13], *Solanum sp*. [4, 5, 14], *Phaseolus vulgaris* [58] and *Capsella* [59] are indicative of the selective pressure in the coevolving parasite or parasite community. Conversely, the long term maintenance of strains in *Pseudomonas syringae* [60] could reflect fitness costs in *Arabidopsis thaliana*.

A complication for analysis can be the lack of recombination in genomes of microparasites such as viruses or bacteria. Phylogenetic methods exist to study the evolution of these parasites with very short generation times, and can help to define groups of individuals or populations which could be used in inference methods such as ours or in co-GWAS [54–56, 61]. Note also that several methods have been developed to draw inference of the epidemiological parameters based on parasite sequence data (*e.g*. [62]). However, such methods study only short term epidemiological dynamics within few years, ignoring the effect of coevolution and Genotype (host) x Genotype (parasite) interactions. By contrast, our method intends to infer the parameters of long term coevolutionary dynamics driven by GxG interactions.

We finally point out a potential source of bias in coevolution studies, namely the possibility that scenarios other than coevolution, such as unilateral or independent adaptive evolution of hosts and/or parasites, can result in correlations of traits or allele frequencies which resemble those of true coevolution [63, 64]. At present, our approach cannot control for independent evolution at host and parasite loci generating polymorphism signatures that mimic those of true coevolution. Such pairs of loci would indeed inflate our false positive rate. We speculate here that, based on polymorphism data from a single experimental data point alone, it is likely not possible to disentangle such scenarios from true coevolution. Nevertheless, using repeated experiments with and without the coevolution treatment [50, 51] along with molecular, functional and bioinformatics studies may help to resolve this issue in some host-parasite systems.

### Conclusion

We investigated here the link between coevolutionary dynamics and the resulting genetic signatures, and quantified the amount of information available in polymorphism data from the coevolving loci. Although we started from a very simple coevolutionary interaction, we show that model-based inference is possible if data from repeated experiments are available. With the growing availability of highly resolved genome data, even of non-model species, it is important to gain a differentiated and deep understanding of the continuum of possible links between coevolutionary dynamics, with or without eco-evolutionary feedbacks, and their effect on polymorphism data. Such a thorough understanding is the basis for devising appropriate sampling schemes, for using optimal combinations of diverse sources of information and for developing refined model-based inference methods. Our results and the suitability of the ABC approach open the door to further developing inference of past coevolutionary history based on genome-wide data of hosts and parasites from natural populations or controlled experiments. Lastly, as the false positive rate to detect genes under coevolution is smaller than 2.5% (*r* = 30) under the model choice procedure, our method can be used in combination with other methods such as co-GWAS or correlation of host and parasite allele frequencies among several populations as a starting point to identify host and parasite candidate loci for further functional studies.

## Supporting information

### S1 Fig. Schematic of the forces driving coevolutionary cycles, unstable vs. stable equilibrium points and the interaction of coevolution with genetic drift.

Schematic illustrating the evolutionary forces and interactions driving the coevolutionary cycles on top. The different coevolutionary dynamics in infinite population in interaction with genetic drift are shown on the bottom. The grey line always shows the expected dynamics in infinite population size over time. The frequency of the resistant allele in the host is always shown on the x-axis, the frequency of the infectivity allele in the parasite on the y-axis. The triangles are the unstable/stable internal equilibrium points of the model. The effect of genetic drift on the allele frequency path due to finite population size is always shown in color. Bottom left: Model with arms-race dynamics in infinite and finite population size. Bottom middle: Model with trench-warfare dynamics in infinite population size and arms-race dynamics in finite population size. Bottom right: Model with trench-warfare dynamics in infinite and finite population size.

https://doi.org/10.1371/journal.pcbi.1007668.s001

(TIF)

### S2 Fig. Coevolution dynamics in infinite population size, finite population size and site frequency spectra for Model A.

Influence of the cost of infection (*s*) on the coevolutionary dynamics and genetic signatures in **Model A**. The subfigures show the allele frequency trajectory in infinite population size (a-f, A-F), one exemplary allele frequency path in finite population size which takes genetic drift and functional mutations into account (d-f, D-F), the average unfolded host site frequency spectrum of *r* = 200 repetitions (I-VI) and the average unfolded parasite site frequency spectrum of *r* = 200 repetitions (VII-XII). In subfigures a-l each dot represents the frequency of resistant (*RES*) hosts (x-axis) and infective (*INF*) parasites (y-axis) at the beginning of a single host generation *g*. The same information is displayed in a slightly different way in subfigures A-L. Here, the frequencies of resistant (*RES*) hosts (light grey) and infective (*INF*) parasites (dark grey) (y-axis) are plotted over time (x-axis). Costs are fixed to *c*_{H} = 0.05, *c*_{P} = 0.1. The results in finite population size are plotted for *N*_{H} = *N*_{P} = 10,000 and *μ*_{Rtor} = *μ*_{ntoI} = *μ*_{rtoR} = *μ*_{Iton} = 10^{−5}. The site frequency spectra are shown for *θ*_{P} = *θ*_{H} = 5 and *n*_{H} = *n*_{P} = 50.

https://doi.org/10.1371/journal.pcbi.1007668.s002

(TIF)

### S3 Fig. Deterministic equilibrium frequencies Model A.

Deterministic equilibrium frequencies for **Model A** for different combinations of cost of resistance *c*_{H} = (0.05, 0.1) (columns), cost of infectivity *c*_{P} = (0.1, 0.3) (rows) and cost of infection *s* = (0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8) (color of the squares). Only parameter combinations with trench-warfare dynamics are shown. Centres of the dots represent the stable equilibrium frequencies obtained by numerically simulating the recursion equations Eq (2) for 30,000 generations starting with an initial frequency of *R*_{0} = 0.2 resistant hosts and *a*_{0} = 0.2 infective parasites. Heads of the arrows represent the equilibrium frequencies based on Eq (3), which differ slightly from the numerical computations due to analytical approximations.

https://doi.org/10.1371/journal.pcbi.1007668.s003

(TIF)

### S4 Fig. Tajima’s D Model A varying popsizes.

Tajima’s D (y-axis) for **Model A** for various costs of infection *s* (x-axis) and different combinations of *N*_{P} (*N*_{P} = 5,000 top, *N*_{P} = 10,000 middle, *N*_{P} = 15,000 bottom) and *N*_{H} (*N*_{H} = 5,000 left, *N*_{H} = 10,000 middle, *N*_{H} = 15,000 right). The mean and standard error of Tajima’s D of the parasite population (dark grey) and of the host population (light grey) are plotted for *r* = 200 repetitions. Note that subfigure *e* corresponds to S9a Fig. The other parameters are fixed to: *c*_{H} = 0.05, *c*_{P} = 0.1, *θ*_{H} = *N*_{H}/2000, *θ*_{P} = *N*_{P}/2000, *n*_{H} = *n*_{P} = 50, *μ*_{Rtor} = *μ*_{rtoR} = *μ*_{ntoI} = *μ*_{Iton} = 10^{−5}.

https://doi.org/10.1371/journal.pcbi.1007668.s004

(TIF)

### S5 Fig. Cross-validation model choice scenario 1 for r = 30 repetitions.

Leave-one-out cross-validation result for distinguishing the coevolution model with unknown cost of infection (*s*), host population size (*N*_{H}) and parasite population size (*N*_{P}) from a neutral model with unknown host and parasite population sizes. Cross-validation results are shown for *r* = 30 and are based on 500 randomly chosen ABC-simulations for each model.

https://doi.org/10.1371/journal.pcbi.1007668.s005

(TIF)

### S6 Fig. Cross-validation model choice scenario 1 for r = 10 repetitions.

Leave-one-out cross-validation result for distinguishing the coevolution model with unknown cost of infection (*s*), host population size (*N*_{H}) and parasite population size (*N*_{P}) from a neutral model with unknown host and parasite population sizes. Cross-validation results are shown for *r* = 10 and are based on 500 randomly chosen ABC-simulations for each model.

https://doi.org/10.1371/journal.pcbi.1007668.s006

(TIF)

### S7 Fig. Model choice results for PODs from scenario 1 for r = 10 repetitions.

Model choice results for scenario 1 for *r* = 10. Model choice has been run to distinguish a coevolution model with unknown cost of infection (*s*), host population size (*N*_{H}) and parasite population size (*N*_{P}) from a neutral model with unknown host and parasite population sizes. Model choice is shown for *r* = 10 repetitions and is based on the 1% of simulations having the closest summary statistics to those of the PODs. The posterior probability in support of the coevolution model (y-axis) is shown for PODs with different costs of infection (*s*) (30 PODs for each *s*). Results for single PODs are shown as dots. Note that for these points we added some jitter to the x-values in order to increase the readability of the plots.

https://doi.org/10.1371/journal.pcbi.1007668.s007

(TIF)
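The rejection step behind such a posterior model probability can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes a plain rejection scheme in which the posterior probability of each model is estimated as its share among the retained fraction of simulations closest to the observed summary statistics. The function name `rejection_model_choice`, the standardised Euclidean distance and the argument layout are assumptions made for illustration only.

```python
import numpy as np


def rejection_model_choice(obs, stats, labels, tol=0.01):
    """Estimate posterior model probabilities by simple ABC rejection.

    obs    : 1-D sequence of observed summary statistics (e.g. from one POD)
    stats  : 2-D array, one row of summary statistics per simulation
    labels : model name of each simulation (same length as stats)
    tol    : fraction of simulations to retain (0.01 keeps the closest 1%)

    Assumption: distances are Euclidean after scaling each statistic by its
    standard deviation across all simulations.
    """
    obs = np.asarray(obs, dtype=float)
    stats = np.asarray(stats, dtype=float)
    labels = np.asarray(labels)

    # Put all summary statistics on a comparable scale.
    scale = stats.std(axis=0)
    scale[scale == 0] = 1.0

    # Distance of every simulation to the observed data.
    dist = np.sqrt((((stats - obs) / scale) ** 2).sum(axis=1))

    # Retain the closest simulations and count how many come from each model.
    n_keep = max(1, int(round(tol * len(stats))))
    kept = labels[np.argsort(dist)[:n_keep]]
    return {str(m): float(np.mean(kept == m)) for m in np.unique(labels)}
```

Calling the function with the summary statistics of one POD, the pooled simulations of both models and `tol=0.01` returns a dictionary with one estimated posterior probability per model. Regression-based adjustments, which dedicated ABC software typically offers, are deliberately left out of this sketch.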

### S8 Fig. Pairwise Manhattan distance and Δ Tajima’s D (host-parasite) for the PODs under scenario 1 compared to simulations under a neutral model for r = 10.

Pairwise Manhattan distance (x-axis) and the difference between Tajima’s D of the host and of the parasite (y-axis) for the PODs used for inference in Scenario 1 and the 100,000 neutral simulations run for this scenario. Under the neutral model, host and parasite population sizes vary. Simulations under the neutral model are shown as grey open circles, and a bivariate normal kernel estimation has been applied to obtain a probability density of the summary statistic combinations. The PODs for scenario 1 are shown as diamonds and are colour-coded based on the true cost of infection (*s*).

https://doi.org/10.1371/journal.pcbi.1007668.s008

(TIF)

### S9 Fig. Inference results Scenario 1 for r = 10.

Median of the posterior distribution (y-axis) for the cost of infection *s* (top, a-c), host population size (*N*_{H}) (middle, d-f) and parasite population size (*N*_{P}) (bottom, g-i) when inference is based on host and parasite summary statistics (left), only host summary statistics (middle) or only parasite summary statistics (right) for scenario 1. The median of the posterior distribution (after post-rejection adjustment) is plotted for each POD in scenario 1. The true cost of infection for each POD is shown on the x-axis with jitter added to increase the readability.

https://doi.org/10.1371/journal.pcbi.1007668.s009

(TIF)

### S10 Fig. Tajima’s D Model A for different costs of infection, resistance and infectivity.

Tajima’s D (y-axis) for **Model A** for various costs of infection *s* (x-axis). The results are shown for different combinations of *c*_{P} (*c*_{P} = 0.1 top, *c*_{P} = 0.3 bottom) and *c*_{H} (*c*_{H} = 0.05 left, *c*_{H} = 0.1 right). The mean and standard error of Tajima’s D of the parasite population (dark grey) and of the host population (light grey) are plotted for *r* = 200 repetitions. The dashed-dotted line shows the expected value of Tajima’s D in a Wright-Fisher population with constant population size. Tajima’s D ≪ 0 is an indicator of selective sweeps, whereas Tajima’s D ≫ 0 is an indicator of balancing selection. The other parameters are fixed to: *N*_{H} = *N*_{P} = 10,000, *n*_{H} = *n*_{P} = 50, *θ*_{H} = *θ*_{P} = 5, *μ*_{Rtor} = *μ*_{rtoR} = *μ*_{ntoI} = *μ*_{Iton} = 10^{−5}.

https://doi.org/10.1371/journal.pcbi.1007668.s010

(TIF)
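For readers who wish to recompute this summary statistic, the following minimal Python sketch calculates Tajima’s D [67] from an unfolded site frequency spectrum using the standard constants of the test. The function name `tajimas_d` and the input layout (a list of counts for derived-allele frequencies 1 to *n* − 1) are assumptions made for illustration and are not taken from the authors' code.

```python
import math


def tajimas_d(sfs):
    """Tajima's D from an unfolded site frequency spectrum.

    sfs[i - 1] is the number of segregating sites at which the derived
    allele is carried by i of the n sampled sequences (i = 1 .. n - 1).
    Returns NaN when D is undefined (no segregating sites or n < 3).
    """
    n = len(sfs) + 1          # sample size
    S = sum(sfs)              # number of segregating sites
    if S == 0 or n < 3:
        return float("nan")

    # Mean number of pairwise differences (pi).
    pairs = n * (n - 1) / 2
    pi = sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / pairs

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * S + e2 * S * (S - 1)
    if variance <= 0:
        return float("nan")

    # Difference between pi and Watterson's estimator, scaled by its
    # estimated standard deviation.
    return (pi - S / a1) / math.sqrt(variance)
```

A spectrum dominated by singletons yields a negative value, as expected after a selective sweep, while an excess of intermediate-frequency variants yields a positive value, as expected under balancing selection.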

### S11 Fig. Cross-validation model choice scenario 2 for r = 30 repetitions.

Leave-one-out cross-validation result for distinguishing the coevolution model with unknown cost of infection (*s*), cost of resistance (*c*_{H}) and cost of infectivity (*c*_{P}) from a neutral model with constant host and parasite population sizes (*N*_{H} = *N*_{P} = 10,000). Cross-validation results are shown for *r* = 30 and are based on 500 randomly chosen ABC-simulations for each model.

https://doi.org/10.1371/journal.pcbi.1007668.s011

(TIF)

### S12 Fig. Cross-validation model choice scenario 2 for r = 10 repetitions.

Leave-one-out cross-validation result for distinguishing the coevolution model with unknown cost of infection (*s*), cost of resistance (*c*_{H}) and cost of infectivity (*c*_{P}) from a neutral model with constant host and parasite population sizes (*N*_{H} = *N*_{P} = 10,000). Cross-validation results are shown for *r* = 10 and are based on 500 randomly chosen ABC-simulations for each model.

https://doi.org/10.1371/journal.pcbi.1007668.s012

(TIF)

### S13 Fig. Posterior probability in support of the coevolution model (against a neutral model) for scenario 2.

Results are shown for *r* = 10 and 15 PODs per boxplot. The posterior probability in support of the coevolution model (y-axis) is shown for PODs with varying cost of infection (*s*). The different panels reflect the combination of *c*_{H} and *c*_{P} for the respective PODs (left: *c*_{H} = 0.05, right: *c*_{H} = 0.1, top: *c*_{P} = 0.1, bottom: *c*_{P} = 0.3). Model choice has been run to distinguish a coevolution model with unknown cost of infection (*s*), cost of resistance (*c*_{H}) and cost of infectivity (*c*_{P}) from a neutral model with constant host and parasite population sizes (*N*_{H} = *N*_{P} = 10,000). Results for single PODs are shown as dots, with jitter added to the x-values to increase the readability.

https://doi.org/10.1371/journal.pcbi.1007668.s013

(TIF)

### S14 Fig. Pairwise Manhattan distance and Δ Tajima’s D (host-parasite) for the PODs under scenario 2 compared to simulations under a neutral model for r = 10.

Pairwise Manhattan distance (x-axis) and the difference between Tajima’s D of the host and of the parasite (y-axis) for the PODs used for inference in Scenario 2 and 100,000 neutral simulations. Simulations under the neutral model are shown as grey open circles. A bivariate normal kernel estimation has been applied to obtain a probability density of the different summary statistic combinations. The PODs for scenario 2 are shown in color. Colors reflect the true cost of infection (*s*) for a particular POD (see legend) and shapes indicate the combination of *c*_{H} and *c*_{P} (diamonds: *c*_{H} = 0.05, *c*_{P} = 0.1; circles: *c*_{H} = 0.05, *c*_{P} = 0.3; crosses: *c*_{H} = 0.01, *c*_{P} = 0.1; stars: *c*_{H} = 0.1, *c*_{P} = 0.3) for the respective POD.

https://doi.org/10.1371/journal.pcbi.1007668.s014

(TIF)

### S15 Fig. Inference results Scenario 2 for r = 10.

Median of the posterior distribution (y-axis) for the cost of infection *s* (top, a-c), cost of resistance (*c*_{H}) (middle, d-f) and cost of infectivity (*c*_{P}) (bottom, g-i) when inference is based on host and parasite summary statistics (left), only host summary statistics (middle) or only parasite summary statistics (right) for scenario 2. The median of the posterior distribution (after post-rejection adjustment) is plotted for each POD in scenario 2. The true cost of infection for each POD is shown on the x-axis with jitter added to increase the readability. The *R*^{2}-value of a corresponding linear regression model is shown in each panel.

https://doi.org/10.1371/journal.pcbi.1007668.s015

(TIF)

### S16 Fig. Inference results Scenario 2 for r = 10.

Median of the posterior distribution (y-axis) for the cost of infection *s* (top, a-c), cost of resistance (*c*_{H}) (middle, d-f) and cost of infectivity (*c*_{P}) (bottom, g-i) when inference is based on host and parasite summary statistics (left), only host summary statistics (middle) or only parasite summary statistics (right) for scenario 2. The median of the posterior distribution (after post-rejection adjustment) is plotted for each POD in scenario 2. The true cost of infection for each POD is shown on the x-axis with jitter added to increase the readability. The *R*^{2}-value of a corresponding linear regression model is shown in each panel.

https://doi.org/10.1371/journal.pcbi.1007668.s016

(TIF)

### S17 Fig. Equilibrium frequencies Model B.

Deterministic equilibrium frequencies for **Model B** for a) *T* = 5 parasite generations (left) and b) *T* = 10 parasite generations (right) per host generation. The equilibrium frequencies for different combinations of cost of resistance *c*_{H} = (0.05, 0.1) (columns), cost of infectivity *c*_{P} = (0.1, 0.3) (rows) and cost of infection *s* = (0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8) (color of the squares) are shown. Only combinations with trench-warfare dynamics are shown. Centres of the squares represent the equilibrium frequencies obtained by numerically simulating the recursion equations in S1 File for *g*_{max} = 30,000 host generations starting with an initial frequency of *R*_{0} = 0.2 resistant hosts and *a*_{0} = 0.2 infective parasites.

https://doi.org/10.1371/journal.pcbi.1007668.s017

(TIF)

### S18 Fig. Equilibrium frequencies Model C.

Deterministic equilibrium frequencies for **Model C** (auto-allo-infection model) with *T* = 2 parasite generations per host generation and *ψ* = 0.95. The equilibrium frequencies for different combinations of cost of resistance *c*_{H} = (0.05, 0.1) (columns), cost of infectivity *c*_{P} = (0.1, 0.3) (rows) and cost of infection *s* = (0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8) (color of the squares) are shown. Only combinations which result in trench-warfare dynamics are plotted. Centres of the squares represent the equilibrium frequencies obtained by numerically simulating the recursion equations in S1 File for *g*_{max} = 30,000 host generations starting with an initial frequency of *R*_{0} = 0.2 resistant hosts and *a*_{0} = 0.2 infective parasites. Heads of the arrows represent the equilibrium frequencies based on Eq (3), which corresponds to the case *ψ* = 1 [24].

https://doi.org/10.1371/journal.pcbi.1007668.s018

(TIF)

### S19 Fig. Tajima’s D and pairwise Manhattan distance Model B and C.

Mean and standard error of Tajima’s D (a+c) and pairwise Manhattan distance (PMD) (b+d) for various costs of infection *s* (x-axis) and *r* = 200 repetitions. Results for **Model B** (pure autoinfection model with *T* = 5 and *T* = 10) are shown at the top, results for **Model C** (auto-allo-infection model with *ψ* = 0.95) are shown at the bottom. The other parameters are fixed to: *c*_{H} = 0.05 and *c*_{P} = 0.1. Initial frequencies *R*_{0} and *a*_{0} in *a* and *b* are chosen randomly from a uniform distribution between 0 and 1 while *R*_{0} = *a*_{0} = 0.2 in *c* and *d*.

https://doi.org/10.1371/journal.pcbi.1007668.s019

(TIF)

### S20 Fig. Pairwise Manhattan distance and Δ Tajima’s D (host-parasite) for PODs with low costs of infection (*s* = {0.01 − 0.09}) under scenario 1 compared to simulations under a neutral model for r = 30.

Pairwise Manhattan distance (x-axis) and the difference between Tajima’s D of the host and of the parasite (y-axis) for the PODs used for inference in Scenario 1 and the 100,000 neutral simulations run for this scenario. Under the neutral model, host and parasite population sizes vary. Simulations under the neutral model are shown as grey open circles, and a bivariate normal kernel estimation has been applied to obtain a probability density of the summary statistic combinations. The PODs for scenario 1 are shown as diamonds and are colour-coded based on the true cost of infection (*s*).

https://doi.org/10.1371/journal.pcbi.1007668.s020

(TIF)

### S21 Fig. Pairwise Manhattan distance and Δ Tajima’s D (host-parasite) for PODs with low costs of infection (*s* = {0.01 − 0.09}) under scenario 1 compared to simulations under a neutral model for r = 10.

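Pairwise Manhattan distance (x-axis) and the difference between Tajima’s D of the host and of the parasite (y-axis) for the PODs used for inference in Scenario 1 and the 100,000 neutral simulations run for this scenario. Under the neutral model, host and parasite population sizes vary. Simulations under the neutral model are shown as grey open circles, and a bivariate normal kernel estimation has been applied to obtain a probability density of the summary statistic combinations. The PODs for scenario 1 are shown as diamonds and are colour-coded based on the true cost of infection (*s*).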

https://doi.org/10.1371/journal.pcbi.1007668.s021

(TIF)

### S22 Fig. Pairwise Manhattan distance and Δ Tajima’s D (host-parasite) for PODs with low costs of infection (*s* = {0.01 − 0.09}) under scenario 2 compared to simulations under a neutral model for r = 30.

Pairwise Manhattan distance (x-axis) and the difference between Tajima’s D of the host and of the parasite (y-axis) for the PODs used for inference in Scenario 2 and 100,000 neutral simulations. Simulations under the neutral model are shown as grey open circles. A bivariate normal kernel estimation has been applied to obtain a probability density of the different summary statistic combinations. The PODs for scenario 2 are shown in color. Colors reflect the true cost of infection (*s*) for a particular POD (see legend) and shapes indicate the combination of *c*_{H} and *c*_{P} (diamonds: *c*_{H} = 0.05, *c*_{P} = 0.1; circles: *c*_{H} = 0.05, *c*_{P} = 0.3; crosses: *c*_{H} = 0.01, *c*_{P} = 0.1; stars: *c*_{H} = 0.1, *c*_{P} = 0.3) for the respective POD.

https://doi.org/10.1371/journal.pcbi.1007668.s022

(TIF)

### S23 Fig. Pairwise Manhattan distance and Δ Tajima’s D (host-parasite) for PODs with low costs of infection (*s* = {0.01 − 0.09}) under scenario 2 compared to simulations under a neutral model for r = 30.

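Pairwise Manhattan distance (x-axis) and the difference between Tajima’s D of the host and of the parasite (y-axis) for the PODs used for inference in Scenario 2 and 100,000 neutral simulations. Simulations under the neutral model are shown as grey open circles. A bivariate normal kernel estimation has been applied to obtain a probability density of the different summary statistic combinations. The PODs for scenario 2 are shown in color. Colors reflect the true cost of infection (*s*) for a particular POD (see legend) and shapes indicate the combination of *c*_{H} and *c*_{P} (diamonds: *c*_{H} = 0.05, *c*_{P} = 0.1; circles: *c*_{H} = 0.05, *c*_{P} = 0.3; crosses: *c*_{H} = 0.01, *c*_{P} = 0.1; stars: *c*_{H} = 0.1, *c*_{P} = 0.3) for the respective POD.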

https://doi.org/10.1371/journal.pcbi.1007668.s023

(TIF)

### S1 File. Additional information on coevolutionary models.

https://doi.org/10.1371/journal.pcbi.1007668.s024

(PDF)

### S2 File. Details Pairwise Manhattan Distance (PMD).

https://doi.org/10.1371/journal.pcbi.1007668.s025

(PDF)

## Acknowledgments

We thank Amandine Cornille and Cas Retel for helpful comments on early versions of the manuscript and Lukas Heinrich for performing preliminary analyses.

## References

- 1. Thrall PH, Laine AL, Ravensdale M, Nemri A, Dodds PN, Barrett LG, et al. Rapid genetic change underpins antagonistic coevolution in a natural host-pathogen metapopulation. Ecol Lett. 2012;15(5):425–435. pmid:22372578
- 2. Decaestecker E, Gaba S, Raeymaekers JAM, Stoks R, Van Kerckhoven L, Ebert D, et al. Host-parasite ‘Red Queen’ dynamics archived in pond sediment. Nature. 2007;450(7171):870–U16. pmid:18004303
- 3. Martiny JBH, Riemann L, Marston MF, Middelboe M. Antagonistic Coevolution of Marine Planktonic Viruses and Their Hosts. Annu Rev Mar Sci. 2014;6(1):393–414.
- 4. Rose LE, Michelmore RW, Langley CH. Natural variation in the Pto disease resistance gene within species of wild tomato (*Lycopersicon*). II. Population genetics of Pto. Genetics. 2007;175(3):1307–1319. pmid:17179076
- 5. Hoerger AC, Ilyas M, Stephan W, Tellier A, van der Hoorn RAL, Rose LE. Balancing selection at the Tomato RCR3 guardee gene family maintains variation in strength of pathogen defense. PLOS Genet. 2012;8(7).
- 6. Stahl EA, Dwyer G, Mauricio R, Kreitman M, Bergelson J. Dynamics of disease resistance polymorphism at the Rpm1 locus of *Arabidopsis*. Nature. 1999;400(6745):667–671. pmid:10458161
- 7. Shin J, MacCarthy T. Antagonistic Coevolution Drives Whack-a-Mole Sensitivity in Gene Regulatory Networks. PLOS Comput Biol. 2015;11(10). pmid:26451700
- 8. Maynard Smith J, Haigh J. The hitch-hiking effect of a favourable gene. Genet Res. 1974;23(1):23–35.
- 9. Woolhouse M, Webster J, Domingo E, Charlesworth B, Levin B. Biological and biomedical implications of the co-evolution of pathogens and their hosts. Nat Genet. 2002;32(4):569–577. pmid:12457190
- 10. Holub E. The arms race is ancient history in Arabidopsis, the wildflower. Nat Rev Genet. 2001;2(7):516–527. pmid:11433358
- 11. Charlesworth B, Nordborg M, Charlesworth D. The effects of local selection, balanced polymorphism and background selection on equilibrium patterns of genetic diversity in subdivided populations. Genet Res. 1997;70(2):155–174. pmid:9449192
- 12. Bakker EG, Toomajian C, Kreitman M, Bergelson J. A genome-wide survey of R gene polymorphisms in *Arabidopsis*. Plant Cell. 2006;18(8):1803–1818. pmid:16798885
- 13. Karasov TL, Kniskern JM, Gao L, DeYoung BJ, Ding J, Dubiella U, et al. The long-term maintenance of a resistance polymorphism through diffuse interactions. Nature. 2014;512(7515):436–U472. pmid:25043057
- 14. Caicedo A, Schaal B. Heterogeneous evolutionary processes affect R gene diversity in natural populations of *Solanum pimpinellifolium*. P Natl Acad Sci USA. 2004;101(50):17444–17449.
- 15. Obbard DJ, Jiggins FM, Bradshaw NJ, Little TJ. Recent and Recurrent Selective Sweeps of the Antiviral RNAi Gene Argonaute-2 in Three Species of Drosophila. Mol Biol Evol. 2011;28(2):1043–1056. pmid:20978039
- 16. Stam R, Silva-Arias GA, Tellier A. Subsets of NLR genes show differential signatures of adaptation during colonization of new habitats. New Phytol. 2019;224(1):367–379. pmid:31230368
- 17. Schweizer G, Muench K, Mannhaupt G, Schirawski J, Kahmann R, Dutheil JY. Positively Selected Effector Genes and Their Contribution to Virulence in the Smut Fungus Sporisorium reilianum. Genome Biol Evol. 2018;10(2):629–645. pmid:29390140
- 18. Sánchez-Vallet A, Fouché S, Fudal I, Hartmann FE, Soyer JL, Tellier A, et al. The Genome Biology of Effector Gene Evolution in Filamentous Plant Pathogens. Ann Rev Phytopathol. 2018;56(1):21–40.
- 19. Tellier A, Moreno-Gamez S, Stephan W. Speed of adaptation and genomic footprints of host-parasite coevolution under arms race and trench warfare dynamics. Evolution. 2014;68(8):2211–2224. pmid:24749791
- 20. Gokhale CS, Papkou A, Traulsen A, Schulenburg H. Lotka-Volterra dynamics kills the Red Queen: population size fluctuations and associated stochasticity dramatically change host-parasite coevolution. BMC Evol Biol. 2013;13. pmid:24252104
- 21. Ejsmond MJ, Radwan J. Red Queen Processes Drive Positive Selection on Major Histocompatibility Complex (MHC) Genes. PLOS Comput Biol. 2015;11(11). pmid:26599213
- 22. Salathe M, Scherer A, Bonhoeffer S. Neutral drift and polymorphism in gene-for-gene systems. Ecol Lett. 2005;8(9):925–932.
- 23. Seger J. Dynamics of some simple host-parasite models with more than 2 genotypes in each species. P Roy Soc B-Biol Sci. 1988;319(1196):541–555.
- 24. Tellier A, Brown JKM. Stability of genetic polymorphism in host-parasite interactions. P Roy Soc B-Biol Sci. 2007;274(1611):809–817.
- 25. Kwiatkowski M, Engelstaedter J, Vorburger C. On Genetic Specificity in Symbiont-Mediated Host-Parasite Coevolution. PLOS Comput Biol. 2012;8(8). pmid:22956894
- 26. Agrawal A, Lively C. Infection genetics: gene-for-gene versus matching-alleles models and all points in between. Evol Ecol Res. 2002;4(1):79–90.
- 27. Engelstaedter J. Host-Parasite coevolutionary dynamics with generalized success/failure infection genetics. Am Nat. 2015;185(5):E117–E129.
- 28. Flor HH. Current Status of the Gene For Gene Concept. Annu Rev Phytopathol. 1971;9:275–296.
- 29. Kraaijeveld A, Godfray H. Trade-off between parasitoid resistance and larval competitive ability in *Drosophila melanogaster*. Nature. 1997;389(6648):278–280. pmid:9305840
- 30. Bergelson J, Purrington C. Surveying patterns in the cost of resistance in plants. Am Nat. 1996;148(3):536–558.
- 31. Lenski R. Experimental studies of pleiotropy and epistasis in *Escherichia-Coli*. 1. Variation in competitive fitness among mutants resistant to Virus-T4. Evolution. 1988;42(3):425–432. pmid:28564005
- 32. Thrall P, Burdon J. Evolution of virulence in a plant host-pathogen metapopulation. Science. 2003;299(5613):1735–1737. pmid:12637745
- 33. Leonard KJ. Stability of equilibria in a gene-for-gene coevolution model of host-parasite interactions. Phytopathology. 1994;84(1):70–77.
- 34. Frank S. Models of plant pathogen coevolution. Trends Genet. 1992;8(6):213–219. pmid:1496557
- 35. Csillery K, Blum MGB, Gaggiotti OE, Francois O. Approximate Bayesian Computation (ABC) in practice. Trends Ecol Evol. 2010;25(7):410–418. pmid:20488578
- 36. Beaumont MA, Zhang WY, Balding DJ. Approximate Bayesian Computation in population genetics. Genetics. 2002;162(4):2025–2035. pmid:12524368
- 37. Sunnaker M, Busetto AG, Numminen E, Corander J, Foll M, Dessimoz C. Approximate Bayesian Computation. PLOS Comput Biol. 2013;9(1). pmid:23341757
- 38. Tian D, Traw M, Chen J, Kreitman M, Bergelson J. Fitness costs of R-gene-mediated resistance in Arabidopsis thaliana. Nature. 2003;423(6935):74–77. pmid:12721627
- 39. Nuismer SL, Week B. Approximate Bayesian estimation of coevolutionary arms races. PLOS Comput Biol. 2019;15(4).
- 40. Ashby B, Iritani R, Best A, White A, Boots M. Understanding the role of eco-evolutionary feedbacks in host-parasite coevolution. J Theor Biol. 2019;464:115–125. pmid:30586552
- 41. Kirby GC, Burdon JJ. Effects of mutation and random drift on Leonard’s gene-for-gene coevolution model. Phytopathology. 1997;87(5):488–493. pmid:18945102
- 42. Ewing G, Hermisson J. MSMS: a coalescent simulation program including recombination, demographic structure and selection at a single locus. Bioinformatics. 2010;26(16):2064–2065. pmid:20591904
- 43. Nordborg M. Structured coalescent processes on different time scales. Genetics. 1997;146(4):1501–1514. pmid:9258691
- 44. Wegmann D, Leuenberger C, Neuenschwander S, Excoffier L. ABCtoolbox: a versatile toolkit for approximate Bayesian computations. BMC Bioinformatics. 2010;11. pmid:20202215
- 45. Csillery K, Francois O, Blum MGB. abc: an R package for approximate Bayesian computation (ABC). Methods Ecol Evol. 2012;3(3):475–479.
- 46. Leuenberger C, Wegmann D. Bayesian Computation and Model Selection Without Likelihoods. Genetics. 2010;184(1):243–252. pmid:19786619
- 47. Brown JKM, Tellier A. Plant-parasite coevolution: Bridging the gap between genetics and ecology. Annu Rev Phytopathol. 2011;49:345–367. pmid:21513455
- 48. Hall AR, Scanlan PD, Morgan AD, Buckling A. Host-parasite coevolutionary arms races give way to fluctuating selection. Ecol Lett. 2011;14:635–642. pmid:21521436
- 49. Frickel J, Sieber M, Becks L. Eco-evolutionary dynamics in a coevolving host-virus system. Ecol Lett. 2016;19(4):450–459. pmid:26898162
- 50. Retel C, Kowallik V, Huang W, Werner B, Künzel S, Becks L, et al. The feedback between selection and demography shapes genomic diversity during coevolution. Science Advances. 2019;5(10). pmid:31616788
- 51. Papkou A, Guzella T, Yang W, Koepper S, Pees B, Schalkowski R, et al. The genomic basis of Red Queen dynamics during rapid reciprocal host–pathogen coevolution. P Natl Acad Sci USA. 2019;116(3):923–928.
- 52. Gandon S, Buckling A, Decaestecker E, Day T. Host-parasite coevolution and patterns of adaptation across time and space. Journal of Evolutionary Biology. 2008;21(6):1861–1866. pmid:18717749
- 53. Živković D, John S, Verin M, Stephan W, Tellier A. Neutral genomic signatures of host-parasite coevolution. BMC Evol Biol. 2019;19:230.
- 54. MacPherson A, Otto SP, Nuismer SL. Keeping pace with the Red Queen: Identifying the genetic basis of susceptibility to infectious disease. Genetics. 2018;208:779–789. pmid:29223971
- 55. Nuismer SL, Jenkins CE, Dybdahl MF. Identifying coevolving loci using interspecific genetic correlations. Ecol Evol. 2017;7(17):6894–6903. pmid:28904769
- 56. Wang M, Roux F, Bartoli C, Huard-Chauveau C, Meyer C, Lee H, et al. Two-way mixed-effects methods for joint association analysis using both host and pathogen genomes. P Natl Acad Sci USA. 2018.
- 57. Jain K, Stephan W. Modes of Rapid Polygenic Adaptation. Mol Biol Evol. 2017;34(12):3169–3175. pmid:28961935
- 58. De Meaux J, Cattan-Toupance I, Lavigne C, Langin T, Neema C. Polymorphism of a complex resistance gene candidate family in wild populations of common bean (*Phaseolus vulgaris*) in Argentina: comparison with phenotypic resistance polymorphism. Mol Ecol. 2003;12(1):263–273. pmid:12492894
- 59. Gos G, Slotte T, Wright SI. Signatures of balancing selection are maintained at disease resistance loci following mating system evolution and a population bottleneck in the genus *Capsella*. BMC Evol Biol. 2012;12. pmid:22909344
- 60. Karasov TL, Almario J, Friedemann C, Ding W, Giolai M, Heavens D, et al. *Arabidopsis thaliana* and *Pseudomonas* pathogens exhibit stable associations over evolutionary timescales. Cell Host Microbe. 2018;24(1):168+. pmid:30001519
- 61. Bartha I, Carlson JM, Brumme CJ, McLaren PJ, Brumme ZL, John M, et al. A genome-to-genome analysis of associations between human genetic variation, HIV-1 sequence diversity, and viral control. eLife. 2013;2. pmid:24171102
- 62. Leventhal GE, Guenthard HF, Bonhoeffer S, Stadler T. Using an Epidemiological Model for Phylogenetic Inference Reveals Density Dependence in HIV Transmission. Mol Biol Evol. 2014;31(1):6–17. pmid:24085839
- 63. Janzen DH. When is it coevolution. Evolution. 1980;34(3):611–612. pmid:28568694
- 64. Nuismer SL, Gomulkiewicz R, Ridenhour BJ. When Is Correlation Coevolution? Am Nat. 2010;175(5):525–537. pmid:20307203
- 65. Watterson GA. Number of Segregating Sites In Genetic Models Without Recombination. Theor Popul Biol. 1975;7(2):256–276. pmid:1145509
- 66. Nei M, Tajima F. DNA polymorphism detectable by restriction endonucleases. Genetics. 1981;97:145–165. pmid:6266912
- 67. Tajima F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989;123:585–595. pmid:2513255
- 68. Fu YX, Li WH. Statistical tests of neutrality of mutations. Genetics. 1993;133:693–709. pmid:8454210
- 69. Fay JC, Wu CI. Hitchhiking under positive Darwinian selection. Genetics. 2000;155(3):1405–1413.
- 70. Zeng K, Fu YX, Shi S, Wu CI. Statistical tests for detecting positive selection by utilizing high-frequency variants. Genetics. 2006;174(3):1431–1439. pmid:16951063