Double drives and private alleles for localised population genetic control

Synthetic gene drive constructs could, in principle, provide the basis for highly efficient interventions to control disease vectors and other pest species. This efficiency derives in part from leveraging natural processes of dispersal and gene flow to spread the construct and its impacts from one population to another. However, sometimes (for example, with invasive species) only specific populations are in need of control, and impacts on non-target populations would be undesirable. Many gene drive designs use nucleases that recognise and cleave specific genomic sequences, and one way to restrict their spread would be to exploit sequence differences between target and non-target populations. In this paper we propose and model a series of low threshold double drive designs for population suppression, each consisting of two constructs, one imposing a reproductive load on the population and the other inserted into a differentiated locus and controlling the drive of the first. Simple deterministic, discrete-generation computer simulations are used to assess the alternative designs. We find that the simplest double drive designs are significantly more robust to pre-existing cleavage resistance at the differentiated locus than single drive designs, and that more complex designs incorporating sex ratio distortion can be more efficient still, even allowing for successful control when the differentiated locus is neutral and there is up to 50% pre-existing resistance in the target population. Similar designs can also be used for population replacement, with similar benefits. A population genomic analysis of CRISPR PAM sites in island and mainland populations of the malaria mosquito Anopheles gambiae indicates that the differentiation needed for our methods to work can exist in nature. Double drives should be considered when efficient but localised population genetic control is needed and there is some genetic differentiation between target and non-target populations.


Genetics
Our baseline model has 2 autosomal loci each with 3 alleles. At the first locus A is the wildtype allele; a is the transgenic construct inserted into and disrupting the wild type allele; and a is a cleavage-resistant allele produced by end-joining repair. For population suppression we assume that the A locus is needed for female survival past the pupal stage, that the resistant allele is non-functional, and that functional resistance is not possible [1]. When A is a haplo-sufficient gene, we assume no reduction in survival probability when carrying only one functional copy of the gene. At the second locus there are initially 2 segregating alleles of equal fitness, B and b, with B being more common in the target population than in the nontarget population (and vice versa for b). The third allele, β, is the transgenic construct which can home into B but not into b. Where the differentiated B locus is neutral, we assume all endjoining repair events produce cleavage-resistant alleles (b). In the case where the differentiated B locus is an essential gene, we extend the model to have a 4th allele at this locus, bj, which is a non-functional, cleavage resistant, version of B that can be formed by end-joining repair and assume all end-joining leads to non-functional alleles. We also assume that the β construct has been engineered to have minimal effects on fitness (e.g., by having a recoded version of the B allele, or being inserted into an artificial intron). For comparison we also model a single drive consisting of a single construct homing into a female viability gene, and allow for both functional resistance genes that may pre-exist in the population and non- For population replacement where a carries a desirable cargo gene and β is in the differentiated locus, we consider three cases, where (i) both A and B loci are selectively neutral; (ii) the A locus is selectively neutral and B is a haplo-insufficient essential gene; or (iii) the A locus is a haplo-insufficient essential gene and the B locus is selectively neutral. In those cases where the A or B locus is an essential gene, the a or β construct is assumed to have been designed to have minimal effects on fitness (as above), end-joining produces non-functional alleles which are dominant lethal, and any pre-existing resistant allele (at B) is selectively neutral. In each case there are 3 alleles at each locus, except when B is an essential gene, when there are 4. For comparison we also consider a single drive consisting of a construct that homes into a neutral site (3 allele model) or into a haplo-insufficient essential gene (4 allele model; Fig 6 in main text).

Fitness effects of disrupting host genes
The fitness of a genotype depends both on the disruption of host genes caused by insertions and mutations and on the activity of the construct(s) it carries. For the fitness effects due to disruption of host genes the fitness of A/A and B/B homozygotes is standardised to 1, and the fitness of genotypes homozygous for the construct a/a or β/β is 1 − ! ; homozygous for the cleavage resistant allele a/a or b/b is 1 − " ; heterozygous for the construct A/a or B/β is 1 − ℎ ! ! ; heterozygous for the cleavage resistant allele A/a or B/b is 1 − ℎ " " or heterozygous carrying both the construct and cleavage-resistant allele a/a or β/b is 1 − !" , where ! , " and !" are selection coefficients and ℎ ! and ℎ " are dominance coefficients, each of which differ depending on the locus (A and B) and between sexes (see S2 Table for more details of locus-specific fitness costs for each scenario modelled).

Construct activity and fitness effects
Each construct consists of one or more transcription units (Cas9, gRNA, X-shredder, Cargo), and the activity associated with each is assumed to be dominant (i.e., dosage independent). For genotypes carrying at least one Cas9, one gRNA and the site targeted by the gRNA (A or B), cleavage of the target site occurs with probability , after which end-joining repair occurs with probability converting the target site to a resistant allele (a or b), and homing occurs with probability 1 − converting the target site to the construct (a or β). For males carrying at least one X-shredder, sperm is produced carrying Y or X chromosomes at a ratio of : 1 − . In all models, we assume a fitness cost due to off-target cleavage ( ) of 1%, where fitness is reduced by a factor 1 − if there is at least one Cas9 and one gRNA present, or by a factor (1 − ) # if there is at least one Cas9 and two different gRNAs present.
To model costs due to somatic expression of the nuclease, genotypes carrying at least one Cas9, one gRNA and the site targeted by the gRNA, have fitness reduced by a factor 1 − . Similarly, to model costs associated with X-shredding, males carrying at least one X-shredder have fitness reduced by a factor 1 − . With the exception of Fig 5B and C in the main text, we assume both and to be zero. For population replacement we assume the cargo has a fitness cost ( ) of 20% in females, with no effect on males. In all cases the fitness costs associated with each unit's activity are assumed to be dominant (i.e., dosage independent), and for simplicity, are assumed to affect survival after density dependence and before censusing (e.g., at the pupal stage).
The overall fitness of a genotype is the product of the relative fitness due to host gene disruption at locus A, host gene disruption at locus B and construct activity.

Analyses of evolutionary stability
We extend both the 3-allele and 4-allele models to allow for loss-of-function mutations of each transcription unit (Cas9, gRNA, X-shredder, cargo). After release of fully functional constructs, we assume that each unit mutates with probability µ; mutation can only occur during homing; it is possible for more than one unit to mutate during a single homing event; and mutation back to a functional unit does not occur. Constructs carrying non-functional units are expected to carry the fitness costs associated with disrupting the function of the host gene but no longer carry costs due to the transcription unit activity (e.g., off-target cleavage).
Although at the sequence level constructs carrying only non-functional units are expected to differ from cleavage-resistant alleles produced through end-joining repair, both are assumed to be resistant to cleavage and carry the same fitness costs, and we model them as a single allele.

Population biology
We model a population with discrete, non-overlapping generations. In each generation, we assume the population mates randomly, that females produce fertilised eggs and that males are not limiting in their production. Juvenile survival is density-dependent, such that the probability of surviving is q $ g g%& ! , where q $ is the density-independent probability the juvenile survives to adulthood, g determines the strength of density dependent mortality and $ is the number of juveniles in the population. All our results are in terms of relative population sizes (compared to the pre-release equilibrium), and are unaffected by the precise value of g. The intrinsic rate of increase (Rm) of the wild-type population is 'q ! # . After density-dependent juvenile mortality, additional genotype-dependent mortality occurs before juveniles mature into adults, taking into account fitness costs associated with the constructs and cleavage-resistant alleles. Adult males and females then produce gametes, during which recombination occurs between the A and B loci with probability , and, depending on the genotype, cleavage followed by homing or mutation may occur in males or females, and shredding of the Xchromosome may occur in males.

Population censusing
Relative population densities and allele frequencies are of populations censused at the adult stage. The correlation between the constructs a and β is calculated as: where, a( is the frequency of aβ chromosomes in the population and a and ( are the frequency of a and β alleles in the population respectively.

PAM site analysis in An. gambiae
To identify PAM sites (NGG or CCN) in An. gambiae with the type of differentiion required for our approach, SNP variants from the Phase II Ag1000G dataset were first screened for the presence of at least one G or C allele. Positions upstream and downstream of the variant were then screened to calculate frequencies of GG and CC dinucleotides, using the AgamP4 reference genome for invariant neighbouring sites, or the Ag1000G SNP data for neighbouring variants. Since phased data was only available for biallelic variants, dinucleotides containing two variants, of which at least one was multiallelic, were not screened Bioko An. gambiae n=18 and Guinea An coluzzii n=8, where n is the number of sequences and n/2 is the number of individuals sampled). We included populations of An. coluzzii and of uncertain species assignment because there can be frequent hybridisation with An. gambiae.
PAM sites containing >5% missing data at any sampling location were removed (0.5% PAMs), leaving a total of 13,462,450 polymorphic PAM sites. The per-site PAM frequencies were used to identify PAM sites present in the island population of interest, and at frequencies <10%, <5% or absent in all other populations, excluding the An. coluzzii population from Guinea due to low sample size (n=8, from 4 individuals). Analysis of the Ag1000G data made heavy use of the python package scikit-allel (https://doi.org/10.5281/zenodo.3238280).