Resistance to a CRISPR-based gene drive at an evolutionarily conserved site is revealed by mimicking genotype fixation

CRISPR-based homing gene drives can be designed to disrupt essential genes whilst biasing their own inheritance, leading to suppression of mosquito populations in the laboratory. This class of gene drives relies on CRISPR-Cas9 cleavage of a target sequence and copying (‘homing’) therein of the gene drive element from the homologous chromosome. However, target site mutations that are resistant to cleavage yet maintain the function of the essential gene are expected to be strongly selected for. Targeting functionally constrained regions where mutations are not easily tolerated should lower the probability of resistance. Evolutionary conservation at the sequence level is often a reliable indicator of functional constraint, though the actual level of underlying constraint between one conserved sequence and another can vary widely. Here we generated a novel adult lethal gene drive (ALGD) in the malaria vector Anopheles gambiae, targeting an ultra-conserved target site in a haplosufficient essential gene (AGAP029113) required during mosquito development, which fulfils many of the criteria for the target of a population suppression gene drive. We then designed a selection regime to experimentally assess the likelihood of generation and subsequent selection of gene drive resistant mutations at its target site. We simulated, in a caged population, a scenario where the gene drive was approaching fixation, where selection for resistance is expected to be strongest. Continuous sampling of the target locus revealed that a single, restorative, in-frame nucleotide substitution was selected. Our findings show that ultra-conservation alone need not be predictive of a site that is refractory to target site resistance. Our strategy to evaluate resistance in vivo could help to validate candidate gene drive targets for their resilience to resistance and help to improve predictions of the invasion dynamics of gene drives in field populations.

Introduction CRISPR/Cas9 homing gene drives have shown much promise in providing an effective drive mechanism for novel genetic population modification and suppression strategies against insect vectors of major diseases, in particular the mosquitoes Anopheles gambiae and Anopheles stephensi [1][2][3][4]. In general, these gene drives comprise a genetic construct containing a source of ubiquitously expressed guide RNA (gRNA) and Cas9 endonuclease that is expressed in the germline and together they cut a very specific chromosomal target site. Homology Directed Repair (HDR) leads to copying over of the entire gene drive construct into the target site in the homologous chromosome, leading to homozygosity in the germline of carrier individuals. Therefore, unlike Mendelian inheritance, potentially all offspring will receive a copy of the transgene, regardless of which chromosome they inherit from the parent. Modelling has suggested that suppression of disease vector populations could be achieved by designing the gene drive to target and disrupt haplosufficient genes (i.e. genes where one functioning copy is sufficient to maintain normal function) that are essential in the soma for mosquito viability or fertility, and confining the homing reaction to the germline [5][6][7]. Under this scenario, individuals that contain one copy of the gene drive are 'carriers' that transmit the drive to a very high portion of the offspring, resulting in an increase in its frequency, which in turn places a strong genetic load-i.e reduction in fitness-on the population. If the load is high enough when the drive increases to its equilibrium frequency, population suppression or elimination can occur.
Selection of mutations at the target site, or pre-existing alleles in the population, however, can hinder this type of gene drive strategy and prevent further spreading of the drive element [8]. Mutations that prevent Cas9 cleavage at the intended cut site, whilst maintaining the function of the targeted gene, are referred to as 'functional resistance" alleles. Such alleles can preexist in the target population or arise when the repair of the cleaved chromosome is mediated by non-homologous end joining (NHEJ) rather than HDR [8]. One approach to reduce the development of resistance to a gene drive is to target regions that are structurally or functionally constrained and therefore less likely to tolerate insertions or deletions ('indels') and substitutions that could lead to resistance. As a proof of principle, this approach has been used to mitigate resistance in small cage/ scale testing observed in previous applications that targeted female fertility genes [2]. By targeting a highly conserved 18bp sequence within the A. gambiae doublesex (dsx) gene, complete suppression of wild-type laboratory populations was achieved, while no alleles resistant to the gene drive were selected [1]. Although these results are promising for the use of population suppression gene drives against these malaria vectors, it is important to have a range of suitable target genes, allowing a more tailored approach to achieve the desired levels of suppression and avoidance of resistance. Moreover, the choice of suitable target gene will represent a tradeoff between the magnitude of the load it imposes on a population when disrupted and its propensity to tolerate mutations that are resistant to the gene drive.
Our ability to identify novel genes containing gene drive targets that show high functional constraint has been greatly enhanced by genome resequencing efforts that have covered species across the Anopheles genus, spanning over 100 million years on an evolutionary timescale, as well as wild caught individuals of the closely related Anopheles gambiae and Anopheles coluzzii species [9][10][11]. A recent analysis of genomic data from 21 Anopheles species revealed over 8000 ultra-conserved sites (100% identity) of sufficient length (18bp; corresponding to the minimal sequence recognised by a guide RNA) to be targetable by homing-based gene drives [12]. It is to be expected that among this set of sequences there will be variation in their tolerance for changes to their underlying biological functions.
In this study, we focussed our analysis on ultra-conserved sites within putative haplosufficient essential genes as potential target sites for robust gene drives aimed at population suppression. Having generated an active gene drive targeting one of these essential genes, as proof of principle we subjected it to a maximal selection pressure to test its resilience to the selection of gene drive resistant alleles and its overall suitability for vector control.

AGAP029113 is a haplosufficient essential gene in Anopheles gambiae
A recent analysis of ultra-conserved elements (UCEs) from 21 different Anopheles species and search of functional annotations of genes containing UCEs by O'Loughlin et al [12] identified ultra-conserved targets potentially affecting female fertility or lethal recessive phenotype that could be suitable targets for vector control. Amongst the list of potential targets, the AGAP029113 gene is of interest as it contains multiple ultra-conserved sites with a total of 140 invariant sites. Its apparent orthologue in Drosophila melanogaster is Smrter (Smr), several null alleles of which lead to a recessive lethal phenotype, consistent with this gene being haplosufficient.
The AGAP029113 gene is located on Chromosome 2L: 2895973-2931030 and was recently reassigned as a fusion of AGAP004734 and AGAP004732 (Vectorbase.org). It consists of 12 exons (S1 Fig) and contains domains putatively involved in interaction with the G-protein suppressor 2 pathway, DNA repair exonuclease subunit and DNA binding domains (Homeobox-like domain superfamily), suggesting a role in signalling, replication, recombination and DNA repair [13]. Its high level of conservation and expression in larvae and in female ovaries [14,15], which does not change with increasing adult age [16] suggests that this gene is required during mosquito development in both sexes [15].
We chose a target site within AGAP029113 that is ultra-conserved across Culicidae (Culicidae; last common ancestor~150 million years ago) but not in Drosophila melanogaster genomes (Drosphilidae; 260 m.y.a.) (S1 Fig). Even within An. gambiae and Anopheles coluzzii populations (2784 wild Anopheles mosquitoes sampled across Africa), this particular target site showed only negligible variations of less than 0.6% frequency [17] (S1 Fig). This does not exclude the existence of further polymorphisms in wild populations but suggests that alteration of this sequence either naturally or by gene drive may have a strong fitness cost.
We then disrupted this candidate gene by inserting a GFP 'docking' cassette ('hdrGFP') into the target site within exon 5, which is expected to prevent the generation of a functional AGAP029113 transcript and likely represents a null allele. The insertion of this cassette was performed by CRISPR-mediated HDR (Fig 1A), using previously described methodology [1,2]. After confirming the correct integration of the docking cassette ( Fig 1B) we crossed heterozygous individuals with each other and scored their progeny. Among larvae we observed heterozygous (29113 hdrGFP/+ ) and homozygous (29113 hdrGFP/hdrGFP ) individuals at the expected Mendelian ratio (Fig 1C). However, individuals homozygous for the null allele failed to emerge to adulthood. During general maintenance of the strain we noticed no obvious fitness effects in individuals heterozygous for the null allele, suggesting that this gene may represent a suitable haplosufficient essential gene as a target for a population suppression gene drive strategy, whereby 'carriers' with a gene drive disrupting a single copy of the target gene need to be competitive to be able to pass on the gene drive to their offspring.

Phenotypic assessment of a gene drive targeting an essential gene
To assess the phenotype in the presence of a gene drive construct targeting this locus, an active Cas9/gRNA cassette (29113 CRISPRh ) was inserted into the locus as described previously [1,2]. The 29113 CRISPRh construct contained Cas9 with expression controlled by the zero population growth (zpg) germline promoter [1], together with a dominant RFP marker gene to assist in tracking the inheritance of the gene drive (Fig 2A). This was placed in locus via recombinase mediated cassette exchange (RMCE), replacing the GFP cassette with the 29113 CRISPRh cassette. This gene drive construct showed high rates of biased inheritance from heterozygous (29113 CRISPRh/+ ) female (92.7% ± 2.9) and male mosquitoes (91.3% ± 3.2) (Fig 2B).
Impaired fecundity and fertility of heterozygotes has been observed in other gene drive strains (1,2). We performed fecundity assays where heterozygous (29113 CRISPRh/+ ) males and heterozygous (29113 CRISPRh/+ ) females were crossed to wild-type G3 mosquitoes crossed repeatedly over multiple generations (to account for any parental effects as described in Hammond et al., 2021 [18]) and the number of eggs laid and hatched larvae was counted by individual oviposition of females. The wild-type G3 strain was used as control. There was no significant difference in the number of eggs across all 5 crosses (Fig 3A; quasi-Poisson GLM, F = 0.458, df1 = 4, df2 = 90, p = 0.766). However, the hatching rate of the progeny of 29113 CRISPRh/+ females was reduced by 21.6% relative to wild-type controls (Fig 3B; z-test, p = 0.046). There was also increased mortality during larvae to adult development of about 37.8% and 46.2% among the offspring of heterozygous males and females, respectively (Fig 3C; z-test, 29113 CRISPRh/+ males p = 3.33e-5; 29113 CRISPRh/+ females p = 4.88e-8).

Assessment of 29113 CRISPRh/+ spread in a cage population experiment
Despite the fitness costs apparent with this gene drive in heterozygous carrier individuals, the extent of biased inheritance was still expected to cause the element to increase in frequency in a population [1,2]. To test this, we seeded 29113 CRISPRh/+ individuals into a wild-type caged population (total initial population size of 400) at a starting frequency of 20%, in triplicate, and monitored the frequency of gene drive individuals over time. The same experiment was performed in parallel with the non-driving 29113 hdrGFP/+ mosquitoes as control.
For releases with the control strain the frequency of individuals with the control allele decreased slowly over 3 generations to an average of 10.6%, consistent with the expectation for a non-driving null allele ( Fig 4A). However, in the gene drive release cages, despite observing an initial increase in the frequency of gene drive carriers in the generation after release, this rapidly declined, to the extent that by 3 generations post-release the gene drive was lost from the population in all three replicates ( Fig 4B). Rapid loss of the gene drive suggests that there is a fitness cost associated with the Cas9/gRNA expression from the gene drive allele that prevents its persistence. To understand possible fitness costs, we used a simple deterministic model of a single, randomly mating population with two life stages (juveniles and adults), two sexes and discrete non-overlapping generations, following the structure of Beaghton et al [19] and incorporated the heterozygote fitness costs identified from the phenotypic assays. They included a) no fitness costs (blue dashed line) b) fitness costs that were observed in the phenotypic assays (21% reduced hatching rate in 29113 CRISPRh/+ females and reduced larvae to adult eclosion rate of 37.8% and 46.2% for 29113 CRISPRh/+ males and females) (light red dashed line) and c) further fitness costs (90% for 29113 CRISPRh/+ males and females) that were not measured and were chosen arbitrarily to signify a very high genetic load (dark red dashed line). Overall, the data fitted the model c) assuming high additional fitness costs associated with the gene drive construct. This could include, for example, fitness costs such as a prolonged larval developmental time or a reduced adult mating success, which we did not assess. We also investigated the possibility of the generation and subsequent selection of a functional resistant allele to the gene drive that could have led to loss of the transgene. Such alleles can be formed when Cas9-mediated cuts are repaired by error-prone end joining DNA repair pathways producing small insertions or deletions or substitutions [8]. Our models showed that such resistant alleles would have had no impact on the transgenic rate over the first 5 generations of the population experiment (S2 Fig). This suggests that the dominant factor in determining the drive loss was the unexpected additional fitness costs associated with harbouring the drive element in heterozygosity, rather than the emergence and selection of resistance. Since these fitness effects were not observed in individuals heterozygous for the simple null allele, it suggests they are related to expression of the Cas9 nuclease, which may or may not be ameliorated in the future by changes to the design of this drive construct and its expression. Nonetheless, an important question is the resilience of this target site, and indeed any target site, to resistance in the face of a suppression gene drive.

Non-drive alleles generated at the target site by Cas9 nuclease activity
To investigate the array of target site mutations that can be generated at the ultra-conserved site we devised a cross that enriched for non-drive alleles exposed to the nuclease from the gene drive ( Fig 5A). In this assay the non-drive alleles are balanced against a marked (GFP+) null allele of the haplosufficient target gene. Consistent with an essential role for the AGAP029113 gene between larval and adult transition, the distribution of in-frame and outof-frame alleles was markedly different between the larval stage (where a complete null genotype is tolerated) and the adult stage (where it is not): pooled sequencing of the target site in L1 instar larvae revealed that 26.5% of non-drive alleles contained a target site mutation, of which 10.8% were in-frame ( Fig 5B); sampling at the adult stage, only 5.7% of non-drive alleles  contained mutations, of which 99.3% were in-frame. The persistence of in-frame alleles in the adult stage strongly suggests that these restore function, at least partially, to the essential target gene. If these mutations are also resistant to further cleavage by the gene drive then they would meet the criteria of being functionally resistant and thus be positively selected in the presence of the gene drive.

Selection of 29113 CRISPRh resistant alleles by mimicking genotype fixation in a caged population
In order to investigate our assumption that these mutations could be strongly selected, we mimicked a scenario of genotype fixation (i.e., all mosquitoes have at least one copy of the gene drive), where selection pressure for functional resistance alleles is expected to be highest. We crossed individuals heterozygous for the gene drive construct, allowing all offspring to survive and reproduce for multiple generations. Given our previous estimate of gene drive transmission rate (93%), under this scenario the vast majority (87% (0.93 2 )) of progeny would be expected to be homozygous for the gene drive and so would die during larval development. Of the progeny that survive to form the following generation, the majority would be heterozygous and destined to produce predominantly non-viable offspring. Under these conditions a functional resistance allele will therefore have a high selective advantage since it will confer viability on any genotype that inherits it.
Over the first five generations of this experiment, the gene drive population (marked in red) was constantly suppressed (Fig 6A), producing very few eggs compared to the non-drive 29113 hdrGFP/+ controls (in green), consistent with the majority of the offspring lacking a functional copy of the AGAP029113 gene. However, by generations 6 and 7 we observed a rise in frequency of larvae as well as the number of adults and eggs produced by the gene drive cage. Together, these observations are consistent with a breakthrough of functional resistance alleles that are refractory to gene drive invasion and restore the carrying capacity of the population. Indeed, sequencing of adults from the G7 generation (Fig 6B) showed over 87% of reads contained the same in-frame mutation at the Cas9 target site. We also detected this single nucleotide substitution, which is silent, amongst other mutations recovered in larvae and adults in our previous screen (S3 Fig). Since this mutation was not found in the laboratory wild-type population (despite sequencing of 360 wild-type individuals and alignment of 35817 reads), the most likely explanation is that it is a de novo mutation caused by end-joining repair following nuclease activity of the gene drive. This specific mutation was selected and rapidly increased in frequency over generations likely due to a fitness advantage compared to other indels found in the single generation assay.

Discussion
Any strategy, from antibiotics to insecticides, that aims to impose a fitness load on a population inevitably leads to strong selection for resistance. For gene drives, the easiest pathway for resistance to occur is for an allele to arise that codes for a functioning gene that lacks the endonuclease recognition site, thereby blocking the biased inheritance of the gene drive element. These alleles may pre-exist in the population, arise through spontaneous point mutation, or arise through error-prone (non-HDR) DNA repair pathways that introduce small mutations in the repair of the double stranded break generated by the gene drive.
We developed a CRISPR homing gene drive targeting an ultra-conserved site in the viability gene AGAP0029113 and showed that this gene is essential for development of both male and female mosquitoes. However, the invasive potential of the gene drive construct was hindered by high fitness costs in the heterozygous 29113 CRISPRh /+ individuals compared to the wildtype strain. Fitness costs in gene drive carriers are not unusual and have been reported for other gene drive strains [1,18,20,21]. A fitness cost, per se, in carriers will not prevent a gene drive increasing in frequency provided that the magnitude of its biased inheritance outweighs the fitness cost suffered in the carrier. However, in the case here, the fitness costs were severe enough to prevent the gene drive invading at all. One explanation for the fitness costs may be nuclease expression in the soma that converts the wild type allele to a null, thereby creating mosaic individuals with cells containing no functional copy of this essential gene. Mosaicism can derive from 'leaky' expression from the otherwise germline-restricted promoter or from parental deposition into the embryo. Indeed, the fraction of hatched eggs was reduced when the gene drive was present in the mother but not the father (Fig 3B), consistent with a small maternal contribution of Cas9 to the progeny. Though maternal effects of this kind are usually considered an inherent property of the germline promoter used to express Cas9, previous use of the zpg promoter has revealed a paternal [1], maternal (this study), or no parental effect at all [18] depending on its genomic site of integration and potentially its orientation with respect to nearby enhancers. Furthermore, we cannot exclude that the same level of mosaicism may lead to a stronger phenotypic effect at this target. An alternative explanation is that the target gene has a germline function in addition to its essential function in the soma, with the result that gene drive conversion to homozygosity in the germline leads to reduced fecundity. In our case it is difficult to disentangle such effects from general fitness effects since the competing hypothesis of somatic mosaicism could similarly lead to reduced fecundity.
Potentially such effects are more problematic for essential genes that are required during development in both sexes, compared to female fertility genes, since in the latter case heterozygous carrier males are unaffected. This is important for the phenotypic assessment of the suitability of future gene drive candidate genes required for viability in males and females and more emphasis should be placed on the restriction of gene drive activity to the germline. To look at the potential for gene drive resistance to arise at our chosen ultra-conserved target site, we devised both a single generation resistance assay and a population experiment designed to mimic the highest selection pressure. In both scenarios we recovered the same single point mutation at the target site. Possibly, the resistant allele we isolated might carry a small fitness cost which would be selected against in natural mosquito populations, which is why it was not detected within variation data from the 2784 An. gambiae, An. coluzzii, Anopheles arabiensis and hybrid individuals, collected from 19 countries in Africa as part of Phase 3 of the Anopheles gambiae 1000 Genomes Project [17]. However, when faced with a gene drive, any small fitness cost is offset by the relative advantage the mutation confers against the gene drive allele. Therefore, even where genomic locations are ultra-conserved across various species and populations over a wide geographic area, their mutation may be tolerated under the selective pressure of gene drive. Further, different ultra-conserved sites could have very different functional constraints and selection pressure; studies have shown that even without any obvious functional constraints sequences can remain ultra-conserved between different species [22]. Consequently, how to assess the potential of resistance development at any given conserved target site is of fundamental importance when considering gene drives of this type. As a general rule, for a population suppression gene drive, the largest load is imparted on a population when the target is a gene essential for female fertility or viability, yet a considerable, though reduced, level of suppression can be achieved when the gene produces a recessive lethal phenotype that manifests in both sexes [23]. Given that there may be a paucity of genes with functionally constrained target sites that are sufficiently robust to resistance it may be that the optimal target site choice will consider a balance between the suppressive load conferred by the gene drive and the intrinsic capability of its target site to tolerate resistant mutations.
Our mimicking of genotype fixation could be a useful experimental approach for swiftly assessing target site susceptibility, since it isolates resistant alleles faster than possible in population experiments of gene drive spread into wild-type populations-the selective advantage to a resistant allele over a wild type allele is highest when the likelihood of either allele finding themselves balanced against a gene drive allele is highest. A similar rationale applies in the resistance assay balancing the gene drive against a marked null allele [18,24] that we also employed here, however in this assay no measure is given of how strongly such potentially resistant alleles are selected or, indeed, of whether they are resistant to the gene drive nuclease. One feature of targeting a gene essential in early development, as we did here, is that it allows a much higher throughput of screening, because selection happens during larval rearing, where rearing capacity is much less limited, meaning that all surviving adults have been through the selection bottleneck. In contrast, for a gene drive that targets female fertility in the adult, most adults in this scenario will be homozygous for the gene drive and selection of a resistant allele only occurs in the small subset of females that have the opportunity to mate and contain at least one non-drive allele.
It will be essential to build on successes to date that have shown gene drives that can suppress populations robustly, without obvious selection for resistance, for example when targeting the female isoform of the sex determination gene doublesex (1). In that specific case the gene drive targets a conserved site at an intron-exon boundary and it may be that the sequence is important at the RNA level and thus has even less degeneracy. Considerations such as this, or choosing target sites where the cleavage site lies in non-degenerate codons, or the non-wobble positions of degenerate codons, may add additional levels of resilience to a gene drive. However, at scale, in a field setting and considering the vast population sizes of mosquitoes, it is likely that such drives will need to be augmented by targeting multiple sites in the same gene, in order to greatly reduce the likelihood of multi-resistant alleles arising on the same haplotype [25][26][27][28][29]. A similar effect can be achieved by releasing multiple gene drives that independently target separate loci [23,30]. Moreover, using a similar logic, it could be prudent to have gene drives targeting genes in a range of different, independent biological pathways. In all cases, it will be essential to prioritise target site choice according not just to its suitability in terms of desired phenotypic effect when targeted, but its resilience to the generation of resistant alleles. Our results here show a pathway for how testing of ultra-conserved sites might proceed in order to evaluate these determinants for future gene drive designs in vivo.

Ethics statement
All animal work was conducted according to UK Home Office Regulations and approved under Home Office License PPL 70/8914.

Generation of CRISPR and donor constructs
utilized as described previously and modified using Golden Gate cloning to generate a PolIII transcription unit containing the AGAP029113-specific gRNA [1,2]. The donor plasmid was assembled by MultiSite Gateway cloning (Invitrogen) and contained a GFP transcription unit under the control of the 3xP3 promoter enclosed within two reversible ϕC31 attP recombination sequences flanked both 5 0 and 3 0 by 2 kb sequence amplified using primer 29113 ex5 B1 f (CAACCAAGTAGTTACTGTGCTC) and 29113 ex5 B4 r (GTCTTTTGTTGTTGTTCACGT) and primer 29113 ex5 B3 f (TGTAGGCCGTGATCGTGC) and 29113 ex5 B2 r (GCGACACC ATACTCCGATG) of the exon 5 target site in AGAP029113. To generate the 29113 CRISPRh allele a gRNA spacer (GAACAACAACAAAAGACTGTAGG) bearing complete homology to the intended target sequence was inserted by Golden Gate cloning into a CRISPR construct (p17410) as described previously [1,2]. This vector contains a human codon-optimized Cas9 gene (hCas9) under control of the zpg promoter and a U6::gRNA cassette with a BsaI cloning site, as well as a visual marker (3xP3::RFP). This construct sequence is flanked by attB recombination sites, which will allow the recombinase-mediated in vivo cassette exchange for the homing allele.

Generation of the 29113 hdrGFP docking line and 29113 CRISPRh line
In order to generate the docking line, we injected 253 eggs of the Anopheles gambiae G3 strain with the 29113gRNA-modified p16510 plasmid (300ng/μl) and donor plasmid (300ng/μl) and obtained 7 transients that led to 6 transgenic individuals. Site-specific Integration was confirmed by PCR using primers binding the docking construct, 5 0 GFP-R (TGAACAGCTCCT CGCCCTTG) and a primer binding the genome outside of the homology arms: DL_29113 Ex5_F2 (TTCCACCTCTCGCTCGTAGT) and sequencing of the flanking sites. For the recombinase-mediated cassette exchange reactions a mix containing the CRISPR plasmid (200 ng/μl) and 400 ng/μl vasa2::integrase helper plasmid [31] was injected into embryos of the 29113 hdrGFP docking lines. Progeny from the outcross of surviving G 0 individuals to WT were screened for the presence of RFP and the absence of GFP that should be indicative of a successful cassette-exchange event.

Assessment of homing rate of CRISPR construct
To assess the ability of this 29113 CRISPRh construct to home at super-Mendelian rates, individuals heterozygous (29113 CRISPRh/+ ) for the construct were crossed with wild type individuals and the progeny scored for inheritance of the CRISPR construct (via the proxy of RFP) from individual lays (N = 15 per cross).

Molecular characterisation of homozygotes and heterozygotes for the 29113 hdrGFP allele
Males and females heterozygote for the 29113 hdrGFP allele were crossed with each other and 60 L1-L2 larvae were analysed by Multiplex PCR using primer 3'-GFP-F (GCCCTGAGCAAAGA CCCCAA), 5'-GFP R (TGAACAGCTCCTCGCCCTTG) and the primer DL_29113 Ex5_F2 (TTCCACCTCTCGCTCGTAGT) flanking the transgenic construct. The same PCR was repeated for 60 adult mosquitoes from the same cross.

Fertility and fecundity data
The 29113 CRISPRh strain was maintained in 2 batches by crossing either transgenic males with wild type females and by crossing transgenic females with wildtype males. This was done to account for any parental effects caused by deposition of the Cas9. The 29113 hdrGFP strain was maintained by crossing transgenic females with wildtype males. G roups of a minimum of 40 virgin heterozygotes carrying the 29113 hdrGFP/+ or 29113 CRISPRh/+ genotype respectively were mated to an equal number of virgin wildtype mosquitoes for 5 days and the number of eggs and offspring was counted from individuals lays (N�12). Females that failed to lay eggs and did not contain sperm in their spermathecal were excluded from the analysis. Raw data including list of test statistics and sample sizes, as well all raw experimental data points, are included in S1_Data as an Excel file.

Larval to adult eclosion rate
Groups of a minimum of 40 virgin heterozygotes carrying the 29113 hdrGFP/+ or 29113 CRISPRh/+ genotype respectively were mated to an equal number of virgin wildtype mosquitoes for 5 days. 100 transgenic L1 larvae were selected and reared to adulthood for each experiment (N�3).

Population cage experiments with 20% starting frequency
First instar mosquito larvae heterozygous for the 29113 CRISPRh allele were mixed within 12 h of eclosion with age-matched WT larvae at a ratio of 1:5 in rearing trays at a density of 200 per tray (in *1 litre rearing water). The mixed population was used to seed three cages (36cm 3 ) with 400 adult mosquitoes each. For three generations, each cage was fed after allowing 5 days for mating, and an egg bowl placed in the cage 48 h after a blood meal to allow overnight oviposition. A random sample of 450 eggs was selected. After allowing full eclosion, offspring were scored under fluorescence microscopy for the presence or absence of the RFP-linked CRISPR h allele, then reared together in the same trays and then used to populate the next generation.

Population model
To model the results of the population cage experiment, we use a simple deterministic model of a single, randomly mating population with two life stages (juveniles and adults), two sexes and discrete non-overlapping generations, following the structure of Beaghton et al [19], without considering parental deposition. We initially consider 3 alleles: wild-type (W), transgenic (T) and cleavage resistant (R). We model insertion of the transgene into a recessive gene required for survival to adulthood and assume all cleavage resistant alleles are non-functional, therefore T/T, T/R and R/R adults are inviable. For the 29113 CRISPRh transgene, cleavage of the W allele occurs in W/T individuals, followed by either homology directed repair, converting W to T, or non-homologous end joining, converting W to R. Cleavage and end joining rates were calculated from the proportion of progeny inheriting the 29113 CRISPRh allele (d) (F) = 0.927, (M) = 0.913 (Fig 3B) and the probability that non-homed alleles are R (u) = 0.266 (based on % of exposed but unmodified alleles from Fig 5B)  model functional resistance, we extend the model to include a fourth allele (r), which is both cleavage resistant and fully functional. Here we allow a proportion (p) of non-homologous end joining products to be functional (r) and 1−p to be non-functional (R). To incorporate the heterozygote fitness costs identified from the phenotypic assays we consider female heterozygotes carrying the transgene to lay fewer viable eggs compared to wildtype: fecundity cost (1-relative hatching rate) (F) = 0.216 (Fig 3B), and male and female heterozygotes to have reduced survival to adulthood: somatic cost (1 -relative eclosion rate) (M) = 0.378, (F) = 0.462, ( Fig  3C). We also consider the case where heterozygotes carrying the transgene have additional somatic costs of other unknown factors: somatic cost (M) = 0.9, (F) = 0.9. In both cases W/R and W/r individuals are assumed to be fully fit. When modelling the 29113 hdrGFP line we assumed mendelian inheritance of the transgene and no heterozygous fitness costs. Following the experimental protocols, the frequency of transgenics was calculated at the juvenile stage, except for generation G 0 which reflects that of the starting adult population. Code for running all the simulations is available on Github and linked to this article.
Small population cage experiment mimicking fixation of the drive 20 male and 20 virgin female adults heterozygous for the 29113 CRISPRh allele were placed in a small cage (25cm 3 ). Females were blood-fed after 5 days and an egg bowl was placed in the cage 48h post blood meal. After allowing full eclosion the number of eggs and RFP+ and RFPlarvae were recorded. All offspring were reared to adulthood and crossed with each other. This was continued for 7 generations. The same procedure was repeated for adults heterozygous for the 29113 hdrGFP allele for 2 generations. Offspring was screened for GFP+.

Resistance assay
400 males heterozygous for the 29113 CRISPRh allele (RFP+) were crossed to 400 females heterozygous for the 29113 hdrGFP allele (GFP+). To collect L1 for DNA extractions, 7000 individuals were screened for GFP using COPAS. Larvae that were GFP+ only or RFP/GFP+ (RFP+ only and larvae with no fluorescent marker were discarded) were pooled and frozen in ethanol at -20˚C. To collect adults for DNA extraction, large number of larvae were needed (as the majority would die before reaching adulthood). 10,000 L1 larvae were screened for GFP+, and the sorted larvae were split into 20 trays of 500 individuals. A single tray was then reassessed for frequency of GFP+ and RFP/GFP+. This tray was then rescreened at L4 stage for frequency of GFP+, RFP/GFP+ and the rate of larval mortality. Surviving larvae were collected from all 20 trays and screened manually for GFP+. These were allowed to emerge as adults before collection and storage at -20˚C for later DNA extraction.

PCR of target site and deep sequencing analysis preparation
Pooled DNA extractions (minimum 90 adult mosquitoes or 3500 L1 larvae) were performed using the Wizard Genomic DNA Purification kit (Promega). A 349 bp locus containing the predicted on-target cleavage site was amplified with primers containing Illumina Nextera Transposase adapters (underlined), 29113-F (TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCT GCTTGCGGGAACACTGTTAG) and 29113-R (GTCTCGTGGGCTCGGAGATGTGTATAA GAGACAGTGTTAAGGAAGTCCCGACAGCG). PCR reactions with KAPA HiFi HotStart ReadyMix PCR Kit (Kapa Biosystems) were setup with 80ng genomic DNA. These PCR reactions, library preparation and deep sequencing were performed as previously described [1].

Deep sequencing analysis
CRISPResso software v2.0. 29 [32] was utilised for analysis of amplicon sequencing around the on-target site, with the parameter-q 30. Reads containing indels and substitutions 1bp either side of the predicted cleavage site were tallied as modified. Frameshift analysis was performed within CRISPResso2 using parameter-c to input the coding sequence.

Statistical analysis
To compare the number of eggs laid by different strains, a quasi-Poisson generalised linear model (GLM) was fitted to the egg counts, with stains (5 levels, or categories) being the main effect. Quasi-binomial was used to explore the difference in the hatching rates across. The quasi family was adopted as the data exhibits overdispersion. A mixed-effect binomial GLM was run to study the eclosion rates for the various strains and across generations. A random effect was included for experiments with replicates.