MGSurvE: A framework to optimize trap placement for genetic surveillance of mosquito populations

Genetic surveillance of mosquito populations is becoming increasingly relevant as genetics-based mosquito control strategies advance from laboratory to field testing. Especially applicable are mosquito gene drive projects, the potential scale of which makes monitoring a significant cost driver. For these projects, monitoring will be required to detect unintended spread of gene drive mosquitoes beyond field sites, and the emergence of alternative alleles, such as drive-resistant alleles or non-functional effector genes, within intervention sites. This entails the need to distribute mosquito traps efficiently such that an allele of interest is detected as quickly as possible—ideally when remediation is still viable. Additionally, insecticide-based tools such as bednets are compromised by insecticide-resistance alleles, which there is also a need to detect as quickly as possible. To this end, we present MGSurvE (Mosquito Gene SurveillancE): a computational framework that optimizes trap placement for genetic surveillance of mosquito populations such that the time to detection of an allele of interest is minimized. A key strength of MGSurvE is that it allows important biological features of mosquitoes and the landscapes they inhabit to be accounted for, namely: i) resources required by mosquitoes (e.g., food sources and aquatic breeding sites) can be explicitly distributed through a landscape, ii) movement of mosquitoes may depend on their sex, the current state of their gonotrophic cycle (if female) and resource attractiveness, and iii) traps may differ in their attractiveness profile. Example MGSurvE analyses are presented to demonstrate optimal trap placement for: i) an Aedes aegypti population in a suburban landscape in Queensland, Australia, and ii) an Anopheles gambiae population on the island of São Tomé, São Tomé and Príncipe. Further documentation and use examples are provided in the project's documentation.
MGSurvE is intended as a resource for both field and computational researchers interested in mosquito gene surveillance.

and biological modifying parameters (ρ), which control the shape of the kernel. These parameters are calibrated to the distances mosquitoes are expected to fly on a daily basis (e.g., the decay rate of an exponential function). To account for resource-directed movement, this shape parameter is then multiplied by the probability of movement (λ) from the source site-type (ŝ_i) to the destination site-type (ŝ_j), which accounts for the expected probabilities that individuals jump from one point-type to another.
We calculate the movement weights between all the sites in the landscape and normalize the resulting matrix to obtain our migration matrix τ:

τ_{i,j} = α_{i,j} / Σ_k α_{i,k}

All rows of τ sum to 1, as individuals are limited to move within the s_n set of sites.
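A minimal sketch of how such a migration matrix might be assembled, assuming an exponential distance kernel; the function name, the toy landscape, and the `lam` type-to-type matrix are illustrative stand-ins, not MGSurvE's actual API:

```python
import numpy as np

def migration_matrix(coords, site_types, rho, lam):
    """Build a row-stochastic migration matrix tau.

    coords     : (n, 2) array of site positions
    site_types : (n,) array of integer site-type labels
    rho        : decay rate of the exponential distance kernel
    lam        : (k, k) matrix of type-to-type movement probabilities
    """
    # Pairwise Euclidean distances between all sites
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    # Distance kernel, scaled by the source-to-destination type weights
    alpha = np.exp(-rho * d) * lam[site_types[:, None], site_types[None, :]]
    # Row-normalize so every row of tau sums to 1
    return alpha / alpha.sum(axis=1, keepdims=True)

# Toy landscape: three sites of two types
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
types = np.array([0, 1, 0])
lam = np.array([[0.5, 0.5], [0.8, 0.2]])
tau = migration_matrix(coords, types, rho=1.0, lam=lam)
```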

Movement with Traps
To calculate the movement matrix incorporating traps as absorbing states (χ), we follow a process similar to the calculation of the movement weights (α) to obtain the trap probability weights (δ). We determine the movement probability from any given site (s_i) to any given trap (t_j) based on the attractiveness profile (η) and the trap-type attractiveness relative to the current mosquito resource (ϕ(ŝ_i, t_j)). Equations 4 and 5 mathematically describe that traps are absorbing states. With these pieces in place, we can define our normalized full movement block matrix with traps as follows, noting that rows must be renormalized to account for possible movement from a site to a trap or to another site:

χ = [ τ  ν ]
    [ 0  I ]

Here, τ is our original migration matrix, ν is the probability that a random walker moves from a site to a trap, the null matrix 0 represents the absorbing nature of the traps, and I is the identity matrix stating that walkers who fall into a trap will stay in that trap.
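The block assembly and row renormalization described above can be sketched as follows (an illustrative toy, with `delta` passed in as precomputed site-to-trap weights rather than derived from η and ϕ):

```python
import numpy as np

def full_movement_matrix(tau, delta):
    """Assemble the block transition matrix chi with traps as absorbing states.

    tau   : (n, n) site-to-site migration matrix (row-stochastic)
    delta : (n, m) unnormalized site-to-trap attraction weights
    """
    n, m = delta.shape
    # Stack site-to-site and site-to-trap weights, then renormalize each row
    # so that movement to a trap competes with movement to other sites.
    upper = np.hstack([tau, delta])
    upper = upper / upper.sum(axis=1, keepdims=True)
    # Traps are absorbing: zero probability of leaving, identity on the diagonal
    lower = np.hstack([np.zeros((m, n)), np.eye(m)])
    return np.vstack([upper, lower])

# Two sites, one trap
tau = np.array([[0.2, 0.8], [0.6, 0.4]])
delta = np.array([[0.1], [0.3]])
chi = full_movement_matrix(tau, delta)
```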

Fitness Function
With the Markov chain transition matrix including absorbing traps in canonical form (χ), we can use the fundamental matrix (F) to calculate a range of Markov chain properties, including the expected number of time steps before an individual is trapped/absorbed. The fundamental matrix is given by:

F = (I − τ)⁻¹

where τ is the renormalized site-to-site block of χ. The F matrix contains information on how many time steps an individual is expected to spend in site j, given that it started in site i, before falling into a trap.
Finally, to compute the fitness value (ϕ), we calculate the expected number of time steps (T) before being trapped, when starting in transient site i:

T_{s_n×1} = F · 1

i.e., the row sums of F. Two metrics that we use in the current paper are: i) the expected number of time steps before being trapped, averaged over all origin sites, i.e., mean(T_{s_n×1}); and ii) the maximum expected number of time steps before being trapped, considering all origin sites, i.e., max(T_{s_n×1}).
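The fundamental-matrix calculation can be sketched directly from the definitions above (the function name and toy landscape are illustrative):

```python
import numpy as np

def expected_time_to_trap(chi, n_sites):
    """Expected time steps before absorption, from each transient site.

    chi holds the canonical-form transition matrix; its top-left
    n_sites x n_sites block Q governs site-to-site movement.
    """
    Q = chi[:n_sites, :n_sites]
    # Fundamental matrix: F[i, j] = expected visits to site j starting from i
    F = np.linalg.inv(np.eye(n_sites) - Q)
    # Row sums give the expected number of steps before being trapped
    return F.sum(axis=1)

# Toy 2-site landscape with one trap; each site has a 0.2 chance
# of moving into the trap at every step
chi = np.array([
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.0, 0.0, 1.0],
])
T = expected_time_to_trap(chi, n_sites=2)
fitness_mean = T.mean()  # balanced coverage over the landscape
fitness_max = T.max()    # worst-case (most remote) origin site
```

With a constant per-step trapping probability of 0.2, the expected time to absorption from either site is 5 steps, matching the geometric-distribution intuition.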

Implementation of the Genetic Algorithm
Genetic Algorithms borrow inspiration from biological inheritance processes to solve optimization problems by generating a "population" of possible solutions (chromosomes), from which the fittest are selected for mating and mutation. By iterating this process over generations, we expect the population to slowly converge upon the optimal region of our problem space, which corresponds to the solution to the task at hand.
Here, we describe how MGSurvE makes use of this approach to solve the optimization problem that can be phrased as: "Given a heterogeneous environment and a limited number of traps, where should we place the traps?".
Terminology
When describing our genetic algorithm, the parallels with biological processes might make the description of the methodology confusing. To alleviate this, we provide a list of terms as used in the GA literature and specifically in our application:
• Generation: One full discrete cycle of updates (fitness calculation, selection, crossover, mutation) performed upon the current population.
• Population: A group of potential solutions (individuals).
• Individual: Potential solution to the optimization problem with the trap positions being encoded in a genotype.
• Genotype: Way of storing the trap positions (used interchangeably with individual throughout this document).
• Allele: Each slot in our genotype is an allele and contains the information of a trap position.
• Selection: The process of choosing individuals from the population for mating in order to generate the next generation to be used in the optimization process.
• Mating (crossover): The process of mixing two selected genotypes to produce viable offspring (where viable in this context is an acceptable potential solution to the optimization problem).
• Mutation: Process of altering individual alleles in genotypes in a stochastic way for individuals to be able to explore new potential solutions to the optimization problem.
In the following description of the optimization process, the terms will be used according to these definitions unless stated otherwise.

Optimization Workflow
For the algorithm to convert our potential solutions into a fitness (or cost) that can be optimized, we need to define our problem in a form that is compatible with genetic algorithm approaches. In our case this is relatively simple, as we encode the positions of the traps into the GA's chromosomes, as depicted in Fig 1. For the continuous case, we store the trap positions directly in pairs of alleles, whereas in the discrete case, we store the ID of the site at which the trap would be positioned.
With this problem mapping at hand, we can go on to describe the set of instructions to run at each generation (optimization step) of our genetic algorithm:
1. Fitness Calculation: Each individual in the population of potential solutions is evaluated according to the calculations described in the "Mathematical Formulation" section of this document. Specifically, we update the values for equations 3 to 8.
2. Individual Selection: A sub-sample of the population is selected for mating (fitter individuals have higher probabilities of being selected).
3. Crossover (Mating): Individuals in the new pool are paired for their chromosomes to be combined into the next generation's offspring.

4. Mutation: The generated offspring are subject to random variations in their chromosomes, in hopes that some of these changes will lead to better solutions.
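The four per-generation steps can be sketched as a minimal GA loop in plain Python. This is a toy minimization with illustrative operators, not MGSurvE's implementation (which builds on dedicated GA tooling); all names here are hypothetical:

```python
import random

def tournament(pop, scores, k, t_size=3):
    """Selection: the lowest-cost contender of t_size random picks wins."""
    winners = []
    for _ in range(k):
        contenders = random.sample(range(len(pop)), t_size)
        winners.append(pop[min(contenders, key=lambda i: scores[i])])
    return winners

def one_point_crossover(a, b):
    """Mating: mix two genotypes at a random cut point."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def gaussian_mutation(ind, rate=0.2, sigma=0.5):
    """Mutation: perturb some alleles with a zero-centered normal draw."""
    return [g + random.gauss(0.0, sigma) if random.random() < rate else g
            for g in ind]

def evolve(pop, fitness, n_gen=50):
    """One GA run following steps 1-4 above (minimization)."""
    best = min(pop, key=fitness)
    for _ in range(n_gen):
        scores = [fitness(ind) for ind in pop]           # 1. fitness calculation
        parents = tournament(pop, scores, len(pop))      # 2. selection
        offspring = []
        for a, b in zip(parents[::2], parents[1::2]):    # 3. crossover
            offspring.extend(one_point_crossover(a, b))
        pop = [gaussian_mutation(ind) for ind in offspring]  # 4. mutation
        best = min(pop + [best], key=fitness)            # keep the best seen so far
    return best

# Toy run: minimize the sum of squared alleles of a 4-allele genotype
random.seed(1)
init = [[random.uniform(-2.0, 2.0) for _ in range(4)] for _ in range(20)]
cost = lambda ind: sum(g * g for g in ind)
best = evolve(init, cost)
```

Recording the best-ever individual alongside the evolving population guarantees the returned solution is never worse than the best member of the initial population.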
In this section, we go through the details of each step and how the base implementation of MGSurvE carries it out (for both discrete and continuous optimization tasks), in an effort to find better solutions for trapping random walkers as quickly as possible by iterating over the potential positions of the traps.

Fitness Calculation
The fitness (or cost) function is the quantity that guides the artificial evolutionary process, and is the factor we will try to maximize or minimize. MGSurvE computes a summary statistic on the time it would take for a random walker in our network to fall into an absorbing site (a trap), given that it started from any site in our landscape (Equation 8). To compute these statistics on our chromosomes, we follow this procedure:
1. If the chromosome is discrete, place the traps in the positions of the sites matching the IDs stored in each allele of the potential solution.
2. With each of the traps' attraction kernels (δ), their positions, and the base migration matrix (τ ), we update our full movement block matrix (Equation 6).
3. With this updated full movement block matrix and renormalized migration τ , we calculate the expected number of time-steps before being trapped starting from each site in the landscape (Equation 8).
4. Finally, to calculate our fitness, we apply a summary-statistic function to this set of expected time-steps (mean for a balanced placement over the landscape, max to prioritize the most remote parts of the geography).
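The four-step procedure above can be sketched end-to-end for a discrete chromosome. The function and the `toy_weights` kernel are illustrative assumptions (a stand-in for the δ attraction kernels), not MGSurvE's API:

```python
import numpy as np

def discrete_fitness(chromosome, tau, trap_weights, summary=np.mean):
    """Fitness of a discrete chromosome whose alleles are trap site IDs.

    tau          : (n, n) base migration matrix
    trap_weights : callable returning each site's attraction toward a
                   trap placed at a given site ID (stand-in for delta)
    """
    n = tau.shape[0]
    # Steps 1-2: place traps at the allele site IDs and build the
    # site-to-trap columns of the block matrix
    delta = np.column_stack([trap_weights(sid, n) for sid in chromosome])
    # Renormalize rows over sites and traps jointly (Equation 6)
    upper = np.hstack([tau, delta])
    upper /= upper.sum(axis=1, keepdims=True)
    # Step 3: expected time-steps to absorption from each site (Equation 8)
    Q = upper[:, :n]
    T = np.linalg.inv(np.eye(n) - Q).sum(axis=1)
    # Step 4: summarize (mean for balance, max for remote coverage)
    return summary(T)

def toy_weights(site_id, n):
    """Hypothetical kernel: strong pull at the trap's own site, weak elsewhere."""
    w = np.full(n, 0.1)
    w[site_id] = 1.0
    return w

tau = np.full((3, 3), 1.0 / 3.0)  # uniform 3-site landscape
mean_fit = discrete_fitness([0], tau, toy_weights)
max_fit = discrete_fitness([0], tau, toy_weights, summary=np.max)
```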

Mutation
The final step in our evolutionary process is to allow selected individuals in our population to "mutate" parts of their chromosomes so that new possible solutions can be generated. This allows the evolutionary process to explore the vicinity of the original chromosome in the solution space by modifying the values of its alleles (Fig 5).

Solution Selection and Evaluation
As with any stochastic optimization task, we need to run our optimization routines several times to evaluate the performance of our algorithm. This helps us both to notice if it is getting stuck in local optima, and to explore potentially better solutions (as the starting points in solution space will be different in each iteration).
One way to check how the algorithm is performing is to plot the evolution of the best solution of each iteration across the GA's generations (Fig 6).
Some of the problems that can be detected with these plots include:
• Lack of exploration: If our plot constantly flat-lines, we might need to increase our mutation and crossover rates to allow for more exploration. Other options are to lower the tournament size in the selection algorithm or to increase the population size.
• Solution instability: If our plot jumps up and down, our mutation and crossover rates are probably too high, so we are frequently losing our best solutions.
• Too few generations: When our algorithm is not run for enough generations, the fitness trace will not have stabilized to any value. We can correct this by simply running the algorithm for more generations.
• Widely-varying starting points: If our first generations start at very different points in terms of fitness, we might need to increase our population size.
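Beyond visual inspection, the same diagnostics can be computed numerically from the per-run traces of best-of-generation fitness. The function below is a rough, illustrative sketch (thresholds and names are assumptions, not part of MGSurvE):

```python
def diagnose(traces, flat_tol=1e-9, jump_tol=0.0):
    """Rough diagnostics on per-run best-of-generation fitness traces
    (minimization): flat-lining suggests too little exploration, upward
    jumps suggest best solutions are being lost, and a large spread of
    starting fitnesses suggests the population may be too small."""
    report = {}
    # A run is "flat" if its best-of-generation fitness never moves
    report["flat_runs"] = sum(max(t) - min(t) <= flat_tol for t in traces)
    # A run is "unstable" if best-of-generation fitness ever gets worse
    report["unstable_runs"] = sum(
        any(b - a > jump_tol for a, b in zip(t, t[1:])) for t in traces)
    # Spread of first-generation fitness across runs
    starts = [t[0] for t in traces]
    report["start_spread"] = max(starts) - min(starts)
    return report

traces = [
    [9.0, 7.5, 7.5, 6.0, 5.8],   # healthy, steadily improving run
    [8.0, 8.0, 8.0, 8.0, 8.0],   # flat-lining: too little exploration
    [9.0, 6.0, 7.5, 5.0, 6.5],   # unstable: best solutions being lost
]
report = diagnose(traces)
```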

Fig 1. Chromosome-to-trap-position mapping. Continuous optimization chromosomes store trap (magenta crosses) coordinates in pairs of alleles (x/y or lon/lat), while discrete optimization chromosomes store the ID of the site (purple circles) at which each trap is located.

Fig 2. Chromosome-to-fitness calculation. If our chromosome is discrete (G_D), we place the traps at the sites corresponding to the IDs stored in it, so that we have a representation equivalent to the continuous one (G_C). To get the fitness, we calculate the Markov chain's fundamental matrix (Eq 8) with the migration matrix (τ) and the trap positions with their attraction kernels (δ), obtaining a summary statistic of the time it would take for a random walker to fall into one of the absorbing states.

Fig 3. Selection. Genotypes are selected randomly from the initial population (left) and compared based on their fitness scores (middle). The one with the best score in the selected group is copied to the new population (right). The process is repeated until enough individuals have been selected to populate the next generation of our algorithm.

Fig 4. Crossover. Pairs of genotypes are randomly selected, and their alleles are mixed together (arrows) to generate offspring, in hopes that they encode better solutions to the optimization task.
Our base MGSurvE implementation provides extensions to the mutGaussian operator for continuous optimization tasks (where an allele is modified with a normal distribution centered at the allele's current value), along with a random uniform allele-replacement operator for discrete tasks (where alleles are swapped with other possible site IDs in the landscape).
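The two mutation schemes can be sketched as follows (illustrative stand-ins for the operators named above, with hypothetical names and default rates):

```python
import random

def mut_gaussian(ind, sigma=0.1, rate=0.25):
    """Continuous mutation: perturb each allele with a normal distribution
    centered at the allele's current value (cf. the mutGaussian scheme)."""
    return [g + random.gauss(0.0, sigma) if random.random() < rate else g
            for g in ind]

def mut_uniform_site(ind, site_ids, rate=0.25):
    """Discrete mutation: swap an allele for another site ID in the landscape."""
    return [random.choice(site_ids) if random.random() < rate else g
            for g in ind]

random.seed(0)
cont = mut_gaussian([0.5, 1.5, -0.2])            # continuous chromosome
disc = mut_uniform_site([3, 7, 7], list(range(10)))  # discrete chromosome
```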

Fig 5. Mutation. Individuals are selected randomly for mutation, a process in which some of their alleles' values are modified (arrows) so that the algorithm can explore novel solutions.

Fig 6. Solution selection and evaluation. By plotting the evolution of the system, we can check for and diagnose potential problems with our parameters and algorithm.