A Network Approach to Analyzing Highly Recombinant Malaria Parasite Genes
We validate our method's ability to detect constraints on recombination by testing it on synthetic data with known structure. Sequences were generated at random and divided into three communities, after which 1000 recombination events were simulated (described fully in S2). For each recombination event, the two sequences were either forced to come from the same community, with probability p, or selected uniformly at random, with probability 1 - p. As the probability p that recombination is constrained to occur within a community is varied from no constraint (p = 0) to strict constraint (p = 1), the ability of our method to correctly classify sequences into one of three communities increases from very poor to perfect. The connected line shows the mean of 25 replicates, with whiskers indicating ± one standard deviation. Two example networks are shown, for p = 0.1 and p = 0.9. The dashed line indicates the accuracy of guessing communities uniformly at random, which is slightly larger than 1/3, as explained in S2. Networks are displayed using a force-directed algorithm in which a system of repelling point charges (nodes) and linear springs (links) relaxes to a low-energy two-dimensional configuration, making network communities visible.
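The constrained-recombination procedure described in this caption can be illustrated with a minimal sketch. This is not the authors' implementation, whose details are given in S2. The function name, the number and length of sequences, the four-letter alphabet, the single-crossover model, and the choice to overwrite one of the two sequences with the recombinant are all assumptions made for illustration only.

```python
import random


def simulate_constrained_recombination(n_seqs=30, seq_len=50, n_communities=3,
                                       n_events=1000, p=0.9, seed=0):
    """Hypothetical sketch of community-constrained recombination.

    Returns the final sequences, their community labels, and the fraction
    of events whose two sequences came from the same community.
    """
    rng = random.Random(seed)
    alphabet = "ACGT"  # assumption: nucleotide-like alphabet

    # Random sequences, assigned round-robin to equally sized communities.
    seqs = ["".join(rng.choice(alphabet) for _ in range(seq_len))
            for _ in range(n_seqs)]
    labels = [i % n_communities for i in range(n_seqs)]

    within = 0
    for _ in range(n_events):
        a = rng.randrange(n_seqs)
        if rng.random() < p:
            # Constrained event: the partner must share a's community.
            candidates = [j for j in range(n_seqs)
                          if j != a and labels[j] == labels[a]]
        else:
            # Unconstrained event: the partner is any other sequence.
            candidates = [j for j in range(n_seqs) if j != a]
        b = rng.choice(candidates)
        if labels[a] == labels[b]:
            within += 1

        # Assumption: a single crossover, with the recombinant replacing a.
        cut = rng.randrange(1, seq_len)
        seqs[a] = seqs[a][:cut] + seqs[b][cut:]

    return seqs, labels, within / n_events
```

Calling `simulate_constrained_recombination(p=0.1)` and `simulate_constrained_recombination(p=0.9)` gives synthetic data sets analogous to the two example networks. Note that unconstrained events can still pair two sequences from the same community by chance, so the returned within-community fraction is generally above p.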