## Abstract

From bird flocks to fish schools and ungulate herds to insect swarms, social biological aggregations are found across the natural world. An ongoing challenge in the mathematical modeling of aggregations is to strengthen the connection between models and biological data by quantifying the rules that individuals follow. We model aggregation of the pea aphid, *Acyrthosiphon pisum*. Specifically, we conduct experiments to track the motion of aphids walking in a featureless circular arena in order to deduce individual-level rules. We observe that each aphid transitions stochastically between a moving and a stationary state. Moving aphids follow a correlated random walk. The probabilities of motion state transitions, as well as the random walk parameters, depend strongly on distance to an aphid's nearest neighbor. For large nearest neighbor distances, when an aphid is essentially isolated, its motion is ballistic with aphids moving faster, turning less, and being less likely to stop. In contrast, for short nearest neighbor distances, aphids move more slowly, turn more, and are more likely to become stationary; this behavior constitutes an aggregation mechanism. From the experimental data, we estimate the state transition probabilities and correlated random walk parameters as a function of nearest neighbor distance. With the individual-level model established, we assess whether it reproduces the macroscopic patterns of movement at the group level. To do so, we consider three distributions, namely distance to nearest neighbor, angle to nearest neighbor, and percentage of population moving at any given time. For each of these three distributions, we compare our experimental data to the output of numerical simulations of our nearest neighbor model, and of a control model in which aphids do not interact socially. Our stochastic, social nearest neighbor model reproduces salient features of the experimental data that are not captured by the control.

**Citation: **Nilsen C, Paige J, Warner O, Mayhew B, Sutley R, Lam M, et al. (2013) Social Aggregation in Pea Aphids: Experiment and Random Walk Modeling. PLoS ONE 8(12): e83343. https://doi.org/10.1371/journal.pone.0083343

**Editor: **Bard Ermentrout, University of Pittsburgh, United States of America

**Received: **August 18, 2013; **Accepted: **November 1, 2013; **Published: ** December 20, 2013

**Copyright: ** © 2013 Nilsen et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Funding: **This work was supported by National Science Foundation grant DMS-1009633 to CT. CN and OW were supported through Macalester College's Data Scholars Program, funded by a grant from the Howard Hughes Medical Institute's Precollege and Undergraduate Science Education Program. ML was supported by the Howard Hughes Medical Institute Undergraduate Science Education Program award 52006301 to Harvey Mudd College. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests: ** The authors have declared that no competing interests exist.

## Introduction

From bird flocks to fish schools and ungulate herds to insect swarms, nature abounds with examples of animal aggregations [1]–[3]. These groups may arise from environmental factors, social factors, or a combination of the two. Environmental factors induce organisms to move in relation to food sources, light sources, gravity, predators, wind, chemical gradients, and more. On the other hand, even in the absence of significant environmental cues, some animals aggregate because of their intrinsic social tendencies. Social forces such as attraction, repulsion and alignment occur when these organisms interact, sensing each other via sight, smell, hearing, and so forth [4]–[8]. Social aggregations not only are examples of natural pattern formation, but on long time and space scales may influence disease transmission, food supply availability, ecological dynamics, and ultimately, evolution [9], [10]. Additionally, the understanding of aggregations has been used to design algorithms in robotics, computer science, and engineering [11], [12].

A central question in the study of aggregations pertains to the relationship between individual-level and group-level behaviors, and it is crucial to distinguish between these. Individual-level behaviors might include an organism's tendency to move closer to conspecifics, or to align its movement with that of its neighbors. Group-level properties describe characteristics of many individuals, such as the shape of an aggregation, its spatial density distribution, and its velocity distribution. The connection between individual and group-level behaviors is highly nontrivial, as is typical for a complex system [13]. One methodology for exploring this connection is through mathematical modeling. By constructing mathematical models that describe each individual organism's rules for movement, one can simulate and analyze the ensemble to investigate the aggregate behavior. Indeed, aggregation modeling is the subject of an intensive effort in the mathematical modeling community, explored in [5], [14]–[23] and many dozens of other studies. There exists a menagerie of mathematical models for aggregation. One criterion that distinguishes models is the degree to which randomness plays a role. Models can be completely deterministic, deterministic but with an added noise component, or completely stochastic. Models for random movement of biological organisms (such as the one we will presently develop) often take the form of random walks [24], [25] or Lévy flights [26], [27].

An ongoing challenge in aggregation modeling is to construct individual-level rules that are quantitatively accurate and well-tied to experimental data. Sometimes, modelers may attempt to calibrate models and infer parameters based on published field observations or experimental results, for example, as with recent studies of locust swarms [28], [29]. A more direct approach is to conduct experiments that track the motion of individuals and use the data, namely time series of organisms' positions and velocities, to construct models more directly. This approach has enhanced the understanding of fish schools [30], starling flocks [31], and duck formations [32]. Presently, we consider social aggregation of the pea aphid, *Acyrthosiphon pisum*. These particular aphids are significant both because they are severe crop pests [33] and because they are a model organism in biology for studying disease transmission, insect-plant interactions, phenotypic plasticity, and more [34].

Some foundational results on pea aphid movement appear in [35]. Aphids moving on the ground exhibit two dispersal behaviors: searching and running. In the searching behavior, aphids look for a nearby plant to inhabit. Running aphids, in contrast, travel far away from their original host plant, likely in an effort to evade predators. In [35], aphids were exposed to predators while feeding on alfalfa plants. As a defense mechanism, aphids dropped from their feeding site and then traveled away from the original host plant. The average searching aphid made one turn every 6.67 *s* and traveled 0.27 *cm/s* while the average running aphid turned less frequently, every 27.8 *s*, and traveled faster, at 0.67 *cm/s*. In a given experimental run, aphids generally did not shift between the searching and running behaviors.

In the absence of predators, some aphids move infrequently [35]. When aphids are attacked by predators, the aphids employ defense mechanisms such as dropping from their location, running, or emitting a fluid droplet from the cornicle, a tube on the dorsal side of the last segment of the insect. The fluid droplet is composed of a mechanical protectant which temporarily paralyzes the jaws of the attacker [36] and alerts nearby conspecifics and heterospecifics to the danger [37]. The experiments in [38] investigate the emission of this fluid droplet further by prodding aphids of various ages on the anterior portion of their thorax and recording the aphid as an emitter or non-emitter. Pre-reproductive aphids are the most likely age group to emit this fluid droplet, plausibly because they often live in close proximity to highly related kin. Once the aphids reach adulthood, it is more advantageous to invest energy in reproduction.

Despite the aforementioned account of chemical signaling, and while it is well-known that aphids aggregate around food sources [39], much less is known about whether certain aphid species form aggregations that are intrinsically social. Aphid species *Uroleucon nigrotuberculatum* and *Uroleucon caligatum* experience lower mortality from generalist predators when aggregated [40], suggesting an evolutionary advantage for social aggregation. Other results on aphid aggregation appear in [40]–[43]. In [43], pea aphids were placed in a chamber with five identical feeding stations. If the insects did not aggregate socially, one would expect an even distribution of aphids in each chamber, but this distribution was not observed. In both light and dark conditions, the aphids aggregated mainly in one or two of the feeding stations. Aphids in a dark environment still aggregated at statistically significant levels, albeit less strongly than in lit conditions, suggesting that vision may be one of the senses through which aggregation is activated. In contrast, in a key test in [43], artificial aphids were placed behind the feeding stations such that their shadows were clearly visible. The aphids in the chamber were then allowed to choose one of the five feeding stations at random. In this test, the chamber with the aphid dummies did not show greater likelihood of being chosen, which implies that vision is not the only mechanism enabling social aggregation.

The experiments of [41], [42] found that lime aphids, *Eucallipterus tiliae*, aggregate socially. Three studies in [42] are especially relevant. In the first study, an aphid was allowed to move and settle on a particular, uninhabited leaf. Its final position was marked and the aphid was removed. Trials were repeated on the same leaf with different individuals whose final positions were similarly marked. The distribution of settling locations was random, suggesting that microhabitats on a leaf do not influence aphids' movement. However, when multiple individuals were allowed to settle simultaneously on the leaf, they aggregated, suggesting that social interactions influence their movement. In the second study, between one and eleven aphids were already settled on a leaf, and one target aphid was placed on the leaf. When the target aphid approached a settled aphid (with approach defined as walking within 1 *cm*) on 82% of the trials, the target aphid settled within 1 *cm* of the other settled aphid. The third study examined aphid distribution for different population densities. In this study, each aphid had an associated virtual territory, defined as a circle of fixed radius around the insect, identical for all individuals. In experimental trials, the group was allowed to approach an equilibrium configuration. Then, the percent leaf coverage was computed as the area of the union of the territories divided by the area of the leaf. As the number of aphids was increased, the percent leaf coverage rose with decreasing slope, indicating close packing of the insects, ostensibly due to social interactions.

Given the evidence for social aggregation in some aphid species, our goal at present is to assess and model aggregation of the pea aphid. More specifically, in order to deduce individual-level rules, we conduct experiments to track the motion of aphids walking in a featureless circular arena. We observe that each aphid transitions stochastically between a moving and a stationary state. Moving aphids follow a correlated random walk. The probabilities of stopping and starting, as well as the random walk parameters, depend strongly on distance to an aphid's nearest neighbor. For large nearest neighbor distances, when an aphid is essentially isolated, its motion is ballistic. Aphids move faster, turn less, and are less likely to stop. In contrast, for short nearest neighbor distances, aphids move more slowly, turn more, and are more likely to become stationary; this behavior constitutes an aggregation mechanism. From the experimental data, we estimate the state transition probabilities and correlated random walk parameters as a function of nearest neighbor distance. With the individual-level model established, we assess whether it reproduces the macroscopic patterns of movement at the group level. To do so, we consider three distributions, namely distance to nearest neighbor, angle to nearest neighbor, and percentage of population moving at any given time. For each of these three distributions we compare our experimental data to the output of numerical simulations of our nearest neighbor model, and of a control model in which aphids do not interact socially. Our social nearest neighbor model reproduces salient features of the experimental data that are not captured by the control.

## Experimental Methods

To host aphid colonies, we grew fava bean plants, *Vicia faba* (Johnny's Selected Seeds, Winslow, ME) with 6–7 seeds per pot in an approximately 20°*C* laboratory setting at 60%–70% relative humidity. We stored plants in 45 *cm*×45 *cm*×45 *cm* mesh enclosures (BugDorm, Taichun, Taiwan). Plants received 12 *hr* of continuous light per day from a 120 *W* grow lamp suspended 5–7.5 *cm* above the enclosure, or 25–30 *cm* above the plants. We considered plants to be mature enough to host aphids approximately two weeks after planting, when they reached a height of 15 *cm* above the flower pot rim. We colonized each plant with one hundred pea aphids, *A. pisum* (Nasco, Fort Atkinson, WI). We periodically cleaned enclosures when dirt or dead aphids accumulated. By seven days after colonization, plant health would deteriorate due to aphid feeding. At this point, we transferred the colony to fresh plants in a fresh enclosure. Aphids were then given several days to acclimate before being used in experimental trials.

We performed experiments on a vibration isolation table (IsoStation, Newport Corp., Irvine, CA) in a darkened lab in order to minimize effects of the ambient environment. The experimental arena consisted of a polypropylene circular ring, with a radius of 20 *cm* and height of 3/16 *in*, enclosed between two 1/8 *in* thick glass plates. We underlit the arena with a 24 *in* × 24 *in* LED light panel (AnythingDisplay, Nashua, NH) having a 6500 *K* pure white color temperature. In order to remove debris that might interfere with imaging, and to remove any biological material that might potentially be left from previous experimental runs, we cleaned the top and bottom glass plates with acetone, ethanol and compressed air before every trial. We lined the arena wall and ceiling with silicone oil to discourage aphids from occupying the arena's walls and ceiling.

Aphids are dimorphic insects that may develop into winged or wingless forms, depending on a complicated interaction between genetics and environment [44]. Since we wished to track two-dimensional motion, and in order to minimize any behavioral variations due to age, we restricted our experimental trials to adult wingless aphids (as identified by sight). Adult pea aphids have a body length of approximately 2–4 *mm* [45]. To initiate a trial, we selected individuals from a colonized plant, typically selecting a mix of aphids that appeared to be stationary and moving. Three trials incorporated 8, 10 and 18 aphids; the remaining six trials incorporated 27–35 aphids moving in the arena. We filmed the experiment using a 1080p high definition video camera (Sony Handycam HDR-SR12) placed 1.1 *m* above the arena, with white balance calibrated to adjust for the effect of the light box as a background. After 45 minutes of filming we ceased recording and returned aphids to the colony.

To prepare our data for motion tracking, we converted raw video footage in .mts format to .mp4 using Handbrake video processing software with sampling in grayscale at 5 *fps*. We used QuickTime Pro to export the video into an image sequence of .tiff files, downsampled to 256 grays and 2 *fps* to facilitate data processing. Using the ImageJ image processing package [46] we removed initial frames of each trial during which overhead lights were reflected, and cropped the rectangular video frames to a circular region corresponding to the experimental arena. We further processed images using MATLAB's Image Processing Toolbox and the u-track 2.0 motion tracking package [47]. Specifically, we converted color images to black and white ones (to render the inside of the arena black) and denoised each frame. We ran u-track, which forms trajectories by linking identified aphid positions from frame to frame using a Kalman filter for motion propagation. The tracking process resulted in more trajectories than the number of aphids used in the trial due to the inherent difficulty of motion tracking. That is to say, a single aphid's track across the course of an experimental trial may be recognized as several shorter trajectories by the tracking algorithm, but this does not affect our data analysis and modeling (more details appear in subsequent sections). Finally, we converted tracked aphid positions from pixel coordinates to real coordinates. Fig. 1 shows examples of tracked data.

(A) Trajectories of 28 aphids during approximately 15 *min* of one experimental trial, as determined by motion tracking of video data. The green circle is the experimental arena with radius 20 *cm*. (B) Blow-up of a subset of a single aphid trajectory, shown in a 10 *cm* × 10 *cm* zoom.

To prepare our raw data set for modeling (see next section) we enhanced it with several elementary, derived pieces of data, namely motion state (stationary or moving), step length (distance traveled in one frame), heading, turning angle, and distance to nearest neighbor. An aphid's step length in a current frame was calculated as magnitude of the difference between its current and previous positions. We considered an aphid to be moving in a given frame if its step length was sufficiently large. For small steps, corresponding to speeds less than 4×10^{−2} *cm/s* (about 1/10 body length per second), we assumed the aphid to be stationary, with the small amount of movement attributed to noise in the video itself and errors in the aphid identification and tracking algorithms. An aphid's heading (the direction it was traveling in a given frame) was calculated by taking the angle of the difference between the aphid's current and previous position vectors. Finally, we calculated turning angle in a given frame as the difference in the current and previous heading. Our final data set consists of 1.2 million entries from the pooled data of nine experimental runs. Each entry contains an aphid's position, motion state, step length, heading, and turning angle.
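The derived quantities described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' processing code: the function names, the array layout (one row per frame, positions in *cm*), and the half-open wrapping convention for turning angles are assumptions. The 2 *fps* frame rate and the 4×10^{−2} *cm/s* threshold are taken from the text.

```python
import numpy as np

def derive_track_quantities(xy, fps=2.0, speed_threshold=0.04):
    """Derive per-frame quantities from one trajectory.

    xy: (T, 2) array of positions in cm, one row per frame (assumed layout).
    Returns step length (cm), moving flag, heading (rad), turning angle (rad).
    """
    xy = np.asarray(xy, dtype=float)
    disp = np.diff(xy, axis=0)                     # current minus previous position
    step = np.hypot(disp[:, 0], disp[:, 1])        # step length per frame
    moving = step * fps >= speed_threshold         # speed (cm/s) at or above threshold
    heading = np.arctan2(disp[:, 1], disp[:, 0])   # direction of travel
    dtheta = np.diff(heading)                      # change in heading between frames
    turning = (dtheta + np.pi) % (2 * np.pi) - np.pi   # wrap to [-pi, pi)
    return step, moving, heading, turning

def nearest_neighbor_distance(frame_xy):
    """Distance from each aphid to its nearest neighbor within one frame."""
    p = np.asarray(frame_xy, dtype=float)
    diff = p[:, None, :] - p[None, :, :]
    dist = np.hypot(diff[..., 0], diff[..., 1])
    np.fill_diagonal(dist, np.inf)                 # ignore self-distances
    return dist.min(axis=1)
```

Headings computed in frames classified as stationary are dominated by tracking noise, so in practice one would likely use headings and turning angles only from moving frames.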

## Mathematical Modeling of Individual-Level Behaviors

Based on the observation that aphids in the experimental trials transitioned between stationary and moving behavior, we propose a probabilistic two-state model to describe aphid movement and social interaction dynamics. Let *P*_{MS} represent the probability that a moving aphid in a given frame transitions to a stationary state in the next frame. Similarly, let *P*_{SM} represent the probability that a stationary aphid in a given frame transitions to a moving state. Perhaps the simplest model that accounts for social interactions allows these probabilities to depend solely on the distance to an aphid's nearest neighbor, *d*. The underlying biological assumptions leading to this model are that aphids sense isotropically (perhaps due to a combination of visual, auditory, and olfactory inputs), that they are affected by the minimum possible social information, and that they do not react to the speed and orientation of their neighbor. We will show that this minimal model reproduces certain salient features of the experimental data.

Moving aphids appear (naively) to follow a correlated random walk [24]; see Fig. 1B. In an (unbiased) correlated random walk, an individual walks in a straight line of a certain (random) step length, turns from its previous heading at an angle *θ* that is random but drawn from a mean-zero distribution, and then repeats. In our model, we will assume that the correlated random walk parameters depend solely on distance to nearest neighbor, similar to the transition probabilities discussed above. For step length, we choose the simplest model, meaning that there is no spread in the step length distribution. A moving aphid's step length depends deterministically on its distance to nearest neighbor *d*. For turning angle *θ*, the mean of the distribution is zero by the assumption of symmetry of the correlated random walk. Therefore, we model dependence on *d* in the spread *ρ* of the turning angle distribution.

We will now quantify our four model parameters: the probability of a moving aphid stopping (*P*_{MS}), the probability of a stationary aphid starting to move (*P*_{SM}), a moving aphid's step length traveled in one frame, and the spread of the turning angle distribution that a moving aphid obeys (*ρ*). Each of these will depend on distance to an aphid's nearest neighbor *d* through simple functional forms with three or four parameters, which we estimate from experimental data below.

To estimate the transition probabilities *P*_{MS} and *P*_{SM}, we note that our data set (see previous section) includes a motion state for each entry. We can classify every transition that occurs in the data set as stationary to stationary (*SS*), stationary to moving (*SM*), moving to moving (*MM*) or moving to stationary (*MS*). We divide the data set in two, with *SS* and *SM* in one subset and *MM* and *MS* in the other. For each subset, we generate bins of 800 data points where binning is performed according to *d*. Within each bin, we estimate the probability of a transition as the ratio of the number of occurrences of the transition to the total number of observations. For instance, within a given bin, we estimate *P*_{MS} as

$$P_{MS} = \frac{\#MS}{\#MS + \#MM}, \tag{1}$$

where $\#MS$ and $\#MM$ are the numbers of *MS* and *MM* transitions in the bin. We then form a scatterplot of the probability within each bin versus the midpoint of the bin, resulting in Fig. 2.
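The binning and frequency count can be sketched as follows. This is an illustrative sketch rather than the authors' code: the function name is hypothetical, the bin midpoint is assumed to be halfway between the smallest and largest *d* in the bin, and any incomplete trailing bin is assumed to be dropped.

```python
import numpy as np

def binned_transition_probability(d, event, bin_size=800):
    """Estimate a transition probability as a function of d by frequency count.

    d:     nearest-neighbor distance for every observation in one subset
           (for example, all frames in which the aphid is currently moving).
    event: boolean array, True where the transition of interest occurs
           (for example, MS within the MM/MS subset).
    """
    d = np.asarray(d, dtype=float)
    event = np.asarray(event, dtype=bool)
    order = np.argsort(d, kind="stable")           # sort observations by distance
    d, event = d[order], event[order]
    n_bins = d.size // bin_size                    # assumption: drop the partial last bin
    midpoints = np.empty(n_bins)
    probabilities = np.empty(n_bins)
    for k in range(n_bins):
        rows = slice(k * bin_size, (k + 1) * bin_size)
        midpoints[k] = 0.5 * (d[rows][0] + d[rows][-1])
        probabilities[k] = event[rows].mean()      # occurrences / total observations
    return midpoints, probabilities
```

Binning by a fixed number of observations, rather than by fixed-width intervals of *d*, keeps the sampling error comparable from bin to bin.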

(A) *P*_{MS}, the probability that an aphid moving in a given timestep becomes stationary at the next timestep. Each data point represents the probability within a bin of 800 elements from our experimental data set, where the data are binned by *d*. The probability is calculated via a simple frequency count according to Eq. (1). The overall dependence of the data on *d* is modeled with Eq. (2), which describes an increased probability of an aphid settling if a neighbor is nearby. Best fit parameters appear in the text; the coefficient of determination is *R*^{2} = 0.92. To give a further sense of the efficacy of the fit, we display each point according to the standard error of the mean within the bin it represents. If the model curve passes within two standard errors of the estimated value, we show it as a green square; otherwise, it is a red dot. (B) Like (A), but for the probability *P*_{SM} that a stationary aphid starts moving. The model is Eq. (4), describing higher aphid mobility at very short and very long *d*. Here, *R*^{2} = 0.52; see text for discussion.

The probability *P*_{MS}, shown in Fig. 2A, appears to decrease monotonically with *d* and level off. We model this decrease with the decaying exponential functional form of Eq. (2). Here, one parameter represents the probability that an aphid will become stationary when infinitesimally close to its nearest neighbor, whereas a second is the probability of transitioning when isolated, that is, even in the absence of sensed neighbors. The length scale *d*_{MS} characterizes the transition between the two limiting regimes of *d*. The choice of a decaying exponential function not only agrees well with the data (as discussed presently) but has biological motivation. If one assumes that the motion state transition occurs due to sensing, and that the sensory input an aphid receives has a constant probability of failure per distance displaced from its source, then one obtains an exponential model, a common choice for aggregation modeling [48]. Overall, the model Eq. (2) reflects aphids being more likely to settle near other individuals, in order to aggregate.
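One functional form consistent with this description is written out below, together with the short argument behind the exponential. The symbols $P_{MS}^{0}$, $P_{MS}^{\infty}$ and $S(d)$ are assumptions introduced here for illustration (only $d_{MS}$ is named in the text), so this should be read as a plausible reconstruction rather than as the published equation.

```latex
% Assumed notation: P_{MS}^{0} is the stopping probability as d -> 0,
% P_{MS}^{\infty} is the stopping probability for an isolated aphid.
\begin{align*}
P_{MS}(d) &= P_{MS}^{\infty} + \left(P_{MS}^{0} - P_{MS}^{\infty}\right) e^{-d/d_{MS}},
\\[4pt]
% Constant failure probability per unit distance for the sensed signal:
S(d + \Delta d) &= S(d)\left(1 - \frac{\Delta d}{d_{MS}}\right)
\;\Longrightarrow\;
\frac{dS}{dd} = -\frac{S}{d_{MS}}
\;\Longrightarrow\;
S(d) = e^{-d/d_{MS}}, \quad S(0) = 1 .
\end{align*}
```

Under this form, $P_{MS}(0) = P_{MS}^{0}$ and $P_{MS}(d) \to P_{MS}^{\infty}$ as $d \to \infty$, and the two limiting probabilities enter linearly while $d_{MS}$ enters nonlinearly.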

To fit Eq. (2) to the experimental data, we first observe that the two limiting probabilities appear linearly while *d*_{MS} appears nonlinearly. We minimize the root-mean-square (RMS) error of the fit by scanning across values of *d*_{MS} and, at each value, performing a least squares fit for the two linear parameters. We find best-fit values for the two limiting probabilities and for *d*_{MS}, resulting in a fit (shown as the blue curve) with a high coefficient of determination, *R*^{2} = 0.92. To give a further sense of the efficacy of the fit, it is helpful to consider the standard error in each bin, which is given by

$$\mathrm{SE} = \sqrt{\frac{P_{MS}\,(1 - P_{MS})}{N}}, \tag{3}$$

where *N* is the number of aphids per bin and *P*_{MS} = *P*_{MS}(*d*) is the probability of transition within the bin. Green squares (red dots) represent bins for which the corresponding model prediction is within (outside of) two standard errors of the estimated *P*_{MS}.

The probability *P*_{SM} is shown in Fig. 2B. Unlike *P*_{MS}, which decreases monotonically, *P*_{SM} has a minimum at short distances. We choose the functional form of Eq. (4). The first (exponential) term is repulsive, modeling collision avoidance and consistent with the notion that aphids avoid settling too close to others. The second (rational) term is an attractive “loneliness” term, capturing that aphids move more when they are in isolation and modeling the tendency of solitary aphids to move in order to aggregate. Together, these two terms specify a particular distance at which an aphid is most likely to be stationary (namely the value of *d* that minimizes *P*_{SM}, which for our parameters is approximately 0.014 *m*). We fit this functional form to the data through a procedure similar to that for *P*_{MS}, except that we must now search over a grid of two nonlinear parameters. We find best-fit values for all four parameters. To compare each data point to the model, we use the same green square/red dot scheme as above. The overall fit has *R*^{2} = 0.52. This coefficient of determination, substantially lower than for *P*_{MS}, is likely due to the large scatter of the data for large *d*, which may reflect two sources of error. First, imaging and tracking of aphids is more difficult when they are in the vicinity of the boundary of the arena, and aphids at large *d* are more likely to be near a boundary. Second, it is possible that there is an explicit effect of the boundary on aphids' behavior which we have not modeled here.

We tried several functional forms (including linear combinations of exponentials) but chose Eq. (4), which minimizes the RMS error with two pairs of parameters. We believe the exact functional form is less important than the trends of higher mobility at both very short and very long distances.
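The fitting strategy described for these curves, scanning the nonlinear parameter and solving for the linear ones by least squares, can be sketched as below. The sketch assumes a saturating exponential `a + b * exp(-d / L)` of the kind described for *P*_{MS}; the function names, dictionary keys, grid, and synthetic numbers are hypothetical and are not the authors' fitted values. The helper for the two-standard-error check uses the binomial standard error of a proportion.

```python
import numpy as np

def fit_exponential_saturation(d, y, length_scales):
    """Fit y ~ a + b * exp(-d / L) by scanning L and solving for (a, b) linearly."""
    d = np.asarray(d, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for L in length_scales:
        # For fixed L the model is linear in (a, b).
        A = np.column_stack([np.ones_like(d), np.exp(-d / L)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        rms = np.sqrt(np.mean((A @ coef - y) ** 2))
        if best is None or rms < best[0]:
            best = (rms, L, coef)
    rms, L, (a, b) = best
    residual = y - (a + b * np.exp(-d / L))
    r2 = 1.0 - np.sum(residual ** 2) / np.sum((y - y.mean()) ** 2)
    return {"p_inf": a, "p_zero": a + b, "length_scale": L, "rms": rms, "r2": r2}

def within_two_standard_errors(p_hat, p_model, n):
    """True where the model value lies within two binomial standard errors of p_hat."""
    se = np.sqrt(p_hat * (1.0 - p_hat) / n)
    return np.abs(p_model - p_hat) <= 2.0 * se

# Usage on synthetic, noise-free data (illustrative numbers only).
d = np.linspace(0.001, 0.2, 50)
y = 0.1 + 0.3 * np.exp(-d / 0.02)
fit = fit_exponential_saturation(d, y, np.linspace(0.005, 0.05, 46))
```

For a form with two nonlinear parameters, the same idea applies with a nested scan over a two-dimensional grid and a two-column (or wider) design matrix at each grid point.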

We now turn to the parameters governing moving aphids' correlated random walks. Fig. 3 shows the mean step length as a function of *d*, with each point in the scatterplot corresponding to a bin of 800 data points. Because there is a coherent rise in the data for small *d*, we consider the model (5)

Each data point represents the mean step length within a bin of 800 elements from our experimental data set, where the data are binned by *d*. The overall dependence of the data on *d* is modeled with Eq. (5), which captures the tendency of aphids to aggregate simply by traveling less when in the vicinity of others. Best fit parameters appear in the text; the coefficient of determination is *R*^{2} = 0.82. To give a further sense of the efficacy of the fit, we display data points according to the same scheme used in Fig. 2. Green squares (red dots) represent data bins for which the model prediction falls within (outside) two standard errors of the experimental mean.

According to this model, aphids with neighbors nearby take short steps, and the step length increases and saturates as *d* increases. Using a fitting procedure similar to that for *P*_{MS} and *P*_{SM}, we find best-fit values for the three parameters. Within each bin, the standard error around the mean is $s/\sqrt{N}$, where *N* is the number of observations and *s* is the sample standard deviation. To compare experimental bins with the model prediction, we use the same green square/red dot visualization as above. For the overall fit, we find *R*^{2} = 0.82. The data decrease moderately from our model curve for *d* larger than roughly 0.1 *m*, which is half the radius of our experimental arena. Once again, we believe that we may be seeing biases due to the boundary and the increased difficulty of motion tracking near the boundary.

Finally, we model the spread of the distribution of turning angles *θ*. We bin *θ* values by *d* with 2400 values per bin (larger than the previously used value of 800 in order to help reduce the standard error within each bin). As alluded to previously, within every data bin, the distribution is strongly peaked around zero; see the examples in Fig. 4B and Fig. 4C. Therefore, to capture the effect of neighbors, it is necessary to model the spread of the distribution of *θ*, which indeed appears to depend on *d*. Since *θ* is an angle, its distribution lives on the interval [−π, π]. Wrapped normal distributions give a poor fit to our data (not shown). We instead select the wrapped Cauchy distribution [49] centered at zero,

$$f(\theta; \rho) = \frac{1}{2\pi}\,\frac{1 - \rho^{2}}{1 + \rho^{2} - 2\rho\cos\theta}, \tag{6}$$

where 0 < *ρ* < 1 is a parameter governing the spread of the distribution. Small values of *ρ* correspond to more spread distributions, whereas values closer to one result in strongly peaked distributions. Fig. 4A shows *ρ* as a function of *d* for the binned data. As a model, we select the functional form of Eq. (7).

(A) Turning angle distribution parameter 0<*ρ*<1 as a function of distance to an aphid's nearest neighbor. Here, *ρ* is a parameter in the zero-mean wrapped Cauchy distribution Eq. (6) used to model turning angle *θ*. Each data point represents the experimentally measured value of *ρ* within a bin of 2400 elements from our experimental data set, where the data are binned by *d*. The overall dependence of the data on *d* is modeled with Eq. (7), which captures the tendency of aphids to aggregate by taking wider turns when in the vicinity of others, leading to motion that is more diffusive and less ballistic. Best fit parameters appear in the text; the coefficient of determination is *R*^{2} = 0.99. Green squares (red dots) represent data bins for which the model prediction falls within (outside) a 95% confidence interval around the experimentally measured *ρ*, where the interval is constructed by resampling our original data 20,000 times. (B) Normalized histogram showing the experimental turning angle distribution within the data bin corresponding to the magenta triangle in (A). The blue curve shows the wrapped Cauchy distribution predicted by our model. (C) Like (B), but for the magenta diamond.

According to this model, aphids with nearby neighbors will turn more often at wider angles, resulting in motion that is less ballistic and more diffusive.

Fitting the model as described previously, we find best-fit values for the three parameters. To compare the experimental data and the model within a given bin, we calculate a 95% confidence interval by resampling the data in each bin thousands of times, calculating *ρ* each time, and considering the resulting distribution of values of *ρ*. If the value of *ρ* predicted by our model falls within the central 95% of the sampling distribution, we show the data point in Fig. 4A as a green square; otherwise it is a red dot. For the fit of Eq. (7), we find *R*^{2} = 0.99.
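A sketch of the wrapped Cauchy density, a sampler, and the resampling interval for *ρ* follows. Several choices here are assumptions rather than details from the text: *ρ* is estimated as the sample mean of cos *θ* (for a zero-centered wrapped Cauchy distribution, the expected value of cos *θ* equals *ρ*), whereas the authors' estimator is not stated; sampling wraps an ordinary Cauchy variate with scale −ln *ρ*; and all function names are hypothetical.

```python
import numpy as np

def wrapped_cauchy_pdf(theta, rho):
    """Density of the zero-centered wrapped Cauchy distribution, 0 < rho < 1."""
    return (1.0 - rho ** 2) / (2.0 * np.pi * (1.0 + rho ** 2 - 2.0 * rho * np.cos(theta)))

def sample_wrapped_cauchy(rho, size, rng):
    """Draw turning angles by wrapping a Cauchy variate with scale -ln(rho)."""
    gamma = -np.log(rho)
    x = gamma * rng.standard_cauchy(size)
    return (x + np.pi) % (2.0 * np.pi) - np.pi     # wrap to [-pi, pi)

def estimate_rho(theta):
    """Moment estimate of rho about zero (assumed estimator): mean of cos(theta)."""
    return float(np.mean(np.cos(theta)))

def bootstrap_rho_interval(theta, n_resamples, rng, level=0.95):
    """Central confidence interval for rho from resampling with replacement."""
    theta = np.asarray(theta, dtype=float)
    estimates = np.empty(n_resamples)
    for i in range(n_resamples):
        resample = rng.choice(theta, size=theta.size, replace=True)
        estimates[i] = estimate_rho(resample)
    lo, hi = np.quantile(estimates, [(1.0 - level) / 2.0, (1.0 + level) / 2.0])
    return float(lo), float(hi)

# Usage with a hypothetical rho and one bin of 2400 simulated turning angles.
rng = np.random.default_rng(0)
theta_bin = sample_wrapped_cauchy(0.8, 2400, rng)
interval = bootstrap_rho_interval(theta_bin, 200, rng)
```

A bin would then be drawn as a green square when the model's predicted *ρ* falls inside `interval`, and as a red dot otherwise.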

In summary, our model consists of just four quantities: *P*_{MS}, *P*_{SM}, the random walk step length, and *ρ*. Each of these depends on *d* via three or four parameters. In total, we have fit 13 parameters, but we note that there are over one million entries in our data set.
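To make the structure of the individual-level model concrete, the following sketch advances a single aphid by one frame. Everything quantitative here is hypothetical: the four curves `p_ms`, `p_sm`, `step_length`, and `rho` are placeholder functions of *d* with invented numbers, not the fitted forms. The sketch also assumes that the state transition is resolved before movement and that the step length is a deterministic function of *d*.

```python
import numpy as np

# Hypothetical placeholder curves in the nearest neighbor distance d (metres).
# Neither these functional forms nor these numbers are the fitted ones.
def p_ms(d):
    """Probability that a moving aphid stops during one frame."""
    return 0.05 + 0.15 * np.exp(-d / 0.01)

def p_sm(d):
    """Probability that a stationary aphid starts moving during one frame."""
    return 0.10 - 0.08 * np.exp(-d / 0.01)

def step_length(d):
    """Distance travelled in one frame by a moving aphid (metres)."""
    return 1e-3 * (1.0 - 0.5 * np.exp(-d / 0.01))

def rho(d):
    """Spread parameter of the wrapped Cauchy turning angle distribution."""
    return 0.9 - 0.4 * np.exp(-d / 0.01)

def sample_turning_angle(rho_value, rng):
    """Wrap a Cauchy variate with scale -ln(rho) onto [-pi, pi)."""
    x = -np.log(rho_value) * rng.standard_cauchy()
    return (x + np.pi) % (2.0 * np.pi) - np.pi

def step_aphid(pos, heading, moving, d, rng):
    """Advance one aphid by one frame; returns (pos, heading, moving)."""
    pos = np.asarray(pos, dtype=float)
    # Resolve the state transition first (ordering assumed in this sketch).
    if moving:
        moving = rng.random() >= p_ms(d)
    else:
        moving = rng.random() < p_sm(d)
    if not moving:
        return pos.copy(), heading, False
    # Correlated random walk: turn relative to the current heading, then step.
    heading = heading + sample_turning_angle(rho(d), rng)
    step = step_length(d) * np.array([np.cos(heading), np.sin(heading)])
    return pos + step, heading, True

rng = np.random.default_rng(0)
print(step_aphid([0.0, 0.0], 0.0, True, 0.05, rng))
```

With these placeholder curves, a small *d* makes stopping more likely, shortens steps, and widens turns, which is the qualitative behavior described in the text.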

As alluded to previously, one component ignored in the model is the arena's boundary. While it is quite likely that the presence of a boundary wall influences aphids' movement, the majority of our data set is composed of aphids far from the boundary. Fig. 5 shows the cumulative distribution function of distance to boundary for the entire data set. Only 10% of our data is within 2 *cm* of the boundary (4 or 5 aphid body lengths), and we leave the quantification of boundary effects as future work.

The circular experimental arena has a radius of 0.2 *m*. Only 10% of the data set corresponds to aphids within 2 *cm* (about five body lengths) of the boundary.
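The distance-to-boundary statistic can be computed as in the sketch below, which assumes that positions are given in metres relative to the arena center. The function names are hypothetical, and the positions in the example are synthetic (uniform over the disk), not experimental.

```python
import numpy as np

ARENA_RADIUS = 0.2  # metres, the stated radius of the circular arena

def distance_to_boundary(positions, radius=ARENA_RADIUS):
    """Distance from each (x, y) position to the wall of a centered arena."""
    positions = np.asarray(positions, dtype=float)
    return radius - np.hypot(positions[:, 0], positions[:, 1])

def fraction_within(distances, threshold):
    """Empirical cumulative distribution function evaluated at `threshold`."""
    return float(np.mean(np.asarray(distances) <= threshold))

rng = np.random.default_rng(0)
# Synthetic positions spread uniformly over the disk.
r = ARENA_RADIUS * np.sqrt(rng.random(10_000))
angle = 2.0 * np.pi * rng.random(10_000)
synthetic = np.column_stack([r * np.cos(angle), r * np.sin(angle)])
print(fraction_within(distance_to_boundary(synthetic), 0.02))
```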

With our model for individual-level behavior established, we will presently assess the degree to which it reproduces group-level behaviors. For comparison and contrast, we also consider a control model in which aphids do not interact at all. For this non-interaction model, we use the asymptotic (limit of large *d*) values of the parameters in our individual-level model. That is, we set , , , and .

## Simulation and Analysis of Group-Level Behaviors

We now shift our focus to group-level behaviors. We compare the experimental data (*EXP*) with data simulated from the two models developed above, namely the one in which aphids interact with their nearest neighbor (model *INT*) and the one in which aphids do not interact (model *NON*). For each model, we carry out simulations parallel to each experimental run, that is, having the same initial aphid positions and containing the same number of frames. We augment the individual-level behaviors with a rule for what simulated aphids do if they encounter the (simulated) arena boundary. If an aphid travels to a new position that would be outside of the arena, we apply a simplistic reflective boundary condition in which the angle of incidence on the boundary equals the angle of reflection. Also, we let the distance the aphid travels once it reflects off the wall be the distance it would have travelled beyond the boundary.
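The reflective boundary rule can be sketched geometrically for a circular arena centered at the origin. This is an illustration under stated assumptions: the function name is hypothetical, only a single reflection per step is handled (reasonable when steps are short relative to the arena radius), and the choice to set the new heading to the reflected direction is an assumption of this sketch.

```python
import numpy as np

def reflect_into_arena(p, q, radius):
    """Reflect a step from p (inside) to q (possibly outside) off a circular wall.

    Returns (new_position, new_heading). The overshoot beyond the wall is
    mirrored about the tangent at the exit point, so the angle of incidence
    equals the angle of reflection and the total path length is preserved.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    v = q - p
    if np.dot(q, q) <= radius**2:
        return q, float(np.arctan2(v[1], v[0]))  # step stays inside the arena
    # Solve |p + t v|^2 = radius^2 for the exit point, with 0 < t <= 1.
    a = np.dot(v, v)
    b = 2.0 * np.dot(p, v)
    c = np.dot(p, p) - radius**2
    t = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    hit = p + t * v
    normal = hit / np.linalg.norm(hit)   # outward unit normal at the wall
    overshoot = q - hit                  # part of the step beyond the wall
    reflected = overshoot - 2.0 * np.dot(overshoot, normal) * normal
    new_pos = hit + reflected
    return new_pos, float(np.arctan2(reflected[1], reflected[0]))

print(reflect_into_arena([0.19, 0.0], [0.21, 0.02], 0.2))
```

For a head-on approach along a radius, the aphid simply travels back along its path by the overshoot distance.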

We will compare three different group-level behaviors by studying their corresponding cumulative distribution functions as computed across each data set. A cumulative distribution tells, for any particular value of a data variable (horizontal axis) the percentage of data in the data set that is less than or equal to that value (vertical axis). It will be convenient to call our cumulative distributions , , , where the subscript *i* indexes the distribution (since it is discrete). Our strategy will be to make three pairwise comparisons for each group-level behavior, namely vs. , vs. , and vs. . It is also convenient to define the underlying probability distributions, , , . For each pairwise comparison we will calculate several different quantities. A simple comparison is the distance between median values of the probability distributions, which we refer to as . Another choice is the Kolmogorov-Smirnov distance [50], [51], a common nonparametric measure. For two cumulative distributions and , the maximum vertical distance between two cumulative distributions. Finally, we consider the Kullback-Leibler divergence [52]. This quantity measures the information lost when a distribution is used to approximate another distribution, . It is defined as (8)where for us, the superscript 1 and 2 will refer to one of our three data sets. Results appear in Table 1, Table 2, Table 3. We do not perform statistical hypothesis testing using , , and because we have no null hypothesis that our models and experiment produce statistically indistinguishable data. Rather, we expect that they are different, and we simply use empirical measures to assess the closeness of the model distributions to the experimental one.
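The three comparison measures named above can be sketched for discrete distributions defined on a common set of bins. The function names are hypothetical, the natural logarithm in the Kullback-Leibler divergence is an assumption of this sketch, and the three example distributions are made-up numbers, not values from the experiment or the models.

```python
import numpy as np

def median_from_pmf(values, pmf):
    """Smallest value at which the cumulative distribution reaches 0.5."""
    cdf = np.cumsum(pmf)
    return np.asarray(values)[np.searchsorted(cdf, 0.5)]

def median_distance(values, p1, p2):
    """Absolute difference between the medians of two discrete distributions."""
    return float(abs(median_from_pmf(values, p1) - median_from_pmf(values, p2)))

def ks_distance(p1, p2):
    """Maximum vertical distance between the two cumulative distributions."""
    return float(np.max(np.abs(np.cumsum(p1) - np.cumsum(p2))))

def kl_divergence(p1, p2):
    """Sum of p1 * ln(p1 / p2): information lost when p2 approximates p1.

    Assumes p2 > 0 wherever p1 > 0; bins with p1 = 0 contribute nothing.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    mask = p1 > 0
    return float(np.sum(p1[mask] * np.log(p1[mask] / p2[mask])))

# Made-up distributions over four bins, for illustration only.
values = np.array([0.01, 0.02, 0.03, 0.04])
p_exp = np.array([0.4, 0.3, 0.2, 0.1])
p_int = np.array([0.3, 0.3, 0.2, 0.2])
p_non = np.array([0.1, 0.2, 0.3, 0.4])
for name, other in (("INT", p_int), ("NON", p_non)):
    print(name, median_distance(values, p_exp, other),
          ks_distance(p_exp, other), kl_divergence(p_exp, other))
```

Note that the Kullback-Leibler divergence is not symmetric in its two arguments, so the order of the distributions matters.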

The first group-level behavior we consider is the distribution of nearest neighbor distances *d* that emerges through an experiment or simulation. The cumulative distributions are shown in Fig. 6(A), with *EXP* as solid blue, *INT* as dashed green, and *NON* as dot-dashed red. Statistical measures are given in Table 1. We see that the distance between medians is smaller for *EXP* vs. *INT* than for *EXP* vs. *NON* by approximately a factor of two. Put differently, the shorter median *d* for *INT* (as opposed to *NON*) indicates that the social behaviors in the model indeed promote aggregation. The experimental curve has an even shorter median *d*. Model *INT* appears to capture some (but not all) of the aggregative tendency seen in the experiment. The Kolmogorov-Smirnov distance is smaller between *EXP* and *INT* than between *EXP* and *NON*, as is the Kullback-Leibler divergence. Thus, by all three measures, *INT* captures more of the experimental behavior than *NON* does.

(A) Cumulative distributions of distance to nearest neighbor *d* (in *m*) for experimental data set (solid blue), social interaction model (dashed green), and non-interacting model (dot-dashed red). (B) Like (A), but the cumulated quantity is angle to nearest neighbor *φ* (relative to an aphid's heading). (C) Like (A), but the cumulated quantity is the fraction of the aphid population moving in a given frame. As compared to the curves in (A) and (B), the more staircase-like appearance of these curves arises simply from the fact that the variable being cumulated is discrete (percentage of aphids in a group of several dozen) as opposed to the continuous variables in (A) and (B). For (A)–(C), measures of the difference between the distributions are given in Tables 1–3 respectively.

The second group-level behavior we consider is the distribution of angle to nearest neighbor, *φ*, measured relative to an aphid's heading. The cumulative distributions and statistical information appear in Fig. 6(B) and Table 2. The graph reveals that *EXP*, *INT*, and *NON* all give rise to a uniform distribution of relative orientation (reflected by the linear cumulative profile). Therefore, aphids in experiment and in both models do not preferentially align towards their nearest neighbors.
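The per-frame observables behind these distributions can be sketched as follows. The function names and array layout are hypothetical: positions are an N-by-2 array in metres, headings are angles in radians, and *φ* is wrapped to [−*π*, *π*) with zero meaning that the nearest neighbor lies straight ahead.

```python
import numpy as np

def nearest_neighbor_observables(positions, headings):
    """Return (d, phi) for every aphid in one frame.

    d is the distance to the nearest neighbor; phi is the bearing of that
    neighbor measured relative to the aphid's own heading.
    """
    positions = np.asarray(positions, dtype=float)
    headings = np.asarray(headings, dtype=float)
    diff = positions[None, :, :] - positions[:, None, :]  # diff[i, j] = pos_j - pos_i
    dist = np.hypot(diff[..., 0], diff[..., 1])
    np.fill_diagonal(dist, np.inf)                        # exclude self-pairs
    nn = np.argmin(dist, axis=1)
    idx = np.arange(len(positions))
    d = dist[idx, nn]
    bearing = np.arctan2(diff[idx, nn, 1], diff[idx, nn, 0])
    phi = (bearing - headings + np.pi) % (2.0 * np.pi) - np.pi
    return d, phi

def fraction_moving(moving_flags):
    """Fraction of the population in the moving state in one frame."""
    return float(np.mean(np.asarray(moving_flags, dtype=bool)))

# Three made-up aphids on a line, for illustration only.
d, phi = nearest_neighbor_observables(
    [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]], [0.0, 0.0, np.pi / 2.0])
print(d, phi, fraction_moving([True, False, False]))
```

Pooling `d`, `phi`, and the moving fraction over all frames of a run gives the samples from which the cumulative distributions are built.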

Finally, we consider the third group-level behavior, the distribution of the fraction of aphids moving at a given time. The cumulative distributions and statistical information appear in Fig. 6(C) and Table 3. They are strikingly different. As with the distributions for *d*, *INT* reproduces much more of the behavior of *EXP* than *NON* does. The extreme rightward shift of the red curve indicates that the mobility of aphids is much higher in *NON*; put differently, aphids in this model do not aggregate and settle nearly as much as in *EXP* and *INT*.

## Conclusion

Through experiment and modeling, we have investigated the movement, social behavior, and aggregation of the pea aphid. Motion-tracked experimental data gives rise to a two-state model in which aphids transition stochastically between stationary and moving states. Moving aphids follow a correlated random walk. The state transition probabilities *P*_{MS} and *P*_{SM}, the random walk step length, and the random walk turning angle distribution spread *ρ* all depend on distance to an aphid's nearest neighbor, *d*. These four quantities have each been fit with a functional form incorporating three or four parameters whose values we estimated from the experimental data. To assess the efficacy of our model in reproducing group-level behaviors, we compared experimental data to outputs of our social nearest neighbor model and a control (noninteracting) model. We found that the social model reproduces the distribution of nearest neighbors and the distribution of fraction of moving aphids better than the control model. The experiment and both models display no difference for a third group-level property, namely angle to nearest neighbor.

Our mathematical model is strikingly different from some previous data-driven aggregation models. The model of golden shiner fish in [30] and the model of surf scoter ducks in [32] are primarily deterministic, describing organisms that simultaneously attract, repel, and align. In these studies, noise additively modulates an organism's intended direction at each time step, presumably to describe errors in sensing and movement capabilities. In contrast, our model has rules that are fundamentally random. Fig. 2 shows that aphids under similar conditions (same distance to nearest neighbor) display different behaviors (transitioning vs. not transitioning motion state). Fig. 3 and Fig. 4 suggest that the movement process for aphids is a random walk.

The biological conclusions of our work are as follows. First, we have provided strong quantitative evidence that pea aphids display social behavior, in that an individual's movement in a featureless environment is influenced by its nearest neighbor.

Second, we have gained insight into the mechanism by which aphids aggregate. The probability of a stationary aphid starting to move decreases if a neighbor is nearby. The probability of a moving aphid stopping increases if a neighbor is nearby. These two behaviors promote aggregation. Further, aphids that are moving take shorter steps and turn more when in the vicinity of neighbors, promoting motion that is more diffusive and less ballistic (that is, less likely to move it away from the neighbor). This is reminiscent of the classic run-and-tumble model of bacteria [53]. In short, aggregation occurs through movement decreasing in the proximity of other aphids as opposed to direct locomotion towards individuals or clusters.

Third, our model of individual-level behavior gives some feeling for the sensing range of the aphid. We recall the exponential length scales , , , and . These characteristic length scales are on the order of 1–3 aphid body lengths.

As evidenced by the metrics in the previous section, our individual-based social model reproduces group-level behaviors much better than a control model. There remain many avenues for further investigation. While we have demonstrated that pea aphid behavior promotes aggregation, we have not focused on quantifying the degree of aggregation (beyond measuring the distribution of distance to nearest neighbor). One could investigate the typical population size of an aggregation and the typical time scales of an aggregation's formation and existence. Furthermore, we have not captured all of the experimental complexity in our simple model. As mentioned throughout, we have ignored the effects of the boundary. It would be useful to quantify more precisely the rules an aphid obeys when it encounters an immovable obstacle such as a boundary. Additionally, our model is arguably the simplest possible social model, in which social effects depend on a single nearest neighbor. One could investigate the degree to which an aphid responds simultaneously to multiple neighbors, keeping in mind the limits of aphid cognition. Finally, it could be interesting to augment our work, which describes aphid aggregation in the absence of environmental cues, with a consideration of external factors such as nutrition sources. Such an investigation might shed further light on the aphid's role as a destructive crop pest.

## Acknowledgments

Ken Moffett of the Macalester College machine shop built the experimental arena. Matthew Beckman of Augsburg College provided advice on our experimental setup. Raibatak Das of the University of Colorado, Denver shared a template of MATLAB code helpful in our image processing and tracking. We benefitted from statistical discussions with Alicia Johnson, Victor Addona, and Danny Kaplan. As part of his undergraduate research experience at Macalester College, Trevor McCalmont contributed to a prototype of the experiment and model during early stages of this work. We are grateful to Macalester College for laboratory space in the XMAC (eXperiment, Modeling, Analysis and Computation) laboratory.

## Author Contributions

Conceived and designed the experiments: JP RS ML CMT. Performed the experiments: JP RS ML. Analyzed the data: CN JP OW BM RS ML AJB CMT. Wrote the paper: CN JP OW AJB CMT.

## References

- 1. Parrish JK, Hamner WM, editors (1997) Animal Groups in Three Dimensions. Cambridge: Cambridge University Press.
- 2. Okubo A, Levin SA, editors (2001) Diffusion and Ecological Problems, volume 14 of *Interdisciplinary Applied Mathematics: Mathematical Biology*. New York: Springer, second edition.
- 3. Camazine S, Deneubourg JL, Franks NR, Sneyd J, Theraulaz G, et al. (2001) Self-Organization in Biological Systems. Princeton Studies in Complexity. Princeton: Princeton University Press.
- 4. Breder CM (1954) Equations descriptive of fish schools and other animal aggregations. Ecol 35: 361–370.
- 5. Mogilner A, Edelstein-Keshet L (1999) A non-local model for a swarm. J Math Bio 38: 534–570.
- 6. Couzin ID, Krause J, James R, Ruxton GD, Franks NR (2002) Collective memory and spatial sorting in animal groups. J Theor Biol 218: 1–11.
- 7. Eftimie R, de Vries G, Lewis MA (2007) Complex spatial group patterns result from different animal communication mechanisms. Proc Natl Acad Sci 104: 6974–6979.
- 8. Eftimie R, de Vries G, Lewis MA, Lutscher F (2007) Modeling group formation and activity patterns in self-organizing collectives of individuals. Bull Math Bio 69: 1537–1565.
- 9. Okubo A, Grünbaum D, Edelstein-Keshet L (2001) The dynamics of animal grouping. In: Okubo A, Levin SA, editors, Diffusion and Ecological Problems, New York: Springer, volume 14 of *Interdisciplinary Applied Mathematics: Mathematical Biology*, chapter 7. Second edition, pp. 197–237.
- 10. Tilman D, Kareiva P, editors (1998) Spatial Ecology: The Role of Space in Population Dynamics and Interspecific Interactions. Princeton, NJ: Princeton University Press.
- 11. Bonabeau E, Dorigo M, Theraulaz G (1999) Swarm Intelligence: From Natural to Artificial Systems. Santa Fe Institute Studies in the Sciences of Complexity. New York: Oxford University Press.
- 12. Passino KM (2005) Biomimicry for Optimization, Control, and Automation. London: Springer.
- 13. Parrish JK, Edelstein-Keshet L (1999) Complexity, pattern, and evolutionary trade-offs in animal aggregation. Science 284: 99–101.
- 14. Flierl G, Grünbaum D, Levin S, Olson D (1999) From individuals to aggregations: The interplay between behavior and physics. J Theor Biol 196: 397–454.
- 15. Levine H, Rappel WJ, Cohen I (2001) Self-organization in systems of self-propelled particles. Phys Rev E 63: 017101.1–017101.4.
- 16. Topaz CM, Bertozzi AL (2004) Swarming patterns in a two-dimensional kinematic model for biological groups. SIAM J Appl Math 65: 152–174.
- 17. Topaz CM, Bertozzi AL, Lewis MA (2006) A nonlocal continuum model for biological aggregation. Bull Math Bio 68: 1601–1623.
- 18. D'Orsogna MR, Chuang YL, Bertozzi AL, Chayes L (2006) Self-propelled particles with soft-core interactions: Patterns, stability, and collapse. Phys Rev Lett 96: 104302.
- 19. Leverentz AJ, Topaz CM, Bernoff AJ (2009) Asymptotic dynamics of attractive-repulsive swarms. SIAM J Appl Dyn Sys 8: 880–908.
- 20. Bernoff AJ, Topaz CM (2011) A primer of swarm equilibria. SIAM J Appl Dyn Sys 10: 212–250.
- 21. Fetecau RC, Huang Y, Kolokolnikov T (2011) Swarm dynamics and equilibria for a nonlocal aggregation model. Nonlinearity 24: 2681–2716.
- 22. Kolokolnikov T, Sun H, Uminsky D, Bertozzi AL (2011) Stability of ring patterns arising from two-dimensional particle interactions. Phys Rev E 84: 015203.
- 23. Fetecau RC, Huang Y (2012) Equilibria of biological aggregations with nonlocal attractive-repulsive interactions. Physica D.
- 24. Turchin P (1998) Quantitative Analysis of Movement: Measuring and Modeling Population Redistribution in Animals and Plants. Sinauer Associates.
- 25. Gautrais J, Jost C, Soria M, Campo A, Motsch S, et al. (2009) Analyzing fish movement as a persistent turning walker. J Math Bio 58: 429–445.
- 26. Viswanathan GM, Afanasyev V, Buldyrev SV, Murphy EJ, Prince PA, et al. (1996) Lévy flight search patterns of wandering albatrosses. Nature 381: 413–415.
- 27. Shlesinger MF, Zaslavsky GM, Frisch U (1995) Lévy flights and related topics in physics, volume 450 of *Lecture Notes in Physics*. Springer.
- 28. Topaz CM, Bernoff AJ, Logan S, Toolson W (2008) A model for rolling swarms of locusts. Euro Phys J ST 157: 93–109.
- 29. Topaz CM, D'Orsogna MR, Edelstein-Keshet L, Bernoff AJ (2012) Locust dynamics: Behavioral phase change and swarming. PLoS Comp Bio 8: e1002642.
- 30. Tunstrom K, Katz Y, Ioannou CC, Huepe C, Lutz MJ, et al. (2013) Collective states, multistability and transitional behavior in schooling fish. PLoS Comput Biol 9: e1002915.
- 31. Ballerini M, Cabibbo N, Candelier R, Cavagna A, Cisbani E, et al. (2008) Interaction ruling animal collective behavior depends on topological rather than metric distance: Evidence from a field study. Proc Natl Acad Sci 105: 1232–1237.
- 32. Lukeman R, Li YX, Edelstein-Keshet L (2010) Inferring individual rules from collective behavior. Proc Natl Acad Sci 107: 12576–12580.
- 33. van Emden HF, Harrington R (2007) Aphids as Crop Pests. Centre for Agriculture and Biosciences International.
- 34. The International Aphid Genomics Consortium (2010) Genome sequence of the pea aphid *Acyrthosiphon pisum*. PLoS Biol 8: e1000313.
- 35. Roitberg BD, Myers JH, Frazer BD (1979) The influence of predators on the movement of apterous pea aphids between plants. J Anim Ecol 48: 111–122.
- 36. Dixon AFG (1958) Escape responses shown by certain aphids to the presence of Adalia decempunctata. Trans R Ent Soc Lond : 319–334.
- 37. Kislow CJ, Edwards LJ (1972) Repellent odors in aphids. Nature : 108–109.
- 38. Mondor EB, Baird DS, Slessor KN, Roitberg BD (2000) Ontogeny of alarm pheromone secretion in pea aphid, *Acyrthosiphon pisum*. J Chem Ecol 26: 2875–2882.
- 39. Way MJ, Cammell M (1970) Aggregation behavior in relation to food utilization by aphids. In: Watson A, editor, Animal Populations in Relation to their Food Resources. pp. 229–247.
- 40. Cappuccino N (1987) Comparative population dynamics of 2 goldenrod aphids: Spatial patterns and temporal constancy. Ecology 68: 1634–1646.
- 41. Kidd NAC (1976) Aggregation in the lime aphid (*Eucallipterus tiliae L*.): 1. Leaf vein selection and its effect on distribution on the leaf. Oecologia 22: 299–304.
- 42. Kidd NAC (1976) Aggregation in the lime aphid (*Eucallipterus tiliae L*.): 2. Social aggregation. Oecologia 25: 175–185.
- 43. Strong FE (1967) Aggregation behavior of pea aphids, *Acyrthosiphon pisum*. Ent Ex & App 10: 463–475.
- 44. Brisson JA (2010) Aphid wing dimorphisms: Linking environmental and genetic control of trait variation. Phil Trans R Soc B : 605–616.
- 45. Eastop V (1971) Keys for the identification of Acyrthosiphon (Hemiptera:Aphididae). Bull Br Mus Nat Hist Entomol 26.
- 46. Schneider CA, Rasband WS, Eliceiri KW (2012) NIH Image to ImageJ: 25 years of image analysis. Nat Methods 9: 671–675.
- 47. Jaqaman K, Loerke D, Mettlen M, Kuwata H, Grinstein S, et al. (2008) Robust single-particle tracking in live-cell time-lapse sequences. Nat Methods 5: 695–702.
- 48. Mogilner A, Edelstein-Keshet L, Bent L, Spiros A (2003) Mutual interactions, potentials, and individual distance in a social aggregation. J Math Bio 47: 353–389.
- 49. Fisher NI (1993) Statistical Analysis of Circular Data. Cambridge, UK: Cambridge University Press.
- 50. Kolmogorov AN (1933) Sulla determinazione empirica di una legge di distribuzione. Giornale dell'Istituto Italiano degli Attuari 4: 83–91.
- 51. Smirnov N (1948) Table for estimating the goodness of fit of empirical distributions. Ann Math Stat 19: 279–281.
- 52. Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22: 79–86.
- 53. Berg HC, Brown DA (1972) Chemotaxis in Escherichia coli analysed by three-dimensional tracking. Nature 239: 500–504.