The authors have declared that no competing interests exist.

Analyzed the data: CMT LZ TH. Wrote the paper: CMT LZ TH. Performed simulations: CMT.

We apply tools from topological data analysis to two mathematical models inspired by biological aggregations such as bird flocks, fish schools, and insect swarms. Our data consists of numerical simulation output from the models of Vicsek and D'Orsogna. These models are dynamical systems describing the movement of agents who interact via alignment, attraction, and/or repulsion. Each simulation time frame is a point cloud in position-velocity space. We analyze the topological structure of these point clouds, interpreting the persistent homology by calculating the first few Betti numbers. These Betti numbers count connected components, topological circles, and trapped volumes present in the data. To interpret our results, we introduce a visualization that displays Betti numbers over simulation time and topological persistence scale. We compare our topological results to order parameters typically used to quantify the global behavior of aggregations, such as polarization and angular momentum. The topological calculations reveal events and structure not captured by the order parameters.

Biological aggregations are groups of organisms such as fish schools, bird flocks, insect swarms, and mammal herds [

Quantitative understanding of aggregations has been developed in part through mathematical modeling. Modeling of aggregations dates back (at least) to the 1950s with the seminal work of [

Quantitative understanding of aggregations has also been developed through the exploration and modeling of rich data sets measured in the field or in experiment. This type of study has a much more recent history, as the technology necessary to gather and process large, accurate data sets did not exist several decades ago. Notable examples include data-based modeling of starlings [

In a classical approach to characterizing collective dynamics, one begins with _{i} and velocities _{i}, either from biological observation or numerical simulation of a model. One then calculates some global metric hoped to give insight into macroscopic dynamics. For instance, [_{ang},
_{i} = _{i} − _{cm} and _{cm} is the center of mass of the group. By varying parameters that control the social interactions and plotting _{ang}, [_{ang} cannot distinguish a single from a double mill, so [_{ang}. Together, _{ang} and _{abs} can distinguish single from double mills. Other examples of metrics include the average number of neighbors with whom an individual interacts (which requires knowledge of interaction rules) and the mean distance to nearest neighbor [
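As a minimal sketch of how such rotational metrics can be computed, the code below evaluates two normalized angular momentum quantities from planar positions and velocities measured relative to the center of mass. The normalization by the sum of |r_i||v_i| and the function name `mill_order_parameters` are assumptions made for illustration; they follow a commonly used convention and are not taken verbatim from this paper.

```python
import math

def mill_order_parameters(positions, velocities):
    """Return (M_ang, M_abs) for lists of 2D positions and velocities.

    Assumed definitions (a common convention, not confirmed by the source):
      M_ang = |sum_i r_i x v_i| / sum_i |r_i||v_i|
      M_abs = sum_i |r_i x v_i| / sum_i |r_i||v_i|
    where r_i is the position relative to the center of mass.
    """
    n = len(positions)
    cx = sum(p[0] for p in positions) / n
    cy = sum(p[1] for p in positions) / n
    signed = 0.0    # sum of signed cross products
    unsigned = 0.0  # sum of absolute cross products
    norm = 0.0      # normalizing factor
    for (x, y), (vx, vy) in zip(positions, velocities):
        rx, ry = x - cx, y - cy
        cross = rx * vy - ry * vx  # z-component of r x v in the plane
        signed += cross
        unsigned += abs(cross)
        norm += math.hypot(rx, ry) * math.hypot(vx, vy)
    return abs(signed) / norm, unsigned / norm
```

Under these assumed definitions, an idealized single mill gives both quantities near one, while an idealized double mill with equal numbers of clockwise and counterclockwise particles gives a signed quantity near zero and an absolute quantity near one.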

Our discussion here provides a sampling of metrics in the literature. Most are inspired by order parameters from physics, and many have been constructed

We note that the term topology has been invoked in the aggregation literature in a different sense than we use it here. In both the biological modeling literature and the robotics literature, “topology” sometimes refers to the coupling scheme between agents, that is, by which members of the group a given individual is influenced [

The rest of this paper is organized as follows. We begin with an overview of persistent homology. This discussion aims to present some of the key concepts to a mathematical reader unfamiliar with algebraic topology. Brief descriptions of computational methods and our data visualization follow. Then, we proceed to topological analyses of the models of Vicsek,

Homology is a tool from algebraic topology that measures the features of a topological space such as an annulus, sphere, torus, or more complicated surface or manifold. In particular, homology can distinguish these spaces from one another by quantifying their connected components, topological circles, trapped volumes, and so forth. A finite set of data points can be viewed as a (noisy) sampling from an underlying topological space. One can measure the homology of the data by creating connections between proximate data points, varying the scale over which these connections are made, and looking for features that persist across scales. This is called persistent homology. Persistent homology has been used in a wide array of applications to uncover the topological structure of data, including neuroscience, language processing, natural images, signal analysis, bioinformatics, computer vision, and sensor networks [
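To make the idea of persistence across scales concrete for the simplest feature, connected components, the following sketch records the scales at which components merge as the connection distance grows. It is an illustration only, not the software used for the results in this paper. The function names and the convention that two points are connected when their Euclidean distance is at most `eps` are assumptions.

```python
import math
from itertools import combinations

def h0_death_scales(points):
    """Scales at which two connected components merge as the scale grows."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process candidate edges from shortest to longest.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # this edge joins two distinct components
            parent[ri] = rj
            deaths.append(d)  # one component "dies" at scale d
    return deaths

def betti0(points, eps):
    """Number of connected components at scale eps (assumed rule: distance <= eps)."""
    return len(points) - sum(1 for d in h0_death_scales(points) if d <= eps)
```

A component that survives over a long range of scales before merging is the kind of persistent feature described above; components that merge almost immediately are more plausibly sampling noise.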

We explain these ideas in greater detail for the remainder of this section. Our discussions in the first two subsections below recapitulate presentations in texts such as [

To build a global object from a discrete set of

These

To form _{ɛ} in the following way. In _{ɛ}, every collection of _{ɛ} (this graph is sometimes called the _{ɛ}). A 3-simplex (a tetrahedron) is formed whenever four points are pairwise within

The 18 points are 0-simplices. Two 0-simplices form a 1-simplex (an edge) if their

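For purposes of homology, it is necessary to impose an orientation on the vertices of each $k$-simplex. An oriented $k$-simplex is an ordered list $[x_0, x_1, \ldots, x_k]$ of $k+1$ vertices, with the convention that swapping two vertices reverses the sign: $[x_0, \ldots, x_i, \ldots, x_j, \ldots, x_k] = -[x_0, \ldots, x_j, \ldots, x_i, \ldots, x_k]$. The simplices in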

There are other methods one could use to form a simplicial complex. The Rips complex is the flag or clique complex that is the maximal simplicial complex built from the underlying graph. Another commonly used method is the Cech complex, where
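The clique construction of the Rips complex can be sketched by brute force: connect points that are pairwise within the proximity parameter, then declare every set of mutually connected points a simplex. The function name `rips_complex`, the `max_dim` cutoff, and the "distance at most `eps`" rule are assumptions for illustration. This enumeration is exponential in cost and is only meant for very small point sets.

```python
import math
from itertools import combinations

def rips_complex(points, eps, max_dim=2):
    """Return {k: list of k-simplices} where each simplex is a sorted vertex tuple."""
    n = len(points)
    # Pairs of points within the proximity parameter form the underlying graph.
    close = {
        (i, j)
        for i, j in combinations(range(n), 2)
        if math.dist(points[i], points[j]) <= eps
    }
    simplices = {0: [(i,) for i in range(n)]}
    for k in range(1, max_dim + 1):
        # A k-simplex is any set of k+1 vertices that are pairwise connected.
        simplices[k] = [
            s
            for s in combinations(range(n), k + 1)
            if all(pair in close for pair in combinations(s, 2))
        ]
    return simplices
```

For the four corners of a unit square, a scale slightly above 1 yields the four sides and no triangles, while a scale above the diagonal length fills in all triangles and the tetrahedron.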

Homology is a way to uncover _{ɛ}. For each _{ɛ}, so that the dimension of _{p} (integers modulo _{i} _{i} _{i}, where _{i} ∈ ℤ_{p} and the sum is over all _{i} in _{ɛ}.

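To compute homology, one must be able to describe the boundary of a $k$-simplex. The boundary operator $\partial_k$ acts on an oriented $k$-simplex $[x_0, x_1, \ldots, x_k]$ by
$$\partial_k [x_0, x_1, \ldots, x_k] = \sum_{i=0}^{k} (-1)^i \, [x_0, \ldots, \hat{x}_i, \ldots, x_k],$$
where $[x_0, \ldots, \hat{x}_i, \ldots, x_k]$ is the $(k-1)$-simplex obtained from $[x_0, \ldots, x_k]$ by removing the vertex $x_i$. For example,
$$\partial_2 [x_0, x_1, x_2] = [x_1, x_2] - [x_0, x_2] + [x_0, x_1],$$
where $[x_0, x_1, x_2]$ represents a triangle and $[x_1, x_2] - [x_0, x_2] + [x_0, x_1]$ are the oriented edges that form its boundary.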
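The alternating-sign rule for the boundary can be checked on small examples with a few lines of code. In this sketch a chain is represented as a dictionary from simplices (vertex tuples in increasing order) to coefficients. The representation, the function name `boundary`, and the use of ordinary integer coefficients rather than coefficients modulo a prime are assumptions made for illustration.

```python
from collections import defaultdict

def boundary(chain):
    """Boundary of a chain given as {simplex: coefficient}.

    Each simplex is a tuple of vertex labels in increasing order, so every
    face obtained by deleting one vertex is again in increasing order.
    """
    result = defaultdict(int)
    for simplex, coeff in chain.items():
        if len(simplex) == 1:
            continue  # the boundary of a vertex is zero
        for i in range(len(simplex)):
            face = simplex[:i] + simplex[i + 1:]  # delete the i-th vertex
            result[face] += (-1) ** i * coeff     # alternating sign
    # Drop faces whose coefficients cancelled.
    return {face: c for face, c in result.items() if c != 0}

triangle = {(0, 1, 2): 1}
edges = boundary(triangle)
print(edges)            # {(1, 2): 1, (0, 2): -1, (0, 1): 1}
print(boundary(edges))  # {}
```

The second call returns the empty chain because the endpoints of the three oriented edges cancel in pairs.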

Boundary operators connect the vector spaces _{k}○∂_{k+1} = 0. That is, “a boundary has no boundary”. For example,
_{ɛ},

The goal of homology is to “discard” cycles that are also boundaries. To this end, we put an equivalence relation on the cycles: two $k$-cycles $c_1$ and $c_2$ are homologous, written $c_1 \sim c_2$, if they differ by a boundary. For example, the blue 1-chain $[x_0, x_1] + [x_1, x_2] + [x_2, x_3] + [x_3, x_4] + [x_4, x_0]$ and the red 1-chain $[x_1, x_2] + [x_2, x_3] + [x_3, x_4] + [x_4, x_1]$ are cycles because $\partial_1$ of each is zero. They are homologous because their difference is a boundary: $[x_0, x_1] + [x_4, x_0] - [x_4, x_1] = [x_1, x_4] - [x_0, x_4] + [x_0, x_1] = \partial_2 [x_0, x_1, x_4]$.

The blue 1-cycle and the red 1-cycle are homologous (equivalent), because their difference is the boundary of a triangle, shown in green; see text for a detailed explanation.

The equivalence relation ∼ defined above partitions the _{ɛ} as the set of homology classes
_{k} is the dimension of the vector space
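For a small complex, Betti numbers can be obtained directly from ranks of boundary matrices, since by rank-nullity the dimension of the homology in degree k equals the number of k-simplices minus the rank of the boundary map out of degree k minus the rank of the boundary map into degree k. The sketch below assumes coefficients modulo 2 (one possible choice of prime) and a complex supplied as lists of sorted vertex tuples that is closed under taking faces. The function names are hypothetical, and this is not the algorithm used by persistent homology software.

```python
from itertools import combinations

def rank_mod2(rows):
    """Rank over Z_2 of a matrix whose rows are encoded as integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit of the pivot row
        # Eliminate that bit from every remaining row.
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def betti_numbers(simplices):
    """simplices: {k: list of sorted vertex tuples}. Returns [b_0, b_1, ...]."""
    max_dim = max(simplices)
    ranks = {0: 0, max_dim + 1: 0}  # no boundary out of degree 0 or into the top degree
    for k in range(1, max_dim + 1):
        index = {s: i for i, s in enumerate(simplices[k - 1])}
        rows = []
        for s in simplices[k]:
            mask = 0
            for face in combinations(s, k):  # the faces of s
                mask |= 1 << index[face]
            rows.append(mask)
        ranks[k] = rank_mod2(rows)
    return [
        len(simplices[k]) - ranks[k] - ranks[k + 1]
        for k in range(max_dim + 1)
    ]
```

Applied to a pentagon with one chord and one filled triangle, this returns one connected component and one independent loop, consistent with two cycles around the same hole being homologous. If simplices above `max_dim` are omitted from the input, the top Betti number returned is only an upper bound.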

In terms of the topological characteristics one might hope to measure,

Given a collection of

The top four figures display the simplicial complex of 18 points for different values of the proximity parameter

To reconcile this ambiguity, one exploits the fact that as _{1} ≤ _{2} ≤ ⋯ ≤ _{M}, then we have an inclusion of simplicial complexes

A convenient way to visualize persistent homology is through a graphical representation called a

Another method of displaying homological information is through a

(A) Random initial positions (

Because our data sets are obtained from numerical simulation, they are noiseless. Still, one might wonder whether small perturbations of data would impact the topological features that are measured. For biological aggregations, this question would be especially relevant in analyzing experimental data, for which measurement error might introduce noise. As shown in [

In the next two sections, we will present topological analyses of simulation output from the two aggregation models of [

The computational complexity of computing ^{3}), where

To extract topological information, we process simulation data in the statistical computing environment

Because we have a series of simulation time steps, we introduce a visualization that captures homological persistence over both scale
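The data behind such a visualization is a two-dimensional array of Betti numbers indexed by scale and by time. The sketch below builds this array for the number of connected components only, as an illustration of the bookkeeping; higher Betti numbers require a persistent homology package. The function names, the row and column layout, and the "distance at most `eps`" connection rule are assumptions.

```python
import math
from itertools import combinations

def count_components(points, eps):
    """Number of connected components when points within eps are joined."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = n
    for i, j in combinations(range(n), 2):
        if math.dist(points[i], points[j]) <= eps:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                components -= 1
    return components

def betti0_grid(frames, scales):
    """Rows are indexed by scale, columns by simulation frame."""
    return [[count_components(frame, eps) for frame in frames] for eps in scales]
```

The resulting grid can be passed to any contour plotting routine, with time on the horizontal axis and scale on the vertical axis.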

As an example, consider

The simulation that generated these data was seeded with the initial condition in

In summary, large regions in the contour diagram (excluding

Using persistent homology, we now analyze data generated by aggregation models. One of the most referenced aggregation models is that of Vicsek and collaborators [

The Vicsek model is a dynamical system in discrete time and continuous space that describes the motion of interacting point particles in a square with periodic boundary conditions. The model appears in the literature written in different forms; we write it as
_{i}(^{2} is the position of particle _{i} is velocity. We refer to the angle of the velocity vector _{i} as _{i}, the heading. Additionally, _{0} is a constant,

For clarity, let us re-state the model in prose. To update the model, each particle must be given a new heading. This heading is the average of the previous headings of all other particles within a radius _{0}Δ
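The update rule described in prose can be sketched as a single function. Several details here are assumptions rather than statements about the simulations in this paper: each particle is included in its own neighborhood average (which avoids an empty neighborhood), the noise is drawn uniformly from an interval of width `eta` centered at zero, positions are advanced using the new heading, and all names are hypothetical.

```python
import math
import random

def vicsek_step(positions, headings, v0, radius, eta, box, dt=1.0, rng=random):
    """Advance a Vicsek-type system by one time step in a periodic square."""
    n = len(positions)
    new_headings = []
    for i in range(n):
        sx = sy = 0.0
        for j in range(n):
            # Shortest separation under periodic boundary conditions.
            dx = abs(positions[i][0] - positions[j][0])
            dy = abs(positions[i][1] - positions[j][1])
            dx = min(dx, box - dx)
            dy = min(dy, box - dy)
            if dx * dx + dy * dy <= radius * radius:
                sx += math.cos(headings[j])
                sy += math.sin(headings[j])
        # Average direction of neighbors plus a random perturbation.
        noise = rng.uniform(-eta / 2.0, eta / 2.0)
        new_headings.append(math.atan2(sy, sx) + noise)
    new_positions = [
        ((x + v0 * dt * math.cos(th)) % box, (y + v0 * dt * math.sin(th)) % box)
        for (x, y), th in zip(positions, new_headings)
    ]
    return new_positions, new_headings
```

Averaging headings through summed cosines and sines, rather than averaging the angles themselves, avoids the discontinuity where angles wrap around.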

The parameters in the model are the number of particles _{0}, the box size ℓ, and the time step Δ_{0}, and ℓ. Some studies refer to three effective parameters: _{0}, and ^{2}, a particle density.

Two preliminary matters will aid understanding before we discuss results. First, we analyze the topology of an initial condition. Second, we discuss the classic order parameter used in the physics literature to characterize the global dynamics of the system.

^{3} but rather ^{1} × ^{1} × ^{1} = 𝕋^{3}, the three-torus, which has Betti numbers

As discussed in the introduction, a traditional approach is to characterize global behavior via an order parameter. For (9), the order parameter most often studied is the normalized average velocity of the group,
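Because every particle moves at the same constant speed, the normalized average velocity depends only on the headings. A minimal sketch, with a hypothetical function name:

```python
import math

def alignment_order_parameter(headings):
    """Magnitude of the mean unit velocity vector; lies between 0 and 1."""
    n = len(headings)
    sx = sum(math.cos(theta) for theta in headings)
    sy = sum(math.sin(theta) for theta in headings)
    return math.hypot(sx, sy) / n
```

Perfectly aligned headings give a value of one, while headings that cancel, such as two particles moving in opposite directions, give a value near zero.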

The model (9) can display three qualitatively different global behaviors, depending on parameters. We visualize snapshots of these states in

These simulations are analogous to Fig. 1 in [_{0} = 0.03, and the initial state consists of uniform random positions and headings. We vary box size ℓ and noise

Further analyzing this region, consider the times marked by the two dashed gray bars, namely

These states correspond to the dashed vertical bars in

Taken together, panels (B) and (C) of

A typical snapshot is shown in

A typical snapshot is shown in

We now return attention to the region of

For both the first and third simulations, the order parameter

Another model is that of D’Orsogna and collaborators [

The D’Orsogna model is a continuous-time dynamical system that describes the motion of interacting point particles in an unbounded plane. The model takes the form of Newtonian force equations and thus is second order in time. The equations are
_{i}(^{2} is the position of particle _{i}(^{2} is velocity. The first equation simply defines velocity as the derivative of position. The second equation is Newton’s law, stating that mass times acceleration is equal to a sum of forces. These forces include self-propulsion of strength _{r} and characteristic length scale _{r}. The second term is similar, but describes attraction of strength _{a} and characteristic length scale _{a}. Put together, these two terms are similar to potentials used in molecular physics. In biological scenarios, typically _{r} < _{a} and _{r} > _{a}, meaning that repulsion occurs over shorter distances and is stronger. For an isolated pair of particles interacting solely according to this attractive-repulsive rule, and for appropriately chosen parameters, the potential has a unique minimum, and there exists an equilibrium distance at which attraction and repulsion balance. When one deals with an ensemble of
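The force balance can be sketched as a function returning the acceleration of each particle. The functional forms are assumptions based on the usual statement of this model: a self-propulsion and drag term of the form (alpha - beta |v|^2) v, and a pairwise potential U(r) = c_r exp(-r / l_r) - c_a exp(-r / l_a). All parameter and function names are hypothetical.

```python
import math

def dorsogna_acceleration(positions, velocities, alpha, beta,
                          c_a, c_r, l_a, l_r, mass=1.0):
    """Acceleration of each particle under propulsion, drag, and pairwise forces."""
    n = len(positions)
    acc = []
    for i in range(n):
        vx, vy = velocities[i]
        speed2 = vx * vx + vy * vy
        # Assumed self-propulsion and nonlinear drag term.
        fx = (alpha - beta * speed2) * vx
        fy = (alpha - beta * speed2) * vy
        for j in range(n):
            if j == i:
                continue
            dx = positions[i][0] - positions[j][0]
            dy = positions[i][1] - positions[j][1]
            r = math.hypot(dx, dy)
            # -dU/dr for the assumed potential; positive means repulsion.
            radial = (c_r / l_r) * math.exp(-r / l_r) - (c_a / l_a) * math.exp(-r / l_a)
            fx += radial * dx / r
            fy += radial * dy / r
        acc.append((fx / mass, fy / mass))
    return acc
```

If one assumes attraction strength 0.5 and length 2 together with repulsion strength 1 and length 0.5, two particles at rest feel no net force at separation ln(8)/1.5, approximately 1.39; they repel at shorter distances and attract at longer ones. This mapping of values onto parameters is itself an assumption made for the example.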

Arguably, one of the most intriguing behaviors of the model is the formation of mills, occurring in certain parameter regimes. These structures are annular in shape, with particles rotating around a hollow core. In a single mill, all particles travel with the same orientation (clockwise or counterclockwise). In a double mill, some particles travel clockwise and some travel counterclockwise. It is helpful to think about the topology of these states. A mill and a double mill have distinct topologies in four dimensional position-velocity space. The single mill is one connected component and one topological circle, that is,

We conduct a simulation of (11) with _{a} = 0.5, _{r} = 1, _{a} = 2, _{r} = 0.5.

Circles indicate positions of the _{r} = 1, _{r} = 0.5, _{a} = 0.5, _{a} = 2.

_{abs}, defined in (_{abs} approaches unity signals that the asymptotic behavior of the group is rotational. The fact that

Snapshots of the time evolution are shown in _{abs} (blue). (B) Contour plot of Betti number

Panel (C) shows

Pulling together the information from panels (B) and (C), we conclude the following. At times below

Inspired by physics, order parameters such as polarization and angular momentum have been useful for characterizing the global behavior of biological aggregations. We propose topological data analysis as an additional, valuable technology for understanding their group behavior.

We have performed numerical simulations of two well-known mathematical models of biological aggregations, resulting in point clouds of data that evolve in time. To understand the global behavior of each model, we study the topological structure of the point clouds by calculating their persistent homology. More specifically, we compute Betti numbers, which count connected components, topological circles, trapped volumes, and so forth.

To interpret the topological computations, we introduce a new visualization tool, namely a Contour Realization Of Computed

In Vicsek’s model of aligning particles, the homological measures distinguish simulations that the usual alignment order parameter cannot. They also find topological similarity between simulations with different order parameter time series. In D’Orsogna’s model of self-propelled, attracting-repelling particles, the topological calculations recognize the presence of a double mill state. In our study we have, for tutorial purposes, sought to explain our CROCKER plots by a subsequent manual examination of the data. That said, though phenomena such as group alignment, clustering, and double mills could be seen upon detailed examination of our raw simulation data, we would not have found them by eye if the topological methods had not first detected them.

One limitation of our work is that we have only calculated the first two Betti numbers,

One attempt to address time evolution of topological features uses vineyards, which have been applied to protein folding in the context of level set persistence [

Topological data analysis is an active and growing area of current research. We hope that our work above contributes to the toolkit that applied mathematicians might bring to bear on models they study.