Skip to main content
  • Loading metrics

Topological data analysis distinguishes parameter regimes in the Anderson-Chaplain model of angiogenesis

  • John T. Nardini,

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Mathematics, North Carolina State University, Raleigh, North Carolina, United States of America

  • Bernadette J. Stolz,

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Mathematical Institute, University of Oxford, Oxford, United Kingdom

  • Kevin B. Flores,

    Roles Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Mathematics, North Carolina State University, Raleigh, North Carolina, United States of America

  • Heather A. Harrington,

    Roles Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliations Mathematical Institute, University of Oxford, Oxford, United Kingdom, Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom

  • Helen M. Byrne

    Roles Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Mathematical Institute, University of Oxford, Oxford, United Kingdom


Angiogenesis is the process by which blood vessels form from pre-existing vessels. It plays a key role in many biological processes, including embryonic development and wound healing, and contributes to many diseases including cancer and rheumatoid arthritis. The structure of the resulting vessel networks determines their ability to deliver nutrients and remove waste products from biological tissues. Here we simulate the Anderson-Chaplain model of angiogenesis at different parameter values and quantify the vessel architectures of the resulting synthetic data. Specifically, we propose a topological data analysis (TDA) pipeline for systematic analysis of the model. TDA is a vibrant and relatively new field of computational mathematics for studying the shape of data. We compute topological and standard descriptors of model simulations generated by different parameter values. We show that TDA of model simulation data stratifies parameter space into regions with similar vessel morphology. The methodologies proposed here are widely applicable to other synthetic and experimental data including wound healing, development, and plant biology.

Author summary

Vascular networks play a key role in many physiological processes, by delivering nutrition to, and removing waste from, biological tissues. In cancer, tumors stimulate the growth of new blood vessels, via a process called angiogenesis. The resulting vascular structure comprises many inter-connected vessels which lead to emergent morphologies that influence the rate of tumor growth and treatment efficacy. In this work, we consider several approaches to summarize the morphology of synthetic vascular networks generated from a mathematical model of tumor-induced angiogenesis. We find that a topological approach can be used quantify vascular morphology of model simulations and group the simulations into biologically interpretable clusters. This methodology may be useful for the diagnosis of abnormal blood vessel networks and quantifying the efficacy of vascular-targeting treatments.


Blood vessels deliver nutrients to, and remove waste products from, tissues during many physiological processes, including embryonic development, wound healing, and cancer [13]. The structure and form of connected blood vessels (e.g., vascular morphology), determine how nutrients and waste are supplied to or removed from the environment and, in turn, influence the behavior of the tissue and its constituent cells. The morphology of a vascular network can reveal the presence of an underlying disease, or predict the response of a patient to treatment. High resolution vascular imaging technology creates an exciting opportunity to develop mathematical tools to discover new links between blood vessel structure and function.

Here, we focus on tumor-induced angiogenesis, the process by which tumor cells stimulate the formation of new blood vessels from pre-existing vasculature [1]. When oxygen and nutrient levels within a population of tumor cells become too low to sustain a viable cell population, the tumor cells produce several growth factors, including vascular endothelial growth factor (VEGF), platelet-derived growth factor (PDGF), and basic fibroblast growth factor (bFGF), which diffuse through the surrounding tissue [46]. On contact with neighboring blood vessels, these tumor angiogenesis factors (or TAFs) increase the permeability of the vessel walls and loosen adhesive bonds between the endothelial cells that line the blood vessels [7]. The TAFs activate the endothelial cells to release proteases that degrade the basement membrane [8]. Activated tip endothelial cells then migrate away from the parent vessel and follow external cues, such as spatial gradients of VEGF and/or fibronectin [9, 10]. Stalk endothelial cells located behind the tip cells proliferate into the surrounding tissue matrix, causing the emerging vessel sprouts to elongate [11]. When tip endothelial cells from one sprout come into contact with tip or stalk endothelial cells from another sprout, the two endothelial cells may fuse or anastomose together, forming a new loop through which oxygen and nutrients may be transported [9]. In Fig 1 we present a schematic model of how these processes are coordinated.

Fig 1. Schematic overview of tumor-induced angiogenesis.

Schematic depicting seven aspects angiogenesis that are incorporated in the Anderson-Chaplain model of tumor-induced angiogenesis [17]. (1) Distant tumor cells release a range of chemoattractants, including vascular endothelial growth factors and basic fibroblast growth factors that stimulate the formation of new blood vessels. These growth factors are described collectively as a single, generic tumor angiogenesis factor (TAF). (2) Production and consumption of tissue-matrix bound fibronectin by the endothelial cells creates a spatial gradient of fibronectin across the domain. (3) Endothelial cells sense the TAF and fibronectin gradients and undergo individual cell migration. (4) As endothelial tip cells migrate, via chemotaxis, up spatial gradients of the TAF, stalk cells in the developing vessels are assumed to proliferate, creating what has been termed a “snail trail” of new endothelial cells. (5) Endothelial tip cells also migrate, via haptotaxis, up spatial gradients of fibronectin. (6) Endothelial cells in existing sprouts may initiate the formation of secondary sprouts. (7) If a sprout coincides with an existing vessel, then it is assumed to be annihilated and a new loop is formed; if two sprouts coincide or anastomose, then both are assumed to be annihilated and a new loop forms.

As the number of experimental and theoretical studies of angiogenesis has increased, our knowledge of the mechanisms involved in its regulation has increased beyond the idealized description given above [12, 13]. For example, subcellular signalling involving the VEGF and delta-notch signalling pathways is known to influence whether an endothelial cell adopts a tip or stalk cell phenotype [14]. Further, immune cells, including macrophages that are present in the tumor microenvironment, are known to release TAFs such as VEGF and matrix degrading proteases which facilitate tumor angiogenesis [12]. Mechanical stimuli also influence endothelial cell migration. For example, the fluid shear stress experienced by endothelial cells lining blood vessels drives their migration by causing them to form lamellipodia and focal adhesions at the front of the cell, in the flow direction, and to retract focal adhesions at the rear [15, 16].

In this paper, we explore several approaches for quantifying the morphology of vascular networks that form during angiogenesis, including recently developed techniques from topological data analysis. We develop and test our methodology through application to simulated data from a highly idealized, and well known hybrid (i.e., discrete and continuous) stochastic mathematical model of tumor-induced angiogenesis proposed by Anderson and Chaplain [17] (see Fig 1 for a model schematic).

The Anderson-Chaplain model of tumor angiogenesis is based on continuum models proposed by Balding and McElwain [18] and Byrne and Chaplain [19], as well as the stochastic model proposed by Stokes and Lauffenberger [20]. It is an on-lattice model in which endothelial cells perform a biased random walk and generate blood vessel networks that resemble those observed experimentally [17]. While more detailed theoretical models have been developed (see reviews [2124]), we focus on the Anderson-Chaplain model due to its simplicity and wide adoption in the mathematical biology literature. The methods presented in this work can also be used to study alternative models of angiogenesis, such as the phase-field model presented in [25], the 2D model of early angiogenesis and cell fate specification in [26], the 3D hybrid model presented in [27], and multiscale models of vascular tumor growth, such as those presented in [2831].

The Anderson-Chaplain model describes the spatio-temporal evolution of three physical variables: endothelial tip cells, tumor angiogenesis factor (TAF), and fibronectin [17]. Fig 1 contains a model schematic where tumor cells located along the right boundary produce TAF and the endothelial tip cells from a nearby healthy vessel placed along the left boundary move via chemotaxis up spatial gradients of TAF. As the tip cells move, stalk cells located behind them proliferate, creating what has been termed a “snail trail” of new blood vessel segments [21]. Endothelial tip cells also migrate via haptotaxis, up spatial gradients of fibronectin, which is produced and consumed by the migrating endothelial cells and binds to the tissue matrix in which the cells are embedded. As the tip cells migrate and the stalk cells proliferate, the vessel sprouts elongate, and secondary sprouts may emerge from the vessels, at the end of a sprout or laterally from the nascent vessel. Further, when a tip cell collides with another vessel segment or another tip cell, the two cells may fuse, forming a closed loop or anastomosis.

We would like a quantitative method to understand and compare vessel networks that is applicable to experimental and synthetic data. Standard quantitative descriptors of vascular morphology proposed by [27, 3235] include inter-vessel spacing, the number of branch points, measuring the total area (volume) covered for 2D (3D) simulations, vessel length, density, tortuosity, the number of vessel segments, and the number of endothelial sprouts. These metrics correlate with tumor progression and treatment response in experimental datasets [36]; however, they are computed at a fixed spatial scale, and neglect information about the connectedness (i.e., the topology) of vascular structures. Therefore, standard descriptors do not exploit the full information content of synthetic data generated from the models. At the same time, advances in imaging technology mean that it is now possible to visualize vascular architectures across multiple spatial scales [37]. These imaging developments are creating a need for new methods that can quantify the multi-scale patterns in vascular networks, and how they change over time.

A relatively new and expanding field of computational mathematics for studying the shape of data at multiple scales is TDA. TDA consists of algorithmic methods for identifying global structures in high-dimensional datasets that may be noisy [38, 39]. TDA has been useful for biomedical applications [4043], including brain vessel MRI data [44] and experimental data of tumor blood vessel networks [45, 46]. A commonly-used tool from TDA is persistent homology (PH). Homology refers to the characterization of topological features (e.g., connected components and loops) in a dataset, and persistence refers to the extent to which these features persist within the data as a scale parameter varies. PH is a useful way to summarize the topological characteristics of spatial network data [4547] as well as noisy and high dimensional agent-based models (ABMs) [48, 49]. More recently, PH has been combined with machine learning to extract accurate estimates of parameters from such models [50]. Most of these studies focus on PH calculations for point clouds of data; notably, PH has been adapted to analyze networks and grayscale images [51]. An active area of research is exploring different filtrations for application-driven PH.

In this paper, we develop a computational pipeline for analyzing simulations of the Anderson-Chaplain model over a range of biologically-relevant parameter values. Our main objectives are to identify model descriptor vectors that group together similar model simulations into biologically interpretable clusters. By biologically interpretable, we mean that simulations within a cluster arise from similar parameter values and simulations in different clusters arise from different parameter values.

We use two topological approaches, the sweeping plane and flooding filtrations, to construct simplicial complexes from binary image data generated from simulations of the Anderson-Chaplain model. We show that PH of the sweeping plane filtration and its subsequent vectorization provides an interpretable descriptor vector for the model parameters governing chemotaxis and haptotaxis. We show, by comparison with existing morphologically-motivated standard descriptor vectors, that this topological approach leads to more biologically interpretable clusters that stratify the haptotaxis-chemotaxis parameter space. Furthermore, the clusters generated from the sweeping plane filtration are robust and generalize well to unseen model simulations.

Materials and methods

We simulate the Anderson-Chaplain model of tumor-induced angiogenesis for a range of values of the haptotaxis and chemotaxis coefficients, ρ and χ. Each simulation generates vasculature composed of a collection of interconnected blood vessels in the form of binary image data. We compute three so-called standard descriptor vectors to summarize each simulation: two are computed at 50 time steps and the third is computed at the final time step. We also compute 30 topological descriptor vectors. These topological descriptors encode the proposed multiscale quantification of the vascular morphology. We calculate standard and topological descriptor vectors for the entire set of 1,210 simulations. We then analyze the simulations by clustering the standard or topological descriptor vectors and compare the results. Fig 2 shows the pipeline of data generation and analysis. We give details of the Anderson-Chaplain model, including governing equations, parameter values and implementation, in the S1 Appendix. The Python files and notebooks used for all steps of our analysis are publicly available at

Fig 2. Data generation and analysis pipeline.

(1) Spatio-temporal modeling: Anderson-Chaplain model. The Anderson-Chaplain model is simulated for 11 × 11 = 121 different values of the haptotaxis and chemotaxis parameters, (ρ, χ). The model output is saved as a binary image, where pixels labeled 1 or 0 denote the presence or absence, respectively, of blood vessels. We generate 10 realizations for each of the 121 parameter combinations, leading to 1,210 images of simulated vessel networks. (2) Data analysis. We use the binary images from Step 1 to generate standard and topological descriptor vectors A.) Standard descriptor vectors We compute the number of active tip cells and the number of vessel segments at discrete time points. We also compute the length of the vessels at the final time point [32, 34, 35, 52]. B.) Topological descriptor vectors. We construct flooding and sweeping plane filtrations [44] using binary images from the final timepoint of each simulation. We track the birth and death of topological features (connected components and loops) that are summarized as Betti curves and persistence images [52]. (3) Data clustering. We perform k-means clustering using either the standard or topological descriptor vectors computed during step 2 from all 1,210 simulations. In this way, we decompose (ρ, χ) parameter space into regions that group vessel networks with similar morphologies.

Data generation

We simulate the Anderson-Chaplain model as the parameters ρ and χ are each independently varied among the following 11 values: {0, 0.05, 0.1, 0.15, …, 0.5}. The parameter bounds of [0, 0.5] were chosen to yield a range of vascular architectures that are visibly distinct and recapitulate vascular patterns seen in vivo. The resolution of the parameter mesh (11 discrete values between 0 and 0.5) was chosen for computational tractability while still being sufficient to illustrate how TDA methods can detect and quantify differences between simulations from similar parameter values that are difficult to distinguish visually. We generate 10 realizations of the model for each of the 121 (ρ, χ) parameter combinations, and produce 1,210 binary images that summarize the different synthetic blood vessel networks. Each simulation is initialized using the same initial conditions, with all other model parameters fixed at the baseline values used in [17], and no-flux conditions imposed on the domain boundary. For consistency with [17], all simulations are computed on lattices of size 201 × 201. Each simulation continues until either an endothelial tip cell crosses the line x = 0.95 (in dimensionless spatial units) or the maximum simulation time, t = 20 (in dimensionless time units), is exceeded. The final model output is a 201 × 201 binary image in which nonzero (zero) pixels denote the presence (absence) of an endothelial cell at that lattice site.

Morphologically-motivated standard descriptors and descriptor vectors.

Morphologically-motivated standard descriptors are used to summarize physical characteristics of each model simulation. Commonly used descriptors include inter-vessel spacing, the number of branch points, the total area (volume) spanned by the network for 2D (3D) simulations, the shape of such loops in 2D (voids in 3D), and tortuosity [27, 3235, 53]. We choose to use the following standard descriptors due to their prevalence in the literature: the number of vessel segments (NS), the number of endothelial tip cells (NT), and the length of the vessel that has penetrated furthest from the parent vessel (L) [18, 32, 34, 35]. In Fig 3 we use idealized networks to indicate how the standard descriptors are defined. As described in S1 Appendix, NS and NT both increase by one with each branching event, and NT decreases by one with each anastomosis event. We record NS and NT at 50 time steps to create the descriptor vectors. To facilitate downstream clustering analysis, we require vectors of the same length for all simulations. To ensure the dynamic descriptor vectors for each model simulation are of length 50, we record NS(t) and NT(t) at the 50 time steps , where ti = (i − 1)Δt, Δt = tend/49, and tend denotes the duration of a given simulation. We remark that the requirement for having vectors of the same length for all simulations might not be feasible when dealing with experimental data, which could include heterogeneities in sample size and time point locations. We note further that this requirement only applies to the standard descriptors; for the topological filtrations considered in future sections, only the final segmented vessel image is needed. For each simulation, we also compute L(tend), which we define as the x-coordinate of the vessel segment location which has the greatest horizontal distance from the y-axis. Thus, the standard descriptor vectors we use for subsequent clustering analysis are NS(t), NT(t) and L(tend).

Fig 3. Standard descriptor overview.

Standard descriptors are computed for each model simulation include the number of vessel segments (NS, depicted by each colored solid curve), the number of endothelial tip cells (NT, depicted by black dots), and the x-coordinate of the vessel segment location with the greatest horizontal distance from the y-axis (L, depicted by the vertical grayed dashed line). This schematic example is initialized with 2 vessel segments and 2 tip cells at locations (1) and (2). A branching event at location (3) increases the value of both NS and NT by one. An anastomosis event at location (4) decreases NT by one. The branching event at location (5) increases both NS and NT by one.

Quantifying vessel shape using topological data analysis

The most prominent method from TDA is persistent homology (PH) [38, 51, 54]. PH is rooted in the mathematical concept of homology, which captures the characteristics of shapes. To compute (persistent) homology from data, one needs to first construct simplicial complexes, which can be thought of as collections of generalized triangles. From the constructed simplicial complexes, one can quantify and visualize the connected components (dimension 0) and loops (dimension 1) of a dataset at different spatial scales. While we compute the NT and NS standard descriptor vectors at 50 time points for each model simulation (see Section Morphologically-motivated standard descriptors and descriptor vectors), we compute topological descriptor vectors from the final output binary image.

Simplicial complexes and homology.

We construct simplicial complexes from data points derived from binary images of the completed simulations of the Anderson-Chaplain model (see“Synthetic data” in Step 1 of Fig 2, where red pixels denote presence of vasculature). Normally, TDA of image data lends itself to the construction of cubical complexes, which are generalizations of cubes; however, we have binary rather than grayscale data, so we instead use information about the model generation and Moore neighborhoods as we describe here. Each pixel that has a value of one is embedded in at the centroid of the pixel as a vertex (or 0-simplex), as demonstrated in Fig 4. We term the resulting collection of points a point cloud. We connect two points by an edge (or 1-simplex) if they are within each others’ Moore neighborhood (the Moore neighborhood for one pixel is shaded in Fig 4B). If three points are pairwise connected by an edge, then we connect them with a filled-in triangle (or 2-simplex). The union of 0-, 1-, and 2-simplices form a simplicial complex, as shown in Fig 4C.

Fig 4. Converting a binary image into a simplicial complex.

A) Schematic binary image. B) Point cloud for all nonzero pixels in the binary image. The Moore neighborhood of one nonzero pixel is highlighted in salmon. C) A simplicial complex constructed from the point cloud in panel B.

To compute topological invariants, such as connected components (dimension 0) and loops (dimension 1), we use homology. To obtain homology from a simplicial complex, K, we construct vector spaces whose bases are the 0-simplices, 1-simplices, and 2-simplices, respectively, of K. There is a linear map between 2-simplices and 1-simplices, called the boundary map ∂2, which sends triangles to their edges. Similarly, the boundary map ∂1 sends edges to their end points and ∂0 sends points to 0. The action of the boundary map ∂1 on the simplices is stored in a binary matrix where the entry ai,j indicates whether the i-th 0-simplex forms part of the boundary of the j-th 1-simplex. If so, then ai,j = 1; otherwise, ai,j = 0. One can compute the kernel Ker(⋅) and image Im(⋅) of the boundary maps to obtain the vector spaces H0(K) = Ker(∂0)/Im(∂1) and H1(K) = Ker(∂1)/Im(∂2). These vector spaces are also referred to as homology groups and their dimensions define topological invariants called the Betti numbers of K, β0 and β1, which quantify the numbers of connected components and loops, respectively.

Filtrations for simulated vasculature.

There are different ways to study vascular data at multiple scales [4446, 55]. We encode the multiple spatial scales of the data using a filtration, which is a sequence of embedded simplicial complexes K0K1 ⊆ … ⊆ Kend built from the data. A challenge for researchers is to determine informative filtrations for specific applications [56], such as blood vessel development in our study. Here, the input data is the binary image from the final timepoint of the Anderson-Chaplain model, which we call N (See Eq (13) in S1 Appendix). We construct sequences of binary images that correspond to different filtered simplicial complexes: a sweeping plane filtration [44] and a flooding filtration [55]. We remark that both filtrations can be considered sublevelset filtrations corresponding to a height function on some simplicial complex X (or just on the vertices/pixels and considering the simplicial subcomplex spanned by them).

Sweeping plane filtration. In the sweeping plane filtration, we move a line across the binary image N and include pixels in the filtration at discrete steps once a pixel is crossed by the line. On every filtration step, we move the line by a fixed number of pixels in a chosen direction. In Fig 5, we illustrate the method when a vertical line moves from left to right (LTR). From the corresponding sequence of binary images, we construct a filtered simplicial complex KLTR = {K1, K2, …, Kend}. Similarly, we construct the filtered simplicial complexes KRTL when the vertical line moves from right to left (RTL), KTTB when a horizontal line moves from top to bottom (TTB), and KBTT when a horizontal line moves from bottom to top (BTT).

Fig 5. The LTR sweeping plane filtration for a binary image.

A) Sample binary image from a simulation of the Anderson-Chaplain model. B-E) Point clouds used for each iteration of the sweeping plane approach. On each step, only pixels located to the left of the plane (denoted here with a blue line) are included; gray pixels are ignored. F-I) Filtration steps constructed from the point clouds in panels B-E.

Flooding filtration. Our second filtration is the flooding filtration. From the binary image, N, associated with a model simulation, we construct a sequence of binary images in the following way. On the first step, we input the output binary image from a model simulation. In the binary image, pixels with value one denote the presence of an endothelial cell and pixels with value zero denote the absence of endothelial cells. To create the second binary image, we consider all pixels of value one and assign all pixels in their Moore neighborhood (as shown in Fig 4B) to value one. We repeat this process until a binary image with only pixels of value one is created. In Fig 6B, for example, the red pixels denote the nonzero pixels from the initial binary image, the orange pixels denote the nonzero pixels added on the first round of flooding, and the yellow pixels denote the nonzero pixels added on the second round of flooding. Each “round” is called a step. From the corresponding sequence of binary images, we construct a filtered simplicial complex Kflood = {K1, K2, …, Kend}. We choose Kend = 25 so that the Betti curves for all simulations attain β0 = 1 and β1 = 0 at the final filtration step, thus ensuring that the maximum amount of topological information is encoded in the Betti curves.

Fig 6. The flooding filtration for a binary image.

A) Schematic binary image. B) Grayscale image of the original binary image. Red pixels are nonzero in the initial image, orange pixels become nonzero after one round of flooding, and yellow pixels become nonzero after two rounds of flooding. C-E) We performed three filtrations by dilating the image twice. During each filtration, the eight neighboring pixels of all nonzero pixels (ie pixels not marked as white) become nonzero. Red pixels are nonzero in the original image, orange pixels become nonzero on the second filtration (i.e., after one step of flooding), and yellow pixels become nonzero on the third filtration (i.e., after two steps of flooding). F-H) Simplicial complexes associated with the filtrations presented in panels C-E.

Persistent homology (PH).

PH is an algorithm that takes in data via a filtration and outputs a quantification of topological features such as connected components (dimension 0) and loops (dimension 1) across the filtration. The simplicial complexes are indexed by the scale parameter of the filtration. In our case, the scale parameter corresponds to the spatial location in the sweeping plane filtration or steps of flooding in the flooding filtration. Note that we compute these filtrations on simulated blood vessel network images at the final time step of a model simulation. The inclusion of a simplicial complex KiKj for ij gives a relationship between the corresponding homology groups Hp(Ki) and Hp(Kj) for p = 0, 1. This relationship enables us to track topological features such as loops along the simplicial complexes in the filtration. Intuitively, a topological feature is born in filtration step b when it is first computed as part of the homology group Hp(Kb) and dies in filtration step d when that feature no longer exists in Hp(Kd), i.e., when a connected component merges with another component or when a loop is covered by 2-simplices. The output from PH is a multiset of intervals [b, d) which quantifies the persistence of topological features. Each topological feature is said to persist for the scale db in the filtration. Our topological descriptors are the connected components (i.e., 0-dimensional) and loops (i.e., 1-dimensional features) obtained by computing persistent homology of the sweeping plane and flooding filtrations. We use the superscript notation Kv to denote the filtered simplicial complex K in direction v = {LTR, RTL, TTB, BTT, flood}.

Topological descriptor vectors from PH.

We use PH to generate different output vectors.

  • Betti curves. Betti curves [57] show the Betti numbers on each filtration step. Let βi(Kv) denote the Betti curves of filtered simplicial complex K in direction v for dimensions i = {0, 1}.
  • Persistence images. A persistence image [52] uses as input the birth-death pairs given by PH and converts the set of (birth b, persistence (db)) coordinates into a vector, a format which is suitable for machine learning and other classification tasks. Following the standard definition of a persistence image, the coordinates (b, db) are blurred by a Gaussian, with standard deviation σ, that is centered about each birth-persistence point, which accounts for uncertainty [52]. We take a weighted sum of the Gaussians and place a grid on this surface to create a vector. We compute two alternative weighting strategies: all ones (in which case all features are equally weighted) and a persistence weight of (db) (in which more persistent features are given larger weights). Let PIOi(Kv) and PIRi(Kv) denote the corresponding persistence images computed from Kv with equal and ramped weighting, respectively, for dimension i = {0, 1}.

The topological descriptor vectors that we compute for each model simulation are βi(Kv), PIOi(Kv), and PIRi(Kv), where i = {0, 1} and v = {LTR, RTL, TTB, BTT, flood}. In total each simulation has 3 × 2 × 5 = 30 topological descriptor vectors.

Simulation clustering

We use an unsupervised algorithm, k-means clustering, to group the quantitative descriptor vectors of the synthetic data generated from simulations of the Anderson-Chaplain model. To cluster n samples into k clusters, the k-means algorithm finds k points in , clusters each sample according to which of the k means it is closest to, and then minimizes the within-cluster residual sum of squares distance [58]. We cluster the standard descriptor vectors and the topological descriptor vectors (see steps 2A and 2B of the methods pipeline in Fig 2), respectively, to investigate whether descriptor vectors lead to biologically interpretable clusters.

We test the robustness of each clustering assignment by randomly splitting the descriptor vectors into training and testing sets, and use the testing set to evaluate out-of-sample (OOS) accuracy. Specifically, 7 of the 10 descriptor vectors from each of the 121 (ρ, χ) parameter combinations are randomly chosen with uniform sampling and placed into a training set (847 simulations); the remaining 3 descriptor vectors from each (ρ, χ) combination are placed into a testing set (363 simulations). After performing unsupervised clustering with a k-means model on the training set, each (ρ, χ) combination is labeled by the majority of cluster assignments among its 7 training simulation descriptor vectors. To evaluate each clustering model’s OOS accuracy, the 3 test data samples for each (ρ, χ) parameter combination are given a “ground-truth” label equal to the cluster assignment for the 7 training simulations for that same (ρ, χ) parameter combination. A “predicted” label is calculated for each descriptor vector from the test data using the trained k-means model, and OOS accuracy is defined as the proportion of test simulations for which the “predicted” label matches the “ground-truth” label. For example, if the seven training simulations from the parameter combination (ρ, χ) = (0.2, 0.2) are placed into clusters {1, 1, 2, 2, 1, 1, 1}, then the three testing simulations from (ρ, χ) = (0.2, 0.2) will be labeled as being in cluster 1. If the test simulations are predicted to be from clusters {1, 2, 1}, then we compute an OOS accuracy of 66.67%. We use all 363 testing simulations when computing OOS accuracy and use the KMeans command from scikit-learn (version 0.22.1), which uses Lloyd’s algorithm for Euclidean space, for training and prediction.


We simulated the Anderson-Chaplain model for 121 different values of the chemotaxis and haptotaxis coefficients (ρ, χ) and performed 10 model realizations for each parameter pair. We observed that simulations generated by different parameter pairs may appear visually distinct in their vessel growth patterns. To quantify and summarize the morphologies, we computed dynamic standard descriptor vectors as well as topological descriptor vectors for the set of 1,210 model simulations. We then performed unsupervised clustering using either standard or topological descriptor vectors to partition the (ρ, χ) parameter space.

As we detail here, we found that the topological sweeping plane descriptor vectors offered a multi-scale analysis that led to biologically interpretable and robust clusterings of the (ρ, χ) parameter space, whereas the clusterings of the standard descriptor vectors were not biologically interpretable. The topological analysis succeeded in discriminating fine-grained vascular structures and the parameter pairs that generated them.

Haptotaxis and chemotaxis alter vessel morphology

We investigated the influence of haptotaxis and chemotaxis by varying the parameters ρ and χ We note that similar parameter sensitivity analyses have been performed previously [17, 22]; we reproduce this analysis as the basis for our data analysis. Recall that in model simulations, the vessels emerge from the left-most boundary and are recruited towards a tumor located on the right boundary. When chemotaxis alone drives endothelial tip cell movement (ρ = 0.00, χ = 0.50), the blood vessel segments are long and thin as the tip cells migrate directly towards the tumor and produce individual vessel segments with few anastomosis events (Fig 7A). When haptotaxis drives tip cell movement (ρ = 0.50, χ = 0.00), the vessels fail to reach the tumor (Fig 7B). When both haptotaxis and chemotaxis are active (ρ = χ = 0.15), the network spans a large portion of the spatial domain and is highly connected (Fig 7C).

Fig 7. Typical realizations of the Anderson-Chaplain model.

A) Chemotaxis-driven tip cell movement (ρ = 0.00, χ = 0.50), B) haptotaxis-driven tip cell movemement (ρ = 0.50, χ = 0.00), and C) tip cell movement driven by chemotaxis and haptotaxis (ρ = 0.15, χ = 0.15).

Standard and topological descriptors measure distinct aspects of vascular architectures

Rather than performing visual inspection of all model simulations, we computed and compared quantitative vessel descriptor vectors using data science approaches as described in the Materials and Methods. We demonstrate here how the biological and topological descriptors are distinct from each other. In Fig 8, we observe how the number of tip cells (NT), vessel segments (NS), connected components (β0), and loops (β1) associated with an evolving vasculature change following different events. After each anastomosis event, NT decreases by one (Fig 8B). After a branching event, both NT and NS increase by one (Fig 8C). We conclude that NT is unchanged and NS increases by one after one branching event and one anastomosis event have occurred. By contrast, the values of the topological descriptors before and after these events are not necessarily identical, as depicted in Fig 8D and 8E. When an anastomosis event involves vessel segments that were previously connected, the number of loops, β1, increases by 1 (compare Fig 8A and 8D); when the vessel segments were not previously connected, β0 decreases by 1 (compare Fig 8A–8E).

Fig 8. Standard and topological descriptors describe distinct aspects of vessel morphology.

For clarity, each vessel segment is colored differently, and black dots represent active tip cells. A) The initial vessel configuration. B) A schematic showing how the vascular architecture changes after an anastomosis event. C) A schematic showing how the vascular architecture changes after a branching event. D-E) Schematics showing two ways in which vasculature can change after one anastomosis event and one branching event. F) A schematic showing how the vessels can change after one branching and two anastomosis events. (Key: β0, number of connected components; β1, number of loops; NT, number of tip cells; NS, number of vessel segments).

Let NA(r, s) and NB(r, s) denote the numbers of anastomosis and branching events, respectively, that occur between times t = r and t = s. We computed NT(t), NS(t) vectors of the standard descriptors of evolving blood vessel simulations, as follows:

The values of NT(t1) and NS(t1) were determined from the models’ initial conditions.

We first constructed the filtrations Kv where v = {LTR, RTL, TTB, BTT, flood} of each simulation’s final binary image. Then we computed connected components and loops, which we call topological descriptors of Kv, using PH. The topological descriptor vectors, Betti curves and persistence images, were then constructed for these five filtrations.

Quantification of blood vessel architecture data

Fig 9 shows the standard descriptor vectors for the three simulations presented in Fig 7. For the haptotaxis-driven simulation growth in the horizontal direction halts (the total length remains at 0.095 units) and the numbers of tip cells and vessel segments stay constant (at or near the initial condition of 9 vessels). By contrast, the number of tip cells and vessel segments increase over time for the chemotaxis-driven simulation, with a more marked increase when both chemotaxis and haptotaxis are active. For both chemotaxis-active simulations, the vessels extend to the maximum horizontal distance.

Fig 9. The standard descriptors used to summarize model simulations.

We show how the following descriptors change over 50 time steps for the three simulations presented in Fig 7: A) the number of tip cells, NT(t), B) the number of vessel segments, NS(t), and C) the final vessel length, L(tend).

We next illustrate how we use the plane sweeping and flooding filtrations to analyze the chemotaxis-driven, haptotaxis-driven, and chemotaxis and haptotaxis-driven simulations (from Fig 7). We computed Betti curves, β0(Kv) and β1(Kv), of the simulated data with sweeping plane filtrations for v = {LTR, RTL, TTB, BTT} (see Fig 10) and a flooding filtration v = {flood} (see Fig 11). These filtrations are constructed to provide complementary but not directly comparable information; roughly, the sweeping plane gives an indication of the network in the (x-y) plane whereas the flooding filtration also focusses on vessel density (see earlier description of filtration construction). We could interpret the PH of the sweeping plane filtration direction as follows: the blood vessels grow primarily from left to right, so the LTR and RTL filtrations primarily identify dynamic changes in branching and anastomosis of vessels. The BTT and TTB filtrations are similar to each other, and reflect only horizontal changes, which could be interpreted as measuring bendiness or tortuosity. Therefore, we only discuss the BTT results in this section.

Fig 10. TDA sweeping plane descriptor vectors.

A) The sweeping plane directions are left-to-right (LTR, gray); right-to-left (RTL, orange); bottom-to-top (BTT, cyan); and top-to-bottom (TTB, green). For these four v = iTj filtrations, we started with points on the ith boundary and included more points as we stepped towards the jth boundary. Repeating this process produced the spatial filtration Kv = K0, K1Kend for v = {LTR, RTL, TTB, BTT}. B-D) We formed the Betti curves β0(Kv) and β1(Kv) by computing the Betti numbers along each step of Kv for the three blood vessel simulations presented in Fig 7; B) the chemotaxis-driven simulation, C) the haptotaxis-driven simulation and D) chemotaxis and haptotaxis simulation.

Fig 11. The flooding filtration.

A,C,E) Illustrate the flooding filtrations for the three blood vessel simulations from Fig 7. Dark red pixels were points included in the first steps of the filtration and yellow pixels were the points included in later filtration steps. White pixels were not included after 24 steps of the flooding filtration. B,D,F) The Betti Curves (β0(Kflood) and β1(Kflood)) for the chemotaxis-driven, haptotaxis-driven, and chemotaxis and haptotaxis simulations respectively.

For the chemotaxis-driven simulation (Figs 10B, 11A and 11B), the β0(KLTR) descriptor vector slightly decreases with distance x from the y-axis as vessel segments anastomose together. The β0(KRTL) and β1(KRTL) descriptor vectors decrease with x at the end of each vessel segment. The magnitude of β0(KBTT) increases almost linearly with the y-spatial coordinate, and β1(KBTT) increases with y until it plateaus for y ≥ 0.6. The β0(Kflood) descriptor vector decreases with each filtration step, and β1(Kflood) begins with β1 = 14 and achieves a local maximum of β1 = 10 after seven filtration steps before reaching β1 = 0 after 11 filtration steps.

For the haptotaxis-driven simulation (Figs 10C, 11C and 11D), all LTR and RTL Betti curves become zero for x > 0.1 because each vessel segment has terminated by x = 0.1. The β0(KBTT) and β1(KBTT) descriptor vectors increase steadily with y as the horizontal plane moves past each individual vessel segment. The flooding descriptor vectors initially decrease with each flooding step and remain fixed at β0 = 1, β1 = 0 after six steps of flooding.

For the chemotaxis and haptotaxis-driven simulation (Figs 10D, 11E and 11F), β0(KLTR) decreases with x as vessel segments anastomose together, while β1(KLTR) increases with x due to the formation of many loops. The β0(KRTL) descriptor vector increases with x until it reaches its maximum value of β0 = 11 at x = 0.85 and then decreases to β0 = 0. The β1(KRTL) descriptor vector decreases with x. The β0(KBTT) descriptor vector periodically increases and decreases with y as the horizontal plane moves past regions of high and low connectivity. The β1(KBTT) descriptor vector steadily increases with the y coordinate. The β0(Kflood) descriptor vector is equal to one for all steps of flooding, and β1(Kflood) decreases with each round of flooding and reaches β1 = 0 after 20 flooding steps.

Sweeping plane descriptor vectors cluster simulations by parameter values

We applied k-means clustering separately to the standard descriptor vectors and the topological descriptor vectors. None of the clusterings of the standard descriptor vectors were biologically interpretable in (ρ, χ) parameter space (see Fig 12 and S1 Fig). While the topological descriptor vectors from the sweeping plane filtration gave more biologically interpretable groupings for combined haptotaxis and chemotaxis parameter values, they were not robust for haptotaxis dominated or chemotaxis dominated (ρ, χ) parameter ranges (see S2 Fig). The flooding filtration did not give biologically interpretable groups for any parameter ranges (see Clustering results in S1 Appendix for all details). We suggest that the LTR and RTL plane sweeping filtrations produce the most interpretable clusterings because sweeping the plane horizontally resembles the way in which endothelial tip cells migrate from left to right in the simulated data.

Fig 12. Partitioning (ρ, χ) parameter space into distinct regions using standard biological descriptor vectors.

We applied k-means clustering, with k = 5, to standard biological descriptor vectors. The resulting partitions are presented for A) NT(t), B) NS(t), C) L(tend), and D) concatenation of all three descriptor vectors. The five clusters are ordered according to the mean χ value within the cluster.

No single descriptor vector was able to robustly cluster simulations into biologically interpretable groups. However, double descriptor vectors, created by concatenating two sweeping plane topological descriptor vectors, produced clusterings in (ρ, χ) parameter space that were biologically interpretable and robust. As an example of a double descriptor vector, “PIO0(KLTR) & PIO1(KLTR)” denotes the topological descriptor vector created by concatenating the PIO0(KLTR) and PIO1(KLTR) descriptor vectors. We note that doubles of standard descriptor vectors or topological flooding descriptor vectors were unable to produce biologically interpretable and robust clusterings in (ρ, χ) parameter space (see Clustering results in S1 Appendix).

We computed a total of 2 × 4 × 3 = 24 individual sweeping plane descriptor vectors for each model simulation: each descriptor vector counted connected components or loops (β0, β1), swept a plane across the domain in one of four directions (LTR, RTL, TTB, and BTT), and used one of the three PH summaries (BC, PIO, PIR). We computed all double sweeping plane descriptor vectors and clustered all 276 combinations in (ρ, χ) parameter space. The 15 double descriptor vectors with the highest OOS accuracies are presented in Table 1; the four with the highest OOS accuracies are listed below:

  1. PIO0(KLTR) & PIO0(KRTL) (93.1% OOS accuracy),
  2. PIO0(KRTL) & PIR1(KLTR) (92.6% OOS accuracy),
  3. PIR0(KTTB) & PIR1(KLTR) (91.5% OOS accuracy),
  4. PIR0(KBTT) & PIR1(KLTR) (91.5% OOS accuracy).
Table 1. Summary of the 15 sweeping plane double descriptor vectors that achieved the highest OOS accuracy scores.

In Fig 13 and S3 Fig, we show how the data clustered in (ρ, χ) parameter space when the top four double sweeping plane descriptor vectors were used. The “PIO0(KLTR) & PIO0(KRTL)” and “PIO0(KRTL) & PIR1(KLTR)” double descriptor vectors produced five connected clustering regions and, thus, attained high OOS accuracy and biologically interpretable clusterings. In contrast, the clusterings generated by the following double descriptor vectors (“PIR0(KTTB) & PIR1(KLTR)” and “PIR0(KBTT) & PIR1(KLTR)”) were unable to distinguish between simulations with strong chemotaxis and strong haptotaxis. We conclude that double descriptor vectors comprising LTR and RTL descriptor vectors robustly cluster simulations into biologically interpretable clusterings because the associated filtrations capture the ways in which vessel segments anastomose and branch, respectively. We performed principal components analysis to reduce the “PIO0(KLTR) & PIO0(KRTL)” double descriptor vector to two principal components. The results presented in S4 Fig show that the simulations from each group are also clustered in this two-dimensional space.

Fig 13. Partitioning (ρ, χ) parameter space into distinct regions using sweeping plane double descriptor vectors.

We applied k-means clustering, with k = 5, to sweeping plane double descriptor vectors. The resulting partitions are presented for the four descriptor vectors that have the highest OOS accuracy: A) PIO0(KLTR) & PIO0(KRTL), B) PIO0(KRTL) & PIR1(KLTR), C) PIO0(KTTB) & PIR1(KLTR), and D) PIO0(KBTT) & PIR1(KLTR). The five clusters are ordered according to the mean χ value within the cluster.

We applied k−means clustering to all sweeping plane double descriptor vectors and computed the average OOS accuracy for a range of values of k, also known as an elbow plot (S5 Fig). While the average OOS accuracy decreased with increasing values of k, the rate of decrease slowed markedly for k > 5. We fixed k = 5 as this value robustly clustered (ρ, χ) parameter space into biologically interpretable clusters (Fig 13). Larger values of k may overfit the data [59], while smaller values fail to partition (ρ, χ) parameter space into biologically interpretable regions (S6 Fig). For example, when k = 3, simulations with strong haptotaxis (ρ > χ), equal chemotaxis and haptotaxis (ρ = χ), and strong chemotaxis (ρ < χ) are clustered together. When k = 4, there are two distinct clusters near ρ = χ, but simulations with large values of the haptotaxis parameter, ρ, are clustered with those with high values of the chemotaxis parameter, χ. With k = 5 clusters, the region with large values of the haptotaxis parameter, ρ, was further subdivided. We suggest that clustering with k = 5 provides sufficient resolution to distinguish model simulations generated from different regions of parameter space. We note that more sophisticated descriptors could be used to justify this choice of k, including the silhouette score [60].

We identified five model simulations whose double descriptor vectors closely resembled the means from the clusters associated with the “PIO0(KLTR) & PIO0(KRTL)” clustering. Fig 14 presents representative simulations of each cluster. We note that as ρ and χ vary, the spatial extent and connectedness of the networks vary as expected (i.e., greater spatial extent in the x-direction as χ increases, and greater connectedness as ρ increases). We applied a similar analysis to model simulations sampled from (ρ, χ, ψ) parameter space, where the non-negative parameter ψ represents the rate at which tip cells branch. The resulting parameter clustering changed only with ρ and χ, suggesting that the vascular morphology does not depend on ψ (S7 Fig).

Fig 14. Representative simulated blood vessel networks from Groups 1–5.

Series of simulations that resemble the means from each of the five groups (as identified from the double descriptor vectors associated with the sweeping plane filtrations PIO0(KLTR) & PIO0(KRTL)). The representative simulations for groups 1–5 were generated using the following pairs of parameter values: (ρ, χ) = (0.25, 0), (0.5, 0.2), (0.3, 0.2), (0.15, 0.15), and (0.2, 0.4), respectively.

Conclusions and discussion

We have introduced a topological pipeline to analyze simulated data of tumor-induced angiogenesis [17]. We investigated different filtrations to feed into the persistent homology algorithm. We demonstrated that simple data science algorithms can group together topologically similar simulated data. Pairs of topological descriptor vectors generated from sweeping plane filtrations robustly clustered simulations into five biologically interpretable groups. Simulations within each group were generated from similar values of the model parameters ρ and χ, which determine the sensitivity of the tip cells to haptotaxis and chemotaxis.

The Betti numbers quantify connectedness of the vasculature across spatial scales. We computed PH of four sweeping plane filtrations (LTR, RTL, TTB, BTT), which quantify topological features as a plane moves across the simulated data in a particular direction. We found that the filtration direction provides information about different aspects of the vascular network. For the LTR filtration, the number of connected components (i.e., β0) decreases when an anastomosis event occurs, whereas for the RTL filtration, β0 decreases when a branching event occurs. The TTB and BTT filtrations summarize vascular tortuosity: when tip cell movement in the y-spatial direction is negligible and/or tip cells do not merge with adjacent sprouts, β0 increases with the number of tip cells/ number of sprouts. We found that the combined PIO0(KLTR) & PIO0(KRTL) sweeping plane descriptor vectors robustly cluster haptotaxis-chemotaxis parameter space into five distinct regions that appear to be connected.

Our results may depend on the model setup, in which new blood vessels are created as endothelial tip cells move from left to right. In the future, we may explore the use of extended persistence [61] and/or the PH transform [62], which allow consideration of multiple different directions in the topological analysis. If the vessel network grows radially, for example, then radial filtrations pinned at the network center may be more informative than the LTR filtration [45, 46]. Classically in PH, topological features of short persistence are topological noise, yet there is growing evidence that these short features can provide information about geometry and curvature [43, 44, 46, 47, 6366]. Our results using persistence images with unit weightings, rather than ramped weightings, also suggest that short persistence features may be informative for distinguishing vessel morphologies. The spatial resolution of the output binary images that we used for our analyses was the same as that used to generate the simulations, enabling us to distinguish the presence and absence of individual endothelial cells. In future work, we will consider finer and coarser binary images to investigate how the performance of the methods depends on their spatial resolution.

Many models of angiogenesis have been developed and implemented since the Anderson-Chaplain model, as has been extensively reviewed previously [2124, 6771]. We plan to use the methods described in this work to interpret more sophisticated models of angiogenesis and, in the longer term, to determine how vessel morphology affects tumor survival and to predict the efficacy of vascular targeting agents. The tumor is not explicitly modeled in the Anderson-Chaplain model, but many studies couple angiogenesis and tumor growth [25, 27, 29, 72, 73]. More complex models have been developed that account for mechanical factors such as mechanotaxis, pressure-driven convective transport of extracellular factors, mechanical stimulation of endothelial cell proliferation, and subcellular signalling pathways that guide cell fate specification of tip and stalk cells [26, 27, 29, 74]. For example, Vavourakis et al. developed a 3D model of angiogenic tumor growth that incorporates the effects of mechanotaxis on vessel sprouting and mechano-sensitive vascular remodelling [29]. Importantly, these more complex models have been shown to recapitulate experimental observations of time-varying spatial distributions of angiogenic vasculature [29] and features such as asymmetric neovasculature [74] and cell rearrangements and phenotype switching of endothelial stalk and tip cells [75, 76]. In future work, we will apply the TDA framework developed here to these, and other angiogenesis models. Such analyses will enable us to investigate the effect that processes such as mechanotaxis and subcellular signalling have on the topology of blood vessel networks.

Miller at al. used computational models to analyze tumor regression in response to drug treatment as a function of vascular morphology and heterogeneity [77]. Our future analysis of such state-of-the-art models will require us to extend the topological methods presented here for spatio-temporal data that describes how multiple cell types (e.g., tumor cells, immune cells and endothelial cells) interact in 3D. In a preliminary study, we applied these methods to 3D data from in vivo studies of tumor vessel networks, which is near the edge of current computational feasibility [46]. Further work in this direction requires significant computational advancements to scale the TDA pipeline to multiple, more complex models and to analyze the resulting parameter landscapes.

The topological approaches considered in this work represent a multiscale way to summarize the properties of in silico vasculature, and could be extended to experimental imaging data. Previous studies have quantified experimental data of tumor-induced angiogenesis using standard descriptors that include sprout length and diameter, number of tumor cells, concentration of angiogenic chemicals [30, 78, 79]. To compare models and data will require descriptors, such as the topological ones proposed here, that can discriminate between different conditions (e.g., parameter values). The challenges of making direct comparisons between experimental and synthetic data is a significant bottleneck in the validation of mathematical and computational models of angiogenesis. In future work we will apply the topological approaches outlined in this paper to compare different model simulations with experimental data [80, 81]. In this way, we aim to determine biologically realistic parameter values for control conditions [36], and to identify parameter values that are associated with response to treatments, including vessel normalization.

Supporting information

S1 Fig. Standard descriptor vector clustering.

Clustering of the (ρ, χ) parameter space using k-means with k = 5 on the standard descriptor vectors to summarize each simulated vasculature. We considered A) Number of tips over time, C) Number of vessel segments over time, E) Final length of the simulated vasculature, and G) All of the previous descriptor vectors concatenated into one descriptor vector. The five clusters are ordered according to the mean χ value within the cluster. Panels B,D,F,H depict the out of sample confusion matrices for each descriptor vector.


S2 Fig. Individual plane sweeping descriptor vector clustering.

Clustering of the (ρ, χ) parameter space using k-means with k = 5 on individual sweeping plane topological filtrations to summarize each simulated vasculature. The four highest OOS accuracies resulted from the A) PIR1(KLTR), C) PIO0(KLTR), E) PIO1(KRTL), G) and PIO1(KLTR) descriptor vectors. The five clusters are ordered according to the mean χ value within the cluster. Panels B,D,F,H depict the out of sample confusion matrices for each descriptor vector.


S3 Fig. Double plane sweeping descriptor vector clustering.

Clustering of the (ρ, χ) parameter space using k-means with k = 5 on double vectors of the sweeping plane topological filtration to summarize each simulated vasculature. The four highest OOS accuracies resulted from the A) PIO0(KLTR) & PIO0(KRTL), C) PIO0(KRTL) & PIR1(KLTR), E) PIO0(KTTB) & PIR1(KLTR), and F) PIO0(KBTT) & PIR1(KLTR) descriptor vectors. The five clusters are ordered according to the mean χ value within the cluster. Panels B,D,F,H depict the out of sample confusion matrices for each descriptor vector.


S4 Fig. Dimensionality reduction.

Dimensionality reduction of the PIO0(KLTR) & PIO0(KRTL) double descriptor vector. We reduced the dimensionality of the PIO0(KLTR) & PIO0(KRTL) double descriptor vector to two dimensions using principal components analysis and plot the reduced-dimension descriptor vector for each simulation. The color of each dot denotes the predicted grouping of each simulation from the k-means algorithm.


S5 Fig. Elbow curve.

Elbow curve for the OOS Accuracy for the k-means clustering and labeling algorithm over different values of k. We considered all 276 doubles of topological feature vectors and plot the mean OOS accuracy percentage plus or minus two standard deviations for multiple values of k. We propose that the elbow occurs at k = 5.


S6 Fig. Multiple k-means clusterings.

Clustering of the (ρ, χ) parameter space using the PIO0(KLTR) & PIO0(KRTL) double of topological descriptor vectors using the k-means algorithm with A) k = 3, C) k = 4, E) k = 5. The clusters are ordered according to the mean χ value within the cluster. Panels B,D,F) depict the out of sample confusion matrices for each descriptor vectory.


S7 Fig. (ρ, χ, ψ) clustering.

Clustering based on the (ρ, χ) grouping and the time to branch, ψ. We created a data set of blood vessel simulations by considering (ρ, χ) values from groups 1–5 from Fig 13A and letting ψ vary over the 10 values {.1, 0.2, 0.3, …, 1.0}. We simulated the Anderson-Chaplain model ten times at each (ρ, χ, ψ) values to create 500 total simulations. We performed our clustering methodology on this collection of images based off the PIO0(KLTR) & PIO0(KRTL) descriptor vector.


S8 Fig. Initial TAF and fibronectin profiles.

TAF concentration (left) and fibronectin concentration (right).


S9 Fig. Individual flooding descriptor vector clustering.

Clustering of the (ρ, χ) parameter space using k-means with k = 5 on individual flooding topological descriptor vectors to summarize each simulated vasculature. The four highest OOS accuracies resulted from the A) PIR1(Kflood), C) β1(Kflood), E) PIO1(Kflood), G) and β0(Kflood) descriptor vectors. The five clusters are ordered according to the mean χ value within the cluster. Panels B,D,F,H depict the out of sample confusion matrices for each descriptor vector.


S10 Fig. Double flooding descriptor vector clustering.

Clustering of the (ρ, χ) parameter space using k-means with k = 5 on double flooding topological descriptor vectors to summarize each simulation. The four highest OOS accuracies resulted from the A) PIO0(Kflood)&PIR1(Kflood), C) PIR0(Kflood)&PIR1(Kflood), E) β0(Kflood)&β1(Kflood), G) and PIO0(Kflood)&β1(Kflood) descriptor vectors. The five clusters are ordered according to the mean χ value within the cluster. Panels B,D,F,H depict the out of sample confusion matrices for each descriptor vector.


S1 Table. Anderson-Chaplain model parameters.

List of mechanistic parameters for the Chaplain Anderson ABM from [17].


S2 Table. Anderson-Chaplain model nondimensionalized parameters.

Baseline nondimensionalized mechanistic parameters for the Chaplain-Anderson Model from [17].


S3 Table. Individual plane sweeping clustering.

Out of Sample Accuracy scores for individual feature vectors from the flooding filtration using k-means classification with k = 5.


S4 Table. Double flooding clustering.

Out of Sample Accuracy scores for doubles of feature vectors from the flooding filtration using k-means classification with k = 5.


S5 Table. Individual plane sweeping clustering.

Out of Sample Accuracy scores for individual feature vectors from the sweeping plane filtration using k-means classification with k = 5.


S1 Appendix. Modeling and clustering appendix.

The file contains a more detailed description of implementation of the Anderson-Chaplain Model and clustering results.



  1. 1. Gupta MK, Qin RY. Mechanism and its regulation of tumor-induced angiogenesis. World Journal of Gastroenterology. 2003;9(6):1144–1155.
  2. 2. Kushner EJ, Bautch VL. Building blood vessels in development and disease. Current Opinion in Hematology. 2013;20(3):231–236.
  3. 3. Tonnesen MG, Feng X, Clark RAF. Angiogenesis in wound healing. Journal of Investigative Dermatology Symposium Proceedings. 2000;5(1):40–46.
  4. 4. Ferrara N. VEGF and the quest for tumour angiogenesis factors. Nature Reviews Cancer. 2002;2(10):795–803.
  5. 5. Folkman J. Tumor angiogenesis. In: Klein G, Weinhouse S, Haddow A, editors. Advances in Cancer Research. vol. 19. Academic Press; 1974. p. 331–358.
  6. 6. Sherwood LM, Parris EE, Folkman J. Tumor angiogenesis: therapeutic implications. New England Journal of Medicine. 1971;285(21):1182–1186.
  7. 7. Folkman J, Merler E, Abernathy C, Williams G. Isolation of a tumor factor responsible for angiogenesis. The Journal of Experimental Medicine. 1971;133(2):275–288.
  8. 8. Menashi S, Lu H, Soria C, Legrand Y. 2 Endothelial cell proteases: physiological role and regulation. Baillière’s Clinical Haematology. 1993;6(3):559–576.
  9. 9. Paweletz N, Knierim M. Tumor-related angiogenesis. Critical Reviews in Oncology/Hematology. 1989;9(3):197–242.
  10. 10. Terranova VP, DiFlorio R, Lyall RM, Hic S, Friesel R, Maciag T. Human endothelial cells are chemotactic to endothelial cell growth factor and heparin. Journal of Cell Biology. 1985;101(6):2330–2334.
  11. 11. Sholley M, Ferguson G, Seibel H, Montour J, Wilson J. Mechanisms of neovascularization. Vascular sprouting can occur without proliferation of endothelial cells. Laboratory Investigation; a Journal of Technical Methods and Pathology. 1984;51(6):624–634.
  12. 12. Lugano R, Ramachandran M, Dimberg A. Tumor angiogenesis: causes, consequences, challenges and opportunities. Cellular and Molecular Life Sciences. 2020;77(9):1745–1770.
  13. 13. Saman H, Raza SS, Uddin S, Rasul K. Inducing angiogenesis, a key step in cancer vascularization, and treatment approaches. Cancers. 2020;12(5):1172.
  14. 14. Bentley K, Franco CA, Philippides A, Blanco R, Dierkes M, Gebala V, et al. The role of differential VE-cadherin dynamics in cell rearrangement during angiogenesis. Nature Cell Biology. 2014;16(4):309–321. pmid:24658686
  15. 15. Li JL, Harris AL. Notch signaling from tumor cells: a new mechanism of angiogenesis. Cancer Cell. 2005;8(1):1–3.
  16. 16. Li S, Butler P, Wang Y, Hu Y, Han DC, Usami S, et al. The role of the dynamics of focal adhesion kinase in the mechanotaxis of endothelial cells. Proceedings of the National Academy of Sciences. 2002;99(6):3546–3551. pmid:11891289
  17. 17. Anderson ARA, Chaplain MAJ. Continuous and discrete mathematical models of tumor-induced angiogenesis. Bulletin of Mathematical Biology. 1998;60(5):857–899.
  18. 18. Balding D, McElwain DLS. A mathematical model of tumour-induced capillary growth. Journal of Theoretical Biology. 1985;114(1):53–73.
  19. 19. Byrne HM, Chaplain MAJ. Growth of nonnecrotic tumors in the presence and absence of inhibitors. Mathematical Biosciences. 1995;130(2):151–181.
  20. 20. Stokes CL, Lauffenburger DA. Analysis of the roles of microvessel endothelial cell random motility and chemotaxis in angiogenesis. Journal of Theoretical Biology. 1991;152(3):377–403.
  21. 21. Byrne HM. Dissecting cancer through mathematics: from the cell to the animal model. Nature Reviews Cancer. 2010;10(3):221–230.
  22. 22. Hadjicharalambous M, Wijeratne PA, Vavourakis V. From tumour perfusion to drug delivery and clinical translation of in silico cancer models. Methods. 2021;185:82–93.
  23. 23. Metzcar J, Wang Y, Heiland R, Macklin P. A review of cell-based computational modeling in cancer biology. JCO Clinical Cancer Informatics. 2019.
  24. 24. Scianna M, Bell CG, Preziosi L. A review of mathematical models for the formation of vascular networks. Journal of Theoretical Biology. 2013;333:174–209.
  25. 25. Vilanova G, Colominas I, Gomez H. A mathematical model of tumour angiogenesis: growth, regression and regrowth. Journal of The Royal Society Interface. 2017;14(126):20160918.
  26. 26. Stepanova D, Byrne HM, Maini PK, Alarcón T. A multiscale model of complex endothelial cell dynamics in early angiogenesis. PLoS Computational Biology. 2021;17(1):e1008055.
  27. 27. Perfahl H, Hughes BD, Alarcón T, Maini PK, Lloyd MC, Reuss M, et al. 3D hybrid modelling of vascular network formation. Journal of Theoretical Biology. 2017;414:254–268. pmid:27890575
  28. 28. Grogan JA, Connor AJ, Markelc B, Muschel RJ, Maini PK, Byrne HM, et al. Microvessel chaste: an open library for spatial modeling of vascularized tissues. Biophysical Journal. 2017;112(9):1767–1772. pmid:28494948
  29. 29. Vavourakis V, Wijeratne PA, Shipley R, Loizidou M, Stylianopoulos T, Hawkes DJ. A validated multiscale in-silico model for mechano-sensitive tumour angiogenesis and growth. PLoS Computational Biology. 2017;13(1):e1005259.
  30. 30. Cai H, Liu X, Zheng J, Xue Y, Ma J, Li Z, et al. Long non-coding RNA taurine upregulated 1 enhances tumor-induced angiogenesis through inhibiting microRNA-299 in human glioblastoma. Oncogene. 2017;36(3):318–331. pmid:27345398
  31. 31. Sefidgar M, Soltani M, Raahemifar K, Sadeghi M, Bazmara H, Bazargan M, et al. Numerical modeling of drug delivery in a dynamic solid tumor microvasculature. Microvascular Research. 2015;99:43–56. pmid:25724978
  32. 32. Folarin AA, Konerding MA, Timonen J, Nagl S, Pedley RB. Three-dimensional analysis of tumour vascular corrosion casts using stereoimaging and micro-computed tomography. Microvascular Research. 2010;80(1):89–98.
  33. 33. Kannan P, Kretzschmar WW, Winter H, Warren D, Bates R, Allen PD, et al. Functional parameters derived from magnetic resonance imaging reflect vascular morphology in preclinical tumors and in human liver metastases. Clinical Cancer Research. 2018;24(19):4694–4704. pmid:29959141
  34. 34. Konerding MA, Malkusch W, Klapthor B, Ackern Cv, Fait E, Hill SA, et al. Evidence for characteristic vascular patterns in solid tumours: quantitative studies using corrosion casts. British Journal of Cancer. 1999;80(5):724–732. pmid:10360650
  35. 35. Konerding MA, Fait E, Gaumann A. 3D microvascular architecture of pre-cancerous lesions and invasive carcinomas of the colon. British Journal of Cancer. 2001;84(10):1354–1362.
  36. 36. Bauer AL, Jackson TL, Jiang Y. A cell-based model exhibiting branching and anastomosis during tumor-induced angiogenesis. Biophysical Journal. 2007;92(9):3105–3121.
  37. 37. Zhan Y, Li MZ, Yang L, Feng XF, Zhang QX, Zhang N, et al. An MRI study of neurovascular restorative after combination treatment with Xiaoshuan enteric-coated capsule and enriched environment in rats after stroke. Frontiers in Neuroscience. 2019;13. pmid:31354412
  38. 38. Carlsson G. Topology and data. Bulletin of the American Mathematical Society. 2009;46(2):255–308.
  39. 39. Zomorodian AJ. Topology for computing. Cambridge University Press; 2005.
  40. 40. Nicolau M, Levine AJ, Carlsson G. Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival. PNAS. 2011;108(17):201102826.
  41. 41. Nielson JL, Paquette J, Liu AW, Guandique CF, Tovar CA, Inoue T, et al. Topological data analysis for discovery in preclinical spinal cord injury and traumatic brain injury. Nature Communications. 2015;6:8581. pmid:26466022
  42. 42. Qaiser T, Sirinukunwattana K, Nakane K, Tsang YW, Epstein D, Rajpoot N. Persistent homology for fast tumor segmentation in whole slide histology images. Procedia Computer Science. 2016;90:119–124.
  43. 43. Xia K, Wei GW. Persistent homology analysis of protein structure, flexibility, and folding. International Journal for Numerical Methods in Biomedical Engineering. 2014;30(8):814–844.
  44. 44. Bendich P, Marron JS, Miller E, Pieloch A, Skwerer S. Persistent homology analysis of brain artery trees. Annals of Applied Statistics. 2016;10(1):198–218.
  45. 45. Byrne HM, Harrington HA, Muschel R, Reinert G, Stolz BJ, Tillmann U. Topology characterises tumour vasculature. Mathematics Today. 2019;55(5):206–210.
  46. 46. Stolz BJ, Kaeppler J, Markelc B, Mech F, Lipsmeier F, Muschel RJ, et al. Multiscale topology characterises dynamic tumour vascular networks. arXiv:200808667 [q-bio]. 2020;.
  47. 47. Feng M, Porter MA. Persistent homology of geospatial data: a case study with voting. arXiv:190205911 [physics]. 2019;.
  48. 48. Topaz CM, Ziegelmeier L, Halverson T. Topological data analysis of biological aggregation models. PLoS ONE. 2015;10(5):e0126383.
  49. 49. McGuirl MR, Volkening A, Sandstede B. Topological data analysis of zebrafish patterns. Proceedings of the National Academy of Sciences. 2020;117(10):5113–5124.
  50. 50. Bhaskar D, Manhart A, Milzman J, Nardini JT, Storey KM, Topaz CM, et al. Analyzing collective motion with machine learning and topology. Chaos. 2019;29(12):123125. pmid:31893635
  51. 51. Otter N, Porter MA, Tillmann U, Grindrod P, Harrington HA. A roadmap for the computation of persistent homology. European Physical Journal—Data Science. 2017;6(17):1–38.
  52. 52. Adams H, Emerson T, Kirby M, Neville R, Peterson C, Shipman P, et al. Persistence images: a stable vector representation of persistent homology. Journal of Machine Learning Research. 2017;18(1):218–252.
  53. 53. Baish JW, Stylianopoulos T, Lanning RM, Kamoun WS, Fukumura D, Munn LL, et al. Scaling rules for diffusive drug delivery in tumor and normal tissues. Proceedings of the National Academy of Sciences. 2011;108(5):1799–1803. pmid:21224417
  54. 54. Edelsbrunner H, Harer JL. Computational topology: an introduction. Providence R. I.: American Mathematical Society; 2010.
  55. 55. Garside K, Henderson R, Makarenko I, Masoller C. Topological data analysis of high resolution diabetic retinopathy images. PLoS ONE. 2019;14(5):e0217413.
  56. 56. Stolz-Pretzer B. Global and local persistent homology for the shape and classification of biological data. University of Oxford; 2019. Available from:
  57. 57. Giusti C, Pastalkova E, Curto C, Itskov V. Clique topology reveals intrinsic geometric structure in neural correlations. Proceedings of the National Academy of Sciences of the United States of America. 2015;112(44):13455–13460.
  58. 58. Lloyd S. Least squares quantization in PCM. IEEE Transactions on Information Theory. 1982;28(2):129–137.
  59. 59. Bholowalia P, Kumar A. EBK-Means: a clustering technique based on elbow method and K-means in WSN. International Journal of Computer Applications. 2014;105(9):17–24.
  60. 60. Rousseeuw PJ. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics. 1987;20:53–65.
  61. 61. Edelsbrunner H, Harer JL. Persistent homology—a survey. Contemporary Mathematics. 2008;453:257–282.
  62. 62. Turner K, Mukherjee S, Boyer DM. Persistent homology transform for modeling shapes and surfaces. Information and Inference: A Journal of the IMA. 2014;3(4):310–344.
  63. 63. Marchese A, Maroulas V. Signal classification with a point process distance on the space of persistence diagrams. Advances in Data Analysis and Classification. 2018;12(3):657–682.
  64. 64. Robins V, Turner K. Principal component analysis of persistent homology rank functions with case studies of spatial point patterns, sphere packing and colloids. Physica D: Nonlinear Phenomena. 2016;334:99–117.
  65. 65. Stolz BJ, Harrington HA, Porter MA. Persistent homology of time-dependent functional networks constructed from coupled time series. Chaos. 2017;27(4):047410.
  66. 66. Bubenik P, Hull M, Patel D, Whittle B. Persistent homology detects curvature. Inverse Problems. 2020;36(2):025008.
  67. 67. Hamis S, Powathil GG, Chaplain MAJ. Blackboard to bedside: a mathematical modeling bottom-up approach toward personalized cancer treatments. JCO Clinical Cancer Informatics. 2019;(3):1–11.
  68. 68. Karolak A, Markov DA, McCawley LJ, Rejniak KA. Towards personalized computational oncology: from spatial models of tumour spheroids, to organoids, to tissues. Journal of The Royal Society Interface. 2018;15(138):20170703.
  69. 69. Mantzaris NV, Webb S, Othmer HG. Mathematical modeling of tumor-induced angiogenesis. Journal of Mathematical Biology. 2004;49(2):111–187.
  70. 70. Peirce SM. Computational and mathematical modeling of angiogenesis. Microcirculation. 2008;15(8):739–751.
  71. 71. Rockne RC, Hawkins-Daarud A, Swanson KR, Sluka JP, Glazier JA, Macklin P, et al. The 2019 mathematical oncology roadmap. Physical Biology. 2019;16(4):041005. pmid:30991381
  72. 72. Owen MR, Alarcón T, Maini PK, Byrne HM. Angiogenesis and vascular remodelling in normal and cancerous tissues. Journal of Mathematical Biology. 2008;58(4):689.
  73. 73. Norton KA, Jin K, Popel AS. Modeling triple-negative breast cancer heterogeneity: effects of stromal macrophages, fibroblasts and tumor vasculature. Journal of Theoretical Biology. 2018;452:56–68.
  74. 74. Vilanova G, Burés M, Colominas I, Gomez H. Computational modelling suggests complex interactions between interstitial flow and tumour angiogenesis. Journal of The Royal Society Interface. 2018;15(146):20180415.
  75. 75. Chen W, Xia P, Wang H, Tu J, Liang X, Zhang X, et al. The endothelial tip-stalk cell selection and shuffling during angiogenesis. Journal of Cell Communication and Signaling. 2019;13(3):291–301. pmid:30903604
  76. 76. Jakobsson L, Franco CA, Bentley K, Collins RT, Ponsioen B, Aspalter IM, et al. Endothelial cells dynamically compete for the tip cell position during angiogenic sprouting. Nature Cell Biology. 2010;12(10):943–953. pmid:20871601
  77. 77. Ha M, Hb F. Evaluation of drug-loaded gold nanoparticle cytotoxicity as a function of tumor vasculature-induced tissue heterogeneity. Annals of Biomedical Engineering. 2018;47(1):257–271.
  78. 78. Feleszko W, Bałkowiec EZ, Sieberth E, Marczak M, Dabrowska A, Giermasz A, et al. Lovastatin and tumor necrosis factor- exhibit potentiated antitumor effects against Ha-ras-transformed murine tumor via inhibition of tumor-induced angiogenesis. International Journal of Cancer. 1999;81(4):560–567. pmid:10225445
  79. 79. Sailem HZ, Zen AAH. Morphological landscape of endothelial cell networks reveals a functional role of glutamate receptors in angiogenesis. Scientific reports. 2020;10(1):1–14.
  80. 80. Gimbrone MA, Cotran RS, Leapman SB, Folkman J. Tumor growth and neovascularization: an experimental model using the rabbit cornea. Journal of the National Cancer Institute. 1974;52(2):413–427.
  81. 81. Hlushchuk R, Barré S, Djonov V. Morphological aspects of tumor angiogenesis. In: Ribatti D, editor. Tumor angiogenesis assays: methods and protocols. Methods in Molecular Biology. New York, NY: Springer; 2016. p. 13–24.