Biologically-informed neural networks (BINNs), an extension of physics-informed neural networks , are introduced and used to discover the underlying dynamics of biological systems from sparse experimental data. In the present work, BINNs are trained in a supervised learning framework to approximate in vitro cell biology assay experiments while respecting a generalized form of the governing reaction-diffusion partial differential equation (PDE). By allowing the diffusion and reaction terms to be multilayer perceptrons (MLPs), the nonlinear forms of these terms can be learned while simultaneously converging to the solution of the governing PDE. Further, the trained MLPs are used to guide the selection of biologically interpretable mechanistic forms of the PDE terms which provides new insights into the biological and physical mechanisms that govern the dynamics of the observed system. The method is evaluated on sparse real-world data from wound healing assays with varying initial cell densities .
In this work we extend equation learning methods to be feasible for biological applications with nonlinear dynamics and where data are often sparse and noisy. Physics-informed neural networks have recently been shown to approximate solutions of PDEs from simulated noisy data while simultaneously optimizing the PDE parameters. However, the success of this method requires the correct specification of the governing PDE, which may not be known in practice. Here, we present an extension of the algorithm that allows neural networks to learn the nonlinear terms of the governing system without the need to specify the mechanistic form of the PDE. Our method is demonstrated on real-world biological data from scratch assay experiments and used to discover a previously unconsidered biological mechanism that describes delayed population response to the scratch.
Citation: Lagergren JH, Nardini JT, Baker RE, Simpson MJ, Flores KB (2020) Biologically-informed neural networks guide mechanistic modeling from sparse experimental data. PLoS Comput Biol 16(12): e1008462. https://doi.org/10.1371/journal.pcbi.1008462
Editor: Inna Lavrik, OvGU; Medical Faculty, GERMANY
Received: May 26, 2020; Accepted: October 22, 2020; Published: December 1, 2020
Copyright: © 2020 Lagergren et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: https://github.com/jlager/BINNs/tree/master/Data.
Funding: This material was based upon work partially supported by the National Science Foundation Directorate for Mathematical and Physical Sciences under grant DMS-1638521 to the Statistical and Applied Mathematical Sciences Institute and the Division of Integrative Organismal Systems under grant IOS-1838314 to KBF, and in part by National Institute of Aging grant R21AG059099 to KBF. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. REB is a Royal Society Wolfson Research Merit Award holder and also acknowledges the Biotechnology and Biological Sciences Research Council for funding via grant no. BB/R000816/1. MJS acknowledges the Australian Research Council (DP200100177). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Collective migration refers to the coordinated migration of a group of individuals [3, 4]. This process arises in a variety of biological and social contexts, including pedestrian dynamics , tumor progression , and animal development . In the presence of many individuals, differential equation models provide a flexible framework to investigate collective behavior as a continuum [8–12]. A challenge for mathematicians and scientists is to use mathematical models together with spatiotemporal data of collective migration to validate assumptions about the underlying physical and biological laws that govern the observed dynamics. Several factors contribute to the difficulty of this task, even for simple systems/data, some of which include biological forms and levels of noise in the observation process, poor understanding of the underlying dynamics, a large number of candidate mathematical models, implementation of computationally expensive numerical solvers, etc. This work provides a data-driven tool which can alleviate many of these problems by enabling the rapid development and validation of mathematical models from sparse noisy data. The methodology is demonstrated using a case study of scratch assay experiments.
Scratch assays are a widely adopted experiment in cellular biology used to study collective cell migration in vitro as cell populations re-colonize empty spatial regions. These experiments have been used previously to observe population-wide behavior in many different contexts, including wound healing [13–16] and cancer progression . Mathematical modeling of scratch assays plays an important role in the quantification and analysis of population dynamics. This is because (i) the equations and parameters comprising mathematical models are interpretable, providing information about the underlying physical and biological mechanics that drive the observed system, and (ii) when properly calibrated, they are generalizable, affording the ability to make accurate predictions beyond the data set used for calibration.
Reaction-diffusion partial differential equations (PDEs) are frequently used to model scratch assay experiments [2, 8, 11, 18, 19]. The general one-dimensional reaction-diffusion equation that describes the rate of change of a quantity of interest u(x, t) (e.g. cell density) is (1) in which the rate of change of u (i.e. ut) is a function of diffusion, modeled by the function , and reaction or growth, modeled by the function . Note that and depend on the application, and choosing the correct/optimal mechanistic models for these terms is the focus of many current research efforts and remains an open question. The classical Fisher–Kolmogorov–Petrovsky–Piskunov (FKPP) equation is a reaction-diffusion equation that has been used to model a wide spectrum of growth and transport of biological processes. In particular, the FKPP model assumes a scalar diffusivity function and logistic growth function with intrinsic growth rate r and carrying capacity K [19, 20]. Variants of the reaction-diffusion equation have also been used to account for different types of cell interactions during scratch assay experiments. For example, the nonlinear diffusivity function with cell-to-cell adhesion coefficient α was used to model dynamics in which neighboring cells prevent other cells from migrating . Alternatively, a diffusivity function of the form can be used to model dynamics in which cells promote the migration of others . Additional variants of reaction-diffusion equation models have captured cell migration in the presence of growth factors , during melanoma progression , and in response to different drug treatments .
A recent study quantitatively investigated the role of initial cell density by conducting a suite of scratch assay experiments on PC-3 prostate cancer cells with systematically varying initial cell densities . The experimental data was used to calibrate the FKPP equation as well as a variant model known as the Generalized Porous-FKPP equation, which assumes that diffusivity increases with cell density u by using a diffusivity function with diffusion coefficient D, carrying capacity K, and exponent m. Like the FKPP equation, the growth term is also described by the logistic growth function . While the calibrated models approximated the experimental data well in many cases in , the presence of systematic biases between the model solutions and experimental data indicate the existence of additional governing mechanisms that may not be accounted for in these mathematical models. However, the existence of a large number of possible biophysical mechanisms that could play a role in scratch assay dynamics makes the testing of mathematical models against these experimental data computationally challenging. Thereby, this scenario motivates the use of equation learning methods to discover the diffusion and reaction terms directly from the experimental data.
Enabled by advances in computing power, algorithms, and the amount of available data, the field of equation learning has recently emerged as a powerful tool for the automated identification of underlying physical laws governing a set of observation data. The basic assumption in this field is that measured data arise from some unknown n-dimensional dynamical system of the form (2) with quantity of interest u = u(x, t), parameter vector , and appropriate initial and boundary conditions. An example quantity of interest for modeling cell migration dynamics is the cell density at location x and time t. The measured data for a set of spatial points xi, i = 1, …, M, and set of time points tj, j = 1, …, N, are assumed to be corrupted by some form of observation error that may be known or unknown in practice. The goal of equation learning methods is to identify the closed form of in Eq (2) directly from the noisy measurements ui,j. Note that, in order to simulate the learned equation, either the noisy or a denoised version of the initial condition can be used along with an assumed boundary condition (e.g. no-flux) that describes the biological process generating the data.
Two primary sets of methodology have been used in field of equation learning to date: sparse regression [23, 24] and theory-informed neural networks [1, 25]. In the sparse regression framework, numerical methods (e.g. finite differences or polynomial splines) are used to denoise u and approximate the partial derivatives ut, ux, uxx, etc. from a set of data. The approximations are then used to construct a library of nonlinear candidate terms (e.g. 1, u, u2, ux, …, , etc.) thought to comprise the governing system of ordinary differential equations (ODEs) or PDEs. The data relating ut to all possible model terms inside the library are formulated as a linear regression problem in which sparsity promoting techniques are used to select a small subset of library terms that produce the most parsimonious model. While the sparse regression framework has been successfully demonstrated to circumvent searching through a combinatorially large space of possible candidate models, it can require large amounts of training data and the numerical methods used for denoising and differentiation are not robust to biologically realistic forms and levels of noise, leading to inaccuracies in both the constructed library and learned equations . Further, the method assumes the unknown function in Eq (2) can be written as a linear combination of nonlinear candidate terms, which may not be true in practice.
An alternative approach uses function-approximating deep neural networks, i.e., multilayer perceptrons (MLPs), as surrogate models uMLP(x, t) for the solution of the governing dynamical system [27–29]. In particular, physics-informed neural networks (PINNs) [1, 25] assume the mechanistic form of in Eq (2) is pre-specified and then used as a form of regularization in the neural network objective function. The parameters of are allowed to be “learnable,” meaning that the parameters of the governing PDE are calibrated while the neural network is trained to minimize the error between uMLP(xi, tj) and the data ui,j. This methodology ensures that the neural network solution satisfies the physical laws described by while simultaneously fitting the spatiotemporal data. Theory-informed neural networks have been demonstrated with smaller amounts of data in the presence of noise, however, they have so far only been applied to problems where the governing mechanistic PDE is known a priori [30–32].
Hybrid approaches that combine neural networks and sparse regression have also been suggested to address some of the issues surrounding the above methods [26, 33]. In these approaches, neural networks are used as surrogate models for u(x, t) and then used to construct the library of candidate terms for sparse regression using automatic differentiation. These methods have been shown to accurately learn the governing system of equations for a variety of reaction-diffusion models from spatiotemporal data with biologically realistic levels of noise .
All three approaches (i.e. sparse regression, theory-informed neural networks, and hybrids) however, suffer from the model specification problem, in which the governing ODE/PDE model must be specified a priori either explicitly or as a library of candidate terms. Thus, (i) if the true dynamical system contains terms that are not included in the regularization term for theory-informed neural networks, or (ii) if the true terms cannot be represented as a linear combination of nonlinear candidate terms for sparse regression, then these methods will ultimately fail to recover the true system. Further, detecting this issue when determining what the “true” system is in real-use cases is an open question. Where systems with scalar or linear dynamics may be suitable for these approaches, biological systems pose a particular challenge in this respect, since many of the underlying mechanics driving these systems are nonlinear. For example, the Generalized Porous-FKPP model contains a nonlinear diffusivity function with unknown exponent m. These issues help explain why, to the best of our knowledge, equation learning methods have not yet been successfully applied to real-world biological population-level data.
In this work, biologically-informed neural networks (BINNs), an extension of physics-informed neural networks (PINNs) , are presented as a solution to the library specification problem for systems with biological/physical constraints. In this framework, the right-hand-side function of the PDE in Eq (2) is assumed to be a combination of biologically relevant terms. For example, the general form of reaction-diffusion models can be described by the two right-hand-side terms in Eq (1) meaning that the equation learning problem is transformed from learning to learning the diffusivity and growth functions and . Rather than assigning mechanistic forms to each function as in previous equation learning studies, each function is replaced with a separate neural network. This approach leverages the ability of deep neural networks to approximate continuous functions arbitrarily well . Importantly, the form of each learned neural network function can be visualized, thereby enabling a data-driven tool for user-guided conjecture of new mathematical equations that describe each separate term in . Moreover, formulating the equation learning task within the BINNs framework enables the modeler to use domain expertise to include qualitative constraints on the parameter networks (e.g. specifying nonlinear functions that are non-negative, monotone increasing/decreasing, etc.) by selecting appropriate activation functions and loss terms for the optimization.
While BINNs can be used to discover a wide range of governing equations across the biological and physical sciences, including systems of ODEs and PDEs, in this work they are demonstrated using reaction-diffusion PDEs. The BINNs methodology is first tested using synthetic data and then demonstrated on experimental data from scratch assay experiments with variable initial cell densities . Notably, each data set is noisy and sparse, containing only five time measurements across 38 spatial locations. BINNs are used to discover the nonlinear forms of the diffusivity function and growth term of the governing reaction-diffusion equation. Persistent model discrepancy is used to motivate the incorporation of a novel delay term which may have important implications for the reproducibility and modeling of scratch assays. The learned nonlinear forms of the diffusion, growth, and delay terms are used to guide the selection of a mechanistic model with biologically interpretable parameters that remove virtually all of the model discrepancy.
Scratch assay data
Biologically-informed neural networks (BINNs) are evaluated on experimental scratch assay data from . A typical scratch assay involves (i) growing a cell monolayer up to some desired initial cell density, (ii) creating a “scratch” in the interior of the monolayer to produce an empty region, and (iii) recording longitudinal measurements of the cell density during re-colonization of the area. See Fig 1 for a visualization of the experiment.
(a) An illustration of an experiment with the IncuCyte ZOOM system (Essen BioScience, MI USA). Full details of the experiment and image processing can be found in . Cells are seeded uniformly within each well in a 96-well plate at a pre-specified density of between 10,000 and 20,000 cells per well. A WoundMaker (Essen BioScience) is used to create a uniform vertical scratch along the middle of the well. (b) Microscopy images are collected from a rectangular region of the well. (c) Example images corresponding to experiments initiated with 12,000, 16,000, or 20,000 cells per well. A PC-3 prostate cancer cell line was used. The image recording time is indicated on each subfigure and the scale bar corresponds to 300 μm. The green dashed lines in the images in the top row show the approximate location of the leading edge created by the scratch. Each image is divided into equally-spaced vertical columns, and the number of cells in each column divided by the column area is calculated to yield an estimate of the 1-D cell density.
One-dimensional cell density profiles are obtained by manually counting the cells within vertical columns of the two-dimensional image data. For these data, the cell density profiles were reported for six varying initial cell density levels (i.e. 10,000, 12,000, 14,000, 16,000, 18,000, and 20,000 cells per well). To make the cell density profiles compatible with neural network training, the data are pre-processed by rescaling the x and t variables to the scales of millimeters (mm) and days, respectively (see Methods Section for more details). Further, the cell density profile at the left boundary is removed from the data because it was identified as an outlier across each of the six data sets. The resulting pre-processed cell densities at 37 spatial points and five time points are shown in Fig 2.
Pre-processed cell density profiles from scratch assay experiments with varying initial cell densities . Each subplot corresponds to an experiment with a different initial cell density (i.e. 10,000, 12,000, 14,000, 16,000, 18,000, and 20,000 cells per well). The cell densities are reported at 37 equally-spaced positions and five equally-spaced time points.
Biologically-informed neural networks
BINNs are centered around a function-approximating deep neural network, or MLP, denoted by uMLP(x, t) which acts as a surrogate model that approximates the solution to the governing equation described by Eq (2) (Fig 3A). In this work, the governing PDE is assumed to contain two terms, and , that describe the general reaction-diffusion model in Eq (1). Since the true forms of the diffusivity and growth functions are unknown, they are approximated by neural networks and (Fig 3B). Both DMLP and GMLP are continuously differentiable functions that input the predicted cell density uMLP(x, t) and output the corresponding diffusivity or growth value. The advantage of using MLPs for the terms of the governing PDE is that the nonlinear forms of these terms can be learned without specifying them explicitly (or as a library of candidate terms), thus circumventing the model specification problem. Automatic differentiation (Fig 3C) is used to numerically differentiate compositions of uMLP, DMLP, and GMLP in order to construct the general reaction-diffusion model in Eq (1). The resulting PDE (Fig 3D) is used to regularize uMLP during training so that uMLP not only fits the data ui,j but also satisfies the governing reaction-diffusion system.
(A) BINNs are deep neural networks that approximate the solution of a governing dynamical system. (B) By allowing the terms of the dynamical system (e.g. diffusivity function and growth function ) to be function-approximating deep neural networks, the nonlinear forms of these terms can be learned without the need to specify a mechanistic model or library of candidate terms. (C) Automatic differentiation is used on compositions of the different neural network models (e.g. u, D, and G) to construct the PDE that describes the governing dynamical system. (D) The governing system is used in the neural network objective function to jointly learn and satisfy the governing PDE while minimizing the error between the network outputs and noisy observations.
To ensure that the fit to the data and the fidelity to the governing PDE are simultaneously optimized, BINNs are trained with gradient-based methods using the following multi-part objective function: (3) The first term concerns the generalized least squares (GLS) distance between uMLP(xi, tj) and the corresponding observed data ui,j. The observation process is assumed to be described by a statistical error model of the form (4) in which the measured data ui,j are a combination of the underlying dynamical system u(xi, tj) and some random variable wi,j ⊙ εi,j where ⊙ represents element-wise multiplication . In general, the independent and identically distributed (i.i.d.) random variable εi,j is modeled by an n-dimensional normal distribution with mean zero and variance one that is weighted by (5) for γ ≥ 0 and where n is the dimensionality of the system. Note that (i) noiseless data are modeled by letting ω1, …, ωn = 0, (ii) constant-variance error used in ordinary least squares is modeled by letting γ = 0, ω1, …, ωn = 1, and (iii) non-constant-variance error (e.g. proportional error) used in generalized least squares is modeled by letting γ > 0, ω1, …, ωn ≠ 0. Therefore, to account for the statistical error model in Eq (4), the GLS objective function (6) is used with proportionality constant γ = 0.2. Note that γ was tuned numerically following the methodology suggested in  (see Methods Section for more details).
The next term ensures uMLP satisfies the solution of the governing PDE. For ease of notation, let , , and . Then for the reaction-diffusion equation, the error term takes the following form: (7) where LHS and RHS denote the left-hand- and right-hand-sides of the governing PDE, respectively. Thus, by driving to zero, the RHS is trained to match the LHS. Through this process, the nonlinear forms of DMLP and GMLP are learned despite not being directly observed. See the Methods Section for additional implementation details, including a random sampling procedure that enforces this PDE constraint everywhere in the input domain during training.
Biological information and domain expertise are incorporated into the BINNs framework by adding penalties in the loss term . For the reaction-diffusion equation, the diffusivity and growth rates are assumed to be within biologically feasible ranges [Dmin, Dmax] and [Gmin, Gmax], respectively. Further, diffusion is also assumed to be non-decreasing and growth to be non-increasing with respect to cell density. The latter constraints were chosen based on the collective behavior of the unconstrained parameter dynamics (e.g. the unconstrained diffusion terms were generally increasing, but exhibited unrealistic dynamics, including vertical asymptotes, at low cell densities) while the maximum and minimum diffusivity and growth rates considered in  were used to ensure DMLP and GMLP stay within biologically realistic ranges. See the Methods Section for more details. The corresponding constraints take the form: (8) The constraints on DMLP and GMLP shown in Eq (8) were used for all computational results in this work. See the Methods Section for numerical implementation details of these constraints.
BINNs are distinct from previous equation learning approaches in the following ways. First, unlike the neural network / sparse regression hybrid in , which first trains uMLP to fit a set of noisy data and then constructs a library of candidate terms as a separate step for the PDE-FIND algorithm , BINNs include the governing PDE in the objective function of uMLP itself, meaning that uMLP is trained to fit the noisy data while also approximately satisfying the learned PDE. Second, unlike PINNs , BINNs begin from a basic conservation law (e.g. conservation of mass) and use MLPs to determine suitable forms for the terms comprising these laws instead of specifying fixed mechanistic terms that may or may not capture the full system dynamics. Therefore, by replacing the terms of the PDE (e.g. diffusion and growth) with MLPs, BINNs extend PINNs to the class of equation learning methods, since the mechanistic PDE terms do not need to be specified a priori.
Because the model prediction uMLP(x, t) is only a surrogate model for the dynamical system u(x, t), it is possible that this approximation may contain errors, particularly in areas where the PDE constraint given by Eq (7) is not satisfied. To ensure that the inferred diffusion and growth terms lead to biologically realistic dynamics, the reaction-diffusion equation given by Eq (1) is solved numerically with a method-of-lines approach using and . Note that this model is well-defined because DMLP and GMLP are continuously differentiable functions of the cell density, u. Further, BINNs are retrained multiple times for each data set in which the forward simulation using the learned PDE terms that yields the smallest GLS error (Eq (6)) is saved. All fits to the data shown in the Results Section are numerical solutions to the PDE in Eq (1) using the learned diffusivity and growth functions. See the Methods Section for numerical implementation details of the PDE forward solver.
Simulation case study
Since the diffusivity and growth terms are inferred by BINNs through learning DMLP and GMLP, respectively, the ability of BINNs to learn biologically accurate representations of these terms must first be tested. To investigate this, data were simulated using the classical FKPP and Generalized Porous-FKPP equations with parameter values from  for the scratch assay data with initial cell density 20,000 cells per well. Additionally, the simulated data were obscured with artificial observation error using the statistical model in Eq (4) with γ = 0.2. Each simulation used the initial condition from the scratch assay data with initial cell density 20,000 cells per well. Using the same level of sparsity (i.e. 37 spatial points and five time points), the BINNs framework was shown to (i) approximate the dynamical system accurately and (ii) approximate the general forms of the diffusivity and growth terms. See S1 and S2 Figs for the model and parameter fits, respectively. This case study demonstrates that BINNs are able to learn accurate representations of the diffusivity and growth functions from biologically realistic noisy sparse data, however, further analysis, like model selection and comparison, is omitted here and instead explored using experimental data.
Reaction-diffusion BINNs for experimental data
As described in the previous sections, the diffusivity and growth functions are approximated by deep neural networks, and , resulting in a governing PDE of the form (9) where DMLP and GMLP are functions of the cell density u. A BINN was trained for each data set with varying initial cell density. The resulting numerical PDE solutions using the trained DMLP and GMLP are shown in S3 Fig for each data set. While the model fits are excellent for lower initial cell densities, there still remains a significant amount of model discrepancy at higher initial cell densities. GLS residual errors were computed to provide an additional way of visualizing the model discrepancy (see S4 Fig) in which non-i.i.d. residuals are clearly present at higher initial cell densities. To investigate the specific form of the model discrepancy, Fig 4 shows the learned diffusivity and growth functions with the corresponding model fit for the data set with an initial cell density of 20,000 cells per well.
Left: learned diffusivity and growth functions, DMLP and GMLP, evaluated over cell density, u. Right: Predicted cell density profiles using BINNs with the governing reaction-diffusion PDE in Eq (9) for data with initial cell density 20,000 cells per well. Solid lines represent the numerical solution to Eq (9) using DMLP and GMLP. The markers represent the experimental scratch assay data.
Fig 4 reveals clear model discrepancy in two main areas: (i) at high cell densities (i.e. x ∈ [0, 0.25] mm and x ∈ [1.75, 2.0] mm for t ∈ [0, 1] days) where diffusion is negligible and the dynamics are governed primarily by growth; and (ii) at low cell densities (i.e. x ∈ [0.5, 1.5] mm for t ∈ [0, 1] days) where growth is negligible and the dynamics are primarily governed by diffusion. In particular, the discrepancy is largest for early time points where the diffusion and growth dynamics appear too rapid. The solutions of DMLP and GMLP are also qualitatively similar to the classical FKPP equation in which the learned diffusivity function is relatively constant while the learned growth function is approximately linearly decreasing with cell density, u. However, despite DMLP and GMLP learning biologically realistic functions for the diffusivity and growth, the persistent model discrepancy observed across multiple data sets with high initial cell densities (see S3 Fig) suggests that the reaction-diffusion equation described in Eq (9) may be insufficient to fully capture the underlying dynamics of cell migration for these data. From a mathematical modeling perspective, the model discrepancy at early time points suggests the existence of a time delay that scales the magnitude of the density-dependent diffusion and growth rates. Biological reasons behind this phenomenon may include cell damage from the scratch assay protocol or changes in cell functions where more cells become immobile/non-proliferative as the cell density approaches carrying capacity [36–38]. See the Discussion Section for more details.
Delay-reaction-diffusion BINNs for experimental data
Motivated by the model discrepancy for data sets with high initial cell density, the reaction-diffusion equation in Eq (9) was modified by including a time delay described by an additional neural network function TMLP(t). The new term TMLP(t) is a continuously differentiable function of time that is constrained to be non-decreasing and output values between 0 and 1. In this way TMLP can scale the strength of the density-dependent diffusivity and growth terms in time. Letting the diffusivity, , and growth, , terms of the governing PDE be functions of u and t, they are replaced with and . This results in a governing PDE of the form which simplifies to (10) where DMLP and GMLP are functions of the cell density u and TMLP is a function of time t. Note that TMLP was chosen to be separable from DMLP and GMLP since the density-dependent dynamics of diffusion and growth are assumed to be consistent throughout time. Further, it was assumed that both DMLP and GMLP are scaled by the same time delay; see the Discussion Section for more details. BINNs governed by the PDE in Eq (10) were trained for each data set with varying initial cell density. The resulting forward simulations using the trained TMLP, DMLP, and GMLP networks are shown in S5 Fig. The model fits demonstrate that virtually all of the model discrepancy across each initial cell density was removed by including a time delay. This is confirmed visually using GLS residual errors (see S6 Fig) where the residuals are approximately i.i.d. even at higher initial cell densities. In addition to visual inspection between S3 and S5 Figs and residual errors in S4 and S6 Figs, this phenomenon is also reflected in the mean GLS errors between the numerical simulations and scratch assay data over the spatial dimension for each time point (see S7 Fig). In particular, including the novel time delay term results in a significant error reduction, particularly for early time points (i.e. t = 0.5 and t = 1.0 days). Similar to the reaction-diffusion case, Fig 5 shows the learned diffusivity, growth, and delay functions with the corresponding model fit for the data set with initial cell density of 20,000 cells per well.
Left: learned diffusivity and growth functions, DMLP and GMLP, evaluated over cell density, u, and delay function, TMLP, evaluated over time, t. Right: Predicted cell density profiles using BINNs with the governing delay-reaction-diffusion PDE in Eq (10) for data with initial cell density 20,000 cells per well. Solid lines represent the numerical solution to Eq (10) using DMLP, GMLP, and TMLP. The markers represent the experimental scratch assay data.
Fig 5 shows that the model discrepancy in areas with high and low cell densities at early time points has been practically eliminated. This is most clearly seen in the delay-reaction-diffusion model solution at the second time point (i.e. t = 0.5 days), which matches the data more accurately than the reaction-diffusion model in Eq (9) at the same time point (see Fig 4). Moreover, DMLP and GMLP for the delay-reaction-diffusion BINN learned similar forms of the diffusivity and growth compared to the reaction-diffusion case. However, the delay term TMLP reveals that the diffusion and growth dynamics described by DMLP and GMLP are scaled down for early time points (i.e. t < 1) before TMLP converges to 1, allowing DMLP and GMLP to come into full effect. This observation is of particular importance since the majority of scratch assay data are reported within this time delay region (i.e. 4, 6, 12, or 24 hrs) [8, 13, 15, 39]. Importantly, not accounting for a time delay within this region may potentially explain why scratch assay experiments are notoriously difficult to reproduce .
Guided mechanistic model selection
The diffusion, growth, and delay networks DMLP, GMLP, and TMLP were used to guide the selection of biologically realistic mechanistic models for downstream use in a traditional mathematical modeling framework. Each network solution corresponding to the six scratch assay data sets is shown in Fig 6.
The learned diffusivity, DMLP, growth, GMLP, and delay, TMLP, functions extracted from the corresponding BINNs with governing delay-reaction-diffusion PDE in Eq (10). Each line corresponds to an experiment with a different initial cell density (i.e. 10,000, 12,000, 14,000, 16,000, 18,000, and 20,000 cells per well). Note that DMLP and GMLP have different lengths since they are evaluated between the minimum and maximum observed cell densities corresponding to each data set.
From Fig 6, the learned diffusivities for each experiment with different initial cell density are non-zero when u = 0, suggesting the existence of some constant baseline diffusivity, and appear increasing and concave up as a function of the cell density, u, for u > 0. On the other hand, the learned growth terms are approximately linear, which is consistent with logistic models, and the learned delay terms all exhibit sigmoidal dynamics. Note that the outlying GMLP solution for the scratch assay data set with 10,000 initial cells per well is likely an artifact of the observed cell densities in that experiment not approaching the carrying capacity, and therefore leading to unrealistic learned dynamics. Based on qualitative analysis of these plots, the following mechanistic delay-reaction-diffusion equation is proposed to satisfy each scratch assay data set: (11) with diffusivity, growth, and delay functions (12a) (12b) (12c) respectively. The diffusivity network (DMLP) solutions show significant variability across the scratch assay data sets, so the posited mechanistic term is chosen to respect the observed variability while also being as simple as possible. Therefore, the diffusivity function in Eq (12a) is a combination of (i) the classical FKPP and (ii) the Generalized Porous-FKPP diffusivity functions, with baseline cell diffusivity D0, diffusion coefficient D, and exponent m. This way (i) and (ii) can be seen as nested models of the posited diffusivity function by setting either D = 0 or D0 = 0, respectively. Yet the posited diffusivity is still simple, as it only increases the number of parameters with respect to (ii) by one. Since the growth network (GMLP) solutions are approximately linear and decreasing, the growth function in Eq (12b) is chosen to be the logistic growth function with intrinsic growth rate r and carrying capacity K. Finally, the delay network (TMLP) solutions exhibit sigmoidal dynamics, so the delay function in Eq (12c) is represented by the logistic regression function with parameters β0 and β1. One advantage of using a mathematical model with specified functional forms and parameters described by Eqs (12a)–(12c) is that standard parameter estimation techniques can now be used. This enables a comparison of the BINN-guided model in Eq (11) to other mechanistic models, namely, the classical FKPP and Generalized Porous-FKPP equations.
The BINN-guided delay-reaction-diffusion model in Eq (11) was compared to the classical FKPP equation (13) with diffusion coefficient D, intrinsic growth rate r, and carrying capacity K and Generalized Porous-FKPP equation (14) with additional exponent m. These models were used as a baseline for comparison since they have been identified as the current state-of-the-art in modeling these data [2, 40]. The parameters of each model were optimized numerically using the generalized least squares error function in Eq (6) with the adjusted statistical error model in Eq (4) with γ = 0.2. Note that the carrying capacity was fixed at K = 1.7 × 103 and not optimized because it was empirically validated in . The resulting model fits and parameter values for the classical FKPP and Generalized Porous-FKPP models are shown in S8 and S9 Figs and S1 and S2 Tables. The solutions of the BINN-guided delay-reaction-diffusion model in Eq (11) to each data set are shown in Fig 7.
Predicted cell density profiles using the delay-reaction-diffusion model in Eq (11). Each subplot corresponds to an experiment with a different initial cell density (i.e. 10,000, 12,000, 14,000, 16,000, 18,000, and 20,000 cells per well). Solid lines represent the numerical solution to Eq (11) using the parameters that minimize in Eq (6). The markers represent the experimental scratch assay data.
The predicted cell density profiles in Fig 7 closely matched the scratch assay data which suggests that the proposed model in Eq (11) with Eqs (12a)–(12c) successfully captured the learned dynamics from TMLP, DMLP, and GMLP. The optimized parameter values across each data set are shown in Table 1. Note that the parameters were rescaled to μm and hours (hr) for comparison with  and .
Table 1 reveals that many of the parameters relating to density-dependent diffusion and growth show trends (e.g. D0 and r increasing) with initial cell density similar to . The implications of this observation are considered in the Discussion Section. To compare the three models quantitatively, the generalized least squares (GLS) errors were computed for each model and data set and reported in Table 2.
The results in Table 2 showed that Eq (11) with Eqs (12a)–(12c) fit each data set more accurately than the classical FKPP or Generalized Porous-FKPP models. This behavior is not surprising given that the BINN-guided model is more complex. Therefore, model selection methods, which balance model accuracy with model complexity, were also used to compare the quality of each model relative to the others. In particular, the modified Akaike Information Criterion (AIC) from  was used to account for the statistical error model in Eq (4). See Table 3 for the AIC scores across each model and data set.
The results in Table 3 showed that the BINN-guided delay-reaction-diffusion model outperforms the classical FKPP and Generalized Porous-FKPP models across all data sets except with initial cell density 14,000 cells per well. This discrepancy follows from Eq (6) where the additional parameters in Eq (11) only slightly decreased the GLS error for the data set with initial density of 14,000. Finally, to quantify the “value” of adding the novel delay term in Eq (12c) the differences between AIC scores for each model and the minimum AIC score, denoted by ΔAIC, are shown in Table 4.
Table 4 suggests that the delay term is most impactful for data sets with large initial density (i.e. 18,000 and 20,000 cells per well) since the ΔAIC scores are significantly larger for these data sets. Biological analysis and explanations for these results are considered in the following Discussion Section.
In this work, biologically-informed neural networks (BINNs) were introduced as a flexible and robust equation learning method for real-world biological applications. The BINNs framework was demonstrated using experimental biological data from scratch assays  and used to discover a delay term that had not yet been considered in the modeling of these data. The trained diffusivity, growth, and delay networks were used to guide the selection of the mechanistic model in Eq (11) with Eqs (12a)–(12c), which was shown to model the data more accurately than the current state-of-the-art models (i.e. classical FKPP and Generalized Porous-FKPP equations). The results shown in this work suggest that the BINNs framework can be successfully applied to a wide range of biological and physical problems where the data are sparse and the governing dynamics are unknown. The biological motivations for various aspects of the BINNs framework and significance of the results are discussed in the following paragraphs.
The model solutions in S3 Fig and Fig 4 indicated that using only density-dependent diffusivity and growth functions and was not sufficient to fully capture the scratch assay dynamics. Fig 4 highlighted this discrepancy at the second time measurement (t = 0.5 days) in which the model failed to capture the areas of both high and low cell density, despite letting and be universal function-approximating neural networks. In particular, the model solutions in the areas of high cell density (i.e. x ∈ [0.0, 0.5] and x ∈ [1.5, 2.0]) showed exponential convergence to the carrying capacity, which successfully captured the data for later time points (t ≥ 1 days) but over-predicted for early time points (t < 1 days). Similarly, diffusion in areas of low cell density (i.e. x ∈ [0.5, 1.5]) over-predicted the cell density profile for early time points but then matched the data accurately for later time points. From a mathematical perspective, this motivates the existence of a time delay that scales the density-dependent dynamics to be reduced for early time points and larger for later time points. There are also several biological motivations for considering a time delay. For example,  showed how cells are damaged at the borders of the scratch as a result of the experimental scratch assay protocol. Cell damage can potentially inhibit the communication between cells and physically block healthy cells from diffusing into uncolonized spatial regions. Another source of delay may stem from changes in density-dependent cell functions (e.g. differentiation, division, and senescence). Studies have shown that cells are more likely to terminally differentiate when cell populations approach carrying capacity [37, 38]. Therefore, scratch assay experiments that are performed for high density populations may contain fewer mobile/proliferative cells at the borders of the scratch, thus causing a time delay in the cell migration dynamics.
A general framework for incorporating the delay term may be to consider diffusivity and growth functions and , respectively. However, since the dynamics of diffusion and growth are assumed to be consistent throughout time, the diffusion and growth terms were chosen to be separable functions composed of diffusivity , growth , and delay . Additionally, it was assumed that both diffusion and growth were scaled by the same time delay as opposed to a diffusion delay and growth delay . This assumption may not be accurate if the time delay is a result of density-dependent changes in cell function where cells become mobile and proliferative at different rates. In particular, since migration and proliferation have very different timescales, it might be natural to expect that the delays would also have different timescales. However, since the numerical solutions using matched the data sufficiently accurately, this question is left for future work. Finally, was constrained to output values between 0 and 1 and forced to be increasing with time. These constraints were chosen to ensure that the delay term modeled the time-dependent changes in cell dynamics for early time points but converged to unity by later time points.
In this work, BINNs revealed that the reaction-diffusion system in Eq (1) with cell density-dependent diffusivity and growth functions was insufficient to capture the data dynamics. However, the model discrepancy for data sets with large initial cell density motivated the development of a time delay which significantly improved the model accuracy and resolved the observed discrepancy. The diffusivity, growth, and delay networks were used to posit a mechanistic model (i.e. Eq (11) with Eqs (12a)–(12c)). Using the logistic growth model (Eq (12b)) for the growth function and logistic regression (Eq (12c)) for the delay function followed straightforwardly from the parameter network solutions in Fig 6, however, the diffusivity function in Eq (12a) warrants further discussion.
Opinions vary between the biological validity of (i) the classical FKPP and (ii) the Generalized Porous-FKPP diffusivity functions. For example, one study compared (i) and (ii) using experimental wound size data and found that (ii) with m = 4 provided the best fit to the data . Another study fit (i) and (ii) to experimental cell migration data with different cell populations and found that one population was best described by constant diffusivity in (i) and the other by nonlinear diffusivity with m = 1 in (ii) . These studies do not reveal which approach is best, but they demonstrate that care is warranted. Thus, the posited diffusivity function in Eq (12a) was chosen to respect the observed variability of the diffusivity network (DMLP) solutions (Fig 6) while being as simple as possible (i.e. a combination of the classical FKPP and Generalized Porous-FKPP diffusivity functions). It may be the case that the true diffusivity function is even more complex, such as a linear combination of powers: with baseline diffusivity D0, diffusion rates D1 and D2, carrying capacity K, and exponents m1 and m2. However, these considerations are beyond the scope of the present work and left for future work.
The parameters of (i) the classical FKPP in Eq (13), (ii) the Generalized Porous-FKPP in Eq (14), and (iii) the BINN-guided model in Eq (11) with Eqs (12a)–(12c) were optimized numerically for each scratch assay data set. The optimized parameters for (i) in S1 Table all fall within the ranges reported in . However, this is not the case for any set of parameter values for (ii) as shown in S2 Table. This is likely due to the parameter optimization being conducted using the adjusted statistical error model in Eq (4) with γ = 0.2 and since the exponent m in the Porous-FKPP diffusivity function was not fixed at m = 1 as in . However, in both (i) and (ii), the diffusion coefficient, D, and intrinsic growth rate, r, showed variability with initial cell density, similar to the conclusions drawn in . Therefore, in theory, if the delay term in Eq (12c) accounts for the time it takes for density-dependent growth and diffusion to become active in the system, which may be a function of initial cell density, then the variability among diffusion coefficients and intrinsic growth rates for the BINN-guided delay-reaction-diffusion model should be reduced across the scratch assay experiments. However, from the optimized parameter values in Table 1, the baseline diffusion rate D0 and intrinsic growth rate r generally increase with initial cell density and the diffusion coefficient D generally decreases with initial cell density. This observation may indicate (i) practical identifiability issues between the diffusion, growth, and delay terms or (ii) the existence additional mechanisms that are not accounted for in the model. To confirm this, a Bayesian parameter estimation framework can be used to examine practical identifiability of parameters [44, 45]. Then, a possible strategy to mitigate this issue would be to optimize the parameters of Eq (11) with Eqs (12a) and (12b) jointly across each scratch assay data set while allowing the delay parameters in Eq (12c) to be tuned separately for each set. This exploration is left for future work.
The BINN-guided delay-reaction-diffusion model was compared to the baseline classical FKPP and Generalized Porous-FKPP models using both GLS errors and modified AIC scores. The GLS errors in Table 2 showed that the BINN-guided model fits the data more accurately than the baseline models across each scratch assay data set. However, this improvement in accuracy is due to the increased model complexity (i.e. number of parameters and PDE terms) in the BINN-guided model. Therefore, to rank the quality of each model, AIC scores were also computed since they balance model accuracy with model complexity. The AIC scores reported in Table 3 indicate that the BINN-guided model also exceeds the baseline models in terms of relative quality across each scratch assay data set except with initial cell density 14,000 cells per well, in which the Generalized Porous-FKPP model has a slightly smaller AIC score. In other words, Tables 2 and 3 indicate that the BINN-guided model performs as well or better than the state-of-the-art in modeling the suite of scratch assay experiments from . In particular, this advantage is afforded by including the delay term in Eq (12c). To quantify the relative value of adding the delay term, the AIC scores from Table 3 are used to compute difference AIC (ΔAIC) scores in Table 4 in which the ΔAIC score for a fixed model and data set is given by the difference between the corresponding AIC score and the minimum AIC score across all models for the given data set. The ΔAIC scores in Table 4 indicate that the relative value of the delay term is largest for data sets with initial cell density 18,000 and 20,000 cells per well. This observation is supported by the relevant biology discussed at the beginning of this section, in which large initial cell densities either (i) result in more damaged cells near the borders of the scratch, (ii) cause more cells in the population to have terminally differentiated away from mobile/proliferative cell functions, or (iii) some combination of (i) and (ii) and other potentially unconsidered biological sources, all of which increase the potential time delay before the density-dependent diffusion and growth dynamics become the primary drivers of the temporal evolution of the system.
Conclusions and future work
BINNs, a robust and flexible framework for equation learning with sparse and noisy data, was demonstrated and used to posit a mechanistic equation that outperforms the state-of-the-art in modeling experimental scratch assay data. The development, training, and evaluation of BINNs and the resulting model selection and analysis were reported to justify these claims. The discovered time delay term may have important implications for the reproducibility and modeling of scratch assays, since the majority of the reported data fall within the time delay region. Some of the drawbacks of the BINNs method and opportunities for future work and development are discussed below.
Since BINNs rely on multilayer perceptrons (MLPs), the learned dynamics may not generalize well outside the training domain. For example, in the present work, if the observed cell densities for a particular experiment do not approach the carrying capacity (e.g. the scratch assay data set with 10,000 initial cells per well) then the learned dynamics given by DMLP and GMLP may lead to biologically unrealistic behavior (see GMLP solutions in Fig 6). Further, since none of the scratch assay data reported values that significantly exceeded the empirically set carrying capacity, GMLP would likely not generalize well to a scenario with exceedingly large observed cell densities. Options for mitigating this issue include (i) replacing unrealistic MLP terms with mechanistic models (e.g. logistic growth instead of GMLP) if the particular dynamics are known a priori, or (ii) adding additional constraints which force the MLP terms to satisfy specific values (e.g. GMLP(u = K) = 0). Additional testing of out-of-sample generalizability may involve applying the BINNs methodology to spatiotemporal data with more time measurements. In this setting, a subset of time points are held out from the training procedure and generalizability is tested by comparing the forward solution of the learned PDE against the holdout set.
An opportunity for future development is quantifying the uncertainty of both the approximate solution uMLP and the parameter networks DMLP, GMLP, and TMLP. From the frequentist perspective, so called “subagging” (i.e. subsample aggregating) can be used to build posterior distributions of the model solutions and parameter networks . In this framework, one simply samples N training/validation splits and trains a BINN for each split. Then kernel density estimation or some other equivalent methodology can be used to build distributions from the N number of trained BINNs. Alternatively, from the Bayesian perspective, physics-informed neural networks were recently extended to Bayesian physics-informed neural networks (B-PINNs) . In this framework, Bayesian neural networks are substituted for uMLP and regularized using a pre-specified governing PDE. In the BINNs framework, Bayesian neural networks could also be substituted for DMLP, GMLP, and TMLP to quantify the uncertainty of the PDE terms in addition to the model solution.
While BINNs were demonstrated using one-dimensional reaction-diffusion PDEs for scratch assay data in this work, they can be applied on a wide spectrum of physical and biological problems (for both ODE and PDE systems) in which the governing dynamics are unknown and highly nonlinear. A straightforward next step for this work would be to evaluate BINNs on the two-dimensional scratch assay image data that were used to construct the one-dimensional cell density profiles in . Further, more complicated cell dynamics could be incorporated into the governing system in the present work by including PDE terms that describe cell population heterogeneity or additional biological mechanisms for damaged (but not dead) cells at the borders of the scratch.
BINNs were used to address a canonical problem in the field of collective cell migration by analyzing how the combination of density-dependent cell motility and proliferation drive the temporal dynamics of cell invasion during an experimental scratch assay. This novel framework revealed new mechanistic and biological insights into this process by guiding the derivation of a mathematical model that has not been considered previously using traditional mathematical modeling approaches. The classical FKPP and Generalized Porous-FKPP models are ubiquitous in modeling cell migration and proliferation, yet the BINNs methodology presented here revealed that these models may fail to incorporate all of the relevant mechanisms underlying this process. These results suggest that new models incorporating a time delay may be necessary to accurately capture the dynamics within the first day of a scratch assay, i.e., just after the scratch is introduced. Based on the success of this work, BINNs establish a new paradigm for data-driven equation learning from sparse and noisy data that could enable the rapid development and validation of mathematical models for a broad range of real-world applications throughout biology including ecology, epidemiology, and cell biology.
All methods herein were implemented in Python 3.6.8 using the PyTorch 1.2.0 deep learning library. All data and code are made publicly available at https://github.com/jlager/BINNs. The following section is intended to make BINNs feasible for a wide range of biological applications. In particular, this section covers (i) the importance of data pre-processing, (ii) strategies for using real-world knowledge to design effective neural network models, (iii) the complete training protocol ranging from selecting appropriate statistical error models and hyperparameters to balancing the multi-objective error function, and (iv) numerical implementation details for forward solving BINN-guided PDEs.
Input and output standardization are common practice to stabilize neural network training . Since the scratch assay data in  reported cell densities on the order of at spatial locations on the order of μm for time points on the order of hours, these variables needed to be standardized. Without standardization, the neural network models failed to converge for these data because (i) the network inputs (x and t) differed by several orders of magnitude from each other and (ii) the network inputs (x and t) and outputs (u) also differed by several orders of magnitude. By rescaling x and t to millimeters (mm) and days, respectively, the adjusted variables ranged from mm, days, and cell density . Standardizing x and t addressed (i) while (ii) is addressed by using scaling factors discussed in the following section. The cell density profile at the left boundary was removed since it was consistently larger than the remaining cell densities across all six data sets.
BINNs are centered around uMLP, a function-approximating multilayer perceptron (MLP) (also known as an artificial neural network). MLPs, like polynomials , are in the class of universal function approximators, meaning that they can approximate any continuous bounded functions on a closed interval arbitrarily well under some reasonable assumptions . However, there are several reasons for choosing MLPs over polynomials for equation learning. For example, a recent study found that MLPs were superior to both local and global polynomial spline regression for data smoothing and numerical differentiation in the presence of biologically realistic noise . Further, due to gradient-based optimization, MLPs can seamlessly incorporate complex multi-objective loss functions (e.g. Eq (3)) and are generally more stable to train since they do not involve taking large powers of their inputs. For the scratch assay data in the present work, uMLP inputs spatiotemporal vectors x = [x, t] and outputs the corresponding approximations to the cell density u. To give uMLP sufficient capacity to approximate the solution to the governing PDE, the network is chosen to have three hidden layers with 128 neurons in each layer, resulting in a model with approximately 30,000 total parameters. Note that, unlike in traditional mathematical modeling approaches, in practice neural networks are typically chosen to be larger than necessary to fit the data in a given application. However, regularization and optimization techniques are then used to monitor and prevent the networks from overfitting. These techniques are discussed in more detail in the following subsection. Concretely, uMLP takes the form (15) where the trainable parameters Wi and bi denote weight matrices and bias vectors for the ith layer, σ(⋅) and ϕ(⋅) denote nonlinear activation functions, and α denotes a scaling factor. Each hidden layer uses a “sigmoid” activation function (i.e. σ(x) = 1/(1 + e−x)) while the output layer uses a “softplus” activation function (i.e. ϕ(x) = ln(1 + ex)). The softplus activation function is a particular design choice since it is a continuously differentiable function that forces the predicted cell densities to be non-negative, and has been previously shown to be well-suited for biological transport models . Finally, to account for the difference in scale between the inputs () and outputs (), the MLP outputs are post-multiplied by the experimentally validated carrying capacity (i.e. α = 1.7 × 103) from . Note that in practice, if values like this are unknown, one can simply let α be the maximum observed cell density or some other similar quantity. The key here is to ensure the orders of magnitude between the network inputs and outputs are similar so that the parameters of the MLP do not have to account for the change of scale .
The diffusivity, growth, and delay functions of the governing PDEs are modeled with neural networks DMLP(uMLP) and GMLP(uMLP), and TMLP(t). All three MLPs share the same number of layers as uMLP but use 32 neurons per layer. These networks are chosen to be smaller for both computational efficiency and because the parameter dynamics are assumed to be simpler than the cell density dynamics u. The hidden layers use sigmoid activation functions. The output layer for DMLP uses a softplus activation because diffusion is assumed to be non-negative for all cell densities. Since the growth term can be negative (e.g. logistic growth when the cell density exceeds the carrying capacity), a linear output (i.e. no activation function) is used in the final layer for GMLP. The output layer for TMLP uses the sigmoid function to constrain the outputs to (0, 1). Finally, as with uMLP, the inputs and outputs of DMLP and GMLP are also standardized. In particular, the inputs of both networks (i.e. uMLP) are divided by the carrying capacity K = 1.7 × 103 while the outputs of DMLP are multiplied by and the outputs of GMLP are multiplied by . These values were the maximum diffusion and growth values considered in . Similar to uMLP, the input and output scaling factors ensure the MLP parameters do not have to account for changes in scale. No standardization was used for TMLP since its inputs and outputs are of the same order (i.e. ).
The BINN parameters (i.e. weights and biases of uMLP, DMLP, GMLP, and TMLP) are optimized using the first-order gradient-based Adam optimizer  with default hyper-parameters and minibatch-optimization. To prevent over-fitting, the scratch assay data were randomly partitioned into 80%/20% training and validation sets. The network parameters were updated iteratively to minimize in Eq (3) on the training set and saved on relative improvement in validation error. In other words, the model parameters were saved if the relative difference between (i) the validation error in the current iteration and (ii) the smallest recorded validation error exceeded 5%. Finally, since the parameters of each BINN are randomly initialized and applied to different data sets, early stopping of 5,000 (i.e. training was stopped if the relative validation error did improve for 5,000 consecutive epochs) was used to guarantee the convergence of each BINN independently. The implementation details of each term in (i.e. , , and ) are discussed in more detail below.
The first term in Eq (6) corresponds to the generalized least squares (GLS) distance between uMLP and the observation data ui,j. Since the error process is assumed to be i.i.d., the parameters of the statistical model in Eq (4) (i.e. γ) must first be calibrated. Following , uMLP is trained using as an objective function for γ = 0.0, 0.2, 0.4, 0.6 (recall that γ = 0.0 represents the ordinary least squares case) for each data set. After qualitative assessment of the modified residual errors (see S11 Fig), γ = 0.2 was identified as the value that produced the most i.i.d. residuals across each of the six data sets. Using the calibrated statistical error model, is evaluated at each training iteration using mini-batches (i.e. randomly selected subsets) of input/output data. In general, using a small batch size acts as an additional form of regularization that helps neural networks escape local minima during training and allows for better generalization . However, this significantly increases the computational cost of training due to the increased number of training iterations needed to converge. Therefore, BINNs were trained using mini-batches of size 37 (i.e. 1/4 the number of points in the training set) which was found to balance the accuracy and computational cost.
To ensure uMLP satisfies the solution of the governing PDE, the terms in Eq (7) and in Eq (8) are included in as a form of regularization. However, since the scratch assay data are sparse, simply training uMLP using at the observed data locations can result in unrealistic dynamics in between data points. Therefore, to ensure uMLP satisfies the solution of a governing PDE everywhere in the input domain, and are evaluated at 10,000 uniformly randomly sampled points xi ∈ [xmin, xmax] and tj ∈ [tmin, tmax] at each training iteration. Without the random sampling procedure, uMLP can severely overfit to the data. To illustrate the importance of the random sampling procedure, the model fits, GLS errors, and PDE errors are shown in S12 Fig for three cases in which (i) no PDE regularization is used, (ii) PDE regularization is used at the data locations, and (iii) PDE regularization is used at 10,000 randomly sampled points. In particular, S12 Fig shows that in option (i) uMLP overfits the data practically everywhere in the input domain, (ii) uMLP overfits everywhere except at the data locations (see vertical lines in third subplot of row b), and (iii) the random sampling procedure results in the smallest amount of PDE error and the largest amount of GLS error. The desired behavior is shown in option (iii) since uMLP fits the data as accurately as allowed by the governing PDE.
The third error term constrains DMLP, GMLP, and TMLP to exhibit biologically realistic values and dynamics. Choosing appropriate constraints can be ambiguous when the relevant literature gives conflicting suggestions. For example, when designing a derivative constraint for the diffusivity network DMLP,  suggest that diffusion should decrease with cell density due to cell-to-cell adhesion whereas  suggest the opposite in which cells promote the migration of others. To mitigate this, BINNs were trained without any constraints on DMLP and GMLP in order to visualize the collective behavior of the parameter networks (see S10 Fig). Note that TMLP was still forced to be non-decreasing. The network evaluations in S10 Fig showed unrealistic parameter dynamics for some data sets, but their collective behavior was used to design derivative constraints that forced DMLP to increase as a function of cell density and GMLP to decrease with cell density for the set of scratch assay data considered in this work. Concretely, the diffusion term DMLP was constrained to values between 0.0 and and the growth term GMLP to values between −0.48 and . The maximum and minimum diffusion values and maximum growth value were chosen based on values used in . The minimum growth value was chosen to be negative 20% of the maximum growth value to allow GMLP to output negative values for cell densities near the carrying capacity if needed. The sigmoid output activation function for the delay term TMLP constrained its outputs to between 0 and 1. Derivative terms were used in to constrain DMLP and TMLP to be non-decreasing and GMLP to be non-increasing. For ease of notation, let , , , and , then the constraint term can be written concretely as (16) Since the parameter networks and their derivatives occur at different scales with respect to each other and with respect to the error terms and , each term of Eq (16) is weighted by a factor αi. In particular, each constraint is weighted based on the input/output scaling factors of the corresponding neural network (see Network Design subsection). Concretely, the terms in Eq (16) are weighted by , and α5 = 1010. Note that the weight factors for the derivative constraints on DMLP and GMLP (i.e. α2 and α4) include the carrying capacity K = 1.7 × 103 since K was used as an input scaling factor for these networks. The factor 1010 was chosen large enough to guarantee that DMLP, GMLP, and TMLP exhibited the desired behavior. Boundary conditions can also be included in the term, however, since they were unknown for the scratch assay data considered in this work, no boundary conditions were used to train uMLP.
Finally, the GLS errors at the initial condition (i.e. data locations where t = 0) were weighted by a factor of 10 during training. This was found to improve the generalization accuracy of DMLP, GMLP, and TMLP when evaluated using a numerical PDE solver. The reason for this is because the cell density at t = 0 may not satisfy a governing dynamical system since the measurement is taken directly after the scratch assay protocol is performed . However, the initial condition “sets the stage” for the governing dynamics to drive the temporal evolution of the system. Therefore, by weighting the initial condition more heavily in , the PDE error term must conform uMLP to satisfy the governing system for t > 0 as dictated by uMLP at t = 0. This step forced DMLP, GMLP, and TMLP to learn more generalizable representations of the diffusivity, growth, and delay functions, respectively. The weighting factor was numerically validated using the mean GLS error across each scratch assay experiment for weighting factors 1, 10, and 102. Note that this weighting factor makes BINNs sensitive to the random choice of training/validation split, since some data points in the initial condition may be more informative than others for equation learning and ultimate model generalizability. This observation was also noted in a recent equation learning study in which the random split of training and validation sets was found to influence the structure of the learned equation . Adopting a strategy similar to this previous study, BINNs were trained 20 times for each data set (using different random training/validation splits). The BINN for which the numerical simulations resulted in the smallest GLS error was saved as the best model.
While the generalizability of each BINN is tested using a numerical solution to the learned PDE, it is unclear whether DMLP, GMLP, and TMLP can learn generalizable dynamics while overfitting to the training set. Thus, for completeness, see S13 Fig for an example convergence plot of the delay-reaction-diffusion BINN trained on the scratch assay data with 20,000 initial cells per well which confirms that the BINN does not overfit to the training data. Note that the validation error is smaller than the training error because of the weighting factor applied to the initial condition, i.e., the training set contains more weighted points than the validation set.
PDE Forward Solver
The numerical implementation details are provided for systems describing quantity of interest u(x, t) that are governed by the following equation: (17) for x ∈ [x0, xf], and t ∈ [t0, tf]. Note that the reaction-diffusion model in Eq (1) is an example of Eq (17) where and . In Eq (17), the initial condition is denoted by ϕ(x) and the boundary conditions are assumed to be no-flux boundary conditions. Note that the no-flux condition represents a zero net flux boundary condition which does not preclude cells moving across the boundary, but instead reflects the situation in which the flux in the positive and negative x-directions are equal, giving rise to zero total flux. The spatial and temporal domains are discretized into equispaced grids as: (18) for i = 0, …, 200 and j = 0, …, 1, 000. For notational convenience, let ui(t) = u(xi, t). Then, the method-of-lines approach is used to solve Eq (17) with the numerical discretization from  that is given by (19) where Pi+1/2(t) is an estimate for the rightwards diffusive flux at location xi that is given by (20) The no-flux boundary conditions at x0 and x200 are implemented by incorporating the ghost points x−1 and x201 satisfying u−1(t) = u1(t) and u201(t) = u199(t). The Scipy integration subpackage (version 1.4.1) is used to integrate Eq (17) over time using an explicit fourth order Runge-Kutta Method.
The parameters of each mechanistic model were optimized using the Limited-memory BFGS algorithm with bound constraints (L-BFGS-B) in Python’s Scipy package with default tolerance values to minimize the generalized least squares error function in Eq (6) with the adjusted statistical error model in Eq (4) with γ = 0.2. The parameters for Eqs (13) and (14) were initialized using the values from . The parameters for Eq (11) were initialized by fitting each PDE term in Eqs (12a)–(12c) to the corresponding parameter network solutions in Fig 6 using ordinary least squares. Finally, the diffusivity and growth function parameters were bounded using , , mmin = 0, mmax = 4, , and (all of which come from ), while the delay function parameters β0 and β1 were bounded by [−10, 10].
S1 Fig. Simulation model fits.
Predicted cell density profiles using BINNs with the governing reaction-diffusion PDE in Eq (9). The left subplot corresponds to the set of simulated data using the classical FKPP equation and the right subplot corresponds to the Generalized Porous-FKPP equation. Solid lines represent the numerical solution to Eq (9) using DMLP, and GMLP. Dashed lines represent the noiseless numerical simulations of the classical FKPP and Generalized Porous-FKPP equations. The markers represent the numerical simulations of the classical FKPP and Generalized Porous-FKPP equations with artificial noise generated by the statistical error model in Eq (4).
S2 Fig. Simulation parameter fits.
The learned diffusivity and growth functions DMLP and GMLP evaluated over cell density u. Starting from the left, the first two subplots correspond to the learned diffusivity and growth functions from simulated data using the classical FKPP equation. The last two subplots correspond to the learned diffusivity and growth functions from simulated data using the Generalized Porous-FKPP equation. Solid lines represent the parameter networks DMLP and GMLP and dashed lines represent the true diffusivity and growth functions used to simulate the data.
S3 Fig. Reaction-diffusion BINN solutions.
Predicted cell density profiles using BINNs with the governing reaction-diffusion PDE in Eq (9). Each subplot corresponds to an experiment with a different initial cell density (i.e. 10,000, 12,000, 14,000, 16,000, 18,000, and 20,000 cells per well). Solid lines represent the numerical solution to Eq (9) using DMLP and GMLP. The markers represent the experimental scratch assay data.
S4 Fig. Reaction-diffusion BINN residuals.
Modified residuals using BINNs with the governing reaction-diffusion PDE in Eq (9). Each subplot corresponds to an experiment with a different initial cell density (i.e. 10,000, 12,000, 14,000, 16,000, 18,000, and 20,000 cells per well).
S5 Fig. Delay-reaction-diffusion BINN solutions.
Predicted cell density profiles using BINNs with the governing delay-reaction-diffusion PDE in Eq (10). Each subplot corresponds to an experiment with a different initial cell density (i.e. 10,000, 12,000, 14,000, 16,000, 18,000, and 20,000 cells per well). Solid lines represent the numerical solution to Eq (10) using TMLP, DMLP, and GMLP. The markers represent the experimental scratch assay data.
S6 Fig. Delay-reaction-diffusion BINN residuals.
Modified residuals using BINNs with the governing delay-reaction-diffusion PDE in Eq (10). Each subplot corresponds to an experiment with a different initial cell density (i.e. 10,000, 12,000, 14,000, 16,000, 18,000, and 20,000 cells per well).
S7 Fig. Spatial errors between BINN solutions.
Mean GLS errors between the reaction-diffusion and delay-reaction-diffusion BINNs over the spatial dimension for each time point beyond the initial condition. The initial condition is excluded since the PDE solutions are simulated using the initial condition of the data, meaning that the error at t = 0 is zero. Each subplot corresponds to an experiment with a different initial cell density (i.e. 10,000, 12,000, 14,000, 16,000, 18,000, and 20,000 cells per well).
S8 Fig. Classical FKPP model solutions.
Predicted cell density profiles using the classical FKPP model in Eq (13). Each subplot corresponds to an experiment with a different initial cell density (i.e. 10,000, 12,000, 14,000, 16,000, 18,000, and 20,000 cells per well). Solid lines represent the numerical solution to Eq (13) using the parameters that minimize in Eq (6). The markers represent the experimental scratch assay data.
S9 Fig. Generalized Porous-FKPP model solutions.
Predicted cell density profiles using the Generalized Porous-FKPP model in Eq (14). Each subplot corresponds to an experiment with a different initial cell density (i.e. 10,000, 12,000, 14,000, 16,000, 18,000, and 20,000 cells per well). Solid lines represent the numerical solution to Eq (14) using the parameters that minimize in Eq (6). The markers represent the experimental scratch assay data.
S10 Fig. Unconstrained BINN terms.
The learned diffusivity DMLP, growth GMLP, and delay TMLP functions extracted from the corresponding BINNs with governing reaction-diffusion PDE in Eq (9) (first row) and delay-reaction-diffusion PDE in Eq (10) (second row). Each line corresponds to an experiment with a different initial cell density (i.e. 10,000, 12,000, 14,000, 16,000, 18,000, and 20,000 cells per well). Note that DMLP and GMLP have different lengths since they are evaluated between the minimum and maximum observed cell densities corresponding to each data set.
S11 Fig. Statistical error model selection.
The function-approximating deep neural network uMLP is trained using for different values of γ across each data set. Each subplot shows the modified residuals (see Eq (6)) as a function of the predicted cell density u. The columns correspond to different levels of proportionality (i.e. γ = 0.0, 0.2, 0.4, 0.6) where γ = 0.0 represents the constant variance (ordinary least squares) case. Each row (a-f) corresponds to an experiment with different initial cell density (i.e. 10,000, 12,000, 14,000, 16,000, 18,000, and 20,000 cells per well). The proportionality constant that results in the most i.i.d. residuals across each data set was chosen to calibrate the statistical error model in Eq (4).
S12 Fig. PDE random sampling validation.
The BINNs framework is trained using with three ways of including the PDE error term : (a) no PDE regularization, (b) PDE regularization at the data locations, and (c) PDE regularization at 10,000 randomly sampled points at each training iteration. The first column shows the scratch assay data with initial cell density 20,000 cells per well (black dots) with the corresponding BINNs approximation to the governing PDE uMLP (surface plot). The second column shows heatmaps of the modified residual errors (see Eq (6)) at each data point. The third column shows heatmaps of the PDE errors (see Eq (7)) evaluated on a 100 × 100 meshgrid over the input domain.
S13 Fig. BINNs Convergence.
Example convergence and improvement plots from training a delay-reaction-diffusion BINN to the scratch assay data with 20,000 initial cells per well. The left subplot shows the training and validation errors (see Eq (3)) in red and blue, respectively, and the black dot shows where the model achieved the best validation error. Similarly, the right subplot shows the training and validation error but only when the error improved.
S1 Table. Classical FKPP parameter values.
Each column corresponds to an experiment with different initial cell density (i.e. 10,000, 12,000, 14,000, 16,000, 18,000, and 20,000 cells per well).
- 1. Raissi M, Perdikaris P, Karniadakis GE. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics. 2019;378:686–707. https://doi.org/10.1016/j.jcp.2018.10.045.
- 2. Jin W, Shah ET, Penington CJ, McCue SW, Chopin LK, Simpson MJ. Reproducibility of scratch assays is affected by the initial degree of confluence: Experiments, modelling and model selection. Journal of Theoretical Biology. 2016;390:136–145. https://doi.org/10.1016/j.jtbi.2015.10.040.
- 3. Friedl P, Gilmour D. Collective cell migration in morphogenesis, regeneration and cancer. Nature Reviews Molecular Cell Biology. 2009;10(7):445–457.
- 4. Vicsek T, Czirok A, Ben-Jacob E, Cohen I, Sochet O. Novel type of phase transition in a system of self-driven particles. Physical Review Letters. 1995;75(6):1226–1229.
- 5. Helbing D, Molnár P. Social force model for pedestrian dynamics. Physical Review E. 1995;51(5):4282–4286.
- 6. Gallaher J, Anderson ARA. Evolution of intratumoral phenotypic heterogeneity: the role of trait inheritance. Interface Focus. 2013;3(4):20130016.
- 7. McLennan R, Schumacher LJ, Morrison JA, Teddy JM, Ridenour DA, Box AC, et al. Neural crest migration is driven by a few trailblazer cells with a unique molecular signature narrowly confined to the invasive front. Development. 2015;142(11):2014–2025. pmid:25977364
- 8. Arciero JC, Mi Q, Branca MF, Hackam DJ, Swigon D. Continuum model of collective cell migration in wound healing and colony expansion. Biophysical Journal. 2011;100(3):535–543.
- 9. Dyson L, Baker RE. The importance of volume exclusion in modelling cellular migration. Journal of Mathematical Biology. 2015;71(3):691–711.
- 10. Johnston ST, Simpson MJ, Baker RE. Mean-field descriptions of collective migration with strong adhesion. Physical Review E. 2012;85(5):051922.
- 11. Nardini JT, Chapnick DA, Liu X, Bortz DM. Modeling keratinocyte wound healing: cell-cell adhesions promote sustained migration. Journal of Theoretical Biology. 2016;400:103–117.
- 12. Topaz CM, D’Orsogna MR, Edelstein-Keshet L, Bernoff AJ. Locust dynamics: behavioral phase change and swarming. PLOS Computational Biology. 2012;8(8):e1002642.
- 13. Aoki K, Kondo Y, Naoki H, Hiratsuka T, Itoh RE, Matsuda M. Propagating wave of ERK activation orients collective cell migration. Developmental Cell. 2017;43(3):305–317.e5.
- 14. Chapnick DA, Liu X. Leader cell positioning drives wound-directed collective migration in TGF beta-stimulated epithelial sheets. Molecular Biology of the Cell. 2014;25(10):1586–1593.
- 15. Matsubayashi Y, Ebisuya M, Honjoh S, Nishida E. ERK activation propagates in epithelial cell sheets and regulates their migration during wound healing. Current Biology. 2004;14(8):731–735.
- 16. Nikolic DL, Boettiger AN, Bar-Sagi D, Carbeck JD, Shvartsman SY. Role of boundary conditions in an experimental model of epithelial wound healing. American Journal of Physiology—Cell Physiology. 2006;291(1):C68–C75.
- 17. Haridas P, Penington CJ, McGovern JA, McElwain DLS, Simpson MJ. Quantifying rates of cell migration and cell proliferation in co-culture barrier assays reveals how skin and melanoma cells interact during melanoma spreading and invasion. Journal of Theoretical Biology. 2017;423:13–25.
- 18. Johnston ST, Shah ET, Chopin LK, Sean McElwain DL, Simpson MJ. Estimating cell diffusivity and cell proliferation rate by interpreting IncuCyte ZOOM assay data using the Fisher-Kolmogorov model. BMC Systems Biology. 2015;9(38). pmid:26188761
- 19. Maini PK, McElwain DLS, Leavesley DI. Travelling waves in a wound healing assay. Applied Mathematics Letters. 2004;17:575–580.
- 20. Baldock AL, Ahn S, Rockne R, Johnston S, Neal M, Corwin D, et al. Patient-specific metrics of invasiveness reveal significant prognostic benefit of resection in a predictable subset of gliomas. PLOS ONE. 2014;9(10):e99057. pmid:25350742
- 21. Anguige K, Schmeiser C. A one-dimensional model of cell diffusion and aggregation, incorporating volume filling and cell-to-cell adhesion. Journal of Mathematical Biology. 2009;58(3):395.
- 22. Dale PD, Olsen L, Maini PK, Sherratt JA. Travelling waves in wound healing. FORMA. 1995;10(3):205–222.
- 23. Brunton SL, Proctor JL, Kutz JN. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences. 2016;113(15):3932–3937.
- 24. Rudy SH, Brunton SL, Proctor JL, Kutz JN. Data-driven discovery of partial differential equations. Science Advances. 2017;3(4). pmid:28508044
- 25. Yang L, Meng X, Karniadakis GE. B-PINNs: Bayesian Physics-Informed Neural Networks for Forward and Inverse PDE Problems with Noisy Data; 2020.
- 26. Lagergren JH, Nardini JT, Lavigne GM, Rutter EM, Flores KB. Learning partial differential equations for biological transport models from noisy spatio-temporal data. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences. 2020;476(2234):20190800.
- 27. Lagaris I.E., Likas A, Fotiadis D.I. Artificial neural networks for solving ordinary and partial differential equations. IEEE Transactions on Neural Networks. 1998;9(5):987–1000.
- 28. Berg J, Nyström K. A unified deep artificial neural network approach to partial differential equations in complex geometries. Neurocomputing. 2018;317:28–41.
- 29. Sirignano J, Spiliopoulos K. A deep learning algorithm for solving partial differential equations. Journal of Computational Physics. 2018;375:1339–1364.
- 30. Raissi M. Deep hidden physics models: deep learning of nonlinear partial differential equations. Journal of Machine Learning Research. 2018;19.
- 31. Sahli Costabal F, Yang Y, Perdikaris P, Hurtado D.E., Kuhl E. Physics-Informed Neural Networks for Cardiac Activation Mapping. Frontiers in Physics. 2020;8:42.
- 32. Raissi M, Yazdani A, Karniadakis GE. Hidden fluid mechanics: Learning velocity and pressure fields from flow visualizations. Science. 2020;367(6481):1026–1030.
- 33. Both GJ, Choudhury S, Sens P, Kusters R. DeepMoD: Deep learning for model discovery in noisy data; 2019.
- 34. Hornik K. Approximation capabilities of multilayer feedforward networks. Neural Networks. 1991;4(2):251–257.
- 35. Banks HT, Hu S, Thompson WC. Modeling and inverse problems in the presence of uncertainty. Chapman and Hall/CRC; 2014.
- 36. Dydowiczova A, Brozman O, Babica P, Sovadinova I. Improved multiparametric scrape loading-dye transfer assay for a simultaneous high-throughput analysis of gap junctional intercellular communication, cell density and viability. Scientific Reports. 2020;10.
- 37. Poumay Y, Pittelkow MR. Cell density and culture factors regulate keratinocyte commitment to differentiation and expression of suprabasal K1/K10 keratins. Journal of Investigative Dermatology. 1995;104(2):271–276.
- 38. Neurohr GE, Amon A. Relevance and regulation of cell density. Trends in Cell Biology. 2020;30(3):213–225.
- 39. Bindschadler M, McGrath JL. Sheet migration by wounded monolayers as an emergent property of single-cell dynamics. Journal of Cell Science. 2007;120(5):876–884.
- 40. Warne DJ, Baker RE, Simpson MJ. Using experimental data and information criteria to guide model selection for reaction–diffusion problems in mathematical biology. Bulletin of Mathematical Biology. 2019;81(6):1760–1804.
- 41. Banks HT, Joyner ML. AIC under the framework of least squares estimation. Applied Mathematics Letters. 2017;74:33–45.
- 42. Sherratt JA, Murray JD, Clarke BC. Models of epidermal wound healing. Proceedings of the Royal Society of London Series B: Biological Sciences. 1990;241(1300):29–36.
- 43. Sengers BG, Please CP, Oreffo ROC. Experimental characterization and computational modelling of two-dimensional cell spreading for skeletal regeneration. Journal of the Royal Society, Interface. 2007;4(17):1107–1117.
- 44. Lagergren J, Reeder A, Hamilton F, Smith RC, Flores KB. Forecasting and uncertainty quantification using a hybrid of mechanistic and non-mechanistic models for an age-structured population model. Bulletin of Mathematical Biology. 2018;80(6):1578–1595.
- 45. Adoteye K, Baraldi R, Flores K, Nardini J, Banks HT, Thompson WC. Correlation of parameter estimators for models admitting multiple parameterizations. International Journal of Pure and Applied Mathematics. 2015;105(3):497–522.
- 46. Buhlmann P. Bagging, boosting and ensemble methods. Handbook of Computational Statistics. 2012;
- 47. Theodoridis S, Koutroumbas K. Chapter 5—Feature selection. In: Theodoridis S, Koutroumbas K, editors. Pattern Recognition (Fourth Edition). fourth edition ed. Boston: Academic Press; 2009. p. 261–322. Available from: http://www.sciencedirect.com/science/article/pii/B9781597492720500074.
- 48. Stone MH. The generalized Weierstrass approximation theorem. Mathematics Magazine. 1948;21(5):237.
- 49. Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv:14126980 [cs]. 2017;.
- 50. Keskar NS, Mudigere D, Nocedal J, Smelyanskiy M, Tang PTP. On large-batch training for deep learning: generalization gap and sharp minima. arXiv:160904836 [cs, math]. 2017;.
- 51. Kurganov A, Tadmor E. New high-resolution central schemes for nonlinear conservation laws and convection–diffusion equations. Journal of Computational Physics. 2000;160(1):241–282.