Bayesian function registration with random truncation

  • Yi Lu,

    Roles Conceptualization, Formal analysis, Methodology, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Mathematics and Computer Science Department, Drew University, Madison, New Jersey, United States of America

  • Radu Herbei,

    Contributed equally to this work with: Radu Herbei, Sebastian Kurtek

    Roles Conceptualization, Formal analysis, Methodology, Writing – review & editing

    Affiliation Department of Statistics, The Ohio State University, Columbus, Ohio, United States of America

  • Sebastian Kurtek

    Contributed equally to this work with: Radu Herbei, Sebastian Kurtek

    Roles Conceptualization, Formal analysis, Methodology, Writing – review & editing

    kurtek.1@stat.osu.edu

    Affiliation Department of Statistics, The Ohio State University, Columbus, Ohio, United States of America

Abstract

In this work, we develop a new set of Bayesian models to perform registration of real-valued functions. A Gaussian process prior is assigned to the parameter space of time warping functions, and a Markov chain Monte Carlo (MCMC) algorithm is utilized to explore the posterior distribution. While the proposed model can be defined on the infinite-dimensional function space in theory, dimension reduction is needed in practice because one cannot store an infinite-dimensional function on the computer. Existing Bayesian models often rely on some pre-specified, fixed truncation rule to achieve dimension reduction, either by fixing the grid size or the number of basis functions used to represent a functional object. In comparison, the new models in this paper randomize the truncation rule. Benefits of the new models include the ability to make inference on the smoothness of the functional parameters, a data-informative feature of the truncation rule, and the flexibility to control the amount of shape-alteration in the registration process. For instance, using both simulated and real data, we show that when the observed functions exhibit more local features, the posterior distribution on the warping functions automatically concentrates on a larger number of basis functions. Supporting materials including code and data to perform registration and reproduce some of the results presented herein are available online.

Introduction

Advances in data collection technology have made functional data prevalent in various applied domains including biology, biometrics, medicine, computer vision, bioinformatics, and many others. This, in turn, has prompted rapid development of functional data analysis (FDA) methods for estimation, alignment, summarization, and statistical modeling (and inference) for such data. In this work, we specifically focus on the problem of Bayesian model-based alignment of two or more functions, termed pairwise and multiple registration, respectively. In particular, we elucidate the challenges arising from nonlinearity and infinite-dimensionality of the representation spaces on which observation and prior models must be defined.

We consider the task of registration of real-valued functions defined on a subinterval of the real line. The goal of registration is to separate two sources of variability in functional data termed amplitude (y-axis variation) and phase (x-axis variation or time warping), and our aim is to temporally align (or warp) a set of functions, such that the amplitude variation in the observed data is comparable (i.e., amplitude features like local extrema occur at the same time along the x-axis, across all functions). The phase variation is then captured by a set of time warping functions that achieve the alignment. Formal definitions of amplitude, phase and the registration problem that are considered in this work are provided in subsequent sections.

Real data examples of the types of functions we consider are plotted in Fig 1. To motivate the problem of function registration, we consider the widely-used Berkeley growth study dataset [1]. The data is comprised of height measurements for boys and girls recorded from age 1 to age 18. To discover patterns of growth such as growth spurts, it is often preferable to analyze growth rate functions, i.e., the time derivative of the height measurement functions, since periods of fast/slow growth result in local extrema. The growth rate functions for 54 girls are displayed in the third panel, top row of Fig 1. It is clear that the functions have very similar shapes and exhibit one peak near the middle of the domain, corresponding to the pubertal growth spurt. However, the pubertal growth spurt does not occur at the same time across all growth rate functions since different children go through puberty at different times. Thus, it becomes necessary to register the growth rate functions prior to statistical analysis such that the amplitude variability (magnitude of pubertal growth spurts) and phase variability (timing of pubertal growth spurts) are separated.

Fig 1. Examples of the types of functions we consider.

The value C represents the number of functions in each dataset.

https://doi.org/10.1371/journal.pone.0287734.g001

To enhance statistical analysis, registration of functional data has been utilized in a wide range of applications including biomechanical data [2–4], handwriting samples [5, 6], gene expression and proteomics data [7, 8], neural spike trains [9], and gait data [10, 11]. Traditionally, function registration is formulated as an optimization problem under a specific optimality criterion, preferably a metric. We refer readers to standard FDA textbooks (e.g., [12, 13]) for an overview of the many approaches that have been proposed. Recently, model-based or probabilistic frameworks have become popular in formulating the registration problem. Specifically, the Bayesian modeling paradigm provides considerable flexibility as it allows the user to specify a prior distribution over the phase parameter space. Additionally, it yields a principled approach for a more comprehensive exploration of the phase parameter space and provides quantified uncertainty measures via the posterior distribution.

The literature includes multiple different Bayesian formulations of function registration. Telesca and Inoue [7] use the B-spline basis to model functional parameters; Claeskens et al. [14] decompose time warping functions into so-called warping component functions or warplets. Cheng et al. [15], Bharath and Kurtek [16], and Matuk et al. [17] use Dirichlet priors for the (time) increments of a warping function. Finally, Earls and Hooker [18], Kurtek [19], and Lu et al. [11] use Gaussian process priors on a transformed warping parameter space. We refer the reader to Matuk et al. [20] for a general overview of Bayesian registration methods. As is evident, the main differences among the aforementioned methods are in the specification of the observation model, the prior model on phase, and the algorithms used for parameter space exploration; we discuss benefits and drawbacks of the different choices later as we introduce the proposed framework. Some of the methods (e.g., [16]) are additionally able to incorporate information on landmarks (predetermined, user-defined points of interest) into the registration problem.

This work extends and improves the Bayesian models proposed in [11] to enable full exploration of the functional parameter space. In [11], the authors specify a Gaussian process prior over the infinite-dimensional phase parameter space and represent time warping functions using a sequence of basis functions. However, at the implementation stage, the dimension of the parameter space is reduced by choosing a fixed number of basis functions. Thus, the resulting model is not truly infinite-dimensional and is able to explore only a small subset of the underlying parameter space. In practice, the main disadvantage of such a formulation is that the dimension reduction is performed a priori and is thus not informed by the data. Furthermore, it is generally not obvious how many basis functions are needed to achieve satisfactory registration results.

To remedy these issues, instead of using a fixed truncation, we allow the truncation to be random. This is done by randomizing either (i) the number of basis functions or (ii) which basis functions are used. We incorporate this random truncation as a separate parameter, leading to nonparametric, infinite-dimensional models in the sense that the prior distributions are assigned on the entire functional phase parameter space. The proposed models with random truncation have three advantages. First, the level of smoothness of the functional phase parameter is informed by the data, thus avoiding potential mis-specifications of the number of basis functions. As we will show, under-specification of the number of basis functions can lead to poor registration results. Second, one can flexibly incorporate prior beliefs or desired constraints on the shape of the functional phase parameter. For instance, how much shape-alteration occurs in the observed functions can be controlled by the prior on the number of basis functions. Third, our model allows one to make inference on the random truncation parameter, which can provide additional information about the shapes of the functions in the data. For instance, the posterior tends to keep a larger number of basis functions for the phase parameter when the observed functions exhibit many local features that must be registered. Following ideas from [11, 21, 22], we develop algorithms that allow efficient sampling from the posterior distribution of both the functional parameter and the random truncation parameter. A key design principle of our model is to treat the phase parameter space as infinite-dimensional and to allow the data to dictate the amount of dimension reduction that is needed.

The rest of the paper is organized as follows. We first introduce the statistical problem of function registration, focusing on relevant function spaces, in Section Problem Formulation and Function Spaces of Interest. The proposed Bayesian registration models are formally specified in Sections Pairwise Bayesian Registration Model with Random Basis Truncation and Multiple Function Bayesian Registration Model with Random Basis Truncation. In Sections Simulation Study and Applications, we demonstrate the proposed method on a pairwise simulation study and several real datasets.

Problem formulation and function spaces of interest

We formulate the task of function registration as a statistical problem by defining (i) the data and the observation space, and (ii) the parameter(s) and their corresponding representation spaces. As will be seen, it is essential to transform both the observation space and the parameter space. The transformations we adopt here are developed in [23, 24] in the context of function registration and have been utilized for registration models in a host of recent manuscripts, including [11, 17–19, 25, 26]. A comprehensive discussion of these transformations can be found in [13]. Here, for brevity, we will only introduce the relevant notation and briefly state the transformations. We refer the reader to the aforementioned references for more details.

We start with the simpler case of pairwise registration, where two real-valued functions, f1 and f2, are observed. Without loss of generality, we assume that the domain on which the functions are observed is [0, 1]. In this scenario, these two functions are regarded as data with the corresponding observation space $\mathcal{F}$, the set of absolutely continuous real-valued functions on [0, 1]. Suppose our goal is to register f2 to f1. This is achieved by finding a warping function γ such that f2 ∘ γ and f1 are aligned. The role of γ is to warp the domain of f2 so that the amplitude variation of f2 is retained, but its phase variation is altered (ideally to match that of f1). The amount of alteration, which quantifies the difference in phase variation between f1 and f2, is captured by γ. The warping function γ is regarded as the parameter with the corresponding parameter space Γ = {γ : [0, 1] → [0, 1] | γ(0) = 0, γ(1) = 1, 0 < γ′ < ∞}. In the next two paragraphs, we separately describe the transformations carried out on the (1) observation (data) space, and (2) parameter space. More details on these transformations can be found in the S1 File.

Observation (data) space. For $f \in \mathcal{F}$, we use the square-root velocity transformation $Q(f)(t) = \mathrm{sign}(f'(t))\sqrt{|f'(t)|}$ (1), where f′ is the derivative of f. The resulting function, denoted by q for simplicity and referred to as the square-root velocity function (SRVF), is an element of the transformed observation space $\mathcal{Q}$, which is a subset of $L^2([0, 1])$ [23]. The mapping Q is bijective up to a translation, and f can be recovered from q using $f(t) = f(0) + \int_0^t q(s)|q(s)|\,ds$. The SRVF of a time warped function, $f \circ \gamma$, is given by $Q(f \circ \gamma) = (q \circ \gamma)\sqrt{\gamma'}$ (2). Note that this is not the same as the function composition $q \circ \gamma$, because of the additional term $\sqrt{\gamma'}$; for brevity, we denote this quantity by (q, γ).
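The SRVF and its warping action can be checked numerically on a grid. The following is a minimal sketch, assuming finite-difference derivatives, linear interpolation, and an illustrative function and warping that are not taken from the paper; all function names are hypothetical.

```python
import numpy as np

def integrate(y, t):
    """Trapezoidal rule on the grid t."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

def srvf(f, t):
    """SRVF q = sign(f') * sqrt(|f'|), with f' from finite differences."""
    df = np.gradient(f, t)
    return np.sign(df) * np.sqrt(np.abs(df))

def srvf_to_function(q, t, f0=0.0):
    """Invert the SRVF up to the translation f0: f(t) = f0 + int_0^t q|q| ds."""
    integrand = q * np.abs(q)
    steps = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t)
    return f0 + np.concatenate(([0.0], np.cumsum(steps)))

def warp_srvf(q, gamma, t):
    """Action (q, gamma) = (q o gamma) * sqrt(gamma'), via linear interpolation."""
    dgamma = np.gradient(gamma, t)
    return np.interp(gamma, t, q) * np.sqrt(dgamma)

t = np.linspace(0.0, 1.0, 2001)
f = np.sin(2.0 * np.pi * t)                                 # illustrative function
gamma = t + 0.5 * np.sin(2.0 * np.pi * t) / (2.0 * np.pi)   # illustrative warping, gamma' > 0

q = srvf(f, t)
q_warped = warp_srvf(q, gamma, t)                  # (q, gamma)
q_direct = srvf(np.sin(2.0 * np.pi * gamma), t)    # SRVF of f o gamma, computed directly
f_back = srvf_to_function(q, t, f0=f[0])
```

Up to discretization error, `q_warped` and `q_direct` agree, and the warping action preserves the squared norm of q, which is the property that makes the SRVF representation convenient for registration.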

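Parameter space. For γ ∈ Γ, we apply two transformations: $\psi = \sqrt{\gamma'}$ (3) and $g = \exp_1^{-1}(\psi) = \frac{\theta}{\sin\theta}\left(\psi - \cos(\theta)\,1\right)$ with $\theta = \cos^{-1}(\langle 1, \psi \rangle)$ (4). The first transformation is the square-root velocity transformation in Eq (1) (note that γ′(t) > 0 ∀ t). The second transformation is the inverse exponential map for a unit sphere that allows us to linearize the space Ψ, which is a transformed representation space of the warping functions. The resulting function, denoted by g, belongs to a subset of a linear space, defined as $A \equiv \{g \in T_1(\Psi) \mid \exp_1(g) > 0\}$ (the notation $T_1(\Psi)$ refers to the tangent space of Ψ at the constant function 1; see S1 File for details). We can transform g back to γ via $\gamma(t) = \int_0^t \exp_1(g)(s)^2\,ds$, where $\exp_1(g) = \cos(\|g\|)\,1 + \sin(\|g\|)\,g/\|g\|$ (‖⋅‖ is the $L^2$ norm). The warping of an SRVF, (q, γ), can now be written in terms of the function g, which lies in a linear space, via $(q, \gamma) = (q \circ \gamma)\sqrt{\gamma'}$ with $\gamma(t) = \int_0^t \exp_1(g)(s)^2\,ds$ and $\sqrt{\gamma'} = \exp_1(g)$ (5).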
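The two parameter-space transformations and their inverses can be sketched on a grid as follows. This is an illustrative sketch, assuming the standard inverse exponential and exponential maps on the unit sphere at the constant function 1; the function names and the example warping are hypothetical.

```python
import numpy as np

def integrate(y, t):
    """Trapezoidal rule on the grid t."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

def gamma_to_g(gamma, t):
    """gamma -> psi = sqrt(gamma') -> g = inverse exponential map of psi at 1."""
    psi = np.sqrt(np.gradient(gamma, t))
    psi = psi / np.sqrt(integrate(psi ** 2, t))   # unit norm, absorbs discretization error
    theta = np.arccos(np.clip(integrate(psi, t), -1.0, 1.0))  # angle between 1 and psi
    if theta < 1e-10:
        return np.zeros_like(t)                   # identity warping maps to g = 0
    return theta / np.sin(theta) * (psi - np.cos(theta))

def g_to_gamma(g, t):
    """g -> psi = exp_1(g) -> gamma(t) = int_0^t psi(s)^2 ds."""
    norm_g = np.sqrt(integrate(g ** 2, t))
    if norm_g < 1e-10:
        psi = np.ones_like(t)
    else:
        psi = np.cos(norm_g) + np.sin(norm_g) * g / norm_g
    steps = 0.5 * (psi[1:] ** 2 + psi[:-1] ** 2) * np.diff(t)
    gamma = np.concatenate(([0.0], np.cumsum(steps)))
    return gamma / gamma[-1]                      # enforce gamma(1) = 1 on the grid

t = np.linspace(0.0, 1.0, 1001)
gamma = t + 0.5 * np.sin(2.0 * np.pi * t) / (2.0 * np.pi)   # illustrative warping
g = gamma_to_g(gamma, t)
gamma_back = g_to_gamma(g, t)
```

The round trip recovers the warping up to discretization error, and g integrates to zero, i.e., it is orthogonal to the constant function 1 as an element of the tangent space should be.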

In summary, our approach is to perform statistical inference on the parameter g ∈ A, using the SRVFs of the observed data, $q_1, q_2 \in \mathcal{Q}$. Generalization to multiple function registration is straightforward. Suppose we observe C > 2 functions, denoted by f1, …, fC. The goal is to register them simultaneously to a template function, f*. In some applications, we can pre-specify a known function as the template (common choices include one of the observed functions or their point-wise mean). Alternatively, we can treat f* as another unknown parameter to be estimated. Registration is achieved via estimation of the warping functions, γi, i = 1, …, C, corresponding to each observed function. After applying the same transformations, we treat $q_1, \dots, q_C \in \mathcal{Q}$ as data and g1, …, gC ∈ A (the warping functions) and $q^* \in \mathcal{Q}$ (the template function) as parameters.

Pairwise Bayesian registration model with random basis truncation

In the case of pairwise registration, the data consists of two functions f1 and f2, represented via their SRVFs q1 and q2, observed on a finite grid of size N denoted by [t] = {t1, …, tN}. We use the notation [t] to denote discretization of the domain [0, 1] throughout the rest of the paper. Thus, f([t]) denotes evaluations of the function f at the domain points [t]; similarly, (q, γ)([t]) denotes the N-dimensional vector ((q, γ)(t1), …, (q, γ)(tN)). We model the difference between q1([t]) and (q2, γ)([t]) by a zero-mean N-dimensional Gaussian distribution. The main parameter of interest is the warping function γ ∈ Γ represented via g ∈ A. At the implementation stage, dimension reduction is necessary, and this is achieved by an auxiliary variable T. Specifically, we first represent g by an infinite sum using basis functions. Then, we use the random variable T to truncate the infinite sum to a finite sum, which can be evaluated on a computer. The truncated version of g will be henceforth denoted by $\tilde{g}$.

The pair (g, T) fully specifies the truncated parameter $\tilde{g}$, and we specify a prior distribution for $\tilde{g}$ by assigning a joint prior probability model for the pair (g, T). To that end, we use a Gaussian process to model g and a general distribution τT that does not depend on g to model T. We further ensure that the pair (g, T) results in a valid warping function by restricting the joint prior to the domain $B = \{(g, T) : \tilde{g} \in A\}$. The full model is given below.

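Model 1. $q_1([t]) - (q_2, \tilde{\gamma})([t]) \mid g, T, \sigma^2 \sim N(0_N, \sigma^2 I_N)$, $(g, T) \sim \{GP(0, \mathcal{C}_g) \otimes \tau_T\}_B$, $\sigma^2 \sim IG(a, b)$, where $\tilde{g}$ (and hence the warping function $\tilde{\gamma}$ that it determines) is a function of g and T, $\mathcal{C}_g$ is a covariance operator for the Gaussian process prior, τT is a prior distribution for T, {⋅}B denotes the truncation of the joint prior distribution to the set B, IG(⋅, ⋅) is the inverse-gamma distribution, and a and b are fixed constants.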

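The likelihood function is then given by $L(q_1([t]), q_2([t]) \mid \tilde{g}, \sigma^2) \propto (\sigma^2)^{-N/2} \exp\{-SSE(\tilde{g})/(2\sigma^2)\}$ (6), where $SSE(\tilde{g}) = \sum_{i=1}^{N} \left(q_1(t_i) - (q_2, \tilde{\gamma})(t_i)\right)^2$ (7) and $\tilde{\gamma}$ is the warping function determined by the truncated parameter $\tilde{g}$. This likelihood is identical to that of [11, 19], except that the truncated parameter $\tilde{g}$ replaces the parameter g.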

Random truncation mechanisms

We now discuss the following two mechanisms for the prior distribution of the random truncation T, which is key to the proposed approach.

(1) Random Number of Basis Functions. In the first scenario, consider T ≡ M, where M is the number of basis functions used to represent the parameter g. In this case, $\pi_T \equiv \pi_M$ is a prior distribution on the set of positive integers and the truncated parameter is $\tilde{g} = \sum_{i=1}^{M} \langle g, b_i \rangle b_i$, where {bi}i ≥ 1 forms an orthonormal basis for T1(Ψ).

(2) Random Indicators. In the second scenario, we can randomly switch a basis function on and off (called a sieve prior in [22]). This is done via the random sequence {χi}i = 1, …, ∞, χi ∈ {0, 1}; we refer to this sequence simply as χ. This sequence controls which basis functions are kept in the basis expansion of g. In other words, T ≡ χ and the truncated parameter is calculated as $\tilde{g} = \sum_{i=1}^{\infty} \chi_i \langle g, b_i \rangle b_i$. Let Mmax be the maximum number of basis functions stored at the implementation stage, and let Mon be the number of active basis functions (i.e., $M_{on} = \sum_{i=1}^{M_{max}} \chi_i$). The domain of χ, denoted $\mathcal{X}$, is the collection of vectors of the form $(\chi_1, \dots, \chi_{M_{max}}, 0, 0, \dots)$, and $\pi_T \equiv \pi_\chi$ is a prior on $\mathcal{X}$.
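Both truncation mechanisms act on the basis coefficients of g. The sketch below illustrates them in coefficient space, assuming a sine and cosine Fourier basis (whose elements are orthogonal to the constant function 1) and hypothetical coefficient values; the function names are illustrative only.

```python
import numpy as np

def fourier_basis(M_max, t):
    """Rows alternate sqrt(2)*sin(2*pi*k*t) and sqrt(2)*cos(2*pi*k*t), k = 1, 2, ..."""
    B = np.empty((M_max, t.size))
    for i in range(M_max):
        k = i // 2 + 1
        trig = np.sin if i % 2 == 0 else np.cos
        B[i] = np.sqrt(2.0) * trig(2.0 * np.pi * k * t)
    return B

def truncate_by_count(coef, M):
    """Mechanism (1): keep only the first M coefficients."""
    out = np.zeros_like(coef)
    out[:M] = coef[:M]
    return out

def truncate_by_indicators(coef, chi):
    """Mechanism (2): keep coefficient i only where chi[i] == 1."""
    return coef * chi

t = np.linspace(0.0, 1.0, 201)
coef = np.array([0.30, -0.20, 0.10, 0.05, -0.02, 0.01])  # hypothetical values of <g, b_i>
B = fourier_basis(coef.size, t)

g_tilde_count = truncate_by_count(coef, 3) @ B           # T = M = 3
chi = np.array([1, 0, 1, 0, 0, 1])
g_tilde_indicator = truncate_by_indicators(coef, chi) @ B  # T = chi
M_on = int(chi.sum())                                     # number of active basis functions
```

Mechanism (1) can only remove high-index (here, high-frequency) terms, whereas mechanism (2) can keep a high-frequency term while dropping lower ones.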

Posterior distribution and sampling via Markov chain Monte Carlo

The posterior distribution is a probability measure μ on the product space $B \times (0, \infty)$ of the parameters $(g, T, \sigma^2)$ and is dominated by the prior measure μ0. The Radon-Nikodym derivative is given by Bayes’ formula: $\frac{d\mu}{d\mu_0}(g, T, \sigma^2) \propto L(q_1([t]), q_2([t]) \mid \tilde{g}, \sigma^2)$. We use a Metropolis-within-Gibbs algorithm to sample from the posterior distribution of $(g, T, \sigma^2)$. To update the posterior at each step, the algorithm iteratively draws from (i) the full conditional distribution of (g, T), and (ii) the full conditional distribution of $\sigma^2$. Details are given as follows.

Sampling of (g, T). We first update g. For this purpose, we use a Z-mixture pCN proposal (see [11, 22] for details) by setting $g' = \sqrt{1 - \beta_z^2}\, g + \beta_z \xi$, where g is the current value, ξ is a draw from the Gaussian process prior, and βz ∈ (0, 1) is a tuning parameter drawn with probability pz satisfying $\sum_{z=1}^{Z} p_z = 1$. As an example, in the case of a 2-mixture pCN proposal, we can draw β1 = 0.5 with probability 0.8 and β2 = 0.001 with probability 0.2, resulting in “big jump proposals” approximately 80% of the time and “very small jump proposals” approximately 20% of the time. Sampling a new function ξ from $GP(0, \mathcal{C}_g)$ is done via the Karhunen-Loève expansion. We specify the covariance operator $\mathcal{C}_g$ by its eigenpairs $\{(\lambda_i^2, b_i)\}_{i \geq 1}$, where {bi(⋅)} forms an orthonormal basis for T1(Ψ) and the sequence of coefficients satisfies $\sum_{i \geq 1} \lambda_i^2 < \infty$. While theoretically i = 1, …, ∞, at the implementation stage, we store a large number Mmax of basis functions. We thus sample independent random variables $\xi_i \sim N(0, \lambda_i^2)$, i = 1, …, Mmax, and set $\xi = \sum_{i=1}^{M_{max}} \xi_i b_i$. Then, for a given truncation, the truncated proposal is $\tilde{g}' = \sum_{i=1}^{M} \langle g', b_i \rangle b_i$ if truncation mechanism (1) is used, or $\tilde{g}' = \sum_{i=1}^{M_{max}} \chi_i \langle g', b_i \rangle b_i$ if truncation mechanism (2) is used.
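A mixture pCN step is short when written in terms of basis coefficients. The sketch below is illustrative rather than the authors' implementation: it assumes g is stored as a vector of Mmax coefficients with independent Gaussian prior variances `lam2`, and the decaying variances used in the example are a hypothetical choice.

```python
import numpy as np

def pcn_mixture_proposal(g_coef, lam2, betas, probs, rng):
    """Draw beta_z with probability p_z, then propose sqrt(1 - beta^2) * g + beta * xi."""
    beta = rng.choice(betas, p=probs)
    xi = rng.normal(0.0, np.sqrt(lam2))   # xi_i ~ N(0, lam2_i): a fresh draw from the prior
    return np.sqrt(1.0 - beta ** 2) * g_coef + beta * xi

rng = np.random.default_rng(0)
M_max = 200
lam2 = 1.0 / np.arange(1, M_max + 1) ** 1.2   # hypothetical decaying prior variances
g_coef = rng.normal(0.0, np.sqrt(lam2))       # current state, here drawn from the prior
g_new = pcn_mixture_proposal(g_coef, lam2, betas=[0.5, 0.001], probs=[0.8, 0.2], rng=rng)
```

Because the proposal combines the current state and an independent prior draw with squared weights summing to one, a state distributed according to the Gaussian prior stays distributed according to the prior after the move. This is why no prior or proposal term for g appears in the acceptance ratio.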

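We then independently generate a proposal, T′, according to a density $h_T(T' \mid T)$, which depends on the truncation mechanism. The pair (g′, T′) is accepted with probability 1 ∧ ρ, where $\rho = \frac{L(\cdot \mid \tilde{g}', \sigma^2)\, \pi_T(T')\, h_T(T \mid T')}{L(\cdot \mid \tilde{g}, \sigma^2)\, \pi_T(T)\, h_T(T' \mid T)}\, \mathbb{1}_B(g', T')$ (8), L(⋅ ∣ ⋅) is the likelihood in Eq (6), $\mathbb{1}_B$ is the indicator that the proposed pair yields a valid warping function, and πT is the density function (with respect to the Lebesgue measure) of τT. This acceptance ratio is intuitive if distributions of g have densities with respect to the Lebesgue measure. In that case, the form of ρ follows directly from the fact that $\pi_g(g)\, h_g(g' \mid g)$ is symmetric in g and g′ ($\pi_g$ is the prior density of g and $h_g$ is the pCN proposal described earlier). Since a dominating Lebesgue measure does not exist on T1(Ψ), we can derive the acceptance ratio formally using the dominating prior Gaussian measure (given in the S1 File).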

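If truncation mechanism (1) is used, T ≡ M, and we can use a K-step random walk proposal of the form M′ = M + J, where $J \in \{-K, \dots, K\}$ with $P(J = 0) = p_0$ and $P(J = k) = P(J = -k)$ for k = 1, …, K. In other words, M′ can either stay at the current value, with probability p0, or move forward or backward up to K steps. This is a symmetric proposal and the Metropolis-Hastings (MH) acceptance ratio in Eq (8) simplifies to $\rho = \frac{L(\cdot \mid \tilde{g}', \sigma^2)\, \pi_M(M')}{L(\cdot \mid \tilde{g}, \sigma^2)\, \pi_M(M)}\, \mathbb{1}_B(g', M')$ (9).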
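The random walk on M and the simplified acceptance ratio can be sketched as follows. This is a minimal sketch under stated assumptions: the step size is drawn from user-supplied weights and the direction with probability 1/2 each (one way to obtain a symmetric proposal), and the log-likelihood and log-prior values are supplied by the caller. The function names are hypothetical.

```python
import numpy as np

def random_walk_proposal(M, weights, rng):
    """Propose M' = M + J. weights[k] is the unnormalized probability of a move of
    size k (k = 0, ..., K); the direction is chosen with probability 1/2 each."""
    w = np.asarray(weights, dtype=float)
    k = rng.choice(w.size, p=w / w.sum())
    sign = rng.choice([-1, 1])
    return int(M + sign * k)

def log_accept_ratio(loglik_new, loglik_old, logprior_new, logprior_old):
    """Log acceptance ratio for a symmetric proposal: the proposal terms cancel."""
    return (loglik_new - loglik_old) + (logprior_new - logprior_old)

rng = np.random.default_rng(0)
M_prop = random_walk_proposal(50, weights=[1.0, 0.5, 0.25], rng=rng)  # K = 2
```

A proposed M′ outside the support of the prior has log-prior equal to minus infinity, so the log acceptance ratio is minus infinity and the move is rejected without any special handling.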

If the truncation mechanism (2) is used, then T ≡ χ, and we can use an on-or-off proposal, where χ′ is proposed by either switching on a nonactive basis function, with probability 0.5, or switching off an active basis function, again with probability 0.5. If all of the basis functions are currently on, i.e., Mon = Mmax, we switch one of them off with probability 1. On the other hand, if only one basis function is on, i.e., Mon = 1, we switch on another basis function with probability 1. This is the form of proposal suggested in [22]. It is not a symmetric proposal and the MH acceptance ratio in Eq (8) can be written as $\rho = \frac{L(\cdot \mid \tilde{g}', \sigma^2)\, \pi_\chi(\chi')}{L(\cdot \mid \tilde{g}, \sigma^2)\, \pi_\chi(\chi)}\, a_{\chi,\chi'}\, \mathbb{1}_B(g', \chi')$ (10), where $a_{\chi,\chi'} = h_\chi(\chi \mid \chi') / h_\chi(\chi' \mid \chi)$ is the ratio of the reverse and forward proposal probabilities. The derivation of $a_{\chi,\chi'}$ is given in the S1 File.

Alternatively, one can consider a symmetric, choose-k proposal, where χ′ is proposed via the following steps: (i) sample k ∈ {1, 2, …, K}, K ≤ Mmax (for simplicity, we set the probability for each value of k to be the same, but this is not required by the algorithm), (ii) randomly select k entries from the vector $(\chi_1, \dots, \chi_{M_{max}})$, and (iii) propose χ′ by switching (on to off, off to on) all of the selected entries. This proposal is symmetric. For example, if χ = (0, 0, 1, 1) and χ′ = (0, 1, 0, 1), then the forward and reverse moves both require k = 2 and the same pair of entries, so $h_\chi(\chi' \mid \chi) = h_\chi(\chi \mid \chi') = P(k = 2) / \binom{4}{2}$. As a result, the MH acceptance ratio for this proposal is the same as the one given in Eq (10) with $a_{\chi,\chi'} = 1$.
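The three steps of the choose-k proposal translate directly into code. The following is a small sketch, assuming k is uniform on {1, …, K} and the k entries are selected uniformly without replacement; the function name is hypothetical.

```python
import numpy as np

def choose_k_proposal(chi, K, rng):
    """Flip k randomly selected entries of chi, with k uniform on {1, ..., K}."""
    k = int(rng.integers(1, K + 1))                    # step (i)
    idx = rng.choice(chi.size, size=k, replace=False)  # step (ii)
    chi_new = chi.copy()
    chi_new[idx] = 1 - chi_new[idx]                    # step (iii): on to off, off to on
    return chi_new

rng = np.random.default_rng(0)
chi = np.array([0, 0, 1, 1, 0, 1, 0, 0])
chi_prop = choose_k_proposal(chi, K=5, rng=rng)
```

Flipping the same set of entries again returns to the original state with the same probability, which is why the proposal is symmetric.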

Sampling of $\sigma^2$. To update $\sigma^2$, we draw directly from the conjugate inverse-gamma distribution with shape parameter $a + N/2$ and scale parameter $b + SSE(\tilde{g})/2$.
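The conjugate variance update can be sketched as below. This sketch assumes a zero-mean Gaussian likelihood with a common variance on N grid points and an inverse-gamma IG(a, b) prior, for which standard conjugacy gives shape a + N/2 and scale b + SSE/2; the function name and the numeric values are hypothetical.

```python
import numpy as np

def sample_sigma2(sse, N, a, b, rng):
    """Draw the variance from IG(a + N/2, b + SSE/2)."""
    shape = a + 0.5 * N
    scale = b + 0.5 * sse
    # If X ~ Gamma(shape, rate 1), then scale / X ~ inverse-gamma(shape, scale).
    return scale / rng.gamma(shape)

rng = np.random.default_rng(0)
sigma2 = sample_sigma2(sse=50.0, N=200, a=0.1, b=0.1, rng=rng)
```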

Multiple function Bayesian registration model with random basis truncation

The model for multiple function registration is a direct extension of Model (1). Based on the observed functions f1, …, fC, represented via the SRVFs q1, …, qC, we aim to make inference on the truncated warping parameters $\tilde{g}_1, \dots, \tilde{g}_C$, which are determined by the pairs (g1, T1), …, (gC, TC). In addition, we treat the template function q* as a parameter and consider the same truncation mechanisms as described in the pairwise case. The truncated template function is denoted by $\tilde{q}^*$ and is determined by the pair (q*, Tq). We assign a Gaussian process prior to q* and a prior to the random truncation Tq.

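Model 2. $\tilde{q}^*([t]) - (q_i, \tilde{\gamma}_i)([t]) \mid g_i, T_i, q^*, T_q, \sigma^2 \sim N(0_N, \sigma^2 I_N)$ independently for i = 1, …, C, $(g_i, T_i) \sim \{GP(0, \mathcal{C}_g) \otimes \tau_T\}_B$, $(q^*, T_q) \sim GP(0, \mathcal{C}_q) \otimes \tau_T$, $\sigma^2 \sim IG(a, b)$, where $\tilde{\gamma}_i$ is the warping function determined by the truncated parameter $\tilde{g}_i$ and $\mathcal{C}_q$ is a covariance operator for the Gaussian process prior on the template.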

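The likelihood function is then given by $L(q_1([t]), \dots, q_C([t]) \mid \tilde{g}_1, \dots, \tilde{g}_C, \tilde{q}^*, \sigma^2) \propto (\sigma^2)^{-NC/2} \exp\{-\sum_{i=1}^{C} SSE_i / (2\sigma^2)\}$ (11), where $SSE_i = \sum_{j=1}^{N} \left(\tilde{q}^*(t_j) - (q_i, \tilde{\gamma}_i)(t_j)\right)^2$ (12).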

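Sampling from the posterior distribution is performed in the same fashion as in the pairwise case using a Metropolis-within-Gibbs algorithm. At each step, the algorithm first updates each of the truncated warping parameters (gi, Ti), i = 1, …, C sequentially via a Metropolis step. The acceptance ratio for updating any of the pairs (gi, Ti) takes the same form and, for (g1, T1), is given by $\rho_1 = \frac{L(\cdot \mid \tilde{g}_1', \tilde{g}_2, \dots, \tilde{g}_C, \tilde{q}^*, \sigma^2)\, \pi_T(T_1')\, h_T(T_1 \mid T_1')}{L(\cdot \mid \tilde{g}_1, \tilde{g}_2, \dots, \tilde{g}_C, \tilde{q}^*, \sigma^2)\, \pi_T(T_1)\, h_T(T_1' \mid T_1)}\, \mathbb{1}_B(g_1', T_1')$, where L(⋅ ∣ ⋅) is the likelihood in Eq (11) and $h_T$ is the proposal density for the truncation.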

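The algorithm then updates the template (q*, Tq) with the acceptance ratio $\rho_q = \frac{L(\cdot \mid \tilde{g}_1, \dots, \tilde{g}_C, \tilde{q}^{*\prime}, \sigma^2)\, \pi_T(T_q')\, h_T(T_q \mid T_q')}{L(\cdot \mid \tilde{g}_1, \dots, \tilde{g}_C, \tilde{q}^*, \sigma^2)\, \pi_T(T_q)\, h_T(T_q' \mid T_q)}$, where $h_T$ is the proposal density for the truncation. Note that, for simplicity, we use the same prior πT and proposal $h_T$ for each Ti, i = 1, …, C and Tq, but this is not required. Lastly, the algorithm updates $\sigma^2$ by drawing directly from an inverse-gamma distribution with shape parameter $a + NC/2$ and scale parameter $b + \sum_{i=1}^{C} SSE_i / 2$.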

Simulation study

We first assess the performance of the proposed pairwise Bayesian registration model via a simulation study. We simulate two observed functions, f1 and f2, both of which are warped versions of a template function, $f(t) = \sin(4\pi t^2)$ [2]. The two observed functions are constructed such that f1 = f2 ∘ γtrue, where the true warping γtrue is randomly generated. We then apply our model to register f2 to f1 to obtain γest, an estimate of the true warping, and compare γest to γtrue. We use five sets of true warping functions, which are shown in Fig 2. Each set contains ten warpings and has varying degrees of local features: set (2) shown in the second panel in the top row contains warping functions that are very smooth with no “small wiggles,” whereas set (5) (fifth panel in the top row) contains warping functions that are “very wiggly.” Set (1) contains piecewise linear warping functions. We compare the proposed method to the model in [11], with the recommended setting of using 20 basis functions (we refer to this model as M20).

Fig 2. Pairwise simulation results.

Top: Five sets of ten true warping functions with different amounts of local features. Bottom: FR distance-based performance of different models for estimating the true warping functions in each set; smaller distance indicates better performance.

https://doi.org/10.1371/journal.pone.0287734.g002

Prior Specification

We now discuss different choices for the prior distributions for the pairwise registration Model (1). We use an inverse-gamma distribution with a = 0.1 and b = 0.1 for $\sigma^2$. For the Gaussian process prior of the warping parameter g, we must specify the covariance operator $\mathcal{C}_g$ by its eigenpairs $\{(\lambda_i^2, b_i)\}_{i \geq 1}$. We use the Fourier basis functions and set $\lambda_i^2 = 1/i^{1.2}$. The constant 1.2 in this expression controls the decay rate of the Fourier series. If it is desirable to put more prior weight on warping functions with many local features, one can choose a smaller constant so that the higher frequency Fourier basis elements are weighted more (this constant should be greater than 1 to satisfy $\sum_{i \geq 1} \lambda_i^2 < \infty$). For the random truncation T, we use a few different priors based on which truncation mechanism is used, as described below.

Truncation mechanism (1): There are multiple possible prior choices for the number of basis functions. First, we consider three Poisson priors, truncated to the domain [1, Mmax = 200], with means equal to 20 (pois20), 50 (pois50) and 80 (pois80). These priors reflect beliefs about the smoothness of the warping functions. For instance, if the prior belief is that the warping function is relatively smooth and the phase variation should only relate to the general shape of the observed functions, a prior with a smaller mean should be chosen. Second, we consider two discrete uniform prior distributions on [30, Mmax = 200] (caplow) and [1, 50] (caphigh), respectively. These two priors enforce a maximum (minimum) level of smoothness of the warping functions via the restriction that M ≥ 30 (M ≤ 50), but are non-informative in the sense that any M between 30 and Mmax = 200 (1 and 50) is equally likely.
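The priors on the number of basis functions can be tabulated on {1, …, Mmax} as log-probabilities, which is the form needed in a Metropolis-Hastings ratio. The sketch below is illustrative; the function names are hypothetical, and the labels follow the text above (pois20, caplow, caphigh).

```python
import numpy as np
from math import lgamma

def truncated_poisson_logpmf(mean, M_max):
    """Log-pmf of a Poisson(mean) prior restricted to {1, ..., M_max} and renormalized."""
    m = np.arange(1, M_max + 1)
    logp = m * np.log(mean) - mean - np.array([lgamma(k + 1.0) for k in m])
    return m, logp - np.logaddexp.reduce(logp)

def discrete_uniform_logpmf(lo, hi, M_max):
    """Log-pmf of a discrete uniform prior on {lo, ..., hi}; -inf elsewhere."""
    m = np.arange(1, M_max + 1)
    inside = (m >= lo) & (m <= hi)
    return m, np.where(inside, -np.log(inside.sum()), -np.inf)

m, logp_pois20 = truncated_poisson_logpmf(20.0, 200)
_, logp_caplow = discrete_uniform_logpmf(30, 200, 200)   # M >= 30
_, logp_caphigh = discrete_uniform_logpmf(1, 50, 200)    # M <= 50
```

Values of M outside a prior's support receive log-probability minus infinity, so any proposal to such a value is rejected automatically.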

Truncation mechanism (2): We choose a uniform prior distribution on the sequence of on-off switches for the basis functions (indicator). This prior is non-informative on the shape of the warping functions since any of the basis elements in the sequence are equally likely to be switched on or off.

The flexibility of prior choices for the random truncation is an important benefit of the proposed model, in comparison to existing models with fixed truncation. In practice, if one wishes to register the general shapes of the functions without altering small, local features, a prior that puts more weight on smoother functions (i.e., truncation mechanism (1) with Poisson with a small mean or a uniform with a small upper bound) should be chosen. If one has no strong prior opinion and wants the posterior to be mostly informed by the data, a less informative prior (i.e., truncation mechanism (2) with uniform for the on-off indicators) should be chosen. At the same time, the model is robust to the prior choices of the decay rate of the Fourier basis and the model variance (see the S1 File for a sensitivity analysis).

Implementation details

At the implementation stage, both the observed functions and the parameters are stored on a grid of size 200, which is the same as Mmax, the number of stored basis functions. We first perform pairwise registration for each set of warping functions using the deterministic Dynamic Programming (DP) algorithm, which is implemented in the R package fdasrvf [27]. We use the DP estimate to initialize the MCMC sampling algorithm for the proposed Bayesian model. We note that, while DP offers a good starting point and allows the chain to mix faster, the performance of the MCMC algorithm is independent of the starting point in the long run. Examples with different starting points are included in the S1 File. For the warping parameter g, we use a 3-mixture pCN proposal with jump sizes βz = (0.5, 0.05, 0.0001) and corresponding proposal probabilities pz = (0.3, 0.3, 0.4). For truncation mechanism (1), the proposal for M is a 10-step random walk where M′ stays at the current value or moves forward or backward by up to 10 steps (with the probability for each move (p0, p1, …, p10) ∝ (1, 0.5, 0.445, …, 0.001), where 0.5, 0.445, …, 0.001 is an equally spaced decreasing sequence of length 10). For truncation mechanism (2), the proposed state χ′ is generated by switching up to five randomly selected indicators. We use the choose-k proposal because we notice that it tends to have faster convergence than the on-or-off alternative.
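The proposal settings quoted above can be written out numerically. This is a small sketch of the arithmetic only; whether each probability refers to a step size or to a single signed move is not specified in the text, so the normalization shown here (over step sizes 0, …, 10) is an assumption.

```python
import numpy as np

# pCN mixture settings quoted in the text
betas = np.array([0.5, 0.05, 0.0001])
pcn_probs = np.array([0.3, 0.3, 0.4])

# Random walk weights: 1 for staying, then ten equally spaced values from 0.5 down to 0.001
tail = np.linspace(0.5, 0.001, 10)
weights = np.concatenate(([1.0], tail))
step_probs = weights / weights.sum()   # (p0, p1, ..., p10) after normalization
```

The second entry of `tail` rounds to 0.445, matching the sequence given in the text.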

Results

To evaluate performance, we calculate the Fisher-Rao distance between γtrue and γest [24]: $d_{FR}(\gamma_{true}, \gamma_{est}) = \cos^{-1}\left(\int_0^1 \sqrt{\gamma_{true}'(t)}\,\sqrt{\gamma_{est}'(t)}\,dt\right)$ (13), where the posterior mean is used to construct γest for the Bayesian models. The results are shown in Fig 2. Importantly, we see that the M20 model proposed in [11] does not perform well compared to the proposed random truncation or random indicator models when the true warping functions have many local features. While M can be fixed to a larger value, our model has the advantage that the value of M does not need to be decided a priori and, instead, is informed by the data. Compared to DP, the proposed Bayesian models perform better when γtrue has fewer local features (sets (2) and (3)), irrespective of the prior distribution for the random truncation T. When γtrue has more local features (sets (4) and (5)), models pois20 and caphigh perform worse, as expected, since their priors strongly favor a small number of basis functions, i.e., smooth warping functions. For the set of piecewise linear functions (set (1)), DP performs better than the Bayesian models. This is not surprising since DP performs registration by solving a minimization problem in a piecewise linear fashion. On the other hand, the Fourier basis functions used for the Bayesian models are not piecewise linear. We also note that, while we compare DP with the Bayesian methods numerically using the FR distance, they are fundamentally different approaches. DP is optimization-based and the algorithm is very fast, but it has some known limitations: (1) it is not easy to enforce restrictions on the parameter space, (2) while regularization can be imposed by adding a term to the cost function, the choice of the regularization parameter is difficult in practice, (3) performance of the algorithm heavily depends on the discretization and neighborhood size settings, and (4) there is no prescribed approach for uncertainty quantification. In comparison, model-based approaches are more flexible and provide a more principled exploration of the parameter space.
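The Fisher-Rao distance between two warping functions can be approximated on a grid as in the sketch below. It assumes the standard arc-length form on the unit sphere of square-root derivatives, finite-difference derivatives, and illustrative warpings that are not from the paper; the function names are hypothetical.

```python
import numpy as np

def integrate(y, t):
    """Trapezoidal rule on the grid t."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

def fisher_rao_distance(gamma1, gamma2, t):
    """Arc length between sqrt(gamma1') and sqrt(gamma2') on the unit sphere in L2."""
    psi1 = np.sqrt(np.gradient(gamma1, t))
    psi2 = np.sqrt(np.gradient(gamma2, t))
    psi1 = psi1 / np.sqrt(integrate(psi1 ** 2, t))   # unit norm, absorbs discretization error
    psi2 = psi2 / np.sqrt(integrate(psi2 ** 2, t))
    inner = integrate(psi1 * psi2, t)
    return float(np.arccos(np.clip(inner, -1.0, 1.0)))

t = np.linspace(0.0, 1.0, 1001)
gamma_id = t.copy()                                  # identity warping
gamma_exp = (np.exp(t) - 1.0) / (np.e - 1.0)         # illustrative smooth warping
d = fisher_rao_distance(gamma_id, gamma_exp, t)
```

The distance is zero for identical warpings and symmetric in its arguments, so it can be used directly to compare an estimate against the true warping.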

We also examine the posterior means of the number of active basis functions in Fig 3. Based on this result, we highlight two additional advantages of the proposed method compared to the model in [11]. First, inference on the level of smoothness of the warping function is clearly informed by the underlying data. We see that when the true warping functions, and consequently the observed functions, have more local features, the (posterior mean) number of active basis functions is larger regardless of the prior chosen for the random truncation T. Note that the set of piecewise linear functions requires more basis functions due to the sharp turns at the break points. Second, one can flexibly incorporate constraints on the level of smoothness via this prior. For example, the model pois20 generates fewer active basis functions than the model pois80, regardless of the level of smoothness in the observed functions. The model caplow results in estimated warping functions based on at least 30 basis elements even when the true warping function is very smooth. On the other hand, the model caphigh results in estimated warping functions based on at most 50 basis elements even when the true warping function has many local features. Fig 4 shows four examples that compare registration performance between the Bayesian models caplow and caphigh. In each example, we display f2 (grey), f1 = f2 ∘ γtrue (black), f2 ∘ γest,caplow (green) and f2 ∘ γest,caphigh (blue). The estimated warping functions are obtained using the posterior means in each case. It is clear that the model caplow allows for registration of more local features than the model caphigh.

Fig 3. Posterior mean of the number of (active) basis functions for different registration models in the pairwise simulation study.

The five boxplots in each panel correspond to the five sets of true warping functions (top row of Fig 2).

https://doi.org/10.1371/journal.pone.0287734.g003

Fig 4. Four examples of pairwise Bayesian registration with priors caplow and caphigh.

In each panel, we show f1 (black), f2 (grey), f2 ∘ γest,caplow (green) and f2 ∘ γest,caphigh (blue). The estimated warping functions correspond to posterior means.

https://doi.org/10.1371/journal.pone.0287734.g004

Applications

In this section, we apply the proposed multiple function Bayesian registration model to the six datasets displayed in Fig 1. They arise in five different application domains:

  1. Right knee flexion and pelvis right roll. These data comprise two gait variables (measurements taken by markers as participants walk) for C = 12 participants. We obtained the data from the online supplementary material of [28]. The gait cycles are linearly scaled so that all are observed on the same time interval, i.e., the x-axis can be interpreted as percentage of one gait cycle. For more information on the role of registration in gait cycle analysis see, e.g., [10, 11, 29].
  2. Growth rates. We use the growth rate functions of C = 54 girls from the Berkeley growth study [1], available in the R package fda [30]. This dataset is widely used to assess registration performance.
  3. Pinch force. Each function records the pinch force exerted by the thumb and forefingers during a brief squeeze. These measurements were collected for C = 20 test subjects [2, 12, 31] and the full dataset is available in the R package fda. The starting time of the pinch as well as the time spent to reach the maximum force are different across test subjects, necessitating a registration step to account for this temporal variation prior to further analysis.
  4. Neural spike trains (sequence of electrical pulses sent by the neurons to the brain). The dataset comprises C = 10 smoothed neural spike train functions; see [9] for a detailed description. This dataset is analyzed in multiple papers, including [24, 32–36], with a particular interest in function registration.
  5. Handwriting samples. We use the x-coordinates of C = 50 replicates (generated by a single person) of handwritten Chinese characters for ‘statistical science’ [6], available in the R package fda. This dataset is used in [5] as an application of function registration.

Prior specification

The covariance operator in the Gaussian process prior of the warping functions and the inverse-gamma prior for the model variance are identical to the priors specified in the pairwise registration simulation study. The covariance operator in the Gaussian process prior for the template function is specified by the Fourier basis with corresponding eigenvalues that involve a constant fixed based on the scale of the observed functions. Using the Fourier basis to represent the template function is especially suitable when the observed functions are periodic on [0, 1], i.e., the gait cycle variables. For other datasets, we set the Fourier period to 2. As shown in the simulation study, these prior choices yield good registration results across different shapes of observed functions; moreover, the model is robust to alternative prior choices for these model parameters.

The random truncation is specified by truncation mechanism (1). For both the warping functions and the template, we use a discrete uniform prior on [5, Mmax = 200] for the number of basis functions M. This prior is non-informative and allows us to evaluate registration performance that is primarily driven by the data. Intuitively, the chosen range, [5, Mmax = 200], ensures that the warping functions have a minimum level of complexity (as captured by the first five Fourier basis elements), but are allowed to have as many local features as possible (since the observed functions are evaluated on a discretized grid of size 200). In practice, this is a suitable approach when one does not have strong prior information for the warping parameter and does not wish to restrict the level of shape-alteration during the registration process. For the proposals, we want to have a variety of small and intermediate jump sizes to explore the parameter space thoroughly. To that end, we use a pCN proposal with βz values equally spaced between 0.001 and 0.0001 for the functional parameters and use a 1-step random walk proposal for the number of basis functions. Convergence is visually monitored by checking the trace plots of the log-likelihood. Trace plots for two of the datasets are provided in the S1 File.
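To make the proposal for the functional parameters concrete, the following Python sketch shows a generic preconditioned Crank-Nicolson (pCN) update for basis coefficients, in the form described in [22]: the proposal is sqrt(1 − β²) z + β ξ with ξ drawn from the Gaussian prior, and the acceptance probability depends only on the likelihood ratio. The eigenvalues 1/k², the choice of ten β values, and the function names are hypothetical illustrations rather than the settings of the supporting code.

```python
import math
import random


def pcn_propose(coeffs, eigenvalues, beta, rng):
    """pCN proposal: sqrt(1 - beta^2) * z + beta * xi, with xi ~ N(0, C).

    Assumption: the prior covariance C is diagonal in the chosen basis, so
    each coefficient has an independent N(0, eigenvalue) prior.
    """
    shrink = math.sqrt(1.0 - beta ** 2)
    return [
        shrink * z + beta * math.sqrt(lam) * rng.gauss(0.0, 1.0)
        for z, lam in zip(coeffs, eigenvalues)
    ]


def pcn_accept(loglik_current, loglik_proposed, rng):
    """Accept with probability min(1, likelihood ratio).

    The pCN proposal is reversible with respect to the Gaussian prior, so
    the prior terms cancel in the Metropolis-Hastings ratio.
    """
    log_ratio = loglik_proposed - loglik_current
    return rng.random() < math.exp(min(0.0, log_ratio))


# Hypothetical settings: ten step sizes equally spaced between 0.0001 and
# 0.001, and eigenvalues decaying as 1 / k^2.
betas = [0.0001 * (i + 1) for i in range(10)]
eigenvalues = [1.0 / k ** 2 for k in range(1, 4)]

rng = random.Random(0)
current = [1.0, -0.5, 0.25]
beta = rng.choice(betas)  # assumption: one step size is picked per iteration
proposal = pcn_propose(current, eigenvalues, beta, rng)
```

Small values of β keep the proposal close to the current state, which is one reason a range of step sizes can help the sampler mix small and intermediate moves.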

Results

For comparison, we also perform registration with the M20 model [11]. Registration results for the six different datasets under consideration are shown in Fig 5. In each panel, we show the registered data (left) with respect to the estimated template (middle). For the estimated template, we also visualize the level of uncertainty (as measured by the posterior pointwise standard deviations, standardized by the scale of the original data; blue = small standard deviation, red = large standard deviation). We see that, when the observed functions have very different shapes (e.g., neural spike trains), the estimated template tends to have more uncertainty. In the right panel, we plot the template function estimated using the M20 model. We see that the proposed method is better at producing a template function that resembles the shape of the original functions. For instance, the template estimated by the M20 model does not have the small wiggles at both ends of the pinch force curves, and the M20 model cannot recover a good template for the handwriting curves. This, again, shows the limitation of the model proposed in [11], where the number of basis functions can be mis-specified. In comparison, the proposed model uses 178 basis functions (posterior average) to estimate the template function of the pinch force dataset and 196 basis functions for the handwriting dataset. When the observed functions are relatively smooth, the posterior of the template function reflects that by using fewer basis functions (e.g., 55 and 33 for the two gait datasets). This shows the data-informative feature of the proposed method.

Fig 5. Registration results for six real datasets.

The original observed functions for each dataset can be found in Fig 1. Here, we display the registered functions (left panel), the estimated template function using the proposed random truncation model (middle panel), with the color corresponding to the pointwise standard deviation (red—larger standard deviation and more uncertainty; blue—smaller standard deviation and less uncertainty), and the estimated template function using the M20 model (right panel). The warping and template functions used to perform registration are estimated using the posterior means.

https://doi.org/10.1371/journal.pone.0287734.g005

For a quantitative assessment of registration results, we report the inverse of pairwise correlation (IPC) [11, 15], which is calculated from the pairwise Pearson’s correlation r(⋅, ⋅) applied to the observed functions f and to the registered functions. The IPC values corresponding to registration performance of different models are shown in Fig 6. Overall results are comparable across different models when the observed functions are relatively smooth. The M20 model notably does not perform as well when the observed functions have many local features (e.g., pinch force, spike trains, and handwriting data) due to an under-specification of the number of basis functions. Compared to DP, the proposed Bayesian model achieves similar performance for all datasets (in the case of multiple function registration, DP aligns each function to an estimated template function in a pairwise manner; see [24] for details). Another numerical criterion (Sync) based on the distance between the registered functions shows similar results and is discussed in the S1 File.
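The following Python sketch illustrates one way such a criterion could be computed from functions sampled on a common grid. It assumes that the IPC is the sum of pairwise Pearson correlations among the observed functions divided by the same sum among the registered functions, so that smaller values indicate better alignment; this form is an assumption made for illustration and should be checked against the definitions in [11, 15]. All function names and the Gaussian-bump example are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equally long sequences of function values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def summed_pairwise_correlation(functions):
    """Sum of Pearson correlations over all unordered pairs of functions."""
    return sum(pearson(f, g) for f, g in combinations(functions, 2))


def ipc(observed, registered):
    """Assumed form: correlation sum before registration over the sum after."""
    return summed_pairwise_correlation(observed) / summed_pairwise_correlation(registered)


# Hypothetical example: three shifted bumps that align perfectly once registered.
grid = [i / 100 for i in range(101)]


def bump(center):
    return [math.exp(-((t - center) / 0.1) ** 2) for t in grid]


observed = [bump(0.4), bump(0.5), bump(0.6)]
registered = [bump(0.5), bump(0.5), bump(0.5)]
ipc_value = ipc(observed, registered)
```

In this toy case the registered functions are identical, so every pairwise correlation after registration equals one and the IPC falls below one.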

Fig 6. IPC values for six real datasets after registration.

Smaller value means better alignment. The black line corresponds to the proposed approach (specifically, this is a 95% credible interval of the IPC values constructed using 200 randomly sampled posterior draws); the red round dot corresponds to the M20 model; the blue square corresponds to DP. Since the IPC value for each dataset is plotted on a different scale (and the values are not directly comparable across different datasets), we display the numeric values of the end points as a reference.

https://doi.org/10.1371/journal.pone.0287734.g006

We note that the IPC (and Sync) values are based only on the correlation or the distance between the registered functions and do not take into account how well the shapes of the functions are preserved. They also account for the quality of the template only indirectly, via the registered functions. As we have shown in Fig 5, the M20 model does not recover the shape of the template function as well as the proposed method. On the other hand, DP sometimes does not preserve the original shapes of the observed functions after registration. For instance, we highlight two neural spike trains in the top row of Fig 7. The original observed curves are shown in the left panel; we see that one of the curves (blue) exhibits more local features than the other (red). Despite a good registration result (middle panel), DP has smoothed out the unmatched local features in the blue curve. In contrast (right panel), the proposed Bayesian model preserves the original shapes of the two curves better. In fact, when the observed functions have different numbers of features, e.g., local extrema, the random truncation component of the Bayesian model enables one to detect this pattern. The two highlighted neural spike trains were identified because their corresponding warping parameters have the largest posterior means for the number of basis functions. An estimated warping function represented by a large number of basis elements signals that the observed function has been altered a lot in its local features after registration, likely due to some unmatched features being “squeezed,” as evident in this example.

Fig 7. Registration results highlighting two neural spike trains (top) and three growth rate functions (bottom).

https://doi.org/10.1371/journal.pone.0287734.g007

In some applications, it is not desirable to alter the shapes of the observed functions too much. The proposed Bayesian model offers a flexible way to control how much shape alteration is allowed via the prior distribution of the random truncation parameter. Specifically, shape-alteration during the registration process will be limited if the prior puts very small or zero probability on large values of the number of basis functions. We show an example of this for the growth rate functions in the bottom row of Fig 7. We again highlight three observed functions in the left panel. They all exhibit two modes while most of the other functions in the data have only one mode. After registration using DP (middle panel), the bimodal pattern in the highlighted functions is no longer obvious. As a result, one might overlook the fact that these individuals have two growth spurts rather than the more common pattern of one pubertal growth spurt. To limit the level of shape alteration, we apply the proposed Bayesian registration model with a restrictive discrete uniform prior on [1, 10] for random truncation for the warping parameters. In this case, the prior limits the number of basis functions to be at most 10. The corresponding results (right panel) show that the bimodal feature of the highlighted growth rate functions is much better preserved after registration.
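The effect of a restrictive prior on the truncation parameter can be sketched with a simple 1-step random walk Metropolis update in Python. The sketch assumes that the full coefficient vector is kept in the sampler state and that the number of basis functions only selects how many leading coefficients enter the likelihood, so that under a discrete uniform prior the acceptance ratio reduces to a likelihood ratio inside the prior support and to zero outside it. The toy log-likelihood and all names are hypothetical and do not come from the supporting code.

```python
import math
import random


def truncation_step(m_current, loglik, lower, upper, rng):
    """One random walk Metropolis update for the number of basis functions.

    Assumption: discrete uniform prior on {lower, ..., upper}. Proposals
    outside this support have zero prior mass and are always rejected.
    """
    m_proposed = m_current + rng.choice((-1, 1))
    if m_proposed < lower or m_proposed > upper:
        return m_current
    # Symmetric proposal and flat prior: only the likelihood ratio remains.
    log_ratio = loglik(m_proposed) - loglik(m_current)
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return m_proposed
    return m_current


def toy_loglik(m):
    """Hypothetical log-likelihood that always favours more basis functions."""
    return 0.5 * m


rng = random.Random(1)
m = 5
trace = []
for _ in range(2000):
    m = truncation_step(m, toy_loglik, 1, 10, rng)
    trace.append(m)
```

Even though the toy likelihood pushes toward ever larger values, the chain never leaves [1, 10], which mirrors how the restrictive prior caps the amount of shape alteration regardless of what the data favour.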

Summary

We develop Bayesian models for pairwise and multiple function registration. These models build on existing Bayesian registration techniques which assign Gaussian process priors to the warping function, after a sequence of function space transformations. When building the registration models, the functional parameter is represented via an infinite sequence of basis functions, but at the implementation stage, it is necessary to truncate this sequence. Our main contribution lies in the randomization of this truncation process. This is done by introducing a new random truncation parameter that controls how many or which basis functions are used to represent the functional parameters. The resulting Bayesian models can then explore the full parameter space instead of a small subset of a truncated parameter space.

In practice, there are three main benefits of the proposed method compared to models where the truncation mechanism is fixed. First, the posterior distribution on the truncation parameter is informed by both the data and the prior, and one does not have to choose the truncation a priori, thus avoiding possible mis-specifications of the truncation parameter. Second, one can put restrictions on the registration process by using a restrictive prior for the truncation parameter, which controls how much shape alteration is allowed. For instance, as we have shown in the growth rate example, by limiting the number of basis functions to be at most 10 for the warping function parameter, we retain important features (one or two growth spurts) in the registered growth rate functions. Third, the new models also allow us to make inference on how much truncation has occurred. This can help detect when an observed function has undergone a significant shape change during registration, especially in its local features. We demonstrate the aforementioned advantages of the proposed approach through a simulation study and multiple real datasets.

In addition, the proposed modeling framework is very flexible. In the case of multiple function registration, one can use a different prior for each of the warping functions, corresponding to each of the observed functions, and for the template function, e.g., one can use non-informative priors for the warping functions and a restrictive prior for the template function that would constrain the template to have a smooth shape with limited local features. This enables fine-tuning of the models based on the application of interest.

The Metropolis-within-Gibbs algorithm we use to sample from the posterior distribution is efficient in the sense that the computational cost is largely unaffected by the size of the grid on which the functions are observed. The proposals for the functional and random truncation parameters further allow much flexibility in the jump sizes for exploring the parameter space. On the other hand, a drawback of the algorithm is that it is not informed by the likelihood. As a result, in some applications, convergence can be slow; for example, based on trace plots, the real datasets considered in the Applications section require at least 2 × 10^5 iterations and can take more than 10^6 iterations (trace plots for two datasets are given as examples in the S1 File; computation time for registering 10 functions is about 100 minutes per 10^5 updates). While this is not surprising, because the algorithm is exploring a very large parameter space, a possible future research direction is to design algorithms with faster convergence rates, by using likelihood-informed proposals or adaptive proposals with jump sizes automatically tuned by acceptance rates.

Supporting information

S1 File. Supplementary material.

This PDF file serves as an appendix to the main manuscript and includes: 1) additional derivations; 2) trace plots for two real data examples; 3) Sync values for assessing registration performance for the six real datasets in the Applications section; and 4) a discussion of model sensitivity.

https://doi.org/10.1371/journal.pone.0287734.s001

(PDF)

S2 File. Code and data.

The code folder includes R code to perform pairwise and multiple function registration. Datasets used in the manuscript are also included as .RData files. A readme file is included to provide instructions.

https://doi.org/10.1371/journal.pone.0287734.s002

(ZIP)

References

  1. Tuddenham RD, Snyder MM. Physical Growth of California Boys and Girls from Birth to Eighteen Years. Publications in Child Development at University of California, Berkeley. 1954;1(2):183. pmid:13217130
  2. Ramsay JO, Li X. Curve Registration. Journal of the Royal Statistical Society Series B. 1998;60:351–363.
  3. Ramsay JO, Gribble P, Kurtek S. Analysis of juggling data: Landmark and continuous registration of juggling trajectories. Electronic Journal of Statistics. 2014;8(2):1835–1841.
  4. Ramsay JO, Gribble P, Kurtek S. Description and Processing of Functional Data Arising from Juggling Trajectories. Electronic Journal of Statistics. 2014;8(2):1811–1816.
  5. Kneip A, Li X, MacGibbon KB, Ramsay JO. Curve Registration by Local Regression. The Canadian Journal of Statistics. 2000; p. 19–29.
  6. Ramsay JO. Functional Components of Variation in Handwriting. Journal of the American Statistical Association. 2000;95(449):9–15.
  7. Telesca D, Inoue LYT. Bayesian Hierarchical Curve Registration. Journal of the American Statistical Association. 2008;103(481):328–339.
  8. Koch I, Hoffmann P, Marron JS. Proteomics Profiles from Mass Spectrometry. Electronic Journal of Statistics. 2014;8(2):1703–1713.
  9. Wu W, Hatsopoulos NG, Srivastava A. Introduction to neural spike train data for phase-amplitude analysis. Electronic Journal of Statistics. 2014;8(2):1759–1768.
  10. Helwig NE, Hong S, Hsiao-Wecksler ET, Polk JD. Methods to Temporally Align Gait Cycle Data. Journal of Biomechanics. 2011;44(3):561–566. pmid:20887992
  11. Lu Y, Herbei R, Kurtek S. Bayesian Registration of Functions with a Gaussian Process Prior. Journal of Computational and Graphical Statistics. 2017;26(4):894–904.
  12. Ramsay JO, Silverman BW. Functional Data Analysis. 2nd ed. New York: Springer-Verlag; 2005.
  13. Srivastava A, Klassen EP. Functional and Shape Data Analysis. vol. 1. New York: Springer; 2016.
  14. Claeskens G, Silverman BW, Slaets L. A Multiresolution Approach to Time Warping Achieved by a Bayesian Prior–posterior Transfer Fitting Strategy. Journal of the Royal Statistical Society. 2010;72(5):673–694.
  15. Cheng W, Dryden IL, Huang X. Bayesian Registration of Functions and Curves. Bayesian Analysis. 2016;11(2):447–475.
  16. Bharath K, Kurtek S. Distribution on warp maps for alignment of open and closed curves. Journal of the American Statistical Association. 2020;115(531):1378–1392. pmid:34413553
  17. Matuk J, Bharath K, Chkrebtii O, Kurtek S. Bayesian Framework for Simultaneous Registration and Estimation of Noisy, Sparse, and Fragmented Functional Data. Journal of the American Statistical Association. 2021; p. 1–17. pmid:36945325
  18. Earls C, Hooker G. Variational Bayes for Functional Data Registration, Smoothing, and Prediction. Bayesian Analysis. 2017;12(2):557–582.
  19. Kurtek S. A Geometric Approach to Pairwise Bayesian Alignment of Functional Data Using Importance Sampling. Electronic Journal of Statistics. 2017;11(1):502–531.
  20. Matuk J, Herbei R, Kurtek S. Bayesian Registration of Functions. Wiley StatsRef: Statistics Reference Online. 2021; p. 1–15.
  21. Stuart AM. Inverse Problems: a Bayesian Perspective. Acta Numerica. 2010;19:451–559.
  22. Cotter SL, Roberts GO, Stuart AM, White D, et al. MCMC Methods for Functions: Modifying Old Algorithms to Make them Faster. Statistical Science. 2013;28(3):424–446.
  23. Srivastava A, Klassen E, Joshi SH, Jermyn IH. Shape Analysis of Elastic Curves in Euclidean Spaces. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2011a;33(7):1415–1428.
  24. Srivastava A, Wu W, Kurtek S, Klassen E, Marron JS. Registration of Functional Data Using Fisher-Rao Metric. arXiv preprint. 2011b;arXiv:1103.3817.
  25. Kurtek S, Srivastava A, Klassen E, Ding Z. Statistical Modeling of Curves Using Shapes and Related Features. Journal of the American Statistical Association. 2012;107(499):1152–1165.
  26. Srivastava A, Jermyn I, Joshi S; IEEE. Riemannian Analysis of Probability Density Functions with Applications in Vision. IEEE Conference on Computer Vision and Pattern Recognition. 2007; p. 1–8.
  27. Tucker JD. Elastic Functional Data Analysis; 2020. Available from: https://cran.r-project.org/web/packages/fdasrvf/fdasrvf.pdf.
  28. van den Bogert AJ, Geijtenbeek T, Even-Zohar O, Steenbrink F, Hardin EC. A Real-time System for Biomechanical Analysis of Human Movement and Muscle Function. Medical & biological engineering & computing. 2013;51(10):1069–1077. pmid:23884905
  29. Sadeghi H, Allard P, Shafie K, Mathieu PA, Sadeghi S, Prince F, et al. Reduction of Gait Data Variability Using Curve Registration. Gait & posture. 2000;12(3):257–264. pmid:11154937
  30. Ramsay JO, Wickham H, Graves S, Hooker G. Functional Data Analysis; 2014. Available from: https://cran.r-project.org/web/packages/fda/fda.pdf.
  31. Ramsay JO, Wang X, Flanagan R. A Functional Data Analysis of the Pinch Force of Human Fingers. Applied Statistics. 1995; p. 17–30.
  32. Patriarca M, Sangalli LM, Secchi P, Vantini S. Analysis of spike train data: An application of k-mean alignment. Electronic Journal of Statistics. 2014;8(2):1769–1775.
  33. Wu W, Srivastava A. Analysis of spike train data: Alignment and comparisons using the extended Fisher-Rao metric. Electronic Journal of Statistics. 2014;8(2):1776–1785.
  34. Cheng W, Dryden IL, Hitchcock DB, Le H. Analysis of spike train data: Classification and Bayesian alignment. Electronic Journal of Statistics. 2014;8(2):1786–1792.
  35. Lu X, Marron JS. Analysis of spike train data: Comparison between the real and the simulated data. Electronic Journal of Statistics. 2014;8(2):1793–1796.
  36. Hadjipantelis PZ, Aston JAD, Müller HG, Moriarty J. Analysis of spike train data: A multivariate mixed effects model for phase and amplitude. Electronic Journal of Statistics. 2014;8(2):1797–1807.