Functional principal component analysis for identifying multivariate patterns and archetypes of growth, and their association with long-term cognitive development

For longitudinal studies with multivariate observations, we propose statistical methods to identify clusters of archetypal subjects by using techniques from functional data analysis and to relate longitudinal patterns to outcomes. We demonstrate how this approach can be applied to examine associations between multiple time-varying exposures and subsequent health outcomes, where the former are recorded sparsely and irregularly in time, with emphasis on the utility of multiple longitudinal observations in the framework of dimension reduction techniques. In applications to children’s growth data, we investigate archetypes of infant growth patterns and identify subgroups that are related to cognitive development in childhood. Specifically, “Stunting” and “Faltering” time-dynamic patterns of head circumference, body length and weight in the first 12 months are associated with lower levels of long-term cognitive development in comparison to “Generally Large” and “Catch-up” growth. Our findings provide evidence for the statistical association between multivariate growth patterns in infancy and long-term cognitive development.


Objective of the study
The link between deficient growth in infancy and later life cognitive performance degradation has been widely accepted [1][2][3]. Stunting and faltering during infancy, or early childhood, are associated with reduced cognitive ability in later age performance [4,5], and these growth patterns have been the subject of extensive investigation [6][7][8][9]. In most of the previous work, investigators have studied this association by examining single growth indicators, for example head circumference [10] or body weight [11]. In particular, [12]  development of children in Vietnam is associated with pre-defined growth features at age 1 year. While features such as stunting, underweight, wasting and small head circumference were examined, the previous analysis was based on one growth modality, for example body length or weight. We propose a straightforward way to combine multiple growth indicators under a single framework. Our approach provides a comprehensive assessment of the potential risk in terms of cognitive development using longitudinal information from growth patterns of several growth modalities. Specifically, we demonstrate a data analysis procedure that combines measurements from three commonly recorded time-varying growth traits, head circumference, body length and body weight. We then identify growth patterns that can be associated with subsequently measured full-scale IQ (Wechsler abbreviated scale of intelligence, WASI). The simultaneous consideration of multiple trajectories is a main novel feature of our approach.
We also devise a simple method to learn archetypal growth patterns from data and examine their association with subsequent IQ outcomes. Our methods assess multiple growth indicators nonparametrically without the need of prior growth charts [13]. Using a functional data analysis (FDA) framework [14,15], the proposed methodology combines multiple growth indicators and identifies data-driven clusters of infants according to their growth profiles. Recently, [16] and [17] considered quantile contour estimation of functional principal components (FPC) with emphasis on analysis for growth curves, but only with a single growth trajectory sample. In a related approach, [18] focused on finding subjective-specific warping functions to extract common features among multivariate growth traits. In contrast to existing approaches, we profile multiple growth patterns in terms of archetypal analysis [19][20][21], where we implicitly assume that extreme growth patterns can be used to represent individual growth curves in the sample through convex combination.
Our findings from applying the proposed methodology to the PROBIT growth study cohorts [22,23] suggest that the proposed methodology is capable of identifying infant subgroups that differ in a statistically significant way in terms of the average level of associated IQs and thus can serve as a useful tool for identifying subgroups at risk of impaired cognitive development.

Data description
The data were collected as part of WHO's Promotion of Breastfeeding Intervention Trial (PROBIT) in the Republic of Belarus [22,23]. They include growth measurements taken during the infancy of full term babies who weighed at least 2.5kg at birth among 17,045 total subjects. The physical traits recorded are head circumference, body length and body weight which were measured six times during the first year after birth. However, the sampling schedules were not strictly followed and some children did not have a full set of six measurements, the data are best characterized as irregularly sampled longitudinal observations and see Fig 1. For example, about 5.5% of the children have less than six measurements. For children with complete records, average measurement times were approximately 1.05 (0.12), 2.05 (0.13), 3.11 (0.24), 6.12 (0.31), 9.12 (0.34) and 12.10 (0.21) months after birth, where standard deviations at each visit are given in parentheses. We also refer to the design plot for measurement times during the first year in [13], which demonstrates the extent of sparsity and irregularity of the observation schedules over the first year after birth. The cognitive ability of children was assessed using the WASI score of the full-scale IQ measured at 6.5 years.
In addition to time-varying growth traits, a multitude of demographic covariates were also recorded. These covariates were used in later stages to control for potential confounding effects including socio-demographic factors. For example, children whose parents attended university have higher IQ measurements on average than children whose parents did not to complete high-school education. These covariates are known to affect later age IQ throughout a child's infancy [24][25][26][27]. Other covariates included the sex of the child, maternal and paternal education levels, maternal and paternal age-at-birth, maternal smoking during pregnancy, duration of exclusive breast feeding and the hospital where the child was born. The sample size after preprocessing was n = 12,809 children, whose data were analyzed in our study. For details of data records and preprocessing, we refer to [13] and [22].

Functional principal component analysis
Let X i (t) be a realization of a time-varying trait X(t) for the i-th subject, 1 � i � n, at each time point t 2 T . Assuming independent measurements between subjects and that the X i (t) have smooth trajectories over time t, we apply functional principal component analysis (FPCA) to decompose patterns of temporal variation. A basic feature of FPCA is that the time-varying trait of the i-th subject admits the Karhunen-Loève expansion [28][29][30] where μ(t) = EX(t), and the ξ ik are uncorrelated random variables with mean zero and variance λ k satisfying λ 1 � λ 2 � � � �. Here in Eq (1), the ξ ik are the k-th FPC scores of the i-th subject, associated with the eigenfunction ϕ k for all k � 1. For theoretical background on FPCA and related techniques, see [15,[31][32][33][34][35][36].
In longitudinal studies, however, measurements of time-varying traits are only available at N i successive time points, say t i1 < � � � < t iN i , for the i-th subject. We note that the set of time points ft i1 ; . . . ; t iN i g may differ among the n subjects and N i may be small. FPCA for longitudinal data has been widely investigated [14,[37][38][39][40]. Specifically, [41,42] proposed a technique to perform FPCA for sparse longitudinal data, based on principal components analysis through a conditional expectation (PACE) scheme. Specifically, we consider sparse and noisy longitudinal observationsX ij ¼ X i ðt ij Þ þ � ij , instead of continuous and unperturbed observations of time-varying traits X i (t), where the � ij are independent mean zero measurement errors. By assuming that ξ ik and � ik follow a joint normal distribution, the best linear predictors of the FPC scores ξ ik are given byx . . . ;mðt iN i ÞÞ > are the estimates of mean vectors of EX i , andŜX i is the estimated N i × N i variance-covariance matrix of SX i with (j, ℓ)-elements given by CovðX ij ;X i' Þ. Also, ðl k ;� k ðtÞÞ, k � 1, are pairs of estimators for eigenvalues and eigenfunctions, which are the solutions of the following equations with respect to (λ k , ϕ k (t)), Z T Gðs; tÞ� k ðsÞ ds ¼ l k � k ðtÞ ðk ¼ 1; 2; . . .Þ; subject to l k � l kþ1 and where G(s, t) = Cov(X(s), X(t)) is the auto-covariance function of X, so that we may writê [43] and [15] for comprehensive overviews on FDA and recent developments in the interface between FPCA and longitudinal data. Once we have estimated eigenfunctions� k ðtÞ through the PACE method in Eq (2), longitudinal patterns ofX i can be summarized by the corresponding FPC scoresx ik . In fact, unobserved time-varying traits X i (t) can be reconstructed asX i ðtÞ ¼mðtÞ þ P K k¼1xik�k ðtÞ, followed by the representation in Eqs (1) and (3) with a cut-off value K � 1. The truncation point K can be chosen as the smallest value satisfying P K k¼1lk = P '�1l' � k for a given 0 < κ < 1, so that a fraction κ of variance is explained (FVE), see [15,30]. The infinite dimensional functionsX i will then be represented by K-vectors ðx i1 ; . . . ;x iK Þ > , which provides the required dimension reduction.

Identification of outlying subjects
To study archetypes in the multivariate data analysis framework, we cluster longitudinal obser-vationsX i into subgroups based on trajectory patterns of reconstructed time-varying traits X i (t). Time-varying traits are recovered by the first few FPC scores with high fraction of variance explained (FVE). In practice, the first two FPC scores produce relatively clear discrimination of the data characteristics in many sparse and irregular longitudinal studies [41,42,44,45]. As an exploratory illustration tool for outlier detection in multivariate data analysis, the bagplot [46] was introduced as a generalization of the univariate boxplot. In the bagplot, halfspace location depth [47] is usually adopted so that the multivariate data points are ordered by an extended notion of univariate rank. The halfspace location depth Dðx; X n Þ of a point is defined by the smallest number of ξ i contained in any halfspace with boundary line passing x. Then, data points can be ordered by depth, that is Dðξ i ; X n Þ � Dðξ i 0 ; X n Þ, 1 � i 6 ¼ i 0 � n. For modern concepts and related work on statistical depth, see also [48] and [49].
In this study, for the purpose of providing flexible inference based on sparse and irregularly observed functional and longitudinal data, we utilize the highest density region (HDR) as in [50] and [51]. We consider the (1 − α)-HDR for the K-variate data defined by where f ξ is the joint density of a random vector ξ = (ξ 1 , . . ., ξ K ) > and in Eqs (4) and (5), respectively. Taking α = 0.05 yields a support region where observations are expected to fall with at least 95% probability. We also note that the HDR captures the nature of the distribution of the data like location, scale, correlation and tail information in a flexible manner. [52] proposed a kernel-type estimator of R ξ ð1 À aÞ, where f ξ and f α are replaced by kernel density estimators, respectively. For example, one can usef is a scaled version of a baseline kernel L that is a probability density function with finite variance. The kernel estimatorRð1 À aÞ of Rð1 À aÞ also enjoys level information of the joint density f ξ . We identify (100 × α)% extreme subjects in a sample as those falling outside ofRð1 À aÞ. For densely observed functional data, recent studies have investigated several measures of functional outliers, such as band depth and extremal depth for functional data [53][54][55].

Joint feature extraction from multiple time-varying traits
In this subsection, we describe how we perform dimension reduction for multivariate longitudinal observations by employing the covariance structure between multiple traits. Using Eq (1) and the FVE method introduced in the previous subsection, let X �;½j� ðtÞ ¼ P K ½j� k¼1 x ½j� k � ½j� k ðtÞ be truncated versions of the original time-varying traits X [j] (t) using only the first , and let z ½j� denote the standardized k-th FPC score of the j-th longitudinal trait, respectively. Then the functional covariance structure among the truncated time-varying traits ( can be reduced to the variance-covariance matrix of ðz ½j� This suggests to apply conventional principal component analysis (PCA) on the vector of standardized marginal FPC scores ðz ½j� Then, time-varying associations among multiple time-varying traits can be reproduced by a few PC scores in this second analysis. This approach has strong connections with the joint functional analysis methods of multiple random processes [56][57][58], and we also refer to [59] for similar ideas in a recent study on relationships between univariate and multivariate functional principal component analyses.

Identifying subgroups for risk associated with outcomes
Our study aims to identify at-risk longitudinal growth patterns associated with undesirable outcomes. For this, we consider conditional density function of outcomes Y given a collection of multiple time-varying traits. Let f Y|S be the conditional density of Y given a collection S of principal components that are obtained from the principal component analysis of the marginal FPC scores ðz ½j� In this study, we suggest four clusters (S m : 1 � m � 4) of standardized principal components of the multiple traits based on the (1 − α)-HDR method as follows: where Z = (Z 1 , Z 2 ) > is a 2-vector consisting of the first two PC scores obtained from the PCA of ζ = (ζ [1] , ζ [2] , ζ [3] ) and Rð1 À aÞ is the (1 − α)-HDR of Z as in Eq (4). Also, ρ 1 and ρ 2 are the eigenvalues associated with the first two PC scores Z 1 and Z 2 , respectively. We then examine distributional differences among f YjS m ð�jS m Þ, 1 � m � 4, and quantify the distributional differences with analysis of variance (ANOVA).
For practical implementation, we use conditional kernel density estimators for f Y|S , given Here jŜ m j equals the number of elements inŜ m , which are empirical clusters of Eq (6) defined bŷ where theẐ i are vectors of the first two PC scores from the PCA ofζ i ¼ ðζ ½1� i ;ζ ½2� i ;ζ ½3� i Þ and Rð1 À aÞ is the (1 − α)-HDR ofẐ i as defined in the previous subsection. Also,r 1 andr 2 are estimates of the eigenvalues associated with the first two PC scores, respectively.
Finally, we identify subgroups for risk associated with outcomes based on multiple comparison techniques. Once we find significant differences among different subgroups, post-hoc procedures can be applied to perform multiple comparisons and control for multiple testing, which then lends support to specify risk subgroups associated with outcomes. For example, Bonferroni or Benjamini-Hochberg [60] procedures can be applied for pairwise analysis and in the next section we adopt Tukey's post-hoc analysis [61] as a multiple comparison procedure for testing mean differences between all pairs of groups. We also use the Kruskal-Wallis rank sum test [62] as a nonparametric procedure for one-way ANOVA, and the Tukey-Kramer test (or Nemenyi test) for pairwise comparisons.
We report the simulation results in Table 1, where the numbers of joint extreme trajectory clusters associated with outcomes obtained from 1000 Monte Carlo repetitions with sample size n = 1000 are shown for α = 0.05. Tukey's post-hoc multiple comparison was employed to determine how many associated clusters exist at each Monte Carlo run. That is, at each repetition, we counted the subgroups which are completely separated by Tukey's post-hoc analysis. By comparing conditional mean differences of outcomes between the four extreme clusters, we found that the proposed method identified more risk clusters than the marginal methods which detected two clusters on average for all cases. The joint method identified the three or four of the archetype clusters which depict (high Z 1 , high Z 2 ), (high Z 1 , low Z 2 ), (low Z 1 , high Z 2 ) and (low Z 1 , low Z 2 ) up to 90.4% (= 59.8% + 30.6%). This result supports the use of multiple trajectories instead of a single trajectory when identifying archetypes of risk sets. This applies even as the first two PC scores have less than 50% FVE, as in this simulation example. Example for visualization of observations from simulation. The proposed method identifies clusters associated with response outcomes Y characterized by archetypal covariate levels Z = (Z 1 , Z 2 ), for example (high Z 1 , high Z 2 ), (high Z 1 , low Z 2 ), (low Z 1 , high Z 2 ) and (low Z 1 , low Z 2 ), which are symbolized by red, purple, green and blue points, respectively, where the crosses denote the cluster centers. The surface demonstrates the conditional mean response when regressing Y on Z = (Z 1 , Z 2 ) for n = 200 data points. https://doi.org/10.1371/journal.pone.0207073.g002

Marginal analysis for longitudinal measurements of growth traits. PROBIT contains three main time-varying traits; head circumference (HC), body length (LN) and weight (WT).
For the marginal FPCA of these three variables, we applied the PACE technique introduced in the Methodology section, since we only have sparse and irregular observations available. As in the simulation study, we also used the "fdapace" package in R [63]. Auto-covariance functions of each time-varying trait were reconstructed by the first two eigenfunctions. The fractions of variance explained (FVEs) were 97.70%, 96.92% and 98.14% for HC, LN and WT, respectively. See Fig 3 for illustrations of estimated auto-covariance functions and eigenfunctions. The first and second eigenfunctions can be regarded as "General growth" and "Growth acceleration", respectively. Based on the observed high FVE coverages, we assume in the rest of the paper that these two qualitative features carry information about the longitudinal patterns of time-varying growth traits in the PROBIT data.
The marginal analysis of outlying subgroups was performed with a (1 − α)-high density region (HDR) with α = 0.05. Subjects were classified as "normal" if they belonged to the 95% support region in the HDR criterion as in Eq (4), while outlying subjects were classified as 5% extreme cases falling outside the HDR criterion. In this study, we considered four exclusive subgroups as in Eq (10), where the trait index [j], 1 � j � 3, stands for HC, LN and WT, respectively. The four outlying subgroups correspond to distinctive growth patterns, which can be labeled as "Generally Large", "Catch-up", "Stunting" and "Faltering" (Fig 4). We found that the outlying patterns were discordant across traits. For example, subjects who were classified into the generally large head circumference subgroup could be normal for body length or weight, and vice versa. Moreover, as there are 4 3 -combinations of subgroups entailed by the marginal analysis, it is difficult to associate all these multiple trajectory patterns with the response of interest, which is IQ at 6.5 years.
Potential risk subgroups for cognitive development. We constructed joint outlying subgroups of multiple time-varying traits based on the HDR method and standardized PC scores of multiple traits ðẑ ½j� i ¼x ½j� ik =ðl ½j� k Þ 1=2 : 1 � k � 2; 1 � j � 3Þ as described in Eq (7). Principal component analysis results for the six FPC scores are presented in Table 2, where the first and second FPC stand for scores of general growth and growth acceleration, respectively. We found that these two features were captured in the first two PC loadings. In this study, we  focus on the first two PC scores as they explain more than 95% of the variation for each of the three modalities [41,42,45]. On the other hand, socio-economic factors affect childhood intelligence in ways that are not reflected in the FPCA of time-varying traits [26,27]. To avoid confounding effects by socio-economic variables, we used a linear mixed effects model to reduce the influence of the socio-economic indicators. Hospital information was treated as a random effect as it is related to the random clustered design of the PROBIT study. We used the residuals of the linear mixed effects model and marginalized the effect of the potentially confounding variables considered above. For details of the data preprocessing, we refer to [22] and [13].
In Fig 5, we find that conditional densities of IQ measured at 6.5 years, given the joint outlying subgroups, exhibited different distributional behaviors. The four subgroups were constructed by a principal component analysis of the standardized six FPC scores for HC, LN and WT. The significance of the group mean differences was examined by one-way ANOVA (pvalue = 0.002), and also by the Kruskal-Wallis rank sum test [62], a nonparametric procedure for one-way ANOVA (p-value < 0.001), so that the results were qualitatively the same. For a more detailed comparison, we performed post-hoc analysis with Tukey's multiple comparison procedure. As shown in Table 3 and Fig 5, we found significant mean differences among   outlying subgroups in a family-wise 5%-level test. "Stunting" and "Faltering" were associated with higher risk in comparison with the "Generally Large" and "Catch-up" subgroups. Similar results were obtained by using the nonparametric procedure for pairwise comparisons of the Tukey-Kramer test. We also applied several multiple comparison techniques such as Bonferroni and Benjamini-Hochberg methods for post-hoc analysis and similar results were obtained by controlling false discovery rate (FDR) at 5%. For example, we found that "Stunting" and "Faltering" were associated with higher risk in comparison with the "Generally Large" and "Catch-up" subgroups after application of both procedures. The results were suggestive of higher risk for "Faltering" versus "Catch-up" (p-value = 0.060 after Bonferroni correction). We close this section with a short remark on Figs 6 and 7 which presents the result of the marginal analysis as described in the Simulation study section. In contrast to the proposed joint method, we found that the marginal procedures may not effectively detect risk growth patterns associated with long-term IQ outcomes. From the post-hoc analysis, "Generally Large" and "Stunting" subgroups had significantly different IQ performance for head circumference (p-value = 0.002), body length (p-value = 0.003) and weight (p-value < 0.001), but no significant difference was found between other subgroups such as "Catch-up" or "Faltering" for head circumference and body length. We note that "Stunting" for one of the marginal components may not be an at-risk growth pattern associated with IQ development compared to Here, qualitative growth patterns include "Generally Large" (red), "Catch-up" (purple), "Stunting" (green) and "Faltering" (blue), whereas the black dashed line represents the "normal" subgroup that consists of subjects who do not belong to the four outlying subgroups. (Right panel) Tukey's multiple comparisons of mean differences for standardized IQ along outlying subgroups are demonstrated by family-wise 95% confidence intervals, where we label the four subgroups as R (red, "Generally Large"), P (purple, "Catch-up"), G (green, "Stunting") and B (blue, "Faltering").
https://doi.org/10.1371/journal.pone.0207073.t003 the other subgroups. Moreover, "Faltering" for weight showed higher IQ performances than the "Stunting" subgroup (p-values < 0.001), while failure to thrive in infancy, defined as weight faltering in the first 9 months of life, was previously found to be associated with persistent deficits in intellectual development when measured at 8 years [64]. These results suggest that the combination of multiple growth patterns can indeed be beneficial for identifying risk subgroups associated with IQ outcomes.

Discussion
This paper outlines a statistical framework for exploring multivariate functional patterns deduced from sparsely and irregularly sampled longitudinal data and their association with long-term outcomes. For the joint analysis of children's growth and IQ in the PROBIT data, we propose a straightforward way to combine multiple growth indicators under a single framework. We extract multiple growth features jointly, by using standard multivariate analysis of  Table 2. https://doi.org/10.1371/journal.pone.0207073.g006 Multiple FPCA and their association with outcomes the functional principal components. The major modes of growth variation are then represented at the subject level and we can thus profile outlying multiple growth patterns, which can be considered as archetypes of growth. The focus of this paper is how to combine multivariate functional data to identify extremal curve patterns and associate these features with responses. One may consider an alternative application of multivariate functional principal component analysis as in [58,59,65,66] or functional ANOVA [67][68][69] as alternatives. However, for all approaches it is critical how to determine outlying and extremal patterns jointly from multiple functional data, and the highdensity region (HDR) method to detect outlying functional principal components that we adopt here is a natural extension of similar nonparametric approaches in multivariate data analysis. Recently several related studies have been introduced for functional outlier detection [54,70,71] but it still remains an open problem how to combine these with other methodologies such as clustering and archetypal analysis.
In the PROBIT growth data analysis, we identified four archetypal subgroups of infant growth patterns, namely "Stunting", "Faltering", "Generally Large" and "Catch-up". In addition we also found that covariance structures have marginally similar patterns across the functional traits considered; head-circumference, body length and weight. According to our analysis, subgroups corresponding to "Stunting" and "Faltering" in the infant period had lower downstream IQ compared to "Generally Large" and "Catch-up" subgroups. This finding is supported by previous studies that link deficient infant growth and later life cognitive performance degradation.
It is worth mentioning that single growth indicators were not found to be associated with risk of lowered IQ, and the marginal analysis of single growth traits did not produce informative results in the PROBIT analysis (See Fig 7). Also, there is a possibility that the absence of any measure of cognitive ability during infancy in the data could be explained by reverse causality, namely, poor cognitive function in infancy may have led to worse dietary intake. Both may have been consequences of poor parenting or unmeasured insults during pregnancy or infancy. The proposed methods are not limited to specific data structures, such as growth data, but can be applied to many other kinds of longitudinal data as well, whenever a downstream outcome is of interest.