Canonical Correlation Analysis of Infant's Size at Birth and Maternal Factors: A Study in Rural Northwest Bangladesh

This analysis was conducted to explore the association between 5 birth size measurements (weight, length, and head, chest and mid-upper arm [MUAC] circumferences) as dependent variables and 10 maternal factors as independent variables using canonical correlation analysis (CCA). Because CCA considers sets of dependent and independent variables simultaneously, it substantially reduces type 1 error. Data were from women delivering a singleton live birth (n = 14,506) while participating in a double-masked, cluster-randomized, placebo-controlled maternal vitamin A or β-carotene supplementation trial in rural Bangladesh. The first canonical correlation was 0.42 (P<0.001), demonstrating a moderate positive correlation mainly between the 5 birth size measurements and 5 maternal factors (preterm delivery, early pregnancy MUAC, infant sex, age and parity). The score plot also revealed a significant interaction between infant sex and preterm delivery on birth size. Thirteen percent of birth size variability was explained by the composite score of the maternal factors (redundancy index, RY/X = 0.131). Given its ability to accommodate numerous relationships and reduce the complexity of multiple comparisons, CCA identified 5 maternal variables able to predict birth size in this rural Bangladesh setting. CCA may offer an efficient, practical and inclusive approach to assessing the association between two sets of variables, addressing the innate complexity of interactions.


Introduction
There is growing interest among maternal and child health researchers in the relationship between birth size and maternal socio-demographic and health factors. Birth size is widely recognized as a lagged indicator of fetal health and a predictor of neonatal health and survival. Birth weight in particular is strongly associated with mortality and morbidity in infancy and early childhood [1,2]. Fetal growth is largely, but not solely, determined by the availability of nutrients from the mother before and during gestation, as well as by the capacity of the placenta to supply these nutrients to the fetus in sufficient quantities [3]; birth size can therefore reflect the intrauterine environment.
Maternal nutritional status largely depends on socioeconomic factors. Women of higher socioeconomic status have greater access to, and consumption of, nutritious foods before and during gestation, and more antenatal care (ANC) visits and nutritional supplementation during gestation. Small birth size is more common in resource-poor settings and among more disadvantaged populations [4,5,6].
Birth weight is often the only birth size measure used to evaluate fetal growth. However, other measurements, such as length and head, chest and arm circumferences, may be important in predicting long-term health and development outcomes [7]. When exploring the health effects of different exposures, observational epidemiologic studies often deal with data that include both a set of exposure variables and a set of outcome variables. Routine statistical approaches such as multiple linear regression, when used to analyze the relationship between exposures and outcomes such as birth size, are usually challenged by multiple testing and multicollinearity [8,9]. Previous studies that analyzed birth size in relation to maternal, social or environmental variables [7,10,11,12,13] used multiple linear regression despite these limitations. Because CCA assesses the correlation between two composite variables, called canonical variates, one representing the set of exposure variables and the other the set of outcome variables [8,9], it may be a useful method for evaluating the effect of maternal factors on infant size at birth. CCA is the most general case of the general linear model [9,14,15,16], and it can therefore be used to conduct the univariate and multivariate analyses that it subsumes, including multiple regression as a special case [17]. The advantages of CCA for researchers have been described elsewhere [18,19]. Thus CCA is technically able to analyze data involving multiple sets of variables and is theoretically consistent with that purpose [9]. Although CCA is currently used in many research fields, including social and behavioral research [8], bioinformatics [20], genetics [21], neural networks [22] and environmental research [23], it is relatively uncommon in public health research, and to our knowledge it has not been applied to analyze the relationship between maternal factors and birth size.
The aim of this research was to explore the relationship between birth size and maternal factors using CCA in a community-based maternal and child health and nutrition research project. We also aimed to identify the variables most influential in the relationship and any significant interactions between variables.

Study design and participants
The data reported in this analysis were collected from January 2002 to July 2007 during a field-based, double-masked, cluster-randomized, placebo-controlled trial assessing the effect of maternal vitamin A or β-carotene supplementation on maternal and infant mortality through 6 months of age. Details are available elsewhere [24,25,26]. In brief, this study was conducted in a contiguous area of approximately 435 sq km in the Gaibandha and Rangpur Districts of rural northwestern Bangladesh, with a population of approximately 650,000. Predefined clusters of approximately 250 households, called sectors (n = 596), were randomized to receive study supplements. Married women of reproductive age were enumerated through a baseline census, and subsequent 5-weekly surveillance was carried out to include newly married women. A visit every 5 weeks was conducted to assess menstrual history. When a woman reported having missed her menstrual period in the past 30 days, pregnancy was confirmed with a spot urine test for human chorionic gonadotropin. Once a woman's pregnancy was confirmed, she was asked for consent to receive study supplements and to provide data. Throughout the enrollment period, 59,721 pregnant women consented and were enrolled into the trial [27].
On enrollment into the trial, mothers were interviewed about household socioeconomic conditions, education, demographic characteristics, previous pregnancy history, and frequency of dietary intake and morbidity in the previous 7 days, and their mid-upper arm circumference (MUAC) was measured [26]. A Living Standard Index (LSI) was constructed from household socioeconomic variables using principal component analysis and was used as the main socioeconomic variable [27]. Mothers were visited, provided their allocated supplements (vitamin A, β-carotene or placebo) and checked for pregnancy and vital status throughout pregnancy until 3 months post-partum, at which time another interview was completed to obtain further data on maternal diet and morbidity, ANC, events and care during labor and delivery, and the condition of the infant.
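The LSI construction can be illustrated with a minimal NumPy sketch, assuming the index is the first principal component score of standardized household indicators. The function name `living_standard_index`, the simulated binary indicators and the sign convention are hypothetical; the actual variables and weighting are described in [27].

```python
import numpy as np

def living_standard_index(assets: np.ndarray) -> np.ndarray:
    """Hypothetical LSI: first principal component score of standardized
    household indicators (rows = households, columns = indicators)."""
    z = (assets - assets.mean(axis=0)) / assets.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)      # PCA on the correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    pc1 = eigvecs[:, -1]                     # weights of the largest component
    # An eigenvector's sign is arbitrary; orient it so more assets -> higher LSI
    if pc1.sum() < 0:
        pc1 = -pc1
    return z @ pc1

# Illustrative data only: 4 binary indicators (e.g. electricity, tin roof)
# driven by an unobserved wealth level in 500 simulated households.
rng = np.random.default_rng(0)
wealth = rng.normal(size=500)
assets = (wealth[:, None] + rng.normal(scale=0.8, size=(500, 4)) > 0).astype(float)
lsi = living_standard_index(assets)
print(round(np.corrcoef(lsi, wealth)[0, 1], 2))  # strongly positive in this simulation
```

Using the first component as the index means households are ranked along the single direction that explains the most shared variation among the indicators.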
Birth anthropometry was collected on infants of consenting mothers who took part in a placebo-controlled newborn vitamin A supplementation trial nested in the latter half of the maternal trial described above [24]. Live-born infants (n = 21,585) were visited for dosing by field staff as soon as possible after birth (median (interquartile range, IQR): 7 (2, 18) hours). Of these, 16,290 infants (75%) were singletons who were subsequently visited and measured by one of 56 trained anthropometrists within 72 hours of birth (median (IQR): 18 (9, 36) hours); these infants were included in the present analysis.
Birth size measurements included weight, length, MUAC, and head and chest circumferences. Birth weight was measured to the nearest 10 g using a Tanita BD-585 digital pediatric scale (Tanita Corporation, Tokyo, Japan). Length was measured to the nearest 0.1 cm using an affixed headboard and movable footplate fashioned for use with the Tanita scale. Circumferences were measured to the nearest 0.1 cm with a Ross insertion tape (Abbott Laboratories, Columbus, OH). All measurements except weight were taken in triplicate, and the median was taken as the accepted value, as previously described [28]. The cut-offs used to define a small infant were weight <2.5 kg, MUAC <10 cm, head circumference <33 cm and chest circumference <30.5 cm [29]. Of the 16,290 infants with birth anthropometry, 14,506 (89%) had complete data and were included in the CCA, which does not allow missing values.
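The cut-offs and the complete-case rule can be expressed as a short Python sketch. The field names (`weight_kg`, `length_cm`, `muac_cm`, `head_cm`, `chest_cm`) and the example records are hypothetical; only the cut-off values and the counts come from the text, which gives no cut-off for length.

```python
import math

# Cut-offs for a "small" infant as listed in the text [29]; none is given for length.
CUTOFFS = {"weight_kg": 2.5, "muac_cm": 10.0, "head_cm": 33.0, "chest_cm": 30.5}
FIELDS = ["weight_kg", "length_cm", "muac_cm", "head_cm", "chest_cm"]

def small_flags(infant):
    """True for each measure that falls strictly below its cut-off."""
    return {name: infant[name] < cut for name, cut in CUTOFFS.items()}

def is_missing(value):
    return value is None or (isinstance(value, float) and math.isnan(value))

def complete_cases(records, fields=FIELDS):
    """Keep only records with every field present, because the CCA used here
    cannot handle missing values."""
    return [r for r in records if not any(is_missing(r.get(f)) for f in fields)]

# Hypothetical records for illustration
records = [
    {"weight_kg": 2.41, "length_cm": 46.9, "muac_cm": 9.6, "head_cm": 33.1, "chest_cm": 30.2},
    {"weight_kg": 2.80, "length_cm": None, "muac_cm": 10.4, "head_cm": 33.6, "chest_cm": 31.5},
]
kept = complete_cases(records)
print(len(kept), small_flags(kept[0]))
# Share of measured infants retained for the CCA, from the counts in the text
print(f"{100 * 14506 / 16290:.0f}%")  # 89%
```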
The maternal characteristics included in the present analysis were age at enrollment, parity, early pregnancy MUAC (cm), education (years), LSI, number of ANC visits, and maternal trial supplementation (vitamin A or β-carotene). Additional infant characteristics included preterm delivery status (<37 weeks of gestation) and sex.

Ethics Statement
The overall Jivita study protocol was reviewed and approved by both the Bangladesh Medical Research Council (BMRC) and the Institutional Review Board (IRB) of Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA. Documented consent was given by all participating women.

Canonical Correlation Analysis (CCA)
CCA is a multivariate statistical model that facilitates the study of linear interrelationships between two sets of variables, one referred to as independent and the other as dependent; a composite score is formed for each set. CCA develops a canonical function that maximizes the correlation between the two composite variables. CCA develops as many functions as there are variables in the smaller set; each function is independent of (orthogonal to) the others, so that the functions represent different relationships between the sets of dependent and independent variables [32]. The loadings of the individual variables differ across canonical functions and represent the variables' contributions to the specific relationship being investigated. The challenge is then to decide how many functions should be interpreted; in most cases the first function is the most legitimate. Hair et al. [30] suggested 3 criteria for choosing the important functions, because they considered a single criterion such as the level of significance too superficial: the composite scores are calculated to maximize the correlation between them, without regard to how much of the variability of each set they account for. The 3 criteria are (i) the level of significance, (ii) the magnitude of the canonical correlation, and (iii) a redundancy measure of the percentage of variance accounted for from the two data sets, analogous to the R² statistic of multiple regression. We used the most widely used test of the significance of each function, the F statistic [31]. No generally accepted guidelines have been established regarding suitable sizes for canonical correlations; the decision is usually based on how much the findings contribute to understanding the research problem studied. The redundancy index [32] is directly analogous to the R² statistic in multiple regression.
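For readers who want to see the mechanics, the following NumPy sketch computes classical canonical correlations by whitening each variable set and taking the singular value decomposition of the whitened cross-covariance matrix. It is an illustration on simulated data, not the CCA or yacca R implementation used in this study; the function name `cca` and the simulated variables are hypothetical. It returns as many functions as there are variables in the smaller set, and the variates of different functions are uncorrelated.

```python
import numpy as np

def _inv_sqrt(s):
    """Inverse symmetric square root of a positive-definite covariance matrix."""
    vals, vecs = np.linalg.eigh(s)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def cca(X, Y):
    """Classical CCA of X (n x p, independent set) and Y (n x q, dependent set).

    Returns the k = min(p, q) canonical correlations in descending order and
    raw weight matrices A (p x k) and B (q x k); the canonical variates
    (composite scores) are U = Xc @ A and V = Yc @ B for centered data."""
    n = X.shape[0]
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1)
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = _inv_sqrt(Sxx), _inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the canonical correlations
    left, r, right_t = np.linalg.svd(Kx @ Sxy @ Ky, full_matrices=False)
    k = min(X.shape[1], Y.shape[1])
    return r[:k], Kx @ left[:, :k], Ky @ right_t.T[:, :k]

# Simulated example: 10 "maternal" columns and 5 "birth size" columns sharing a latent factor
rng = np.random.default_rng(1)
n = 2000
z = rng.normal(size=n)
X = rng.normal(size=(n, 10))
X[:, 0] += z
Y = 0.5 * z[:, None] + rng.normal(size=(n, 5))
r, A, B = cca(X, Y)
U = (X - X.mean(axis=0)) @ A
V = (Y - Y.mean(axis=0)) @ B
print(np.round(r, 2))  # 5 correlations, the first clearly the largest
```

Each pair (U[:, k], V[:, k]) has correlation r[k], and the columns of U are mutually uncorrelated, which is the orthogonality property described above.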
According to Sherry and Henson [8], once a certain number of functions have been extracted, any further function that explains <10% of the remaining variance is considered to have a less impressive effect size, even if its correlation is significant. In this paper we applied the criteria of a significance level of 5% and a redundancy coefficient >0.10 to choose the interpretable canonical functions. CCA can be used with both continuous and categorical data for either the dependent or the independent variables [30].
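The redundancy criterion can be made concrete with the commonly used Stewart and Love form of the redundancy index, written here in our own notation and offered as a sketch rather than as the exact computation performed by the software used. For function k with canonical correlation r_k, and loadings l_{jk} of the q dependent variables on their own canonical variate:

```latex
R^{(k)}_{Y\mid X} \;=\; \Big(\frac{1}{q}\sum_{j=1}^{q} l_{jk}^{2}\Big)\, r_k^{2},
\qquad
R_{Y\mid X} \;=\; \sum_{k=1}^{\min(p,q)} R^{(k)}_{Y\mid X}.
```

Because every squared loading is at most 1, R^{(k)} can never exceed r_k². With the correlations reported in the Results, only the first function can pass the 0.10 threshold: for the second, r_2² = 0.19² = 0.0361. For the first, r_1² = 0.42² = 0.1764, so, since the redundancy of the other functions was reported as zero, the overall redundancy of 0.131 corresponds to a mean squared loading of about 0.131/0.1764 ≈ 0.74 across the 5 birth size measures.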
Three methods have been proposed to determine the relative importance of each original variable to each function: (i) canonical weights (standardized coefficients), (ii) canonical loadings (structure correlations) and (iii) canonical cross-loadings. Because canonical weights, like regression weights, are vulnerable to multicollinearity, most of the literature suggests using canonical loadings or cross-loadings [9,23,30]. We used both loadings and cross-loadings; however, there is no established cut-off. A rule of thumb is that a variable with a loading >|0.30| can be considered an important contributor to the function [33]. The score plot of the composite scores, with the 1st variate on the horizontal axis and the 2nd variate on the vertical axis, also helped to reveal natural groupings in the data set [34].
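Loadings and cross-loadings are simply correlations between the original variables and the canonical variates. The sketch below, using simulated data and hypothetical helper names (`corr_cols`, `first_canonical_pair`), computes both for the independent set of the first function and applies the |0.30| rule of thumb. It also shows a useful check: each cross-loading equals the corresponding loading multiplied by the canonical correlation. The overall sign of a canonical function is arbitrary, so only the relative signs of loadings are interpretable.

```python
import numpy as np

def corr_cols(M, S):
    """Pearson correlation of every column of M with every column of S."""
    Mz = (M - M.mean(axis=0)) / M.std(axis=0)
    Sz = (S - S.mean(axis=0)) / S.std(axis=0)
    return Mz.T @ Sz / M.shape[0]

def first_canonical_pair(X, Y):
    """First canonical correlation and variates via the whitened-SVD form of CCA."""
    def inv_sqrt(s):
        vals, vecs = np.linalg.eigh(s)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    Kx, Ky = inv_sqrt(Xc.T @ Xc), inv_sqrt(Yc.T @ Yc)
    left, r, right_t = np.linalg.svd(Kx @ (Xc.T @ Yc) @ Ky)
    return r[0], Xc @ Kx @ left[:, :1], Yc @ Ky @ right_t.T[:, :1]

rng = np.random.default_rng(2)
n = 2000
z = rng.normal(size=n)
X = rng.normal(size=(n, 6))
X[:, 0] += z    # positively related to the outcomes (simulated)
X[:, 1] -= z    # negatively related to the outcomes (simulated)
Y = 0.5 * z[:, None] + rng.normal(size=(n, 5))

r1, U1, V1 = first_canonical_pair(X, Y)
loadings = corr_cols(X, U1)[:, 0]        # correlations with the set's own variate
cross_loadings = corr_cols(X, V1)[:, 0]  # correlations with the other set's variate
important = np.flatnonzero(np.abs(loadings) > 0.30)
print(important, np.round(loadings, 2))
```

In this simulation the two informative columns load with opposite signs, mirroring how a negatively associated factor such as preterm delivery can load opposite to the other maternal factors.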
Multiple linear regression was used to examine the relationship between birth size and maternal factors and to compare the performance of a model containing only the important maternal variables identified by CCA with that of a model containing all maternal variables. Five models, one for each birth size variable, were fitted with (i) all 10 maternal factors and (ii) only those factors with substantial loadings (|loading| ≥0.30) in the canonical correlation analysis.
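The full-versus-reduced comparison can be sketched in NumPy for one outcome at a time. The simulated predictors, effect sizes and the `selected` column indices are hypothetical stand-ins for the 10 maternal factors and the 5 factors retained by CCA. Because the reduced model is nested in the full model, its OLS R² can never exceed the full model's, so the comparison asks how small the loss is.

```python
import numpy as np

def r_squared(X, y):
    """Coefficient of determination of an OLS fit of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

# Simulated data: 10 predictors, of which the first 5 carry most of the signal
rng = np.random.default_rng(3)
n = 3000
X = rng.normal(size=(n, 10))
effects = np.array([-0.6, 0.3, 0.3, 0.25, 0.25, 0.05, 0.05, 0.05, 0.0, 0.0])
y = X @ effects + rng.normal(size=n)

selected = [0, 1, 2, 3, 4]   # hypothetical CCA-selected columns
r2_full = r_squared(X, y)
r2_reduced = r_squared(X[:, selected], y)
print(round(r2_full, 3), round(r2_reduced, 3), round(r2_full - r2_reduced, 3))
```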
To support our CCA findings, we stratified the sample by prematurity status and infant sex and investigated their interaction on birth size. We computed the mean and 95% CI of the 5 anthropometric measurements in 4 strata (Term-Female, Term-Male, Preterm-Female and Preterm-Male). Multivariate analysis of variance (MANOVA) was used to investigate interaction effects on infant size at birth. All analyses were performed using R version 2.14.1 with the CCA and yacca packages.
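A minimal sketch of the stratified summary for one measurement follows, assuming a normal-approximation 95% CI (mean ± 1.96 standard errors); the text does not state how the intervals were computed, and the function name `stratum_summary` and the simulated data are hypothetical. The MANOVA interaction test itself is not reproduced here.

```python
from itertools import product

import numpy as np

def stratum_summary(values, preterm, male, z=1.96):
    """Mean and normal-approximation 95% CI of one measurement in each of the
    four preterm-by-sex strata."""
    out = {}
    for pt, m in product([False, True], repeat=2):
        v = values[(preterm == pt) & (male == m)]
        mean = v.mean()
        half = z * v.std(ddof=1) / np.sqrt(len(v))
        label = f"{'Preterm' if pt else 'Term'}-{'Male' if m else 'Female'}"
        out[label] = (mean, mean - half, mean + half)
    return out

# Simulated birth weights (kg) for illustration only
rng = np.random.default_rng(4)
n = 4000
preterm = rng.random(n) < 0.27
male = rng.random(n) < 0.5
weight = 2.6 + 0.1 * male - 0.3 * preterm + rng.normal(scale=0.4, size=n)
summary = stratum_summary(weight, preterm, male)
for label, (mean, lo, hi) in summary.items():
    print(f"{label}: {mean:.2f} ({lo:.2f}, {hi:.2f})")
```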

Results
More than 50% of the infants were born small, that is, with weight <2.5 kg, MUAC <10 cm, head circumference <33 cm and chest circumference <30.5 cm. Twenty-seven percent of infants were preterm. Half of the infants were male. Mean (SD) maternal age was 22.0 (5.9) years and mean (SD) MUAC was 23.0 (2.0) cm. Most of the women (74%) reported no ANC visit. Nearly half of the women (approximately 43%) were nulliparous. Half of the women (52%) were literate, and their mean (SD) years of schooling was 3.8 (3.9) (Table 1). Table 2 presents Pearson's correlation coefficients between the maternal factors and infant size at birth. All maternal variables except preterm delivery and vitamin A or β-carotene supplementation were positively correlated with infant size at birth. All of the infant's anthropometric measurements were negatively correlated with preterm delivery (P<0.05 for all), whereas none was correlated with maternal vitamin A or β-carotene supplementation.
The canonical correlation coefficients and redundancy indices are presented in Table 3. The CCA is restricted to deriving 5 functions because the smaller of the two sets (the dependent set) contained 5 variables. The correlations for the successive functions were 0.42, 0.19, 0.08, 0.04 and 0.02. All correlations except the last were statistically significant (P<0.05, F test). However, the redundancy index of every function except the first was zero. Therefore, only the first function is noteworthy in the context of this study.
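The sequential logic of testing successive functions can be illustrated with Bartlett's chi-square approximation applied to the rounded correlations above (n = 14,506; 10 maternal and 5 birth size variables). This is a swapped-in approximation, not the F test used in this study, and rounding the correlations to two decimals makes the statistics approximate. Test k asks whether functions k through 5 are jointly zero. The example also shows why significance alone is a weak criterion here: with a sample this large, even the test starting at the fourth function, whose correlation of 0.04 corresponds to well under 1% of shared variance, passes the 5% level.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    while k < df:
        # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q

def bartlett_tests(r, n, p, q):
    """Bartlett's test of H0: canonical correlations k..m are all zero, for each k."""
    c = n - 1 - (p + q + 1) / 2
    results = []
    for k in range(len(r)):
        stat = -c * sum(math.log(1 - ri ** 2) for ri in r[k:])
        df = (p - k) * (q - k)
        results.append((k + 1, stat, df, chi2_sf(stat, df)))
    return results

reported = [0.42, 0.19, 0.08, 0.04, 0.02]   # rounded values from the Results
for k, stat, df, pval in bartlett_tests(reported, n=14506, p=10, q=5):
    print(f"functions {k}-5: chi2 = {stat:.1f}, df = {df}, P = {pval:.3g}")
```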
The loadings and cross-loadings of the variables for the 1st canonical function are presented in Table 4. Judging by the loadings on function 1, the most important predictor of birth size was preterm delivery (loading: -0.74), followed by maternal early pregnancy MUAC (0.37), infant sex (0.35), maternal age (0.34) and parity (0.32). The loadings of the birth size indicators showed that all the anthropometric measurements contributed similarly to the first canonical function. Thus, all of the infant's anthropometric measurements were most strongly negatively correlated with preterm delivery and positively associated with maternal early pregnancy MUAC, infant sex, age and parity, in that order.
Regression coefficients are presented in Table 5. In the models with all 10 maternal factors, all factors except vitamin A and β-carotene supplementation were significant predictors of infant size at birth. In all the models with the 5 maternal factors selected through CCA, all 5 factors were significant predictors of infant size at birth. The differences between the coefficients of determination (R²) of the full models and those of the 5-variable models ranged from 0.01 to 0.02. Figure 1 shows the biplot of the standardized weights for the first two canonical functions, for both the maternal factors and the infant anthropometric variables, and the score plot of the first two composite scores of the maternal factors. Panel A of Figure 1 illustrates that, among the maternal factors, preterm delivery had the greatest influence on the first canonical function and infant sex had the greatest influence on the second, whereas maternal early pregnancy MUAC, age and parity had similar influence on both functions, and maternal vitamin A and β-carotene supplementation and maternal education had no influence on either function. The infant size variables had no influence on the second function, which implies that most of the variability in infant size was accounted for by the first composite score. Panel B of Figure 1 shows the score plot of the first and second composite scores of the maternal factors, which reveals four distinct groups of infants. This grouping results from the interaction between preterm delivery and infant sex, which dominate the relationship. Table 6 presents the stratum-wise means and 95% confidence intervals of birth size. Birth size differed significantly across strata. MANOVA showed a significant interaction effect of preterm delivery and infant sex on birth size (F = 161.83, P<0.001).

Discussion
We studied the association between birth size and maternal factors using CCA. CCA was used instead of separate linear regression models for each birth size measurement because it simultaneously models the effects of multiple independent variables on multiple dependent variables. Because CCA uses information from all the variables in both the exposure and outcome sets and maximizes the estimated relationship between the two sets, it may offer a more efficient approach for assessing the effects of maternal factors on infant size at birth than routinely used methods such as multiple linear regression. CCA starts with simultaneous consideration of both exposure and outcome variables, limiting the inefficiencies that may accompany conventional multiple testing and thus reducing type 1 error. Furthermore, the latent variable approach used in CCA helped to avoid multicollinearity [23]. The resulting procedure gives a global view of the association between indicators of infant size at birth and maternal factors. We found that infant size at birth in rural Bangladesh had a significant but moderate association with maternal nutritional and socioeconomic factors. In addition to assessing the association between two sets of variables, CCA helped to narrow down, on the basis of the variable loadings on the composite scores, the exposure (maternal factors) and outcome (birth size) variables that might contribute to the relationship. Thus, CCA could be used as a comprehensive approach to extracting information from data, simultaneously identifying key exposure and outcome variables so that the relationship between an individual exposure and an outcome can be examined further.
Additionally, CCA revealed a significant interaction between preterm delivery and infant sex on birth size through the score plot of composite scores. Because the birth size measurements are highly correlated, a combination of the indicators captures more information and, as a composite variable, may predict future health outcomes more efficiently than a single birth size measure. For example, head circumference, as an indicator of brain volume [35], may provide important diagnostic and prognostic information, such as information related to neurocognitive function [36], beyond that provided by birth weight alone. Similarly, along with birth weight, other indicators of birth size such as length and head, chest and arm circumferences might be expected to provide additional information about a wider range of health outcomes related to future child growth, health and development. WHO suggests that a population with a prevalence of low birth weight of 15% or more, or a prevalence of chest circumference at birth <30 cm, experiences a disproportionately elevated risk of infant mortality and morbidity and of long-term adverse effects on childhood growth and performance [37]. We found that approximately half of the infants in this typical rural Bangladeshi population [25] were born both with low birth weight [28] and with small chest circumference (<30 cm), revealing a major public health concern and a subset of infants whose health risks may extend beyond those associated with either criterion alone.
Pearson's correlation coefficients showed that the maternal factors age, parity, early pregnancy MUAC, LSI, maternal education, number of ANC visits and infant sex were significantly positively associated with birth size, whereas, as expected, preterm delivery was strongly negatively associated with newborn size measures. The individual multiple linear regression analyses showed virtually identical results: in all 5 models, all predictors except vitamin A and β-carotene supplementation had significant β-coefficients (P<0.05) (data not shown). Christian and colleagues [28] also found no significant effect of maternal vitamin A or β-carotene supplementation on newborn anthropometry in the same population. CCA reduced the number of factors necessary to predict birth size to age, parity, early pregnancy MUAC, infant sex and preterm delivery (|loading| >0.30). When CCA was performed with these 5 predictors instead of 10, the canonical correlation remained almost unchanged (r = 0.41; data not shown). Thus, had CCA not been used before fitting the regression models, we would have retained 3 redundant variables as significant predictors of infant size. In addition to evaluating the association between two sets of variables, CCA can therefore also be used as a data-mining tool, in that it was able to narrow down the exposure and outcome variables that might contribute to the relationship.
The score plot of the composite scores can also identify interaction effects between factors on the outcome of interest [34]. The composite scores are projections of the original multidimensional variables onto a lower dimension, subject to the constraint that the correlation between the composite scores of the dependent and independent variable sets is maximized. That is, the composite score for the maternal factors was constructed to mirror multiple dimensions of infant size at birth. The interaction between independent variables on the dependent variables was depicted in the score plot of the 1st and 2nd composite scores of the independent variables. In this study, following the canonical correlation analysis, MANOVA indicated a significant interaction effect of infant sex and preterm delivery on birth size. Infant size was largest for term males, followed by term females, preterm males and preterm females. Other studies have also found this kind of interaction effect on birth size [38].
In conclusion, CCA was used to explore the association between infant size at birth and maternal factors. The maternal factors that did and did not affect infant size at birth, as isolated through canonical correlation analysis, were consistent with evidence of these associations in the literature [11,12,13,39,40,41,42,43,44]. CCA may offer an efficient, practical and more biologically comprehensive approach to assessing the association between two sets of variables, by taking into account the innate complexity of interactions and biological pathways between variables.