Development and Validation of a Vitamin D Status Prediction Model in Danish Pregnant Women: A Study of the Danish National Birth Cohort

Vitamin D has been hypothesized to reduce risk of pregnancy complications such as preeclampsia, gestational diabetes mellitus, and preterm delivery. However, many of these outcomes are rare and require a large sample size to study, representing a challenge for cohorts with a limited number of preserved samples. The aims of this study were to (1) identify predictors of serum 25-hydroxy-vitamin D (25(OH)D) among pregnant women in a subsample (N = 1494) of the Danish National Birth Cohort (DNBC) and (2) develop and validate a score predicting 25(OH)D-status in order to explore associations between vitamin D and maternal and offspring health outcomes in the DNBC. In our study sample, 42.3% of the population had deficient levels of vitamin D (<50 nmol/L 25(OH)D) and the mean 25(OH)D-level was 56.7 (s.d. 24.6) nmol/L. A prediction model consisting of intake of vitamin D from diet and supplements, outdoor physical activity, tanning bed use, smoking, and month of blood draw explained 40.1% of the variance in 25(OH)D, and mean measured 25(OH)D-level increased linearly by decile of predicted 25(OH)D-score. In total 32.2% of the women were placed in the same quintile by both measured and predicted 25(OH)D-values and 69.9% were placed in the same or adjacent quintile by both methods. Cohen's weighted kappa coefficient (κ = 0.3) reflected fair agreement between measured 25(OH)D-levels and predicted 25(OH)D-score. These results are comparable to other settings in which vitamin D scores have shown similar associations with disease outcomes as measured 25(OH)D-levels. Our findings suggest that predicted 25(OH)D-scores may be a useful alternative to measured 25(OH)D for examining associations between vitamin D and disease outcomes in the DNBC cohort, but cannot substitute for measured 25(OH)D-levels for estimates of prevalence.


Introduction
It has long been known that vitamin D has important functions related to calcium homeostasis and bone development [1], but there has been considerable recent interest in the non-classical functions of vitamin D. Some studies have shown associations between vitamin D deficiency and certain types of cancers, heart disease, schizophrenia, type 1 diabetes, multiple sclerosis and autoimmune diseases [1,2]. In pregnancy, vitamin D deficiency has been shown to be associated with complications such as preeclampsia, gestational diabetes mellitus, and primary caesarian section and it has been hypothesized to also induce increased risk of multiple sclerosis, heart disease, and cancer later in life [3][4][5][6][7][8][9][10]. However, few supplementation trials exist, and observational studies have exhibited inconsistent results for most outcomes.
A specific purpose of identifying determinants of vitamin D status is to develop algorithms that can predict the vitamin D status of individuals for whom measurements are not available, or when measuring the entire population would be prohibitively expensive. Increasingly, researchers have used "vitamin D prediction scores" constructed from variables found to influence longer term vitamin D status to explore associations with chronic disease in non-pregnant subjects [14,17-19,21,24-28]. Less work has been done to apply this concept to pregnancy or to explore the validity of vitamin D predictive scores in estimating vitamin D status in the shorter term. Using data from the Danish National Birth Cohort (DNBC) [32,33], the aims of this study were to (1) identify determinants of vitamin D status in pregnant women and (2) develop and validate a prediction model of 25(OH)D-score for the purpose of exploring associations between vitamin D and maternal and offspring health outcomes in the DNBC.

Methods
The study was performed within the DNBC, a nationwide prospective cohort study with long-term follow-up [32,33]. The cohort consisted of 101,042 pregnancies recruited in Denmark from 1996 to 2002. Enrolment criteria were intention to carry to term and ability to fill in questionnaires and take part in interviews in Danish. Women were enrolled at the first antenatal visit to their general practitioners. Information on lifestyle, diet, and socioeconomic status of the pregnant women was obtained from a recruitment form, a food frequency questionnaire (FFQ), and four telephone interviews (see timing of activities in Figure S1). Follow-up of the cohort is still ongoing.
In a previous case-control study of postpartum depression (PPD), 25(OH)D-levels were measured in 1497 pregnant women (892 non-cases and 605 cases) from blood drawn in week 25 of gestation (manuscript in preparation). The study showed no overall association between vitamin D status and risk of PPD. These data form the basis for the present study. Data were arbitrarily divided into two groups: a prediction group used to develop the prediction model and a validation group used to validate it. The prediction group was randomly assigned 33% of non-cases and the validation group was assigned the remaining 67%. The prevalence of PPD is approximately 10% in the general population [34,35], and in order to make the validation group as reflective of the overall cohort as possible we randomly allocated 66 women with PPD to the validation group to reach 10% prevalence. The remaining cases were added to the prediction group to maximize the sample size in the exploratory phase. The composition of the two groups and a flow chart of the study are shown in Figure S2.
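The allocation described above can be illustrated with a short sketch. It is not the code used in the study: the function name `split_groups`, the fixed seed, and the integer participant IDs are assumptions made for illustration. Only the proportions (33% of non-cases to the prediction group, 66 cases to the validation group) are taken from the text, and exclusions for missing data are ignored.

```python
import random

def split_groups(non_case_ids, case_ids, frac_prediction=0.33,
                 n_cases_validation=66, seed=0):
    """Randomly allocate participants to a prediction and a validation group."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    non_cases = list(non_case_ids)
    cases = list(case_ids)
    rng.shuffle(non_cases)
    rng.shuffle(cases)

    # 33% of non-cases go to the prediction group, the rest to validation.
    n_pred = round(frac_prediction * len(non_cases))
    # 66 randomly chosen cases go to validation, the remaining cases to prediction.
    prediction = non_cases[:n_pred] + cases[n_cases_validation:]
    validation = non_cases[n_pred:] + cases[:n_cases_validation]
    return prediction, validation

# Hypothetical IDs: 0..891 are non-cases, 892..1496 are PPD cases.
non_case_ids = range(892)
case_ids = range(892, 892 + 605)
prediction, validation = split_groups(non_case_ids, case_ids)

n_val_cases = sum(1 for i in validation if i >= 892)
prevalence = n_val_cases / len(validation)  # 66 of 664, about 10%
```

With these counts the validation group holds 598 non-cases and 66 cases, which gives the intended PPD prevalence of roughly 10%.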
The blood samples were collected at the general practitioner (GP) and sent for processing and storage by regular mail. Samples were thus transported at ambient temperature for up to 48 hours, but most arrived within 28 hours. After 9-15 years in a −80°C freezer, samples were thawed and 30 µL of plasma was used to analyze 25(OH)D3 and 25(OH)D2 using the "MSMS vitamin D" kit from Perkin Elmer (Waltham, MA). Briefly, 30 µL of serum samples were deproteinized in microtiter plates using 120 µL acetonitrile containing ²H₃-25-OH vitamin D2 and ²H₃-25-OH vitamin D3 as internal standards. The supernatant was transferred to fresh plates and dried under a gentle flow of nitrogen. Subsequently the samples were derivatized using PTAD dissolved in acetonitrile. The derivatization reaction was quenched with quench solution and the samples were subjected to LC-MSMS analysis. The LC-MSMS system consisted of a CTC PAL autosampler (CTC Analytics, Zwingen, Switzerland), a Thermo Surveyor LC pump and a Thermo TSQ Ultra triple quadrupole mass spectrometer (Thermo Scientific, Waltham, MA). Separation was achieved using a Thermo Gold C18 column (50 × 2.1 mm, 3 µm). The following transitions were used: 619.3/298.1 and 607.3/298.1 for 25-OH vitamin D2 and D3, respectively; 622.3/301.1 and 610.3/298.1 for the internal standards of D2 and D3, respectively; and 625.3/298.1 and 613.3/298.1 for the calibration standards of D2 and D3, respectively. The total coefficient of variation was 8%.
Univariate regression analyses were performed to explore associations with vitamin D status. To test for non-linear associations we fitted spline regression models. We checked for effect modification of outdoor physical activity, tanning bed use, and travel to sunny destinations by season (winter = October to March, summer = April to September) in multivariate models consisting of 25(OH)D-level, season and the relevant variable. Variables and interaction terms with p-values < 0.10 in either linear or non-linear univariate analyses were included in multivariate regression models. To develop parsimonious vitamin D scores, only variables with p-values < 0.05 were retained in the final model: a stepwise approach was used to remove variables, always excluding the variable with the highest p-value first. The linear term was kept in the analysis despite a p-value > 0.05 if the variable had a significant non-linear term. Observations with missing values in any of the variables included in the prediction model were excluded, as were 3 observations with 25(OH)D-status > 150 nmol/L, because they were outliers that could have exerted disproportionate influence on the final model (see flowchart in Figure S2). We compared characteristics of women who were excluded from this analysis due to missing values with those who were included, and only minor differences were observed with regard to country of birth, BMI, outdoor physical activity, total physical activity and intake of vitamin D from supplements (data not shown).
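The stepwise removal rule described above can be sketched as a small loop. This is an illustration under stated assumptions rather than the authors' procedure: the function name `backward_eliminate`, the `fit_p_values` callback (standing in for refitting the regression and returning a p-value per term), and the `parent_of` mapping from a non-linear term to its linear term are all hypothetical, as are the toy p-values in the usage example.

```python
def backward_eliminate(variables, fit_p_values, alpha=0.05, parent_of=None):
    """Remove the term with the highest p-value until all remaining terms
    are significant at `alpha`.

    fit_p_values(kept) must refit the model on `kept` and return a dict
    mapping each kept term to its p-value (hypothetical callback).
    parent_of maps a non-linear term to its linear term; a linear term is
    protected from removal while its non-linear term is significant.
    """
    parent_of = parent_of or {}
    kept = list(variables)
    removed = []
    while kept:
        p = fit_p_values(kept)
        protected = {parent_of[v] for v in kept
                     if v in parent_of and p[v] < alpha}
        candidates = [v for v in kept
                      if p[v] >= alpha and v not in protected]
        if not candidates:
            break
        worst = max(candidates, key=lambda v: p[v])
        kept.remove(worst)
        removed.append(worst)
    return kept, removed

# Toy usage with made-up, fixed p-values (a real fit would change them
# after each removal).
toy_p = {"smoking": 0.001, "fish": 0.60, "bmi": 0.20,
         "tanning": 0.30, "tanning_sq": 0.01}
kept, removed = backward_eliminate(
    list(toy_p),
    fit_p_values=lambda terms: {t: toy_p[t] for t in terms},
    parent_of={"tanning_sq": "tanning"},
)
```

In the toy run, `fish` and then `bmi` are dropped, while the linear `tanning` term is retained despite its p-value because its quadratic term is significant.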
Table 1 presents characteristics of the overall study population and of the prediction and validation groups. About half the included variables were equally distributed in the prediction group and the validation group, but mean physical activity level, outdoor physical activity level, tanning bed use, alcohol intake, 25(OH)D2-level, 25(OH)D3-level, and total 25(OH)D-level were significantly different in the prediction and validation groups. The distribution of occupational status also differed significantly between the two groups, as did the proportion of PPD cases, which followed by design from the way the groups were constructed.
In univariate linear regression analyses the following variables were found to be significantly associated with 25(OH)D-status and were included in the prediction model: intake of dietary vitamin D, fish intake, average UVB-radiation in the two months prior to blood draw, BMI, socio-occupational status, month of blood draw, parity, smoking, tanning bed use, travel to sunny destinations, vitamin D intake from supplements, and maternal country of birth. A statistically significant interaction between tanning bed use and season was also found. Using B-splines with four degrees of freedom we detected non-linear associations for physical activity, energy intake, intake of dietary vitamin D, BMI, maternal age, tanning bed use, vitamin D intake from supplements and outdoor physical activity, and these variables were also included in the prediction model. Further investigation indicated a quadratic relationship, and hence quadratic terms were included for these variables to account for the non-linearity.
In the multivariate model, variables were excluded in the following order: fish intake, energy intake, socio-occupational status, vitamin D intake from supplements (non-linear term), physical activity, BMI, average UVB-radiation in the two months prior to blood draw, travel to sunny destinations, maternal country of birth, maternal age, and parity. Thus the final model included smoking, month of blood draw, linear and non-linear terms for dietary vitamin D intake, linear and non-linear terms for tanning bed use, the interaction term between tanning bed use and season, vitamin D intake from supplements, and linear and non-linear terms for outdoor physical activity (Table 2). This model explained 40.1% (R² = 0.401) of the variance in 25(OH)D-status.
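For orientation, the structure of the final model can be written out as a regression equation. This is a schematic only: the symbols are introduced here for illustration, the coding of each variable (for example smoking as a single indicator, month as indicator variables against a reference month) is an assumption, and the fitted coefficients are those reported in Table 2, not reproduced here.

```latex
\[
\begin{aligned}
\widehat{25(\mathrm{OH})\mathrm{D}}_i ={}& \beta_0
  + \beta_{S}\,\mathrm{smoking}_i
  + \sum_{m} \gamma_m \,\mathbf{1}[\mathrm{month}_i = m] \\
&+ \beta_{D1} D_i + \beta_{D2} D_i^{2}
  + \beta_{T1} T_i + \beta_{T2} T_i^{2}
  + \beta_{TS}\,(T_i \times \mathrm{season}_i) \\
&+ \beta_{V} V_i
  + \beta_{A1} A_i + \beta_{A2} A_i^{2}
\end{aligned}
\]
```

Here $D_i$ denotes dietary vitamin D intake, $T_i$ tanning bed use, $V_i$ vitamin D intake from supplements, and $A_i$ outdoor physical activity for woman $i$; the squared terms are the quadratic (non-linear) terms described above, and the sum runs over the non-reference months of blood draw.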
The model was then used to predict 25(OH)D-scores in the validation group. The distribution of measured 25(OH)D-levels and predicted 25(OH)D-scores in the validation group can be seen in Figure 1. Most of the predicted scores fell between 20 and 100 nmol/L, a range that was considerably narrower than that of the measured vitamin D levels. Because the model produces a 25(OH)D-score from observed characteristics, scores can be negative even though this is biologically impossible, and this occurred for one woman's predicted score. The Pearson correlation coefficient between the measured 25(OH)D-levels and the predicted 25(OH)D-scores in the validation group was 0.4 (P < 0.0001).
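As a reminder of what the correlation above measures, a minimal sketch of the Pearson coefficient between measured levels and predicted scores follows. The function name `pearson_r` and the example values are hypothetical and are not data from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measured levels and predicted scores (nmol/L).
measured = [32.0, 48.5, 55.1, 61.0, 77.3, 90.2]
predicted = [45.0, 50.2, 49.8, 66.4, 60.1, 71.5]
r = pearson_r(measured, predicted)
```

Because the coefficient depends only on how the two series co-vary around their means, it says nothing about whether the predicted scores are on the right absolute scale.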
To determine how well the prediction model performed in ranking individuals, we cross-classified individuals from the validation group by quintiles of measured and predicted 25(OH)D-values (Table 3). In total 32.1% of the individuals were placed in the same quintile by both measured and predicted 25(OH)D-values, 69.9% were placed in the same or adjacent quintile by both measures, and only 1.9% were placed in opposite quintiles. Cohen's weighted kappa coefficient was 0.3, which reflects fair agreement according to Landis et al. (1977) [36]. Figure 2 shows that mean measured 25(OH)D-levels increased by decile of predicted 25(OH)D-score, although the contrast between consecutive deciles appeared to be greater in the middle of the distribution (from 40 to 75 nmol/L) than at the upper or lower ends of the distribution.
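The cross-classification and agreement statistics above can be sketched as follows. This is an illustration, not the study's code: the function names are hypothetical, quintiles are assigned by rank with ties broken by input order, "opposite quintiles" is taken to mean lowest versus highest, and linear disagreement weights are assumed for the weighted kappa because the text does not state which weighting was used.

```python
def quintiles(values):
    """Assign each value a quintile index 0..4 by rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    q = [0] * n
    for rank, i in enumerate(order):
        q[i] = rank * 5 // n
    return q

def cross_classify(measured, predicted):
    """5 x 5 table: rows are measured quintiles, columns predicted quintiles."""
    table = [[0] * 5 for _ in range(5)]
    for a, b in zip(quintiles(measured), quintiles(predicted)):
        table[a][b] += 1
    return table

def agreement_summary(table):
    """Proportions in the same, same-or-adjacent, and opposite quintiles."""
    n = sum(map(sum, table))
    same = sum(table[i][i] for i in range(5)) / n
    adjacent = sum(table[i][j] for i in range(5) for j in range(5)
                   if abs(i - j) <= 1) / n
    opposite = (table[0][4] + table[4][0]) / n
    return same, adjacent, opposite

def weighted_kappa(table):
    """Cohen's weighted kappa with linear disagreement weights (assumed)."""
    k = len(table)
    n = sum(map(sum, table))
    row = [sum(r) for r in table]
    col = [sum(table[i][j] for i in range(k)) for j in range(k)]
    observed = expected = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 for opposite cells
            observed += w * table[i][j] / n
            expected += w * row[i] * col[j] / n ** 2
    return 1 - observed / expected

# Toy check: identical rankings give perfect agreement.
values = list(range(100))
table = cross_classify(values, values)
same, adjacent, opposite = agreement_summary(table)
kappa = weighted_kappa(table)
```

The weighted form gives partial credit for near misses, so a woman placed one quintile away counts as a smaller disagreement than one placed four quintiles away.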

Discussion
In this subsample of the DNBC, we found high levels of vitamin D deficiency/insufficiency, evidenced by the fact that only a quarter of women had levels above 75 nmol/L. Based on a literature search we created a prediction model of 25(OH)D-score using factors previously known or suspected to influence vitamin D status. We found that 25(OH)D-scores were significantly predicted by intake of vitamin D from diet and supplements, outdoor physical activity, tanning bed use, smoking, and month of blood draw.
The final model explained 40% of the variation in 25(OH)D-status, and mean measured 25(OH)D-level increased linearly by decile of predicted 25(OH)D-score. This suggests that our vitamin D score had good ability to rank individuals according to vitamin D status. The same conclusion was reached when study participants were cross-classified according to measured and predicted 25(OH)D-status.
Our prediction model had a higher predictive power than most previous models reported in the literature (40.1% vs. 16-32%) [14,15,17-20,22-29,31]. These studies have generally tried to estimate longer term vitamin D status, which is a more relevant time window for the development of chronic disease outcomes such as cancer, and one might expect shorter term predictive power to be better. Our prediction model is specific to mid-pregnancy and its ability to characterize status during the first or last trimester is uncertain. While it is generally assumed that levels of 25(OH)D remain fairly constant during pregnancy, we found that season was an important determinant of vitamin D status, which might introduce misclassification error when using this score to predict status in other trimesters of pregnancy [37]. For many pregnancy-related outcomes, and potentially later life outcomes of offspring, the window of etiological relevance of vitamin D is uncertain, making this an important issue worthy of further exploration. However, the same approach could be taken to construct scores using samples taken in early or late pregnancy.
Only one other study validated its prediction model: Bertrand et al. (2012) cross-classified measured and predicted 25(OH)D-status, found 59.8%-66.5% to be ranked in the same or adjacent quintile, and concluded that the prediction model could be used to rank individuals according to vitamin D status [14]. They also substituted the predicted 25(OH)D-score for measured 25(OH)D-status in data from a previously published study of colorectal cancer [38] and saw similar results [14]. We have not yet performed a similar substitution exercise, but given that the proportion of subjects ranked in the same or adjacent quintile in our validation study was higher than in the study by Bertrand et al. (2012), we believe that our score could have similar utility for ranking individuals for purposes of relating predicted vitamin D status to health outcomes. Several previous studies have shown smoking and month or season of blood draw to influence 25(OH)D-status [14][15][16][17][18][19][20][21][22][23][24][25][26][27][28][29][30][31]. Similar to the findings of most studies, we found higher 25(OH)D-scores among nonsmokers compared with smokers. We had expected daily average UVB-radiation at and 2 months prior to the day of blood draw to be a strong predictor of 25(OH)D-score. However, in our model, month of blood draw was a stronger predictor of status than UVB-radiation, perhaps because UVB-radiation was only measured at one site in the country, or because we lacked information on details such as use of sunscreen or extent of skin covering with clothing that may have been captured better by the seasonal variable.
While both overall physical activity and presumed outdoor physical activity were considered as potential covariates based on the findings of previous studies, only outdoor physical activity significantly predicted 25(OH)D-status. The pathway by which physical activity raises vitamin D status is most likely time spent outdoors and the resulting dermal synthesis of vitamin D. We looked for interactions between outdoor physical activity and season but did not find any. Earlier studies used total physical activity as a proxy for outdoor UV exposure, and one strength of our study was that we had a list of physical activities that enabled us to distinguish those likely to have been conducted outdoors [14,17]. However, we lacked information on other time spent outdoors, use of sunscreen, skin color and other factors known to influence dermal synthesis of vitamin D, which might have improved our predictive power. Since exposure to sunlight is an important predictor of vitamin D status, and dermal synthesis of vitamin D is very limited during winter in Denmark, we included travel to sunny destinations as a potential predictor of vitamin D status. To our knowledge, no other studies have included travel in their prediction analyses, and we expected season to modify the association between travel to sunny destinations and vitamin D status. However, only a few study participants reported travelling to sunny destinations during the winter months, and data may thus have been too sparse for such associations to reach statistical significance. We did find that tanning bed use predicted 25(OH)D-status, and that this relationship was modified by season, suggesting increased use during sun-deprived periods [23,30].
A number of previous studies have found BMI and total body fat percentage to predict vitamin D status in non-pregnant subjects [14,17-20,22,25,26,28,29,31]. It is thought that vitamin D, a fat soluble vitamin, may be sequestered in adipose tissue, leading to reduced serum vitamin D status [1,13]. However, pre-pregnancy BMI did not predict 25(OH)D-status in our data. This could be because weight and height were self-reported in the DNBC, and because pre-pregnancy BMI reflected a different point in time than the blood sample. Strengths of our study included the possibility to investigate a wide range of potential vitamin D determinants, a large sample size, and the relatively good predictive power of the final model. As for limitations, it is important to note that the measurement of serum vitamin D levels also involves error and is therefore an "alloyed gold standard" for the actual vitamin D status of women. However, we used liquid chromatography-tandem mass spectrometry to measure women's levels, which is considered by many to be the most reliable method of assessing 25(OH)D levels. We also lacked information on ethnicity, skin color, and use of sunscreen or protective clothing, which might have increased the predictive power of our model. However, our population was largely homogeneous because fluency in Danish was a prerequisite for inclusion in the cohort. For this reason, and because of the unique availability of certain variables in our dataset, our final model is likely to be generalizable only to our specific cohort, although similar approaches could be taken to develop and validate models in other populations.
Our prediction dataset included a disproportionate number of cases of PPD because we wanted to make the best use of available data, and if this introduced any bias it may have reduced predictive power in the validation study. However, we found no significant association between PPD and 25(OH)D-levels in univariate analysis, suggesting that this did not adversely affect the development of the score. Blood samples were stored for 9-15 years before they were measured. Even though we used LC-MSMS analysis, which is considered the most accurate measure of vitamin D status, it is possible that there may have been some deterioration of vitamin D levels in the preserved samples. However, as noted in a recent study of 40-year-old samples, if deterioration occurred, it would have led to lower levels in the entire population, and the relative differences between individuals would not be affected [39].
In conclusion, our prediction model accounted for 40.1% of the variance in vitamin D status and showed acceptable ability to classify individuals by quintiles of status in this cohort. 25(OH)D-status was predicted by intake of vitamin D from diet and supplements, outdoor physical activity, tanning bed use, smoking, and month of blood draw. Since prediction models explain only a proportion of the variation in 25(OH)D-status, it is important to bear in mind that predicted 25(OH)D-scores cannot substitute for actual blood measurements when evaluating individual vitamin D status, but they can be used to rank individuals in a group according to 25(OH)D-status for purposes of examining relationships between vitamin D and disease outcomes in this cohort.

Conclusion
In a subsample of the DNBC we found 25(OH)D-scores to be predicted by intake of vitamin D from diet and supplements, outdoor physical activity, tanning bed use, smoking, and month of blood draw. Although not an ideal substitute for exact levels of vitamin D, our prediction model showed acceptable ability to rank individuals according to vitamin D status, which will be useful in future studies within the DNBC as an alternative to vitamin D biomarkers when these cannot be obtained due to limited sample volumes and the costs of biomarker analyses.