The association between vitamin D status and COVID-19 in England: A cohort study using UK Biobank

Background Recent studies indicate that vitamin D supplementation may decrease respiratory tract infections, but the association between vitamin D and COVID-19 is still unclear. Objective To explore the association between vitamin D status and infections, hospitalisation, and mortality due to COVID-19. Methods We used UK Biobank, a nationwide cohort of 500,000 individuals aged between 40 and 69 years at recruitment between 2006 and 2010. We included people with at least one serum vitamin D test, living in England with linked primary care and inpatient records. The primary exposure was serum vitamin D status measured at recruitment, defined as deficiency at <25 nmol/L, insufficiency at 25–49 nmol/L and sufficiency at ≥ 50 nmol/L. Secondary exposures were self-reported or prescribed vitamin D supplements. The primary outcome was laboratory-confirmed or clinically diagnosed SARS-CoV-2 infections. The secondary outcomes included hospitalisation and mortality due to COVID-19. We used multivariable Cox regression models stratified by summertime months and non-summertime months, adjusting for demographic factors and underlying comorbidities. Results We included 307,512 participants (54.9% female, 55.9% over 70 years old) in our analysis. During summertime months, weak evidence existed that the vitamin D deficiency group had a lower hazard of being diagnosed with COVID-19 (hazard ratio [HR] = 0.86, 95% confidence interval [CI] = 0.77–0.95). During non-summertime, the vitamin D deficiency group had a higher hazard of COVID-19 compared with the vitamin D sufficient group (HR = 1.14, 95% CI = 1.01–1.30). No evidence was found that vitamin D deficiency or insufficiency was associated with either hospitalisation or mortality due to COVID-19 in any time strata. 
Conclusion We found no evidence of an association between historical vitamin D status and hospitalisation or mortality due to COVID-19, along with inconsistent results for any association between vitamin D and diagnosis of COVID-19. However, studies using more recent vitamin D measurements and systematic COVID-19 testing are needed.


Introduction
The COVID-19 global pandemic is one of the biggest public health crises in recent history. The rapid spread of SARS-CoV-2 infection has caused serious casualties, overwhelming healthcare systems and disrupting societies. In the UK, more than 170,000 deaths due to COVID-19 within 28 days of a positive test were reported in the first year [1], planned surgeries and care have been delayed or cancelled [2], and prolonged lockdown measures along with the pandemic have worsened mental health [3]. Despite the introduction and distribution of COVID-19 vaccines by the end of 2020, controlling this pandemic at a global scale remains extremely difficult. Studying the aetiology of SARS-CoV-2 is important to inform effective prevention strategies in public health.
Vitamin D is essential to bone health because it regulates calcium and phosphate homeostasis, and recent studies indicate it may also have immunomodulatory effects. At the cellular level, vitamin D can increase the production of antimicrobial peptides [4,5] and regulate the adaptive immune response [6]. Clinically, a systematic review of observational studies indicated that vitamin D deficiency might be associated with longer duration of acute respiratory tract infection [7]. Another systematic review and meta-analysis including data from 37 original trials showed that vitamin D supplementation may protect against respiratory tract infections (pooled odds ratio = 0.92, 95% CI = 0.86-0.99) [8]. Because of its potential for preventing respiratory infections, vitamin D supplementation and fortification of food have been discussed as possible cheap public health interventions against COVID-19 [9].
Despite this potential, the association between vitamin D and COVID-19 is still unclear. If vitamin D deficiency is associated with COVID-19, vitamin D supplementation may be a potential public health intervention. Consequently, we aimed to conduct a historical cohort study using the UK Biobank dataset and linked electronic health records, to better understand the association of serum vitamin D status and vitamin D supplementation with the diagnosis of COVID-19 and its outcomes.

Study population and eligibility
The study population was from UK Biobank, a nationwide cohort established between 2006 and 2010 [10]. In brief, participants aged 40 to 69 were recruited to 22 assessment centres around the UK. Their demographic information was collected through a touch screen questionnaire, and they received serum biochemical tests including vitamin D analysis. UK Biobank participants also gave their consent to have electronic health records linked, including primary care and inpatient care records, and death certificates [10,11]. The primary care data were provided by data system suppliers TPP and EMIS in England, and the inpatient care records and death records were provided by NHS Digital. The external data providers extracted the health records by matching participant identifiers, including unique participant identifiers, NHS number, date of birth, sex and postcode. These health records were further processed and checked by UK Biobank before importing into the database [12]. We only included participants in England who had at least one serum vitamin D test, primary care registration records and inpatient care records. Those who lacked serum vitamin D test records, were not registered in England, did not have both inpatient and primary care registration records, were lost to follow-up, or died before 16 March 2020 were excluded. The distributions of demographic factors of the included and excluded participants were compared. The design of the cohort and inclusion and exclusion criteria are depicted in Fig 1.

[…] open license (https://github.com/liang-yu12/vd_covid). Full pseudonymized participant data cannot be openly shared under the material transfer agreement with UK Biobank and ethics approval. Other researchers can apply for UK Biobank data to answer specific research questions. Further information about applying for data access can be obtained from the UK Biobank website (https://www.ukbiobank.ac.uk) or by emailing UK Biobank (ukbiobank@ukbiobank.ac.uk).
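The inclusion and exclusion criteria described in this section can be sketched as a simple filter. This is an illustrative sketch only, not the authors' code: the `Participant` record and all of its field names are hypothetical and do not correspond to UK Biobank field identifiers.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Exclusion cut-off stated in the text: deaths before 16 March 2020.
DEATH_CUTOFF = date(2020, 3, 16)


@dataclass
class Participant:
    # All field names are hypothetical, chosen for illustration only.
    has_vitamin_d_test: bool
    registered_in_england: bool
    has_primary_care_record: bool
    has_inpatient_record: bool
    lost_to_follow_up: bool
    date_of_death: Optional[date] = None


def is_eligible(p: Participant) -> bool:
    """Return True if the participant meets the stated inclusion criteria."""
    if not p.has_vitamin_d_test or not p.registered_in_england:
        return False
    # Both linked record types are required.
    if not (p.has_primary_care_record and p.has_inpatient_record):
        return False
    if p.lost_to_follow_up:
        return False
    # Participants who died before the cut-off are excluded.
    if p.date_of_death is not None and p.date_of_death < DEATH_CUTOFF:
        return False
    return True
```

Applying `is_eligible` to every record and counting the failures at each step would give the kind of numbers a cohort flow diagram reports.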

Primary exposure: Vitamin D status
The primary exposure was serum vitamin D status. The measurement of vitamin D levels in UK Biobank has been described previously [13]. In brief, serum vitamin D levels were measured when a participant visited a UK Biobank assessment centre, where their blood samples were collected and stored at -80 °C. Serum 25-hydroxyvitamin D status was measured using chemiluminescence immunoassay (DiaSorin Ltd. LIAISON XL, Italy) in a centralised laboratory [14]. The testing process has been verified by quality control samples and through an external quality assurance scheme [15,16]. Currently, no global consensus exists for determining vitamin D deficiency. We defined serum vitamin D status using Public Health England's definition (deficiency: <25 nmol/L; insufficiency: 25–49 nmol/L; sufficiency: ≥ 50 nmol/L) [17]. Participants who had their serum vitamin D levels tested between April and October were labelled as 'during summertime months,' and those who had been tested between November and March were assigned as 'during non-summertime months.'
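The exposure coding above can be written as two small functions. This is a minimal sketch under stated assumptions: the function names are ours, and values between 49 and 50 nmol/L are assigned to insufficiency because sufficiency is defined as ≥ 50 nmol/L.

```python
from datetime import date


def vitamin_d_status(level_nmol_per_l: float) -> str:
    """Classify serum 25-hydroxyvitamin D using the thresholds in the text."""
    if level_nmol_per_l < 25:
        return "deficiency"
    if level_nmol_per_l < 50:  # assumption: anything below 50 is insufficiency
        return "insufficiency"
    return "sufficiency"


def season_of_test(test_date: date) -> str:
    """Label the test date: April to October counts as summertime months."""
    if 4 <= test_date.month <= 10:
        return "summertime"
    return "non-summertime"
```

For example, a level of 24.9 nmol/L measured on 3 November would be coded as deficiency during non-summertime months.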

Secondary exposure: Vitamin D supplementation and vitamin D prescription
The secondary exposures for this study were 1. taking vitamin D supplementation, or 2. receiving a vitamin D prescription from a GP. Information about vitamin D and other mineral supplementations was collected through a self-reported questionnaire using touch panels at the assessment centre between 2006 and 2010. We defined vitamin D supplementation as people who were taking vitamin D and associated minerals, including vitamin D, multivitamins, fish oil and calcium supplementation. Information about vitamin D supplementation was coded as 'taking vitamin D supplement' and 'not taking vitamin D supplement,' and it was coded as missing if a participant did not respond to the questionnaire.
Vitamin D prescriptions included all medications listed in British National Formulary section 9.6.4, and we further compiled a prescription code list in Dictionary of Medicines and Devices (DM+D) using an existing mapping tool published by the NHS [18]. By using the DM+D code list, we identified participants who had ever received vitamin D prescriptions from the primary care prescription datasets. Vitamin D prescription was coded as 'had vitamin D prescriptions' and 'not receiving prescriptions.'

Primary outcome: Diagnosis of COVID-19
The primary outcome of our study was the diagnosis of COVID-19, defined through laboratory testing or clinical diagnosis. The laboratory tests for SARS-CoV-2 infection were PCR tests performed by the NHS (Pillar 1) or commercial partners (Pillar 2) [19,20]. These testing results were reported to Public Health England and automatically imported into UK Biobank weekly [21]. Clinically diagnosed COVID-19 was defined as participants having COVID-19 diagnosis codes in their electronic health records, either in primary care or inpatient care, or on the death certificate. We used existing code lists in CTV3 codes, SNOMED-CT and ICD-10 to identify the diagnosis of COVID-19.

Secondary outcome ascertainment: Hospitalisation and mortality due to COVID-19
Hospitalisation due to COVID-19 was defined as a COVID-19 related diagnosis (ICD-10 codes U071 or U072) recorded in the inpatient care dataset, and the admission date of each record was extracted. Mortality due to COVID-19 was defined as a participant having a COVID-19 diagnosis (ICD-10 codes U071 or U072) in the death registry data and being diagnosed with COVID-19 within 28 days, and the date of death was also recorded.
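The two secondary outcome definitions can be illustrated as follows. This is a sketch, not the study's code: the function names and arguments are hypothetical, and the 28-day rule is interpreted here as "diagnosed no more than 28 days before death", which is an assumption because the text does not state the reference point explicitly.

```python
from datetime import date
from typing import Iterable, Optional

# ICD-10 codes named in the text.
COVID_ICD10 = {"U071", "U072"}


def is_covid_admission(diagnosis_codes: Iterable[str]) -> bool:
    """True if any inpatient diagnosis code is a COVID-19 code."""
    return any(code in COVID_ICD10 for code in diagnosis_codes)


def is_covid_death(
    death_codes: Iterable[str],
    death_date: date,
    diagnosis_date: Optional[date],
    window_days: int = 28,
) -> bool:
    """True if the death record carries a COVID-19 code and the diagnosis
    falls within the window before death (assumed interpretation)."""
    if not any(code in COVID_ICD10 for code in death_codes):
        return False
    if diagnosis_date is None:
        return False
    gap = (death_date - diagnosis_date).days
    return 0 <= gap <= window_days
```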

Measurement of covariates
We included basic demographic factors associated with vitamin D deficiency or insufficiency in our model, as described in our previous paper [13]. Demographic variables recorded between 2006 and 2010, such as sex, age, ethnicity, body mass index (BMI), alcohol drinking frequency, cigarette smoking, index of multiple deprivation (IMD), the time of receiving serum vitamin D tests and the region of the UK Biobank assessment centre, were included in our analysis. The current age at the start of the pandemic was calculated from participants' year of birth and coded as 'under 70 years old' or 'greater than or equal to 70 years old.' Other continuous covariates were further grouped into categorical variables. Self-reported ethnicity was classified as 'white,' 'black,' and 'Asian and others' according to the original questionnaire. BMI was grouped following National Institute for Health and Care Excellence guidelines for different sexes and ethnicities [22]. IMD scores were classified into quintiles, and the quintile with the highest scores was assigned as 'most deprived.' We categorised the location of 22 UK Biobank assessment centres by the regions of England. Smoking statuses were coded as 'nonsmoker', 'ex-smoker', or 'current-smoker' according to the original questionnaire. Regarding drinking frequency, participants were recoded as 'weekly' if they reported drinking three or four times a week, and as 'monthly' if they reported drinking one to three times a month. Participants who answered 'prefer not to say' were coded as missing.
In addition, we included clinical covariates such as clinical extreme vulnerability and underlying chronic diseases. Participants who were clinically extremely vulnerable to COVID-19 were identified using Public Health England's definition [23]. Underlying chronic diseases included hypertension, cardiovascular diseases, diabetes mellitus and asthma. Clinical covariates were assessed as a history of ever having one of the medical conditions of interest recorded in linked primary or secondary care records from the start of GP registration or HES recording until 16th March 2020. For health conditions such as chemoradiotherapy, blood cancer and bone marrow transplantation, we only included people who had a history recorded within the six months before the index date.

Statistical analysis
The follow-up time of our study began on 15th March 2020. Because the availability of clinical datasets varied, the end of follow-up was defined differently for each outcome. For the primary outcome, SARS-CoV-2 infections, the event dates were the dates of diagnosis of COVID-19, and the censoring dates were the date of death or 18th January 2021. For hospitalisation due to COVID-19, the event dates were the dates of admission, and the censoring dates were the date of death or 30th November 2020. For mortality due to COVID-19, the event dates were the dates of death due to COVID-19, and the censoring dates were the dates of death due to other causes or 18th December 2020. In addition, among all UK Biobank participants with vitamin D testing data, we analysed the association between testing for vitamin D during the summertime months and vitamin D status using logistic regression.
The proportional hazard assumption was examined using log(-log) survival plots and was found to be violated. Therefore, we used stratified Cox regression to assess the association between vitamin D exposure and COVID-19 outcomes. The follow-up time of our models was stratified at 25th October 2020, which was the end date of the British summertime months. We carried out a crude analysis, then generated a partially adjusted model controlling for sex and age, as well as a full model adjusting for all covariates. All statistical analysis was performed by using R Statistical Software (version 4.0.3, R Foundation for Statistical Computing, Vienna, Austria).
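The outcome-specific censoring rules above can be sketched as a single helper that returns the follow-up time and an event indicator. The function and dictionary names are hypothetical, and the sketch assumes that event dates fall on or after the start of follow-up.

```python
from datetime import date
from typing import Optional, Tuple

START = date(2020, 3, 15)  # start of follow-up stated in the text

# Administrative end of follow-up for each outcome, as stated in the text.
ADMIN_END = {
    "diagnosis": date(2021, 1, 18),
    "hospitalisation": date(2020, 11, 30),
    "mortality": date(2020, 12, 18),
}


def follow_up(
    outcome: str,
    event_date: Optional[date],
    death_date: Optional[date] = None,
) -> Tuple[int, bool]:
    """Return (days of follow-up, event indicator) for one participant.

    death_date is the date of death from any cause; it censors follow-up
    if it comes before the administrative end date.
    """
    end = ADMIN_END[outcome]
    if death_date is not None and death_date < end:
        end = death_date
    if event_date is not None and event_date <= end:
        return (event_date - START).days, True
    return (end - START).days, False
```

For instance, a participant admitted with COVID-19 on 14 April 2020 contributes 30 days and an event to the hospitalisation analysis, whereas an admission after 30 November 2020 is censored at that date.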
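One common way to prepare data for a Cox model stratified by follow-up period is to split each participant's follow-up at the cut date into separate rows (start, stop, event). The sketch below illustrates that data layout only; it is an assumption about how the stratification could be set up, not a description of the authors' R code, and the function name is hypothetical.

```python
from datetime import date
from typing import List, Tuple

START = date(2020, 3, 15)   # start of follow-up
SPLIT = date(2020, 10, 25)  # end of British summertime months in 2020

Row = Tuple[str, int, int, bool]  # (stratum, start_day, stop_day, event)


def split_follow_up(end: date, event: bool) -> List[Row]:
    """Split one participant's follow-up at the stratification date.

    `end` is the event or censoring date; `event` says which it was.
    """
    split_day = (SPLIT - START).days
    stop_day = (end - START).days
    if stop_day <= split_day:
        # All follow-up falls within the summertime stratum.
        return [("summertime", 0, stop_day, event)]
    # The participant survives the first stratum event-free and carries
    # the outcome (if any) into the second stratum.
    return [
        ("summertime", 0, split_day, False),
        ("non-summertime", split_day, stop_day, event),
    ]
```

Each resulting row can then be analysed within its own stratum, which allows the baseline hazard and the hazard ratios to differ between the two periods.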

Sensitivity analysis and model checking
The sensitivity analysis and justifications are summarised in Table 1. We repeated our analysis while changing the outcome to laboratory-confirmed SARS-CoV-2 infections, and we redid the analysis for hospitalisation and mortality among the subgroup with COVID-19 diagnosis. In addition, the association between receiving vitamin D tests during British summertime months and vitamin D status was analysed using logistic regression adjusting for covariates among all participants with at least one vitamin D level test.

Ethics
The UK Biobank data had been de-identified by removing personal data fields, such as postcodes and dates of birth [24]. The de-identified and anonymous data were released to eligible researchers for study purposes only. UK Biobank already has its Research Tissue Bank (RTB) approval from its Research Ethics Committee (REC), which covers most usage of the data under the UK Biobank ethics and governance framework [25]. The UK Biobank project was also approved by the Northwest Haydock Research Ethics Committee (reference: 11/NW/0382). Our project was approved by UK Biobank (ID: 51265) and by the Research Ethics Committee of the London School of Hygiene and Tropical Medicine (reference: 17158). For COVID-19 data, an approved UK Biobank project will be automatically authorised to conduct COVID-19 related research after registering to access COVID-19 data [26]. We followed the principles of the Declaration of Helsinki [27].

Study population
The selection of eligible participants is shown in Fig 2. After excluding ineligible people, a total of 307,512 participants were included in our analysis. The comparison of included and excluded participants is summarised in S1 Table. The distribution of sex, age, ethnicity, BMI, drinking behaviour, smoking, IMD, region and taking vitamin D supplementation was similar between included and excluded participants. More participants were clinically extremely vulnerable to COVID-19 or had underlying comorbidities among included participants with their electronic health records linked than those who were not eligible for inclusion.
Among eligible participants, more people were vitamin D sufficient (142,947; 46%) or insufficient (126,802; 41%) compared with vitamin D deficient participants (37,763; 12%). 65% had their vitamin D levels checked during British summertime months, while 35% were measured in non-summertime months. The distribution of demographic factors by vitamin D status is summarised in Table 2. The distribution of sex, taking vitamin D supplementation, region of residency, clinical vulnerability to COVID-19 and underlying chronic comorbidities was similar across different vitamin D groups. Compared to participants with insufficient or sufficient vitamin D levels, the vitamin D deficiency group had more participants who were under 70 years, non-white, obese and more deprived. Furthermore, the proportions of alcohol drinking and taking vitamin D supplements were lower among the vitamin D deficiency group.

PLOS ONE
Vitamin D and COVID-19 in England

Description of outcomes
The distribution of diagnosis of COVID-19, hospitalisation, and mortality due to COVID-19 over time is summarised in S1 Fig. As can be seen in S1a Fig, among 10,165 participants with SARS-CoV-2 infection, more participants were diagnosed with COVID-19 in spring (13.8%), autumn (51.4%), and winter (31%), while fewer cases were reported in summer (3.8%).
Despite the shorter follow-up period, similar distributions were also noted for hospitalisation (S1b Fig) and mortality (S1c Fig). In the larger cohort containing all participants with vitamin D records, we found that participants visiting the UK Biobank assessment centre during British summertime months had around 60% lower odds of vitamin D deficiency or insufficiency than those receiving tests during non-summertime months (S2 Table).

Table 2 footnotes (https://doi.org/10.1371/journal.pone.0269064.t002): 3. IMD scores were classified by quintile. 4. Vitamin D supplement includes vitamin D, multivitamin, fish oil and calcium supplementation. 5. Vitamin D prescription included all drugs in BNF section 9.6.4, which were identified by using code lists in DM+D codes from linked GP prescription records. 6. Health conditions were identified from linked electronic health records. 7. The clinically extremely vulnerable groups were defined by using Public Health England's definition. 8. Including hypertension, cardiovascular diseases, diabetes mellitus, or asthma.

Association between vitamin D status and diagnosis of COVID-19
Table 4 summarises the association between serum vitamin D status and hospitalisation due to COVID-19 stratified by summertime months. In the crude and partially adjusted models, in British summertime months, vitamin D insufficiency or deficiency was associated with a higher hazard of hospitalisation due to COVID-19, while in non-summertime, such an association was not seen. After adjusting for covariates, we found no evidence, either during or after British summertime months, that vitamin D insufficiency or deficiency was associated with a higher hazard of hospital admission due to COVID-19 compared with sufficient vitamin D status (during British summertime months: insufficiency adjusted HR = 0.94, CI = 0.82-1.08, deficiency adjusted HR = 1.08, CI = 0.89-1.31; during non-summertime: insufficiency adjusted HR = 1.11, CI = 0.83-1.49, deficiency adjusted HR = 0.92, CI = 0.61-1.37). Other covariates such as male sex, age older than 70 years, non-white ethnicity, overweight or obesity, cigarette smoking, and being more deprived, clinically vulnerable or having underlying comorbidities increased the hazard of hospitalisation due to COVID-19. Compared with participants who never drink alcohol, more frequent alcohol drinking was associated with a decreased hazard of hospitalisation (Table 4).

[…] In addition, male sex, age over 70 years, black ethnicity, underweight and obesity, cigarette smoking, being most deprived, clinical vulnerability and having underlying comorbidities were associated with an increased hazard of COVID-19 mortality. Frequent alcohol drinking was associated with a decreased hazard of COVID-19 mortality (Table 5).

Association between vitamin D prescription or supplementation and COVID-19
Some evidence existed that during summertime months, people who had ever been prescribed vitamin D supplementation by a GP had a higher hazard of being diagnosed with COVID-19 (S3 Table, adjusted HR = 1.22, CI = 1.13-1.32), hospitalisation (S4 Table, adjusted HR = 1.59, CI = 1.39-1.82) and mortality (S5 Table, adjusted HR = 2.31, CI = 1.68-3.17). During British summertime months, no evidence showed that self-reported vitamin D supplementation was associated with a lower hazard of diagnosis of COVID-19 (adjusted HR = 0.88, CI = 0.76-1.01), while the hazard was higher during non-summertime (S6 Table, adjusted HR = 1.23, CI = 1.03-1.47). No evidence was found that self-reported vitamin D supplementation was associated with hospitalisation (S7 Table) or mortality (S8 Table) due to COVID-19 either during or after summertime months.
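When reading hazard ratios such as those above, it can help to check whether an interval excludes the null value of 1 and to recover the approximate standard error on the log scale. The sketch below assumes the reported intervals are 95% confidence intervals that are symmetric on the log scale, which is the usual convention for Cox models but is not stated for every estimate in the text; the function names are ours.

```python
import math


def excludes_null(lower: float, upper: float) -> bool:
    """True if the confidence interval for a hazard ratio excludes 1."""
    return lower > 1.0 or upper < 1.0


def log_hr_standard_error(lower: float, upper: float, z_crit: float = 1.96) -> float:
    """Approximate standard error of log(HR) recovered from a 95% CI.

    Assumes the interval was computed as exp(log(HR) +/- z_crit * SE).
    """
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)
```

For example, `excludes_null(1.13, 1.32)` is true for the prescription estimate for diagnosis, while `excludes_null(0.76, 1.01)` is false for self-reported supplementation during summertime months.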

Sensitivity analysis
The repeated analysis of vitamin D status and laboratory-confirmed COVID-19 was similar to the original model (S9 Table). Similarly, the subgroup analysis of hospitalisation and mortality among patients with diagnosis of COVID-19 showed that there was no evidence that vitamin D status was associated with the hazard of COVID-19 hospitalisation or mortality (S10 and S11 Tables).

Discussion
In this large cohort study, we found no consistent evidence that historical vitamin D status was associated with COVID-19. No evidence showed that historical vitamin D deficiency or insufficiency was associated with hospitalisation or mortality due to COVID-19. During British summertime months, weak evidence existed that vitamin D deficiency was associated with a lower hazard of being diagnosed with COVID-19, while during non-summertime, the association was reversed. In the secondary analysis, during summertime months, people who ever received a vitamin D prescription had a higher hazard of COVID-19 diagnosis, hospitalisation, and mortality. No association was found between self-reported vitamin D supplementation and hospitalisation or mortality due to COVID-19.
Our study has some strengths. First, compared to previous studies using UK Biobank datasets early in the pandemic, the follow-up period of our study was longer, and therefore we were able to cover more than one wave of COVID-19 infections [28–31]. Second, our analysis adjusted for more clinical covariates using the latest electronic health records, allowing us to estimate the effect of vitamin D status more accurately. Third, despite the variation of COVID-19 testing strategies, the clinical outcomes of hospitalisation and mortality were collected in a systematic way, which minimised the misclassification bias of these outcomes. The large sample size of our study also provides more statistical power than previous studies using single-hospital records. Finally, our analysis showed that some known factors were also associated with COVID-19 hospitalisation and mortality, including male sex, older age, non-white ethnicity, abnormal BMI, cigarette smoking, being more deprived, being clinically vulnerable and having underlying comorbidities. These findings were similar to previous studies using a large electronic health records database [32], implying that our analysis regarding hospitalisation and mortality is valid.
Nevertheless, the study has some limitations. First, the data regarding historical vitamin D status and vitamin D supplementation were collected between 2006 and 2013. The distribution of vitamin D status and supplementation behaviour may be very different now. A previous study among postmenopausal women repeatedly measured vitamin D levels after five years, and the results showed the intraclass correlation coefficient between the two results was only 0.59 (0.54-0.64), which was suboptimal [33]. Another study examined 15,473 people with repeated vitamin D level tests in UK Biobank data, which showed an 84% concordance rate after 4.3 years [30]. However, in our study, the vitamin D levels were measured seven to 15 years ago. The misclassification is likely to be non-differential, which could attenuate our estimates toward the null. In addition, information about self-reported vitamin D supplementation was only available for 54% of participants, which further reduced the statistical power of our analysis and may result in misclassification. Future studies should consider using more recent data about vitamin D status and more complete vitamin D supplementation information.
Second, the diagnosis of COVID-19 was influenced by testing strategies, which is likely to have led to outcome misclassification. At the early stage of the pandemic in the UK, the testing capacity was limited to people who required inpatient care. Therefore, only participants with relatively severe symptoms were tested, and people who were asymptomatic or had mild symptoms had to stay at home instead of seeking medical care [34]. As the COVID-19 testing capacity increased, more people with mild or no symptoms were able to access testing and were classified as cases. Since the COVID-19 testing was not systematic, the outcome of the diagnosis of COVID-19 is likely to have been misclassified. For future studies, the COVID-19 outcomes should be ascertained systematically.
Third, despite the large size of our study population, the external validity of UK Biobank is limited. The participants of UK Biobank are not nationally representative, and they are wealthier, older, and more likely to be white and women, which may introduce healthy volunteer bias [35]. However, in our model, we adjusted for demographic covariates, and we also included IMD scores as a proxy for socioeconomic status. Our results regarding exposure and outcomes remain internally valid.
Previous small, single-hospital studies have shown an association between pre-hospitalisation vitamin D levels and mortality [36,37], while other hospital-based studies enrolling more participants have indicated no evidence of such an association [38–40]. We found no evidence that historical vitamin D status was associated with inpatient admission or mortality due to COVID-19, which was similar to another study using UK Biobank data with a shorter follow-up period [30], while we adjusted for more clinical covariates and had a longer follow-up time. However, because the information on vitamin D status from UK Biobank was mainly collected between 2006 and 2010, this may not reflect participants' current vitamin D status. This finding of no association may be biased by misclassification of vitamin D exposure, so results should be interpreted cautiously.
Our study showed inconsistent associations between vitamin D deficiency and diagnosis of COVID-19 during the different follow-up times. During the summertime months, vitamin D deficiency was negatively associated with having a diagnosis of COVID-19 after adjusting for covariates. This result is similar to previous small studies performed early in the pandemic [41,42] and consistent with other studies using UK Biobank data [28–31]. A possible explanation is that people in the northern hemisphere are less likely to be vitamin D deficient during this period of the year, and during this time, people normally spend less time indoors, which also decreases the risk of being infected with SARS-CoV-2. However, the heterogeneity among the follow-up periods influences the association between vitamin D status and the diagnosis of COVID-19. During British summertime and non-summertime, there were two waves of the COVID-19 pandemic caused by different circulating strains [43], and the government's response to COVID-19 also varied. In addition, as previously discussed, the potential misclassification of vitamin D exposure and COVID-19 outcome could have introduced biases, which may lead to inaccurate estimation of the association between vitamin D and COVID-19.
Our study showed no evidence that vitamin D prescription or supplementation was associated with COVID-19 admission or mortality. Previously, a single-hospital study also showed no association between vitamin D supplementation and COVID-19 admission or mortality [38], and a recent meta-analysis also indicated that vitamin D supplements were not associated with COVID-19 mortality reduction [44]. However, the information about vitamin D supplementation of UK Biobank was collected at least 10 years ago, which may not accurately reflect current vitamin D intake. Furthermore, despite adjusting for various clinical covariates, we still cannot exclude probable residual confounding effects such as confounding by indication. These results should be interpreted carefully in light of these likely biases.

Conclusion
Our study shows inconsistent associations between vitamin D status and diagnosis of COVID-19, as well as no association between historical vitamin D status and the risk of hospital admission and mortality due to COVID-19. However, these results were limited by the potential for misclassification bias caused by the historical vitamin D status and the changing COVID-19 testing strategies. To precisely investigate the possible role of vitamin D in COVID-19 prevention, more studies using recent vitamin D status data and systematic COVID-19 surveillance will be needed. In addition, the results of an ongoing trial may provide more compelling evidence on the effects of vitamin D supplementation on preventing COVID-19 [45]. Because of the uncertainty of the association between vitamin D deficiency and the risk of COVID-19, there is currently insufficient evidence to support prioritizing vitamin D supplementation or fortification over other preventive strategies for COVID-19, such as mass vaccination programmes.
Supporting information S1

45. Trial of vitamin D to reduce risk and severity of COVID-19 and other acute respiratory infections. Identifier: NCT04579640. 2020.