Smoking trajectories and risk of stroke until age of 50 years – The Northern Finland Birth Cohort 1966

Background Smoking is a well-known risk factor for stroke. However, the relationship between smoking trajectories during the life course and stroke is not known. Aims We aimed to study the association of smoking trajectories and smoked pack-years with risk of ischemic and haemorrhagic strokes in a population-based birth cohort followed up to 50 years of age. Methods Within the Northern Finland Birth Cohort 1966, 11,999 persons were followed from antenatal period to age 50 years. The smoking behaviour was assessed with postal questionnaires at ages 14, 31 and 46 years. Stroke diagnoses were collected from nationwide registers using unique study number linkage. The associations between smoking behaviour and stroke risk were estimated using Cox regression models. Results Six different patterns in smoking habits throughout the life course were found in trajectory modelling. During 542,140 person-years of follow-up, 352 (2.9%) persons had a stroke. Continuous smoking during the life course was associated with increased stroke risk (HR = 1.69; 95% CI 1.10–2.60) after adjusting for sex, educational level, family history of strokes, leisure-time physical activity, body mass index, alcohol consumption, hypertension, hypercholesterolemia, and diabetes. Per every smoked pack-year the stroke risk increased 1.04-fold (95% CI 1.03–1.06). Other smoking trajectories were not significantly associated with stroke risk, nor were starting or ending age of smoking. Conclusion Accumulation of smoking history is associated with increased risk of stroke until age of 50 years. The increased stroke risk does not depend on the age at which smoking started. Given that the majority starts smoking at young age, primary prevention of strokes should focus on adolescent smoking.


Introduction
Stroke is the second largest cause of death and third largest cause of disability, accounting for 10% of the disease burden in the world [1]. While the overall incidence of stroke is declining in developed countries, the incidence of stroke among people under 50 years of age is increasing [1][2][3][4]. Of all strokes, about 80% are ischemic and 20% haemorrhagic. However, the proportion of haemorrhagic strokes, i.e. intracerebral haemorrhage (ICH) and subarachnoid haemorrhage (SAH), is higher among people under 50 years of age [5]. About a quarter of ischemic strokes and half of haemorrhagic strokes occur in people younger than 65 years of age [6]. Young stroke patients usually have a better outcome after stroke than older patients [7,8]. Nonetheless, the risk of recurrent strokes and other cardiovascular events remains high among young stroke patients even decades after the first stroke [9,10].
Smoking is a well-known risk factor for stroke both under and over 50 years of age [11,12], causing about one fifth of the total stroke burden [13]. The stroke risk is especially associated with current smoking, with a clear decline in stroke risk among individuals who quit smoking [14,15]. Previous studies have also shown a dose-dependent association between smoking and risk of stroke [16,17]. However, it is not known to what extent the association differs depending on the age at which smoking started or ended. Previous studies concerning adolescent smoking and stroke risk have assumed that smoking behaviour remains stable during the life course [18,19]. Nonetheless, it has been shown that after adolescents initiate smoking, they have diverse smoking trajectories [20][21][22][23].
Previous studies on smoking and risk of stroke have collected data retrospectively or had a follow-up period of less than 50 years, and therefore have not been able to study the association between smoking trajectories and stroke risk throughout the life course [24][25][26][27]. The risk estimates for stroke may depend on the length of follow-up and on the age at which smoking was measured. In a previous study of different risk factors of stroke, the effect of smoking on stroke risk depended on the follow-up time [28]. Previous studies suggest that the effect of smoking on stroke risk is greater at younger age [29]. However, previous studies have included only adult subjects, and therefore have not been able to study the effect of smoking on stroke risk starting from adolescence. For stroke prevention, it is important to investigate life course smoking trajectories and stroke risk, especially at young age.

Aims
We aimed to examine the relationship between 1) smoking trajectories, 2) smoked pack-years, and 3) starting and ending age of smoking, and the risk of ischemic or non-traumatic haemorrhagic stroke in a prospective population-based birth cohort study in Finland. A group-based trajectory modelling technique was used to investigate the smoking trajectories from 5 to 46 years of age, capturing the entire developmental course of smoking in relation to stroke risk at young age.

Design and population of the study
The study population is the Northern Finland Birth Cohort 1966 (NFBC1966), an unselected population-based birth cohort containing data on 12,058 babies born alive in the Finnish provinces of Oulu and Lapland with an expected date of birth in 1966. Permission to gather data was obtained from the Ministry of Social Affairs and Health, and the study was approved by the Ethical Committee of Northern Ostrobothnia Hospital District in Oulu, Finland. Data protection was scrutinized by the Privacy Protection Agency of Finland. Informed consent was inquired from all the participants. Subjects who declined use of their data (n = 59) were excluded from the study. Reasons why participants declined use of their data were unknown.
The sample for the current study included 11,999 subjects who were followed from birth date to their first stroke, death, moving permanently abroad, or to 31st of December 2015. The mean follow-up time per participant was 45.2 years (standard deviation (SD) 12.0 years), and the median follow-up time was 49.4 years. The 17 persons who moved abroad and whose moving date was not known were assumed to be followed until the end of 2015.
Data collection of NFBC1966 started in the year 1965 when the mothers were pregnant. Data on the individuals born into this cohort, and on their parents, were collected since the 24th gestational week. At the age of 14 years the first follow-up was conducted by a postal questionnaire concerning cohort members' growth, health, living habits, school performance and family situation (n = 11,010; 93.6%). At the age of 31 years a postal questionnaire (n = 8,767; 77.4%) and a clinical health examination (n = 6,033; 71.3%) were conducted for cohort members to study their health, physical performance capacity, occupation and working history, use of public health services, and living habits. The same factors were assessed also at age 46 years with a postal questionnaire (n = 6,868; 66.5%) and a clinical examination (n = 5,861; 56.7%). Reasons for non-participation later in the study were unknown. Existing nationwide registers, i.e. Care Register for Health Care, Causes of Death Register, and register of medication reimbursement, were linked to questionnaire and clinical examination data with personal identification numbers. The linkage was fully complete for stroke diagnoses.

Smoking
Smoking was the studied exposure in this study. Smoking habits of subjects were asked in questionnaires at ages 14, 31, and 46 years. At age 14 the smoking habit was asked with options 'I have never tried', 'I tried once', 'I have tried twice or more', 'I smoke occasionally', 'I smoke about twice a week', 'I smoke 1-5 cigarettes daily', 'I smoke 6-10 cigarettes daily', and 'I smoke more than 10 cigarettes daily'. Persons smoking twice per week or more were categorized to be smoking at that age. At age 31 and 46, smoking status assessment was based on standards of the Finland Cardiovascular Risk (FINRISK) study and its four smoking-related questions ('Have you ever smoked?', 'Have you ever smoked regularly, almost daily for at least a year?', 'Do you smoke now?' and 'When was the last time you smoked?'). Those who answered 'yes' to a question 'Do you smoke now?' were considered to be smoking at that age. The starting and ending ages of smoking were asked both at age 31 and 46. The 31 years follow-up assessment was selected as the primary source for data on smoking starting age and the 46 years follow-up assessment was used as the primary source for data on smoking ending age to reduce recall bias. The binary smoking status of each year in persons' life, except for ages 14, 31, and 46, was calculated from the smoking starting and ending ages. This information was used in the trajectory model together with binary smoking status for ages 14, 31, and 46 which were collected from questionnaires.
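The derivation of yearly binary smoking status from starting and ending ages can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and the treatment of boundary years (smoking from the starting age up to, but not including, the ending age) are assumptions, since the text does not state how boundary years were handled. The sketch also omits the step in which questionnaire-reported status at ages 14, 31, and 46 replaces the derived value.

```python
def yearly_smoking_status(start_age, end_age, first_age=5, last_age=46):
    """Return {age: 0 or 1} for every age from first_age to last_age.

    start_age: age at which smoking started, or None for never-smokers.
    end_age:   age at which smoking ended, or None if still smoking.
    Assumption (not from the paper): a person counts as smoking from
    start_age up to, but not including, end_age.
    """
    status = {}
    for age in range(first_age, last_age + 1):
        if start_age is None:
            status[age] = 0
        elif end_age is None:
            status[age] = int(age >= start_age)
        else:
            status[age] = int(start_age <= age < end_age)
    return status


# Hypothetical subject who started at 16 and quit at 31.
example = yearly_smoking_status(16, 31)
smoked_years = sum(example.values())
```

Under these assumptions the hypothetical subject is coded as smoking at ages 16 to 30 and as non-smoking otherwise, giving one binary value for each of the 42 ages used in the trajectory model.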
Smoked pack-years by age 31, between ages 31 and 46, and by age 46 were calculated from questionnaires at ages 31 and 46. Calculation of pack-years was based on information on smoking starting and ending ages and the question 'How much per day do you usually smoke now or smoked before you gave up smoking?'. The number of years each subject had smoked was calculated by subtracting starting age from ending age.
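As a worked illustration of the pack-year calculation, the sketch below multiplies packs smoked per day by years smoked. The conventional pack size of 20 cigarettes and the function name are assumptions; the text specifies only that years smoked were obtained by subtracting starting age from ending age.

```python
CIGARETTES_PER_PACK = 20  # conventional pack size; an assumption, not stated in the paper


def pack_years(cigarettes_per_day, start_age, end_age):
    """Pack-years = (cigarettes per day / 20) * years smoked.

    Years smoked are ending age minus starting age, as described in the text;
    negative durations from inconsistent answers are clamped to zero here,
    which is an illustrative choice.
    """
    years_smoked = max(end_age - start_age, 0)
    return cigarettes_per_day / CIGARETTES_PER_PACK * years_smoked


# Hypothetical subject: 10 cigarettes per day from age 16 to age 31.
example_pack_years = pack_years(10, 16, 31)  # 0.5 packs/day * 15 years = 7.5
```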

Strokes
Stroke was the outcome variable of this study. Strokes were identified from national Care Register for Health Care or Causes of Death Register and classified by primary diagnosis (Table 1). Ischemic stroke and transient ischemic attack (TIA) were considered as 'ischemic strokes' and SAH and ICH as 'haemorrhagic strokes'. For analyses of any stroke type, ischemic strokes, haemorrhagic strokes, and other cerebrovascular diseases were combined. Traumatic SAH, traumatic ICH, epidural hematoma or subdural hematoma were not included in the series. The diagnostic coding has been based on the WHO international classification of diseases (ICD) in Finland since 1967 [30]. Subjects having two or more stroke diagnoses were classified by primary diagnosis.
Subjects were also asked in the 46-years follow-up questionnaire if they had a stroke diagnosed by a physician. Self-reported strokes were classified as other cerebrovascular diseases if the subject did not have diagnoses in registers (n = 13). The age of stroke onset was not known for the subjects with self-reported stroke who were not present in registers.

Covariates
Information on covariates was collected from follow-up questionnaires, clinical examinations, and national registers at age of 46 years. Educational level was classified into basic (≤ 9 years; comprehensive school), secondary (9-12 years; upper secondary school or vocational school) and tertiary (> 12 years; university or university of applied sciences) education by the highest self-reported education achieved in the questionnaire. The family history of strokes among first-degree relatives was also self-reported in the questionnaire. Leisure-time physical activity was measured as how many hours of exercise the subject had per month according to questions 'How often do you exercise on your leisure-time a) with low intensity and b) with high intensity' and 'What is the duration of each exercise a) with low intensity and b) with high intensity'. Both low and high intensity exercises were considered equally. Weight and height were measured during the clinical examination and asked with the postal questionnaire. BMI (kg/m²) was calculated from weight and height using the measured values as primary source. Daily mean alcohol consumption (g/day) was calculated from self-reported questionnaire data. Hypertension was defined as having a mean of measured systolic blood pressure ≥ 140 mmHg, diastolic blood pressure ≥ 90 mmHg, self-reported diagnosis of hypertension, or using antihypertensive medication (ATC codes C02 antihypertensives, C03 diuretics, C07 beta blocking agents, C08 calcium channel blockers, and C09 agents acting on the renin-angiotensin system) according to national register of medication reimbursement. The presence of hypercholesterolemia was noted in case of triglyceride level > 2.0 mmol/l, LDL-cholesterol > 3.0 mmol/l, HDL cholesterol < 1.0 mmol/l, or in case of current lipid-lowering therapy (ATC code C10 lipid modifying agents) in register.
Diabetes mellitus was diagnosed in the presence of fasting blood glucose level ≥ 7.0 mmol/l, blood glucose level of ≥ 11.1 mmol/l after two hours 75g oral glucose tolerance test, HbA1c ≥ 48 mmol/mol (6.5%), self-reported type 1 or type 2 diabetes, or by the use of antidiabetic therapy (ATC code A10 drugs used in diabetes) in register.
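The hypertension and diabetes definitions above are "any criterion met" rules, which can be sketched as simple boolean functions. The function and parameter names are hypothetical, and treating a missing measurement (None) as "criterion not met" is an assumption made only for this illustration; in the study, missing values were handled by multiple imputation.

```python
def _at_least(value, threshold):
    """True if a measurement is present and meets the threshold."""
    return value is not None and value >= threshold


def has_hypertension(systolic=None, diastolic=None,
                     self_reported=False, on_medication=False):
    """Any of: systolic >= 140 mmHg, diastolic >= 90 mmHg,
    self-reported diagnosis, or antihypertensive medication in register."""
    return bool(_at_least(systolic, 140) or _at_least(diastolic, 90)
                or self_reported or on_medication)


def has_diabetes(fasting_glucose=None, ogtt_2h_glucose=None, hba1c_mmol_mol=None,
                 self_reported=False, on_medication=False):
    """Any of: fasting glucose >= 7.0 mmol/l, 2-hour OGTT glucose >= 11.1 mmol/l,
    HbA1c >= 48 mmol/mol, self-reported diabetes, or antidiabetic therapy."""
    return bool(_at_least(fasting_glucose, 7.0)
                or _at_least(ogtt_2h_glucose, 11.1)
                or _at_least(hba1c_mmol_mol, 48)
                or self_reported or on_medication)
```

For example, a hypothetical subject with a measured blood pressure of 138/92 mmHg would be classified as hypertensive on the diastolic criterion alone.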

Statistical analyses
Multiple imputation was used to impute missing data for independent variables of planned analyses. The imputation model included independent variables (smoking status and smoked pack-years), covariates, outcome variables and other variables used only as predictors for multiple imputation. The outcome variables for stroke diagnoses were complete for all 11,999 subjects and were not imputed. All 35 variables included in the multiple imputation procedure and rates of missing data for each variable are listed in S1 Table. Data were missing both due to loss to follow-up of subjects and due to missing measurements of available subjects. The overall amount of incomplete data was 27.5%, and therefore, multiple imputation was conducted 30 times. Data were assumed to be missing at random. Linear regression was used as the imputation model for scale variables and logistic regression for nominal variables. The pooled results were reported in the analyses. IBM SPSS Statistics 24 was used for multiple imputation.
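The text does not state how the results from the 30 imputed data sets were pooled. A common choice is Rubin's rules, sketched below under that assumption; the function name and the example numbers are hypothetical. For Cox models the rules would be applied to the log hazard ratio and its standard error, and the pooled estimate exponentiated afterwards.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, standard_errors):
    """Pool per-imputation estimates with Rubin's rules.

    estimates:        one point estimate per imputed data set (e.g. log HR)
    standard_errors:  the matching standard errors
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean([se ** 2 for se in standard_errors])   # average within-imputation variance
    between = variance(estimates)                        # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between               # total variance
    return pooled, math.sqrt(total)


# Hypothetical log hazard ratios from two imputed data sets.
log_hr, se = pool_rubin([0.4, 0.6], [0.1, 0.1])
hazard_ratio = math.exp(log_hr)
```

In this hypothetical case the pooled standard error (0.2) is larger than either per-imputation standard error (0.1), because the disagreement between imputations adds to the total variance.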
To reveal latent trajectories in the smoking data, SAS version 9.4 (SAS Institute Inc., Cary, NC, USA) and the PROC TRAJ latent class growth modelling (LCGM) macro [31,32] were used. LCGM is a semi-parametric modelling approach which aims to detect classes of individuals who share a similar pattern of change (i.e. trajectory) over time [33]. Information on smoking status for each age between 5 and 46 years was used in trajectory modelling. Because the data were binary (smoking vs. non-smoking), we used the logit-based (LOGIT) model in PROC TRAJ. Models with one to seven trajectory classes were tested, and the selection of six trajectory classes as the most suitable model was based on the following measures of model adequacy, which are shown for each tested model in Results Table 2: 1) Bayesian Information Criterion (BIC) and Akaike Information Criterion (AIC), where lower absolute values indicate better fit of data; 2) the Bayes Factor (B10) and the log form of the Bayes Factor (2log_e(B10) ≈ 2(ΔBIC)), where ΔBIC is the BIC of the alternative (i.e. more complex) model less the BIC of the null (i.e. less complex) model; 2log_e(B10) is interpreted as the degree of evidence favoring the alternative model (> 6 indicates strong evidence against the null model); 3) posterior membership probabilities, where class averages of > 0.70 are considered acceptable; and 4) absolute and relative class sizes, also taking into consideration the subsequent analyses [31][32][33][34]. After the best model was selected, each subject was assigned to the trajectory class with the highest posterior membership probability [32,33]. The reference group with the lowest smoking prevalence was named 'non-smokers', even though some individuals in that smoking trajectory class had smoked during their life.
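The model-selection quantities described above can be sketched in a few lines. The sketch assumes the BIC values follow the sign convention implied in the text (negative values, with values closer to zero indicating better fit), so that a positive ΔBIC favors the more complex model. All function names and numbers are hypothetical and are not taken from Table 2.

```python
def two_log_bayes_factor(bic_null, bic_alternative):
    """Approximate 2*log_e(B10) as 2*(BIC_alternative - BIC_null)."""
    return 2 * (bic_alternative - bic_null)


def assign_class(posterior_probs):
    """Return the 1-based class with the highest posterior membership probability."""
    best_index = max(range(len(posterior_probs)), key=lambda k: posterior_probs[k])
    return best_index + 1


def mean_posterior_by_class(all_probs):
    """Average, per assigned class, the posterior probability of that class.

    Class averages above 0.70 are treated as acceptable in the text.
    """
    sums, counts = {}, {}
    for probs in all_probs:
        k = assign_class(probs)
        sums[k] = sums.get(k, 0.0) + probs[k - 1]
        counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}


# Hypothetical BIC values for a five-class (null) and six-class (alternative) model.
evidence = two_log_bayes_factor(bic_null=-1000.0, bic_alternative=-990.0)
strong_evidence_for_alternative = evidence > 6

# Hypothetical posterior probabilities for three subjects in a two-class model.
averages = mean_posterior_by_class([[0.9, 0.1], [0.7, 0.3], [0.2, 0.8]])
```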
Difference in mean age of onset between ischemic and haemorrhagic strokes was studied with independent samples t-test. The incidences of strokes according to smoking trajectory classes were calculated. The associations (hazard ratios (HR) with 95% confidence intervals (95% CI)) between smoking trajectory classes, smoked pack-years, starting age of smoking and ending age of smoking with stroke risk were estimated using Cox regression models. The analyses of smoking trajectory classes and of smoked pack-years and stroke risk were adjusted for sex, educational level, family history of strokes, leisure-time physical activity, BMI, alcohol consumption, and presence of hypertension, hypercholesterolemia, and diabetes. Starting age and ending age of smoking were studied only among ever-smokers (n = 8941) and the analyses were adjusted for smoked pack-years by age of 46 years to detect a possible sensitive time period of smoking. The follow-up time in all Cox regression analyses started from birth. Follow-up ended at the date of stroke, the date of death, or the end of the study (31st of December 2015). IBM SPSS Statistics 24 was used for these statistical analyses.
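The incidence calculation and the end-of-follow-up rule can be illustrated as follows. The event count (352) and person-years (542,140) are the overall figures reported in the Results; the function names are hypothetical, and including emigration as an end point follows the description of the study sample rather than this paragraph.

```python
from datetime import date

STUDY_END = date(2015, 12, 31)


def follow_up_end(stroke=None, death=None, emigration=None):
    """Earliest of first stroke, death, permanent emigration, or end of study."""
    candidates = [d for d in (stroke, death, emigration, STUDY_END) if d is not None]
    return min(candidates)


def incidence_per_100k(events, person_years):
    """Crude incidence rate per 100,000 person-years."""
    return events / person_years * 100_000


overall_incidence = round(incidence_per_100k(352, 542_140), 1)  # 64.9
```

The same function applied to the events and person-years within each trajectory class would give the class-specific incidences.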

Results
Cohort members were followed from their birth date until the end of the follow-up period for a total of 542,140 person-years. Model fit parameters of trajectory models with 1-7 classes are presented in Table 2. The seven-class model failed to converge. The six-class model provided the most appropriate interpretation of the data, showing better fit than models with 1-5 classes and having a sufficient number of participants in each class. In the six-class model, the first smoking trajectory class comprised 'non-smokers' (n = 3223, 26.9% of 11,999), who were used as the reference group in the analyses. Class 2 included 'quitters in their twenties' (mean smoking age 16-26 years, n = 2034, 17.0%), class 3 included 'quitters in their early thirties' (mean smoking age 16-31 years, n = 3356, 28.0%), class 4 included 'quitters in their middle thirties' (mean smoking age 17-34 years, n = 1128, 9.4%), class 5 included 'quitters in their forties' (mean smoking age 18-43 years, n = 961, 8.0%), and class 6 included 'continuing smokers' (n = 1297, 10.8%). The starting ages of smoking were similar in all trajectory classes, and the trajectory classification was based mainly on the ending ages of smoking (Fig 1).
In total 8941 (74.5%) cohort members had smoked at some point in their lives. Of them, 779 were classified as non-smokers in the smoking trajectory model, resulting in 8162 persons (68.0%) who had smoked regularly. Mean starting age of smoking was 16.5 years (standard error 0.21 years). The presence of stroke risk factors, means of smoked pack-years by age 31 and 46, means of smoked years, and means of starting and ending age of smoking in each trajectory class are shown in Table 3. As expected, subjects with a longer smoking history had more smoked pack-years than those with a shorter history.
During the follow-up period, 352 (2.9%) persons had a stroke resulting in an incidence of 64.9/100,000 person-years ( Table 4). The incidences were similar between women (65.4/

Association between smoking habits and risk of stroke
The incidences and covariate-adjusted HR for any stroke, ischemic stroke, or haemorrhagic stroke according to smoking trajectory classes are shown in the

Discussion
In this study we found that a smoking trajectory of continuing smoking was associated with an increased risk of stroke and that a smoking trajectory of those who quit smoking in their early thirties was associated with increased risk for haemorrhagic stroke when compared to non-smokers. Further, the number of smoked pack-years was associated with risk of stroke. We did not find associations between smoking starting age or ending age and stroke risk.
The results of this study show that accumulation of smoked pack-years might be more crucial to stroke risk than starting or ending age of smoking. This suggests that the harmful effects of smoking depend on dose and duration of smoking and are irrespective of age when smoking occurred. In this present study, the intensity of smoking, e.g. daily cigarette consumption, was not studied separately from pack-years. A previous study examining young women found that not the duration of smoking but the dose of daily smoking and smoked pack-years were strongly associated with increased risk for ischemic stroke [16]. Another previous study also showed the association between smoking and risk of stroke to be dose-dependent [17]. The findings of this study and previous studies suggest that measuring pack-years may be optimal when assessing the smoking-related stroke risk. In a previous Finnish study comparing different ways to estimate longitudinal risk factors for cardiovascular disease mortality, a model representing lifetime accumulation of smoking had better predictive ability than a model using only the most recent measured information of smoking status [35].
In this study, the smoking trajectory of continuing smoking was related to stroke risk, but also to a higher number of pack-years. A high number of pack-years increased the stroke risk, and this association might be irrespective of the smoking trajectory. We did not adjust analyses of smoking trajectories and stroke risk for pack-years to avoid multicollinearity. Trajectories other than continuing smoking represented smoking histories where participants quit smoking during follow-up. It is known that stroke risk declines when an individual quits smoking [14,15]. Previous studies investigating the associations between smoking and preconditions for stroke risk have found that harmful arterial changes among those who quit smoking reversed to levels similar to those of never-smokers [36]. In the Bogalusa Heart Study, the duration of smoking years since childhood, but not smoking at age 8 to 17, was associated with changes in arterial thickness that can be considered a preclinical marker of ischemic stroke risk [37]. Furthermore, a recent study showed that the effect of adolescent smoking on future stroke risk might be lower than expected from previous studies, due to previous studies' failure to follow the changes in smoking habits after baseline testing [38]. The current study was able to follow the changes in smoking habits from childhood to adulthood.
In the current study we also found that the smoking trajectory of those who quit smoking in their early thirties was associated with an increased risk for haemorrhagic stroke in particular.

This suggests that there might be a sensitive period during adolescence and young adulthood when susceptibility to the effects of smoking is greater than in later life and that the timing of smoking might play a role in the development of haemorrhagic stroke risk. Nonetheless, it should be noted that the mean age of onset of haemorrhagic strokes, in particular subarachnoid haemorrhages, is younger than that of ischemic strokes, which might partly explain the finding [39][40][41][42]. Future studies are needed to investigate whether early smoking is associated with haemorrhagic strokes irrespective of smoked pack-years.
This study had some limitations. The main limitation was that the follow-up questionnaires (at ages 14, 31, and 46 years) were rather far apart. Smoking status was asked only at these three time points and was estimated for the remaining age points that were used in the smoking trajectory model (between 5 and 46 years) based on starting and ending age. To reduce recall bias due to gaps between follow-up questionnaires, the 31 years follow-up assessment was selected as the primary source for data on smoking starting age and the 46 years follow-up for smoking ending age.
Second, in this population most smokers started smoking around the same age. In the six-class smoking trajectory model the mean starting ages were similar in all classes. Therefore, the trajectories mainly represent the ending age and the duration of smoking, and no comparisons between classes with same duration but different starting and ending age of smoking could be made. Some trajectory classes, e.g. quitters in their early thirties and quitters in their middle thirties, were similar to each other with respect to starting and ending age of smoking and smoked pack-years. This might challenge the clinical interpretation of the results.
As another limitation, the sample sizes of the stroke groups (e.g. the group of haemorrhagic strokes) were rather small, which might have underpowered this study to detect differences. Stroke at young age is a rare phenomenon [42], and despite the relatively high incidence of subarachnoid haemorrhages in particular in Finland [39,43], and the large number of person-years in this follow-up study, only 352 people had a stroke. Furthermore, due to the limited number of stroke cases, different subtypes of ischemic or haemorrhagic strokes were not studied separately. Previous studies have shown that smoking increases especially the risk of SAH and ischemic stroke but might not increase the risk of ICH [17][44][45][46]. Additionally, it should be noted that use of smokeless tobacco was not studied, though it might increase the stroke risk [47].
A further limitation of this study is the loss to follow-up in clinical and questionnaire surveys that assessed the smoking status and covariates. In addition to loss to follow-up of subjects, there were missing measurements of available subjects, and in total 27.5% of all original data were missing. The multiple imputation method was used in this study to complete the missing data of smoking statuses and covariates, as it reduces selection bias due to selective loss to follow-up, and also increases statistical power [48,49]. A previous study from the same birth cohort data has shown that high alcohol consumption, low educational level, unemployment, and being single at age 31 predicted lower participation at follow-up examination and questionnaires [50]. This might have affected the results had multiple imputation not been used. However, it should be noted that the selection of variables in the multiple imputation model might also affect the results [51].
The current study also had several strengths. First, it had three different approaches to study the association between smoking and stroke risk: 1) smoking trajectories, 2) smoked pack-years, and 3) starting and ending age of smoking. Second, a large and unselected population-based birth cohort was used in this study with over 500,000 person-years of follow-up. Results of this naturalistic real-world data set are highly generalizable to the Finnish population. The data collection started from the second trimester of cohort members' antenatal period and follow-up lasted up to 50 years of age. The data collection was prospective, reducing the potential for information bias, and questionnaire and clinical examination data were combined with comprehensive nationwide registers. Information on stroke diagnoses was collected from nationwide registers that were complete and continuous without loss to follow-up. Third, the smoking status was measured at several age-points with several characteristics of smoking. Finally, multiple imputation was conducted to reduce selection bias and loss of statistical power due to missing data.

Conclusions
This study showed that accumulation of smoking history is associated with increased risk of stroke until age of 50 years. The increased stroke risk does not depend on the age at which smoking started. Given that the majority starts smoking at young age, primary prevention of strokes should focus on adolescent smoking.