Outcome after stroke attributable to baseline factors—The PROSpective Cohort with Incident Stroke (PROSCIS)

Background
The impact of risk factors on poor outcome after ischemic stroke is well known, but estimating the amount of poor outcome attributable to single factors is challenging in the presence of multimorbidity. We aim to compare population attributable risk estimates obtained from different statistical approaches regarding their consistency. We use a real-life data set from the PROSCIS study to identify predictors for mortality and functional impairment one year after first-ever ischemic stroke and quantify their contribution to poor outcome using population attributable risks.

Methods
The PROSpective Cohort with Incident Stroke (PROSCIS) is a prospective observational hospital-based cohort study of patients after first-ever stroke conducted independently in Berlin (PROSCIS-B) and Munich (PROSCIS-M). The association of baseline factors with poor outcome one year after stroke in PROSCIS-B was analysed using multiple logistic regression analysis. Population attributable risks were estimated using three approaches: sequential population attributable risk based on a multiple generalized additive regression model, doubly robust estimation, and average sequential population attributable risk. Findings were reproduced in an independent validation sample from PROSCIS-M.

Results
Out of 507 patients with available outcome information after 12 months in PROSCIS-B, 20.5% suffered from poor outcome. Factors associated with poor outcome were age, pre-stroke physical disability, stroke severity (NIHSS), education, and diabetes mellitus. The order of risk factors ranked by magnitude of population attributable risk was nearly the same for all methods, but population attributable risk estimates varied markedly between the methods. In PROSCIS-M, incidence of poor outcome and distribution of baseline parameters were comparable. The multiple logistic regression model could be reproduced for all predictors, except pre-stroke physical disability. As in PROSCIS-B, the order of risk factors ranked by magnitude of population attributable risk was nearly the same for all methods, but magnitudes of population attributable risk differed markedly between the methods.

Conclusions
Ranking of risk factors by population impact is not affected by the different statistical approaches. Thus, for a rational decision on which risk factor to target in disease interventions, population attributable risk is a supportive tool. However, population attributable risk estimates are difficult to interpret and are not comparable when they originate from studies applying different methodology. The predictors for poor outcome identified in PROSCIS-B have a relevant impact on mortality and functional impairment one year after first-ever ischemic stroke.


Introduction
A number of studies identify prognostic factors for death or poor functional outcome after stroke [1][2][3][4][5]. Most studies report these associations using rate ratios, odds ratios or hazard ratios, thereby estimating the risk increase for exposed compared to non-exposed patients. Only a few studies take into account the prevalence of these prognostic factors in the stroke population [3][4][5][6][7]. Development of prevention programs, however, requires considering both the prevalence of risk factors and their impact on outcome in order to prioritize the best targets for reducing mortality and morbidity. Population attributable risk (PAR), as a function of risk factor prevalence and relative risk, is used to calculate the amount of poor outcome that is attributable to a prognostic factor [8,9]. The few recent studies on mortality and functional outcome after ischemic stroke providing PAR estimates have used diverse estimation approaches [3,5,6,7]. It is known that the choice of methodology affects the magnitudes of the resulting estimates [10], which restricts the comparability of PAR estimates from studies applying different methodologies.
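To make the dependence of PAR on both prevalence and relative risk concrete, the following minimal sketch uses the classic unadjusted formula (often attributed to Levin), PAR = p(RR - 1) / (1 + p(RR - 1)). This is an illustration only: the function name and the input values are hypothetical, and the model-based estimators compared in this study adjust for other covariates and are not reproduced by this formula.

```python
def levin_par(prevalence, relative_risk):
    """Unadjusted population attributable risk from prevalence and relative risk.

    Classic textbook formula, shown for illustration; it ignores confounding
    and is not one of the adjusted estimators compared in the study.
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie between 0 and 1")
    if relative_risk <= 0.0:
        raise ValueError("relative_risk must be positive")
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


if __name__ == "__main__":
    # Hypothetical inputs: a common factor with a moderate effect
    # versus a rare factor with a strong effect.
    print(round(levin_par(prevalence=0.30, relative_risk=2.0), 3))
    print(round(levin_par(prevalence=0.05, relative_risk=4.0), 3))
```

With these made-up inputs, the common but weaker factor has the larger PAR (about 0.23 versus about 0.13), which is why relative measures alone are not sufficient for prioritizing prevention targets.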
In this study, we aim to compare PAR estimates obtained from different statistical approaches regarding their consistency to evaluate the impact of the underlying statistical model on the PAR estimates. This is the first study that simultaneously assesses and directly compares PAR values obtained by different methodologies. We use a real-life data set from the PROSCIS study to identify baseline prognostic factors for death or functional impairment one year after first-ever ischemic stroke and assess the extent to which each prognostic factor contributes to poor outcome. We validate the findings within an independent data set.

Materials and methods
The data that support the findings of this study are available to all interested researchers at Harvard Dataverse (https://doi.org/10.7910/DVN/REBNRX).

The PROSCIS study
The Prospective Cohort with Incident Stroke (PROSCIS) is a prospective, observational, hospital-based cohort study of patients with first-ever stroke conducted independently at two centers in Germany (PROSCIS-B: Center for Stroke Research Berlin, Charité University Hospital, ClinicalTrials.gov identifier: NCT01363856; PROSCIS-M: Institute for Stroke and Dementia Research, Klinikum der Universität München, Ludwig-Maximilians-University, ClinicalTrials.gov identifier: NCT01364168). Details have been published previously [11]. Briefly, patients with ischemic stroke, primary hemorrhage or sinus venous thrombosis were recruited for PROSCIS-B at stroke units of three tertiary care university hospitals at the Charité - Universitätsmedizin Berlin from March 2010. For PROSCIS-M, patients with ischemic stroke and documented duration of disturbance of cerebral function were recruited from March 2011. Identical core protocols and data collection methods were applied by both centers, allowing each study to serve as a validation sample for the other. The main objective of PROSCIS is to develop prediction models of different complexity for the combined vascular end-point of stroke, myocardial infarction, and vascular death at three years after first-ever stroke [11]. Patients were interviewed within the first seven days after symptom onset. An extensive clinical examination was conducted, and functional outcome was documented. Stroke survivors were followed up annually for three years after recruitment by telephone interviews assessing, among other things, the patient's vital status, mood and cognitive function. If patients were lost to follow-up, information on vital status was complemented by contacting the registry office.

Study population
Patients 18 years or older who suffered from stroke according to the WHO criteria were included in the PROSCIS-B cohort. Exclusion criteria were prior stroke and participation in an intervention study. We restrict our analyses to patients with ischemic stroke and known survival status and functional outcome one year after stroke. A subsample of 200 patients randomly selected from the PROSCIS-M cohort serves as an external validation data set.

Patient characteristics
Factors that are already known to have an impact on poor outcome after stroke and that were collected at baseline in the PROSCIS-B study were analyzed: sociodemographic parameters [age, sex, graduation (no graduation / ≤10 years / >10 years school attendance), education (total number of years spent in general, professional or university training), migration background (both parents or at least one parent and the patient are foreign-born), institutionalization pre-stroke (living in care home, retirement home or assisted living)], stroke-related risk factors [body mass index (BMI), active smoking (current smoking vs. never or former smoking), regular alcohol consumption (≥20 cl of wine/champagne or 50 cl of beer or 2 cl of hard liquor at least once per week), degree of physical activity pre-stroke (no / sparse / 1-2 x 20 minutes strong / ≥3 x 20 minutes strong physical activity), physical disability, hypertension, dyslipidemia, diabetes mellitus type I or II (DM), atrial fibrillation (AF), myocardial infarction (MI) or angina pectoris (AP), transient ischemic attack (TIA), peripheral arterial disease (PAD)], etiologic subtype of ischemic stroke according to the Trial of ORG 10172 in Acute Stroke Treatment (TOAST) classification [12] and stroke severity according to the National Institutes of Health Stroke Scale (NIHSS) [13,14].

Poor outcome one year after stroke
Poor outcome was defined as death or functional impairment (modified Rankin Scale (mRS) >2 or Barthel Index (BI) <60) one year after ischemic stroke [15][16][17]. The Barthel Index and the modified Rankin Scale have been validated in German [14,18] and for use in telephone interviews [18,19].
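The composite definition above can be written as a small predicate. This is a minimal sketch: the function and argument names are hypothetical, and the handling of missing values (raising an error for survivors without both scores) is an assumption chosen to mirror the exclusion of patients with incomplete mRS or BI data, not the authors' code.

```python
def is_poor_outcome(died, mrs=None, barthel=None):
    """Return True for death, mRS > 2, or Barthel Index < 60 at one year.

    Assumption: survivors need both scores; otherwise the outcome is
    treated as unavailable and an error is raised.
    """
    if died:
        return True
    if mrs is None or barthel is None:
        raise ValueError("survivors need both mRS and Barthel Index")
    return mrs > 2 or barthel < 60


if __name__ == "__main__":
    # Hypothetical patients, not PROSCIS records.
    print(is_poor_outcome(died=False, mrs=2, barthel=60))   # at both thresholds: not poor
    print(is_poor_outcome(died=False, mrs=1, barthel=55))   # Barthel Index below 60: poor
```

Note that the thresholds are strict inequalities, so a survivor with mRS = 2 and BI = 60 is not counted as having poor outcome.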

Statistical analysis
Univariable binary logistic regression analysis was conducted to model the chance of poor outcome comparing exposed to non-exposed individuals. Parameters with univariable p-values <0.1 were transferred to multiple binary logistic regression analysis with Firth's maximum likelihood penalization method and backward selection (α stay = 0.05). This selection aimed to identify the strongest and mutually independent risk factors while keeping the number of covariables to a minimum. Firth's likelihood penalization method [20], as implemented in the R package logistf [21], was used to reduce small sample bias. Multicollinearity was examined using variance inflation factor statistics.
Analyses were carried out in the R software environment for statistical computing and graphics, version 3.5.1.
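The idea behind Firth's penalization can be sketched in a few lines. The code below is not the logistf implementation and is not the analysis code of this study; it is a bare-bones, pure-Python illustration that iterates on the Firth-modified score X'(y - π + h(0.5 - π)), where h is the diagonal of the hat matrix. It omits safeguards such as step-halving, confidence intervals and variable selection, and all names and data are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def firth_logistic(rows, y, max_iter=200, tol=1e-10):
    """Firth-penalized logistic regression; each row starts with 1 for the intercept."""
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        pi = [1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, r)))) for r in rows]
        w = [q * (1.0 - q) for q in pi]
        # Fisher information X'WX
        info = [[sum(wi * r[j] * r[k] for wi, r in zip(w, rows)) for k in range(p)]
                for j in range(p)]
        # Hat-matrix diagonal: h_i = w_i * x_i' (X'WX)^-1 x_i
        h = [wi * sum(a * c for a, c in zip(r, solve(info, list(r))))
             for wi, r in zip(w, rows)]
        # Firth-modified score replaces the usual residual y - pi
        resid = [yi - q + hi * (0.5 - q) for yi, q, hi in zip(y, pi, h)]
        score = [sum(e * r[j] for e, r in zip(resid, rows)) for j in range(p)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


if __name__ == "__main__":
    # Hypothetical data: all 4 exposed patients have the outcome, so the
    # ordinary maximum likelihood estimate of the slope does not exist.
    rows = [(1, 0)] * 4 + [(1, 1)] * 4
    y = [0, 0, 1, 1, 1, 1, 1, 1]
    print([round(b, 4) for b in firth_logistic(rows, y)])
```

In this toy example with a single binary covariate, the penalized estimate stays finite; for such a saturated two-group model the Firth solution coincides with adding 0.5 to each cell of the 2x2 table, which gives a slope of ln 9.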

Ethics approval
The study was approved by the ethics committee of the Charité - Universitätsmedizin Berlin (EA1/218/09) and the ethics committee of the Ludwig-Maximilians-University Munich (project 366-10, 20.12.2010). Trained physicians of the trial teams recruited the patients. Patients or their legal representatives gave written informed consent for study participation. Patients had to be awake and responsive to be enrolled in the trial.
Consent for publication of raw data was not obtained from the participants. We have followed guidelines on preparing clinical data for publication [26]. Accordingly, we have blocked the indirect identifiers sex and BMI, and dichotomized the further indirect identifiers age and education, in order to preserve the privacy of the participants. The dataset is fully anonymous. Publication of the dataset clearly and obviously presents minimal risk to confidentiality of study participants.

Results

Study population
Between March 2010 and May 2013, 690 patients were recruited for PROSCIS-B. Of these, 627 patients had an ischemic stroke and were included in the study. Of these, 17 (2.7%) patients were lost to follow-up and 103 (16.4%) were alive but had incomplete data on mRS or BI at one year. Overall, 507 (80.9%) patients were available for our analyses. Complete patient flow is shown in Fig 1.

Patient characteristics
Patient characteristics at baseline are shown in Table 1. Within one year, 24 (4.7%) patients died and 80 (15.8%) reported relevant functional impairment. Overall, 104 (20.5%) patients had poor outcome one year after stroke and 403 (79.5%) patients were alive and did not have severe functional impairment (S1 Table). Categorization of age and NIHSS followed [3,4,25], and categorization of BMI followed [27]. Patients included in our analyses had a lower degree of pre-stroke physical activity, a lower rate of hypertension, higher smoking rates and lower NIHSS on admission, compared to patients excluded from analyses due to loss to follow-up or missing data on functional impairment after 12 months (S2 Table).
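The reported counts and percentages for patient flow and outcome can be cross-checked with simple arithmetic. The sketch below only re-derives figures stated in the text; the variable and function names are hypothetical.

```python
# Counts as reported in the text for PROSCIS-B.
ischemic = 627      # patients with ischemic stroke
lost = 17           # lost to follow-up
incomplete = 103    # alive, but mRS or BI missing at one year
died = 24
impaired = 80

analysed = ischemic - lost - incomplete
poor = died + impaired


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    print(analysed, pct(lost, ischemic), pct(incomplete, ischemic), pct(analysed, ischemic))
    print(poor, pct(died, analysed), pct(impaired, analysed), pct(poor, analysed))
```

The flow percentages use the 627 patients with ischemic stroke as denominator, whereas the outcome percentages use the 507 analysed patients.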

Logistic regression analysis
In unadjusted logistic regression, age, sex, graduation, education, etiologic subtype of stroke, NIHSS, and pre-stroke regular alcohol consumption, degree of physical activity, physical disability, hypertension, DM, AF, MI/AP, TIA and PAD were associated with poor outcome (p<0.1) (Table 1). Independent predictors for poor outcome one year after first-ever ischemic stroke were age, education, NIHSS, pre-stroke physical disability and DM (Fig 2). Fig 3 shows the overlap between patients with poor outcome and exposure to each of the risk factors in a Venn diagram. The grey-shaded area represents all patients with poor outcome. Of the patients with poor outcome, 14 (13.5%) were not exposed to any of the predictors, 24 (23.1%) were exposed to exactly one risk factor, 62 (59.6%) had multiple exposures and 4 (3.8%) had missing values in one of the predictors.

Population attributable risk
Results from PAR estimation for PROSCIS-B are presented in Table 2. PAR values calculated using Coughlin's method (estimation requires regression parameters from the generalized additive regression model, which are presented in S3 Table) are almost twice the size of those gained from average PAR estimation. PAR estimates from doubly robust estimation are [...]

[...] Table). Parameters associated with poor outcome in univariable analysis (p<0.1) were age, sex, graduation, education, NIHSS and pre-stroke active smoking, regular alcohol consumption, degree of physical activity, hypertension, dyslipidaemia, DM, [...]

Results from PAR estimation are shown in Table 3. As observed before in PROSCIS-B, the magnitudes of PAR estimates differ noticeably between the methods. Sequential PAR estimates from Coughlin's approach are considerably higher than estimates gained from average PAR estimation, and PAR estimates from doubly robust estimation are situated in between. Despite these differences, the ranking by magnitudes of PAR is similar in all methods.
However, the impact of risk factors on poor outcome differs between PROSCIS-B and PROSCIS-M: the highest amount of poor outcome is attributed to physical disability and age ≥75 years in PROSCIS-B and to NIHSS >4 and age ≥75 years in PROSCIS-M.

Discussion
The planning of an intervention study requires knowledge of the approximate amount of poor outcome attributable to the targeted risk factor. PAR is a methodological tool to prioritize targets for modification according to their assumed contribution to reducing the outcome of interest, in our case patient mortality and morbidity. While different estimation methods were used in previous studies [3,5,6,7], to the best of our knowledge, this is the first publication that simultaneously assesses and directly compares PAR values obtained by different statistical approaches in a real-life data set. In our analysis, the ranking by magnitudes of PAR was nearly the same for all methods. However, we observed a relevant variation of PAR values obtained by different estimation methods. Average PAR estimation yielded the smallest PAR values and Coughlin et al.'s approach yielded the highest, the latter being roughly twice the size of the average PAR. The reason for this variation is probably the stepwise elimination approach of the sequential procedures (Coughlin et al.'s approach and doubly robust estimation), which makes PAR estimates depend on the order in which risk factors are eliminated from the population: based on confounder-adjusted relative risks, PAR for one risk factor is obtained by identifying the excess amount of poor outcome in exposed individuals and removing it from the population [8]. For the next risk factor, the procedure is repeated. To obtain PAR values for each risk factor from a multivariable model, the sequential procedure starts with the risk factor of interest being eliminated first. Hence, the result for each risk factor is the proportion of poor outcome that could be prevented if that risk factor were the first to be eliminated from the population. Especially in cases of multimorbidity, however, poor outcome can often be attributed to more than one factor and is then assumed to be eliminated multiple times.
A large proportion of patients with poor outcome in PROSCIS-B (grey-shaded area in Fig 3) suffered from multiple risk factors, i.e. had multiple options to prevent poor outcome. This is a relevant source of over-estimation of PAR when applying sequential PAR estimation methods. Coughlin's approach seems more vulnerable to multimorbidity than average PAR, i.e. yields notably higher estimates. By contrast, average PAR is obtained by calculating sequential PAR for every possible order of risk factor removal from the population and subsequently averaging the results for each risk factor [8]. As a result, average PAR is independent of the order of risk factor removal. Therefore, PAR values obtained by different approaches are not directly comparable, in particular in scenarios where multimorbidity is present. Furthermore, PAR values depend on the definition of exposure categories, since the software used for PAR estimation required dichotomization of the prognostic factors. Consideration of continuous factors or factors with multiple categories was not possible. In addition, elimination of all exposures from the population is unrealistic. Hence, the PAR can only be interpreted as the theoretical maximum amount of poor outcome that might be preventable if, in the best scenario, all exposures could be eliminated from the population.
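The difference between eliminating a risk factor first and averaging over all elimination orders can be illustrated with a toy calculation. The sketch below assumes a hypothetical population with two binary risk factors whose effects multiply; the shares, risks and function names are invented for illustration and do not come from PROSCIS or from the software used in this study.

```python
from itertools import permutations

# Hypothetical toy population with two binary risk factors (not PROSCIS data).
FACTORS = ("A", "B")
# Share of patients in each exposure stratum (A, B); shares sum to 1.
SHARE = {(0, 0): 0.4, (1, 0): 0.2, (0, 1): 0.2, (1, 1): 0.2}
# Assumed probability of poor outcome per stratum (each factor doubles the risk).
RISK = {(0, 0): 0.1, (1, 0): 0.2, (0, 1): 0.2, (1, 1): 0.4}


def expected_risk(removed=()):
    """Population risk if every factor in `removed` were set to unexposed."""
    total = 0.0
    for stratum, share in SHARE.items():
        counterfactual = tuple(
            0 if name in removed else level
            for name, level in zip(FACTORS, stratum)
        )
        total += share * RISK[counterfactual]
    return total


def joint_par(removed):
    """Fraction of poor outcome avoided by removing all factors in `removed`."""
    baseline = expected_risk()
    return (baseline - expected_risk(removed)) / baseline


def first_position_par(factor):
    """Sequential PAR when `factor` is the first one eliminated."""
    return joint_par((factor,))


def average_par(factor):
    """Mean sequential contribution of `factor` over all removal orders."""
    orders = list(permutations(FACTORS))
    total = 0.0
    for order in orders:
        earlier = order[: order.index(factor)]
        total += joint_par(earlier + (factor,)) - joint_par(earlier)
    return total / len(orders)


if __name__ == "__main__":
    for name in FACTORS:
        print(name, round(first_position_par(name), 3), round(average_par(name), 3))
    print("joint", round(joint_par(FACTORS), 3))
```

In this toy example, each factor has a first-position PAR of 0.30, so the two values add up to 0.60 although removing both factors together avoids only 0.50 of the poor outcome. The average PAR is 0.25 per factor and the two values add up to the joint 0.50. Patients exposed to both factors are counted twice in the first case, which is the over-estimation described above.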
In the present study, we estimated the population impact of known risk factors for death or functional impairment one year after ischemic stroke in a large prospective cohort (PROSCIS-B) and validated the findings in an independent cohort (PROSCIS-M). Both cohorts were designed identically, applied the same study core protocol, and the same statistical approaches were used for analysis. The associations determined from the validation cohort PROSCIS-M were similar to those from PROSCIS-B, except for pre-stroke physical disability. Although disabled patients had a nominally higher probability of death than non-disabled patients (10.7% vs. 4.9%), though not of dependency (12.0% vs. 14.6%), no statistically significant association was observed in univariable or multivariable analysis. This is probably due to the small sample size of PROSCIS-M. The effects of age, education and NIHSS were in the same range as observed in PROSCIS-B. The effect of diabetes mellitus pointed in the same direction, but was estimated higher and with less precision. The PAR magnitudes for age ≥75 years (17.29% vs. 20.89%) and DM (7.61% vs. 10.03%) were comparable between both cohorts. The size of the estimated PAR values differed between PROSCIS-B and PROSCIS-M for pre-stroke physical disability (average PAR 18.48% vs. 2.89%), NIHSS >4 (10.90% vs. 26.13%) and education ≤10 years (10.39% vs. 18.86%). This might be a result of the heterogeneity of both cohorts: in PROSCIS-M, pre-stroke physical disability was less prevalent compared to PROSCIS-B and no significant association with poor outcome was found. In addition, patients in PROSCIS-B had less severe strokes and were younger, which possibly explains the notably smaller amount of poor outcome attributable to NIHSS. Moreover, patients from PROSCIS-M were better educated, which at least partly explains the higher impact of education on poor outcome.
All prognostic factors for death or functional impairment after first-ever ischemic stroke in our cohort are in line with those found in previous studies [1][2][3][4][5][28][29][30][31][32]. Furthermore, we found similar magnitudes of average PAR for age ≥75 years (average PAR in PROSCIS-B: 17.29%, PROSCIS-M: 20.89%), compared to previous studies applying the same PAR estimation methodology [3,4]. Values of average PAR for DM varied between previous studies [3,4]; we found an average PAR of 7.61% in PROSCIS-B and 10.03% in PROSCIS-M. NIHSS >4 accounted for 27.5% in [3] and 26.13% in PROSCIS-M, but only 10.90% in PROSCIS-B. Physical disability accounted for between 9% and 17% of death and poor outcome, respectively, in [4], and for 11% and 15% in [3], which suggests that the PAR gained from PROSCIS-M might have been under-estimated. Educational level was not considered in previous average PAR estimations, but is a known risk factor for poor outcome [25]. Comparability of PAR estimates with estimates gained in previous studies, however, might still be limited, since different definitions of poor outcome were used, and the number and types of risk factors considered varied. In [4], the Barthel Index was not considered in the functional assessment, and in [3], rehospitalization due to stroke was included in the definition of poor outcome. Both studies investigated early complications in addition to sociodemographic and clinical characteristics, and results were reported stratified for mortality and poor outcome within 7 days and 3 months after stroke, respectively.

Some limitations of our study have to be considered. First, the study represents a secondary analysis of a prospective study originally designed to develop prediction models for recurrent vascular events in ischemic stroke patients. Thus, we only considered factors on hospital admission for analyses, and no in-depth information on pre-stroke patient conditions, in-hospital treatment or comorbidities beyond cardiovascular diseases was available. Hence, we cannot exclude that residual confounding might be present in this study, possibly leading to an over-estimation of PAR. Nevertheless, the comprehensive and well-documented data collection process as well as the standardized conduct of all examinations helped to counter information bias for the collected data. Second, recruitment was regionally restricted to patients treated in stroke units in Berlin, and regional conditions might have influenced the study population, which may limit the generalizability of the results. Third, we have observed a selection of study patients towards less severe strokes within PROSCIS-B (median baseline NIHSS of 2 points). This might be caused by several factors: (1) recruitment took place in stroke units only, excluding more severe strokes treated at neurological intensive care units, (2) only patients able to consent or with a legal guardian could be enrolled into the study, (3) patients with available outcome after 12 months had lower values of NIHSS compared to patients with unavailable outcome information. This selection probably comes along with selective clinical characteristics such as younger age, less comorbidity, and lower rates of cardioembolic strokes as well as rather good outcomes one year after stroke. Accordingly, the PAR estimates from this population can only be extrapolated to similar minor stroke populations. Fourth, because of the low number of patients with poor outcome (104 out of 507), we used a covariable selection procedure to reduce the number of variables in the model. However, this could potentially have led to residual confounding, as we might not have had enough power to detect an association of classical risk factors such as atrial fibrillation, hypertension or transient ischemic attack with poor outcome, possibly leading to an over-estimation of PAR.
To conclude, PAR is a supportive tool for making a rational decision on which risk factor to target in intervention studies. Ranking of risk factors regarding their population impact by magnitudes of PAR was independent of the estimation method. However, PAR values are difficult to interpret and are not comparable when they originate from studies applying different methodology.