The Novel 10-Item Asthma Prediction Tool: External Validation in the German MAS Birth Cohort

Background A novel non-invasive asthma prediction tool from the Leicester Cohort, UK, forecasts asthma at age 8 years based on 10 predictors assessed in early childhood, including current respiratory symptoms, eczema, and parental history of asthma. Objective We aimed to externally validate the proposed asthma prediction method in a German birth cohort. Methods The MAS-90 study (Multicentre Allergy Study) recorded details on allergic diseases prospectively in about yearly follow-up assessments up to age 20 years in a cohort of 1,314 children born in 1990. We replicated the scoring method from the Leicester cohort and assessed prediction, performance and discrimination. The primary outcome was defined as the combination of parent-reported wheeze and asthma drugs (both in last 12 months) at age 8. Sensitivity analyses assessed model performance for outcomes related to asthma up to age 20 years. Results For 140 children, parents reported current wheeze or cough at age 3 years. Score distribution and frequencies of later asthma resembled the Leicester cohort: 9% vs. 16% (MAS-90 vs. Leicester) of children at low risk at 3 years had asthma at 8 years, at medium risk 45% vs. 48%. Performance of the asthma prediction tool in the MAS-90 cohort was similar (Brier score 0.22 vs. 0.23) and discrimination slightly better than in the original cohort (area under the curve, AUC 0.83 vs. 0.78). Prediction and discrimination were robust against changes of inclusion criteria, scoring and outcome definitions. The secondary outcome 'physician-diagnosed asthma at 8 years' showed the highest discrimination (AUC 0.89). Conclusion The novel asthma prediction tool from the Leicester cohort, UK, performed well in another population, a German birth cohort, supporting its use and further development as a simple aid to predict asthma risk in clinical settings.

Data Availability: The authors confirm that, for approved reasons, some access restrictions apply to the data underlying the findings. Data from the MAS-90 birth cohort used for this manuscript cannot be published directly as part of this manuscript or online supplement, because participating families gave consent only for analyzing their data within scientific projects, which cannot be guaranteed when making the data available for unrestricted download. However, for scientific evaluations, all data can be requested free of charge and restrictions from Mr. Andreas Reich at the Institute for Social Medicine, Epidemiology and Health Economics at Charité - Universitätsmedizin Berlin, Germany, after agreeing to use the data only for research.
Funding: The MAS study was funded by grants from the German Federal Ministry of Education and Research (07015633, 07 ALE 27, 01EE9405/5, 01EE9406) and the German Research Foundation (KE 1462/2-1). The funders had no role in the design, management, data collection, analysis or interpretation of the data or in the writing of the manuscript or the decision to submit for publication.
Competing Interests: The authors have declared that no competing interests exist.

Introduction
Our understanding of modifiable and non-modifiable determinants influencing the onset and development of asthma in adolescence has advanced in recent years, but it has not led to improved prevention strategies [1][2][3][4][5][6].
While primary prevention is lacking, the knowledge about essential parameters gathered in research can at least be used to predict future asthma in the clinical setting. Screening children in preschool age to identify those with a high probability of later asthma opens the opportunity for interventions aiming to slow or stop progression or modify disease severity at pre-clinical stages. Prediction or early diagnosis allows for learning about the immunologic processes at work before obvious symptoms occur.
For example, crude definitions of suggestive symptoms and aspects of the patient's history are already used in current guidelines to evaluate risk and to target trial treatment (eg, inhaled corticosteroids [7,8]). Several formalized prediction algorithms have been proposed to quantify the information content of symptoms, behavioural patterns and heredity for estimating probabilities of later asthma [9][10][11][12].
In a recent issue of the Journal of Allergy and Clinical Immunology, Pescatore et al. [13] introduced a new 10-item asthma prediction tool, based mainly on indicative symptoms, and accounting for age, sex, comorbid eczema and parents' history of asthma/bronchitis. Unlike in prior studies, the final selection of variables/predictors was performed by a LASSO-penalized regression model (Least Absolute Shrinkage and Selection Operator), aiming at fewer factors and higher external validity.
None of these scoring systems is widely used in primary care [14]. This is mainly due to the lack of external validation and impact studies, requisites for a general recommendation of such tools [15]. The predictive performance outside the population from which the tool was developed is usually lower than estimated from internal validation, both in retrospective as well as prospective external validation studies [16][17][18][19]. Pescatore et al. validated the new asthma prediction tool in the cohort used for development itself [13,20], which now requires replication in other populations [21] as the most rigorous assessment of a model's validity [22].
Our aim was therefore to externally validate Pescatore's asthma prediction tool by estimating measures of discrimination, calibration and performance in the MAS-90 birth cohort [23,24], supported by sensitivity analyses considering different inclusion criteria for validation sample, scoring items, and outcome definitions, and assessing asthma phenotypes up to 20 years.

Methods
For external validation of the novel asthma prediction tool we retrospectively applied the suggested scoring to a subsample of the MAS-90 birth cohort. Exploring the robustness of the primary model and accounting for differences in data collection, we reiterated analyses with various definitions of inclusion, scoring and outcome criteria as described below.

Setting
From all 7,609 children born during 1990 in 6 participating hospitals across Germany, a population-based birth cohort was recruited (n = 1,314, The German Multicentre Allergy Study, MAS-90). Newborns to allergic parents, based on history or positive Immunoglobulin E screening, were partly oversampled (19% in all children vs. 38% in the recruited cohort, details in [23,25,26]). Development of allergic diseases including asthma as well as information on living environment and lifestyle were traced at nineteen time points up to age 20 years through interviews, questionnaires, and clinical investigations including blood sampling and assessment of lung function, achieving a long-term response of 72% at 20 years. The project was approved by local institutional review boards (ethics committees Charité - Universitätsmedizin Berlin; Technical University Munich, Faculty of Medicine; Landesärztekammer Rheinland-Pfalz; University Medical Hospital Freiburg). All parents and later all adult participants provided written informed consent for data collection and analysis.

Population and inclusion criteria
The sample used for external validation was limited to children who participated in assessments at 3 and 8 years (7th and 12th follow-up). Resembling the development cohort [13], our primary inclusion criteria were further restricted to participants reporting wheeze ('Has your child had wheezing or whistling in the chest in the last 12 months?') or cough ('In the last 12 months, has your child had a dry cough at night, apart from a cough associated with a cold or a chest infection?') in the interview at 3 years. For sensitivity analyses, secondary sample inclusion criteria comprised the whole initial sample irrespective of symptoms at 3 years, and those reporting either wheeze only or cough only.

Scoring variables
Six of the 10 items of the original asthma prediction tool refer to parent-reported symptoms (questions 3 to 8). For the (primary) scoring definition we manually identified corresponding items from the interview at 3 years in the MAS-90 cohort. Child's sex was documented at recruitment (baseline assessment), age was calculated from date of birth and interview (questions 1, 2). Information on comorbid eczema and parents' allergies were derived from the interview at 3 years (questions 9, 10). Several secondary scoring approaches were assessed. Questions on eczema and parents' allergies asked at 3 years of age covered only the previous 12 months, and were thus complemented with data from earlier follow-ups and baseline. Secondly, all missing information (answer 'Always' in question 6, question 8 entirely) was imputed at random. Finally, actual scores were shuffled at random between study participants, giving reference measures of discrimination and performance.

Outcomes
As used for model development, parent-reported current wheeze ('Has your child had wheezing or whistling in the chest in the last 12 months?') in combination with use of asthma drugs ('Did your child take any drugs against asthma during the last 12 months?') both at age 8 years was defined as the primary outcome.
As secondary outcomes, wheeze or asthma drugs only, and a physician's diagnosis of asthma were used as reported at the eight-year follow-up. Another set of secondary outcome definitions used information collected in the MAS-90 cohort in later assessments up to age 20 years. For these outcome definitions asthma was defined as satisfying 2 of the following 3 criteria at any follow-up from age three years and above: physician's diagnosis of asthma ever; asthma drugs in last 12 months; any indicative symptom in last 12 months (wheezing, shortness of breath, dry cough at night). Allergic asthma further included a positive serum specific immunoglobulin E ≥0.35 kU/l (kilo Units per litre) to at least one regularly assessed aero-allergen (dust mite, dog, cat, birch, timothy), determined at nine time points (ImmunoCAP - Phadia GmbH, Freiburg, Germany). Lung function at 20 years was assessed through escalating-dose Methacholine challenge; a ≥20% drop of FEV1 (Forced Expiratory Volume in 1 second) from baseline was considered as increased airway responsiveness.
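The composite secondary outcome can be illustrated with a minimal Python sketch. It is not the SAS code used for the analysis: the class and field names are hypothetical, and it assumes that "2 of 3 criteria" means at least two criteria met within the same follow-up and that a single specific IgE value at or above the threshold at any time point counts as sensitization.

```python
from dataclasses import dataclass
from typing import Iterable, List

IGE_THRESHOLD_KU_L = 0.35  # sensitization threshold stated in the text


@dataclass
class FollowUp:
    """One follow-up assessment from age 3 onwards (hypothetical field names)."""
    diagnosis_ever: bool   # physician's diagnosis of asthma ever
    drugs_12m: bool        # asthma drugs in the last 12 months
    symptoms_12m: bool     # wheeze, shortness of breath or dry night cough in the last 12 months


def meets_asthma_definition(visit: FollowUp) -> bool:
    """Assumption: at least 2 of the 3 criteria at the same follow-up."""
    criteria_met = sum([visit.diagnosis_ever, visit.drugs_12m, visit.symptoms_12m])
    return criteria_met >= 2


def asthma_up_to_20(visits: Iterable[FollowUp]) -> bool:
    """Asthma if the definition is met at any follow-up."""
    return any(meets_asthma_definition(v) for v in visits)


def allergic_asthma(visits: List[FollowUp], specific_ige_ku_l: Iterable[float]) -> bool:
    """Asthma plus at least one specific IgE measurement >= 0.35 kU/l (assumed rule)."""
    sensitized = any(value >= IGE_THRESHOLD_KU_L for value in specific_ige_ku_l)
    return asthma_up_to_20(visits) and sensitized
```

Under these assumptions, a child with a diagnosis and symptoms at one follow-up counts as asthmatic, whereas a child with symptoms at one visit and drug use at a different visit does not.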

Statistical methods
Data management, data cleaning and statistical evaluation were carried out using the SAS system (version 9.3, SAS Institute Inc., Cary NC, USA). Children with missing information on one of the two items used for inclusion or one of the two used for the primary outcome definition were excluded from this analysis. Missing questionnaire items used for scoring were set to the baseline value of zero (items 3 to 10) for the primary approach. Random imputation of missing questions/answer categories (eg, questions 6 and 8) in the validation cohort used frequency estimates from the development dataset. For random imputation of the total score we reshuffled the validation cohort's actual scores between individuals. Both imputations were run 100 times, measures of performance were averaged, and between-imputation variation was accounted for in the calculation of confidence intervals. These approaches do not add further insight into the performance of the score, but allow assessment of robustness. Reshuffling the scores randomly gives non-informative point and precision estimates, facilitating interpretation of real performance measures. The actual score distribution in the development cohort was inferred from tabulated sensitivity and specificity, as it was not published in the original article.
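The reshuffling reference can be sketched in a few lines of Python. This is an illustration only, not the SAS procedure used here: the function names and the seed are hypothetical, discrimination is computed as a simple pairwise c statistic, and confidence intervals are omitted.

```python
import random
from statistics import mean


def auc(scores, outcomes):
    """c statistic: probability that a case scores higher than a non-case (ties count 1/2)."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum((c > n) + 0.5 * (c == n) for c in cases for n in controls)
    return wins / (len(cases) * len(controls))


def shuffled_reference_auc(scores, outcomes, runs=100, seed=0):
    """Average AUC after randomly reassigning the observed scores between individuals."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    aucs = []
    for _ in range(runs):
        permuted = list(scores)  # copy, so the observed scores stay untouched
        rng.shuffle(permuted)
        aucs.append(auc(permuted, outcomes))
    return mean(aucs)
```

Because shuffling breaks any link between score and outcome, the averaged AUC is expected to lie close to 0.5, which is the non-informative reference against which the observed AUC can be read.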
A univariate logistic model was used to derive measures of test performance (sensitivity, specificity, predictive values, likelihood ratios) and disease probability for each score, resembling what was reported originally. The odds ratio for developing asthma per 1-point increase of the score at 3 years as well as Nagelkerke's R² (maximum rescaled R², coefficient of determination standardized to its maximum) and the maximum rescaled Brier score were calculated as overall performance measures [24]. The max-rescaled Brier score is a measure of how well the predicted and the actual outcomes overlap: with a value of 0 the model adds no information to the a-priori prevalence, and a value of 1 indicates best-possible prediction. Discrimination was reported using the c statistic (AUC, area under curve); calibration/agreement was assessed graphically by plotting predicted disease frequency in eight groups against observed disease frequency.
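To make the 0-to-1 interpretation concrete, the following Python sketch uses one common definition of the max-rescaled Brier score, 1 - Brier / Brier_max, where Brier_max is the Brier score obtained when every child is assigned the observed prevalence. That the analysis used exactly this formula is an assumption, and the function names are hypothetical.

```python
from statistics import mean


def brier_score(predicted, observed):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return mean((p - y) ** 2 for p, y in zip(predicted, observed))


def max_rescaled_brier(predicted, observed):
    """1 - Brier / Brier_max; 0 = no better than the prevalence, 1 = perfect prediction."""
    prevalence = mean(observed)
    # Brier score of a model that predicts the prevalence for everyone
    brier_max = prevalence * (1 - prevalence)
    return 1 - brier_score(predicted, observed) / brier_max
```

With this definition, a model that returns the a-priori prevalence for every child scores 0, and a model that assigns probability 1 to every case and 0 to every non-case scores 1.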

Results

Study population
Our sample from the MAS-90 cohort is similar to the cohort population the score was developed in, with respect to prospective regular assessments. Unlike in the original cohort from Leicester, UK, where children were enrolled aged 1 to 3, the German MAS-90 participants were all recruited at birth and allergic parents were slightly oversampled (table 1). Of 1,314 study participants, 841 (64%) completed follow-up assessments at 3 and 8 years, of whom 140 (17%) met the primary inclusion criteria: wheeze or cough in the previous 12 months at age 3 years. This primary study sample was similar to all children followed up in terms of parental education and overweight, family's smoking habits, and atopic heredity (parents' self-reported allergies and cord blood Immunoglobulin E, table 2). 121 children from the primary sample (86%) were successfully traced up to the age of 20 years, of whom 93 (77%) underwent lung function testing.

Scoring
We identified corresponding items in the 3-year-questionnaire for 9 of 10 original scoring questions, 5 with perfect/very good and 1 with good comparability. One question (item 5, wheeze interfering with daily activities) was not asked in the validation cohort, but could be substituted by a proxy question on sleep disturbance by wheeze. One question (item 8, cause of wheeze/cough) could not be replaced by a meaningful alternative, which was assessed in sensitivity analyses as described later (table 3).
Frequency of answers was similar in the validation cohort compared to the original population for most items, except for shortness of breath at 16% vs. 35% in the development cohort and wheezing caused by 'physical stress' at 19% vs. 39% in the development cohort, which asked for wheezing caused by 'exercise, laughing, crying or excitement'. Furthermore, parents' respiratory illness was more common in the Leicester population, where the question asked for wheeze, asthma and bronchitis (mother 22%/father 17%), compared to MAS-90, where the question was limited to asthma only (8%/6%). The distribution of actual scores in the validation cohort (mean score 4.2, median 4) was very similar to what we derived from the report of Pescatore et al.
Outcomes
28 of 140 children (20%) who wheezed or coughed at 3 years of age met the primary outcome definition of asthma at 8 years (wheeze combined with asthma medication, both parent-reported). Regarding specific symptoms, 33 (26%) reported wheeze, 33 (26%) recent use of asthma drugs, and 27 (19%) a physician's diagnosis of asthma. 51 of 121 children (42%) followed up to 20 years developed asthma between 3 and 20 years; the majority were sensitized to aero-allergens (84%). 29 of 93 (31%) participants at age 20 reacted to Methacholine challenge in lung function testing. In the low risk group (score ≤5) asthma prevalence at 8 years was 9% (vs. 16% in original cohort), in those at medium risk (score 6-9) 45% (vs. 48%). Items used for inclusion and the primary outcome definition were comparable between the original and the validation cohort (table 4).

Model performance
Performance of the primary model in our sample was similar to the original cohort with sensitivity of 82% (original cohort 72%) and specificity of 69% (original cohort 71%) at score 5. Predicted disease probability at score 5 was 20%, by chance the same as the a priori disease frequency (Fig. 2). Overall performance of the primary model resembled the original analysis with a max-rescaled Brier score of 0.22, Nagelkerke's R² (max-rescaled) of 0.32 and an odds ratio of 1.7 (95% confidence interval 1.4-2.1). Discrimination between asthma vs. no asthma at age 8 years was better in our sample with an AUC (area under the curve) of 0.83 (95% confidence interval 0.75-0.91), compared to 0.78 in the Leicester sample. Replacing the primary case definition with a physician's diagnosis of asthma led to an even higher AUC of 0.89 (Fig. 3). Graphical assessment of agreement between observed and predicted disease frequencies revealed very good calibration of the original model (Fig. 4).
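How the odds ratio per score point translates into predicted probabilities can be approximated from the two rounded figures reported above (20% at score 5, odds ratio 1.7). The Python sketch below is a back-of-the-envelope reconstruction under the assumption of a logistic model that is linear in the score; it is not the fitted model, the constant and function names are hypothetical, and the results carry the rounding error of the inputs.

```python
import math

ODDS_RATIO_PER_POINT = 1.7   # reported odds ratio per 1-point score increase
ANCHOR_SCORE = 5             # score at which a predicted probability is reported
ANCHOR_PROBABILITY = 0.20    # reported predicted probability at the anchor score


def predicted_probability(score):
    """Approximate asthma probability at a given score from the rounded published figures."""
    anchor_logit = math.log(ANCHOR_PROBABILITY / (1 - ANCHOR_PROBABILITY))
    # each additional score point multiplies the odds by the odds ratio
    logit = anchor_logit + math.log(ODDS_RATIO_PER_POINT) * (score - ANCHOR_SCORE)
    return 1 / (1 + math.exp(-logit))
```

For example, one point above the anchor the odds rise from 0.25 to 0.25 × 1.7 = 0.425, which corresponds to a probability of roughly 30%.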

Sensitivity analyses
We reiterated discrimination and performance assessment for various inclusion, scoring and outcome criteria. The model performed best using cough as the only inclusion criterion at 3 years (AUC 0.91, Brier score 0.45) or a physician's diagnosis of asthma as outcome definition at 8 years (AUC 0.89, Brier score 0.34).
The model performed poorly in predicting response to the Methacholine

Discussion

Key results
We externally validated the recently developed asthma prediction tool [13] retrospectively on follow-up data of the German MAS-90 birth cohort study, which resembles major design aspects of the original cohort from Leicester, UK [20]. Nine of 10 scoring items were successfully mapped to our questionnaire at 3 years of age, with similar answer frequencies compared to the development cohort. The final score distribution and asthma frequencies at 8 years within low (9% reported asthma), middle (45%) and high risk (67%) score groups were close to the original sample. Measures of performance were similar (max-rescaled Brier score 0.22, max-rescaled/Nagelkerke's R² 0.32) and discrimination slightly better (AUC 0.83) compared to the Leicester sample (AUC 0.78). Sensitivity analyses revealed robust prediction for various definitions of inclusion, scoring and outcome criteria, with even better discrimination using the stringent outcome definition of asthma diagnosed by a physician (AUC 0.89).

Strengths and limitations
This retrospective validation analysis was made possible because follow-up assessments in the MAS-90 cohort included all data necessary to derive inclusion criteria, information for scoring according to the ten original asthma prediction tool questions, and outcome definitions. Such a rare opportunity is an ideal setting for external validation: a similar population unrelated in terms of location and sampling, assessed with the same or similar tools [15]. In the original cohort inclusion was based on the report of wheeze or cough at 3 years, for the assessment at 2 years (median age of original cohort) did not include these and other information necessary for scoring, and used the exact same questionnaire wording as we did in our birth cohort. The original sample was further limited to those having recently seen a physician in response to these symptoms. Such information was not collected in the MAS-90 cohort and could not be replaced by proxy data. However, model performance and discrimination were similar when applying different (secondary) inclusion criteria, with the subsample of children reporting wheeze giving the poorest discrimination (AUC 0.72) and those with cough the highest (AUC 0.91). Compared to recommended sample sizes for validation studies, the validation sample yielded a limited number of events and non-events [27], with slightly less precise estimates in the validation cohort.
Little of the information necessary to calculate the exact score for each child was missing in our sample, either because it was missing by item or because the corresponding question was not asked or was phrased differently. We approached the latter by mapping related questionnaire items containing proxy information. The two secondary definitions used for scoring gave almost identical performance and discrimination, one complementing data from earlier follow-up assessments, the other iteratively imputing missing items with random values. The minor difference in risk distribution with lower numbers for the highest scores in our sample and the predominantly Caucasian population limit generalizability of this validation. The different age distributions at risk assessment, including 1-2 year old children in the original cohort, may explain the lower performance measures in the original cohort. Of note, prevalence and severity of asthma-like symptoms are higher in the United Kingdom compared to Germany [28].
Wheezing in the past 12 months along with the use of asthma inhalers at 8 years was the outcome used for model development. Our 8-years' questionnaire asked non-specifically for any drugs against respiratory disease including those administered orally, leading to wider inclusion. A noteworthy feature of our analysis was the assessment of various (secondary) outcome definitions. Those criteria based solely on information from the 8-years' questionnaire gave similar performance and discrimination with the highest AUC of 0.89 for predicting asthma supported by a physician's diagnosis at 8 years. The asthma prediction tool performed worse predicting any asthma up to age 20 years (AUC 0.75), allergic asthma up to 20 years (AUC 0.80) and did not predict airway responsiveness at 20 years (AUC 0.64).

Appraisal
As expected for retrospective external validation, there was no perfect resemblance of tools and criteria. But robustness of performance and discrimination against changes of inclusion, scoring and outcome criteria gave confidence for the applicability of the asthma prediction tool in settings outside the development cohort.
Prediction usually performs best in the setting in which the model was developed, either internally on the same or a subset of the original sample, or externally in the same setting shifted geographically and/or in time [16,18]. To our surprise, the external validation that we performed in a different setting, the MAS-90 birth cohort from Germany, gave even better performance, discrimination and prediction. This could be due to differences in questionnaire administration: filled in by parents in the original cohort vs. based on face-to-face interviews in MAS-90, the latter presumably with higher discriminatory power. Additionally, the LASSO method (Least Absolute Shrinkage and Selection Operator) used for predictor selection while developing the score aims at fewer factors in the final model and higher external validity [29].

A single prediction tool
Several independent attempts to predict asthma development from preschool symptoms and presumed risk factors have been made, but none reached widespread application in clinical practice [9]. Most tools were never validated externally, never updated and refined further, and their health benefit never assessed. Instead of incrementally improving existing prediction tools, new but similar models with comparable sets of predictors were suggested. Validating this tool externally, we see our analysis as a first step towards a single and robust tool for the prediction of asthma development. This should be done in other cohorts retrospectively, and ongoing or future longitudinal research on asthma should consider collecting information detailed enough to facilitate external validation of existing prediction tools. Comparing performance between the model development cohort and various external settings provides valuable insight into the influence of sampling and inclusion, selectiveness of scoring information and outcome criteria.
From this general understanding, the existing model should be updated. This includes the identification of additional factors not yet incorporated, and the refining of weights by re-running the model. While the search for risk factors focuses on true, modifiable causes of the disease, this is not the case for estimating the individual probability of developing asthma. Non-modifiable traits such as parents' allergy status, indicators of pre-clinical disease (eg, early wheezing), sensitization, and environmental as well as behavioural exposures provide valuable information for prediction modelling and should be used to improve the model further.
Re-weighting prediction factors in external settings is often hampered by the lack of detailed descriptions of the model building process, which should always be made available online alongside initial publications, as done by Pescatore et al. [13]. In-depth sensitivity analyses are indispensable for robust re-weighting of parameter estimates, accounting for inclusion, score coding and different outcomes.
As a final step, impact analyses are essential to predict health benefits of applying asthma prediction tools in clinical practice. They could prove useful for early interventions, for targeting prevention strategies, or for improving sampling strategies for research.

Conclusion
This first external validation of the newly developed asthma prediction tool supports applicability outside the development cohort from Leicester, UK. Performance and discrimination were better in our external sample compared to internal validation, and robust against changes of inclusion, scoring and outcome definitions. Because it is a tool for timely diagnosis based primarily on early symptoms, we support its further development by incorporating risk factors, externally refining the underlying model and assessing the impact on health outcomes.