A modern approach to identifying and characterizing child asthma and wheeze phenotypes based on clinical data

‘Asthma’ is a complex disease that encapsulates a heterogeneous group of phenotypes and endotypes. Research to understand these phenotypes has previously been based on longitudinal wheeze patterns or hypothesis-driven observational criteria. The aim of this study was to use data-driven machine learning to identify asthma and wheeze phenotypes in children based on symptom and symptom history data, and to further characterize these phenotypes. The study population comprised an asthma-rich cohort of twins in Sweden aged 9–15 years (n = 752). Latent class analysis using current and historical clinical symptom data generated asthma and wheeze phenotypes. Characterization was then performed with regression analyses using diagnostic data: lung function and immunological biomarkers, parent-reported medication use and risk factors. The latent class analysis identified four asthma/wheeze phenotypes, early transient wheeze (15%), current wheeze/asthma (5%), mild asthma (9%) and moderate asthma (10%), as well as a healthy phenotype (61%). All wheeze and asthma phenotypes were associated with reduced lung function and an increased risk of hayfever compared to healthy. Children with mild and moderate asthma phenotypes were also more likely to have eczema, allergic sensitization and a family history of asthma. Furthermore, those with the moderate asthma phenotype had a higher eosinophil concentration (β 0.21, 95%CI 0.12, 0.30) compared to healthy and used short-term relievers at a higher rate than children with the mild asthma phenotype (RR 2.4, 95%CI 1.2, 4.9). In conclusion, using a data-driven approach we identified four wheeze/asthma phenotypes, which further characterization showed to be distinct from one another and which can be adapted for use by the clinician or researcher.


Introduction
Asthma is a heterogeneous disease, often characterized by wheeze, cough, chest tightness and shortness of breath, that is caused by multiple triggers and changes over the life course [1]. There has been a recent focus on disentangling the heterogeneity in order to identify specific phenotypes and endotypes for the purposes of better management and treatment of asthma and wheezing illnesses [2][3][4][5][6][7][8]. A number of modern data-driven machine learning approaches, such as latent class analysis (LCA), have been used to identify phenotypes [9,10]. The data-driven approach is hypothesis-free: it relies on the statistical model to generate clusters of phenotypes based on the variables added to the model rather than on pre-formulated hypotheses, and has been shown to be appropriate for use in complex diseases such as asthma [9].
To date, the variables used for LCA in children have consisted of wheeze patterns [6,8,11], growth patterns [12], atopic status [13][14][15] or a range of diagnostic criteria [10,16,17]. However, the majority of these studies are based on detailed longitudinal information from selected cohorts that, while useful in understanding disease progression, can be difficult to generalize to the average patient seen in the clinic on an irregular basis. Therefore, it is of value to focus on wheeze and asthma symptoms, as well as symptom history, that would typically be used in a clinician-led history or in a questionnaire by researchers. The aim of this study was first to use a data-driven approach to identify asthma and wheeze phenotypes based on symptom history data, and secondly to confirm that these phenotypes were relevant for clinicians and researchers by further characterization using diagnostic tests, biomarkers, asthma medication and risk factor history information.

Study population
The Childhood and Adolescent Twin Study in Sweden (CATSS) is a continually recruiting cohort that invites all 9- and 12-year-old twins born in Sweden from July 1992 onwards to participate in interviews on health and development [18]. The Swedish Twin study on Prediction and Prevention of Asthma (STOPPA) is an asthma-rich cohort recruited from CATSS and has been described and reported on previously [19]. In brief, the goal of the STOPPA cohort was to identify an asthma-rich cohort from CATSS that could be studied in more depth with clinical and biometric examination. Based on questions validated through the International Study of Asthma and Allergies in Childhood (ISAAC) [20] in the CATSS interview material at age 9 or 12, an algorithm was created to identify same-sex twins born 1997–2004 who: both had asthma (concordant asthma), one had asthma and the other did not (discordant), or both had no asthma (healthy concordant). In total, 6,174 twins were eligible for STOPPA. However, since discordant twins made up only 13% of all eligible twins and the objective was to recruit equal numbers of twin pairs with concordant asthma, concordant healthy and discordant status, a sample of 1,448 was contacted; 870 agreed to participate and 752 attended the clinical examination, a response rate of 52%.
The STOPPA cohort participated in clinical testing and their parents completed questionnaires when the children were 9-15 years of age. Data in STOPPA was linked by personal identity number to nation-wide registers held in Sweden by the National Board of Health and Welfare and Statistics Sweden, including the Medical Birth Register (MBR) and the Longitudinal Integration Database for health insurance and labour market studies (LISA) [21].
The study was approved by the Regional Ethical review board in Stockholm, Sweden. Informed written consent for the study was obtained from all children and their parents.

Variables used in the latent class analysis
The 17 variables used for the LCA were based on wheeze and asthma symptoms. These variables were chosen because they are representative of the questions a clinician may ask patients and their parents when taking a 'patient history' in order to determine an asthma phenotype and subsequent disease management. The symptom and symptom history variables were based on ISAAC questions [20] and the Global Initiative for Asthma (GINA) guidelines [1] (S1 Table). Ever asthma, age of FIRST asthma, wheezing or breathlessness attack and age of LAST breathlessness attack were reported by parents during the initial CATSS interview when the twins were 9 years old. The other asthma and wheeze questions were parent-reported at STOPPA recruitment. These included: ever wheeze; wheezing episode in the last 12 months (and, if 'yes', how many times); wheeze due to a cold; current snoring; and current asthma. If current asthma was 'yes', the following variables were also asked about: asthma diagnosed by a doctor and the age of asthma diagnosis; and, in the last 12 months, limited speech to one/two words per breath due to asthma; breathing difficulties due to asthma; woken by asthma; disturbed in daily activity due to asthma; acute visit to emergency or general practitioner for asthma; admitted to hospital for asthma. For all who answered 'no' to current asthma, these variables were all set to 'no'.
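The skip-pattern rule in the last sentence above can be sketched in Python as follows. This is only an illustration: the field names are hypothetical (the text does not give the questionnaire's variable names), and the age of asthma diagnosis is left out because the text does not say how a non-binary item was coded.

```python
# Hypothetical field names for the yes/no follow-up items asked only when
# current asthma was 'yes'; the real STOPPA variable names are not given.
FOLLOW_UP_ITEMS = [
    "doctor_diagnosed",
    "limited_speech",
    "breathing_difficulties",
    "woken_by_asthma",
    "disturbed_daily_activity",
    "acute_visit",
    "hospital_admission",
]


def apply_skip_pattern(record):
    """Return a copy of `record` in which every follow-up item is 'no'
    whenever current asthma was answered 'no'."""
    cleaned = dict(record)
    if cleaned.get("current_asthma") == "no":
        for item in FOLLOW_UP_ITEMS:
            cleaned[item] = "no"
    return cleaned


# A child without current asthma gets 'no' on all follow-up items.
example = apply_skip_pattern({"current_asthma": "no", "ever_wheeze": "yes"})
```

Records with current asthma 'yes' are returned unchanged, so any follow-up answers the parent actually gave are preserved.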

Variables used to further characterize asthma and wheeze phenotypes
Possible risk factors. Parental history of asthma, environmental tobacco smoking exposure, breast-feeding exposure and history of dog ownership were parent-reported in STOPPA. Birth weight, gestational age and parity were obtained from the MBR, and highest parental education gained from the LISA.
Allergy and immunological variables. The child ever having eczema and ever having hayfever were parent-reported. Allergic sensitivity was recorded as positive if sera measured ≥ 0.35 kU/l for Phadiatop 1 (Thermo Fisher Scientific, Uppsala, Sweden). Those positive were then further analysed for specific IgE (sIgE) antibodies to the single allergens of cat, dog, horse and birch. A sIgE result ≥ 0.7 kU/l was considered to be positive. Those participants who were negative for Phadiatop 1 were assigned the value 0.09 kU/l (the level below quantification) for each sIgE [22]. A complete blood count of blood samples provided information on eosinophil and neutrophil granulocyte concentrations (cells ×10⁹/L) as well as lymphocyte particle concentrations (cells ×10⁹/L).
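The sensitization rules described above can be written as a short Python sketch. It assumes that both cut-offs are inclusive (a value at or above the threshold counts as positive); the function and key names are illustrative and are not taken from the study.

```python
PHADIATOP_CUTOFF = 0.35      # kU/l, screening threshold stated in the text
SIGE_CUTOFF = 0.7            # kU/l, specific IgE threshold stated in the text
BELOW_QUANTIFICATION = 0.09  # kU/l, value assigned to screen-negative children
ALLERGENS = ("cat", "dog", "horse", "birch")


def classify_sensitization(phadiatop, sige=None):
    """Classify one child from a Phadiatop result and optional sIgE values.

    `sige` maps allergen name to a measured value in kU/l; it is only
    consulted when the screening test is positive.
    """
    sensitized = phadiatop >= PHADIATOP_CUTOFF
    if sensitized:
        values = {a: (sige or {}).get(a) for a in ALLERGENS}
    else:
        # Screen-negative children are assigned the below-quantification value.
        values = {a: BELOW_QUANTIFICATION for a in ALLERGENS}
    positive = {a: (v is not None and v >= SIGE_CUTOFF) for a, v in values.items()}
    return {"sensitized": sensitized, "sige": values, "positive": positive}
```

For example, `classify_sensitization(0.10)` marks the child as not sensitized and fills every allergen with 0.09 kU/l, which keeps the sIgE variables complete for the regression analyses.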
Asthma medication. History of the child's asthma medication was parent-reported in STOPPA. Parents were initially asked if the child currently took any asthma medication. If they answered 'yes', they were then asked if the child had taken a short-acting medication (β2 agonist) in the last week, and details about whether the child took regular or periodic asthma medication in the last 12 months including: inhaled corticosteroids (ICS), long-acting β2 agonists, combination medications of ICS and fast-acting β2 agonists, leukotriene receptor antagonists (LTRA) or systemic corticosteroids.
Lung function and fractional exhaled NO (FeNO). FeNO (parts per billion), a noninvasive biomarker of airway inflammation, was measured with a hand-held electrochemical analyzer (NIOXMino, Aerocrine, Solna, Sweden) or FeNO test analyzer (Ecomedics, Duernten, Switzerland). Each subject performed the test at least twice; if there was a >5% difference between the first two measurements, a third attempt was performed [19].
Participants performed spirometry [19] to ascertain forced expiratory volume in the first second after a maximal inhalation (FEV1) and forced vital capacity, the total volume expired after maximal inhalation (FVC), before and 15 minutes after inhaling a bronchodilator (0.5 mg of terbutaline) to test reversibility. At least three attempts with high reproducibility (<0.15 L between the two highest values) were required for each procedure, and the maximum value of the attempts was used for the analyses. Lung function values were converted to z-scores based on the Global Lung Function Initiative reference values, taking sex, age, height and ethnicity into account [23]. Reversibility was calculated as the percentage change in FEV1 from baseline: (post-FEV1 − pre-FEV1) / pre-FEV1 × 100.
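As a worked illustration of the spirometry rules above, the following Python sketch computes reversibility as the percentage change in FEV1 from baseline and applies the reproducibility criterion (less than 0.15 L between the two highest of at least three attempts). The function names are illustrative and the example values are invented, not study data.

```python
def reversibility_percent(pre_fev1, post_fev1):
    """Percentage change in FEV1 from the pre-bronchodilator baseline."""
    if pre_fev1 <= 0:
        raise ValueError("pre-bronchodilator FEV1 must be positive")
    return (post_fev1 - pre_fev1) / pre_fev1 * 100.0


def best_reproducible(attempts, tolerance=0.15):
    """Return the maximum of `attempts` (litres) if there are at least three
    attempts and the two highest differ by less than `tolerance`; otherwise
    return None to signal that the manoeuvre should be repeated."""
    if len(attempts) < 3:
        return None
    highest, second = sorted(attempts, reverse=True)[:2]
    if highest - second < tolerance:
        return highest
    return None


# Invented example: three pre- and three post-bronchodilator attempts.
pre = best_reproducible([2.00, 1.95, 1.90])
post = best_reproducible([2.10, 2.05, 2.00])
change = reversibility_percent(pre, post)  # roughly 5 percent
```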

Statistical analysis
The LCA was conducted in Mplus version 7.31 (Muthen & Muthen, Los Angeles, CA) to determine phenotypes of wheeze and asthma based on the unsupervised associations found between variables. Individuals were assigned to the class for which they had the highest posterior probability of belonging. Starting with a model assuming 2 phenotypes, we compared the model fit of increasing numbers of phenotypes (up to 7) using the Bayesian information criterion (BIC), the Akaike information criterion (AIC) and the Lo-Mendell-Rubin test (LMR). The entropy index was used to assess how clearly individuals were separated into the given number of classes. Different starting values for the algorithm iterations were used to avoid local maxima. Results in the LCA are presented as conditional probabilities (CP). These represent the probability of an individual in a given class of the latent variable being at a particular level of the observed variable. Based on the CP within each latent class, a label for each class was determined by the authors.
In order to characterize the phenotypes further, proportions and means of potential risk factors, allergy, immunological markers, medication use, FeNO and lung function were calculated. Supervised analysis involved generalized linear models to calculate relative risks (RR) for dichotomous variables and β-estimates for continuous variables, comparing each latent class with the 'healthy' phenotype. Only significant associations are presented. For allergy, immunological markers, asthma medication and respiratory function, comparisons with RR, β-estimates and 95% CI were also made between the 'moderate asthma' phenotype and the 'mild asthma' phenotype to provide further differentiation between these two asthma phenotypes. We accounted for clustering of observations within twin pairs by using the robust sandwich estimator for standard errors. Missing data were assumed to be missing at random.
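To make the model-comparison quantities concrete, the sketch below gives the textbook definitions of AIC, BIC, a relative entropy index and modal class assignment in Python. It is not the Mplus implementation; in particular, the assumption that the reported entropy index corresponds to the relative entropy formula used here is ours, and the function names are illustrative.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values indicate better fit."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; penalizes parameters by log(n)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def relative_entropy(posteriors):
    """Relative entropy in [0, 1]; values near 1 mean individuals are
    assigned to classes with little ambiguity.

    `posteriors` is a list of rows, one per individual, each holding the
    posterior probability of membership in every class.
    """
    n = len(posteriors)
    n_classes = len(posteriors[0])
    uncertainty = -sum(p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1 - uncertainty / (n * math.log(n_classes))


def modal_class(row):
    """Index of the class with the highest posterior probability."""
    return max(range(len(row)), key=row.__getitem__)
```

With these definitions, a perfectly separated solution such as `[[1.0, 0.0], [0.0, 1.0]]` has a relative entropy of 1, while a row of equal probabilities contributes maximal uncertainty.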
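For readers unfamiliar with relative risks, the Python sketch below computes an unadjusted RR with a log-scale (Katz) 95% confidence interval from a 2×2 table. This is a deliberate simplification and not the study's method: it does not fit a generalized linear model and it ignores the clustering of twins within pairs, which the study handled with a robust sandwich estimator. The counts in the example are invented.

```python
import math


def relative_risk(exposed_events, exposed_total,
                  reference_events, reference_total, z=1.96):
    """Unadjusted relative risk with a Katz log-scale confidence interval.

    Returns (rr, lower, upper). Assumes independent observations, so the
    interval would be too narrow for clustered data such as twin pairs.
    """
    risk_exposed = exposed_events / exposed_total
    risk_reference = reference_events / reference_total
    rr = risk_exposed / risk_reference
    se_log_rr = math.sqrt(
        1 / exposed_events - 1 / exposed_total
        + 1 / reference_events - 1 / reference_total
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Invented counts: 20 of 50 children in one class versus 10 of 50 in the
# reference class report the outcome.
rr, lower, upper = relative_risk(20, 50, 10, 50)
```

An interval whose lower bound is above 1 corresponds to what the text calls a significant association.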
Data management and statistical analyses (apart from the LCA) were conducted using SAS 9.4 (SAS Institute Inc., USA) and STATA 15.1 (StataCorp, USA).

Results
The best-fitting LCA model was the 5-class model, based on the lowest BIC and AIC, the LMR, which suggested that this model was significantly different to the 4-class model, and the entropy index approaching 1 (S2 Table). Table 1 displays the conditional probabilities of each symptom or symptom history variable included in the LCA for each latent class. The five phenotypes were given labels to best describe the profile of conditional probabilities: 'Healthy', 'Early transient wheeze', 'Current wheeze/asthma', 'Mild asthma' and 'Moderate asthma'. The class probability, i.e. the probability of belonging to the class the individual was assigned to, was higher than 0.5 for all individuals.

Early transient wheeze
Children with this phenotype did not report current wheeze or asthma (Table 1). However, they had a history of wheeze or asthma (CP 82% ever wheeze and 79% ever asthma), with most of the asthma cases beginning before two years of age. Characterization: children with the 'early transient wheeze' phenotype did not differ in risk factors from the 'healthy' phenotype, although 61% of this group were boys (p = 0.05) (Fig 1 and Table 2). They had a similar allergy and immune profile to the 'healthy' phenotype (Table 3 and Fig 2), and were unlikely to use any asthma medication (Table 4). The most outstanding feature of this phenotype was that, despite having no reported current asthma or wheeze, the children were more likely to have reduced lung function than 'healthy' children, having lower pre and post FEV1. They also had a greater degree of reversibility, 4.87 ±5.14%, p<0.01 (Table 5).

Current wheeze/asthma
Children with this phenotype typically had wheeze in the last 12 months (Table 1). Half attributed their wheeze to having a cold; some reported ever asthma (CP 32%) but not current asthma. Characterization: these children did not differ in risk factors from the 'healthy' phenotype (Fig 1, Table 2). They were twice as likely to have hayfever and had an increased risk of horse and dog specific allergy compared to the 'healthy' phenotype (Table 3, Fig 2). In addition, these children had slightly lower lung function than the 'healthy' phenotype (Table 5) and minimal asthma medication use (Table 4). Taken together, this phenotype seems to represent wheeze due to allergy or viruses, or mild undiagnosed asthma.

Mild asthma
Children with this phenotype reported current asthma but no wheeze in the last 12 months (Table 1). However, some children reported symptoms due to asthma, including breathing difficulties and disturbance in daily activities. Characterization: children with 'mild asthma' were more likely to have a family history of asthma, and more likely to be born preterm, than the 'healthy' phenotype (Fig 1, Table 2). These children had a notable allergic profile. They were more likely to have ever eczema, hayfever and allergic sensitivity compared to 'healthy' and had IgE sensitivity to each of cat, dog, horse and birch (Fig 2, Table 6). In addition, they were [...] (Table 3). Of children with the 'mild asthma' phenotype, 56% used asthma medication in the last 12 months. This included 29% using regular preventers and 37% using periodic preventers (Table 4). Only 15% had taken a short-acting beta-agonist (SABA) more than twice in the last week. Average exhaled NO concentration in these children was higher than 'healthy' by 6.05 parts per billion (Table 5). Pre-FEV1 and the pre-FEV1/FVC ratio were reduced, but lung function improved to the level of the 'healthy' phenotype after taking terbutaline (Table 5).

Moderate asthma
Children with this phenotype had current asthma and wheeze that was disturbing their lives. Approximately half had experienced more than four episodes of wheeze in the last twelve months (Table 1). All reports of uncontrolled asthma symptoms in the last 12 months were found in this cluster: asthma limiting speech to one or two words between asthma attacks (n = 5), admission to hospital (n = 2) and unscheduled visit to the emergency or general practitioner for asthma (n = 8). In addition, there was a high probability of having breathing difficulties due to asthma more than once per week, or disturbance to daily activities by asthma in the last 12 months. Characterization: similar to 'mild asthma', children with the 'moderate asthma' phenotype were more likely to have a family history of asthma compared to the 'healthy' phenotype (Fig 1, Table 2). A notable feature of this group was that the risk of also having hayfever was higher than both the [...] (Table 6). This pattern was similar for allergic sensitivity as well as for each of the specific allergens and eosinophil count (Table 3). In comparison with the 'mild asthma' phenotype, children with the 'moderate asthma' phenotype were more likely to be using: any asthma medication (RR 1.5, 95%CI 1.1, 1.9), SABA twice a week (RR 2.4, 95%CI 1.2, 4.9), and regular (RR 1.6, 95%CI 1.0, 2.5) or periodic (RR 1.7, 95%CI 1.1, 2.5) preventers in the last 12 months (Table 4). They had a higher exhaled NO concentration than 'healthy' by 11.14 parts per billion (SE 2.9) (Table 5). The respiratory function results were decreased but not significantly different to the 'mild asthma' phenotype (Table 5).

Discussion
Using hypothesis-free testing based on asthma and wheeze symptoms and symptom history we identified four clusters of disease in children and young adolescents: early transient wheeze, current wheeze/asthma, mild asthma and moderate asthma. Further characterization of these phenotypes based on risk factors, allergy and immune profiles, asthma medication use and lung function testing in comparison to a healthy group reinforced the uniqueness of each of these disease states which can be used to better understand and manage asthma and wheezing illness.
'Early transient wheeze' is a well-recognized phenotype and our results are consistent with others' findings: 'early transient wheezers' have no or low atopy, no current wheeze, minimal asthma medication use, and stable FeNO [5,6,8,11]. However, despite no current reported symptoms, lung function in this group is reduced, and the phenotype is more prevalent in boys [5][6][7]11]. An explanation for 'early transient wheeze' with decreased lung function could be that these children have smaller lungs and more restricted airways in early life. The Tucson study found that children with diminished lung function at birth were more likely to have early wheeze [24] and that these same children continued to have impaired lung function at age 22 [25]. This finding that the early wheeze phenotype is associated with long-term impaired lung function has been confirmed by other studies [7,26]. Taken together, this may mean that children with early wheeze are at risk of obstructive lung disease later in life even if the wheeze is transient and they appear to have no other asthmatic or allergic symptoms.
The 'current wheeze/asthma' phenotype appears to be a generally healthy cluster with occasional wheeze triggered by a range of sources such as viruses and/or allergens. However, it may be useful to identify wheeze triggers and avoidance strategies within this group. A similar phenotype in slightly older adolescents was identified in the Isle of Wight study as 'undiagnosed wheezers' with strong associations with paracetamol use and smoking [27].
Our study found two phenotypes of current asthma, identified as 'mild asthma' and 'moderate asthma' in the LCA. The different labels were based on the effect of asthma on day-to-day life in terms of disturbed daily activity, waking up from asthma, exacerbations and recent wheeze, all of which were higher in the 'moderate asthma' phenotype. Both these asthma phenotypes have similarities with 'persistent' wheeze found in other longitudinal wheeze studies, including allergic sensitization, reduced lung function, increased reversibility, parental asthma, and higher FeNO [5,8,17]. It may therefore be that we have identified two phenotypes within the persistent wheeze phenotype. Belgrave et al. used a longitudinal latent class item response model to identify clusters based on parental and clinician reported wheeze over 8 years. They were also able to split persistent wheeze into two further groups, 'persistent controlled wheeze' and 'persistent troublesome wheeze' [11]. Those with persistent troublesome wheeze have a similar profile to our 'moderate asthma', that is, they were more likely to be sensitized, have eczema, use ICS medication, have hospital admissions and asthma exacerbations compared to persistent controlled wheezers. A concerning issue is that a large proportion of the children with 'moderate' asthma identified in the STOPPA cohort have regular wheeze and asthma-related disturbances, and use reliever medication regularly, suggesting the asthma is not well controlled. However, less than half take regular medication for their asthma, with most using their preventer medication periodically. Taken together, this would suggest that the health and quality of life of those with the moderate asthma phenotype would benefit from better medication adherence and monitoring of disease. This is the first study that we know of that has applied latent class analysis to asthma and wheezing symptoms and symptom history taken at a single point in time rather than longitudinally.
Although longitudinal analysis is superior in many ways to the cross-sectional study, we sought to identify and characterize phenotypes based on a series of questions that can be used either by a clinician taking a patient history or by a researcher conducting a survey. The questions included in the LCA were based on the asthma and wheeze questions in ISAAC [20] and GINA [1] and follow questions commonly asked by clinicians during a consultation with a child and their parent. The value of using LCA is that the cluster groups are not chosen a priori but are determined statistically, based on the assumption that all associations between the included variables are due to unobserved latent classes representing disease-specific mechanisms or endotypes; that is, the analysis is 'unsupervised' [10]. The variables used in the characterization or 'supervised' analysis align with further follow-up questions regarding risk factors, allergies, medication use and diagnostic tests that the clinician may apply. These confirmed that the phenotypes the LCA had revealed are unique, have clinical relevance and overlap with those found in other studies.
With regard to limitations, this study was cross-sectional; although this makes it relevant to patients seen irregularly in clinics, it does not capture disease trajectory, nor is it possible to assess the predictive validity of the observed classes for any outcomes. Secondly, there may be bias in the LCA due to parent recall (over- or under-reporting of the variables) or because of symptom modification from medication usage. Another possible limitation of our study is that it is not as large as other studies and may lack further breakdown into even more specific clusters. However, as the STOPPA cohort is an asthma-rich cohort with asthma pairs selected for inclusion, we had increased power to discover asthma and wheeze clusters. Finally, there may be issues with generalizability to singletons, as twins are generally born earlier and smaller (as can be seen in Table 2). However, the risk of asthma conferred by smaller birth weight and gestational age is observed in young twins, and has dissipated by older childhood, on which our study is based [28].
In conclusion, unsupervised analysis of data from respiratory symptom and symptom history questions identified four wheeze/asthma phenotypes and one healthy phenotype in children and adolescents that were shown to have unique physiological, immunological and medication profiles. These phenotypes are largely similar to others found in the literature based on longitudinal data, supporting the validity of symptom and symptom history data as a means of identifying clinically and research-relevant phenotypes. Further characterization of these phenotypes highlighted that children and adolescents with moderate asthma may be underutilizing preventer medication and over-utilizing acute medications, reinforcing the continued need for monitoring and treatment management of children with moderate asthma whose asthma [...].
Supporting information S1