Risk stratification integrating genetic data for factor VIII inhibitor development in patients with severe hemophilia A

Replacement therapy in severe hemophilia A leads to factor VIII (FVIII) inhibitors in 30% of patients. Factor VIII gene (F8) mutation type, a family history of inhibitors, ethnicity and intensity of treatment are established risk factors, and were included in two published prediction tools based on regression models. Recently investigated immune regulatory genes could also play a part in immunogenicity. Our objective is to identify bio-clinical and genetic markers for FVIII inhibitor development, taking into account potential genetic high order interactions. The study population consisted of 593 and 79 patients with hemophilia A from centers in Bonn and Frankfurt respectively. Data was collected in the European ABIRISK tranSMART database. A subset of 125 severely affected patients from Bonn with reliable information on first treatment was selected as eligible for risk stratification using a hybrid tree-based regression model (GPLTR). In the eligible subset, 58 (46%) patients developed FVIII inhibitors. Among them, 49 (84%) were “high risk” F8 mutation type. 19 (33%) had a family history of inhibitors. The GPLTR model, taking into account F8 mutation risk, family history of inhibitors and product type, distinguishes two groups of patients: a high-risk group for immunogenicity, including patients with positive HLA-DRB1*15 and genotype G/A and A/A for IL-10 rs1800896, and a low-risk group of patients with negative HLA-DRB1*15 / HLA-DQB1*02 and T/T or G/T for CD86 rs2681401. We show associations between genetic factors and the occurrence of FVIII inhibitor development in severe hemophilia A patients taking into account for high-order interactions using a generalized partially linear tree-based approach.


Introduction
For severe hemophilia A (HA) patients, the current standard of care includes regular prophylactic infusions of factor VIII (FVIII) products in order to prevent spontaneous bleeds or on demand infusions to treat bleeds. The main concern nowadays is the development of inhibitors that neutralize the activity of the FVIII molecule, which occurs mainly in the first 20 days of exposure for approximately 30% of the patients. In this context, the search for risk factors for immunogenicity of FVIII products is of primary concern in order to understand the mechanisms leading to the development of inhibitors and ultimately to prevent their development.
Many factors (patient-, disease-or product-related) could influence the potential risk for immunogenicity of biotherapeutics, but the relative contributions of these factors to the development of neutralizing antibodies is currently not completely understood. Several risk factors of inhibition against FVIII products are well recognized, such as factor VIII gene (F8) mutation type, a family history of inhibitors, ethnicity, intensity [1], but others are still under debate. Concerning the product type, it was shown in a randomized prospective trial (SIPPET) that patients treated with plasma-derived factor VIII containing von Willebrand factor had a lower incidence of inhibitors than those treated with recombinant factor VIII [2].
In this search for risk factors of immunogenicity, the genetic diversity of immune regulatory genes, which may have a role in the immunogenicity of FVIII products, has been the subject of recent investigations [3,4]. Table 1 gives a summary of recently published results, which have focused on specific HLA alleles and immune genes.
For the purpose of risk prediction, researchers have also investigated the potential of classic regression models for quantifying an individual's risk using a clinical scoring system. In this context, two studies have investigated immunogenicity risk factors and proposed prediction scores [16,17]. The first study was based on 332 previously untreated patients (PUPs) born in the 90s and the method was validated on an external cohort of 64 patients. Predictive factors were positive family history, high risk of F8 mutation and initial intensive treatment [16]. The second study included 825 PUPs born 1990-2007 and was validated internally. The predictors were the same as in the first study with an improved definition of treatment intensity combining dose and duration of intensive treatment [17].
Our main objective was to identify bio-clinical and genetic markers associated with the development of FVIII inhibitor taking into account potential genetic high order interactions. The reason for also considering non-linear approaches is that high order interactions between genetic variants are to be expected. We thus decided to consider a recursive, partitioning model (tree-based model) which is better suited for exploring such interactions than standard regression models.
The increasing interest in using regression tree-based models for studying hemophilia A has been clearly discussed by Henrard et al. in a recent article [18]. Compared to a generalized linear model (linear, logistic), the authors underlined the fact that regression trees can easily cope with interactions and identify them in the final model. Such interactions are of primary concern when dealing with HLA markers, and the use of regression trees could provide meaningful subdivisions of HLA markers that are in strong linkage disequilibrium [19].
In this research, we used a hybrid strategy that combined a linear structure for the variables that were not expected to interact with each other and a tree-based structure for those that were expected to interact. In practice, the tree-based model is considered in a second step after adjustment on the variables that are found to be significant in the multivariate logistic model. The main aim is to be able to detect important genetic interactions that could have been overlooked by the logistic model. This tree-based model is a hybrid multivariate structure called Generalized Partially Linear Tree-based Regression (GPLTR) that integrates the advantages of generalized linear regression and tree-based models. The linear part is used to model the effect of the known classic risk factors and the nonparametric tree part is used to capture the distributional shape of the data using the complex joint effects of multiple genetic explanatory variables.
The analyses were performed on cohorts of hemophilia A patients with a long follow-up from two German centers that are reference laboratories for anti-FVIII antibody testing. From this dataset, we evaluated the combined role of clinical and genetic factors in the immunogenicity of FVIII products on a selected population of severe HA PUPs in order to reduce the magnitude of any potential confounding variables associated with therapeutic changes over time.

Design and study population
The population eligible for inclusion in this historical cohort analysis was selected from two German sites (Bonn and Frankfurt) under the leadership of the ABIRISK EU-consortium. For Bonn, follow-up data was collected prospectively for HA patients (children and adults) treated with FVIII products since 1967. The Bonn database was created in 1978 and became an electronic system in 1990. For Frankfurt, data for HA patients having entered services since 1979 was collected in an electronic database created in 2005. For both databases, patients could have been treated previously elsewhere before registering at the current sites.
In all, 593 patients with severe HA from the Bonn database and 79 from the Frankfurt database were included in the present study. Data were fully anonymised prior to access for the analysis.
SNP variants for IL-10 1082A>G (rs1800896), CTLA4 CT60A>G (rs3087243), TNF 308G>A (rs1800629), CD32 500 A>G (rs1801274), MAPK9 (rs4147385) were genotyped. For the CD86 gene, four biallelic SNPs were investigated: rs2715267 in the promoter region, rs2681417 in the exon 4 region, rs1129055 in the exon 7 region and rs2681401 in the untranslated transcribed region (UTR). These four biallelic SNPs were selected as candidate SNPs for the analysis, since a team from the ABIRISK consortium has previously shown that antigen presenting cells are activated when exposed to specific FVIII concentrates and that this activation correlates with CD86 expression levels [20].
All these biallelic polymorphisms were detected by PCR amplification and direct sequencing.
Regarding HMOX-1, the 5'-flanking region of the HO-1 gene containing the (GT) n dinucleotide repeat was amplified as described elsewhere [21]. Each repeat number was calculated with GeneMapper (Applied Biosystems). To confirm the size of (GT) n repeats, selected samples were subjected to a sequence analysis.

Candidate variables
Data on patient and disease characteristics, such as the mutation type of the F8 gene classified as "high risk" (large deletions, nonsense mutation, intron 22 or intron 1 inversions) versus "low risk" (small deletions/insertions of <200 base pairs, missense mutations, and other Pfizer for research support and speakers bureau, grants and personal fees from Shire for research support and speakers bureau, grants and personal fees from Sobi for research support and speakers bureau, personal fees from Roche for expert testimony, outside the submitted work. The research leading to these results has received support from the Innovative Medicines Initiative Joint Undertaking under grant agreement no [115303], resources of which are composed of financial contribution from the European Union's Seventh Framework Programme (FP7/2007-2013) and EFPIA companies. J.E. Davidson, A. Hincelin-Mery: belong to EFPIA (European Federation of Pharmaceutical Industries and Association) member companies in the IMI JU and costs related to their part in the research were carried by the respective companies as in kind contributions under the IMI JU scheme. This does not alter his adherence to PLOS ONE policies on sharing data and materials. All the company-employed authors declare that this does not alter their adherence to PLOS ONE policies on sharing data and materials. mutations including splice site defects), family history of factor VIII inhibitors, and blood group, were considered in the present study. The type of the first FVIII product (plasma-derived versus recombinant product) was available for a restricted sub-cohort of patients who were first treated in the center hosting the database after its creation, or who had a reliable documentation of their first treatment if they were treated elsewhere. For readability, this sub-cohort is referred to by the term "well-documented" population.
The genetic factors mentioned in the previous section were also considered as potential markers.
Data was standardized into a common format and entered into the ABIRISK tranSMART database, which integrates immunogenicity data from a number of European countries in a well-structured format. The data is harmonized across different cohorts and the format is based on the CDISC standards (http://www.cdisc.org/). Where no previous variable description can be found in CDISC, a local variable description is used. Data was prepared by data custodians using a data load plan describing the variable semantics and format. Anonymized data was uploaded into the ABIRISK database, which is based on the tranSMART platform [22]. The tranSMART platform is an open-source knowledge management platform for translational science, supported by a large number of organizations (http://www.transmart foundation.org/).

Outcome definition
The inhibitor status of hemophilia patients was defined as being positive (inhibitor patient group, inh+) or negative (control patient group, inh-) from laboratory results of current Bethesda assays, or historical results for earlier patients. Positivity in laboratories is defined by two consecutive assays with a Bethesda assay result of 0.6 BU or more and no detectable FVIII activity assessed by FVIII activity assay below 1%. All the patients included had been treated for at least 50 days (50 days of exposure).

Statistical analyses
The selection of patients for the different analyses is described in a flowchart in Fig 1. In the Bonn entire cohort, people were born from 1920 to 2012. This interval was cut in three smaller intervals: before 1978, 1978-1995 and after 1995. Inhibitor positivity rates were different in these three categories.
A descriptive analysis of patient, disease and treatment characteristics for the entire dataset was performed with univariate logistic regressions taking into account birth cohort effect and "well-documented" status. To avoid blurring any association between a variable and the outcome, these analyses were run separately on Bonn and Frankfurt datasets: a first exploration was carried out on Bonn, then a confirmation on Frankfurt's dataset. If results were consistent, analyses were performed on the pooled cohort.
For each polymorphism, genotype frequencies were determined. Descriptive analyses of genetic markers were performed with chi-square tests (or Fisher's exact test as needed) and univariate logistic regression. A classic additive genetic model was considered.
The multivariate analyses were performed only on the "well-documented" population as previously defined, in order to obtain unbiased analyses. Risk stratification of factor VIII inhibitors Two multivariate models were considered. The first model was a classic multivariate logistic regression model where both clinical and genetic factors were candidate variables. The model was constructed using a stepwise forward strategy, starting with variables having univariate pvalues below 0.2, and proceeding with entry and removal criteria at p-values of 0.05.
The second model was the Generalized Partially Linear Tree-based Regression (GPLTR). This tree-based model provides a classification of patients in homogeneous groups in terms of risk of inhibitor development and identifies relevant genetic interactions. This procedure is a hybrid multivariate approach combining both GLM (generalized linear models) and CART (classification and regression trees). The GPLTR modeling procedure is described elsewhere [23]. Briefly, an iterative procedure is used in a first step to build trees that are grown to maximum size and adjusted on selected variables. A forward procedure is used in a second step to compute a set of nested subtrees. The optimal GPLTR tree is finally selected via the Bayesian Information Criterion (BIC).
As regression-trees are prone to instability, i.e. a small change in the data set can result in different series of splits, thus making variable selection somewhat precarious, we also constructed multiple trees using a bootstrapping approach [24]. This procedure is not intended to give prediction since our dataset is underpowered for reaching such objective. However, it is a way to evaluate the stability of the association results. More precisely, the bagging procedure enables computation of variable importance measures that assess the relevance of each variable across the set of trees. It provides a way to rank the variables according to their discriminative power. Variables that are associated with the outcome have large values as compared to those who are not associated with. These latter results provide us some arguments regarding the reliability of the selected optimal GPLTR tree.
Statistical analyses were performed using R software (version 3.3.1. software). Haplotype analyses were performed using the 'HaploStats' R package. Regression tree analyses were performed with the 'GPLTR' R package [25].

Results
The dataset comprised 593 and 79 patients with severe HA from the Bonn and the Frankfurt databases respectively. Univariate analyses for well-established risk factors were performed separately. A pooled analysis was performed when results exhibited the same trend.
As shown in Fig 1, the analyses excluded 442 patients for whom we had no reliable information on the first exposure to FVIII products.
Of the 586 patients from Bonn and the 79 patients from Frankfurt, respectively 113 (19%) and 32 (41%) developed inhibitors. The analysis of birth cohort effect showed that, in Bonn, the incidence of inhibitors increased over time: 12% of the patients born before 1978 developed inhibitors, 23% between 1978 and 1995 and 44% after 1995. There was no significant difference between birth cohorts in Frankfurt (22% before 1995 and 19% after 1995).

Univariate analysis
Patients with an F8 mutation risk considered as "high risk" had a 3.61 times greater risk (95% CI 2.13-6.40) of developing inhibitors compared to patients with a "low risk" according to the pooled analysis of the Bonn and Frankfurt data ( Table 2). Among patients with a history of hemophilia A, the presence of a family history of inhibitors was associated with a 5.94 times greater risk (95% CI 2.73-13.29) of developing inhibitors in the Bonn dataset. This information was not available in the Frankfurt dataset. Patients with a blood group other than O were more likely to develop inhibitors than those with group O. This association did not reach statistical significance in the pooled dataset, with patients with blood group other than O having a 1.46 times greater risk (95% CI 0.94-2.31). As shown in the flow-chart (Fig 1), the type of FVIII at first exposure could only be analyzed in the "well-documented" subgroup of the patients. In the Bonn subgroup, patients first exposed to recombinant FVIII developed more inhibitors than those exposed to plasma-derived products (OR = 2.18, 95% CI 0.97-4.99). This association was not observed in the Frankfurt database.
HLA class II markers and immune response genes IL-10, TNF-a, CTLA-4, CD32, MAPK9 and CD86 were analyzed for their association with inhibitor development in 142 patients from the Bonn database. (Fig 1). We found no statistically significant deviation from the Hardy Weinberg equilibrium for any of the candidate SNPs. Table 3 gives the univariate odds ratios for each genetic factor.
For immune genes, only IL-10 and CD86 showed a significant association. IL-10-1082 G>A was associated with a lower risk of inhibitor development (OR = 3.36, 95% CI 1. 25-9.45 for the A/A genotype compared to G/G). One SNP from the UTR region of gene CD86, rs2681401, was associated with a lower risk of inhibitor development (OR = 0.25, 95% CI 0.08-067 for the T/T genotype compared to G/G).

Multivariate analysis
The variables associated with inhibitor development in univariate analyses remained statistically significant in multivariate logistic regression ( Table 4). The HLA-DRB1 � 15 allele, which is in linkage disequilibrium with DQB1 � 06, was the one retained in the final model as it was the one with the strongest association with inhibitor development. The AIC (Akaike Information Criterion) for the final multivariate logistic model was of 142.31. For the analyses based on the optimal GPLTR tree, the F8 mutation type, a family history of inhibitors and the type of FVIII product were included in the linear part of the model. For the tree part, all genetic factors were candidates to build the hybrid tree-based model. The final model selected the following variables: HLA-DRB1 � 15, CD86, IL-10, and HLA-DQB1 � 02 as risk factors in the tree part (Fig 2).
The optimal GPLTR tree identified three groups of patients according to their category of HLA-DRB1 �   Patients found negative for HLA-DRB1 � 15 and HLA-DQB1 � 02 and genotype T/T or G/T for CD86 (rs2681401) were at low risk for immunogenicity (OR = 0.13, 95% CI 0.04-0.37) whereas patients found positive for HLA-DRB1 � 15 and genotype G/A or A/A for IL-10 (rs1800896) were at high risk for immunogenicity (OR = 3.30, 95% CI 1.14-10.24).
Results from the random forest (bagging procedure) associated with the tree-based model provide a ranking of the variables based on deviance importance scores (Fig 3). HLA-DRB1 � 15, IL10, CD86 and HLA-DQB1 � 02 which were the variables selected in the optimal GPLTR tree are among the five variables with the higher scores.

Discussion
In this study, we investigated the association of bio-clinical and genetic markers with the development of FVIII inhibitor taking into account potential genetic high order interactions.
The classic bio-clinical factors selected by our analyses were in line with those reported in previous studies. The F8 mutation type and a family history of FVIII inhibitors were confirmed as being associated with inhibitor development in severe hemophilia A patients. The type of FVIII product (recombinant versus plasma-derived) was also selected in our analyses, this result being consistent with those of SIPPET. These bio-clinical variables were considered in the linear part of the model for both the logistic and the tree-based model. Intensity of the first treatments was not available in the dataset and could not be assessed. Taking into account these three variables, the GPLTR model identified three groups of patients according to their category of HLA-DRB1 � 15, CD86, IL10, and HLA-DQB1 � 02 with probabilities equals to 0.18, 0.47 and 0.77 to develop inhibitors.
In the multivariate logistic model, the classic categorization of F8 mutation type as "low" versus "high" risk shows an odds ratio in the same range (around four) as in the previously published methods [16]. A family history of FVIII inhibitor was also associated with inhibitor development. Interestingly, even with adjustment on genetic factors, the odds ratio remains high (around seven) suggesting that other genetic risk factors contribute to this hereditary risk.

Risk stratification of factor VIII inhibitors
The multivariate logistic regression model showed that HLA-DRB1 � 15, CD86, and IL-10 variants had an impact on the occurrence of neutralizing inhibitors. HLA-DRB1 � 15 has already been shown to be associated with inhibitor development [5]. In the present study, other HLA markers, such as DRB1 � 01, DQB1 � 06, were associated with inhibitor development in the univariate analysis but not in the multivariate analysis. As these markers are in strong linkage disequilibrium, the final multivariate logistic model only selected the one that made the highest contribution. Previous studies reported an association between IL-10 polymorphisms and inhibitor development [3,[8][9][10][11]. We showed that IL-10 (1082A>G) is a risk factor of inhibitor development. The protein encoded by the IL-10 gene is a cytokine that has pleiotropic effects in immune regulation and inflammation. In vitro studies indicate that the 1082G allele is associated with higher IL-10 production and the A allele with lower IL-10 production [26]. We observed that the 1082A allele was associated with a higher probability of inhibitor development, suggesting that low IL-10 production is associated with higher inhibitor risk. For the first time to our knowledge, we report that one SNP in the UTR region of CD86 gene (rs2681401) was associated with a lower risk for T/T and G/T genotypes. Interestingly the tree representation suggests that these three genes, HLA-DRB1 � 15, IL10 and CD86, are part of a same immune cascade, with CD86 involved in activation of the antigen presenting cells [20], HLA in the presentation to T-cells and IL-10 in the immune regulation. We could not confirm any association of SNPs in the TNFalpha, CTLA4 and HMOX1 genes with inhibitors, but the sample size may be too small to detect it. Taking into account F8 mutation type, family history of inhibitors, and type of FVIII product, the optimal GPLTR hybrid tree-based model also selected HLA-DRB1 � 15, CD86, and IL-10 and provided additional information by also selecting HLA-DQB1 � 02. Of note, based on these genetic markers, the optimal GPLTR tree provided a partition structure of the whole population under study that formed more homogeneous groups for inhibitor development. The hybrid tree-based model showed that the highest risk for immunogenicity was observed in patients with positive HLA-DRB1 � 15 and IL-10 genotype G/A and A/A. Of the 30 patients (24% of the total) in this group, 23 experienced FVIII inhibitors. In contrast, the lowest risk group for immunogenicity was defined by negative HLADRB1 � 15/ HLADQB1 � 02 and CD86 (TT or G/T). In this group, among the 33 patients (26% of the total), only 4 experienced inhibitors. The other groups formed by the tree-based model had an intermediate risk. Results obtained from the bagging procedure confirmed the importance of the selected variables and suggest that the final tree-based model is sufficiently reliable. It is also worth noting that the GPLTR model provided a better fit for the data according to the Akaike information criterion compared to a classic multivariate logistic regression approach.
Concerning blood group, patients in blood group O were less likely to develop inhibitors but the association was not statistically significant in the univariate analysis combining the Bonn and Frankfurt cohorts (p = 0.10). In the eligible subset of the Bonn cohort, blood group was also not significant in the multivariate analysis. This could result from a lack of power. It is worth noting that this variable is classified as the nineth in terms of fit scores by the bagged GPLTR procedure. An association between blood group and inhibitors has previously been reported [27]. Given the data on the potential role of von Willebrand factor (VWF) in immune recognition of FVIII [28] and inhibitor development [2], it is of note that levels of VWF differ in individuals with different blood groups. This association warrants further investigation.
The ABIRISK project, as a collaborative initiative, enables data from different sites to be pooled. Investigation of the main clinical factors (patient-, disease-and treatment-related) was based on the historical cohorts of Bonn and Frankfurt. Datasets were however rather different in terms of data availability, especially concerning genetic information. In addition, a selection of population in terms of homogeneity in birth cohort and follow-up was necessary. While the descriptive analyses were carried out on the whole population, a small subpopulation from the Bonn cohort was used for the multivariate analyses. Mostly patients treated previously elsewhere were excluded as some were patients treated a long time ago and as there were no reliable information on first treatment for these patients. With this unbiased but reduced population, however, comes a decrease in power for the statistical analyses. Our study has some limitations that should be mentioned. Since it is an observational study, we could not exclude some bias between negative and positive inhibitor patients, these latter been more well documented. The majority of the population are Caucasian but we could not exclude some heterogeneity in the ancestry background of our population. Moreover, our dataset has a reduced sample size and additional studies with larger sample size are required to strengthen our findings.

Conclusion
The present study investigates the relationship between genetic factors and FVIII inhibitor development in severe hemophilia A patients, together with F8 mutation type, a family history of inhibitors and FVIII product type. It relies on a hybrid tree-based model which is well suited to investigate high-order interactions. The final optimal tree distinguishes two groups of patients: a high-risk group for immunogenicity with positive HLA-DRB1 � 15 and IL-10 genotype G/A and A/A, a low-risk group for immunogenicity with negative HLADRB1 � 15/ HLADQB1 � 02 and CD86 genotype T/T and G/T.