Hidden noise in immunologic parameters might explain rapid progression in early-onset periodontitis

To investigate, in datasets of immunologic parameters from early-onset and late-onset periodontitis (EOP and LOP) patients, the existence of hidden random fluctuations (anomalies or noise) that may be the source of more frequent and longer periods of exacerbation, resulting in rapid progression in EOP. Principal component analysis (PCA) was applied to a dataset of 28 immunologic parameters and serum IgG titers against periodontal pathogens derived from 68 EOP and 43 LOP patients. After excluding the PCA parameters that explain the majority of variance in the datasets, i.e. the overall aberrant immune function, the remaining parameters of the residual subspace were analyzed by computing their sample entropy to detect possible anomalies. The performance of entropy-based anomaly detection was tested with unsupervised clustering based on a log-likelihood distance, which identified parameters with anomalies. An aggregate local outlier factor (LOF) score was used for supervised classification of EOP and LOP. Entropy values for neutrophil chemotaxis, CD4, CD8 and CD20 counts and serum IgG titer against Aggregatibacter actinomycetemcomitans indicated the existence of possible anomalies. Unsupervised clustering confirmed that these parameters are possible sources of anomalies. LOF showed 94% sensitivity and 83% specificity in identifying EOP (87% sensitivity and 83% specificity in 10-fold cross-validation). Any generalization of the result should be made with caution due to a relatively high false positive rate (17%). Random fluctuations in immunologic parameters were detected in a sample of EOP and LOP patients, suggesting that they may cause more frequent periods of disease activity, in which the aberrant immune response of EOP patients results in the phenotype “rapid progression”.


Introduction
Periodontitis is a complex disease with multiple causal factors (bacteria and viruses, lifestyle, (epi)genetic background, systemic diseases, tooth- and dentition-related factors and, most likely, stochastic factors) interacting simultaneously in an unpredictable and nonlinear manner [1][2][3]. However, although local interactions are in general chaotic (sensitive to initial conditions and aperiodic), the system, i.e. the disease, eventually evolves and self-organizes; this results in the ultimate emergence of a pattern that allows us to evaluate the system using statistical methods and mathematical modelling [4][5][6]. For two decades, the old classification scheme recognized two clinical forms of periodontitis: chronic (CP) and aggressive (AgP) periodontitis [7]. AgP cases were identified on the basis of rapid attachment loss and bone destruction, the absence of systemic factors to explain this progression rate, and familial aggregation [8]. The age of 35 years was arbitrarily used as a cut-off point to discriminate between AgP and CP [9]. However, AgP and CP share genetic and other risk factors, and it has long been recognized that AgP can also occur in people over 35 years of age and CP in people below this age [8][9][10]. An aberrant immune response (hypo- or hyper-response and/or lack of resolution) has been described in association with advanced periodontitis, irrespective of whether it is AgP or CP [1,2]. Also, limited differences between the gingival tissue transcriptional profiles of AgP and CP have been reported [11]. There is little consistent evidence that AgP and CP are different diseases [12]. The new periodontitis classification scheme [13] recognizes AgP and CP as one entity with 4 stages of severity and 3 grades of prognosis.
Empirical, evidence-driven thresholds of attachment loss were used to differentiate levels of periodontitis severity [14], while grades account for risk factors that influence periodontitis progression and initially classify patients, by a history-based analysis, as having a slow (grade A), moderate (grade B) or rapid (grade C) progression rate.
The immune response to invading periodontal pathobionts and viruses triggers a nonlinear destructive process that leads to loss of periodontal ligament and alveolar bone [2,15]. Nonlinearity means that a small change in the inputs may have a disproportionately large effect on the final behavior of the system. Random fluctuations are inevitable in a complex system. Their significance for gene expression and cell function is well recognized [16]; however, they have not yet been explored in the pathogenesis of periodontitis.
In biological systems, random fluctuations (also called anomalies or noise) might be responsible for certain phenotypes, as anomalies added to a nonlinear system might change its behavior and produce unexpected aberrant activity [17,18]. This is often observed in bistable systems, i.e. systems with two stable states, such as the alternation between periods of exacerbation and remission in susceptible and chronically diseased subjects [19]. There is evidence that a small part of the population exhibits severe periodontitis, while the majority of patients show mild to moderate periodontitis [20]. In a longitudinal study of unlabeled periodontitis patients followed over 5-8 years [6], we found possible evidence of two groups of patients on the basis of longitudinal radiographic bone loss: one in five patients showed an almost five-fold higher progression rate. Gene networks can generate bistable states [17], and bistability supports the importance of random fluctuations (noise) in the emergence of a periodontitis phenotype with a rapid progression rate.
We hypothesize that random fluctuations in the immunologic parameters of periodontitis patients might render the host response more vulnerable to bacterial challenges and might explain more frequent and longer periods of exacerbation, resulting in the advanced tissue destruction found in the rapidly progressive form with severe breakdown (new classification stages 3 or 4, grade C [13]), i.e. often the early-onset form of periodontitis (EOP). We aimed to investigate this hypothesis in a group of EOP and late-onset periodontitis (LOP) patients who, based on disease history, are characterized as having either a rapid progression rate (EOP, stage 3-4, grade C) or a slow progression rate (LOP, stage 3, grade A). Another group of patients with severe periodontitis suspected of EOP (i.e. grade C) served as a validation cohort.

Results
Patient demographics (Table 1) and other characteristics have been described before [5][6]. The validation cohort has also been described in a previous publication [21]. Table 2 presents the data for the immunologic parameters. Mean values of IL-1, IL-4, IFN-γ and IgG titer against C.o. were statistically significantly lower in LOP than in EOP, whereas CD8, CD20, CD4/CD8 ratio and IL-2 were significantly higher in LOP than in EOP. The remaining immunologic parameters did not differ between EOP and LOP patients (Table 2).
The workflow for the final detection of a "rapid progression" phenotype is presented in Fig 1. Principal component analysis (PCA) identified IgG titer against P.g. (SU63), monocyte IL-2 production, CD3 lymphocyte counts, IgG titer against P.g. (FDC381) and monocyte IL-4 production as the principal components explaining 75% of the variance in the aggregate EOP and LOP sample. The subspace analysis aimed at identifying anomalies in the parameters that contribute nothing to explaining the variance of the dataset (eigenvalue 0 in the scree plot of the PCA) (Fig 2). The residual PCA-subspace comprised 17 parameters: leukocyte adhesion and neutrophil chemotaxis test results; CD4, CD8 and CD20 lymphocyte counts and CD4/CD8 ratio; monocyte IFN-γ and IL-1 production; and IgG titers against E.c., P.i., P.n., F.n., T.d., C.o., A.a. (Y4), A.a. (ATCC29523) and A.a. (SUNY67). These 17 parameters were evaluated for anomalies in their structure, first by sample entropy estimation and second by their clustering importance in the two-step clustering method.
Entropy values indicated possible data anomalies for neutrophil chemotaxis, CD4, CD8 and CD20 counts and IgG titer against A.a. (ATCC29523), which might explain more regularly occurring disease exacerbations in EOP than in LOP patients (Table 3). These 5 parameters showed squared entropy values ≥ 3 (Table 3). In the second step of the unsupervised clustering of patients into two groups, these five parameters also showed low clustering importance, further indicating that they are possible sources of anomalies (Table 3, Fig 3). In the validation cohort, these five parameters showed squared sample entropy values from 0.15 to 0.76, except for neutrophil chemotaxis, which showed a squared entropy value of 1.9, the highest in this cohort against a maximum possible value of 2.92 (Table 3). These results thus indicate neutrophil chemotaxis as a parameter with possible anomalies in the validation cohort.
The distribution of local outlier factor (LOF) scores is given in Fig 4. We found that 32% of LOP patients scored between 2 and 2.7, while 35% of EOP patients scored between 3.5 and 4.1 (Fig 4A). When localized and generalized EOP patients were separated, their LOF score distributions were similar, with the generalized EOP category showing a higher maximum value (Fig 4B). An aggregate LOF score computed from the 5 predictor parameters identified in the subspace, i.e. neutrophil chemotaxis, CD4, CD8 and CD20 counts and IgG titer against A.a. (ATCC29523), gave 94% sensitivity and 83% specificity in identifying EOP with a k-NN classifier (k = 5, chosen by 10-fold cross-validation); 10-fold cross-validation (CV) of the model gave a lower sensitivity (87%) with the same specificity (83%).

Table 2. Median values [means ± standard deviations] of immunologic parameters and IgG titers for patients with late-onset periodontitis (LOP) or early-onset periodontitis (EOP), as well as for patients of the validation cohort. Comparisons between LOP and EOP were made by the Mann-Whitney U test (statistically significant results in bold). Data derived from a previous study [21].
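The sensitivity, specificity and false positive rate reported for the LOF classifier are simple ratios from a 2×2 table of predicted versus true labels; the 17% false positive rate discussed later is 1 − specificity. The sketch below spells out that arithmetic, treating EOP as the positive class. The function name and the counts in the usage line are hypothetical; the per-patient confusion table behind the reported percentages is not given in the text.

```python
def classification_rates(tp, fn, tn, fp):
    """Sensitivity, specificity and false positive rate from a 2x2 confusion table.

    EOP is the positive class and LOP the negative class:
    tp = EOP predicted as EOP, fn = EOP predicted as LOP,
    tn = LOP predicted as LOP, fp = LOP predicted as EOP.
    """
    sensitivity = tp / (tp + fn)  # share of EOP patients correctly flagged
    specificity = tn / (tn + fp)  # share of LOP patients correctly cleared
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": 1.0 - specificity,  # LOP patients wrongly flagged as EOP
    }


# Hypothetical counts, for illustration only.
rates = classification_rates(tp=90, fn=10, tn=80, fp=20)
print({name: round(value, 2) for name, value in rates.items()})
# {'sensitivity': 0.9, 'specificity': 0.8, 'false_positive_rate': 0.2}
```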

Discussion
We aimed to detect anomalies (random fluctuations) in immunologic parameters from a sample of EOP (stage 3-4, grade C) and LOP patients (stage 3, grade A). We aggregated the two samples to perform LOF measurements that could discriminate EOP from LOP. PCA identified IgG titer against P.g. (SU63), monocyte IL-2 production, CD3 lymphocyte counts, IgG titer against P.g. (FDC381) and monocyte IL-4 production as principal components explaining the variance of the aggregate EOP and LOP sample. In contrast, the analysis of the PCA-subspace parameters suggested anomalies in neutrophil chemotaxis, CD4, CD8 and CD20 counts and serum IgG titers against A.a., which might explain more regularly occurring exacerbations in EOP than in LOP patients. Our anomaly detection strategy relied on large sample entropy values and on low clustering importance scores obtained by unsupervised clustering of the patients. The two methods have no elements in common, yet they agreed in detecting hidden complexity in the datasets.
Anomalies are difficult to detect in a dataset. Systems evolve over time, and what first qualifies as an anomaly might change later. Anomalies of a given size tend to be harder to detect in parameters with large variance than in parameters with small variance [22]. The boundaries between normal and abnormal behavior are often imprecise. An advantage of the current study is the relatively clear labeling of the patients, which in general requires substantial effort to obtain. Sample entropy calculations in the validation cohort were suggestive of anomalies in the neutrophil chemotaxis parameter. Other anomalies either never existed or, if they existed, were no longer identifiable. The validation cohort certainly consists of patients with severe disease (stage 3), but its mean age is higher than that of the EOP group. We may assume that anomalies are detectable for a period of time and that the situation might change over the years, perhaps due to treatment interventions. The smaller number of patients in the validation cohort might also have prevented anomalies from being revealed. At the population level, bistability is observed as two modes (peaks) in probability density distributions. In a previous study [6] of unlabeled periodontitis patients well maintained over 5 to 8 years, we found possible evidence of periodontitis being a bistable system (showing two main stable states). The smaller cluster showed, on average, five times more change in radiographic bone level than the larger cluster. Random fluctuations in immunologic parameters might push a nonlinear system (such as periodontitis) from one state to the other [16]. Thus, our current findings support the concept that EOP patients with rapidly progressive periodontal breakdown, given their "basal" set of causal factors, might switch more often and more severely into an exacerbation phase before the system regresses into a resolution (remission) phase [2].
A recent study identified three clusters of periodontitis patients (phenotypes) on the basis of clinical, radiographic and microbiological data [23]. Identifying pathophysiological pathways and understanding the periodicity of the disease might reveal endotypes within phenotypes, which in turn might enhance our prognostic and therapeutic abilities in clinical practice.
The hypothesis that stochastic gene expression has a significant effect on the biology of organisms was based on the observation that genetically identical organisms maintained in identical environments diverge phenotypically [16,17]. Fundamentally, this is because the expression of a gene involves discrete and inherently random biochemical reactions in the production of mRNAs and proteins [16]. Fluctuations do not average away, but rather lead to differences in the function of otherwise identical cells [17]. In an alternative hypothesis, the stochastic kinetics of gene activity may be genetically determined by promoter variation, which dictates how regulatory elements such as histones and transcription factors bind to and unbind from their corresponding binding sites [24]. In this respect, epigenetic modifications of the genome can equally contribute to altered promoter activity and cause genes to behave aberrantly [25]. It must be noted that the current study was conducted on patients with a distinct genetic/epigenetic background (Japanese); extrapolating the results to other populations should therefore be done with caution.
Predictive models, when properly trained and tested (validated), can be applied to detect anomalies [22] and thus identify periodontal patients at risk of developing EOP or patients in an early stage of EOP. This could be helpful in a clinical setting, where EOP patients are considered more difficult and demanding to treat. Subtle changes detected early might give a warning signal of what could follow, so that preventive and treatment protocols can be started. Future studies on a wider array of parameters might reveal anomalies from unexpected sources. However, supervised modes of detection are less flexible in catching new anomalies, as they cannot automatically adapt to new patterns [22]. In previous studies on the same sample, we showed that supervised classification by decision trees [4] and artificial neural networks [5] could discriminate EOP from LOP. However, a correlation of predictive parameters with periodontitis does not imply causation [26]; it only reflects the clinical status of the patients without providing a prognosis. The current study suggests that we can go one step further and predict an ongoing or upcoming exacerbation of periodontitis. However, the results of our LOF approach for predicting EOP should be generalized with caution due to a relatively high false positive rate (17%). That said, high false alarm rates are a general problem in anomaly detection [22]. P.g. has been reported as a keystone pathogen in periodontitis [27], and in the current study IgG titer against P.g. is the first of the principal components explaining the variance of the aggregate EOP and LOP sample. Monocyte IL-2 and IL-4 production are also among the principal components in PCA; by mean values, IL-2 was significantly higher and IL-4 significantly lower in LOP than in EOP patients (Table 2).
The central roles of IL-2 in regulating lymphocytes and of IL-4 in suppressing inflammation are well studied [28]. The aberrant immune response in periodontitis, whether a state of hypo- or hyper-response or an inability to properly resolve inflammation, is connected with the currently identified parameters in a nonlinear fashion, which explains the complex picture we observe [2]. As another example, IFN-γ, considered the main phagocyte-activating cytokine, was significantly higher in EOP patients in the current study, yet it belonged to the PCA-subspace parameters contributing nothing to explaining the variance in the sample. The same applies to IgG titer against C.o., which was significantly higher in EOP but also belonged to the PCA-subspace parameters. No indications of anomalies were found for any of the tested PCA-subspace IgG titers except for IgG titer against A.a. (Fig 3). A.a. has been associated with EOP, especially with the localized form of the disease [29]. The presence of A.a. in the oral cavity of young individuals increases the risk of initiation and progression of the disease [30]. However, it is accepted that the microbial composition of the subgingival biofilm cannot discriminate EOP from other periodontitis cases [31]. Antibodies against suspected periodontal pathogens are thought to clear bacteria, and significantly elevated levels of serum antibodies against A.a. have been found in EOP cases [32]. A pre-clinical role of A.a. has also been described. As periodontitis advances, the subgingival ecosystem becomes more anaerobic and more diverse [2,33]. Thus, A.a. may become more prevalent in the subgingival ecosystem, and an anomalous IgG titer against A.a. leaves room for A.a. to exert its pathogenic potential against host immune cells (e.g. via leukotoxin activity), resulting in worsened inflammation and concomitant tissue destruction.
Neutrophils are the first line of defense against dental biofilm bacteria, and they express a large variety of cell surface receptors to sense the inflammatory environment [34]. The importance of CD4 lymphocytes in the immune response has been extensively studied, while the role of CD8 lymphocytes is not fully understood [35]. In the current study, we found suggestive evidence that fundamental protective immune mechanisms, such as neutrophil chemotaxis and CD4, CD8 and CD20 lymphocyte counts, might be subject to random fluctuations that might result in the rapid progression of EOP. One obvious limitation of the current study is its cross-sectional design; as such, it is unknown how the parameters might change over time. The changes in anomaly status that might occur as a result of treatment are also unknown, and a history of previous treatment might therefore be a confounding factor in the study.
This study introduces into the study of periodontitis pathogenesis the well-accepted phenomenon of noise-induced phenotypic variation due to stochasticity. By better understanding the mechanisms underlying the clinical expression of periodontitis and by developing predictive models that intercept emerging disruptive anomalies, we might be able to enhance our ability to cope with EOP. When biologically relevant combinations of microbial/immunological/genetic biomarker packages become available in the future, artificial intelligence algorithms overlaid on them might warn patients to visit the periodontist because an exacerbation with rapid loss of periodontal support is upcoming or ongoing. Personal prediction of the risk of disease exacerbation by artificial intelligence is currently being explored in other chronic diseases [36,37].

Ethics statement
The Okayama University Dental Hospital committee approved the study [21]. Periodontitis patients were recruited as they presented at the Okayama University Dental Hospital over a period of 10 years. Informed written consent for taking blood for laboratory examination was obtained from each subject.

Study population
We derived data from 162 Japanese periodontitis patients [21] (48 male and 114 female, all systemically healthy, mean age 34.6 ± 12.2 years). The raw dataset of the 162 patients was used before in studies exploring mathematical models for periodontitis [5,6]. The following parameters were available: neutrophil chemotaxis, phagocytosis and adhesion to nylon fibers; T-cell blastogenesis against anti-CD3 monoclonal antibodies and pokeweed mitogen; and counts of CD3, CD4, CD8 and CD20 lymphocytes and the CD4/CD8 ratio in peripheral blood. In addition, we used data on IL-1, IL-2, IL-4, IL-6, TNF-α and IFN-γ levels produced by peripheral blood mononuclear cells. We also retrieved data from the same patients for serum IgG titers (assessed by enzyme-linked immunosorbent assay (ELISA)) against and Fusobacterium nucleatum (ATCC25586) (F.n.). We obtained 68 EOP cases (localized and generalized cases aggregated; mean age 26.2 ± 7.0 years; stage 3 or 4, grade C) and 43 LOP cases (mean age 47.0 ± 11.0 years; stage 3, grade A) for the discovery analysis. Another 51 patients were classified as "suspected of EOP"; they had stage 3, grade C periodontitis (mean age 36.0 ± 9.2 years). These patients were used as a validation cohort.

Laboratory procedures
Cytokine productivity by T-cells was measured after in vitro stimulation with anti-CD3 monoclonal antibody. Secreted cytokines in the culture supernatants were measured by radioimmunoassay for IL-1, IL-2 and IFN-γ and by ELISA for IL-4, IL-6 and TNF-α. Two-color flow cytometric analysis using panels of monoclonal antibodies was employed to determine lymphocyte subsets. T-cell blastogenesis was evaluated by the uptake of [3H]thymidine. Antibody responses to periodontal bacteria were assessed by ELISA. The correlation coefficient for the line fitting was above 0.90. Neutrophils were isolated from heparinized peripheral venous blood by discontinuous density gradient centrifugation. Neutrophil chemotaxis was assessed using N-formyl-methionyl-leucyl-phenylalanine, neutrophil phagocytosis was estimated by the number of bacteria internalized by 100 neutrophils, and neutrophil adhesion was determined using a tuberculin syringe nylon fiber column through which blood flowed by gravity.

Statistical analysis
We compared immunologic parameters between EOP and LOP patients using the Mann-Whitney U test, with the level of statistical significance set at p < 0.05.
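As a sketch of how the Mann-Whitney U test compares two groups, the Python function below ranks the pooled values of both groups (averaging ranks over ties), computes U for the first group and derives a two-sided p-value from the normal approximation with a tie correction. It is an illustrative stand-in, not the SPSS procedure used in the study: SPSS may report exact p-values or apply a continuity correction, and the function name is hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, p), where U is the statistic for sample x.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering group membership (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all values identical: no evidence of a difference
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u, p


# Hypothetical values for two small groups.
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))  # U = 0.0, p close to 0.05
```

The normal approximation is adequate for group sizes like those in this study (68 and 43); for very small groups an exact test is usually preferred.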

Subspace analysis
Each dataset has its typical variation; however, there might be unusual conditions deviating from it [38]. We searched for collective anomalies, the term used when data instances (i.e. collected parameter values) are anomalous with respect to the entire dataset. The cut-off level of the typical variation, and therefore the subspace region, can be determined by principal component analysis (PCA) [39]. PCA was therefore applied to the cellular and humoral (serum IgG titers) immunologic parameters. After extracting the principal parameters that explain the vast majority of the variance of the data (EOP and LOP aggregated) and thus designate the normal variation, i.e. overall susceptibility, the remaining parameters were considered part of the residual subspace in which anomalies can be detected [39].
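The split between the principal and the residual subspace can be sketched from the eigenvalues of the PCA: components are ranked by eigenvalue, leading components are kept until a chosen share of the total variance is explained (75% in the Results), and components whose eigenvalue is essentially zero form the residual subspace. The sketch assumes the eigenvalues have already been obtained, for example from an eigendecomposition (such as numpy.linalg.eigh) of the correlation matrix of the standardized parameters. The function name, the 0.75 default and the zero tolerance are illustrative assumptions, and mapping components back to individual parameters, as done in the study, is not shown.

```python
def split_subspace(eigenvalues, threshold=0.75, zero_tol=1e-8):
    """Split PCA components into a principal part and a residual subspace.

    eigenvalues: eigenvalues of the correlation (or covariance) matrix, any order.
    Returns (k, explained, residual): k leading components explain at least
    `threshold` of the total variance, `explained` is the share they explain,
    and `residual` lists indices of components with (numerically) zero eigenvalue.
    """
    total = sum(eigenvalues)
    # Rank components from largest to smallest eigenvalue (stable for ties).
    order = sorted(range(len(eigenvalues)), key=lambda i: eigenvalues[i], reverse=True)
    cumulative, k = 0.0, 0
    for i in order:
        cumulative += eigenvalues[i]
        k += 1
        if cumulative / total >= threshold:
            break
    residual = [i for i in order if eigenvalues[i] <= zero_tol * total]
    return k, cumulative / total, residual


# Hypothetical eigenvalues for seven standardized parameters (they sum to 7).
k, explained, residual = split_subspace([4.0, 1.25, 0.75, 0.5, 0.5, 0.0, 0.0])
print(k, explained, residual)  # 2 0.75 [5, 6]
```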
Deviation from normal was searched for by computing the sample entropy of each parameter in the residual PCA-subspace (after normalizing the data), a metric that captures the degree of dispersal or concentration of a distribution [40]. Sample entropy is a sensitive metric for detecting and classifying changes in parameter distributions, with a very low false positive rate. When all observations are the same, sample entropy takes the value 0; high sample entropy values, on the other hand, indicate anomalies. Sample entropy was calculated as [40]

$$H(X) = -\sum_{i=1}^{N} \frac{n_i}{S} \log\left(\frac{n_i}{S}\right), \qquad S = \sum_{i=1}^{N} n_i,$$

where X = {n_i, i = 1, ..., N} denotes the empirical distribution of the parameter (value i occurs n_i times) and S is the total number of observations. The maximum value it can take is log(N). Entropy tends to increase as sample size increases. We tested the performance of this anomaly detection approach by grouping the patients into two classes with the two-step clustering method, using the newly identified PCA-subspace parameters as predictors [41]. The two-step clustering method uses a partitional (k-means) algorithm for an initial separation of patients, followed by a hierarchical (agglomerative) algorithm. The idea is that parameters with anomalies will confer lower overall clustering importance scores in unsupervised grouping of patients based on log-likelihood distance [41].
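A minimal Python sketch of the sample entropy used in this section, H = −Σ (n_i/S) log(n_i/S) over the N distinct values of a parameter, computed here in the equivalent form Σ (n_i/S) log(S/n_i). It assumes each parameter has already been discretized so that identical measurements can be counted; how the normalized values were binned is not stated in the text. The base of the logarithm is not stated either, and the sketch defaults to base 10 as an assumption. With base 10, 51 distinct observations give (log10 51)² ≈ 2.92, which would be consistent with the maximum squared entropy quoted for the 51-patient validation cohort, although this is our inference rather than a reported detail. The function name is hypothetical.

```python
import math
from collections import Counter


def sample_entropy(values, base=10.0):
    """Sample entropy of the empirical distribution of `values`.

    Each distinct value i occurs n_i times and S = sum(n_i) is the number of
    observations: H = -sum((n_i / S) * log(n_i / S)) = sum((n_i / S) * log(S / n_i)).
    H is 0 when all observations are equal and log(N) when the N distinct
    values are equally frequent. Values are assumed to be discretized already;
    the default base 10 is an assumption.
    """
    counts = Counter(values)
    s = sum(counts.values())
    # log(S / n_i) >= 0, so every term is non-negative.
    return sum((n / s) * math.log(s / n, base) for n in counts.values())


# Hypothetical discretized measurements for illustration.
print(sample_entropy([2, 2, 2, 2]))    # all observations equal: 0.0
print(sample_entropy(range(51)) ** 2)  # 51 distinct values: about 2.92
```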
Additionally, we computed the sample entropy of the residual PCA-subspace parameters in the validation cohort. The purpose of using this cohort was to examine trends in sample entropy for the parameters that the discovery cohort had identified as belonging to the residual PCA-subspace.
Finally, we tested how well the local outlier factor (LOF) approach, applied to the parameters with anomalies, correctly classified EOP and LOP patients. The LOF algorithm assigns an aggregate "outlier" score to each individual in the dataset based on local density calculations [42]. Values that are outlying relative to their local neighborhoods, particularly with respect to the densities of those neighborhoods, are regarded as "local" outliers. The LOF score of a point is the ratio of the average local density of its neighbors to its own local density. Anomalies in the data therefore result in LOF scores greater than 1, because outliers have low local densities compared with their neighbors [42]. A k-nearest neighbor (k-NN) classifier was used to identify EOP and LOP patients on the basis of the aggregate LOF scores.
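The LOF computation described above can be sketched in plain Python following the original definition by Breunig and colleagues: the k-distance of each point, reachability distances, the local reachability density (lrd) and, finally, LOF as the mean ratio of the neighbors' lrd to the point's own lrd. This is an illustrative re-implementation, not the WEKA LOF implementation used in the study; it simplifies the neighborhood to exactly k nearest neighbors (no expansion for ties at the k-distance), and the function name, the toy data and k = 3 are arbitrary choices.

```python
import math


def lof_scores(points, k=3):
    """Local outlier factor of each point in `points` (sequences of equal length).

    Simplified neighbourhood: exactly the k nearest other points (ties at the
    k-distance are not expanded as in the original definition). Needs len(points) > k.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbours of each point, excluding the point itself.
    neighbours = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # k-distance: distance from each point to its k-th nearest neighbour.
    k_distance = [dist[i][neighbours[i][-1]] for i in range(n)]

    def reach_dist(i, j):
        # Reachability distance of point i with respect to neighbour j.
        return max(k_distance[j], dist[i][j])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        mean_reach = sum(reach_dist(i, j) for j in neighbours[i]) / k
        lrd.append(1.0 / max(mean_reach, 1e-12))  # guard against duplicate points
    # LOF: average ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


# Hypothetical two-parameter data: a tight group plus one distant point.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
print([round(s, 2) for s in lof_scores(pts, k=3)])  # the last score is far above 1
```

Points inside the tight group receive scores close to 1, while the distant point receives a score well above 1, which is the behavior the text relies on when it treats LOF > 1 as a marker of anomalies.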
We used SPSS version 20.0 (IBM, Chicago, IL, USA) to carry out the analyses described above and WEKA software (version 3.8.1; The University of Waikato, Hamilton, New Zealand) for LOF and k-NN.
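The final classification step, a k-NN classifier on one-dimensional aggregate LOF scores with k chosen by 10-fold cross-validation, can be sketched as follows. The candidate k values, the random fold assignment, the tie-breaking rule (smallest k wins) and the function names are assumptions of this sketch, and the scores in the usage example are synthetic rather than study data; WEKA's own k-NN implementation may differ in distance weighting and fold construction.

```python
import random
from collections import Counter


def knn_predict(train_x, train_y, x, k):
    """Majority label among the k training scores closest to x (1-D feature)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cv_accuracy(xs, ys, k, folds=10, seed=0):
    """Accuracy of k-NN under `folds`-fold cross-validation with random folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(folds):
        test = idx[f::folds]  # every folds-th shuffled index forms one fold
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        correct += sum(knn_predict(tx, ty, xs[i], k) == ys[i] for i in test)
    return correct / len(xs)


def choose_k(xs, ys, candidates=(1, 3, 5, 7, 9), folds=10):
    """Candidate k with the highest cross-validated accuracy (ties -> smallest k)."""
    return max(candidates, key=lambda k: (cv_accuracy(xs, ys, k, folds), -k))


# Synthetic, well-separated LOF scores for two hypothetical groups.
scores = [1.0 + 0.035 * i for i in range(20)] + [2.5 + 0.03 * i for i in range(20)]
labels = ["LOP"] * 20 + ["EOP"] * 20
best_k = choose_k(scores, labels)
print(best_k, cv_accuracy(scores, labels, best_k))
```

Using odd values of k avoids voting ties between the two classes. Reporting accuracy from the same folds used to pick k is optimistic, which is one reason a separately cross-validated estimate, such as the 87% sensitivity reported above, is informative.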