Using association rule mining to jointly detect clinical features and differentially expressed genes related to chronic inflammatory diseases

Objective: It is increasingly common to find patients affected by a combination of type 2 diabetes mellitus (T2DM), dyslipidemia (DLP) and periodontitis (PD), which are chronic inflammatory diseases. Studies capable of capturing unknown relationships among these diseases will help strengthen the biological and clinical evidence. The aim of this study was to apply association rule mining (ARM) to discover whether there are consistent patterns of clinical features (CFs) and differentially expressed genes (DEGs) relevant to these diseases. We intend to reinforce the evidence of the T2DM-DLP-PD interplay and to demonstrate the ability of ARM to provide new insights into multivariate pattern discovery.

Methods: We used 29 clinical glycemic, lipid and periodontal parameters from 143 patients divided into five groups according to their diabetic, dyslipidemic and periodontal conditions (including a healthy control group). At least 5 patients from each group were selected for transcriptome assessment by microarray. ARM was used to identify relevant association rules considering (i) only CFs and (ii) CFs+DEGs; the DEGs identified as specific to each group of patients were then submitted to gene expression validation by quantitative polymerase chain reaction (qPCR).

Results: We obtained 78 CF-rules and 161 CF+DEG-rules. Based on their clinical significance, expert periodontists and geneticists selected 11 CF-rules and 5 CF+DEG-rules. Of the five DEGs prospected by the rules, four were confirmed by qPCR as significantly different from the control group, and two of them confirmed the previous microarray findings.

Conclusions: ARM was a powerful data analysis technique for identifying multivariate patterns involving the clinical and molecular profiles of patients affected by specific pathological panels. ARM proved to be an effective mining approach for analyzing gene expression, with the advantage of including the patients' CFs. A combination of CFs and DEGs might be employed to model a patient's chance of developing complex diseases, such as those studied here.


Please clarify whether this publication was peer-reviewed and formally published.
If this work was previously peer-reviewed and published, in the cover letter please provide the reason that this work does not constitute dual publication and should be included in the current manuscript.
The publication Corbi et al. (2020), Scientific Reports, was peer-reviewed and formally published. Although the same patients were studied, the analyses and results are different. The present study submitted to PLOS ONE does not constitute dual publication, for the following reasons:

In the previous paper by Corbi et al. (2020), we used a distinct set of bioinformatics tools to identify DEGs and made no association with the clinical features of the patients, which was a more restricted approach.
In the present study, we used a different and more powerful bioinformatics tool, namely association rule mining (ARM), a highly interpretable approach that is very effective in identifying complex multivariate patterns when both the clinical and molecular profiles of patients are considered. Moreover, the combination of CFs and DEGs can be used to better estimate a patient's chance of developing complex diseases, such as those studied here. We submitted five of the DEGs prospected by the association rules to qPCR validation, and four of them exhibited a significant difference when compared with the control group.

Reviewers' comments:
REVIEWER #1: Dear Editor, I carefully read the manuscript by Veroneze and collaborators, which regards an interesting and timely study. My comments to improve the paper:
-English language is low-quality and needs to be carefully revised before resubmission.
The manuscript has undergone careful proofreading.

-Table 1 -Classification of "Dyslipidemia" is substantially wrong and should be reformulated in accordance with the latest international guidelines
We reformulated the analyses according to the 2018 AHA/ACC/AACVPR/AAPA/ABC/ACPM/ADA/AGS/APhA/ASPC/NLA/PCNA Guideline on the Management of Blood Cholesterol, as can be seen in Table 1 and throughout the text of our manuscript. In the previous version of our manuscript, we considered 3 levels for the total cholesterol (TC) attribute (in mg/dL): 1. TC < 200, 2. TC ∈ [200, 240), and 3. TC ≥ 240. In the current version, we consider 4 levels: 1. TC < 150, 2. TC ∈ [150, 200), 3. TC ∈ [200, 240), and 4. TC ≥ 240. The classification is therefore now in accordance with the latest international guidelines.
We also included the non-HDL cholesterol (N-HDL-C) attribute, defined as N-HDL-C = TC − HDL, in our analysis, since it has been advocated as a good predictor of cardiovascular disease (CVD) risk.
Accordingly, we repeated the association rule mining (ARM) analysis and updated the Results and Discussion section of our manuscript.
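The two attribute changes described in this response (the four TC levels and the derived N-HDL-C attribute) can be sketched as follows. This is a minimal illustration in Python, not the code used in the study; the function names are hypothetical and all values are assumed to be in mg/dL.

```python
from bisect import bisect_right

# Cut points in mg/dL for the four TC levels of the revised Table 1:
#   level 1: TC < 150         level 2: 150 <= TC < 200
#   level 3: 200 <= TC < 240  level 4: TC >= 240
TC_CUTS = [150, 200, 240]


def tc_level(tc_mg_dl: float) -> int:
    """Map a total-cholesterol value (mg/dL) to its 1-based discrete level."""
    # bisect_right counts the cut points <= tc_mg_dl, so a value equal to a
    # cut point falls into the upper interval, matching the [a, b) bins.
    return bisect_right(TC_CUTS, tc_mg_dl) + 1


def non_hdl_c(tc_mg_dl: float, hdl_mg_dl: float) -> float:
    """Derived attribute N-HDL-C = TC - HDL (mg/dL)."""
    return tc_mg_dl - hdl_mg_dl


# Hypothetical patient values, for illustration only
print(tc_level(210), non_hdl_c(210, 45))  # prints: 3 165
```

Using left-closed intervals keeps each boundary value (150, 200, 240) in exactly one level, as in the interval notation above.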
-References are obsolete and should be updated.
We replaced many references throughout the text and also included new, appropriate ones that are as recent as possible. Only two references were not replaced (American Academy of Periodontology [1999] and Löe H. [1993]), because they were used to classify individuals with periodontitis and are classics in the field.
-The flowcharts are unclear.
The flowcharts are now explained in more detail in the text and in the captions of the respective figures.
REVIEWER #2: Dear Authors, there is no doubt about the high topicality of exploring the interaction patterns of type 2 diabetes mellitus, dyslipidemia and periodontitis based on their inflammatory background. The use of rule-based machine learning methods to identify interactions between the clinical and molecular profiles of patients adds further value. The manuscript is well written, clear and properly referenced. There are no major issues that could affect the value of the research paper. Minor issues that could be taken into consideration in further scientific analyses are:
-the relatively small sample size, and the reasonableness of increasing the number of enrolled patients in order to obtain more convincing results for extrapolation.
Regarding the number of patients investigated here, it took three years to select them appropriately, because each patient had to meet many criteria before enrollment. Increasing the number of patients certainly tends to enhance the strength and impact of a study, and we intend to do so in future studies. Nonetheless, the high confidence of the association rules obtained with this data-driven approach lends additional robustness to our concluding remarks.
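For reference, the support and confidence of a rule X ⇒ Y over a set D of patient records, where X and Y are sets of attribute = level items, are conventionally defined as below. These are the standard ARM definitions; the minimum thresholds used in the manuscript are not restated here.

```latex
\operatorname{supp}(X \Rightarrow Y) = \frac{\lvert \{\, t \in D : X \cup Y \subseteq t \,\} \rvert}{\lvert D \rvert},
\qquad
\operatorname{conf}(X \Rightarrow Y) = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)}
```

Confidence is thus the fraction of records matching the antecedent X that also match the consequent Y.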
-adding to the analysis a group of patients with type 2 DM and dyslipidemia with TG > 200 mg/dL and decreased HDL-C (< 38 mg/dL and < 46 mg/dL for men and women, respectively) could probably be interesting in order to identify discrepancies with a group of patients with classic diabetic dyslipidemia.
Following this suggestion, we included an additional analysis comprising only T2DM patients presenting diabetic dyslipidemia, i.e., patients from Groups 1 and 2 with TG ≥ 204 mg/dL and HDL < 38 mg/dL. The rules found for this pathological condition are presented in Table 6. We highlight the rule FPG = 3, HOMA-IR = 2, TC = 2, HDL = 1, TG = 3 → BOP = 3, as it indicates that diabetic dyslipidemia is associated with bleeding on probing (BOP) at more than 50% of tooth sites, one of the main clinical signs of periodontal inflammation. Periodontitis is the most common cause of chronic inflammation in diabetic patients. Periodontitis and diabetes have detrimental effects on each other, in terms of alveolar bone destruction and poor metabolic control, through the continuous activation of inflammatory mediators.
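The subgroup selection and the highlighted rule can be illustrated with the minimal Python sketch below. It is not the authors' pipeline and it does not mine rules; it only evaluates one given rule. The record layout and field names (`group`, `tg_mg_dl`, `hdl_mg_dl`) are hypothetical, the attribute levels are taken from the rule as written, and support and confidence follow their standard ARM definitions.

```python
from typing import Dict, List, Tuple

Levels = Dict[str, int]  # attribute name -> discretized level

# Rule highlighted above: FPG = 3, HOMA-IR = 2, TC = 2, HDL = 1, TG = 3 -> BOP = 3
ANTECEDENT: Levels = {"FPG": 3, "HOMA-IR": 2, "TC": 2, "HDL": 1, "TG": 3}
CONSEQUENT: Levels = {"BOP": 3}


def is_diabetic_dyslipidemia(patient: dict) -> bool:
    """Subgroup criterion: Groups 1 and 2, TG >= 204 mg/dL and HDL < 38 mg/dL.
    The field names of the patient record are hypothetical."""
    return (patient["group"] in (1, 2)
            and patient["tg_mg_dl"] >= 204
            and patient["hdl_mg_dl"] < 38)


def satisfies(levels: Levels, items: Levels) -> bool:
    """True if the record has every attribute = level item of the itemset."""
    return all(levels.get(attr) == lvl for attr, lvl in items.items())


def support_confidence(records: List[Levels], lhs: Levels,
                       rhs: Levels) -> Tuple[float, float]:
    """Support: fraction of records matching both lhs and rhs.
    Confidence: that count divided by the number of records matching lhs."""
    n_lhs = sum(satisfies(r, lhs) for r in records)
    n_both = sum(satisfies(r, lhs) and satisfies(r, rhs) for r in records)
    support = n_both / len(records) if records else 0.0
    confidence = n_both / n_lhs if n_lhs else 0.0
    return support, confidence
```

Applied to the discretized records of the selected subgroup, `support_confidence(subset, ANTECEDENT, CONSEQUENT)` returns the support and confidence of the rule within that subgroup.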
Moreover, not only quantitative but also qualitative changes in lipoproteins could be analyzed, as well as adiponectin and leptin levels, in order to better characterize clinical profiles.
Indeed, a previous study enrolling the same patients showed significantly higher leptin mRNA levels in dyslipidemic individuals (Groups 1, 2 and 3). Moreover, these leptin mRNA levels were significantly correlated with periodontal parameters such as BOP, suppuration and, mainly, CAL ≥ 5 mm. We have included these findings in the Results and Discussion section of our manuscript.
The authors.