Application of simulation-based CYP26 SNP-environment barcodes for evaluating the occurrence of oral malignant disorders by odds ratio-based binary particle swarm optimization: A case-control study in the Taiwanese population

Introduction Genetic polymorphisms and social factors (alcohol consumption, betel quid (BQ) usage, and cigarette consumption), either separately or jointly, play a crucial role in the occurrence of oral malignant disorders such as oral and pharyngeal cancers and oral potentially malignant disorders (OPMD). Material and methods Simultaneous analyses of multiple single nucleotide polymorphisms (SNPs) and environmental effects on oral malignant disorders are essential, albeit challenging. Thus, we conducted a case-control study (N = 576) to analyze the risk of occurrence of oral malignant disorders by using binary particle swarm optimization (BPSO) with an odds ratio (OR)-based method. Results We demonstrated that a combination of SNPs (CYP26B1 rs887844 and CYP26C1 rs12256889) and socio-demographic factors (age, ethnicity, and BQ chewing), referred to as the combined effects of SNP-environment, correlated with the maximal risk difference of occurrence observed between the oral malignant disorder group and the control group. The risks were more prominent in the oral and pharyngeal cancers group (OR = 10.30; 95% confidence interval (CI) = 4.58–23.15) than in the OPMD group (OR = 5.42; 95% CI = 1.94–15.12). Conclusions Simulation-based “SNP-environment barcodes” may be used to predict the risk of occurrence of oral malignant disorders. Applying simulation-based “SNP-environment barcodes” may provide insight into the importance of screening tests in preventing oral and pharyngeal cancers and OPMD.



Introduction
Oral malignant disorders include cancers of the oral cavity and pharynx and oral potentially malignant disorders (OPMD), such as leukoplakia, oral submucous fibrosis (OSF), verrucous hyperplasia, and erythroplakia [1]. Worldwide, oral and pharyngeal cancers are the sixth most prevalent cancer [2], and patients with OPMD may experience malignant transformation into oral and pharyngeal cancers [3][4][5][6]. Environmental carcinogens and genetic polymorphisms (single-nucleotide polymorphisms (SNPs)) are essential components in the development of oral malignant disorders. Alcohol consumption, betel quid (BQ) use, and cigarette consumption are significantly related to the risk of developing oral malignant disorders, and a joint effect has been observed when several of these factors are present simultaneously [7][8][9][10]. Previous studies have demonstrated that environmental and genetic factors are jointly involved in the carcinogenesis of malignant disorders [11,12]. For example, some oral cancer studies have demonstrated that phase I cytochrome P450 (CYP) enzymes may be associated with the metabolic activation of environmental carcinogens [10][13][14][15][16].
In the human genome, SNPs are the most common sequence variations [17]. SNPs are widely used as suitable markers in association studies of several diseases [18], cancers [19], and in pharmacogenomics [20]. Recently, studies have suggested that SNPs of CYP26 family genes, such as CYP26A1, CYP26B1, and CYP26C1, were associated with susceptibility to oral and pharyngeal cancers [21,22]. Although genes encoding CYP enzymes and environmental factors have been separately evaluated for their effects on the risk of developing oral malignant disorders in previous studies, the combined effect of these genes and environmental factors requires further investigation.
Our previous studies have illustrated that evolutionary algorithms (such as binary particle swarm optimization (BPSO) with an odds ratio (OR)-based method and genetic algorithm) can be used to analyze the associations of multiple SNPs [23][24][25][26][27][28]. Evolutionary algorithms have been used to combine SNPs with genotypes, namely SNP barcodes, e.g. GG, GT, and TT for a SNP with G/T polymorphism [23][24][25][26][27][28]. These SNPs are aligned with environmental factors, which provide the maximal risk difference of occurrence between the case and control groups and can forecast the susceptibility of the risks of oral diseases (e.g., oral malignant disorders) [23,24]. Indeed, we refer to these associations as "SNP-environment barcodes," which can be used to predict the risks of disease. Our method is based on the concept of simulation barcoding to evaluate the risks of oral malignant disorders using SNP-environment combinations [23][24][25][26][27][28].
Moreover, the OR-based BPSO method can determine the best SNP-environment barcodes without computing each combination separately and can present the optimal SNP-environment barcodes, which are regarded as a new quantitative measure with maximal statistical difference between case and control groups [27]. This study aimed to apply the OR-based BPSO method to the simultaneous analysis of multiple independent SNPs and environmental factors that are associated with oral malignant disorders.

Participants and data collection
A case-control study (N = 576) was conducted to analyze the risk of occurrence of oral malignant disorders. This study included patients with oral and pharyngeal cancers (n = 242), patients with OPMD (n = 70), and healthy controls with high prevalence of BQ chewing (n = 264). We identified polymorphisms of CYP26 family genes by searching the SNP database and conducted a hospital-based case-control study to verify whether these SNP variants and environmental factors are associated with oral malignant disorders. Data from male patients diagnosed with oral/pharyngeal cancers were collected from the Kaohsiung Medical University Hospital in Taiwan. Based on an oral health survey, healthy male subjects were recruited as a control group from a community with high prevalence of BQ chewing. All subjects signed written informed consent and voluntarily provided whole blood samples. This study was approved by the Institutional Review Board of Kaohsiung Medical University Hospital (KMU-H-IRB-950315 and KMUH-IRB-20110031). Ethnic categories were classified as Minnan, Hakka, and Taiwanese aborigines. Education levels were classified as ≤6 and >6 years of education. Data pertaining to substance use status, including alcohol drinking, BQ chewing, and cigarette use, and demographic characteristics were collected by professionally trained interviewers. An alcohol drinker was defined as someone who drank alcoholic beverages (regardless of volume) at least once per week for more than 1 year, a BQ chewer as someone who chewed at least one quid per day for more than 1 year, and a cigarette smoker as someone who smoked at least 10 cigarettes per week for more than 1 year.

SNP-SNP association problem
An OR is one of the main tools for quantifying the association between exposure and outcome in a given population. Furthermore, the OR is most commonly applied in case-control studies to assess the combination of risk factors. In this study, the OR was estimated as follows:

$$\mathrm{OR} = \frac{D_E \times H_N}{H_E \times D_N}$$

where $D_E$ is the number of exposed individuals with disease, $H_N$ is the number of healthy individuals who were not exposed, $H_E$ is the number of healthy individuals who were exposed, and $D_N$ is the number of individuals with disease who were not exposed. The Pearson chi-square statistic was used to identify factor × factor interaction and to assess significance (p < 0.05) in this study. This model measures the interaction between factors (SNPs or environmental factors) and status (genotype or situation) in a two-way contingency table. The test statistic is estimated as shown below:

$$\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where $O_{ij}$ and $E_{ij}$ are the observed and expected counts in cell $(i, j)$ of the contingency table.

Binary particle swarm optimization (BPSO)
BPSO [29,30] is derived from particle swarm optimization, a swarm intelligence-based optimizer inspired by the social behavior of birds searching for food. In this method, a particle simulates a bird seeking the best solution in an established space for a particular problem. Assuming that a swarm comprises N particles moving in a D-dimensional search space, the population position and velocity vectors are represented as $X = (x_1, x_2, \ldots, x_N)$ and $V = (v_1, v_2, \ldots, v_N)$, respectively. The position and velocity of the ith particle are described as $x_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$ and $v_i = (v_{i1}, v_{i2}, \ldots, v_{iD})$ and are constrained by $x_i \in [x_{min}, x_{max}]^D$ and $v_i \in [v_{min}, v_{max}]^D$, respectively. The particle moves according to the individual best solution $pBest_i$ and the global best solution $gBest$ in each iteration. The positions X and velocities V are initialized with uniform random values, and each particle is a candidate solution for a particular optimization problem. Each particle searches for optimized solutions according to the update formula in the topological space neighborhood in each generation.
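To make the odds ratio and Pearson chi-square statistic defined earlier in this section concrete, the following minimal Python sketch computes both from the four counts of a 2 × 2 table. The function names, the example counts, and the Woolf (log-scale) confidence interval are illustrative assumptions; the study itself reports that chi-square analyses were run in SAS and does not state how its confidence intervals were derived.

```python
import math

def odds_ratio(d_e, d_n, h_e, h_n):
    """OR = (D_E * H_N) / (H_E * D_N), using the symbols from the text."""
    return (d_e * h_n) / (h_e * d_n)

def woolf_ci(d_e, d_n, h_e, h_n, z=1.96):
    """Approximate 95% CI on the log-OR scale (Woolf method; an assumption here)."""
    log_or = math.log(odds_ratio(d_e, d_n, h_e, h_n))
    se = math.sqrt(1 / d_e + 1 / d_n + 1 / h_e + 1 / h_n)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

def pearson_chi2(d_e, d_n, h_e, h_n):
    """Pearson chi-square for a 2x2 table and its p-value (1 degree of freedom)."""
    table = [[d_e, d_n], [h_e, h_n]]
    n = d_e + d_n + h_e + h_n
    row_totals = [d_e + d_n, h_e + h_n]
    col_totals = [d_e + h_e, d_n + h_n]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts, not taken from the study.
counts = dict(d_e=30, d_n=70, h_e=10, h_n=90)
print(odds_ratio(**counts), woolf_ci(**counts), pearson_chi2(**counts))
```

The sketch assumes all four cells are non-zero; a table with an empty cell would need a continuity correction or an exact test.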
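The original particle swarm optimization (PSO) algorithm was developed for continuous optimization problems and is not suitable for discrete ones. Therefore, the BPSO algorithm, a modified form of PSO, was proposed for solving discrete optimization problems [31]. In the binary version, each particle position is redefined as a binary vector, i.e., $x_{id} \in \{0, 1\}$. The velocity and position update formulas are described as follows:

$$v_{id}^{new} = v_{id}^{old} + c_1 r_1 \left(pBest_{id} - x_{id}^{old}\right) + c_2 r_2 \left(gBest_{d} - x_{id}^{old}\right) \tag{3}$$

$$v_{id}^{new} = \max\left(\min\left(v_{max}, v_{id}^{new}\right), v_{min}\right) \tag{4}$$

$$S\left(v_{id}^{new}\right) = \frac{1}{1 + e^{-v_{id}^{new}}} \tag{5}$$

$$x_{id}^{new} = \begin{cases} 1, & \text{if } r_3 < S\left(v_{id}^{new}\right) \\ 0, & \text{otherwise} \end{cases} \tag{6}$$

where d = 1, 2, ..., D; $c_1$ and $c_2$ are constant acceleration coefficients; $r_1$, $r_2$, and $r_3$ are random values in [0, 1] drawn independently at each iteration for each dimension; and S() is a sigmoid function, which maps the velocity to a value between 0 and 1.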
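The binary update described above can be sketched in a few lines of Python. This is a minimal illustration of the standard sigmoid-based binary PSO update, not the authors' implementation: the function name `update_particle`, the coefficient values c1 = c2 = 2.0, and the velocity bounds of ±6 are assumptions chosen for the example.

```python
import math
import random

def sigmoid(v):
    """Map a velocity to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def update_particle(x, v, pbest, gbest, c1=2.0, c2=2.0,
                    v_min=-6.0, v_max=6.0, rng=random):
    """Return the new binary position and velocity of one particle.

    x, v, pbest and gbest are equal-length lists; c1, c2, v_min and v_max
    are illustrative defaults, not values reported in the study.
    """
    new_x, new_v = [], []
    for d in range(len(x)):
        r1, r2, r3 = rng.random(), rng.random(), rng.random()
        # Pull the velocity towards the personal and global best bits.
        vd = v[d] + c1 * r1 * (pbest[d] - x[d]) + c2 * r2 * (gbest[d] - x[d])
        # Clamp the velocity to [v_min, v_max].
        vd = max(v_min, min(v_max, vd))
        new_v.append(vd)
        # The sigmoid of the velocity is the probability that the bit is 1.
        new_x.append(1 if r3 < sigmoid(vd) else 0)
    return new_x, new_v

x, v = update_particle([0, 1, 0, 1], [0.0] * 4, [1, 1, 0, 0], [1, 0, 0, 1],
                       rng=random.Random(1))
print(x, v)
```

Clamping the velocity keeps the sigmoid away from 0 and 1, so every bit retains some chance of flipping in later iterations.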

BPSO for gene (G×G)/environment (E×E) interaction analysis
This section introduces a BPSO algorithm to address the SNP-SNP interaction problem. We improve the fitness function to identify the interaction between OPMD vs. normal and oral cancer vs. normal. The description is written as follows and includes the encoding schemes, fitness function, and BPSO procedure.
The position of the ith particle is encoded as

$$x_i = \{(F_{11}, F_{12}), (F_{21}, F_{22}), \ldots, (F_{M1}, F_{M2})\}$$

where M represents the number of factors and $(F_{m1}, F_{m2})$ shows the selection of the mth factor in the ith particle, in which $F_{m1}$ and $F_{m2}$ denote the expression of the particle. These two expressions are binary (0 or 1) variables; thus, four different combinations, (0, 0), (0, 1), (1, 0), and (1, 1), are possible. Each particle, as a candidate solution, denotes the status of all factors: each factor is either non-selected or selected in one of three different statuses. The four combinations are shown below:

$$(F_{m1}, F_{m2}) = \begin{cases} (0, 0), & \text{factor not selected} \\ (0, 1), & \text{status 1} \\ (1, 0), & \text{status 2} \\ (1, 1), & \text{status 3} \end{cases} \tag{7}$$

Based on formula (7), the four combinations of $(F_{m1}, F_{m2})$ display the selection state of a factor. For instance, a particle in position $x_i$ = {(0, 1), (0, 0), (1, 1), (1, 0), (0, 0)} indicates that the 1st, 3rd, and 4th factors are selected in status 1, 3, and 2, respectively, whereas the 2nd and 5th factors are not selected. Consequently, this expression can be used to represent interactions of multiple orders.
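The two-bit encoding can be illustrated with a short decoding routine. The mapping below is read off the worked example in the text, where (0, 1), (1, 1), and (1, 0) correspond to status 1, 3, and 2; the names `STATUS` and `decode` are hypothetical.

```python
# Map each bit pair (F_m1, F_m2) to a factor status; None means "not selected".
STATUS = {(0, 0): None, (0, 1): 1, (1, 0): 2, (1, 1): 3}

def decode(position):
    """Return {factor index (1-based): status} for the selected factors."""
    selected = {}
    for m, pair in enumerate(position, start=1):
        status = STATUS[tuple(pair)]
        if status is not None:
            selected[m] = status
    return selected

example = [(0, 1), (0, 0), (1, 1), (1, 0), (0, 0)]
print(decode(example))  # {1: 1, 3: 3, 4: 2}
```

Because an unselected factor is a valid state, a single particle can represent a combination of any order from one factor up to all M factors.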

Fitness function
The best combination of clinical diagnoses for oral cancer and OPMD can be found as a reference to confirm the fitness function of each candidate. We utilized the OR, the chi-square model, and the OR difference between oral cancer vs. normal and OPMD vs. normal to identify their risk factors. In the fitness function (Eq (8)), S() is the sigmoid function, which limits each weight to [0, 1] and establishes the weight equilibrium; $OR_{CvN}$ represents the OR value for oral cancer vs. normal cases; and $OR_{PvN}$ represents the OR value for OPMD vs. normal cases. p indicates the Pearson chi-square p-value used to assess the significance of $OR_{CvN}$ and $OR_{PvN}$, which is shown as formula (9).
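The ingredients of the fitness function named above (the two ORs, their chi-square p-values, and the OR difference) can be computed for one candidate barcode as in the sketch below. It deliberately stops short of combining them into a single score, because the exact weighting is not reproduced here. The data layout (one dictionary per subject), the function names, and the 0.5 continuity correction that guards against empty cells are all assumptions for illustration.

```python
import math

def matches(subject, barcode):
    """True if the subject carries every factor status listed in the barcode."""
    return all(subject.get(factor) == status for factor, status in barcode.items())

def or_and_p(d_e, d_n, h_e, h_n):
    """OR (with a 0.5 continuity correction, an assumption) and chi-square p-value."""
    odds = ((d_e + 0.5) * (h_n + 0.5)) / ((h_e + 0.5) * (d_n + 0.5))
    n = d_e + d_n + h_e + h_n
    denom = (d_e + d_n) * (h_e + h_n) * (d_e + h_e) * (d_n + h_n)
    # Shortcut form of the Pearson chi-square for a 2x2 table.
    chi2 = n * (d_e * h_n - d_n * h_e) ** 2 / denom if denom else 0.0
    return odds, math.erfc(math.sqrt(chi2 / 2))

def fitness_components(cancer, opmd, controls, barcode):
    """Return OR_CvN, OR_PvN, their p-values and the OR difference."""
    h_e = sum(matches(s, barcode) for s in controls)
    h_n = len(controls) - h_e
    result = {}
    for name, cases in (("cvn", cancer), ("pvn", opmd)):
        d_e = sum(matches(s, barcode) for s in cases)
        odds, p = or_and_p(d_e, len(cases) - d_e, h_e, h_n)
        result["or_" + name], result["p_" + name] = odds, p
    result["or_diff"] = result["or_cvn"] - result["or_pvn"]
    return result

# Toy subjects: factor name -> status code (hypothetical data).
cancer = [{"age": 2}, {"age": 2}, {"age": 2}, {"age": 1}]
opmd = [{"age": 2}, {"age": 2}, {"age": 1}, {"age": 1}]
controls = [{"age": 2}, {"age": 1}, {"age": 1}, {"age": 1}]
print(fitness_components(cancer, opmd, controls, {"age": 2}))
```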

BPSO procedure
This study utilized BPSO to analyze interaction processes. The detailed BPSO procedure is described as follows.
Step 1) Initialize the population of particles with a uniform random position within {0, 1} and velocity within [v min , v max ].
Step 2) Calculate the fitness of each particle using the OR value, p-value, and the OR difference according to Eq (8).
Step 3) Update the individual and global best solution pBest and gBest according to the fitness estimation results.
Step 4) Calculate the particle velocity and update the position of each particle according to Eqs (3)-(6).
Step 5) Repeat Steps 2-4 until the stop criterion has been met. The best combination of risk factors for the clinical diagnosis of oral cancer and OPMD is consequently obtained.
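Steps 1-5 can be put together in a compact, self-contained sketch. The loop below follows the same order of operations but uses a toy fitness (the number of bits that agree with a hidden target) in place of the OR-based fitness; the swarm size, iteration count, acceleration coefficients, and velocity bound are illustrative assumptions, not the settings used in this study.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def bpso(fitness, dim, n_particles=20, iterations=50,
         c1=2.0, c2=2.0, v_max=6.0, seed=0):
    """Maximize `fitness` over binary vectors of length `dim`."""
    rng = random.Random(seed)
    # Step 1: random binary positions and velocities in [-v_max, v_max].
    xs = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[rng.uniform(-v_max, v_max) for _ in range(dim)]
          for _ in range(n_particles)]
    # Steps 2-3 for the initial swarm: evaluate, then set pBest and gBest.
    pbest = [x[:] for x in xs]
    pbest_fit = [fitness(x) for x in xs]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    # Step 5: a fixed iteration budget serves as the stop criterion.
    for _ in range(iterations):
        for i in range(n_particles):
            # Step 4: update velocity, clamp it, and resample each bit.
            for d in range(dim):
                r1, r2, r3 = rng.random(), rng.random(), rng.random()
                v = (vs[i][d] + c1 * r1 * (pbest[i][d] - xs[i][d])
                     + c2 * r2 * (gbest[d] - xs[i][d]))
                vs[i][d] = max(-v_max, min(v_max, v))
                xs[i][d] = 1 if r3 < sigmoid(vs[i][d]) else 0
            # Step 2: evaluate the new position.
            f = fitness(xs[i])
            # Step 3: update the personal and global best solutions.
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = xs[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = xs[i][:], f
    return gbest, gbest_fit

# Toy objective: count the bits that agree with a hidden target vector.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0]

def toy_fitness(x):
    return sum(a == b for a, b in zip(x, TARGET))

best, best_fit = bpso(toy_fitness, dim=len(TARGET))
print(best, best_fit)
```

Swapping `toy_fitness` for a function that decodes the particle into a barcode and scores it on case-control data would turn this skeleton into an OR-based search.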

Genotyping of CYP26 families
Peripheral blood (8 mL) was drawn from the subjects in ethylenediaminetetraacetic acid (EDTA) tubes. Using a standard protocol, genomic DNA was extracted from peripheral blood lymphocytes and immediately stored at -20˚C for further analysis. SNPs of CYP26A1, CYP26B1, and CYP26C1 were selected from the public Chinese HapMap reference database (HapMap-CHB), retaining variants with a minor allele frequency (MAF) > 10%. In addition to meeting the MAF > 10% criterion, the variants of CYP26A1 rs4411227 were significantly related to an increased risk of malignant oral disorders in our previous study [21]. Therefore, we selected only rs4411227 for CYP26A1 SNP analysis in this study.
The TaqMan SNP genotyping assay (Applied Biosystems, Foster City, CA, USA) was used to determine genotypes. DNA samples and negative controls were loaded and examined in 96-well plates using the ViiA™ 7 real-time polymerase chain reaction (PCR) system (Applied Biosystems, Foster City, CA). Fluorescence data were evaluated using the Sequence Detection System (SDS) 2.1 software (Applied Biosystems), and the fluorescence signals were plotted to determine the genotype of each sample.

Statistical analysis
We applied the OR-based BPSO (implemented in Python) and the chi-square test to examine the association between substance use habits (alcohol drinking, BQ chewing, and cigarette smoking) and disease groups (OPMD, oral/pharyngeal cancer, and normal controls). Chi-square statistical analyses were performed using the SAS statistical package (version 9.1.3, SAS Institute Inc.). The application performance of SNP-environment barcodes was ascertained by the receiver operating characteristic (ROC) curve, and the maximized Youden's index (sensitivity + specificity − 1) was used to determine the optimal cut-off point of OR scores that separates low- and high-risk populations for oral malignant disorders [32,33].
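The cut-off selection by Youden's index can be sketched as follows. The function scans every observed score as a candidate threshold, treats scores at or above the threshold as high risk, and keeps the threshold with the largest sensitivity + specificity − 1. The function name and the example scores and labels are hypothetical and are not data from this study.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J, sensitivity, specificity) maximizing Youden's J.

    scores: risk scores (e.g. OR scores); labels: 1 = case, 0 = control.
    A subject is called high risk when score >= cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        sensitivity, specificity = tp / n_pos, tn / n_neg
        j = sensitivity + specificity - 1
        # Ties keep the lowest cutoff because only strict improvements replace it.
        if best is None or j > best[1]:
            best = (cutoff, j, sensitivity, specificity)
    return best

# Hypothetical scores and case/control labels.
scores = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(youden_cutoff(scores, labels))  # (2.0, 0.75, 1.0, 0.75)
```

The sketch assumes both classes are present; with only cases or only controls, sensitivity or specificity would be undefined.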

Characterization of the study population
The gene structure and SNP information of the CYP26 gene families are shown in Tables 1 and 2, and the genetic landscape of these SNPs is described in Fig 1. None of these SNPs was a non-synonymous variant (amino acid change). For CYP26B1 (2p13.2), four candidate SNPs are indicated: rs887844 is a 500B downstream variant, rs707718 and rs3768647 are located in the 3' UTR, and rs9309462 is located in an intron. We selected only one SNP, rs4411227, for CYP26A1 in the chromosome 10q23.33 region; rs4411227 is a 2KB upstream variant. In the region of CYP26C1 (10q23.33), SNP rs8211 is a 2KB upstream variant and SNP rs12256889 is located in an intron.
The demographic characteristics and CYP26 polymorphisms of individuals with oral/pharyngeal cancers (N = 242), individuals with OPMD (N = 70), and control subjects (N = 264) with high prevalence of BQ chewing are described in Table 3. Five hundred and seventy-six individuals participated in our study. All participants were male and over 18 years of age (mean age 48.29 ± 9.88 years). The mean age was 44.49 ± 8.53 years for control individuals, 51.01 ± 11.82 years for patients with OPMD, and 51.64 ± 9.17 years for patients with oral and pharyngeal cancer. The age differences among the three groups were statistically significant. The proportion of patients aged over 50 in the oral and pharyngeal cancer group (52.89%) was significantly higher than that in the controls (21.97%). Moreover, 55.71% of patients with OPMD were aged over 50, which was also higher than in the control group (21.97%). The percentage of individuals with a low education level was 34.30% in patients with oral and pharyngeal cancers, which was significantly higher than that of the controls (22.35%). The prevalence of consuming alcohol, chewing BQ, and smoking cigarettes did not differ significantly among patients with oral/pharyngeal cancer, patients with OPMD, and the control group.
According to the results of the genotyping analysis, none of the seven SNPs deviated from Hardy-Weinberg equilibrium in either the oral malignant disorder or the control group. The genotype frequencies for CYP26A1 (rs4411227) and CYP26B1 (rs887844, rs3768647, and rs9309462) were significantly different among the three groups (p < 0.05).
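A Hardy-Weinberg check of the kind mentioned above is commonly done with a one-degree-of-freedom chi-square goodness-of-fit test on the three genotype counts of a biallelic SNP. The sketch below shows that common approach; the text does not state which test was actually used, and the function name and genotype counts are hypothetical.

```python
import math

def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb: observed counts of the two homozygotes and the heterozygote.
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))

print(hwe_chi2(25, 50, 25))  # genotype counts exactly at equilibrium
print(hwe_chi2(40, 20, 40))  # strong heterozygote deficit
```

A monomorphic SNP (p = 0 or q = 0) would produce zero expected counts, so such markers need to be excluded before calling the function; for small samples an exact test is usually preferred.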

Association between combined polymorphisms of CYP26 and oral malignant disorders
After considering age, gender, race, education, alcohol consumption, BQ chewing, and cigarette smoking, the joint effects (adjusted OR and 95% CI) of specific SNP combinations and environmental factors on the occurrence of oral malignant disorders were obtained (Table 4). Combinations of specific SNPs (CYP26B1 rs887844 (A/G), CYP26A1 rs4411227 (C/G), CYP26C1 rs12256889 (A/C), and CYP26B1 rs707718 (G/T)) with an environmental factor (> 50 years old) significantly increased the risk of OPMD (5.75-, 3.92-, 4.07-, and 3.34-fold, respectively) compared to healthy controls with high prevalence of BQ chewing. The same SNP (CYP26B1 rs887844 (A/G), CYP26A1 rs4411227 (C/G), CYP26C1 rs12256889 (A/C), and CYP26B1 rs707718 (G/T)) and environmental factor (> 50 years old) combinations significantly increased the risk of oral and pharyngeal cancers (6.76-, 4.16-, 3.94-, and 2.80-fold, respectively) compared to healthy controls with high prevalence of BQ chewing. Older subjects with rs887844 (A/G) or rs4411227 (C/G) were at higher risk of developing oral and pharyngeal cancers than OPMD. Conversely, older subjects with rs12256889 (A/C), rs4411227 (G/G), or rs707718 (G/T) were at lower risk of developing oral and pharyngeal cancers than OPMD. Thus, our results suggest the presence of a genome-wide cross-talk between polymorphisms of several genes and SNPs of CYP26, which may be involved in the occurrence of oral malignant disorders.

Fig 1. (A) Four candidate SNPs are indicated in the genomic structure of CYP26B1 (2p13.2). SNP rs887844 is a 500B downstream variant, followed by rs707718 in the 3' UTR, rs3768647 in the 3' UTR, and rs9309462 in an intron. (B) One candidate SNP is indicated in the genomic structure of CYP26A1 (10q23.33). SNP rs4411227 is a 2KB upstream variant. (C) Two candidate SNPs are indicated in the genomic structure of CYP26C1 (10q23.33). SNP rs8211 is a 2KB upstream variant and SNP rs12256889 is located in an intron. Exons are denoted by black boxes.

Best combination model of SNP-environment with maximum risk differences between cases and controls
We used the BPSO-generated barcodes to compute the relative strength of SNP-environment combination effects on the risk of developing oral malignant disorders and determined the risk effects of subjects with oral malignant disorders who exhibited specific SNP-environment combinations or other combinations (Table 5). Using controls (high prevalence of BQ chewing) as the reference group, we found that the elderly people within the Minnan population with SNP rs887844 …
In the risk assessment, the odds ratio (OR) was used to evaluate the risk association of SNP-environment combinations for oral malignant disorders between high- and low-risk populations. We used the ROC curve to assess the discriminatory power of the OR risks of SNP-environment barcodes for the population with oral malignant disorders (Fig 2). The ROC curve showed significant performance of the SNP-environment barcodes for oral malignant disorders (area under the curve (AUC) = 0.755; 95% confidence interval (CI) = 0.750-0.760; p < 0.001). The maximized Youden's index was used to determine the optimal cut-off point of OR scores for predicting the population at high risk of oral malignant disorders. Optimal sensitivity (91%) for predicting the high-risk population was achieved with an OR cut-off of greater than or equal to 2.39. Thus, OR < 2.39 and OR ≥ 2.39 provided the best discrimination between low- and high-risk populations for oral malignant disorders.

Table 3, footnote d. The genotype information of cases and controls was partly derived from our previous work [21,22], and the variables are available at https://github.com/kuochuanwu/SNPbarcode (S1 Table). https://doi.org/10.1371/journal.pone.0220719.t003

Table 4. The top ten best risk association models of 2-factor combinations among oral and pharyngeal cancers, OPMD, and controls.

Discussion
The occurrence of cancers has been attributed to the combined effects of genes and environment, but confirming joint effects is a computational and mathematical challenge. In terms of multiple SNPs, genome-wide association studies (GWAS) have commonly been used to discover associations with diseases [34][35][36][37][38][39]. A number of GWAS [40][41][42] and non-GWAS [43,44] studies have recognized the interaction of SNPs. Nevertheless, the association effects based on SNP-SNP interaction among CYP family SNPs and the risk associations in oral malignant disorders are less frequently mentioned. In terms of the computational biological challenge, potential and substantial associations are often hidden in the large number of possible combined effects across numerous SNP genotypes. A number of studies have proposed computational approaches to improve the association analysis of multiple SNPs [45][46][47][48][49]. Nevertheless, these strategies become computationally intensive as the number of SNPs increases [50]. Major issues arise from this challenge, including the calculation of SNP-SNP relationships in terms of combinations of SNPs with their genotypes and of SNPs combined with environmental factors. Approaches from conventional statistics, machine learning, and data mining have been established to evaluate potential effects in GWAS association studies [24,25,[51][52][53][54][55][56]. Accordingly, a mathematical optimization algorithm with the OR-based method has been applied to examine the susceptibility risks of several cancers and disorders [23,27,28,53,55,[57][58][59]. Here, the strength of BPSO was used to obtain simulation-based association models with rapid and easy statistical analysis.

Table 5. Estimated joint effects on models of environmental-SNP combinations associated with oral and pharyngeal cancers, OPMD, and controls. Column groups: OPMD vs controls; oral and pharyngeal cancers vs controls.
Epidemiological studies indicated that OPMD and oral/pharyngeal cancers have a close relationship with environmental exposure to BQ, alcohol, and cigarettes, particularly in heavy BQ users [7,8,22,[60][61][62]. In 2004, IARC indicated that the areca nut and chewable BQ, particularly those without tobacco, comprised group 1 human carcinogens of the oral cavity [9]. In Taiwan, the 2016 cancer report indicated that oral cavity and pharyngeal cancers were the fourth most prevalent cancers with an incidence rate of 42.43 per 100,000 individuals among males [63]. In addition, it ranked as the fourth leading cause of death due to cancer, and the mortality rate was 15.71 per 100,000 individuals [63].
The BQ alkaloids arecoline and arecaidine can cause bacterial mutagenicity and, in mammalian cells tested in vivo or in vitro, can result in sister chromatid exchange, chromosomal aberrations, and micronuclei formation [9]. In addition to the areca nut extract, arecoline also induces the dysregulation of oral epithelial cells, leading to cell cycle arrest [64]. We previously reported that arecoline (a major alkaloid in BQ) significantly induced CYP26B1 expression [65]. CYP26 has three isoforms, namely CYP26A1, CYP26B1, and CYP26C1, which mainly metabolize retinoic acid (RA)-related compounds [66]. RA is a biologically active derivative of vitamin A, which regulates growth, differentiation, and apoptosis of several cell types, and plays an important role in visual physiological function, embryonic development patterns, and adult physiological mechanisms [67,68].
The risk of developing oral malignant disorders may be related to the functionally relevant combined effects of SNPs such as those in CYP26A1, CYP26B1, and CYP26C1 within and between different cancer pathways. Since interactions among multiple genes influenced the risk component of oral and pharyngeal cancers, our rationale for identifying interactions between genes was justified. We developed new and feasible analytical methods to systematically examine the interactions between genome-wide SNPs and various environmental factors. In particular, we investigated the role of combinational SNPs in three metabolism-related genes (CYP26A1, CYP26B1, and CYP26C1) in oral malignant disorders. In association studies of disease predisposition, the interaction analyses of SNPs increased the performance [19,28,58,[69][70][71].
In oral and pharyngeal cancers and OPMD, we also explored the risk factors for genetic variation of complex traits. We hypothesized that two important SNPs (CYP26B1 rs887844 and CYP26C1 rs12256889) within CYP26 may significantly elevate genetic susceptibility to oral and pharyngeal cancers and OPMD. The association between CYP26 SNPs and the risk of developing oral/pharyngeal cancers and OPMD was detected by a robust BPSO algorithm combined with statistical analysis. The BPSO algorithm was used to search for the optimal risk effects of CYP26 SNPs for oral and pharyngeal cancers and OPMD. Complex multifactor associations are difficult to analyze statistically [72]. Accordingly, comprehensive approaches to evaluate the best association models with disorder-related factors were used in several studies [54,56,73,74], and these methods have suitable power to test the potential model of associations.
SNP and environmental factor combinations were calculated using the BPSO method, and their relationships with disease risk were examined by selecting prominent SNPs. This algorithm assisted in comprehensively recognizing the genetic basis of complex diseases/traits. Our published studies showed that CYP26 is a candidate gene family for assessing the risks of occurrence of oral/pharyngeal cancers and OPMD [10,21,22], and that specific SNP combinations of CYP26 may be associated with increased risk of developing oral/pharyngeal cancers and OPMD. Nevertheless, the joint effects of CYP26 SNP-environmental factor combinations on the risk of developing oral malignant disorders had not previously been examined using SNP-environmental factor interaction methods.
In this study, we revealed a strong correlation between CYP26 SNP-environmental factor interaction (old age, Minnan ethnicity, BQ chewers, and smoking) and the risk of oral malignant disorder occurrence. Two significant CYP26 SNPs (CYP26B1 rs887844 (A/G) and CYP26C1 rs12256889 (A/C)) were selected using BPSO analysis, which was the best performance model for oral malignant disorder occurrence risk analysis. The analysis suggested that several combinations of environmental factors and candidate SNPs had the highest risk for susceptibility to oral malignant disorders. The risks were more prominent in the oral and pharyngeal cancers group (OR = 10.30; 95% CI = 4.58-23.15) than in the OPMD group (OR = 5.42; 95% CI = 1.94-15.12). This finding implies that older individuals with SNP rs887844 and rs12256889 of Minnan ethnicity who chewed BQ were more likely to suffer from oral and pharyngeal cancers than from OPMD.
In this study, we applied an odds ratio (OR)-based binary particle swarm optimization (BPSO) method to evaluate the joint effect of gene-gene-environment (SNP-SNP-environment) in oral malignant disorders. The BPSO-based SNP-SNP and SNP-environment interactions were not limited to SNPs on the same chromosome, and our algorithm could control for potential confounders and handle diverse numbers of SNPs. Moreover, the BPSO approach identified the best-performing SNP or SNP-environment model, with the maximum risk difference between case and control groups. The ROC curve was used to discriminate the OR risks of SNP-environment barcodes for oral malignant disorders, as shown in Fig 2. The best cut-off points of OR < 2.39 and OR ≥ 2.39 were provided to separate low- and high-risk populations for oral malignant disorders. The idea of simulation-based CYP26 SNP-environment barcodes and the barcode concept for the population with oral malignant disorders are presented in a schematic diagram (Fig 3). Our SNP-environment data can be converted into barcodes to distinguish between the low- and high-risk groups (e.g., 2-1, and 2-1-1-1-1-1) of oral malignant disorders.

Study limitations
Although we did not create a barcode system for evaluating the risks of occurrence of oral malignant disorders, the term "barcodes" has been applied in a number of our published papers [23][24][25][26][27][28]. Owing to the difficulty of the statistical analyses in evaluating complex multifactor associations (particularly genes combined with environmental factors), we used the OR-based BPSO method to generate SNP-environment barcodes as a simulation proxy of barcodes to predict susceptibility to oral malignant diseases.
After the calculations and statistical analysis using the BPSO approach, we presented the combined associations of each high-risk group from the SNP-environment data (Table 5). We clarified the proposed concept of SNP-environment barcodes as shown in Fig 3. SNP-environment data can be converted into an SNP-environment barcode for each subject and used to assign the subject to the high-risk group (OR ≥ 2.39) of oral malignant disorders.

Conclusions
We demonstrated, for the first time, the significant joint effects of CYP26 SNPs and environmental factors on the risk of oral malignant disorder occurrence using the BPSO method. The BPSO-based SNP-SNP and SNP-environment approaches for assessing combined effects may therefore assist in identifying complex biological relationships among cancer processes during the development of oral malignant disorders.
In summary, the combined effects identified by the novel CYP26 SNP-environment approach may predict the risk of occurrence of oral malignant disorders. Its application as a proxy of SNP-environment barcodes may provide insights into the importance of establishing screening tests for BQ chewers to promote the prevention of oral malignant disorders.
Supporting information S1