Partial Least Square Discriminant Analysis Discovered a Dietary Pattern Inversely Associated with Nasopharyngeal Carcinoma Risk

Evidence on the association between dietary components, dietary patterns, and nasopharyngeal carcinoma (NPC) is scarce. A major challenge is the high degree of correlation among dietary constituents. We aimed to identify a dietary pattern associated with NPC and to illustrate the dose-response relationship between the identified dietary pattern scores and the risk of NPC. Taking advantage of a matched NPC case–control study, we analyzed data from a total of 319 incident cases and 319 matched controls. The dietary pattern was derived by partial least square discriminant analysis (PLS-DA) performed on energy-adjusted food frequencies obtained from a 66-item food-frequency questionnaire. Odds ratios (ORs) and 95% confidence intervals (CIs) were estimated with multiple conditional logistic regression models linking pattern scores and NPC risk. A high score on the PLS-DA-derived pattern was characterized by high intakes of fruits, milk, fresh fish, vegetables, tea, and eggs, ordered by loading values. We observed that a one-unit increase in the score was associated with a significantly lower risk of NPC (ORadj = 0.73, 95% CI = 0.60–0.88) after controlling for potential confounders. Similar results were observed among Epstein-Barr virus seropositive subjects. The results indicate an NPC-protective diet with more phytonutrient-rich plant foods (fruits, vegetables), milk, other protein-rich foods (in particular fresh fish and eggs), and tea. This information may be used to design a potential dietary regimen for NPC prevention.

Introduction than 75 years with incident, primary, histologically confirmed NPC and for age (within 5-years), gender, and residence area-matched controls with no history of NPC. In total, 378 cases and 372 controls were identified. Of these, risk factor questionnaires were obtained from 375 (99%) cases and 327 (88%) controls. Most cases (>90%) were diagnosed as WHO Type 2 or 3 (non-keratinizing and undifferentiated carcinomas), whereas the remaining cases were diagnosed as WHO Type 1 (squamous cell carcinomas) [20]. The dietary analyses were limited to 319 incident NPC cases and 319 matched controls with complete information. Excluded were 56 cases and 8 controls with missing values. Baseline socio-demographic characteristics of the individuals excluded and those remaining in the study were similar. Institutional Review Boards at the National Taiwan University in Taiwan and the National Cancer Institute in the United States (both under the NCI Special Studies IRB) approved the study protocol and informed consent. Written informed consent was obtained from every study participant.

Data collection
Information on socio-demographic characteristics, smoking habit, alcohol consumption, residential history, and medical history was collected during face-to-face interviews by trained nurses using a structured questionnaire. For cases, interviews were conducted at the time of biopsy for histological confirmation of NPC and before treatment.
We used a 66-food-item quantitative food-frequency questionnaire (FFQ) to assess dietary history for the period from 10 to 3 years before the diagnosis for cases or before the interview date for controls. For data analysis, these food items were grouped into 24 food groups (S1 Table), based on similar food group characteristics and nutrient contents [21-23]. Validation of a similar simplified FFQ designed for the Elderly Nutrient and Health Survey in Taiwan has been previously reported [24]. The major difference between this FFQ and the validated one is that this FFQ included additional questions concerning tea and preserved foods. Participants were asked to indicate their average intake frequency per day, week, month, or year, or less than once per year. Using the Taiwan food composition database, total caloric intake was estimated by multiplying the intake frequency for each food item by the nutrient content of the food in an estimated usual portion size [23]. All the food frequency data of the above 24 food groups were adjusted for total energy intake by the residual method [25] and standardized to an N(0, 1) distribution before being subjected to the dietary pattern analyses. Outliers located more than 5 standard deviations from the mean were excluded.
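As an illustration of the preprocessing described above, the following Python sketch applies the residual method to a single food group and then standardizes the residuals. It is not the SAS code used in the study: the function names and data values are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

def energy_adjust(intake, energy):
    """Residuals from a simple linear regression of intake on total energy."""
    mean_energy, mean_intake = mean(energy), mean(intake)
    sxx = sum((e - mean_energy) ** 2 for e in energy)
    sxy = sum((e - mean_energy) * (f - mean_intake)
              for e, f in zip(energy, intake))
    slope = sxy / sxx
    intercept = mean_intake - slope * mean_energy
    # The residual is the part of intake not explained by total energy.
    return [f - (intercept + slope * e) for e, f in zip(energy, intake)]

def standardize(values):
    """Rescale to mean 0 and (sample) standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical data: total energy (kcal/day) and weekly fruit frequency.
energy = [1600.0, 1800.0, 2000.0, 2200.0, 2400.0, 2600.0]
fruit = [5.0, 9.0, 7.0, 12.0, 10.0, 14.0]

residuals = energy_adjust(fruit, energy)
z_fruit = standardize(residuals)
```

The residual method is sometimes presented with a constant (the expected intake at mean energy) added back to the residuals; because the values are standardized afterwards, that constant would not change the result here.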

Statistical analysis
SAS software, version 9.4 (SAS Institute Inc.) was used for all statistical analyses. Demographic variables between cases and controls were compared using Student's t-test for continuous variables and Pearson's χ² test for categorical variables. Conditional logistic regression was used to estimate odds ratios (ORs) and 95% confidence intervals (CIs).
For dietary pattern discovery, we used the PLS regression method. PLS is a method that works with two different sets of variables, called predictors and responses. This approach aims to identify the linear combinations of the predictor variables Xi (that is, the frequencies of the 24 food groups) that best discriminate the categories of the dependent variable, i.e., NPC status (disease vs no disease). A detailed description of the method can be found in the work of Hoffman et al. [15]. The PLS variant we used is PLS-DA [17]. The PROC PLS statement in SAS was used to conduct the PLS-DA analyses. The SAS code for the applied SAS procedure PLS is given in S1 Text.
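To make the idea concrete, the sketch below computes a first PLS component for a single binary response in plain Python. The weight of each food group is proportional to its covariance with case status, the score is the weighted sum of the food frequencies, and the loading is the regression coefficient of each food group on the score. This is an illustrative assumption and not the PROC PLS code of S1 Text; the data are invented, and SAS may scale or orient its output differently.

```python
from math import sqrt

def center(column):
    m = sum(column) / len(column)
    return [v - m for v in column]

def first_pls_component(X, y):
    """X: rows of standardized food frequencies; y: 1 = case, 0 = control."""
    n, p = len(X), len(X[0])
    yc = center(y)
    cols = [center([row[j] for row in X]) for j in range(p)]
    # Weights: covariance of each predictor with the response, unit length.
    w = [sum(col[i] * yc[i] for i in range(n)) for col in cols]
    norm = sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores: projection of each subject onto the weight vector.
    scores = [sum(w[j] * cols[j][i] for j in range(p)) for i in range(n)]
    # Loadings: regression of each predictor on the scores.
    tt = sum(t * t for t in scores)
    loadings = [sum(col[i] * scores[i] for i in range(n)) / tt for col in cols]
    return w, scores, loadings

# Hypothetical data; columns are three unnamed food groups.
X = [[-1.0, -0.5, 0.8], [-0.8, 0.2, 0.5], [-0.3, -0.9, 1.1], [0.1, -0.4, -0.2],
     [0.9, 0.6, -0.7], [0.5, 0.8, -0.4], [1.0, -0.1, -0.9], [-0.4, 0.3, -0.2]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

weights, scores, loadings = first_pls_component(X, y)
```

With cases coded as 1, a higher score in this sketch means a more case-like diet. The sign of a component is arbitrary, so it can be flipped so that a high score corresponds to the pattern of interest.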
The identified dietary pattern was further converted into a score by weighting each food frequency by its factor loading value. Individual factor scores were then categorized into quartiles. In interpreting the dietary pattern, we emphasized those food groups with absolute loading values ≥ 0.20.
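The scoring and categorization steps can be sketched as follows. This is a minimal Python illustration under stated assumptions: the loadings and scores are invented, the helper names are hypothetical, and the quartile cut points use the default method of `statistics.quantiles`, which may differ slightly from the percentile definition used in SAS.

```python
from bisect import bisect_right
from statistics import quantiles

def pattern_score(z_row, loadings):
    """Weighted sum of one subject's standardized food frequencies."""
    return sum(l * z for l, z in zip(loadings, z_row))

def assign_quartiles(scores):
    """Return 1 to 4 for each score, with cut points taken from all subjects."""
    cuts = quantiles(scores, n=4)  # three cut points
    return [bisect_right(cuts, s) + 1 for s in scores]

def emphasized(loadings_by_group, threshold=0.20):
    """Food groups whose absolute loading reaches the threshold."""
    return [g for g, l in loadings_by_group.items() if abs(l) >= threshold]

# Hypothetical loadings, not the values of Table 2.
loadings_by_group = {"group_a": 0.45, "group_b": -0.30, "group_c": 0.10}
loadings = list(loadings_by_group.values())

# Hypothetical standardized food frequencies for four subjects.
z_rows = [[1.2, -0.5, 0.3], [-0.7, 0.9, 0.0], [0.1, 0.2, -1.0], [-1.5, 1.1, 0.4]]
subject_scores = [pattern_score(row, loadings) for row in z_rows]
subject_quartiles = assign_quartiles(subject_scores)
key_groups = emphasized(loadings_by_group)
```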
Finally, conditional logistic regression analysis was performed to determine whether the dietary pattern scores obtained by applying PLS-DA were significantly associated with the risk of being an NPC case. Three multivariable models were constructed with three sets of covariates (see the Results section). The dietary pattern scores were treated both as continuous and as categorical variables, the latter based on the quartiles in all study subjects.
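For 1:1 matched pairs and a single covariate, the conditional likelihood depends only on the within-pair difference d = x(case) − x(control), and each pair contributes 1 / (1 + exp(−βd)). The Python sketch below fits β by Newton-Raphson and reports a Wald 95% CI. It is a simplified illustration with invented data and a hypothetical function name; the multivariable models of the study were fitted in SAS.

```python
from math import exp, sqrt

def fit_matched_pairs(diffs, tol=1e-10, max_iter=50):
    """Conditional logistic regression for 1:1 pairs with one covariate.

    diffs: case value minus matched-control value for each pair.
    Returns the log odds ratio per unit and its standard error.
    """
    beta = 0.0
    info = 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + exp(-beta * d)) for d in diffs]
        # Gradient and information of the conditional log-likelihood.
        grad = sum(d * (1.0 - pi) for d, pi in zip(diffs, p))
        info = sum(d * d * pi * (1.0 - pi) for d, pi in zip(diffs, p))
        step = grad / info
        beta += step
        if abs(step) < tol:
            break
    # Information from the last iteration; at convergence the step is negligible.
    return beta, 1.0 / sqrt(info)

# Hypothetical within-pair differences in dietary pattern score.
diffs = [-1.2, -0.8, 0.3, -0.5, -1.0, 0.6, -0.2, -0.9, 0.4, -0.7]

beta, se = fit_matched_pairs(diffs)
odds_ratio = exp(beta)
ci_low, ci_high = exp(beta - 1.96 * se), exp(beta + 1.96 * se)
```

Because most case scores in these invented data are lower than those of their matched controls, the estimated odds ratio per unit of score falls below 1. The estimate would not exist if every difference had the same sign.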
In order to observe the characteristics of the 4 dietary groups (first to fourth quartiles of factor score, Q1-Q4), the age-adjusted mean values were obtained for certain potential confounders of NPC such as age, socio-demographics, lifestyle variables, and EBV infection status. Weekly consumption frequencies of food groups were additionally adjusted for total energy intake by the residual method [25].
Because greater than 95% of NPC cases were seropositive for EBV infection and because EBV is considered a necessary risk factor for the development of NPC, we repeated the above analysis after excluding those without EBV infection.

Results
The distributions of selected demographic characteristics of the 319 NPC cases and 319 matched controls are shown in Table 1. The risk profile was similar to those reported in our previous studies [11,18,19,29] based on similar subjects. The proportion of males was around 69% for both cases and controls. There was no significant difference between cases and controls in age or gender, both of which were matched during recruitment. Cases were more likely than controls to be of Fukienese or Hakka ancestral origin. The cases had a lower education level than the controls. Smokers for more than 25 years had a significantly increased risk of NPC (OR adj. = 1.64; 95% CI = 1.01-2.68). A significantly increased risk was conferred by a family history of NPC (yes vs. no, OR adj. = 6.96; 95% CI = 2.07-23.44). Exposures to formaldehyde and wood dust and a high intake of nitrosamines were associated with increased risks of NPC. The majority of cases (99.4%), but only a minority of controls, were seropositive for anti-EBV markers. No association was found between NPC risk and alcohol drinking habit.
The factor loadings of the identified primary dietary pattern obtained from PLS-DA are shown in Table 2. Food groups with absolute loading values ≥ 0.20 were emphasized in the interpretation. We observed higher loading values for fruits, milk, fresh fish, vegetables, tea, and eggs, in that order. Each of these food groups was inversely associated with NPC, suggesting that these foods were consumed less frequently by NPC patients.
We constructed a dietary pattern score by weighting the food frequencies by the factor loading values of all food groups. The distribution of potential confounders by quartiles of the dietary factor scores is shown in Table 3. Those in the upper quartiles of factor scores had more education. The amount of nitrosamine from foods and the proportion of EBV-seropositive subjects decreased across increasing quartiles. Individuals in the fourth quartile of the dietary pattern scores consumed, on average, fruits 1-2 times, vegetables 5 times, and protein-rich foods 5 times per day, and milk 4 times per week. Among the protein-rich foods, these people consumed fresh fish once and eggs once per day. In contrast, individuals in the first quartile consumed, on average, vegetables (4 times per day), protein-rich foods (3 times per day), fruits (4 times per week), and milk (around twice per month) much less often. Among the protein-rich foods, these people consumed fresh fish (3 times per week) and eggs (3 times per week) much less often.
Table 4 presents the ORs and corresponding CIs for NPC by the calculated dietary pattern scores or by the quartiles. Three different multivariate models were constructed. We observed a significant inverse association between the dietary pattern scores and the risk of NPC after controlling for age (M1). This finding was observed when the dietary pattern scores were treated either as a categorical variable (quartiles) or as a continuous variable. A one-unit increase in the dietary pattern score was associated with an approximately 30% lower risk of NPC in the age-adjusted model (OR adj. = 0.70, 95% CI = 0.61-0.82 in M1). Similar results were found in M2 (adjusting for age, ethnicity, educational level, NPC family history, total calories, years of cigarette smoking, and exposures to formaldehyde and wood dust) and M3 (adjusting for the covariates used in M2 and additionally for green tea and nitrosamine) [11,29].
The association between the dietary pattern factor scores and NPC was only slightly attenuated in M2 (OR adj. = 0.71, 95% CI = 0.60-0.84) and M3 (OR adj. = 0.73, 95% CI = 0.60-0.88). The subjects whose diet conformed most closely to the dietary pattern (fourth quartile, Q4) had a 62% lower NPC risk than the subjects whose diet was most different from this pattern (first quartile, Q1) after controlling for all potential confounders (OR adj. = 0.38, 95% CI = 0.21-0.69 for Q4 versus Q1; p for trend = 0.001 in M3).
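The percentages quoted above follow directly from the odds ratios. The short Python sketch below reproduces that arithmetic from the reported estimates and also recovers an approximate log odds ratio and standard error from a reported CI. The helper names are hypothetical, and the back-calculation assumes a symmetric Wald interval on the log scale, which is an assumption rather than something stated in the text.

```python
from math import log

def percent_lower_odds(odds_ratio):
    """Percentage reduction in odds implied by an odds ratio below 1."""
    return (1.0 - odds_ratio) * 100.0

def log_or_and_se(odds_ratio, ci_low, ci_high, z=1.96):
    """Approximate log odds ratio and standard error from a 95% CI."""
    beta = log(odds_ratio)
    se = (log(ci_high) - log(ci_low)) / (2.0 * z)
    return beta, se

def or_for_k_units(odds_ratio, k):
    """Odds ratio for a k-unit increase, assuming a log-linear effect."""
    return odds_ratio ** k

q4_vs_q1 = percent_lower_odds(0.38)               # Q4 versus Q1 in M3
per_unit_m1 = percent_lower_odds(0.70)            # per unit in M1
beta_m3, se_m3 = log_or_and_se(0.73, 0.60, 0.88)  # per unit in M3
two_units_m3 = or_for_k_units(0.73, 2)
```

Reading an odds ratio as a risk ratio, as the text does, relies on the disease being rare in the source population.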

Discussion
The aim of this case-control study was to identify a dietary pattern and to document the dose-response relationship between the identified primary dietary pattern scores and the risk of NPC. We identified a dietary pattern inversely associated with NPC risk, which is characterized by high consumption frequencies of fruits, milk, fresh fish, vegetables, tea, and eggs (in that order). The dietary pattern pinpoints the importance of phytonutrient-rich plant foods (fruits, vegetables), milk, protein-rich foods (in particular fresh fish and eggs), and drink (tea). Previous studies investigating the effects of individual foods (or food groups) have consistently found that fresh fruits and leafy vegetables are protective factors against NPC in Chinese [13,30-34], North Africans [35], and some low-risk populations [8,36], findings that were also confirmed by a meta-analysis [37]. Many nutrients and food constituents abundant in fruits and vegetables have been associated with a reduced risk of cancer in general, including fiber, vitamins, folate, and carotenoids [6]. An inverse association between NPC risk and consumption of fresh fish and green tea was observed in our previous report [11]. The inverse relation between fish and cancer has been described in reviews and meta-analyses [38,39]. The potential mechanism of the protective effect of fish consumption may be the well-known anti-inflammatory effect of n-3 fatty acids [40]. In addition, tea consumption has been associated with a decreased occurrence of NPC in another case-control study conducted in southern China [41]. There is considerable evidence that extracts of tea and tea polyphenols inhibit the formation and development of tumors at different organ sites in animal models [42].
However, apart from the above-mentioned protective associations (with vegetables, fruits, tea, and fish) and the previously reported increased risk associated with Cantonese-style salted fish and other preserved foods in high-incidence areas [6], the evidence on dietary components, dietary patterns, and NPC is still limited. Our study has derived an NPC-protective dietary pattern characterized by two additional dietary items (milk and eggs) that were not previously reported as protective. Previous evidence on the effects of milk and eggs on NPC is inconsistent. The association between eggs and cancer mortality in general has also been inconsistent in epidemiological studies [43]. The majority of the single-food-item studies reported null results [11,32,33]. Similar to our study, consumption of boiled or fried eggs in childhood was inversely associated with NPC risk in Algeria [44]. In contrast to our study, one study assessed the role of dietary patterns through a factor analysis and showed that the animal product pattern, loading highly on milk, cheese, desserts, and eggs, was positively associated with NPC risk [14]. Likewise, Polesel et al. [36] reported a twofold increase in NPC risk in people within the highest quartile of egg consumption. Some cohort studies have reported positive associations of milk consumption with coronary heart disease [45] and prostate cancer [46].
The main concern about adverse effects of milk consumption has been its saturated fat content. In addition, various exogenous hormones such as insulin-like growth factor 1 (IGF-1) are present in milk, which has been suggested as a possible mechanism by which milk could promote hormone-dependent cancers, such as breast cancer in women and prostate cancer in men [47]. In light of these results, it is important to consider a possible U-shaped association between intake of milk and subsequent risk of cancers. The right-hand limb of the U is likely the well-known higher risk of cancers due to the promotional effect of saturated fat and hormone-like factors from excessive milk consumption. The left-hand limb of the U may indicate a higher risk of cancers due to the lack of the protective and nutritious effects of milk consumption. Milk consumption among Taiwanese has been much lower than that in western countries for cultural reasons and because of lactose intolerance. Therefore, subjects whose diet conformed to the favorable pattern in the present study (with a frequency of milk consumption of around 1-4 times per week for those in the second to the fourth quartiles of factor scores) might be considered light-to-moderate milk consumers by western standards. Our finding suggests that light-to-moderate milk consumption may protect people from NPC in regions with low milk intake. With respect to egg consumption, Benn et al. [48] used a Mendelian randomization approach to show that very low plasma levels of low-density lipoprotein (LDL) cholesterol may be causally associated with an increased risk of cancer. Accordingly, a U-shaped association between the level of blood total or LDL cholesterol and subsequent mortality has been suggested [49]. That is, higher levels of blood cholesterol carry a higher risk of death from coronary heart disease, but lower levels of blood cholesterol are also a risk marker for future cancer [50].
The inverse association between egg consumption and NPC that we found is consistent with these findings concerning cholesterol metabolism, although it is not clear whether cholesterol or other constituents of eggs play the central role.
Given the strong evidence supporting a causal role of EBV in the development of NPC [5], EBV status may confound the observed association between dietary habits and NPC. We assessed the association between dietary pattern and NPC among EBV-seropositive individuals (313 cases and 98 controls). The association between NPC and dietary pattern scores remained statistically significant (data not shown). In other words, our study suggests that the dietary pattern we observed may play a protective role in the development of NPC even when people are infected with EBV.
Several research design limitations and methodological considerations should be noted. First of all, this was a retrospective study, and association does not equate directly to causality. Secondly, recall bias is a concern in the case-control design. Therefore, the results should be further confirmed. Thirdly, the FFQ indicates the direction of the association between diet and disease, yet it does not provide accurate estimates of food frequency and food amount because of the systematic errors associated with the FFQ methodology. Fourthly, because Chinese people customarily share many dishes with family members, the portion size is usually small and the frequency is often high. Direct comparison of food consumption frequency with western studies may therefore not be appropriate. Moreover, staple foods were not addressed in this study because, at the time, white rice was the main staple and accounted for over 92% of staple foods [22]. Lastly, in contrast to the PCA method, which focuses solely on dietary pattern discovery by maximizing the variability explained for food frequency, PLS attempts to discriminate between diseased and control subjects as well as to find the associated food patterns. The proportion of people with the PLS-discovered pattern is usually smaller than that found by PCA.
In conclusion, this is one of the first studies on dietary pattern and NPC in an Asian population, employing a novel approach, PLS-DA. We identified a protective dietary pattern with a nutrient-dense diet high in n-3 fatty acids and with a balanced amount of good foods from both plant and animal origins. Since it is simply a Taiwanese diet with more phytonutrient-rich plant foods (fruits, vegetables) and more fresh fish and egg dishes, accompanied by frequent tea drinking and morning milk, it can be easily adopted into the modern Taiwanese diet. Our findings suggest that this dietary pattern may be considered and trialed as a potential lifestyle regimen to prevent NPC.

Supporting Information
S1 Table. Food Items and Food Groups included in the Derivation of Dietary Patterns associated with Nasopharyngeal Carcinoma. a Fresh eggs, preserved eggs and salted eggs were grouped together because of their similar cholesterol contents. b Fruits and 100% fruit juices were grouped together because of their similar vitamin and mineral contents and because less than 1% of the people ever consumed 100% fruit juices at the time when the study was carried out.