Prediction model for pancreatic cancer risk in the general Japanese population

  • Masahiro Nakatochi,

    Roles Conceptualization, Formal analysis, Funding acquisition, Writing – original draft

    Affiliation Division of Data Science, Data Coordinating Center, Department of Advanced Medicine, Nagoya University Hospital, Nagoya, Japan

  • Yingsong Lin ,

    Roles Conceptualization, Funding acquisition, Writing – original draft

    linys@aichi-med-u.ac.jp

    Affiliation Department of Public Health, Aichi Medical University School of Medicine, Nagakute, Japan

  • Hidemi Ito,

    Roles Resources, Writing – review & editing

    Affiliation Division of Cancer Information and Control, Aichi Cancer Center Research Institute, Nagoya, Japan

  • Kazuo Hara,

    Roles Resources, Writing – review & editing

    Affiliation Department of Gastroenterology, Aichi Cancer Center Hospital, Nagoya, Japan

  • Fumie Kinoshita,

    Roles Formal analysis, Writing – review & editing

    Affiliation Division of Data Science, Data Coordinating Center, Department of Advanced Medicine, Nagoya University Hospital, Nagoya, Japan

  • Yumiko Kobayashi,

    Roles Formal analysis, Writing – review & editing

    Affiliation Division of Data Science, Data Coordinating Center, Department of Advanced Medicine, Nagoya University Hospital, Nagoya, Japan

  • Hiroshi Ishii,

    Roles Resources, Writing – review & editing

    Affiliation Clinical Research Center, National Hospital Organization Shikoku Cancer Center, Matsuyama, Japan

  • Masato Ozaka,

    Roles Resources, Writing – review & editing

    Affiliation Department of Hepato-biliary-pancreatic Medicine, The Cancer Institute Hospital of Japanese Foundation for Cancer Research, Tokyo, Japan

  • Takashi Sasaki,

    Roles Resources, Writing – review & editing

    Affiliation Department of Hepato-biliary-pancreatic Medicine, The Cancer Institute Hospital of Japanese Foundation for Cancer Research, Tokyo, Japan

  • Naoki Sasahira,

    Roles Resources, Writing – review & editing

    Affiliation Department of Hepato-biliary-pancreatic Medicine, The Cancer Institute Hospital of Japanese Foundation for Cancer Research, Tokyo, Japan

  • Manabu Morimoto,

    Roles Resources, Writing – review & editing

    Affiliation Hepatobiliary and Pancreatic Medical Oncology Division, Kanagawa Cancer Center Hospital, Kanagawa, Japan

  • Satoshi Kobayashi,

    Roles Resources, Writing – review & editing

    Affiliation Hepatobiliary and Pancreatic Medical Oncology Division, Kanagawa Cancer Center Hospital, Kanagawa, Japan

  • Makoto Ueno,

    Roles Resources, Writing – review & editing

    Affiliation Hepatobiliary and Pancreatic Medical Oncology Division, Kanagawa Cancer Center Hospital, Kanagawa, Japan

  • Shinichi Ohkawa,

    Roles Resources, Writing – review & editing

    Affiliation Hepatobiliary and Pancreatic Medical Oncology Division, Kanagawa Cancer Center Hospital, Kanagawa, Japan

  • Naoto Egawa,

    Roles Resources, Writing – review & editing

    Affiliation Tokyo Metropolitan Hiroo Hospital, Tokyo, Japan

  • Sawako Kuruma,

    Roles Resources, Writing – review & editing

    Affiliation Department of Internal Medicine, Tokyo Metropolitan Komagome Hospital, Tokyo, Japan

  • Mitsuru Mori,

    Roles Resources, Writing – review & editing

    Affiliation Hokkaido Chitose College of Rehabilitation, Hokkaido, Japan

  • Haruhisa Nakao,

    Roles Resources, Writing – review & editing

    Affiliation Division of Hepatology and Pancreatology, Aichi Medical University School of Medicine, Nagakute, Japan

  • Chaochen Wang,

    Roles Conceptualization, Writing – review & editing

    Affiliation Department of Public Health, Aichi Medical University School of Medicine, Nagakute, Japan

  • Takeshi Nishiyama,

    Roles Conceptualization, Writing – review & editing

    Affiliation Department of Public Health, Nagoya City University Graduate School of Medicine, Nagoya, Japan

  • Takahisa Kawaguchi,

    Roles Resources, Writing – review & editing

    Affiliation Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan

  • Meiko Takahashi,

    Roles Resources, Writing – review & editing

    Affiliation Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan

  • Fumihiko Matsuda,

    Roles Resources, Writing – review & editing

    Affiliation Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan

  • Shogo Kikuchi,

    Roles Conceptualization, Funding acquisition, Resources, Supervision, Writing – review & editing

    Affiliation Department of Public Health, Aichi Medical University School of Medicine, Nagakute, Japan

  • Keitaro Matsuo

    Roles Conceptualization, Funding acquisition, Resources, Supervision, Writing – review & editing

    Affiliation Division of Cancer Epidemiology and Prevention, Aichi Cancer Center Research Institute, Nagoya, Japan

Abstract

Genome-wide association studies (GWASs) have identified many single nucleotide polymorphisms (SNPs) that are significantly associated with pancreatic cancer susceptibility. We sought to replicate the associations of 61 GWAS-identified SNPs at 42 loci with pancreatic cancer in Japanese and to develop a risk model for the identification of individuals at high risk for pancreatic cancer development in the general Japanese population. The model was based on data including directly determined or imputed SNP genotypes for 664 pancreatic cancer case and 664 age- and sex-matched control subjects. Stepwise logistic regression uncovered five GWAS-identified SNPs at five loci that also showed significant associations in our case-control cohort. These five SNPs were included in the risk model and also applied to calculation of the polygenic risk score (PRS). The area under the curve determined with the leave-one-out cross-validation method was 0.63 (95% confidence interval, 0.60–0.66) or 0.61 (0.58–0.64) for versions of the model that did or did not include cigarette smoking and family history of pancreatic cancer in addition to the five SNPs, respectively. Individuals in the lowest and highest quintiles for the PRS had odds ratios of 0.62 (0.42–0.91) and 1.98 (1.42–2.76), respectively, for pancreatic cancer development compared with those in the middle quintile. We have thus developed a risk model for pancreatic cancer that showed moderately good discriminatory ability with regard to differentiation of pancreatic cancer patients from control individuals. Our findings suggest the potential utility of a risk model that incorporates replicated GWAS-identified SNPs and established demographic or environmental factors for the identification of individuals at increased risk for pancreatic cancer development.

Introduction

Pancreatic cancer is a malignancy characterized by an elusive etiology, poor prognosis, and a lack of effective early detection tools. In Japan, pancreatic cancer represents the fourth leading cause of cancer deaths, with age-adjusted incidence and mortality rates having increased continuously over the past several decades[1]. The reason for this increasing pancreatic cancer burden is unclear, but it does not appear to be explained by risk factors, such as age and cigarette smoking, that have been established on the basis of epidemiologic studies[2].

The heritability of pancreatic cancer in Scandinavia has been estimated to be 36% by twin studies[3], suggestive of an important contribution of genetic variation to pancreatic cancer susceptibility. Focusing on common genetic variation represented by single nucleotide polymorphisms (SNPs), the National Cancer Institute Cohort Consortia (PanSCan) have conducted four genome-wide association studies (GWASs) with populations of European ancestry and identified 18 risk loci that were robustly associated (P < 5 × 10⁻⁸) with pancreatic cancer risk[4–9]. GWASs in two populations (Japanese and Chinese) of East Asian ancestry identified additional risk loci that subsequently failed replication in other independent cohorts[10–12]. Whereas GWASs have improved our understanding of the role of common SNPs in pancreatic cancer, the identified SNPs confer relatively small increments in risk (1.1- to 1.5-fold) with regard to the development of this disease. Furthermore, it remains a challenge to identify low-frequency or rare variants and to further clarify their contribution to pancreatic cancer pathophysiology.

Detection of pancreatic cancer at an early stage in the general population is difficult. The relatively low prevalence of the condition, coupled with the lack of validated biomarkers or imaging modalities, has limited the feasibility of a population-based screening program. Such challenges can, in part, be addressed by the development of a risk prediction model that aims to identify a small subset of high-risk individuals by incorporating genetic variants as well as demographic and lifestyle risk factors[13]. As with risk prediction models for other cancer types, such as lung, breast, and colorectal cancer[14–18], the clinical utility of such risk models for pancreatic cancer is currently limited.

As far as we are aware, no risk model has been developed for the prediction of pancreatic cancer risk in the general population in Japan. As a first step toward the development of such a model, we examined SNPs identified by published GWASs for their associations with pancreatic cancer risk in an age- and sex-matched case-control data set of Japanese individuals. We then developed the model further by incorporating the GWAS-identified SNPs that showed significant associations in our case-control cohort as well as demographic and established risk factors. A validated risk model may help to identify individuals at increased risk for the development of pancreatic cancer, thereby raising awareness that can result in the adoption of risk-minimizing behavior or in early detection by follow-up examinations.

Materials and methods

Study subjects

To develop the risk model, we performed a genetic association study with 664 cases and 664 age- and sex-matched controls selected from two case-control data sets that included a total of 945 pancreatic cancer patients and 2109 control subjects. The first data set included 622 patients who were recruited for a multi-institutional case-control study of pancreatic cancer, the details of which have been described previously[19]. In this previous study, clinically or histologically (or both) diagnosed pancreatic cancer patients were recruited from January 2010 to July 2014 at five participating hospitals in central Japan, Kanto, and Hokkaido regions. Most of the patients were recruited by gastroenterologists and had tumors at stage 3 or stage 4 at the time of diagnosis. Approximately 33% of the patients underwent surgical resection. Questionnaire data on demographic and lifestyle factors and 7-ml blood samples were collected from the study participants. The second data set included 323 newly diagnosed pancreatic cancer patients as well as 2109 control subjects who were recruited to an epidemiologic research program at Aichi Cancer Center (HERPACC). All new outpatients at Aichi Cancer Center on their first visit were invited to participate in HERPACC. Those who agreed to participate filled out a self-administered questionnaire and provided a 7-ml blood sample. The data collected were entered into the HERPACC database and linked to the hospital cancer registry system periodically to confirm cancer diagnoses. The feasibility of using first-visit outpatients as control subjects within the framework of HERPACC has been addressed previously by comparing their epidemiologic features with those of randomly selected individuals from the general population[20]. In these two case-control data sets, the vast majority of pancreatic cancer cases had a histology of ductal adenocarcinoma, with a small proportion of endocrine tumors (1.7%) also being included. 
None of the control subjects had a diagnosis of cancer at the time of recruitment. For all case and control subjects in the present study, data on demographic and lifestyle factors, such as cigarette smoking and family history of pancreatic cancer in first-degree relatives, were extracted from questionnaire answers. Written informed consent was obtained from all study participants, and the study protocol was approved by the ethical board of Aichi Medical University, the institutional ethics committee of Aichi Cancer Center, the Human Genome and Gene Analysis Research Ethics Committee of Nagoya University, and the ethics committees of all participating hospitals.

For the present case-control genetic association study, cases diagnosed with endocrine tumors were excluded. Case and control subjects were matched according to sex and age (categorized in 5-year intervals). During the matching process, only individuals with available data for all variables, including age, sex, cigarette smoking status, and family history of pancreatic cancer, were selected. Totals of 664 cases and 664 control subjects were eligible for statistical analysis.
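The matching step described above can be sketched as follows. This is a minimal illustration that assumes greedy 1:1 matching within strata of sex and 5-year age band; the text does not state which matching algorithm was used, and the record fields ('id', 'sex', 'age', 'smoking', 'family_history') and function names are hypothetical.

```python
from collections import defaultdict

REQUIRED = ("sex", "age", "smoking", "family_history")  # hypothetical field names

def age_band(age, width=5):
    """Return the lower bound of the 5-year age band containing `age`."""
    return (age // width) * width

def is_complete(subject):
    """True if the subject has a value for every required variable."""
    return all(subject.get(key) is not None for key in REQUIRED)

def match_one_to_one(cases, controls):
    """Greedily pair each complete case with one unused complete control
    from the same sex x age-band stratum. Returns (case_id, control_id) pairs."""
    pool = defaultdict(list)
    for control in controls:
        if is_complete(control):
            pool[(control["sex"], age_band(control["age"]))].append(control)

    pairs = []
    for case in cases:
        if not is_complete(case):
            continue  # subjects with missing data are not eligible
        stratum = (case["sex"], age_band(case["age"]))
        if pool[stratum]:
            pairs.append((case["id"], pool[stratum].pop(0)["id"]))
    return pairs

# Toy records (invented for illustration only)
cases = [
    {"id": "P1", "sex": "F", "age": 62, "smoking": 0, "family_history": 0},
    {"id": "P2", "sex": "M", "age": 58, "smoking": 1, "family_history": 1},
    {"id": "P3", "sex": "M", "age": 57, "smoking": None, "family_history": 0},
]
controls = [
    {"id": "C1", "sex": "M", "age": 56, "smoking": 1, "family_history": 0},
    {"id": "C2", "sex": "F", "age": 64, "smoking": 0, "family_history": 0},
    {"id": "C3", "sex": "F", "age": 71, "smoking": 0, "family_history": 0},
]
pairs = match_one_to_one(cases, controls)
print(pairs)
```

In this toy input, P3 is dropped because its smoking status is missing, which mirrors the complete-data requirement stated above.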

Genotyping and quality control

A total of 945 pancreatic cancer case subjects and 2109 control subjects were genotyped at the Center for Genomic Medicine, Kyoto University, with the use of a HumanCoreExome-12 v1.1 BeadChip array (Illumina, San Diego, CA, USA). Five samples with a genotype call rate of <0.98 were excluded. No samples showed a discrepancy between genetic and reported sex. The identity-by-descent method implemented in PLINK 1.9 software[21] detected 17 duplicate or closely related pairs of samples (pi-hat > 0.1875), with one sample of each pair being excluded. Principal component analysis (PCA)[22] with the 1000 Genomes Project reference panel (phase 3)[23] detected seven subjects with estimated ancestries outside of the Japanese population. These seven samples were also excluded. Furthermore, PCA based on only our samples was performed to identify population outliers. On the basis of the first 10 principal components, nine population outliers were identified and were excluded from further analysis. Among the 542,585 SNPs that were genotyped with the array, we excluded nonautosomal SNPs as well as SNPs with a genotype call rate of <0.98 or a Hardy-Weinberg equilibrium exact test P value of <1 × 10⁻⁶ in the control subjects, a minor allele frequency of <0.01, or a departure from the allele frequency computed from the 1000 Genomes Project phase 3 EAS samples. Such quality control filtering resulted in the selection of 942 case subjects and 2074 control subjects as well as 248,185 SNPs.
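Two of the SNP-level thresholds above (call rate and minor allele frequency) can be illustrated with a short sketch. It is not PLINK's implementation: the function name and the genotype encoding (0/1/2 allele counts, None for a missing call) are assumptions, and the Hardy-Weinberg exact test and the comparison with 1000 Genomes allele frequencies are deliberately left out.

```python
def snp_passes_qc(genotypes, min_call_rate=0.98, min_maf=0.01):
    """Apply call-rate and minor-allele-frequency filters to one SNP.

    genotypes: list of allele counts (0, 1, or 2) per sample, None if missing.
    """
    called = [g for g in genotypes if g is not None]
    if not genotypes or not called:
        return False
    call_rate = len(called) / len(genotypes)
    allele_freq = sum(called) / (2 * len(called))   # frequency of the counted allele
    maf = min(allele_freq, 1 - allele_freq)         # minor allele frequency
    return call_rate >= min_call_rate and maf >= min_maf

# Toy SNPs with 100 samples each (invented for illustration only)
good = [0] * 60 + [1] * 35 + [2] * 4 + [None]        # call rate 0.99, MAF about 0.22
low_call_rate = [0] * 60 + [1] * 37 + [None] * 3     # call rate 0.97
rare = [0] * 99 + [1]                                # MAF 0.005

for name, snp in [("good", good), ("low_call_rate", low_call_rate), ("rare", rare)]:
    print(name, snp_passes_qc(snp))
```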

Genotype imputation and postimputation processing

Genotype imputation was performed with SHAPEIT2[24] and Minimac3[25] software based on the 1000 Genomes Project cosmopolitan reference panel (phase 3)[23]. We searched GWAS Catalog[26] for published GWASs of pancreatic cancer and selected 77 candidate SNPs at 54 loci that had been characterized and found to be associated with pancreatic cancer (S1 Table). Of these 77 SNPs, 16 polymorphisms at 14 loci with an imputation quality score (r²) of <0.8 (rs1747924, rs351365, rs4927850, rs35226131, rs6879627, rs73328514, rs6971499, rs10094872, rs1886449, rs7190458, rs7200646, rs4795218, rs77038344, rs11655237, rs7214041, rs6073450) were excluded. After imputation, we were finally left with 664 case subjects and 664 age- and sex-matched control subjects as well as 61 SNPs at 42 loci for statistical analysis.

Statistical analysis

To build a high-precision risk model, we applied a three-step approach. We initially performed a screening analysis with the 61 SNPs at 42 loci in which the cutoff P value was defined as <0.05 for logistic regression analysis. After this screening analysis and exclusion of SNPs in strong linkage disequilibrium with other polymorphisms, eight SNPs at seven loci that were significantly associated with pancreatic cancer remained for identification of SNPs that independently influence pancreatic cancer by logistic regression analysis with a stepwise forward selection procedure. Finally, we constructed two versions of a prediction model for pancreatic cancer: Model A included established risk factors and the five SNPs at five loci identified in the stepwise selection, and model B included only the SNPs.

Demographic and lifestyle risk factors were compared between case and control groups with Fisher’s exact test and Student’s t test. In the screening step, the association between each SNP and pancreatic cancer was assessed by logistic regression analysis. Genotypes were coded as imputed dosages, that is, the expected number of risk alleles for pancreatic cancer, which is a continuous variable ranging from 0 to 2. We applied two types of analysis condition to assess the association of each SNP with pancreatic cancer: condition 1, in which no covariates were included, and condition 2, in which covariates comprised smoking status (nonsmoker = 0, ever-smoker = 1) and family history of pancreatic cancer (no = 0, yes = 1). In the subsequent step, multiple logistic regression analysis with a stepwise forward selection procedure was performed to identify SNPs that independently contribute to pancreatic cancer; the dependent variable was pancreatic cancer status (control = 0, case = 1) and independent variables included the imputed genotypes of each SNP. The significance threshold for both entry into and removal from the model was P < 0.05. A version of the model including classical risk factors and SNPs identified by the stepwise selection was designated model A, whereas a version including only the five identified SNPs was designated model B. Receiver operating characteristic (ROC) analysis with the leave-one-out cross-validation (LOOCV) method was applied to evaluate model performance with the R package pROC[27]. Confidence intervals for area under the curve (AUC) values were estimated by bootstrap resampling with 10,000 replicates.
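The LOOCV evaluation described above can be sketched in a few functions. This is an illustration under stated assumptions, not the authors' pipeline (which used SAS and the R package pROC): the logistic model is fitted by plain gradient ascent, the bootstrap uses a simple percentile interval, all function names are hypothetical, and the toy data are invented.

```python
import math
import random

def predict(w, x):
    """Predicted case probability for feature vector x; w[0] is the intercept."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, n_iter=300):
    """Fit logistic regression by gradient ascent on the mean log-likelihood."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            resid = yi - predict(w, xi)
            grad[0] += resid
            for j in range(p):
                grad[j + 1] += resid * xi[j]
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w

def auc(scores, labels):
    """AUC as the fraction of case-control pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def loocv_scores(X, y):
    """Score each subject with a model fitted on all other subjects."""
    scores = []
    for i in range(len(X)):
        w = fit_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        scores.append(predict(w, X[i]))
    return scores

def bootstrap_ci(scores, labels, n_boot=500, seed=0):
    """Percentile 95% CI for the AUC from resampled (score, label) pairs."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        lab = [labels[i] for i in idx]
        if 0 < sum(lab) < n:  # need both classes in the resample
            stats.append(auc([scores[i] for i in idx], lab))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]

# Toy data: one SNP dosage per subject, 20 controls then 20 cases (invented)
control_x = [0] * 10 + [1] * 8 + [2] * 2
case_x = [0] * 3 + [1] * 8 + [2] * 9
X = [[float(v)] for v in control_x + case_x]
y = [0] * 20 + [1] * 20

cv_scores = loocv_scores(X, y)
cv_auc = auc(cv_scores, y)
ci_low, ci_high = bootstrap_ci(cv_scores, y)
print(round(cv_auc, 3), round(ci_low, 3), round(ci_high, 3))
```

Because each subject is scored by a model that never saw that subject, the resulting AUC is less optimistic than one computed on the training data, which is the reason the text gives for using LOOCV.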

We defined the polygenic risk score (PRS) for pancreatic cancer as the sum, over SNPs, of the number of risk alleles multiplied by the corresponding natural logarithm of the odds ratio, ln(OR), in model B:

$$\mathrm{PRS} = \sum_{i=1}^{m} \ln(\mathrm{OR}_i)\, x_i$$

where m is the number of SNPs (m = 5 in this study), OR_i is the odds ratio for SNP i in model B, and x_i is the genotype coded as the number of risk alleles for SNP i. We calculated the PRS for each subject using this equation and then divided the study subjects into quintile groups (Q1 to Q5) with equal numbers of control subjects on the basis of the PRS. We compared the middle quintile group (Q3) with other groups (Q1, Q2, Q4, Q5) with the use of logistic regression analysis with adjustment for cigarette smoking and family history of pancreatic cancer.
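The PRS definition and the control-based quintile grouping can be sketched as follows. The per-SNP odds ratios below are placeholders chosen within the range reported in this article (the actual model B values are in S3 Table and are not reproduced here), the function names are hypothetical, and the cutoffs used in the last lines are those given in the Fig 2 legend.

```python
import math

def polygenic_risk_score(dosages, odds_ratios):
    """PRS = sum over SNPs of ln(OR_i) * x_i, with x_i the risk-allele count."""
    return sum(math.log(or_i) * x_i for or_i, x_i in zip(odds_ratios, dosages))

def quintile_cutoffs(control_scores):
    """Four cut points that split the control scores into five equal-sized groups."""
    s = sorted(control_scores)
    n = len(s)
    return [s[(k * n) // 5 - 1] for k in range(1, 5)]

def quintile(score, cutoffs):
    """Return 1..5: Q1 if score <= first cutoff, Q5 if above the last cutoff."""
    return 1 + sum(score > c for c in cutoffs)

# Placeholder odds ratios for five SNPs (hypothetical, for illustration only)
PLACEHOLDER_ORS = [1.20, 1.25, 1.30, 1.35, 1.43]
prs = polygenic_risk_score([1, 0, 2, 1, 1], PLACEHOLDER_ORS)
print(round(prs, 3))

# Cutoffs derived from a toy set of control scores
toy_cutoffs = quintile_cutoffs([i / 10 for i in range(1, 11)])
print(toy_cutoffs)

# Cutoffs reported in the Fig 2 legend of this article
REPORTED_CUTOFFS = [0.58, 0.91, 1.12, 1.31]
print(quintile(1.17, REPORTED_CUTOFFS), quintile(1.01, REPORTED_CUTOFFS))
```

Deriving the cutoffs from control subjects only, as the text describes, keeps each quintile at roughly 20% of controls, so the odds ratios for Q1, Q2, Q4, and Q5 are each compared against an equally sized control reference group in Q3.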

Heritability analysis was performed with the use of GCTA software[28]. The analysis estimates the percentage of phenotypic variance explained by common SNPs. We assumed a prevalence of 0.000095 for pancreatic cancer in the Japanese population on the basis of data in the GLOBOCAN 2012 database[29]. To estimate the heritability, we used the data set comprising the 664 cases and 664 controls adopted for the association analysis as well as the 248,185 directly genotyped SNPs used for imputation.
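An assumed prevalence is needed because case-control samples are heavily enriched for cases, so a variance estimate on the observed (0/1) scale has to be converted to the liability scale. The sketch below shows the standard conversion for ascertained case-control data; it is an assumption that this is the transformation applied in the analysis, the function name is hypothetical, and the observed-scale input is an invented value, not a result from this study.

```python
from statistics import NormalDist

def liability_scale_h2(h2_observed, prevalence, case_fraction):
    """Convert observed-scale SNP heritability to the liability scale.

    h2_l = h2_o * [K(1-K)]^2 / (z^2 * P(1-P)), where K is the population
    prevalence, P the proportion of cases in the sample, and z the standard
    normal density at the liability threshold for prevalence K.
    """
    normal = NormalDist()
    threshold = normal.inv_cdf(1 - prevalence)
    z = normal.pdf(threshold)
    k, p = prevalence, case_fraction
    return h2_observed * (k * (1 - k)) ** 2 / (z ** 2 * p * (1 - p))

# Prevalence as assumed in the text; 664 cases among 1328 subjects gives P = 0.5.
# The observed-scale value 0.5 is hypothetical.
example = liability_scale_h2(0.5, prevalence=0.000095, case_fraction=664 / 1328)
print(example)
```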

A P value of <0.05 was considered statistically significant. All statistical analyses were performed with SAS software version 9.4 (SAS Institute, Cary, NC, USA) and R version 3.3 (www.r-project.org).

Results

We performed genotyping for 945 pancreatic cancer case subjects and 2109 control subjects with the use of a HumanCoreExome-12 v1.1 BeadChip array and imputed genotypes based on the 1000 Genomes Project cosmopolitan reference panel (phase 3). We selected 77 candidate SNPs at 54 loci that had been characterized and found to be associated with pancreatic cancer in previous GWASs (S1 Table). Of these 77 SNPs, 16 SNPs at 14 loci were excluded from further analysis because of poor imputation quality. After postimputation processing, 664 case subjects and 664 age- and sex-matched control subjects as well as 61 SNPs at 42 loci remained for the subsequent analysis.

The demographic and lifestyle risk factors for the case and control data set selected for development of the risk model are shown in Table 1. The mean age was 60.8 years for the case subjects and 60.5 years for the controls (P = 0.453). Case subjects had a higher proportion of individuals with a family history of pancreatic cancer (6.0% versus 2.3%, P < 0.001) and ever-smokers (65.7% versus 53.0%, P < 0.001) compared with controls.

Of the remaining 61 SNPs, 13 SNPs at seven loci showed a significant association with pancreatic cancer in our case-control data set, with a P value of <0.05 in condition 1 (no covariates) or in condition 2 (with smoking status and family history of pancreatic cancer as covariates) (Table 2, S2 Table). The OR for these 13 SNPs ranged from 1.19 to 1.43 for individuals with risk variants. One polymorphism of each SNP pair that exhibited strong linkage disequilibrium (r² > 0.8) was excluded, leaving eight SNPs at seven loci for stepwise logistic regression analysis. Five SNPs, cigarette smoking, and family history of pancreatic cancer remained significantly associated with pancreatic cancer risk in the stepwise logistic regression analysis and were therefore included in the development of two versions of the risk prediction model: Model A included cigarette smoking, family history of pancreatic cancer, and the five SNPs at five loci (Table 3), whereas model B included only the five SNPs (S3 Table). Akaike information criterion values for models A and B were 1764 and 1796, respectively. In model A, ever-smokers had an approximately 1.6-fold higher risk than nonsmokers (OR = 1.58; 95% confidence interval [CI], 1.26–1.98). The OR for individuals with the effect alleles ranged from 1.20 to 1.43 after adjustment for cigarette smoking and family history of pancreatic cancer. The AUC values for the ROC curves derived from models A and B with the use of the LOOCV method were 0.63 (95% CI, 0.60–0.66) and 0.61 (0.58–0.64), respectively (Fig 1).
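The relation between a logistic regression coefficient and a reported OR with its 95% CI can be illustrated with the smoking estimate from model A. The sketch assumes a Wald-type interval, exp(β ± 1.96 × SE); the text does not state how the intervals were computed, and the function names are hypothetical.

```python
import math

def coef_from_or_ci(odds_ratio, lower, upper, z=1.96):
    """Back out the log-odds coefficient and its standard error from an OR and CI."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se

def or_with_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence limits from a coefficient and standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Ever-smokers versus nonsmokers in model A: OR = 1.58 (95% CI, 1.26-1.98)
beta, se = coef_from_or_ci(1.58, 1.26, 1.98)
odds_ratio, ci_low, ci_high = or_with_ci(beta, se)
print(round(beta, 3), round(se, 3))
print(round(odds_ratio, 2), round(ci_low, 2), round(ci_high, 2))
```

The round trip reproduces the published limits to two decimals, which is consistent with an interval that is symmetric on the log-odds scale.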

Table 2. Association of pancreatic cancer with 13 SNPs at seven loci in the case-control data set with a P value of <0.05.

https://doi.org/10.1371/journal.pone.0203386.t002

Table 3. Demographic and lifestyle risk factors as well as the five SNPs included in risk model A.

https://doi.org/10.1371/journal.pone.0203386.t003

Fig 1. ROC curves for models A and B incorporating different variables according to the LOOCV method.

Model A (blue line) incorporates classical risk factors and five GWAS-identified SNPs, whereas model B (red dashed line) includes only the five GWAS-identified SNPs. AUC values (95% CI) for models A and B are 0.63 (0.60–0.66) and 0.61 (0.58–0.64), respectively. The gray diagonal line corresponds to an AUC of 0.5 and no discrimination.

https://doi.org/10.1371/journal.pone.0203386.g001

We calculated the polygenic risk score (PRS) for each study subject using model B and then divided the subjects into quintile groups (Q1 to Q5) with equal numbers of control individuals on the basis of the PRS (Fig 2a). The mean ± s.d. values of the PRS for case and control subjects were 1.17 ± 0.42 and 1.01 ± 0.42, respectively. The PRS was significantly associated with risk of pancreatic cancer (Fig 2b). Compared with subjects in the middle quintile of PRS values (Q3), the OR values were 0.62 (95% CI, 0.42–0.91), 0.83 (0.58–1.20), 1.23 (0.87–1.73), and 1.98 (1.42–2.76) for subjects in Q1, Q2, Q4, and Q5, respectively, after adjustment for cigarette smoking and family history of pancreatic cancer.

Fig 2. Percentage of subjects as well as the OR for pancreatic cancer according to PRS.

(a) Distribution of the PRS in pancreatic cancer case and control subjects. (b) The OR for pancreatic cancer according to the quintiles of PRS. Vertical bars represent 95% CIs. The horizontal dashed line indicates the null value (OR = 1.0). *P < 0.05, †P < 0.01 versus Q3. The cutoffs for the quintiles of PRS in the control subjects were Q1 ≤ 0.58, 0.58 < Q2 ≤ 0.91, 0.91 < Q3 ≤ 1.12, 1.12 < Q4 ≤ 1.31, and Q5 > 1.31. OR values were calculated by logistic regression analysis with adjustment for cigarette smoking and family history of pancreatic cancer.

https://doi.org/10.1371/journal.pone.0203386.g002

Finally, we estimated the heritability of pancreatic cancer due to common GWAS SNPs using only data for directly genotyped SNPs (248,185 SNPs for 664 cases and 664 age- and sex-matched controls). For a disease prevalence of 0.000095, we estimated that 16.1% (95% CI, 7.8–24.3%) of the total phenotypic variation in our data set was explained by common SNPs across the genome.

Discussion

Risk prediction models incorporating SNPs and environmental risk factors offer a means to identify a subset of individuals with increased cancer risk in the general population[30]. With the use of a case-control data set based on the Japanese population, we have now developed a risk model for pancreatic cancer and showed that it performed moderately well for discrimination of pancreatic cancer patients from individuals without the disease. Our findings suggest that a risk model that incorporates replicated GWAS-identified SNPs and established environmental factors is potentially useful for the identification of a subset of Japanese individuals with an increased risk for the development of pancreatic cancer.

Three risk models have been developed to date for the prediction of pancreatic cancer risk in general populations of different ethnicities[13,31,32]. To identify individuals at elevated risk for pancreatic cancer in a population of European ancestry, Klein et al. estimated the absolute risk of pancreatic cancer development with a risk model (based on three GWAS-identified SNPs, sex, age, ABO genotype, family history of pancreatic cancer, body mass index, cigarette smoking, and heavy alcohol intake) and incidence data from the SEER registries[13]. The AUC for the model was 0.61 (95% CI, 0.58–0.63), which demonstrated its superiority over a model that included only genetic or only nongenetic factors. However, only a few individuals were estimated to have a 10-year absolute risk of >2% even if all genetic and nongenetic factors were present, indicating that the clinical utility of the model is low. By combining SEER data with a logistic model including risk factors (cigarette smoking, current use of proton pump inhibitors, recent diagnosis of diabetes mellitus and pancreatitis, Jewish ancestry, and ABO blood group other than O) that were identified from a population-based case-control study, Risch et al.[31] showed that 0.87% of controls with a combination of risk factors had an estimated 5-year absolute risk of >5%. It should be noted that these two models were developed for populations of European ancestry and that their performance in populations of Asian ancestry, including Japanese, awaits validation. With regard to East Asian populations, Yu et al. developed a risk prediction model to estimate individual risk of pancreatic cancer in the Korean population[32]. The model included biomarkers such as fasting blood glucose and urinary glucose levels as well as demographic and lifestyle risk factors, and it showed good discrimination ability with a validation set, with C-statistics of 0.81 (95% CI, 0.80–0.83) for men and 0.80 (0.79–0.82) for women. 
No genetic factors were included in this prediction model, however. The discrimination ability of our risk model is similar to that of the model of Klein et al.[13], and it would be similar to that of the model of Yu et al.[32] for the Korean population if we included matching factors such as age and sex. However, one key issue with all these risk models is the difficulty of their translation to clinical or public health practice. Further studies are thus needed to clarify the application of risk prediction models in different contexts, such as for population-wide use as a risk assessment tool or for screening for individuals with a high absolute risk of pancreatic cancer.

The PRS is independent of established risk factors and can provide risk stratification beyond family history[33]. Given that a polygenic component to pancreatic cancer risk was suggested by previous studies[34], we calculated the PRS using the five replicated GWAS-identified SNPs. Our results showed that this approach provided a good stratification of pancreatic cancer risk. Exploration of the polygenic contribution to pancreatic cancer risk beyond the known risk variants, however, will require studies with larger sample sizes and more sophisticated analytic approaches[35]. Although our sample size limited further evaluation of the PRS, risk prediction models for pancreatic cancer that incorporate the PRS are worth pursuing, given that fewer common genetic variants have been identified for this cancer than for other cancer types such as breast and colorectal cancer.

Risk models that incorporate GWAS-identified SNPs should be interpreted in the context of heritability that can be explained by common SNPs. In the present study, we estimated that 16.1% (95% CI, 7.8–24.3%) of the total phenotypic variation in our data set was explained by common SNPs across the genome. A prediction model based on all common SNPs across the genome would thus have a performance that corresponds to the heritability. Heritability of pancreatic cancer in individuals of European descent has been estimated on the basis of common SNPs across the genome. Childs et al. estimated that 16.4% (95% CI, 10.4–22.4%) and 13.1% (95% CI, 9.9–16.3%) of the total phenotypic variation in PanC4 and in the combined data set, respectively, was explained by common SNPs across the genome[9]. Our results are thus consistent with these previous findings. The heritability of pancreatic cancer might therefore be similar in populations of Japanese or European descent.

One strength of our study is that the risk model we constructed was based on case-control data for Japanese subjects. Our risk model represents the first attempt to use existing GWAS-identified SNPs to identify a subset of the general Japanese population at increased risk of developing pancreatic cancer. We were able to replicate 13 SNPs at seven loci out of 61 GWAS-identified SNPs at 42 loci in our case-control data set. Although the effect sizes for the associations of SNPs with pancreatic cancer in our study are small, they are consistent with the results of previous GWASs for this cancer type. Furthermore, to address issues relating to overfitting, we used the LOOCV method to assess the performance of our prediction model. The AUC estimated with the LOOCV method was similar to those of previous models developed for pancreatic cancer, supporting the validity of our risk model based on the selected SNPs.

Our study also has several limitations. First, our GWAS was limited by the relatively small sample size and we did not validate our risk model in independent cohort samples. A clinically useful risk model needs to perform well with independent data sets and be generalizable to external populations. We will continue our efforts to find independent data sets with which to validate our risk model. Second, the clinical utility of our model as well as its potential contribution to a reduction in pancreatic cancer mortality in the general population are limited, although the discriminatory ability of the version of the model that included both demographic and lifestyle factors and replicated GWAS-identified SNPs was better than that of the version based only on SNPs. As shown previously[13], risk models that focus on rare cancer types such as pancreatic cancer may not offer clinically meaningful risk stratification because the percentage of individuals with a high absolute risk who warrant follow-up examination is small. Third, although we assessed all reported genome-wide significant SNPs documented for pancreatic cancer in GWASs, it is likely that other risk variants with borderline significance were not captured. Fourth, in addition to demographic factors and family history of pancreatic cancer, our study included only cigarette smoking as the most consistent risk factor for pancreatic cancer in Japanese. Further exploration and establishment of risk factors for pancreatic cancer in Japanese subjects may contribute to refinement of risk models.

In summary, we have developed a risk model for pancreatic cancer in Japanese individuals that showed a moderately good ability to discriminate pancreatic cancer patients from control individuals. Further research is warranted to address the clinical utility of the model and its applicability to population-based screening. In particular, the goal of early detection can be pursued further by incorporating established environmental risk factors, circulating biomarkers, the PRS, and other “omics” data.

Supporting information

S1 Table. Information on the 77 SNPs at 54 loci extracted from published GWASs for pancreatic cancer.

https://doi.org/10.1371/journal.pone.0203386.s001

(DOCX)

S2 Table. Association of pancreatic cancer with 61 SNPs at 42 loci extracted from published GWASs in our case-control cohort.

https://doi.org/10.1371/journal.pone.0203386.s002

(XLSX)

S3 Table. The five SNPs at five loci included in the risk model.

https://doi.org/10.1371/journal.pone.0203386.s003

(XLSX)

S1 Data. The data set used to construct the risk prediction model.

https://doi.org/10.1371/journal.pone.0203386.s004

(XLSX)

Acknowledgments

We thank Mayuko Masuda, Kikuko Kaji, Kazue Ando, Etsuko Ohara, and Sumiyo Asakura for assistance with data collection.

References

  1. Lucas AL, Malvezzi M, Carioli G, Negri E, La Vecchia C, Boffetta P, et al. (2016) Global Trends in Pancreatic Cancer Mortality From 1980 Through 2013 and Predictions for 2017. Clin Gastroenterol Hepatol 14: 1452–1462 e1454. pmid:27266982
  2. Matsuo K, Ito H, Wakai K, Nagata C, Mizoue T, Tanaka K, et al. (2011) Cigarette smoking and pancreas cancer risk: an evaluation based on a systematic review of epidemiologic evidence in the Japanese population. Jpn J Clin Oncol 41: 1292–1302. pmid:21971423
  3. Lichtenstein P, Holm NV, Verkasalo PK, Iliadou A, Kaprio J, Koskenvuo M, et al. (2000) Environmental and heritable factors in the causation of cancer—analyses of cohorts of twins from Sweden, Denmark, and Finland. N Engl J Med 343: 78–85. pmid:10891514
  4. Zhang M, Wang Z, Obazee O, Jia J, Childs EJ, Hoskins J, et al. (2016) Three new pancreatic cancer susceptibility signals identified on chromosomes 1q32.1, 5p15.33 and 8q24.21. Oncotarget 7: 66328–66343. pmid:27579533
  5. Klein AP, Wolpin BM, Risch HA, Stolzenberg-Solomon RZ, Mocci E, Zhang M, et al. (2018) Genome-wide meta-analysis identifies five new susceptibility loci for pancreatic cancer. Nat Commun 9: 556. pmid:29422604
  6. Amundadottir L, Kraft P, Stolzenberg-Solomon RZ, Fuchs CS, Petersen GM, Arslan AA, et al. (2009) Genome-wide association study identifies variants in the ABO locus associated with susceptibility to pancreatic cancer. Nat Genet 41: 986–990. pmid:19648918
  7. Petersen GM, Amundadottir L, Fuchs CS, Kraft P, Stolzenberg-Solomon RZ, Jacobs KB, et al. (2010) A genome-wide association study identifies pancreatic cancer susceptibility loci on chromosomes 13q22.1, 1q32.1 and 5p15.33. Nat Genet 42: 224–228. pmid:20101243
  8. Wolpin BM, Rizzato C, Kraft P, Kooperberg C, Petersen GM, Wang Z, et al. (2014) Genome-wide association study identifies multiple susceptibility loci for pancreatic cancer. Nat Genet 46: 994–1000. pmid:25086665
  9. Childs EJ, Mocci E, Campa D, Bracci PM, Gallinger S, Goggins M, et al. (2015) Common variation at 2p13.3, 3q29, 7p13 and 17q25.1 associated with susceptibility to pancreatic cancer. Nat Genet 47: 911–916. pmid:26098869
  10. Campa D, Rizzato C, Bauer AS, Werner J, Capurso G, Costello E, et al. (2013) Lack of replication of seven pancreatic cancer susceptibility loci identified in two Asian populations. Cancer Epidemiol Biomarkers Prev 22: 320–323. pmid:23250936
  11. Wu C, Miao X, Huang L, Che X, Jiang G, Yu D, et al. (2012) Genome-wide association study identifies five loci associated with susceptibility to pancreatic cancer in Chinese populations. Nat Genet 44: 62–66.
  12. Low SK, Kuchiba A, Zembutsu H, Saito A, Takahashi A, Kubo M, et al. (2010) Genome-wide association study of pancreatic cancer in Japanese population. PLoS One 5: e11824. pmid:20686608
  13. Klein AP, Lindstrom S, Mendelsohn JB, Steplowski E, Arslan AA, Bueno-de-Mesquita HB, et al. (2013) An absolute risk model to identify individuals at elevated risk for pancreatic cancer in the general population. PLoS One 8: e72311. pmid:24058443
  14. Abe M, Ito H, Oze I, Nomura M, Ogawa Y, Matsuo K (2017) The more from East-Asian, the better: risk prediction of colorectal cancer risk by GWAS-identified SNPs among Japanese. J Cancer Res Clin Oncol 143: 2481–2492. pmid:28849422
  15. Wen W, Shu XO, Guo X, Cai Q, Long J, Bolla MK, et al. (2016) Prediction of breast cancer risk based on common genetic variants in women of East Asian ancestry. Breast Cancer Res 18: 124. pmid:27931260
  16. Gail MH (2008) Discriminatory accuracy from single-nucleotide polymorphisms in models to predict breast cancer risk. J Natl Cancer Inst 100: 1037–1041. pmid:18612136
  17. Sueta A, Ito H, Kawase T, Hirose K, Hosono S, Yatabe Y, et al. (2012) A genetic risk predictor for breast cancer using a combination of low-penetrance polymorphisms in a Japanese population. Breast Cancer Res Treat 132: 711–721. pmid:22160591
  18. Wacholder S, Hartge P, Prentice R, Garcia-Closas M, Feigelson HS, Diver WR, et al. (2010) Performance of common genetic variants in breast-cancer risk models. N Engl J Med 362: 986–993. pmid:20237344
  19. Lin Y, Ueda J, Yagyu K, Ishii H, Ueno M, Egawa N, et al. (2013) Association between variations in the fat mass and obesity-associated gene and pancreatic cancer risk: a case-control study in Japan. BMC Cancer 13: 337. pmid:23835106
  20. Inoue M, Tajima K, Hirose K, Hamajima N, Takezaki T, Kuroishi T, et al. (1997) Epidemiological features of first-visit outpatients in Japan: comparison with general population and variation by sex, age, and season. J Clin Epidemiol 50: 69–77. pmid:9048692
  21. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ, et al. (2015) Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4: 7. pmid:25722852
  22. Patterson N, Price AL, Reich D (2006) Population structure and eigenanalysis. PLoS Genet 2: e190. pmid:17194218
  23. Genomes Project C, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, et al. (2015) A global reference for human genetic variation. Nature 526: 68–74. pmid:26432245
  24. Delaneau O, Zagury JF, Marchini J (2013) Improved whole-chromosome phasing for disease and population genetic studies. Nat Methods 10: 5–6. pmid:23269371
  25. Das S, Forer L, Schonherr S, Sidore C, Locke AE, Kwong A, et al. (2016) Next-generation genotype imputation service and methods. Nat Genet 48: 1284–1287. pmid:27571263
  26. Welter D, MacArthur J, Morales J, Burdett T, Hall P, Junkins H, et al. (2014) The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res 42: D1001–1006. pmid:24316577
  27. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC, et al. (2011) pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 12: 77. pmid:21414208
  28. Lee SH, Wray NR, Goddard ME, Visscher PM (2011) Estimating missing heritability for disease from genome-wide association studies. Am J Hum Genet 88: 294–305. pmid:21376301
  29. Ferlay J, Soerjomataram I, Dikshit R, Eser S, Mathers C, Rebelo M, et al. (2015) Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012. Int J Cancer 136: E359–386. pmid:25220842
  30. Freedman AN, Seminara D, Gail MH, Hartge P, Colditz GA, Ballard-Barbash R, et al. (2005) Cancer risk prediction models: a workshop on development, evaluation, and application. J Natl Cancer Inst 97: 715–723. pmid:15900041
  31. Risch HA, Yu H, Lu L, Kidd MS (2015) Detectable Symptomatology Preceding the Diagnosis of Pancreatic Cancer and Absolute Risk of Pancreatic Cancer Diagnosis. Am J Epidemiol 182: 26–34. pmid:26049860
  32. Yu A, Woo SM, Joo J, Yang HR, Lee WJ, Park SJ, et al. (2016) Development and Validation of a Prediction Model to Estimate Individual Risk of Pancreatic Cancer. PLoS One 11: e0146473. pmid:26752291
  33. Chatterjee N, Shi J, Garcia-Closas M (2016) Developing and evaluating polygenic risk prediction models for stratified disease prevention. Nat Rev Genet 17: 392–406. pmid:27140283
  34. Lu Y, Ek WE, Whiteman D, Vaughan TL, Spurdle AB, Easton DF, et al. (2014) Most common ‘sporadic’ cancers have a significant germline genetic component. Hum Mol Genet 23: 6112–6118. pmid:24943595
  35. Machiela MJ, Chen CY, Chen C, Chanock SJ, Hunter DJ, Kraft P (2011) Evaluation of polygenic risk scores for predicting breast and prostate cancer risk. Genet Epidemiol 35: 506–514. pmid:21618606