A Mouse to Human Search for Plasma Proteome Changes Associated with Pancreatic Tumor Development

Background The complexity and heterogeneity of the human plasma proteome have presented significant challenges in the identification of protein changes associated with tumor development. Refined genetically engineered mouse (GEM) models of human cancer have been shown to faithfully recapitulate the molecular, biological, and clinical features of human disease. Here, we sought to exploit the merits of a well-characterized GEM model of pancreatic cancer to determine whether proteomics technologies allow identification of protein changes associated with tumor development and whether such changes are relevant to human pancreatic cancer. Methods and Findings Plasma was sampled from mice at early and advanced stages of tumor development and from matched controls. Using a proteomic approach based on extensive protein fractionation, we confidently identified 1,442 proteins that were distributed across seven orders of magnitude of abundance in plasma. Analysis of proteins chosen on the basis of increased levels in plasma from tumor-bearing mice and corroborating protein or RNA expression in tissue documented concordance in the blood from 30 newly diagnosed patients with pancreatic cancer relative to 30 control specimens. A panel of five proteins selected on the basis of their increased level at an early stage of tumor development in the mouse was tested in a blinded study in 26 humans from the CARET (Carotene and Retinol Efficacy Trial) cohort. The panel discriminated pancreatic cancer cases from matched controls in blood specimens obtained between 7 and 13 mo prior to the development of symptoms and clinical diagnosis of pancreatic cancer. Conclusions Our findings indicate that GEM models of cancer, in combination with in-depth proteomic analysis, provide a useful strategy to identify candidate markers applicable to human cancer with potential utility for early detection.



Introduction
A major goal of the cancer biomarker field is the development of noninvasive tests that allow early cancer detection. Blood constituents, notably plasma proteins, reflect diverse physiologic or pathologic states. The ease with which this compartment can be sampled makes it a logical choice for screening applications to detect cancer at an early stage. However, the vast dynamic range of protein abundance in plasma and the likely occurrence of tumor-derived proteins in the lower range of protein abundance represent major challenges in the application of proteomic-based strategies for cancer biomarker identification [1,2]. Recent experience in comprehensive profiling of plasma proteins indicates that low-abundance proteins may be identified with high confidence following extensive plasma fractionation and with the use of high-resolution mass spectrometry [3,4].
Genomic analyses of human and mouse cancers have revealed significant concordance in chromosomal aberrations and expression profiles, establishing cross-species analyses as a highly effective filter in the identification of genes and loci embedded within complex cancer genomes [5-8]. Genetically engineered mouse (GEM) models afford defined stages of tumor development, homogenized breeding and environmental conditions, and standardized blood sampling, thereby reducing biological and nonbiological heterogeneity. The concept that plasma from GEM models of cancer contains tumor-derived proteins that may be relevant as candidate markers for human cancer is attractive, as suggested by SELDI (surface-enhanced laser desorption/ionization) scanning technology, but it remains untested because no markers demonstrated to be applicable to human cancer have been identified using such models and methods [9].
In this study, we focused our efforts on pancreatic ductal adenocarcinoma (PDAC), a highly lethal cancer characterized by activating mutations of the Kras oncogene and inactivation of the Ink4a and Arf-p53 tumor suppressor pathways in the great majority of cases [10]. Kras activation is thought to initiate focal lesions in the pancreatic ducts, known as pancreatic intraepithelial neoplasias (PanINs), which undergo graded histological progression to PDAC in association with subsequent Ink4a and Arf-p53 inactivation [11,12]. The recent generation of mice harboring these signature genetic mutations has yielded models that closely recapitulate the histopathogenesis of the human disease, with Kras G12D initiating focal PanINs that rapidly undergo multistage progression in conjunction with Ink4a/Arf or p53 mutations, resulting in invasive PDAC. Importantly, these models show broadly conserved tumor biology and molecular circuitry similar to human PDAC. The tumors exhibit a proliferative stroma (desmoplasia) and frequent metastases, express pancreatic ductal markers (CK-19) and apical mucins (e.g., Muc1, Muc5AC), show activation of developmental signaling pathways (Hedgehog, Notch, EGFR), and harbor genomic alterations syntenic to those of human PDAC [9,13-15].
We have applied here an intensive quantitative proteomic analysis strategy to plasmas that were sampled from this pancreatic cancer mouse model at early stage, representing PanIN, and at advanced stage of tumor development, representing PDAC, and from corresponding matched controls. With this approach, we sought to explore the merits of this well-characterized GEM model of pancreatic cancer to determine whether our proteomics technology allows identification of protein changes associated with tumor development and whether such changes are relevant to human pancreatic cancer.

Mice and Plasma Pooling
The mice for proteomics analysis were obtained by breeding Pdx1-Cre Ink4a/Arf lox/lox and Kras G12D Ink4a/Arf lox/lox mice [13]. All mice were bred five generations onto an FVB/n genetic background. Experimental Pdx1-Cre Kras G12D Ink4a/Arf lox/lox mice and control Kras G12D Ink4a/Arf lox/lox and Pdx1-Cre Ink4a/Arf lox/lox mice were euthanized at age 5.5 or 7 wk (Figure 1). Lethal comas were induced by injecting mice intraperitoneally (IP) with 0.6-0.8 ml of 5% Avertin (2,2,2-tribromoethanol; Sigma-Aldrich, part number T4,840-2). Blood was obtained by cardiac puncture using a 1-ml syringe with a 22-gauge needle. Blood was placed in K3EDTA-coated tubes (Fisher) and centrifuged at 4 °C for 5 min at 3,000 rpm. The supernatant (plasma) was removed, frozen in 100 µl aliquots on dry ice, and stored at −80 °C. In all cases, the mice were subjected to autopsy, and the pancreas was fixed for histological analysis. Mice were excluded from the study if they exhibited extra-pancreatic pathology, as is observed in a subset of Pdx1-Cre Kras G12D mice [9]. Pooling of samples was based on age as well as on the extent of disease determined by histological examination. For the early stage PanIN pool (PanIN-1 to PanIN-3 lesions), the median age of the mice was 5.5 wk, while for the PDAC pool the median age was 7 wk. Approximately one-third of the Kras Ink4a/Arf mice present with glandular pathology, the most common pathology observed in human cases. Thus, in our selection of mice with PDAC, we only used the corresponding plasma if the tumor areas were almost exclusively glandular (i.e., less than ~5% nonglandular pathologies). Age-matched controls were used for both PanIN and PDAC. All mice were male.

Sample Preparation
PDAC, PanIN, and respective control plasma pools obtained from seven to eight individual mice (1 ml of each pool) were individually immunodepleted of the top three most abundant proteins (albumin, IgG, and transferrin) using a Ms-3 column (4.6 × 250 mm; Agilent). Briefly, columns were equilibrated with buffer A at 0.5 ml/min for 13 min, and aliquots of 75 µl of the pooled sera were injected after filtration through a 0.22-µm syringe filter. The flow-through fractions were collected for 10 min at a buffer A flow rate of 0.5 ml/min, combined, and stored at −80 °C until use. The column-bound material was recovered by elution for 8 min with buffer B at 1 ml/min. Subsequently, immunodepleted samples were concentrated using Centricon YM-3 devices (Millipore) and rediluted in 8 M urea, 30 mM Tris (pH 8.5), 0.5% OG (octyl-beta-d-glucopyranoside, Roche). Samples were reduced with DTT in 50 µl of 2 M Tris-HCl (pH 8.5) (0.66 mg DTT/mg protein), and isotopic labeling of intact proteins at cysteine residues was performed with acrylamide. Normal control samples received the light acrylamide isotope (D0 acrylamide) (>99.5% purity, Fluka), and PDAC and PanIN cancer samples received the heavy 2,3,3′-D3-acrylamide isotope (D3 acrylamide) (>98% purity, Cambridge Isotope Laboratories). Alkylation with acrylamide was performed for 1 h at room temperature by adding to the protein solution 7.1 mg D0-acrylamide or 7.4 mg D3-acrylamide per milligram protein, diluted in a small volume of 2 M Tris-HCl (pH 8.5) [16].

Protein Fractionation
The two sets of samples (PDAC versus control and PanIN versus control) were processed identically. The 2-D protein fractionation was performed on the basis of the Intact-Protein Analysis System (IPAS) approach [3,17,18], with some modifications. The workflow is summarized in Figure 2. Briefly, after isotopic labeling, the cancer plasma pool and normal pool were mixed, diluted to 10 ml with 20 mM Tris in 6% isopropanol, 4 M urea (pH 8.5), and immediately injected into a Mono-Q 10/100 column (Amersham Biosciences) for anion-exchange chromatography, the first dimension of the protein fractionation. The buffer system consisted of solvent A (20 mM Tris in 6% isopropanol, 4 M urea [pH 8.5]) and solvent B (20 mM Tris in 6% isopropanol, 4 M urea, 1 M NaCl [pH 8.5]). The separation was performed at 4.0 ml/min in a gradient of 0% to 35% solvent B in 44 min; 35% to 50% solvent B in 3 min; 50% to 100% solvent B in 5 min; and 100% solvent B for an additional 5 min. A total of 12 pools were collected and run individually in reversed-phase chromatography, the second dimension of the process. The reversed-phase fractionation was carried out in a Poros R2 column (4.6 × 50 mm, Applied Biosystems) using TFA/acetonitrile as the buffer system (solvent A: 95% H2O, 5% acetonitrile, 0.1% TFA; solvent B: 90% acetonitrile, 10% H2O, 0.1% TFA) at 2.7 ml/min. The gradient used was 5% solvent A until absorbance reached baseline (desalting step) and then 5%-50% solvent B in 18 min; 50%-80% solvent B in 7 min; and 80%-95% solvent B in 2 min. Sixty fractions of 900 µl were collected during each run, corresponding to a total of 720 fractions. Aliquots of 200 µl of each fraction, corresponding to approximately 20 µg of protein, were set aside for mass-spectrometry shotgun analysis.

Mass Spectrometry Analysis
For protein identification, we performed in-solution trypsin digestion of the lyophilized aliquots of the 720 individual fractions. Individual digested fractions 4 to 60 from each reversed-phase run were combined into 13 pools, corresponding to a total of 156 fractions for analysis from each of the PDAC and PanIN experiments. Digests were analyzed in an LTQ-FT mass spectrometer (Thermo-Finnigan) coupled to a nanoAcquity nanoflow chromatography system (Waters). The liquid chromatography separation was performed in a 25-cm column (Picofrit 75 µm ID, New Objective, packed in-house with MagicC18 resin) using a 90-min linear gradient from 5% to 40% acetonitrile in 0.1% formic acid at 250 nl/min. The spectra were acquired in a data-dependent mode in the m/z range of 400 to 1,800, with selection of the five most abundant +2 or +3 ions of each MS spectrum for MS/MS analysis. Mass spectrometer parameters were: capillary voltage of 2.1 kV, capillary temperature of 200 °C, resolution of 100,000, and FT target value of 2,000,000.
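The fraction bookkeeping described in the two preceding sections can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text; the constant and function names are illustrative and do not come from the authors' software.

```python
# Fraction bookkeeping for one experiment (PDAC or PanIN).
# All counts are taken from the text; names are illustrative only.
ANION_EXCHANGE_POOLS = 12      # first-dimension pools
RP_FRACTIONS_PER_POOL = 60     # reversed-phase fractions collected per pool
DIGEST_POOLS_PER_RP_RUN = 13   # fractions 4-60 of each run combined into 13 pools


def fraction_counts():
    """Return (fractions collected, digest pools analyzed by LC-MS/MS)."""
    collected = ANION_EXCHANGE_POOLS * RP_FRACTIONS_PER_POOL
    analyzed = ANION_EXCHANGE_POOLS * DIGEST_POOLS_PER_RP_RUN
    return collected, analyzed


collected, analyzed = fraction_counts()
print(f"{collected} fractions collected, {analyzed} digest pools analyzed")
```

How fractions 4 to 60 were grouped into the 13 pools is not specified in the text, so the sketch does not attempt to model it.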

Protein Identification
The acquired data were automatically processed by the Computational Proteomics Analysis System (CPAS) [19]. Searches were performed considering cysteine alkylation with the light form of acrylamide as a fixed modification and the heavy form of acrylamide (+3.01884) as a variable modification. For the identification of proteins with a false discovery rate (FDR) < 1%, LC/MS/MS spectra of PDAC and PanIN samples were subjected to tryptic and semi-tryptic searches against a database consisting of forward and reversed mouse IPI databases released in 01/2006 (v.3.12) using X!Tandem [20]. The database search results were then analyzed by the PeptideProphet [21] and ProteinProphet [22] programs. Our high confidence list of identifications retained proteins with ProteinProphet scores ≥ 0.95 and two or more peptides per protein. For PDAC, 18,409 unique peptides corresponding to 1,040 proteins were identified in forward sequence, whereas only eight peptides corresponding to four proteins were identified in reversed sequence, resulting in a false positive identification rate of 8/18,409 or 0.04% for peptides and 4/1,040 or 0.4% for proteins. For PanIN, 16,319 unique peptides corresponding to 559 proteins were identified in forward sequence, whereas only five peptides corresponding to two proteins were identified in reversed sequence, resulting in a false positive identification rate of 5/16,319 or 0.03% for peptides and 2/559 or 0.4% for proteins. A secondary list of protein identifications with less than 5% FDR was generated from tryptic searches using the same algorithm and database, retaining only proteins with ProteinProphet score > 0.7 and PeptideProphet score > 0.2. The results of the <1% FDR searches were later supplemented with data from the 5% FDR searches on the basis of external cross-correlated biological information from different sources, such as tissue specificity or mRNA expression in pancreatic cancer. The number of MS events (spectral counts) was obtained for all the proteins with less than 5% FDR (including less than 1% FDR) from tryptic searches only.
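The false positive rates quoted above follow directly from the ratio of reversed-sequence to forward-sequence identifications. The following minimal sketch reproduces that arithmetic; the function name `decoy_rate` and the dictionary layout are illustrative assumptions, while the counts are those reported in the text.

```python
def decoy_rate(decoy_hits: int, forward_hits: int) -> float:
    """Reversed-database false positive rate, as a percentage."""
    if forward_hits <= 0:
        raise ValueError("forward_hits must be positive")
    return 100.0 * decoy_hits / forward_hits


# (reversed-sequence hits, forward-sequence hits) as reported in the text
counts = {
    "PDAC": {"peptides": (8, 18409), "proteins": (4, 1040)},
    "PanIN": {"peptides": (5, 16319), "proteins": (2, 559)},
}

rates = {
    experiment: {level: decoy_rate(*pair) for level, pair in levels.items()}
    for experiment, levels in counts.items()
}

for experiment, levels in rates.items():
    for level, rate in levels.items():
        print(f"{experiment} {level}: {rate:.2f}%")
```

Rounded as in the text, these values are 0.04% and 0.4% for PDAC and 0.03% and 0.4% for PanIN.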

Quantitative Analysis of Acrylamide Isotopes
The quantitative approach consisted of differential labeling of cysteine-containing peptides with acrylamide isotopes (heavy or light) [16]. Quantitative information was extracted using a script designated "Q3" that was developed in-house to obtain the relative quantification for each pair of cysteine-containing peptides identified by MS/MS [16]. Only peptides with a PeptideProphet score of at least 0.75 and a mass deviation of less than 20 ppm were considered. Peptide isotopic ratios were plotted on a logarithmic scale in a histogram, and the median of the distribution was centered at zero. This normalization approach was chosen because the great majority of proteins were not expected to be dysregulated in cases compared to controls (Figure S1). All normalized peptide ratios for a specific protein were averaged to compute an overall protein ratio. Proteins with quantitative information presented as "cancer only" had detected peptides labeled only with the heavy form of acrylamide. All peptide and protein ratios were calculated on a logarithmic scale but reported on a linear scale. Statistical significance of the protein quantitative information was obtained via two procedures: (i) for proteins with multiple peptides quantified, a p-value for the mean log-ratio, which has mean zero under the null hypothesis, was calculated using a one-sample t-test; (ii) for proteins with a single paired MS event, the probability for the ratio was extrapolated from the distribution of ratios in a control-control experiment in which the same sample was labeled with heavy and light acrylamide (Figure S1).
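The normalization and protein-level summary described above can be sketched as follows. This is not the in-house Q3 script; it is a minimal illustration that assumes log2 ratios (the text does not state the logarithm base) and stops at the t statistic, since converting it to a p-value needs a t-distribution routine outside the Python standard library. All function names are hypothetical.

```python
import math
import statistics


def median_center(log_ratios):
    """Shift log2 peptide ratios so the median of the distribution is zero."""
    med = statistics.median(log_ratios)
    return [r - med for r in log_ratios]


def protein_ratio(norm_log_ratios):
    """Average normalized log2 peptide ratios; report on a linear scale."""
    return 2 ** statistics.mean(norm_log_ratios)


def one_sample_t(norm_log_ratios):
    """t statistic and degrees of freedom for H0: mean log-ratio equals 0."""
    n = len(norm_log_ratios)
    if n < 2:
        raise ValueError("need at least two peptide ratios")
    mean = statistics.mean(norm_log_ratios)
    sd = statistics.stdev(norm_log_ratios)  # sample standard deviation
    return mean / (sd / math.sqrt(n)), n - 1
```

In use, `median_center` would be applied once to all peptide ratios of an experiment, after which `protein_ratio` and `one_sample_t` are applied to the subset of normalized ratios belonging to each protein.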

mRNA Analysis of Pancreatic Tissue
Total pancreas RNA was isolated from wild-type FVB/n mice using the Trizol reagent protocol (Invitrogen) with slight modifications. In brief, freshly harvested pancreas was homogenized in 15 ml Trizol and centrifuged, the aqueous layer was extracted with chloroform, and isopropanol precipitation was performed by adding 0.5 volume of high salt buffer (0.8 M sodium citrate/1.2 M NaCl) and 0.5 volume of isopropanol. A second round of purification was performed using the RNeasy kit (Qiagen). Total RNA from PDAC arising in Pdx1-Cre LSL-Kras G12D Ink4a/Arf lox/lox mice was extracted using the Trizol reagent and then purified with the RNeasy kit using the standard protocols. Expression profiling of normal pancreas (n = 2 specimens) and PDAC RNA (n = 4 specimens) was performed on Affymetrix 430 A2.0 microarrays.

Statistical Analysis of ELISA Data
Prior to statistical analysis, all candidate markers had their protein concentration standardized on the basis of the control group, so that each candidate marker concentration has mean 0 and variance 1 in the control group. In short, if $\mu_0$ and $\mathrm{sd}_0$ are the mean and standard deviation of a candidate marker in the control group, its standardized concentration $Y'$ is $Y' = (y - \mu_0)/\mathrm{sd}_0$. This method facilitates cross-candidate marker comparison and places all markers on the same scale [24,25]. p-Values for individual markers were computed using the nonparametric Wilcoxon rank-sum test. To avoid over-fitting issues, composite markers summarizing a panel were generated using a predefined combination rule that considers the panel positive if any individual marker is positive (e.g., exceeds a threshold on the standardized scale). p-Values that measure whether the AUCs of the composite markers are statistically different from that of CA19-9 were computed using the method described by DeLong et al. [26].
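The standardization and the any-marker-positive rule can be illustrated with a short sketch. It is not the authors' analysis code: the function names are hypothetical, the use of the sample standard deviation is an assumption, and the threshold passed to `panel_positive` is a free parameter that the text does not specify. The AUC is computed as the fraction of case-control pairs in which the case scores higher (ties count one-half), which is the Mann-Whitney formulation of the area under the ROC curve.

```python
import statistics


def standardize(values, control_values):
    """Y' = (y - mu0) / sd0, with mu0 and sd0 estimated from the controls."""
    mu0 = statistics.mean(control_values)
    sd0 = statistics.stdev(control_values)  # assumption: sample SD (n - 1)
    return [(y - mu0) / sd0 for y in values]


def panel_positive(standardized_markers, threshold):
    """Any-marker-positive rule.

    standardized_markers: one list per marker, subjects in the same order.
    Returns one boolean per subject.
    """
    return [any(v > threshold for v in subject)
            for subject in zip(*standardized_markers)]


def auc(case_scores, control_scores):
    """Probability that a case outscores a control; ties count one-half."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

Because the controls define the scale, standardizing the control group against itself yields mean 0 and unit variance, which is what allows one threshold to be applied across markers measured in different units.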

Human Samples
Serum samples from newly diagnosed patients were obtained at the time of diagnosis following informed consent using IRB-approved guidelines from the University of Michigan. A total of 30 serum samples were obtained from patients with a confirmed diagnosis of pancreatic adenocarcinoma who were seen in the Multidisciplinary Pancreatic Tumor Clinic at the University of Michigan Comprehensive Cancer Center. Anonymous serum samples from the pancreatic cancer patients were randomly selected from a clinic population in which 15% of individuals present with early stage (i.e., stage 1/2) disease and 85% present with advanced stage (i.e., stage 3/4) disease. The information on individual characteristics is presented in Table S1. Inclusion criteria for the study consisted of a confirmed diagnosis of pancreatic cancer, the ability to provide written informed consent, and the ability to provide 40 ml of blood. Exclusion criteria included chemotherapy or radiation therapy prior to blood draw and a diagnosis of other malignancies within 5 y of the time of blood draw. Sera were also obtained from 15 patients with chronic pancreatitis who were seen in the Gastroenterology Clinic at the University of Michigan Medical Center and from 20 healthy control individuals, collected at the University of Michigan under the auspices of the Early Detection Research Network (EDRN). The mean age of the tumor group was 65 y and that of the chronic pancreatitis group was 54 y. Individuals from whom control sera were obtained were age and sex matched to the tumor group. All chronic pancreatitis sera were collected in an elective setting in the clinic in the absence of an acute flare. All blood and sera were collected and processed using the same standardized protocol. Blood samples were maintained at room temperature for 30-60 min to allow the clot to form and then centrifuged at 1,300 × g at 4 °C for 20 min. The serum was removed, transferred to a polypropylene capped tube in 1 ml aliquots, and frozen. The frozen samples were stored at −70 °C until assayed. All serum samples were labeled with a unique identifier. None of the samples were thawed more than twice before analysis.
To address the relevance of proteins observed to be up-regulated in plasma from the PanIN stage mouse model, we submitted a proposal to the Carotene and Retinol Efficacy Trial (CARET), a cohort study that involved 18,314 individuals with increased cancer risk, to do a blinded validation study of our relevant proteins. CARET identified all individuals (13) in this cohort from whom blood was collected approximately a year prior to the diagnosis of pancreatic cancer (actual mean = 10 mo), at a time when they were completely asymptomatic, as well as matched controls that were not diagnosed with cancer over a 4-y follow-up period, irrespective of their state of general health otherwise. The information on individual characteristics is presented in Table S2.

Proteomic Analysis of Mouse Plasma
Plasma obtained from PDAC-prone mice engineered with activated Kras and Ink4a/Arf deficiency [13] was subjected to proteomic analysis. The study was designed to test directly whether current proteomics technologies allow for quantitative analysis and identification of protein changes associated with tumor development in the mouse and whether such changes have relevance to human tumors.
Mice harboring Pdx1-Cre Kras G12D Ink4a/Arf lox/lox mutations exhibit stereotypical neoplastic progression from pancreatic cancer precursor lesions (PanINs) present at ~2 wk of age to advanced PDAC by 6 to 10 wk of age [13]. A plasma pooling strategy was applied for in-depth proteomic analysis. Blood was obtained from mice at the PanIN stage and at the PDAC stage (at 5.5 and 7 wk, respectively) and from age and sex matched controls, thus constituting four pools of plasma (Figure 1). To ensure homogeneity among pooled plasma samples, the tumor stage was confirmed for individual mice by histopathology prior to pooling. For quantitative proteome analysis, we applied differential isotopic labeling to each tumor pool and its matched control [16], followed by extensive fractionation of intact proteins [3]. The experimental workflow is presented in Figure 2.
Each experiment generated 156 plasma fractions on the basis of anion-exchange and reversed-phase chromatography, which were analyzed separately by liquid chromatography-tandem mass spectrometry (LC-MS/MS) following tryptic digestion. Some 2,800,000 mass spectra were produced and analyzed in this study. Collectively, the PanIN and PDAC experiments resulted in a primary list of 1,095 unique high confidence proteins with <1% FDR on the basis of reverse-database searches (Table S3 presents the full list of protein identifications). To this primary list, we appended 347 additional proteins with <5% FDR (Table S4). The latter proteins had corresponding mRNA expression in pancreas tissue >2-fold compared to the mean of 61 mouse tissue expression surveys from published data [27] and/or mRNA expression in pancreatic cancer >2-fold compared to normal tissue, in mouse (this study) or human (prior study [28]).
On the basis of UniProt keywords, 25% of identified proteins in the list of 1,442 proteins contained a signal peptide for secretion, and 20% were annotated as glycoproteins. Of note, the list contained a relatively large percentage (9%) of membrane proteins on the basis of Gene Ontology cellular component annotation. Peptides for several membrane proteins identified were derived exclusively from the extracellular domain. Epidermal growth factor receptor, for example, was detected in several fractions with peptides spanning amino acids 25 to 647 representing the extracellular N-terminal domain. These results are consistent with shedding of extracellular domains into the circulation [29].
To estimate the concentration range of mouse plasma proteins identified, we correlated spectral counting data (number of MS2 events/protein) [30] to known concentrations of proteins in plasma (http://www.rulesbasedmedicine.com/). We observed a significant correlation between spectral counts for a given protein and its plasma protein concentration (R² = 0.84) (Figure 3A). From this analysis, we estimated that our proteomic approach allowed for identification of plasma proteins across seven orders of magnitude and detection of some proteins in mouse plasma at concentrations as low as 1 ng/ml. In addition, the number of proteins identified was greater at lower predicted plasma concentrations on the basis of spectral counts (Figure 3B), indicating substantial depth of analysis achieved with extensive protein fractionation.
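A calibration of this kind can be sketched as an ordinary least-squares fit between spectral counts and known concentrations. The text does not state which transformation was used, so the log10-log10 fit below is an assumption, and the function names are hypothetical; no data from the study are reproduced here.

```python
import math


def fit_loglog(spectral_counts, concentrations):
    """Least-squares fit of log10(concentration) on log10(spectral count).

    Returns (slope, intercept, r_squared).
    """
    xs = [math.log10(v) for v in spectral_counts]
    ys = [math.log10(v) for v in concentrations]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def predict_concentration(spectral_count, slope, intercept):
    """Estimate a concentration from a spectral count using the fitted line."""
    return 10 ** (intercept + slope * math.log10(spectral_count))
```

Once fitted on proteins of known concentration, the same line gives rough concentration estimates for proteins with spectral counts only, which is how a dynamic range can be inferred from counting data.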
The majority of medium to high abundance proteins were detected in both PanIN and PDAC experiments, while most differences in protein identifications between the two experiments represented lower abundance proteins (Figure S2A). Likewise, in duplicate LC-MS/MS analysis of the same fractions, most differences in protein identifications observed represented lower abundance proteins (Figure S2B). Similar experiments in which independent replicates of samples were analyzed resulted in 60% of protein sampling/identification in both experiments [3]. These differences in protein identifications between the two experiments are largely attributed to mass spectrometry limitations in dynamic range and speed, specifically when analyzing complex samples such as plasma. In addition to mass spectrometry limitations, some of the differences observed between the two experiments may result from occurrence of some proteins at a higher level of abundance at the PDAC stage compared to PanIN. Importantly, since in each experiment (PDAC and PanIN) cancer and respective control samples were analyzed together after isotopic labeling followed by mixing, methodological variations related to fractionation and sample processing were minimized.

Tumor Related Changes in Mouse Plasma
We used acrylamide isotopic labeling of cysteine residues to obtain relative quantitative information between disease and control samples. This labeling approach is chemically very efficient, as evidenced by the lack of unlabeled cysteines in searching mass spectra [16]. Additionally, this labeling chemistry is fully compatible with the intact protein approach, without significantly affecting protein physicochemical characteristics. In duplicate experiments performed with independent replicates of samples, there were no proteins that showed quantitative inconsistencies (up-regulated in one experiment and down-regulated in the other) (unpublished data). Among the 621 quantified proteins, 165 were found to be up-regulated (ratio ≥ 1.5, p < 0.05) in cancer samples (PDAC or PanIN or both) compared to controls (Table S5).
A significant proportion of plasma proteins is synthesized in the liver and may be affected as part of the host response. To distinguish such classical plasma proteins from proteins that may be derived from the pancreas in our dataset, we cross-referenced the 1,442 proteins identified in our analyses with published proteome profiles of mouse liver tissue [31,32]. Approximately 38% of the 1,442 proteins were identified in mouse liver tissue, consisting mostly of relatively abundant plasma proteins. Sixty-seven of these proteins showed increased levels with tumor development in the mouse (Table S5). In contrast, proteins estimated to be of low abundance in the protein list had a much greater representation of pancreatic proteins relative to liver proteins on the basis of tissue protein and/or mRNA data (Figure S3).
The following criteria were applied to select a subset of proteins potentially relevant to pancreatic cancer: (i) mean protein ratio in neoplasm/normal plasma ≥ 1.5 (p < 0.05) in PDAC and PanIN on the basis of isotopic labeling ratios, and/or occurrence of isotope-labeled peptides in cancer samples but not in controls; (ii) not known to represent acute-phase reactants, complement or coagulation proteins according to Ingenuity Pathway Analysis annotation (Ingenuity Systems) (Table S5); and (iii) mouse protein has a corresponding ortholog gene in human. Also included in this list were proteins that were similarly elevated in either PDAC or PanIN and that had evidence of increased expression of corresponding genes in pancreatic cancer for mouse (data obtained in this study) and for human [28]. These criteria resulted in a subset of 45 proteins of potential interest from the set of 165 up-regulated proteins (Table 1).
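The selection criteria can be read as a simple filter over the quantified proteins. The sketch below is one possible reading, not the authors' procedure: the class and field names are hypothetical, and it assumes that criteria (ii) and (iii) also apply to proteins admitted through the "elevated in either PDAC or PanIN with expression evidence" route, which the text leaves open.

```python
from dataclasses import dataclass
from typing import Optional

RATIO_CUTOFF = 1.5  # neoplasm/normal ratio threshold from the text
P_CUTOFF = 0.05     # significance threshold from the text


@dataclass
class Quant:
    ratio: Optional[float] = None    # linear-scale protein ratio
    p_value: Optional[float] = None
    cancer_only: bool = False        # labeled peptides seen only in cancer

    def elevated(self) -> bool:
        if self.cancer_only:
            return True
        return (self.ratio is not None and self.p_value is not None
                and self.ratio >= RATIO_CUTOFF and self.p_value < P_CUTOFF)


@dataclass
class Candidate:
    name: str
    pdac: Quant
    panin: Quant
    host_response: bool        # acute-phase, complement, or coagulation
    human_ortholog: bool
    tumor_expression_up: bool  # increased expression in pancreatic cancer


def is_selected(c: Candidate) -> bool:
    # Criteria (ii) and (iii); assumed here to apply to every route.
    if c.host_response or not c.human_ortholog:
        return False
    up_pdac, up_panin = c.pdac.elevated(), c.panin.elevated()
    if up_pdac and up_panin:   # criterion (i)
        return True
    # Additional route: elevated at one stage plus expression evidence.
    return (up_pdac or up_panin) and c.tumor_expression_up
```

Applying `is_selected` to each quantified protein and keeping those that return `True` gives the kind of shortlist described in the text.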
To further support our findings, we measured protein levels in mouse pancreatic tissue and in mouse plasma for a subset of up-regulated proteins. These proteins were selected on the basis of their potential relationship with pancreatic cancer and the availability of antibodies and ELISA kits with the requisite specificity. IHC analysis was done for CD166 antigen precursor (ALCAM), receptor-type tyrosine-protein phosphatase gamma (PTPRG), TIMP1, and tenascin C (TNC). All tested proteins demonstrated strong IHC staining in mouse PanIN and pancreatic cancer tissue sections (Figure 4). Circulating protein levels of ALCAM, ICAM1, and TIMP1 in the same mouse plasma used in the proteomic approach were measured by ELISA (Figure 5). ALCAM, ICAM1, and TIMP1 had significantly higher levels in plasma from PDAC mice. TIMP1 was significantly elevated in PanIN plasma samples as well.

Relevance of Mouse Findings to Human Pancreatic Cancer
The relevance to human pancreatic cancer of proteins up-regulated in mouse plasma with tumor development was investigated using human tissue and/or blood samples. Immunohistochemistry was performed for PTPRG, TNFRSF1A, and TNC, all of which showed positive IHC staining in human pancreatic cancer (Figure 3). Data for ALCAM, ICAM1, LCN2, TNFRSF1A, TIMP1, REG1A, REG3, WFDC2 (whey-acidic protein [WAP] four-disulfide core domain 2), and IGFBP4 were obtained by ELISA. These proteins, which were up-regulated in plasma from tumor-bearing mice, were assayed in human sera from 30 patients with PDAC to assess their significance individually and as a panel, together with CA19-9, a marker that is currently in clinical use as a pancreatic cancer marker (Table 2) [33]. As a control group, we analyzed sera from 20 matched healthy individuals and 10 to 15 individuals with chronic pancreatitis, obtained using the same protocol and storage conditions. Information regarding patient characteristics and tumor stage is provided in Table S1. Statistical analysis was performed for individual proteins and for the entire panel as a group. All but one of the proteins were significantly elevated in cancer compared to one or both control groups (p < 0.03). Seven proteins were compared between cancer and both control groups, and five of seven were significant in both comparisons (p < 0.03) (Figure S4 for box plots, Table 2). Only one protein (LCN2) did not achieve statistical significance. For proteins that yielded statistically significant differences between cancer and healthy individuals, the areas under the curve (AUCs) ranged between 0.75 and 0.89 (Figure S5), and between cancer and pancreatitis the AUCs ranged between 0.74 and 0.92. Of note, a panel of all the proteins tested, inclusive of those that did not achieve statistical significance individually so as to avoid any overfitting, yielded an AUC of 0.96, in contrast to CA19-9, which yielded an AUC of 0.79 (Figure 6A and 6B).
Plasma analysis of mice at the PanIN stage allows us to test whether protein changes observed in plasma at an early tumor stage in the mouse are also up-regulated in individuals with pancreatic cancer before actual clinical diagnosis. To that effect, a blinded analysis was conducted using sera collected as part of CARET, which included 18,314 participants [34]. The CARET study was intended to test the effect of daily beta-carotene and retinyl palmitate on cancer incidence and death in individuals with a history of smoking or asbestos exposure. All participants in the cohort (13) who were diagnosed with pancreatic cancer between 7 and 13 mo following a blood draw (mean = 10 mo) were identified by CARET for the blinded pancreatic cancer validation study, together with an equal number of controls who were matched for age, sex, year of CARET enrollment, and time of blood draw in relation to enrollment and who were not diagnosed with pancreatic cancer on the basis of information in the CARET database. The pancreatic cancer and control groups were also matched for CARET intervention. Information regarding CARET patient characteristics and tumor stage is provided in Table S2. We tested five proteins that were up-regulated in mouse plasma at the PanIN stage (LCN2, REG1A, REG3, TIMP1, and IGFBP4) together with CA19-9, without knowledge of which individuals developed pancreatic cancer subsequent to the blood draw and which individuals were matched controls (Table 2). When tested individually, two of the five proteins (IGFBP4 and TIMP1) were significant at p = 0.05 and p = 0.04, respectively. CA19-9 was significant at p = 0.04. As a panel, the five proteins achieved an AUC of 0.817 (p = 0.005), inclusive of the three proteins that did not achieve statistical significance individually, to avoid any overfitting. When the panel of five proteins was combined with CA19-9, an AUC of 0.911 was achieved (Figure 6C).
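One common way to attach a p-value to an AUC from a small case-control set, such as 13 cases and 13 controls, is a label-permutation test. The sketch below is an assumption about how such a test could be run, not the procedure reported for this study; the function name `permutation_p_value` and the integer scores are hypothetical placeholders.

```python
import random

def auc(case_scores, control_scores):
    """Probability that a random case scores above a random control (ties count 0.5)."""
    total = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in case_scores
        for k in control_scores
    )
    return total / (len(case_scores) * len(control_scores))

def permutation_p_value(case_scores, control_scores, n_perm=2000, seed=0):
    """One-sided p-value: share of label shuffles whose AUC reaches the observed AUC."""
    rng = random.Random(seed)
    observed = auc(case_scores, control_scores)
    pooled = list(case_scores) + list(control_scores)
    n_cases = len(case_scores)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign case/control labels at random
        if auc(pooled[:n_cases], pooled[n_cases:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical composite scores (NOT study data): 13 cases and 13 controls.
case_scores = list(range(6, 19))
control_scores = list(range(13))

observed_auc, p_value = permutation_p_value(case_scores, control_scores)
print(round(observed_auc, 3), p_value)
```

A permutation test makes no distributional assumption about the scores, which suits small groups; the matched design of the CARET samples is ignored here for simplicity.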

Discussion
Our findings here indicate that plasma proteomic analysis of GEM models of cancer provides a useful strategy to identify candidate markers applicable to human cancer with potential utility for early detection. This is very relevant, since there is a compelling need to develop blood-based markers that allow early cancer detection, classify tumors to direct therapy, and monitor disease progression, regression, or recurrence. Early detection is particularly relevant to pancreatic adenocarcinoma, which is the fourth leading cause of cancer death in the United States and has a 5-y survival rate of only 3%. Because of limitations in diagnostic methods and a lack of specific symptoms at an early stage, the disease is often diagnosed at late stages. In contrast, early stage disease is associated with prolonged survival following surgical resection of the tumor [35]. Therefore, improvement in means to detect pancreatic cancer early would be expected to impact outcome.

(B) TIMP1 was also elevated in plasma of PanIN mice. ALCAM overall (cancer plus controls) concentration in mouse plasma was 19 ng/ml; ICAM1 was 163 ng/ml; and TIMP1 was 6.2 ng/ml. The low ng/ml concentrations of these proteins support the substantial depth of analysis achieved with our discovery platform. Normalization of concentration was performed as described in the Materials and Methods. doi:10.1371/journal.pmed.0050123.g005
While published studies have pointed to the merits of proteomics for cancer marker identification, the challenge of discovering markers applicable to early detection has been substantial. Mass spectrometry has evolved from a tool to identify and characterize isolated proteins or for mass peak profiling to a platform for interrogating complex proteomes. However, even with recent improvements in sensitivity and mass accuracy, the complexity of the plasma proteome far exceeds the current capability of mass spectrometry to fully resolve its individual protein and peptide constituents in a single analysis. Current strategies to achieve in-depth coverage require sample fractionation followed by separate analyses of individual fractions or capture of protein or peptide subsets [2]. The depth of proteomic analysis achieved in this study through extensive fractionation of intact proteins and reliance on high-resolution mass spectrometry has allowed identification of low abundance proteins [3]. In addition, reliance on acrylamide isotope labeling of cysteines has allowed quantitative measures to be derived from mass spectrometric analysis of the plasma proteome. More importantly, the identification of proteins is not restricted to peptides containing cysteine residues, since our workflow has no capture step for isotopically labeled peptides, unlike the isotope-coded affinity tag (ICAT) method [36]. The workflow therefore provides a comprehensive list of peptides in the digests and consequently better protein coverage and confidence in protein identification. Such depth of analysis is necessary to identify potential tumor-specific biomarkers and to extend discovery beyond abundant protein changes resulting from inflammation or acute-phase reaction. As a result, changes in plasma proteins relevant to pancreatic cancer could be identified across a wide dynamic range of protein abundance.
Some of the proteins identified in this study as potentially relevant to pancreatic cancer have already been associated with cancer, as evidenced from Ingenuity Pathway Analysis. In total, 13 proteins were previously investigated in pancreatic cancer tissue or, for a smaller number, in human blood by immunoassay (Table 1) and found to be elevated. Among those, MMP2 and its inhibitor TIMP1 are known to be involved in tumor progression and extracellular matrix degradation [37]. REG1A and REG3 are proteins highly secreted by pancreatic islet cells and have also been described as potential markers for pancreatic diseases [38]. ICAM1 and TNC are involved, respectively, in cellular attachment and inhibition of adhesion of cells to the extracellular matrix [39,40]. TNFRSF1A has been associated with the acute-phase process [41].
Elevated levels of ALCAM, IGFBP4, LCN2, and WFDC2 in circulation in pancreatic cancer are novel findings. ALCAM is a cell adhesion molecule critical to tumor development and progression [42]. The form of ALCAM detected in circulation corresponds to the shed extracellular domain of this integral membrane protein. The process of shedding is promoted by metalloproteases [43]. Overexpression of IGFBP4, the smallest protein from the IGF binding protein family, has been related to tumor growth [44]. LCN2 has been shown to play a role in regulating cellular growth and metastasis in colon cancer [45] and to be overexpressed in pancreatic cancer at the mRNA level [46], concordant with gene expression analysis of tumors from our mouse model (Table 1). Interestingly, WFDC2 (or HE4), a promising biomarker for ovarian cancer [47], was found in our study to be up-regulated in mouse PDAC plasma, with concordant mRNA expression. Additionally, WFDC2 was listed as up-regulated at the gene and protein levels in human PDAC tissue in a recent study [48], suggesting that this protein may also have relevance to pancreatic cancer. The whey-acidic protein (WAP) family has been described to be involved in tumor progression through the regulation of the NF-κB signaling pathway [49], and a second member of this family, secretory leucocyte proteinase inhibitor (SLPI), is among our list of candidates up-regulated in both PDAC and PanIN mouse samples. PTPRG, a tyrosine phosphatase receptor that was validated in this study in both mouse and human by immunohistochemistry, has been recently described in gastric cancer as a potential tumor suppressor gene that is methylated in metastatic cells [50]. Altogether, the prior association with cancer of proteins identified in this study, some of which have a demonstrated function in pancreatic cancer, is indicative of the utility of mouse models for deciphering protein changes relevant to pancreatic and other cancers in humans. It should also be emphasized that these proteins were previously studied independently of each other and were not identified through a systematic profiling study as presented here.
Because mice can be sampled at defined stages of tumor development and under controlled breeding conditions, greater standardization is possible using mouse models compared to human studies. In this study, mouse models also allowed investigation at an early stage of tumor development (PanIN), enabling identification of proteins associated with early events in tumorigenesis. The strong concordance between mouse and human pancreatic cancer in both tissue and circulating markers is striking. From the list of nine candidate markers found elevated by proteomics and validated in human samples, only LCN2 was not significantly elevated.
Our analysis of candidate protein markers in newly diagnosed patient samples confirmed that CA19-9 discriminates pancreatic cancer at the time of diagnosis well from healthy controls (see Table S1). CA19-9 levels were elevated in more than 80% of patients compared with healthy controls. However, the sensitivity and specificity of CA19-9 in other settings relevant to pancreatic cancer, namely in discriminating between pancreatitis and pancreatic cancer and for detecting cancer at an early stage, are much reduced compared with its power to discriminate newly diagnosed pancreatic cancer and healthy individuals [51]. Hence the need for additional markers to constitute a panel with improved sensitivity and specificity for discriminating pancreatic cancer from pancreatitis and for detecting the disease at an early stage prior to onset of symptoms. In this respect, TIMP1 and ICAM1 had superior performance when cancer samples were compared to samples from pancreatitis patients. The panel of candidate markers that we tested, together with CA19-9, significantly improved sensitivity and specificity in preclinical samples. The next steps in building on our findings include developing high throughput assays for additional candidate markers identified, for which such assays are not currently available, and expanding validation studies to address specific applications, notably implementing a panel-based test to distinguish between pancreatitis and pancreatic cancer and further assessing the utility of a panel approach for detecting pancreatic cancer early among individuals at increased risk of developing the disease.

Figure 6. ROC in Assays of Human Samples for Two Panels of Proteins Identified in Proteomic Analysis of Plasmas from Tumor-Bearing Mice
ROC curves based on ELISA measurements of ALCAM, ICAM1, LCN2, TIMP1, REG1A, REG3, and IGFBP4 as a panel with or without CA19-9 comparing pancreatic cancer versus healthy controls (A) and pancreatic cancer versus pancreatitis (B). This panel of candidates was chosen on the basis of up-regulation in tumor-bearing mice. As expected, CA19-9 performed well in comparisons with healthy individuals as controls; however, the chosen panel was significantly better than CA19-9 alone when pancreatitis patients were used as controls.
(C) The panel tested with prediagnostic sera (LCN2, TIMP1, REG1A, REG3, and IGFBP4) was chosen on the basis of up-regulation at the PanIN stage. This panel performed slightly better in comparison to CA19-9, but a combination of CA19-9 with the panel of candidates significantly improved discrimination between early stage (prediagnostic) sera and matched controls. Standardization procedures and composite marker ROCs were generated without fitting, by inclusion of all tested candidate markers. Specimens from controls and pancreatitis patients were obtained from the same institution and with the same protocol for blood collection. For details see Materials and Methods and Tables S1 and S2.
doi:10.1371/journal.pmed.0050123.g006

Supporting Information
Figure S1. Distribution of Quantitative Events
(A) Equal amounts of total immunodepleted nonfractionated human plasma were labeled with heavy and light acrylamide and analyzed with LC-MS/MS. The histogram represents the distribution of 4,371 quantitative events. From this control-control events distribution, the number of events that exceeds a given ratio was determined. For instance, there were 125 up-regulated events (ratio ≥ 2.0), which corresponds to 2.8% (p = 0.028

The 1,442 proteins identified were correlated to tissue specificity using published datasets from mouse liver proteomic profiling studies [31,32] and one human tissue mRNA expression study [27]. Proteins estimated to be of low abundance (<100 ng/ml) had a much greater representation of pancreatic proteins relative to liver proteins based on tissue protein and/or mRNA. Protein concentration was estimated on the basis of MS2 events (Figure 2

Accession Numbers
All the proteomic data generated in this study for the PDAC mouse model are available in the Mouse Plasma Peptide Atlas Project (http://www.peptideatlas.org/repository/).

Editors' Summary
Background. Cancers are life-threatening, disorganized masses of cells that can occur anywhere in the human body. They develop when cells acquire genetic changes that allow them to grow uncontrollably and to spread around the body (metastasize). If a cancer is detected when it is still small and has not metastasized, surgery can often provide a cure. Unfortunately, many cancers are detected only when they are large enough to press against surrounding tissues and cause pain or other symptoms. By this time, surgical removal of the original (primary) tumor may be impossible and there may be secondary cancers scattered around the body. In such cases, radiotherapy and chemotherapy can sometimes help, but the outlook for patients whose cancers are detected late is often poor. One cancer type for which late detection is a particular problem is pancreatic adenocarcinoma. This cancer rarely causes any symptoms in its early stages. Furthermore, the symptoms it eventually causes (jaundice, abdominal and back pain, and weight loss) are seen in many other illnesses. Consequently, pancreatic cancer has usually spread before it is diagnosed, and most patients die within a year of their diagnosis.
Why Was This Study Done? If a test could be developed to detect pancreatic cancer in its early stages, the lives of many patients might be extended. Tumors often release specific proteins ("cancer biomarkers") into the blood, a bodily fluid that can be easily sampled. If a protein released into the blood by pancreatic cancer cells could be identified, it might be possible to develop a noninvasive screening test for this deadly cancer. In this study, the researchers use a "proteomic" approach to identify potential biomarkers for early pancreatic cancer. Proteomics is the study of the patterns of proteins made by an organism, tissue, or cell and of the changes in these patterns that are associated with various diseases.
What Did the Researchers Do and Find? The researchers started their search for pancreatic cancer biomarkers by studying the plasma proteome (the proteins in the fluid portion of blood) of mice genetically engineered to develop cancers that closely resemble human pancreatic tumors. Through the use of two techniques called high-resolution mass spectrometry and acrylamide isotopic labeling, the researchers identified 165 proteins that were present in larger amounts in plasma collected from mice with early and/or advanced pancreatic cancer than in plasma from control mice. Then, to test whether any of these protein changes were relevant to human pancreatic cancer, the researchers analyzed blood samples collected from patients with pancreatic cancer. These samples, they report, contained larger amounts of some of these proteins than blood collected from patients with chronic pancreatitis, a condition that has similar symptoms to pancreatic cancer. Finally, using blood samples collected during a clinical trial, the Carotene and Retinol Efficacy Trial (a cancer-prevention study), the researchers showed that the measurement of five of the proteins present in increased amounts at an early stage of tumor development in the mouse model discriminated between people with pancreatic cancer and matched controls up to 13 months before cancer diagnosis.
What Do These Findings Mean? These findings suggest that in-depth proteomic analysis of genetically engineered mouse models of human cancer might be an effective way to identify biomarkers suitable for the early detection of human cancers. Previous attempts to identify such biomarkers using human samples have been hampered by the many noncancer-related differences in plasma proteins that exist between individuals and by problems in obtaining samples from patients with early cancer. The use of a mouse model of human cancer, these findings indicate, can circumvent both of these problems. More specifically, these findings identify a panel of proteins that might allow earlier detection of pancreatic cancer and that might, therefore, extend the life of some patients who develop this cancer. However, before a routine screening test becomes available, additional markers will need to be identified and extensive validation studies in larger groups of patients will have to be completed.
Additional Information. Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.0050123.
The MedlinePlus Encyclopedia has a page on pancreatic cancer (in English and Spanish). Links to further information are provided by MedlinePlus. The US National Cancer Institute has information about pancreatic cancer for patients and health professionals (in English and Spanish). The UK charity Cancerbackup also provides information for patients about pancreatic cancer. The Clinical Proteomic Technologies for Cancer Initiative (a US National Cancer Institute initiative) provides a tutorial about proteomics and cancer and information on the Mouse Proteomic Technologies Initiative.