Comparison of Proteomic and Transcriptomic Profiles in the Bronchial Airway Epithelium of Current and Never Smokers

Background
Although prior studies have demonstrated a smoking-induced field of molecular injury throughout the lung and airway, the impact of smoking on the airway epithelial proteome and its relationship to smoking-related changes in the airway transcriptome are unclear.
Methodology/Principal Findings
Airway epithelial cells were obtained from never (n = 5) and current (n = 5) smokers by brushing the mainstem bronchus. Proteins were separated by one dimensional polyacrylamide gel electrophoresis (1D-PAGE). After in-gel digestion, tryptic peptides were processed via liquid chromatography/tandem mass spectrometry (LC-MS/MS) and proteins identified. RNA from the same samples was hybridized to HG-U133A microarrays. Protein detection was compared to RNA expression in the current study and a previously published airway dataset. The functional properties of many of the 197 proteins detected in a majority of never smokers were similar to those observed in the never smoker airway transcriptome. LC-MS/MS identified 23 proteins that differed between never and current smokers. Western blotting confirmed the smoking-related changes of PLUNC, P4HB1, and uteroglobin protein levels. Many of the proteins differentially detected between never and current smokers were also altered at the level of gene expression in this cohort and the prior airway transcriptome study. There was a strong association between protein detection and expression of its corresponding transcript within the same sample, with 86% of the proteins detected by LC-MS/MS having a detectable corresponding probeset by microarray in the same sample. Forty-one proteins identified by LC-MS/MS lacked detectable expression of a corresponding transcript and were detected in ≤5% of airway samples from a previously published dataset.
Conclusions/Significance
1D-PAGE coupled with LC-MS/MS effectively profiled the airway epithelium proteome and identified proteins expressed at different levels as a result of cigarette smoke exposure. While there was a strong correlation between protein and transcript detection within the same sample, we also identified proteins whose corresponding transcripts were not detected by microarray. This noninvasive approach to proteomic profiling of airway epithelium may provide additional insights into the field of injury induced by tobacco exposure.


Introduction
Cigarette smoking, the leading cause of preventable death in the United States, is responsible for 440,000 deaths per year [1,2]. Smoking is the single most important risk factor in the development of lung cancer, the leading cause of cancer-related death in the U.S., and of chronic obstructive pulmonary disease (COPD), the fourth leading cause of death overall [2]. Although smoking is strongly associated with diseases such as lung cancer and COPD, the mechanisms by which smoking contributes to their pathogenesis are not completely understood.
Cigarette smoke creates a field of molecular injury in the epithelial cells lining the entire respiratory tract. Changes include cellular atypia [3], allelic loss [4][5][6], and promoter hypermethylation [7]. Using oligonucleotide arrays and candidate gene approaches, our group and others have previously identified a number of mRNA expression changes that occur in the histologically normal airway epithelium in response to smoking [8][9][10][11][12] and in association with disease [13][14][15][16]. Furthermore, we have recently described smoking-induced changes in airway microRNA expression and their potential role in regulating the mRNA response to tobacco smoke [17]. In this study, we sought to extend this field of molecular injury to the protein level and characterize the effect of smoking on the airway epithelium proteome.
Prior studies have analyzed lung tissue from never, current and former smokers using two-dimensional electrophoresis (2DE) coupled with mass spectrometry, leading to the hypothesis that smoke exposure induces an unfolded-protein-like response [18]. Other studies identified lung-cancer-specific proteomic differences in bronchial epithelium obtained by biopsy from both "healthy" smokers and smokers with a history of lung cancer [19,20]. Though studies have been performed using pooled nasal lavage samples [21] and pooled exhaled breath condensate samples [22], little is known about either the effects of smoking on the proteome of airway epithelial cells or the variability in this response between individuals. In the current study we examined the effects of smoking on the airway epithelial proteome by analyzing individual samples collected by bronchoscopy from the mainstem bronchus. The ability to collect data from individual samples lays the groundwork for understanding variation in the proteomic response to cigarette smoke between individuals, which may ultimately be useful for determining why only a subset of smokers develop lung cancer or COPD.
In this study, we profiled proteins and genes expressed within the same bronchial epithelium of never and current smokers via 1D-PAGE with LC-MS/MS and DNA microarrays respectively. The relationship between protein detection and mRNA expression was explored both globally and for individual proteins of interest. We found that the majority of airway proteins detected by mass spectrometry have their corresponding transcripts detected at measurable levels by microarray, and that changes at the protein level in response to cigarette smoke parallel smoking-induced changes in mRNA. This approach also detected proteins whose corresponding transcript expression was not detected by microarrays. This study represents the first application of this approach to the simultaneous proteomic and transcriptomic profiling of airway epithelium within the same individual, providing insight into the normal and smoking-affected airway proteome and the relationship between protein changes and the previously described changes in airway gene expression.

Study Population
The demographics for subjects recruited into this study are shown in Table 1. The never and current smokers differed in age and cumulative tobacco exposure (as measured by pack-years of smoking) (p < 0.05), but were similar for other demographics. None of the subjects were using inhaled medications.

Normal Airway Proteome
A total of 652 proteins were detected in one or more never smokers, with 197 proteins found in the majority of never smokers (Figure 1). Proteins with molecular functions related to airway biology were over-represented among this list (Table 2). The functional categorization of the normal airway proteome was compared to over-represented functional categories of the normal airway transcriptome among transcripts detected by microarray both in these same five never smoker samples and in a larger previously described cohort of 22 never smokers [8]. mRNAs and proteins associated with nucleotide binding and pyrophosphate activity were over-represented in both datasets (P_DAVID-BH < 0.05).

Effect of Cigarette Smoking on the Large Airway Proteome
A total of 613 proteins were detected in one or more current smokers, and 169 proteins were detected in the majority of current smokers (Figure 1). Three proteins differed in their rate of detection between current and never smokers at P_Fisher ≤ 0.05. Aldehyde dehydrogenase 3B1 (ALDH3B1, NP_000685), a gene highly expressed in lung [47], was detected in all five never smokers and only one current smoker (P_Fisher = 0.048). Palate, lung and nasal epithelium carcinoma associated protein precursor (PLUNC, NP_570913), a secretory protein in the upper respiratory tract, was detected in four never smokers and absent in all current smokers (P_Fisher = 0.048). Hypothetical protein DKFZP586A0522 (NP_054752) was also detected in four never smokers and absent in all current smokers (P_Fisher = 0.048).
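The detection-rate comparison above can be reproduced with a two-sided Fisher exact test on a 2x2 table of detected versus not-detected samples in each group. The following is a minimal sketch using only the Python standard library; the function name is illustrative and is not taken from the original analysis, which does not state the software used for this test.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of every table with the same margins that is
    no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x detections falling in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Rows: never smokers, current smokers. Columns: detected, not detected.
# ALDH3B1: detected in 5 of 5 never smokers and 1 of 5 current smokers.
p_aldh3b1 = fisher_exact_two_sided(5, 0, 1, 4)
# PLUNC: detected in 4 of 5 never smokers and 0 of 5 current smokers.
p_plunc = fisher_exact_two_sided(4, 1, 0, 5)
print(round(p_aldh3b1, 3), round(p_plunc, 3))  # prints 0.048 0.048
```

With five samples per group, 0.048 is close to the smallest two-sided p-value this test can produce for these margins, which is consistent with only three proteins reaching the threshold.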
Oxidoreductase activity
Oxidoreductase activity, acting on the aldehyde or oxo group of donors: 7.0 × 10^-5, 3.2 × 10^-2
Oxidoreductase activity, acting on the aldehyde or oxo group of donors, NAD or NADP as acceptor: 3.3 × 10^-5, 1.8 × 10^-2
Statistically enriched functional categories (FDR < 0.05) and subcategories of the 197 proteins detected in the majority of never smokers as determined by DAVID. Overrepresented categories that contain more than two probe sets are included. Functional categories that are also over-represented (FDR < 0.05) among transcripts detected in all never smokers in this cohort are bolded. Functional categories that are also enriched (FDR < 0.05) among transcripts detected in all never smokers from a previously published cohort [8]

Due to the small sample size, a second list of differentially detected proteins was defined using a qualitative criterion: proteins were included if present in three or more samples of one class compared to the other. Twenty-three proteins differed between never and current smokers based on these criteria (Table 3).
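The qualitative criterion can be expressed as a simple filter on per-group detection counts. The sketch below assumes the criterion means a difference of at least three samples between the two classes; that reading is an interpretation of the text, not a confirmed detail of the original analysis. The three real entries use the detection counts reported above, and `EXAMPLE_UNCHANGED` is a hypothetical protein added for contrast.

```python
def qualitative_candidates(detections, min_difference=3):
    """Select proteins detected in at least `min_difference` more samples
    of one class than of the other.

    detections maps protein name -> (n_never_detected, n_current_detected).
    Returns a dict mapping each selected protein to the class in which it
    was detected more often.
    """
    selected = {}
    for protein, (never, current) in detections.items():
        if abs(never - current) >= min_difference:
            selected[protein] = "never" if never > current else "current"
    return selected


detections = {
    "ALDH3B1": (5, 1),            # counts reported in the text
    "PLUNC": (4, 0),              # counts reported in the text
    "DKFZP586A0522": (4, 0),      # counts reported in the text
    "EXAMPLE_UNCHANGED": (3, 3),  # hypothetical, for illustration only
}
candidates = qualitative_candidates(detections)
print(candidates)
```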

Western Blotting
We validated mass spectrometry findings by immunoblot for three of the proteins that differed between never and current smokers (Figure 2). PLUNC, uteroglobin and P4HB were selected from the list of twenty-three candidates based on their biologic interest, molecular weight, and antibody availability. Of these, PLUNC also had a Fisher exact p-value < 0.05. Decreased levels of PLUNC and uteroglobin were confirmed among current smokers, although there was heterogeneity for uteroglobin among current smokers (Figure 2). P4HB levels were elevated in two of the current smokers as compared to two never smokers.

Comparison of Protein and mRNA Expression
An average of 93% of proteins detected by mass spectrometry had at least one matching probe set on the HG-U133A array. Of these, an average of 86% had detectable gene expression (P_detection < 0.05) in samples collected from the same participants, demonstrating a significant level of co-detection (χ² = 347, p = 2.2 × 10^-16). There was not a significant difference in the rate of co-detection between never and current smokers.
For select proteins where detection varied between never and current smokers, we examined the expression of the corresponding mRNA for smoking-related differential expression. PLUNC (NP_570913), ALDH3B1 (NP_000685), and hypothetical protein DKFZP586A0522 (NP_054752) were selected based on the results of the Fisher exact test. Uteroglobin (NP_003348) and the prolyl 4-hydroxylase beta subunit (P4HB) (NP_000909) were selected based on their qualitative differences between never and current smokers. Within this cohort, mRNA expression positively correlated with protein detection for PLUNC, uteroglobin, and P4HB (Figure 3).
The association between smoking and gene expression was also examined in a previously published cohort [8] from which we excluded a sample that overlapped with the samples used in this study (Figure 3). Consistent with the protein detection data and the gene expression data from the present study, in this independent group of never and current smokers, ALDH3B1, hypothetical protein DKFZP586A0522, PLUNC and uteroglobin mRNA expression were higher in never smokers and P4HB gene expression was higher in current smokers. Additionally, we used this cohort to assess the potential confounding effects of age on the smoking-induced changes in candidate proteins identified in the current study. Within the previously published cohort, we identified 12 never and 12 current smokers matched within 1 year for age. A t-test performed on these age-matched 12 never smokers and 12 current smokers confirmed differential gene expression of ALDH3B1 (211004_s_at, p = 0.03), hypothetical protein DKFZP586A0522 (207761_s_at, p = 0.03), PLUNC (220542_s_at, p = 0.02), uteroglobin (205725_at, p = 0.0005), and P4HB1 (200654_at, p = 0.03).
Differences in protein detection by mass spectrometry and transcript detection by microarray were also explored. In the matched samples, there was no expression by microarray of transcripts corresponding to 41 proteins that were detected in ≥50% of samples by mass spectrometry (Table 4). Additionally, expression of these transcripts was detected in ≤5% of the never and current smokers in the larger previously published dataset [8] of never and current smokers. Ten of these 41 proteins have been previously described in the erythrocyte proteome [48], which is not surprising given that brushings contain small numbers of red blood cells that lack nucleic acids.

Discussion
We applied 1D-PAGE coupled with LC-MS/MS to the study of the airway epithelium proteome and its response to cigarette smoke exposure. This study presents the first proteomic profile of a relatively pure population of bronchial epithelial cells obtained from bronchoscopy brushings. We also used differences in the rate of protein detection between never and current smokers to identify candidates for proteins that vary in abundance in response to tobacco-smoke exposure. The effect of smoking on several of these proteins was confirmed by Western blot. We also found that for many candidates, smoking similarly affected expression of the mRNA transcripts that gave rise to these proteins. This was accomplished by measuring gene expression in the same samples that were profiled at the proteomic level and in an independent data set. The majority of proteins identified by LC-MS/MS had detectable levels of their corresponding transcript by microarray. Differing methodologies may account for the stronger relationship between protein and gene expression reported here relative to prior studies [36,39,43,45,46].
Analysis of the proteome using 1D-PAGE coupled with LC-MS/MS resulted in the detection of 41 proteins for which expression of corresponding transcripts was not detected by microarray. Some of these failures to detect transcript expression could represent technical limitations of the microarray platform.
However, we were intrigued that several of the proteins whose transcripts were not detected by microarray represent erythrocyte-specific proteins. This suggests that 1) the airway epithelial samples collected for this study were likely contaminated with erythrocytes, and 2) more generally, stable proteins may be detected by proteomic methods long after the mRNA that encodes them has disappeared.
Using habitual smoking as a paradigm for inhalational exposures affecting airway epithelium, we have identified protein changes among smokers by LC-MS/MS and validated select changes with Western blotting. A decrease in the short isoform of PLUNC has previously been described in the pooled nasal lavage fluid of current smokers when compared with nonsmokers [21]. Although the exact function of this protein is unclear, it is thought to act in the inflammatory response to inhaled irritants such as tobacco smoke. Other studies have demonstrated decreased levels of uteroglobin, an anti-inflammatory protein secreted by Clara cells, in the BAL [49], pooled nasal lavage fluid [21], and serum [50] of healthy smokers and in the bronchial epithelium of former smokers with COPD undergoing lung transplantation [51]. P4HB has been detected in a proteomic analysis of cell surface proteins of a lung adenocarcinoma cell line [52] and in the 2DE-proteomic analysis of resected lung adenocarcinomas [46]. This protein may function in the anti-oxidant response to cigarette smoke [46]. Other proteins with oxidoreductase activity identified by this approach, such as ALDH3B1, have not previously been linked to cigarette smoking at the protein level but may function in the airway epithelial response to the toxins in cigarette smoke. None of the proteins differentially detected in smokers in this study overlapped with proteins previously described as differentially expressed in the lungs of Wistar rats exposed to cigarette smoke [53], or proteins differentially detected by 2DE/MALDI-TOF in a human pneumocyte cell line exposed to cigarette smoke extract [54].
This study was limited by a relatively small sample size, the sensitivity of the proteomic technique, and challenges in the quantification of proteins. While age was a confounding variable in this study, the gene expression changes in the airway epithelium of never and current smokers were validated using age-matched samples from current and never smokers in a previously published gene-expression study [8], suggesting that the association between smoking status and both gene and protein expression is unlikely to be due to differences in patient age. The amount of time elapsed between last smoking a cigarette and bronchoscopy was not recorded, and some of the variability of protein levels in Western blotting might relate to differences between the acute and chronic effects of cigarette smoke. Although the small sample size limited the statistical analysis, Western blotting validated differences in protein detection identified by LC-MS/MS, suggesting the method's potential specificity. However, the power of our study to detect additional proteomic changes that occurred in response to cigarette smoke exposure was limited. The sensitivity of this technology allowed detection of 859 proteins with a false positive rate of 1%. While this represents a small percentage of the total proteins present in epithelial cells, we have identified a greater number of proteins than previously used methods of sample collection and proteomic analysis for smokers and nonsmokers [20][21][22]. Because of the uncertainties associated with label-free quantification methods for the determination of protein expression levels, this platform serves mainly as a discovery tool. However, promising efforts in this area, including correlation of peak intensity or spectral counts with protein abundance, may soon remove this limitation [55][56][57][58].
In summary, we have described the proteomic profile of normal bronchial epithelial cells using 1D-PAGE coupled with LC-MS/MS and linked this profile to smoking-induced transcriptional changes in these same cells. This approach has the potential to provide additional insight into host response to tobacco smoke and the pathogenesis of smoking-related lung disease.

Materials and Methods
Study population, sample collection, and ethics statement
Never (n = 5) and current smokers (n = 5) were recruited for fiberoptic bronchoscopy at Boston Medical Center. Detailed medical and smoking histories were obtained including number of cigarettes smoked per day, cumulative tobacco exposure measured in pack-years, and an estimation of second-hand smoke exposure. Screening prior to bronchoscopy included an electrocardiogram, chest radiograph and spirometry. Participants with a history of underlying lung disease, significant second-hand smoke exposure, an abnormal baseline EKG, or evidence of obstructive lung disease on spirometry (defined as an FEV1/FVC < 0.7) were excluded from the study. This study was approved by the Institutional Review Board at Boston Medical Center, and all subjects provided written informed consent.
Figure 3 (partial caption). [...] and E) P4HB subunit. The borders of each boxplot represent the interquartile range of z-score normalized natural logarithm of the MAS5 gene expression data from this cohort of 5 never smokers and 5 current smokers, and from a previously published cohort (AGED) of 23 never smokers and 34 current smokers, excluding one never smoker in common to this study. The solid line within each box represents the median gene expression, and the whiskers of the plot extend to the upper and lower extremes of the data for each gene. Bar plots represent the number of smoker and nonsmoker samples in the current study where the protein was detected. Proteomic analysis detected ALDH3B1, hypothetical protein DKFZP586A0522, PLUNC and uteroglobin in more never smokers, while P4HB was detected in more current smokers. There is concordance in the direction of change for smoking-related protein and gene expression changes for these 5 genes. * p < 0.05. ** p < 0.005. *** p < 0.0005. doi:10.1371/journal.pone.0005043.g003
Table 4. Proteins detected in the airway by mass spectrometry that lack detectable transcript by microarray.

Proteomic Sample Processing and Mass Spectrometry
After cell lysis with 2% SDS, proteins were separated on a 4-20% polyacrylamide minigel by electrophoresis and stained with Coomassie Blue (Supporting Figure S1). Each gel lane was cut into 35-70 sections. Proteins were reduced with DTT, alkylated with iodoacetamide, and digested with trypsin using a DigestPro 96 robot (Intavis Bioanalytical Instruments, Cologne, Germany). Extracted peptides were dried and resuspended in 0.5% acetic acid in preparation for mass spectrometry.
All samples were analyzed by LC-MS/MS using an LTQ ProteomeX ion trap mass spectrometer (ThermoFinnigan, Waltham, MA). Peptides from each gel slice were serially injected onto a home-packed C18 reverse-phase column (Magic C18AQ, 15 cm × 100 micron ID, Michrom Bioresources, Inc., Auburn, CA) interfaced directly to the mass spectrometer. Peptides were separated using short, biphasic, 20-minute gradients of 0-90% acetonitrile in the presence of 0.5% acetic acid. From each parent ion scan (MS scan), the ten most intense ions were selected for collision-induced dissociation, and the spectra of the peptide fragments were recorded (MS2 scan).

Protein Identification and Analysis
The data were analyzed using SEQUEST software [59]. Spectra were queried against the curated entries of the NCBI RefSeq database and Xcorr values adjusted for an empiric false positive identification rate of 1% for forward-sequence proteins as determined by the inclusion of reversed protein sequences [60]. Positive identification of a protein required observation of at least two matching peptides from the same or adjacent gel slices.
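The two filters described above, a score cutoff calibrated against reversed-sequence (decoy) matches and a two-peptide requirement, can be sketched as follows. This is an illustration under stated assumptions, not the procedure of reference [60]: the false positive rate is estimated here simply as decoy hits divided by forward hits at or above the cutoff, a single cutoff is applied to all spectra, and the function names and toy scores are hypothetical.

```python
def xcorr_threshold(matches, max_rate=0.01):
    """Return the lowest Xcorr cutoff at which the estimated false positive
    rate (decoy hits / forward hits scoring at or above the cutoff) is at
    most max_rate, or None if no cutoff qualifies.

    matches: iterable of (xcorr, is_decoy) pairs.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    forward = decoys = 0
    best = None
    for i, (xcorr, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            forward += 1
        # Evaluate only once all matches sharing this score are counted.
        if i + 1 < len(ranked) and ranked[i + 1][0] == xcorr:
            continue
        if forward and decoys / forward <= max_rate:
            best = xcorr
    return best


def passes_two_peptide_rule(peptide_hits):
    """True if at least two distinct peptides of one protein were observed
    in the same or adjacent gel slices.

    peptide_hits: list of (peptide_sequence, gel_slice_index) pairs.
    """
    for i, (pep_a, slice_a) in enumerate(peptide_hits):
        for pep_b, slice_b in peptide_hits[i + 1:]:
            if pep_a != pep_b and abs(slice_a - slice_b) <= 1:
                return True
    return False


# Hypothetical toy scores: 300 forward matches and 4 decoy matches.
toy_matches = [(2.0 + 0.01 * i, False) for i in range(300)]
toy_matches += [(2.505, True), (2.205, True), (2.105, True), (2.055, True)]
cutoff = xcorr_threshold(toy_matches)
print(cutoff)
```

In the toy data the cutoff settles just above the third decoy, because admitting that decoy would push the estimated rate over 1%.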

Western Blotting
Residual protein lysates from two never and five current smoker samples were quantified by 1D-PAGE and Coomassie blue staining (Supporting Figure S2). Of these samples, sufficient material was available for Western blotting of two never smoker samples and four current smoker samples. One current smoker sample was excluded due to lack of signal from the loading control, lamin A/C. Samples were incubated at 86 °C in SDS-sample buffer and electrophoresed on a 4-20% SDS-PAGE gel. Proteins were transferred to nitrocellulose and stained with Ponceau Red. The membrane was blocked with 5% nonfat milk in TBS-Tween and incubated with the appropriate primary and secondary antibodies. Mouse anti-human prolyl 4-hydroxylase beta subunit was obtained from Chemicon (Temecula, CA). Mouse anti-human PLUNC and goat anti-mouse-HRP affinity purified antibodies were purchased from R&D Systems (Minneapolis, MN). Rabbit anti-uteroglobin was obtained from Abcam (Cambridge, MA). Lamin A/C, a nuclear matrix protein, was used as a loading control.

Microarray Sample Processing
Six to eight micrograms of RNA obtained from five of the never smoker and four of the current smoker participants was processed and hybridized to an Affymetrix HG-U133A GeneChip (Affymetrix Inc., Santa Clara, CA) containing ~22,215 probesets as previously described [8].

Microarray Data Acquisition and Preprocessing
Expression Console Version 1.0 (Affymetrix Inc.) was used to generate a MAS5 weighted-mean expression level for each transcript and a detection p-value (P_detection), which indicates the reliability of detection of that transcript above background on the array. The mean intensity for each array was scaled to 100. Each array included in the final analysis had at least 30% of the probesets detected above background (percent present > 30%) and a 3′ to 5′ ratio of signal intensity for GAPDH of less than or equal to 5. One never smoker microarray was excluded based on these quality control filters (low percent present, high 3′ to 5′ GAPDH ratio), leaving four never and four current smoker arrays for analysis.
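The scaling and quality-control filters described above can be sketched as two small functions. This is an illustration only: the function names are hypothetical, a plain arithmetic mean stands in for whatever averaging Expression Console applies internally, and a probeset is counted as present when its detection p-value is below 0.05, the threshold used elsewhere in this study.

```python
def scale_to_target_mean(intensities, target=100.0):
    """Rescale one array's intensities so that their mean equals target.

    Assumption: a plain arithmetic mean is used for illustration.
    """
    mean = sum(intensities) / len(intensities)
    return [value * target / mean for value in intensities]


def passes_qc(detection_pvalues, gapdh_3p_5p_ratio,
              min_percent_present=30.0, max_gapdh_ratio=5.0, alpha=0.05):
    """Apply the two array-level filters described in the text.

    detection_pvalues: one detection p-value per probeset on the array.
    gapdh_3p_5p_ratio: 3' to 5' signal ratio for the GAPDH probesets.
    """
    present = sum(1 for p in detection_pvalues if p < alpha)
    percent_present = 100.0 * present / len(detection_pvalues)
    return (percent_present >= min_percent_present
            and gapdh_3p_5p_ratio <= max_gapdh_ratio)


# Hypothetical arrays with ten probesets each.
scaled = scale_to_target_mean([10.0, 20.0, 30.0])
good_array = passes_qc([0.01] * 4 + [0.5] * 6, gapdh_3p_5p_ratio=2.0)
low_present = passes_qc([0.01] * 2 + [0.5] * 8, gapdh_3p_5p_ratio=2.0)
degraded_rna = passes_qc([0.01] * 4 + [0.5] * 6, gapdh_3p_5p_ratio=6.0)
print(scaled, good_array, low_present, degraded_rna)
```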
Sample contamination with significant numbers of nonepithelial cells was evaluated, as described previously [8], by analyzing arrays for the presence of transcripts known to be present in airway epithelium and by confirming the absence of transcripts specific to non-epithelial cell types. No arrays were excluded based on these criteria.

Comparison of Protein Detection and mRNA Expression
For each protein, we queried the microarray data from the same patient for expression (P_detection < 0.05) of a matching transcript. The significance of the overlap between detected proteins and transcripts was determined using Pearson's Chi-squared test with Yates' continuity correction.
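For a 2x2 table, Pearson's Chi-squared statistic with Yates' continuity correction has a closed form, and the p-value for one degree of freedom can be obtained from the complementary error function. The sketch below shows the calculation; the counts in the example are hypothetical, because the underlying contingency table is not reported here.

```python
from math import erfc, sqrt


def chi2_yates(a, b, c, d):
    """Pearson's Chi-squared test with Yates' continuity correction for the
    2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) for one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Yates' correction shrinks |ad - bc| by n/2, but never below zero.
    diff = max(0.0, abs(a * d - b * c) - n / 2)
    statistic = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-squared distribution with 1 df.
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts. Rows: protein detected / not detected.
# Columns: transcript detected / not detected.
statistic, p_value = chi2_yates(86, 14, 40, 60)
print(statistic, p_value)
```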
A comparison of protein detection and transcript expression level was also performed for individual proteins of interest using the microarray data generated in this study and a previously published cohort of 23 never smokers and 34 current smokers [8], excluding one never smoker in common to this cohort. The transcript expression data for these samples was obtained from http://pulm.bumc.bu.edu/aged and log normalized. The association between smoking status and gene expression was determined as previously described [8].
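The Figure 3 legend describes the plotted values as z-score normalized natural logarithms of the MAS5 expression data. A minimal sketch of that transformation for a single probeset follows; the use of the sample standard deviation and the example values are assumptions made for illustration.

```python
from math import log
from statistics import mean, stdev


def zscore_log_expression(values):
    """Natural-log transform expression values for one probeset, then
    z-score them across samples.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    logged = [log(v) for v in values]
    mu = mean(logged)
    sigma = stdev(logged)
    return [(x - mu) / sigma for x in logged]


# Hypothetical MAS5 signals for one probeset in three samples.
z = zscore_log_expression([100.0, 200.0, 400.0])
print(z)
```

Because the three example values are evenly spaced on the log scale, their z-scores are symmetric around zero.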

Functional Enrichment Analysis
Functional enrichment analysis was performed using DAVID (http://david.abcc.ncifcrf.gov/) [61]. A modified Fisher exact test (P_DAVID) was calculated for all analyses, and the Benjamini-Hochberg method was used to correct for false discovery (P_DAVID-BH).
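The Benjamini-Hochberg correction can be illustrated with the standard step-up procedure. The sketch below is a generic implementation, not DAVID's own code, and the input p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Each p-value is multiplied by m / rank (rank 1 = smallest), and a
    running minimum from the largest rank downward keeps the adjusted
    values monotone.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical enrichment p-values for four functional categories.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(adjusted)
```

Categories would then be reported as enriched when the adjusted value falls below 0.05.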
To determine the molecular functions that were over-represented within the never smoker proteome, the Gene Ontology (GO) molecular functions of the U133A probes corresponding to the proteins detected in the majority of never smokers were compared to the GO molecular functions of all probe sets on the U133A array. A similar analysis was also performed for the never smoker transcriptome. Genes expressed at P_detection < 0.05 in all never smokers with good quality microarrays were compared to a background of all genes represented by probe sets on the U133A microarray. A parallel analysis was performed in DAVID using the genes expressed at P_detection < 0.05 in the 22 unique never smokers from a previously published data set [8]. Over-represented gene ontology categories for proteins changed by smoking and for proteins that were not detectably expressed by microarray were determined by comparing the corresponding RefSeq identification numbers for these proteins against the complete set of 859 proteins detected by mass spectrometry in this set of experiments.

Supplemental Information
Additional information, including clinical data for all of the study participants, the complete list of proteins detected in each sample, percent peptide coverage for each protein and the expression levels for all genes in all samples, is stored in a relational MySQL database that is available at http://pulm.bumc.bu.edu/parce/parce.html. Microarray data from this study has been deposited in the National Center for Biotechnology Information Gene Expression Omnibus (GSE4635). Proteomic data has been deposited at Proteome Commons (http://www.proteomecommons.org/).
Figure S1 1D-PAGE of a current smoker sample prior to mass spectrometry. Proteins from each sample were separated by 1D-PAGE prior to mass spectrometry. A representative sample is shown. MW indicates the molecular weight marker. BSA indicates a bovine serum albumin standard. CS indicates current smoker. Found at: doi:10.1371/journal.pone.0005043.s001 (2.28 MB TIF)
Figure S2 1D-PAGE for approximation of protein yield prior to Western Blot. A small amount of material from each sample was retained for Western blotting. To roughly normalize the protein contribution from each sample, a small amount of material from the remaining samples was analyzed on 1D-PAGE and stained with Coomassie blue. MW indicates a molecular weight standard. NS indicates never smokers, and CS indicates current smokers. Found at: doi:10.1371/journal.pone.0005043.s002 (2.04 MB TIF)