NOTCH1, HIF1A and Other Cancer-Related Proteins in Lung Tissue from Uranium Miners—Variation by Occupational Exposure and Subtype of Lung Cancer

Background
Radon and arsenic are established pulmonary carcinogens. We investigated the association of cumulative exposure to these carcinogens with NOTCH1, HIF1A and other cancer-specific proteins in lung tissue from uranium miners.

Methodology/Principal Findings
Paraffin-embedded tissue of 147 miners was randomly selected from an autopsy repository by type of lung tissue, comprising adenocarcinoma (AdCa), squamous cell carcinoma (SqCC), small cell lung cancer (SCLC), and cancer-free tissue. Within each stratum, we additionally stratified by low or high level of exposure to radon or arsenic. Lifetime exposure to radon and arsenic was estimated using a quantitative job-exposure matrix developed for uranium mining. For 22 cancer-related proteins, immunohistochemical scores were calculated from the intensity and percentage of stained cells. We explored the associations of these scores with cumulative exposure to radon and arsenic with Spearman rank correlation coefficients (rs). Occupational exposure was associated with an up-regulation of NOTCH1 (radon: rs = 0.18, 95% CI 0.02–0.33; arsenic: rs = 0.23, 95% CI 0.07–0.38). Moreover, we investigated whether these cancer-related proteins can classify lung cancer using supervised and unsupervised classification. MUC1 classified lung cancer from cancer-free tissue with a failure rate of 2.1%. A two-protein signature discriminated SCLC (HIF1A low), AdCa (NKX2-1 high), and SqCC (NKX2-1 low) with a failure rate of 8.4%.

Conclusions/Significance
These results suggest that the radiation-sensitive protein NOTCH1 can be up-regulated in lung tissue from uranium miners by level of exposure to pulmonary carcinogens. We evaluated a three-protein signature consisting of a physiological protein (MUC1), a cancer-specific protein (HIF1A), and a lineage-specific protein (NKX2-1) that could discriminate lung cancer and its major subtypes with a low failure rate.


Introduction
In East Germany, extensive uranium mining was undertaken for the Soviet nuclear industry from 1946 until 1990 [1]. Poor working conditions in the so-called WISMUT mining company led to very high levels of exposure to ionizing radiation [2]. Exposure to arsenic occurred in some mines depending on the metal content of the ore.
A comprehensive job-exposure matrix (JEM) was developed for the quantitative assessment of exposure to radon, arsenic, and quartz dust based on extensive measurements [3]. The largest single cohort of uranium miners was established showing a dose-dependent excess risk of lung cancer by radon exposure [4,5].
Biological research on radiation-induced carcinogenesis has been focussed on the damage of the genome. So far, available results do not consistently suggest a radon-specific mutation of TP53 [6]. However, little is known about other genes and whether exposure to radiation can be related to cancer-specific proteins: Thyroid cancers from the Chernobyl tissue repository have been examined in order to detect radiation-specific protein signatures [7,8], and radiation has been associated with NOTCH1 mutations in the development of lymphomas [9].
It could be hypothesized that radiation acts on genes that are prone to instability and activated in cancer-associated pathways like NOTCH1. We took advantage of a unique tissue repository of WISMUT miners that had been opened for research after German reunification [10] to explore protein patterns in lung tissue. The statistical analysis of these data revealed a shift towards small cell lung cancer (SCLC) and squamous cell carcinoma (SqCC) at the expense of adenocarcinoma (AdCa) with increasing exposure to radon or arsenic [11,12]. An even stronger shift by level of exposure was observed for smoking in a large pooled analysis of lung cancer studies [13]. Here, we continue our research on the observed exposure-related shifts in the distribution of subtypes of lung cancer by exploring protein patterns. SCLC and SqCC appear to exhibit a higher 'stemness' than AdCa following greater damage to the lung architecture. We therefore hypothesize that candidate proteins associated with stemness are more frequently expressed in SCLC and SqCC than in AdCa or cancer-free lung tissue. In particular, we investigate 1) whether we can detect an association of occupational exposure to radon or arsenic with the expression of candidate proteins, 2) whether we can discriminate the subtypes of lung cancer with these proteins, and 3) whether influences of exposure add to the discrimination of the major subtypes of lung cancer.

Study Design
A quantitative job-exposure matrix was applied to the occupational data of uranium miners with lung tissue in the WISMUT autopsy repository. Cumulative radon exposure was given in working level months (WLM) as previously described [12]. We classified radon exposure as high at >1000 WLM and as low at <500 WLM. Cumulative airborne arsenic exposure was assessed as described earlier [11] and classified as high at >100 µg/m³ years and as low at <50 µg/m³ years. The cut-offs were based on the distribution of the exposure variables in miners with archived lung tissue samples and on the availability of tissue samples for rare combinations, like low radon and high arsenic exposure.
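The exposure contrast described above can be written as a small classification rule. The following is only a sketch: the function and constant names are hypothetical, the cut-offs are read as strict inequalities, and miners with intermediate exposure are assumed to fall outside both strata.

```python
from typing import Optional, Tuple

# Cut-offs as (low below, high above); names are hypothetical.
RADON_CUTOFFS_WLM = (500.0, 1000.0)   # working level months
ARSENIC_CUTOFFS = (50.0, 100.0)       # unit of the job-exposure matrix

def exposure_level(cumulative: float, cutoffs: Tuple[float, float]) -> Optional[str]:
    """Return 'low', 'high', or None for intermediate exposure."""
    low_below, high_above = cutoffs
    if cumulative < low_below:
        return "low"
    if cumulative > high_above:
        return "high"
    return None  # assumption: intermediate values are not eligible

def exposure_stratum(radon_wlm: float, arsenic: float) -> Optional[Tuple[str, str]]:
    """Combine both carcinogens into one (radon, arsenic) stratum label."""
    radon = exposure_level(radon_wlm, RADON_CUTOFFS_WLM)
    ars = exposure_level(arsenic, ARSENIC_CUTOFFS)
    if radon is None or ars is None:
        return None
    return (radon, ars)
```

For example, a miner with 1200 WLM and an arsenic exposure of 30 would fall into the high-radon, low-arsenic stratum, one of the rare combinations mentioned above.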
We searched the WISMUT autopsy repository for available lung tissue samples of uranium miners with a marked contrast in exposure to radon and arsenic. Ten male miners each were randomly selected from the database of the repository within 16 strata of an orthogonal study design. Stratification was performed by combinations of low or high exposure to radon and arsenic within four groups of 40 miners each, either cancer-free or with AdCa, SqCC, or SCLC. Information on silicosis, lung cancer, and occupational exposure was extracted from the database of the archive [14]. Smoking status could be classified as ever or never for 119 miners from employment records and medical documents of the WISMUT archives. This research with historical tissue samples from deceased uranium miners, including ethical issues, was authorized by the German government in a direct treaty with our institutes signed July 29, 2003. The study was based on anonymous tissue samples and conducted according to the principles expressed in the Declaration of Helsinki.
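The orthogonal design can be enumerated explicitly: four tissue groups crossed with two radon levels and two arsenic levels give 16 strata, and ten miners per stratum give the 160 subjects drawn from the database. The sketch below illustrates this bookkeeping only. The candidate pools, the seed, and all names are hypothetical, and it is not the selection procedure actually used with the repository database.

```python
import itertools
import random

TISSUE_GROUPS = ["cancer-free", "AdCa", "SqCC", "SCLC"]
LEVELS = ["low", "high"]
PER_STRATUM = 10

def design_strata():
    """All (tissue, radon level, arsenic level) combinations of the design."""
    return list(itertools.product(TISSUE_GROUPS, LEVELS, LEVELS))

def draw_sample(candidates, seed=0):
    """Randomly draw up to PER_STRATUM miner IDs per stratum.

    `candidates` maps a stratum tuple to a list of eligible miner IDs
    (a hypothetical structure used for illustration).
    """
    rng = random.Random(seed)
    selected = {}
    for stratum in design_strata():
        pool = candidates.get(stratum, [])
        selected[stratum] = rng.sample(pool, min(PER_STRATUM, len(pool)))
    return selected
```

Taking the minimum of the pool size and ten reflects that tissue blocks can be scarce for rare exposure combinations, so a stratum may end up with fewer than ten samples.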
Three German pathologists re-classified lung cancer of archived tissue according to the WHO classification [15,16]. We retrieved samples where at least two of the pathologists were in agreement and excluded mixed forms for improvement of classification. Samples were available for 146 out of 160 subjects randomly ascertained from the database. An additional set of 15 samples was selected for validation. A blind reading of the newly generated slides (by I.S. and D.W.) confirmed the former histological classifications.

Immunohistochemical Analysis
Due to the prevalence of tuberculosis in miners, tissue specimens underwent long-term fixation in formalin. The degradation of RNA by formalin prevented a search for RNA expression signatures by screening techniques like DNA microarrays. However, initial experiments indicated that immunohistochemistry was still applicable for many samples [17]. For that reason, we selected 30 proteins from the literature as possibly associated with lung cancer (e.g., EGFR, NKX2-1), lung development and lineage determination (e.g., NOTCH1), lung physiology (e.g., SFTPC, MUC1), tissue remodeling following exposure to radon (e.g., MMP2), or the preference of arsenic for cytokeratins (KRT5, KRT14). Immunohistochemical assays could be established for 22 proteins (supplemental Table S1). For AKT1, ATM, CDKN2A, ERCC2, ILK, NFKB1, PTEN, and WIF, staining could not be established with the archived material. Tissue microarrays (TMAs) could not be employed due to the unusual mechanical properties of the WISMUT paraffin material, which was very brittle and not suitable for the punching tubes used with the TMA machine. Therefore, sections of 4 µm were cut from individual formalin-fixed paraffin-embedded samples and mounted on aminopropyltriethoxysilane slides. All slides were deparaffinized and rehydrated in graded alcohols (100%, 96%, and 70%). Antigen retrieval for immunostaining was performed by heating the samples in citrate buffer (pH 6.0). Endogenous peroxidase activity was inhibited, and non-specific binding was blocked. The sections were incubated with primary antibodies for 10 minutes to 12 hours (overnight) at room temperature. Incubation time and dilution were antibody-specific. Dilutions are listed in Table S1. For example, we diluted 1:50 for monoclonal antibodies of HIF1A (Thermo Scientific, Frankfurt, Germany) and MUC1 (Zytomed, Berlin, Germany), and 1:1000 for NOTCH1 (Zytomed). The sections were then incubated with the biotinylated secondary antibody for 10-60 minutes.
A final incubation with streptavidin-peroxidase was performed at room temperature for 5-10 minutes. Visualization of the antibodies was achieved with 3,3'-diaminobenzidine or 3-amino-9-ethylcarbazole for 5-10 minutes. Counterstaining was done with Mayer's hematoxylin (DAKO, Glostrup, Denmark). Negative controls were performed by leaving out the primary antibody. Staining intensity and proportion of stained epithelial cells in fibrosis-free regions were blindly and independently evaluated by two pathologists (I.S., D.W.). The percentage of cells in intensity groups (none, weak, moderate, or strong) was weighted with factors 0, 1, 2, or 3, respectively, and cumulated in a score for each slide, separately for the membrane, cytoplasm, or nucleus. Supplemental Figure S1 shows the staining of HIF1A and MUC1 by subtype of lung cancer and Figure S2 depicts the staining of NOTCH1 in cancer tissue (SqCC) from a miner with high exposure to radon and arsenic.
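The weighted staining score described above can be illustrated with a short sketch. It assumes that the four percentages of one slide and compartment sum to 100, which gives a score between 0 and 300. The function names are hypothetical, and the input check is an addition made for illustration.

```python
import math

# Weights for the intensity groups as given in the text.
INTENSITY_WEIGHTS = {"none": 0, "weak": 1, "moderate": 2, "strong": 3}

def staining_score(percent_by_intensity):
    """Sum of cell percentages weighted by staining intensity (0 to 300)."""
    unknown = set(percent_by_intensity) - set(INTENSITY_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown intensity groups: {sorted(unknown)}")
    total = sum(percent_by_intensity.values())
    if abs(total - 100.0) > 1e-6:
        raise ValueError("percentages are assumed to sum to 100")
    return sum(INTENSITY_WEIGHTS[group] * pct
               for group, pct in percent_by_intensity.items())

def log_score(score):
    """ln(score + 1), the transformation applied before parametric methods."""
    return math.log1p(score)
```

A slide with 40% unstained, 30% weakly, 20% moderately, and 10% strongly stained cells would receive a score of 30 + 40 + 30 = 100. One such score is computed per compartment (membrane, cytoplasm, nucleus).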

Statistical Analysis
The sample size was limited by the availability of tissue blocks with sufficient contrast in exposure to radon and arsenic. Spearman rank correlation coefficients (rs) were calculated with 95% confidence intervals (CI) to explore associations among the staining scores and between the scores and exposure or age. Classification methods were applied to the score set to evaluate subtype of lung cancer or level of exposure using R (www.R-project.org), SAS/STAT and SAS/IML software, version 9.2 (SAS Institute Inc., Cary, NC), and TreeView, version 1.60 [18]. Due to skewness, parametric methods were applied to the log-transformed data as ln(score+1). Their correlation structure was further explored with the SAS procedure FACTOR. Hierarchical clustering was performed using Pearson correlation and average linkage in TreeView and with the SAS procedure CLUSTER using Euclidean distance measures and average linkage. Protein profiles were explored with the Classification and Regression Tree Algorithm (CART) as implemented in the rpart library of R (according to Therneau and Atkinson) using a leave-one-out cross-validation.

Table 1 presents the characteristics of the 146 uranium miners ascertained for the 16 study groups. Age at death ranged from 48 to 87 years. The majority of miners were smokers (95% in lung cancer cases, 88% in cancer-free miners). Silicosis was prevalent in 37% of the miners with and in 64% of the miners without lung cancer and associated with a higher quartz-dust exposure (median 20.9 vs. 13.4 mg/m³ years, p<0.0001). Cumulative exposure to quartz dust correlated more strongly with exposure to radon (Spearman correlation coefficient rs = 0.78, 95% CI 0.71-0.84) than with exposure to arsenic (rs = 0.31, 95% CI 0.16-0.45).
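A Spearman coefficient with a 95% confidence interval, as used throughout the analysis, can be sketched as follows. The text does not state how the intervals were obtained. The sketch assumes the common Fisher z-transformation with standard error 1/sqrt(n - 3), which for n = 146 is consistent with the intervals reported for quartz dust above. All function names are hypothetical.

```python
import math
from statistics import NormalDist

def average_ranks(values):
    """1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

def fisher_ci(r, n, level=0.95):
    """Approximate CI for a correlation via Fisher's z (an assumption here)."""
    half_width = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(n - 3)
    z = math.atanh(r)
    return math.tanh(z - half_width), math.tanh(z + half_width)
```

Because the coefficient is rank-based, it is unchanged by the ln(score+1) transformation; the transformation matters only for the parametric methods.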

Associations between Marker Scores
The expression pattern of cancer-related proteins showed strong associations between the 22 markers. These associations remained relatively stable when further stratifying by exposure (supplemental Table S2). Three factors were extracted from the correlation matrix that could be attributed to HIF1A, MUC1, and NKX2-1, respectively (data not shown). Table 2 depicts the Spearman correlation coefficients of the scores of HIF1A, MUC1, and NKX2-1 with the other markers in all tissue samples. In addition, we present the correlations with NOTCH1 as a candidate for exposure-related effects. Similar associations were found if restricted to cancer tissue (data not shown). HIF1A expression was associated with NOTCH1 (rs = 0.73, 95% CI 0.65-0.80), ERBB2 (rs = 0.58, 95% CI 0.46-0.68) and other proteins except MUC1 and NKX2-1. MUC1 and NKX2-1 were negatively correlated with cancer markers like TP53, VEGFA, or KIT. NOTCH1 and ERBB2 correlated inversely with NKX2-1 (rs = -0.31, 95% CI -0.45 to -0.15, and rs = -0.30, 95% CI -0.44 to -0.14, respectively) but did not show an association with MUC1.

Protein Expression by Exposure to Radon and Arsenic
Supplemental Table S3 depicts the distribution of positively stained samples by level of exposure to radon and arsenic. In lung tissue from uranium miners with high exposure to both carcinogens, up-regulation of cancer-related proteins was more common than in miners with low exposure. Seven out of 18 cytoplasmic proteins were more frequently stained (≥15%) in the high-exposed tissue samples, among these ERBB2 and NOTCH1. For none of the proteins did the low-exposed group show a similar excess (≥15%) of stained samples over the high-exposure group. Table 3 (selected proteins) and supplemental Table S4 (all proteins) show the correlation of the staining scores with cumulative exposure to radon (assessed as WLM) and arsenic in all samples and stratified by lung cancer. Up-regulation of cancer-related proteins was frequently observed with increasing exposure to radon in lung-cancer tissue, but we detected no significant down-regulation. Radon exposure correlated further with TP53 in cancer-free tissue (rs = 0.40, 95% CI 0.09-0.63). The effects of exposure to arsenic on the staining scores were less clear. ERBB2 and NOTCH1 showed an up-regulation in lung-cancer tissue with increased exposure to both carcinogens, but no staining or a lacking association with exposure in cancer-free samples. Table 4 shows the distribution of positively stained samples by major subtype of lung cancer and in cancer-free tissue. Most slides from cancer-free tissue lacked expression of CCND1, CD44, CDH1, EGFR, ERBB2, KIT, keratins, NOTCH1, PAK1, PTGS2, SNAI1, and VIM but showed staining of MUC1, HIF1A, NKX2-1, SFTPC, and STAT3, which were also found in AdCa. Membrane staining of MUC1, SFTPC in cytoplasm, and many other markers were frequently lacking in SCLC. AdCa and SqCC shared signatures, including ERBB2, KIT, MMP2, PTGS2, EGFR, and VEGFA. KRT5 and KRT14 were more frequently expressed in SqCC than in AdCa, whereas NKX2-1 was lacking.
Figure 1 depicts the CART classification of lung cancer in all tissue samples and Figure 2 shows the clustering of cancer-tissue samples by major histological subtype. Staining of MUC1 (membrane), HIF1A (cytoplasm), and NKX2-1 (nucleus) classified cancer by subtype and cancer-free lung tissue with a failure rate of 11.0%. Higher MUC1 expression in the membrane discriminated normal tissue from cancer (failure rate 2.1%). HIF1A was expressed in NSCLC but less in SCLC, whereas NKX2-1 was lacking in SqCC. HIF1A and NKX2-1 classified the major subtypes of lung cancer with a failure rate of 8.4%. EGFR and VEGFA classified NSCLC vs. SCLC with a failure rate of 10.3% and AdCa vs. SqCC with 19.1%. We confirmed the 2-protein signature for the discrimination of the major subtypes of lung cancer with 15 additional samples retrieved from the archive. All five SCLC cases showed very weak or no staining of HIF1A, and all five AdCa cases presented with high NKX2-1 scores, whereas these scores were zero in SqCC or low in SCLC.
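The three-protein signature can be read as a simple decision rule, and the failure rate as the percentage of samples the rule misclassifies. The sketch below is an illustration under assumptions, not the fitted CART model: the split order, the cut-off values, and all names are hypothetical, since the text reports only the direction of each split (MUC1 high in cancer-free tissue, HIF1A low in SCLC, NKX2-1 high in AdCa and lacking in SqCC).

```python
def classify_sample(muc1_membrane, hif1a_cytoplasm, nkx2_1_nucleus,
                    muc1_cut=100.0, hif1a_cut=50.0, nkx2_1_cut=50.0):
    """Assign a tissue class from three staining scores.

    The cut-offs are placeholders chosen for illustration only.
    """
    if muc1_membrane >= muc1_cut:
        return "cancer-free"   # high membrane MUC1: normal lung tissue
    if hif1a_cytoplasm < hif1a_cut:
        return "SCLC"          # low HIF1A separates SCLC from NSCLC
    if nkx2_1_nucleus >= nkx2_1_cut:
        return "AdCa"          # high NKX2-1 marks adenocarcinoma
    return "SqCC"              # NKX2-1 lacking in squamous cell carcinoma

def failure_rate(predicted, observed):
    """Percentage of samples whose predicted class differs from the observed."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("need two non-empty sequences of equal length")
    errors = sum(p != o for p, o in zip(predicted, observed))
    return 100.0 * errors / len(observed)
```

In a leave-one-out setting, each sample would be predicted by a tree fitted to all other samples, and the failure rate would be computed over these held-out predictions.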

Classification of Protein Patterns by Subtype of Lung Cancer
Exposure to radon or arsenic could not be clearly detected in the 22-protein signatures, with a failure rate of 62.6% in cancer tissue and 57.5% in all samples (data not shown). Failure rates of 37% and 39% were found for smoking and silicosis, respectively. It is important to note that most miners were smokers, and staining was investigated in fibrosis-free parts of the tissue samples.

Discussion
Lung cancer was a common occupational disease in German uranium miners, with an excess risk related to radon exposure [4]. Increasing exposure to radon or arsenic was associated with a shift towards SCLC or SqCC at the expense of AdCa [11,12]. This raised the question whether exposure-related changes can be detected in expression patterns in the lungs of miners. We observed an up-regulation of cancer-related proteins, including NOTCH1, with increasing cumulative exposure to radon or arsenic, but could not detect an additional effect of exposure on the very distinct patterns of proteins of the major subtypes of lung cancer. A tight correlation structure between the staining scores revealed three proteins that served as good diagnostic classifiers. MUC1 discriminated cancer-free tissue from lung cancer. HIF1A and NKX2-1 discriminated the major subtypes with a low failure rate.
The correlation of NOTCH1 expression with exposure to radon is in line with experimental results. NOTCH1, a large gene comprising 37 exons, is prone to radiation-induced mutations that can contribute to T-cell lymphomagenesis [9,19]. Up-regulation of NOTCH was also observed after irradiation of embryonic kidney cells [20], and down-regulation rendered glioma stem cells more sensitive to radiation [21]. Less is known about NOTCH1 in lung cancer cases with radiation exposure. In our study, NOTCH1 was constitutively active in most NSCLC samples but less in SCLC and lacking in cancer-free tissue.
NOTCH1 was also up-regulated in lung cancer samples from uranium miners with high exposure to arsenic. This may be partially due to a moderate correlation between arsenic and radon. Keratins have emerged as a relevant target of arsenic [22]. Whereas cytokeratins contributed to the specific staining of SqCC, they did not show an additional preference for arsenic exposure.
NOTCH1 correlated with the expression of various other cancer-related proteins like HIF1A but not with MUC1 as a marker of normal lung physiology. Mucus production is a primary particle-defense mechanism of the airways [23]. Down-regulation of membrane-bound MUC1 discriminated cancer tissue from cancer-free tissue. This supports the view that changes in mucin biosynthesis can serve as a tumor marker [24]. Treatment decisions in lung cancer usually rest on the categorization into NSCLC and SCLC. In our study, the commonly applied markers VEGFA and EGFR separated both lineages but were of limited value to further discriminate between AdCa and SqCC. Both entities have shown different responses to therapy [25]. Our results revealed HIF1A together with NKX2-1 as a two-protein panel that discriminated all major subtypes with a low failure rate. HIF1A expression was lower in SCLC, and NKX2-1 staining was lacking in SqCC. We could confirm that the protein patterns of SqCC and SCLC indicate a greater 'stemness' whereas the molecular signatures of AdCa represent more differentiated stages [26,27]. AdCa is a peripheral tumor and continues expressing proteins typical of the lung physiology such as mucins, surfactant proteins, or NKX2-1.
Hypoxia is a basic feature of cancer where HIF1A has been identified as a key regulator of energy metabolism and other oncogenic pathways [28,29]. Many target genes have been identified, including VEGF, VIM, and KRT14 [30]. In our study, HIF1A, similarly to NOTCH1, correlated with most of the other tumor markers but less with MUC1 or NKX2-1 as markers of lung physiology. A higher score of cytoplasmic staining classified NSCLC from SCLC samples. HIF1A mRNA has been observed to be up-regulated in NSCLC [31] and suggested as a prognostic classifier [32,33]. HIF1A might also constitute a therapeutic target [34,35].
NKX2-1 has been found frequently amplified and overexpressed in AdCa [36] and is an established marker of lung-cancer lineage used to distinguish AdCa from the more centrally located SqCC. We confirmed its expression in AdCa whereas staining was lacking in SqCC. NKX2-1 is essential for the formation of alveolar type 2 (AT2) pneumocytes [37]. Both AT2 cells and AdCa are located in distant parts of the lung, where mucins keep the epithelial layer hydrated and act together with surfactants as a filtration barrier [38].

Table 3. Spearman correlation coefficients of selected marker scores with cumulative exposure to radon and arsenic in lung tissue from 146 uranium miners.

Table 4. Proportion of lung tissue samples with positive staining of candidate proteins in lung cancer tissue and in cancer-free samples from uranium miners.

Various methodological shortcomings have to be taken into account when studying lung cancer. The classification of subtypes is prone to observer bias [39]. Here, lung tissue was available from autopsies and subject to reference pathology. Another issue concerns misclassification of exposure [40]. Enormous efforts have been undertaken to assess occupational exposure to radon and arsenic in uranium mining [2,3]. Exposure to radon and arsenic can result in a synergistic action. Accordingly, more samples were positively stained in the group with high exposure to both carcinogens than in the low-exposed group.
In this particular context of heavy occupational exposure, confounding by smoking was estimated to be of minor concern [5]. There was no strong variation of smoking prevalence by level of exposure. No obvious effect of smoking was found in miRNA patterns in a large set of AdCa samples, where also a good molecular classification of AdCa and SqCC could be achieved [41].
Similarly, our markers were also good classifiers of the subtypes, but we could not identify an additional effect of exposure on the subtype-specific patterns. Although we were able to detect a moderate association between exposure and NOTCH1 and other proteins, the strong differences in expression by subtype might hinder the detection of weaker influences. This raised the question whether our study design was powerful enough to detect such modification in expression levels. A first investigation with cDNA microarrays in thyroid tumors, including samples from the Chernobyl Tissue Bank, revealed no radiation-specific signature [42]. A subsequent analysis allowed the identification of a subtle gene expression signature in a subgroup of Chernobyl cases, which were susceptible to radiation-induced cancer [8]. We had chosen an orthogonal study design with contrast in exposure. Although the tissue bank is rather comprehensive, the tissue blocks were limited for rare combinations like low radon and high arsenic. Furthermore, an extensive stratification results in smaller subgroups that are prone to variation by chance. We therefore paid attention to consistent trends in the results.
Another concern is the question if the methodology used was suitable and the resulting protein set was sufficiently complete and sensitive. Modern mass spectrometry-based proteomics has made great progress in its application to archival material [43,44]. Instead of using a method for global protein analysis, however, we have chosen a hypothesis-driven approach that was based on immunohistochemistry. Antigen retrieval with archival material is well established and, despite long fixation times in unbuffered formalin, 22 out of 30 antibodies could be successfully applied. Initially, we searched the literature for candidate proteins being employed in lung cancer, development, or physiology. The observed strong associations between the scores of the 22 proteins support the candidate protein approach. The tight correlation structure could be represented by HIF1A, MUC1, and NKX2-1. This dimensionality reduction might be due to the contribution of various key proteins to fundamental pathways. Although we cannot exclude missing important proteins in the set chosen for this investigation, for example from DNA repair pathways, the correlation structure of the candidate protein supports the view that such an analysis can similarly be conducted with proxy proteins. If exposure influenced a candidate protein, such an effect could be transported along the pathways or across networks of key proteins.
Apart from methodological limitations, biological explanations should also be discussed as to why exposure might cause a shift between the subtypes. Many of the markers are employed in developmental pathways of lung morphogenesis that are recapitulated in tissue regeneration and cancer [45-47]. These programs result in lineage-specific expression patterns that can be well classified by these proteins. It is current opinion that at least two major lineages give rise to SCLC and NSCLC [48,49]. SCLC is a more aggressive tumor with neuroendocrine features [50]. AdCa is the common subtype in never smokers and more differentiated than SqCC within the NSCLC lineage [26,27]. Repair of heavy tissue damage usually needs reconstruction of a more complex tissue architecture and implies recruitment of cells with higher stemness [51]. This may explain the observed shift in the distribution of the subtypes of lung cancer by level of exposure to carcinogens. Exposure to radon or arsenic may cause the transformation of a stem cell to a cancer stem cell. However, tightly controlled programs activated in the process of carcinogenesis might obscure the transfer of exposure-related damage in a precursor cell along the lineage to a specific cancer phenotype.
In conclusion, NOTCH1, a prominent candidate for radiation-related effects, and other cancer-related proteins were associated weakly to moderately with exposure to radon and arsenic. MUC1, a physiological marker, HIF1A, a regulator of metabolic reprogramming, and NKX2-1, a lineage-specific marker, performed well as classifiers of lung cancer and its major subtypes.