Differential Gene Expression between African American and European American Colorectal Cancer Patients

The incidence and mortality of colorectal cancer (CRC) are higher in African Americans (AAs) than in other ethnic groups in the U.S., but the reasons for the disparities are unknown. We performed gene expression profiling of sporadic CRCs from AAs vs. European Americans (EAs) to assess the contribution of differential gene expression to CRC disparities. We evaluated the gene expression of 43 AA and 43 EA CRC tumors matched by stage and 40 matching normal colorectal tissues using the Agilent human whole genome 4x44K cDNA arrays. Gene and pathway analyses were performed using Significance Analysis of Microarrays (SAM), ten-fold cross validation, and Ingenuity Pathway Analysis (IPA). SAM revealed that 95 genes were differentially expressed between AA and EA patients at a false discovery rate of ≤5%. Using IPA, we determined that the most prominent disease and pathway associations of the differentially expressed genes were related to inflammation and immune response. Ten-fold cross validation demonstrated that the following 10 genes can predict ethnicity with an accuracy of 94%: CRYBB2, PSPH, ADAL, VSIG10L, C17orf81, ANKRD36B, ZNF835, ARHGAP6, TRNT1 and WDR8. Expression of these 10 genes was validated by qRT-PCR in an independent test set of 28 patients (10 AA, 18 EA). Our results are the first to implicate differential gene expression in CRC racial disparities and indicate a prominent difference in CRC inflammation between AA and EA patients. Differences in susceptibility to inflammation support the existence of distinct tumor microenvironments in these two patient populations.


Introduction
Colorectal cancer (CRC) remains the most common gastrointestinal cancer in the United States, despite recent improvements in the diagnosis and treatment of the disease. The incidence and mortality rates of CRC for African Americans (AAs) are higher than in the U.S. general population [1,2]. Many epidemiologic and genetic investigations have focused on AAs [3,4,5,6] with the goal of deciphering the reasons for such disparities. Whereas one cannot discount the contribution of socioeconomic factors, such as a more advanced stage of disease at diagnosis in AAs, other biological factors also contribute to the progression of colon cancer [4,7]. However, a biological basis for the existence of a more aggressive CRC in African American patients remains to be further elucidated. Genomic instability is a crucial feature in tumor development and there are at least 3 distinct pathways in CRC pathogenesis: chromosomal instability (CIN), microsatellite instability (MSI), and CpG island methylator phenotype (CIMP) pathways [8,9]. Any or all of these pathways may contribute to a more aggressive CRC biology in African Americans. Recent genome-wide association studies in CRC have shown not only strong evidence for common single nucleotide polymorphism (SNP) association in a number of genes and chromosomal regions, but also genetic heterogeneity in CRC association in AAs versus EAs [4,10,11,12,13]. Differences in the incidence of MSI and in the level of methylation of functionally relevant genes have also been reported as possible factors in CRC racial disparities [8,14,15].
We hypothesized that the gene expression profiles of CRC in African-American and European-American patients may reveal biological differences between the two populations that could explain the more aggressive cancer phenotype in African-Americans. Thus, we performed genome-wide gene expression profiling in a large set of tumor samples that were matched for selected clinical variables. We analyzed our results on gene and pathway levels to identify key differences in tumor biology between African-American and European-American patients.

Patients
One hundred and fourteen tumors (86 included in the original analysis and 28 in the validation study) and 40 normal tissues from de-identified CRC patients were obtained from the Institutional Review Board (IRB) approved University of North Carolina (UNC) Tissue Procurement Facility after UNC School of Medicine IRB approval for this study. Written informed consent was obtained from all patients. All samples were collected between 1999 and 2008 at the time of operation and snap frozen in liquid nitrogen. Patients with known familial adenomatous polyposis and hereditary nonpolyposis CRC were excluded. De-identified data including race, tumor, node and metastasis (TNM) stage, grade or differentiation, margin status, and survival were available for the majority of patients.

RNA Isolation and Microarray Hybridization
RNA isolation and hybridization to Agilent (Agilent Technologies, Santa Clara, CA) human whole genome 4x44K DNA microarrays were performed at UNC. RNA was extracted from macrodissected snap-frozen tumor samples using AllPrep kits (Qiagen, Valencia, CA) and quantified using Nanodrop spectrophotometry (ThermoScientific, Wilmington, DE). RNA quality was assessed with the Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA). RNA was selected for hybridization using the RNA integrity number and by inspection of the 18S and 28S ribosomal RNA. RNA of similar quality was selected across samples. One microgram of RNA was used as a template for cDNA preparation prior to hybridization to Agilent 4x44K whole human genome arrays. cDNA was labeled with Cy5-dUTP and a reference control (Stratagene; Catalog Number # 740000; Agilent Technologies, Santa Clara, CA) [16] was labeled with Cy3-dUTP using the Agilent low RNA input linear amplification kit and hybridized overnight at 65°C to Agilent 4x44K whole human genome arrays. Arrays were washed and scanned using an Agilent scanner (Agilent Technologies, Santa Clara, CA).
All microarray data are in MIAME-compliant form, and raw and processed data have been deposited in the Gene Expression Omnibus (GEO); see http://www.ncbi.nlm.nih.gov/geo/, accession number: GSE28000.

Microarray and statistical analysis
All array data were normalized using LOWESS normalization. Data were excluded for genes with poor spot quality or genes that did not have a mean intensity greater than 10 for one of the two channels (green and red) in at least 70% of the experiments. The log2 ratio of the mean red intensity over the mean green intensity was calculated for each gene, followed by LOWESS normalization [17]. Missing data were imputed using k-nearest neighbors (KNN) imputation with k = 10 [18]. Genes that were significantly up- or down-regulated were identified using Significance Analysis of Microarrays (SAM) [19]. SAM assigns a score to each gene on the basis of its change in gene expression relative to the standard deviation of repeated measurements. For genes with scores greater than an adjustable threshold, SAM uses permutations of the repeated measurements to estimate the percentage of genes identified by chance, that is, the false discovery rate (FDR). Analysis parameters (Delta) were set to result in an FDR of ≤5%.
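The scoring and permutation steps described above can be illustrated with a minimal Python sketch. It is a simplified stand-in for SAM, not the published algorithm: it assumes a symmetric cutoff on the absolute score instead of the Delta parameter, a fixed fudge factor `s0`, and a mean (not median) count of permuted false calls. The function names `sam_score` and `estimate_fdr` and the default values are hypothetical.

```python
import random

def sam_score(group_a, group_b, s0=0.1):
    """Relative difference d = (mean_a - mean_b) / (s + s0) for one gene."""
    na, nb = len(group_a), len(group_b)
    ma, mb = sum(group_a) / na, sum(group_b) / nb
    # Pooled standard error of the difference in group means.
    ss = sum((x - ma) ** 2 for x in group_a) + sum((x - mb) ** 2 for x in group_b)
    s = ((1 / na + 1 / nb) * ss / (na + nb - 2)) ** 0.5
    # s0 (assumed value) keeps genes with tiny variance from dominating.
    return (ma - mb) / (s + s0)

def estimate_fdr(data, labels, threshold, n_perm=200, seed=0):
    """data: one list of log2 ratios per gene; labels: 'AA' or 'EA' per sample."""
    rng = random.Random(seed)

    def scores(lab):
        out = []
        for gene in data:
            a = [v for v, l in zip(gene, lab) if l == "AA"]
            b = [v for v, l in zip(gene, lab) if l == "EA"]
            out.append(sam_score(a, b))
        return out

    # Genes called significant with the true group labels.
    called = sum(abs(d) >= threshold for d in scores(labels))
    if called == 0:
        return 0, 0.0  # nothing called, so no false discoveries to estimate
    # Genes exceeding the same cutoff when labels are shuffled (chance calls).
    perm = list(labels)
    false_calls = []
    for _ in range(n_perm):
        rng.shuffle(perm)
        false_calls.append(sum(abs(d) >= threshold for d in scores(perm)))
    fdr = min(1.0, (sum(false_calls) / n_perm) / called)
    return called, fdr
```

In this sketch, raising `threshold` reduces both the number of genes called and the estimated FDR, which mirrors how an adjustable cutoff is tuned until the FDR target is met.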

Network and gene ontology analysis
Differentially expressed genes were investigated for network and gene functional interrelation using Ingenuity Pathways Analysis (IPA) software (Ingenuity Systems, www.ingenuity.com) [20]. IPA scans the set of input genes to identify networks, using the Ingenuity Pathways Knowledge Base to find interactions between the identified 'Focus Genes' (in this study, the genes differentially expressed between AA and EA patients) and the known and hypothetical interacting genes stored in the knowledge base. IPA was used to generate a set of networks with a maximum network size of 35 genes/proteins. Networks are displayed graphically as genes/gene products ('nodes') and the biological relationships between the nodes ('edges'). All edges are from canonical information stored in the Ingenuity Pathways Knowledge Base. In addition, IPA computes a score for each network according to the fit of the user's set of significant genes. The score indicates the likelihood of the Focus Genes in a network from Ingenuity's knowledge base being found together due to random chance. A score of 3, used as the cutoff for identifying gene networks, indicates that there is only a 1/1000 chance that the Focus Genes shown in a network are found together due to random chance. Therefore, a score of 3 or higher indicates a 99.9% confidence level to exclude random chance.
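The relation between the network score and the chance probability stated above (a score of 3 corresponding to 1/1000) is consistent with reading the score as the negative base-10 logarithm of a p-value. The short Python sketch below makes that conversion explicit under this assumption; the function names are hypothetical and are not part of the IPA software.

```python
import math

def score_to_p(score):
    """Chance probability implied by a network score, assuming score = -log10(p)."""
    return 10.0 ** (-score)

def p_to_score(p):
    """Network score implied by a chance probability, under the same assumption."""
    return -math.log10(p)

# A score of 3 corresponds to p = 0.001, i.e. 99.9% confidence.
cutoff_p = score_to_p(3)
```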

Ten-fold Cross Validation (Ten-f-CV)
Ten-f-CV analysis [21] was used to select a smaller representative set of genes for the validation study by qRT-PCR. Using Ten-f-CV analysis, we identified 10 genes that can predict the ethnicity of the patient for whom the array was done with an error rate of 6%, suggesting that these 10 genes are representative of the entire gene list.
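For readers unfamiliar with the procedure, the following Python sketch shows how a ten-fold cross-validated error rate can be computed for a fixed gene set. The classifier used in the study is not specified here, so a nearest-centroid rule is assumed purely for illustration; the function name `ten_fold_error` and its parameters are hypothetical. The sketch also assumes the genes have already been chosen; in a full analysis, gene selection would be repeated inside each fold to avoid optimistic bias.

```python
import random

def ten_fold_error(samples, labels, k=10, seed=0):
    """samples: expression vectors of the selected genes; labels: class per sample.
    Returns the fraction of held-out samples that are misclassified."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    # Split the shuffled sample indices into k roughly equal folds.
    folds = [idx[i::k] for i in range(k)]
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Class centroids are computed from the training folds only.
        centroids = {}
        for cls in set(labels[i] for i in train):
            rows = [samples[i] for i in train if labels[i] == cls]
            centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
        # Each held-out sample is assigned to the nearest centroid.
        for i in fold:
            pred = min(
                centroids,
                key=lambda c: sum((x - m) ** 2 for x, m in zip(samples[i], centroids[c])),
            )
            errors += pred != labels[i]
    return errors / len(samples)
```

An error rate of 0.06 from such a procedure corresponds to the 94% accuracy reported for the 10-gene set.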

Quantitative real-time PCR
Validation of microarray results was performed on 28 CRC patients (10 AA and 18 EA). Ten differentially expressed genes (identified by SAM and selected using the Ten-f-CV method) were validated by qRT-PCR. The hydroxymethylbilane synthase (HMBS) gene served as the housekeeping gene [22]. qRT-PCR was performed in duplicate using SYBR Green Gene Expression Assays (Applied Biosystems, Foster City, CA), which include preoptimized primer sets specific for the genes being validated [23]. The assayed genes were: Crystallin, beta B2 (CRYBB2), phosphoserine phosphatase homologue (PSPH), Adenosine deaminase-like (ADAL), V-set and immunoglobulin domain containing 10 like (VSIG10L), Chromosome 17 open reading frame 81 (C17orf81), Ankyrin repeat domain 36B (ANKRD36B), Zinc finger protein 835 (ZNF835), Rho GTPase activating protein 6 (ARHGAP6), WD repeat domain 8 (WDR8), TRNA nucleotidyl transferase, CCA-adding, 1 (TRNT1), and the reference gene HMBS. Data were collected using the ABI PRISM 7500 sequence detection system (Applied Biosystems, Foster City, CA). qRT-PCR data for each sample were normalized using expression of the housekeeping gene HMBS. Graphs were prepared from normalized data relative to HMBS. Statistical analysis of these data was performed with a two-sided t-test, or with a two-sided Wilcoxon rank-sum test if the expression data did not follow a normal distribution.
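Normalization to a housekeeping gene is commonly expressed through the difference in threshold cycle (Ct) values. The Python sketch below assumes the widely used 2^(-ΔCt) form with duplicate wells averaged first; the text above does not state which formula was applied, so this is an illustration only, and the function name `relative_expression` is hypothetical.

```python
def relative_expression(ct_target, ct_reference):
    """Expression of a target gene relative to a reference gene (e.g. HMBS).

    ct_target, ct_reference: Ct values from replicate wells of one sample.
    Assumes the 2^(-delta Ct) convention with ~100% amplification efficiency.
    """
    mean_target = sum(ct_target) / len(ct_target)
    mean_reference = sum(ct_reference) / len(ct_reference)
    delta_ct = mean_target - mean_reference
    # A target that crosses threshold 2 cycles later than the reference
    # is taken to be expressed at one quarter of the reference level.
    return 2.0 ** (-delta_ct)
```

The per-sample values obtained this way for the AA and EA groups are what a two-sided t-test or Wilcoxon rank-sum test would then compare.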

Identification of differentially expressed genes between AA and EA CRC patients
Patient population characteristics for 43 AA and 43 EA patients were matched by TNM staging (Table 1). The two populations were similar in age, gender and tumor localization. Forty non-tumor colon tissues (13 AAs and 27 EAs) were used for genetic comparisons of normal colon gene expression between AAs and EAs.
The comparison of gene expression profiles from AA and EA tumors using SAM revealed 95 gene transcripts to be differentially expressed between the two groups at an FDR of ≤5%. Fifty-eight genes were up-regulated (Table 2) and 37 were down-regulated (Table 3) in tumors of AAs. We used Ingenuity Pathway Analysis to assess disease and pathway associations of these 95 genes that were differentially expressed in CRC tumors by race. The disease association analysis revealed associations of differentially expressed genes with genetic pathways that are linked to inflammatory response, hepatic system disease, developmental disorders, genetic disorders and neurologic disease (Table S1). The six top associated pathways for differentially expressed genes are shown in Fig. 1. Three of these six pathways are related to inflammatory and immune response. Differentially expressed genes in the five highest scoring networks are shown in Table 4. The top associated network functions for differentially expressed genes were: 1) organismal injury and abnormalities, gene expression, cellular development; 2) lipid metabolism, small molecule biochemistry, molecular transport; 3) cellular assembly and organization, organ development, carbohydrate metabolism; 4) antigen presentation and inflammatory response, cellular movement; 5) behavior, digestive system development and function, endocrine system development and function. One of these networks (network 4; antigen presentation and inflammatory response) is graphically represented in Fig. 2. Seven of the nine genes in this network were up-regulated in AA patients (HLA-DQB1, IL33, PAK2, PROKR1, SAA2, TLR4, ZNF234), and two genes were down-regulated (DHX58, IL27).
We also performed SAM analysis using non-tumor colon tissues from AA and EA patients and did not see differential gene expression (data not shown), suggesting that the changes we identified are tumor microenvironment specific.

Validation of microarray results
To select a representative group of genes for qRT-PCR validation of differentially expressed genes between AA and EA CRC patients, we performed a 10-fold cross validation analysis that resulted in the selection of the following ten genes: CRYBB2, PSPH, ADAL, VSIG10L, C17orf81, ANKRD36B, ZNF835, ARHGAP6, TRNT1 and WDR8.
Expression of these ten genes was validated by qRT-PCR on an independent test set of 28 CRC patients (10 AA and 18 EA). The qRT-PCR results are shown in Fig. 3. Two of the 10 differentially

Discussion
The causes of the CRC disparity that exists between AA and EA patients remain to be fully elucidated. Although most of the research on this disparity has focused on socioeconomic factors, recent findings strongly support the role of genetic and biological factors. Genetic differences between AA and EA CRC patients were reported for SNP association, for incidence of MSI and level of gene methylation [4,14,15]. Any of these differences can result in differential gene expression between AA and EA CRC patients.
In this study we analyzed the gene expression profiles of 86 tumors from 43 AA and 43 EA patients. Significant differences in the expression of genes related to immune response and inflammation within the tumor microenvironment were identified between these two groups. This interpretation was supported by both disease association and pathway analyses. Most of the immune-related genes had higher expression in tumors from African-American patients than in those from European-American patients. Although preliminary, these findings are novel and could have implications for cancer therapy. From the present study, we do not know why CRC from African-American patients would have a different immunologic profile than tumors from European-American patients. We hypothesize that the causes of these differences are multifactorial. Chronic inflammation is thought to be a causative factor in colorectal carcinogenesis [24,25]. It was shown that an immune response signature in the liver of cancer patients predicts metastasis and recurrence of hepatocellular carcinoma [26]. Thus, future studies should evaluate whether the immunologic profile of CRC in African-American patients is a predisposing factor for tumor progression and metastasis. Previous investigations identified a two-gene tumor signature (CRYBB2 and PSPHL) that accurately differentiated between African-American and European-American prostate cancer patients [27]. Those two genes were also differentially expressed between African-American and European-American breast cancer patients [28]. In this study we found up-regulation of the CRYBB2 and PSPH genes in CRC of African-American patients. Mutations in the CRYBB2 gene are also responsible for familial cataract [29]. PSPHL is a homolog of PSPH. Interestingly, PSPH is located on chromosome 7p15.2, a chromosomal region known to have gain of function related to advanced tumor stage in non-small-cell lung adenocarcinoma [30].
It was shown that increased expression of PSPH in non-small-cell lung cancer corresponds to clinical response to treatment with erlotinib [31]. Thus, it is possible that higher expression of PSPH contributes to CRC susceptibility in AAs and that the levels of PSPH expression may be correlated with response to anti-EGFR treatment. These possibilities will have to be tested in future studies. Among the genes down-regulated in AAs, we found lower expression of the C17orf81 gene. Down-regulation of this gene was associated with colon cancer [32], suggesting that lower expression of this gene can contribute to more aggressive CRC in AAs. Other down-regulated genes in AAs include TRNT1 (involved in RNA processing [33]), ARHGAP6 (promotes actin remodeling [34]), and WDR8 (facilitates formation of multiprotein complexes [35]). Considering the cellular functions of these genes, it is plausible that their expression influences the aggressiveness of CRC.
Whole-genome gene expression analysis experiments can be prone to findings that are either unique to a selected patient population or are artificially created by the applied technology. To exclude the possibility of an artifact, two different approaches were used to cross-validate our gene expression data. First, we compared our results for the differentially expressed genes between 86 tumors and 40 surrounding non-tumor tissues with those from a published meta-analysis of five CRC gene expression datasets in Oncomine [36,37,38,39,40] (Table S2). We found very good agreement between our results and the results of the other 5 meta-analyses. Of the top 20 over-expressed genes in CRC tumors across the 5 other meta-analyses (Oncomine; https://www.oncomine.org), 15 were also found to be significantly up-regulated (FDR <5%) in our study. Of the top 20 under-expressed genes in CRC tumors across the 5 other meta-analyses, 17 were significantly down-regulated (FDR <5%) in our study.
Second, we validated the expression of ten key genes via qRT-PCR and confirmed differences in gene expression between CRCs of AAs and EAs for eight of them.
In conclusion, the gene expression profile of CRC corresponds to differences in tumor biology between African-American and European-American patients. The implications of these differences in disease aggressiveness and response to therapy should be evaluated in future studies.