BAP1 expression is prognostic in breast and uveal melanoma but not colon cancer and is highly positively correlated with RBM15B and USP19

BAP1 is a tumor suppressor gene important to the development and prognosis of many cancers, especially uveal melanoma (UM). Its role in more common cancers such as breast and colon cancer is largely unknown. We collected the transcriptome profiling data sets from the TCGA uveal melanoma (TCGA-UVM), breast cancer (TCGA-BRCA), and colon cancer (TCGA-COAD) projects to analyze the expression of BAP1. We found that patients with UM and breast cancer, but not colon cancer, who died had a lower level of BAP1 gene expression compared to surviving patients. Importantly, in breast cancer patients, the lowest BAP1 expression levels corresponded to the dead young patients (age at diagnosis < 46). Since the number of cases in TCGA-BRCA was much higher than TCGA-UVM, we obtained highly correlated genes with BAP1 in invasive breast carcinomas. Then, we tested if these genes are also highly correlated with BAP1 in UM and colon cancer. We found that BAP1 is highly positively correlated with RBM15B and USP19 expression in invasive breast carcinoma, UM, and colon adenocarcinoma. All three genes are located in close proximity on the 3p21 tumor suppressor region that is commonly altered in many cancers.

Although it is known that BAP1 is a tumor suppressor gene important to the development and prognosis of many cancers, especially UM, its role in more common cancers including breast and colon is largely unexplored.
Here, we collected RNA-seq data sets from TCGA-BRCA, -UVM, and -COAD projects at Genomic Data Commons Data Portal, and we employed computational techniques to investigate the role of BAP1 in UM and two common cancers, breast and colon. Furthermore, we evaluated how patient age, sex, and race may be associated with BAP1 gene expression and patient survival with these cancers. We hypothesized that BAP1 gene expression could be a potential signature for aggressive tumors. In addition we identified genes with expression that is highly correlated with BAP1. To determine the genes that may be critical to BAP1 functional pathways, we used the UM data set which is known to have a high level of BAP1 alteration to identify candidates, which were validated in the breast and colon cancer data sets.
We analyzed the data sets by applying two slightly different normalization methods. In the first method, we evaluated all genes and obtained z-scores to normalize the data. In the second method, we excluded the genes with values of zero for all patients in the data set, and then we obtained z-scores. Importantly, the results were independent of these two normalization strategies.

Dead patients had low BAP1 expression compared to survivors in breast cancer and UM but not colon cancer
We divided UM patients in four sub-categories based on their gender and vital status: femalealive, female-dead, male-alive, and male-dead (Table 1). We also divided patients into subgroups based on their age at diagnosis (grouped into intervals of 10 years starting from 26, which was the age of the youngest patient). The left panel of Fig 1(a) shows the normalized level of BAP1 for these patients, and the right sub-figure represent the density function for the values of BAP1 in dead and alive patients after excluding genes with zero values and normalizing the data set. This figure indicates that the expression level of BAP1 is lower in dead patients compared to the survivors regardless of their gender.
We also divided breast cancer female patients in four sub-categories based on their race and vital status: black-alive, black-dead, white-alive, and white-dead ( Table 2). The left sub-figure of Fig 1(b) shows the normalized (z-score) value of BAP1 for survival categories of black and white female breast cancer patients, and the right sub-figure compares the BAP1 expression in survivors with the BAP1 expression in dead patients. In this figure, we also divided male-alive 32 32 male-dead 16 16 breast cancer patients into sub-groups based on their age at diagnosis (grouped into intervals of 10 years starting from 26, which was the age of the youngest patient). We found that dead patients, regardless of their race, have a lower level of BAP1 expression compared with living patients. Importantly, the smallest level of BAP1 corresponds to the young (age at diagnosis < 46 years old) patients who died. Additionally, we investigated the expression level of BAP1 in colon cancer. We divided patients with colon cancer in four sub-categories based on their gender and vital status: female-alive, female-dead, male-alive, and male-dead ( Table 3). The left sub-figure of Fig 1(c) shows the normalized level of BAP1 for these patients, and the right sub-figure represents the difference between the BAP1 expression in dead and alive patients. We also divided colon cancer patients into sub-groups based on their age at diagnosis (grouped into intervals of 10 years starting from 31, which was the age of the youngest patient). This figure indicates that the expression level of BAP1 in patients with colon cancer is not correlated with vital status or gender.

BAP1 is highly positively correlated with RBM15B and USP19
We found the genes that were highly correlated (absolute value of correlation coefficient > 0.8) with BAP1 in the entire TCGA-UVM dataset. Although there were 24 genes with a high positive correlation with BAP1, there were no genes with a high negative correlation with BAP1. Table 4 shows the genes that are highly positively correlated with BAP1 in the entire data set of UM. Top panels (and bottom panels) of In patients with breast invasive carcinomas, BAP1 is highly positively correlated with NSG00000132153.13 (DHX30), ENSG00000259956.1 (RBM15B), and ENSG00000172046.17 (USP19), in all four categories (Table 5). Importantly, all these three genes were also highly positive correlated with BAP1 in UM (Table 1). DHX30 has the highest positive correlation with BAP1 in black-alive and white-dead categories, while USP19 has the maximum positive correlation with BAP1 in black-dead and white-alive breast cancer categories. We also obtained the genes that were highly correlated with BAP1 in samples obtained from primary tumor of patients with colon adenocarcinoma (See Table 6 and Fig 4). Interestingly, USP19 and RBM15B were also highly correlated with BAP1 expression in this data set.  The genes that were highly correlated with BAP1 in the entire TCGA-BRCA, -UVM, and -COAD datasets have been presented in Tables 4, 5, and 6. There are three genes other than BAP1, ENSG00000259956.1 (RBM15B), ENSG00000172046.17 (USP19), and ENSG00000176095.10 (IP6K1), that belong to all three tables. Note that IP6K1 was not in the  list of highly correlated genes in the all four categories of breast cancer, because its correlation coefficient with BAP1 in the black-alive category was 0.73, which was less than the threshold 0.8. However, its correlation coefficient with BAP1 in the other three categories of breast cancer patients was greater than 0.8.

No genes are highly negatively correlated with BAP1
Fig 5 shows the sorted correlation coefficients of all genes with BAP1 in UM, breast, and colon cancer patients. Although most of genes are negatively correlated with BAP1, there is no gene, which is highly negatively correlated (correlation coefficient < −0.8) with BAP1 in all data sets. Each category of breast cancer patients has some genes that are highly negatively correlated with BAP1, but there is no gene that is highly negatively correlated with BAP1 in at least two categories. Furthermore, there are also no genes in the TCGA-UVM data set and also in the TCGA-COAD data set with correlation coefficient less than −0.8 with BAP1.

Excluding genes with zero values did not change the results
We applied two normalization methods: 1) obtaining z-scores for each patient without excluding any genes and 2) eliminating genes with values of zero for all patients, then obtaining zscores for each patient (for more details see Methods). These two normalization methods led to the same results.

Discussion
Herein, we analyzed publicly available data sets from the TCGA and showed that BAP1 plays a prognostic role in both UM and breast cancer. In contrast, survival was not associated with BAP1 expression in patients with colon cancer. There is a study showing that mutation of BAP1 only significantly affected overall survival in female patients with RCC [10]. In this project, which is the first study to our knowledge that evaluated BAP1 associations with race and gender as well as age, we found that the BAP1 expression does not correlate with race or gender in these data sets. However, the dead UM and breast cancer patients had lower levels of BAP1 expression compared with surviving patients. Saliently, young dead patients with breast cancer had the lowest level of BAP1. This finding suggests that the level of BAP1 could be a signature of aggressive tumors in UM and breast cancer, especially for young breast cancer patients. In contrast, survival was not associated with BAP1 expression in patients with colon cancer.
We discovered that there are three genes (DHX30, USP19, and RBM15B), which are highly positively correlated with BAP1 in both black and white breast cancer patients. Among these three genes, two, USP19 and RBM15B are also highly positively correlated with BAP1 in UM. Importantly, USP19 and RBM15B were also among 10 genes with a high positive correlation with BAP1 expression in colon adenocarcinoma. To further learn these relationships, we performed a regression analysis to obtain the linear functions that predict the levels of USP19 and RBM15B from the level of BAP1. We found that the slopes of the lines modeling these linear relationships were higher in breast and colon cancers compared to uveal melanoma (See Fig 6).
The role of BAP1 in breast cancer is still not clear. BAP1 is located in the 3p21 chromosomal region, which is commonly deleted in breast cancer patients [11]. This suggests that BAP1 could be an important driver for tumorigenesis in breast cancer. However, according to cBio-Portal.org, there is a low frequency of BAP1 somatic mutation (1.8%) in breast cancer. Also, a recent study shows that biallelic inactivation of BAP1 is rare in breast cancer [12]. Our data show that BAP1 expression is decreased in a subset of UM and breast cancer patients. This suggests that haploinsufficiency of BAP1, likely as part of the 3p21 chromosomal deletion, could be still important in the pathogenesis of breast cancers. BAP1, which is a nuclear-localized tumor suppressor protein, has an ubiquitin carboxy-terminal hydrolase (UCH) domain resulting in its deubiquitinase activity. Although earlier studies suggested that BAP1 function is through modulation of the BRCA1, later studies showed that it is an independent tumor suppressor gene [10]. It has been shown that the deubiquitinating activity and nuclear localization are both required for BAP1-mediated tumor suppression [13]. The tumor suppressor function of BAP1 is through its impact on cell cycle regulation, chromatin remodeling and DNA damage. BAP1, which is expressed in a temporal and spatial pattern during breast development and remodeling, binds to the BRCA1 RING finger motif, and pathogenic mutations in that domain negatively impact BAP1 binding [14]. Additionally, a decrease in the level of BRCA1 protein has been observed in the malignant mesothelioma cells with BAP1 deletion, while transduction of the mutants as well as WT BAP1 results in an increase in the level of BRCA1 protein [15]. BAP1 enhances BRCA1-mediated suppression of cell proliferation through stabilizing BRCA1 [14]. However, BAP1 function is not through deubiquitination of the BRCA1/BARD1 complex.
USP19, which has been mostly studied in skeletal muscle atrophy, is a deubiquitinating enzyme [16]. It has been observed that the depletion of USP19, which is located in close proximity to BAP1 at 3p21.3 region, results in defects in cell cycle progression; USP19-depleted cells over-express KPC1 and have reduced rates of growth [17]. It has been also reported that USP19 inhibits TNFα-induced apoptosis via the stabilization of c-IAPs, which are ubiquitin ligases that regulate the stability of a variety of apoptotic and non-apoptotic proteins [18]. Importantly, a mutation in USP19 and LAMB2 has been observed in 12 patients with UM at high-risk of metastasis [19].
The RNA binding motif protein RBM15B (OTT3), which also located in the same chromosomal region 3p21, has been identified as a binding partner of the Epstein-Barr virus mRNA export factor [20] and found to be a cofactor of the nuclear export receptor NXF1 [21]. Recently, it has been shown that RBM15B is located in a nuclear macromolecular complex through its direct binding to CDK11 p110 -cyclin L complexes and the splicing factor 9G8 [22]. Moreover, RBM15B is a functional competitor of the SR proteins, capable of inhibiting the formation of the spliceosomal E complex and antagonizing the positive effect of the CDK11 p110 -cyclin L2α complex on splicing [22]. Note that expression of the large CDK11 p110 protein kinase isoforms is ubiquitous and constant throughout the cell cycle [23].
Of note, all highly positively correlated genes with BAP1 in UM and breast cancer are located in chromosome 3. In metastatic UM, monosomy of chromosome 3 has been associated with short survival, compared to metastases with disomy or partial change in chromosome 3 [24]. Chromosome 3 has also been reported to play a role in prognosis of some other cancers, including in colon cancer and AML [25,26]. Additionally, loss of chromosome 3 is well known as a poor prognostic factor in primary UM [5]. Importantly, allelic loss of several distinct regions on chromosome 3p including 3p25, 3p21.22, 3p21.3, 3p12.13 and 3p14 is one of the earliest and most frequent genomic abnormalities involved in a wide spectrum of major epithelial cancers including breast and colon [11,27]. Chromosome 3p deletions occur in more than 80% of breast carcinomas [27]. This finding suggests that USP19 and RBM15B could be in a functional pathway of BAP1 or could be altered based on regional loss of chromosome 3p21.
In summary, our findings suggest that BAP1 expression has prognostic significance in both UM and breast cancer. Future work to analyze BAP1 expression in other cancers will be performed. An effort to develop targeted therapeutics for cells deficient in BAP1 expression will be of great translational interest. The role of genes RBM15B and USP19 may be important in BAP1 function and give further insight into the mechanism of BAP1 tumor suppression and additional targets for therapy.

Materials and methods
We downloaded the TCGA transcriptome profiling data (HTSeq-FPKM-UQ files) of patients with UM, invasive breast carcinoma, and colon adenocarcinoma, from the GDC data portal (TCGA-UVM, TCGA-BRCA, and TCGA-CODA). For each of these cancer types, we downloaded the data sets data sets based on the age at diagnosis; we downloaded the data sets in the batches of 10 years interval starting from the minimum age at diagnosis. We divided the breast cancer data set into 2 main categories females and males. Since there were only 12 male patients, we only analyzed the data of female patients. We divided the female breast cancer patients to four sub-groups: black-alive (151 cases, 156 HTSeq-FPKM-UQ files), black-dead (29 cases, 31 HTSeq-FPKM-UQ files), white-alive (636 cases, 705 HTSeq-FPKM-UQ files), white-dead (112 cases, 159 HTSeq-FPKM-UQ files) ( Table 2).
We divided UM and colon cancer patients into four sub-categories female-alive, femaledead, male-alive, and male-dead. Since the race was unknown for approximately half of these data sets, we only divided these two datasets based on gender, vital status, and age at diagnosis to several sub-categorizes. The TCGA-UVM data set includes 88 cases: 40 females and 44 males (Table 1). Since the number of cases in each category was small, we obtained the correlations with BAP1 in the entire UVM data set rather than in each category. We collected the gene expression data from primary tumors of 456 patients with colon adenocarcinoma from TCGA-COAD project. This data set includes 478 HTSeq-FPKM-UQ files (Table 3).
In a recent study, it has been shown that gene expression profiles of patients have similar distributions, while the distribution of the expression value of each gene is different [28]. Furthermore, the genes' expression level is a representation of interaction network of genes, if we normalize the value of each genes separately then we loose information. Thus, the best way to normalize the gene expression data sets from the primary tumors is normalizing each file (transcriptome profiling data of each patient), separately. Therefore, we standardized the gene expression profile of each patient separately to normalize the data. We applied two normalization methods: 1) obtaining z-scores for each patient without excluding any genes and 2) eliminating genes with values of zero for all patients, then obtaining z-scores for each patient. We called the second method n-normalization.
We performed several statistical analyses to see if the expression value of BAP1 is different in alive patients versus the dead ones. We applied the Mann-Whitney-Wilcoxon (MWW), also known as Mann-Whitney U-test, which is a nonparametric test of the null hypothesis that it is equally likely that a randomly selected value from one sample will be less than or greater than a randomly selected value from a second sample [29]. We have also used t-test to investigate whether the means of expression level of BAP1 in two groups of alive and dead patients are statistically different from each other.
In order to obtain the genes, which are highly correlated with BAP1, first we normalized data by obtaining z-scores for each file independently. Then, we calculated the Pearson correlation between BAP1 and all other genes. Specifically, we have a data set [p 1 , � � �, p n ], where n is the number of patients. Each p i is a list of gene expression values, i.e. p i = [g 1 � � �, g m ], where each g j is the expression value of transcript (gene) j, and m is the total number of transcripts (m = 60483). That means the data set is an n × m-dimensional matrix D = [g ij ], where g ij is the expression of transcript j in file (patient) i.
To normalize the data set, we obtained the z-score for each p i . Thus, we got a new matrix D ¼ ½ĝ ij �, whereĝ ij 's are normalized values. Then, for each transcript j, we obtained Pearson correlation between the list ½ĝ ij ; 1 � i � n� and the corresponding list for BAP1 (½ĝ iBAP1 ; 1 � i � n�).