Distinct Cytoplasmic and Nuclear Functions of the Stress Induced Protein DDIT3/CHOP/GADD153

DDIT3, also known as GADD153 or CHOP, encodes a basic leucine zipper transcription factor of the dimer-forming C/EBP family. DDIT3 is known as a key regulator of the cellular stress response, but its target genes and functions are not well characterized. Here, we applied a genome-wide microarray-based expression analysis to identify DDIT3 target genes and functions. By analyzing cells carrying tamoxifen-inducible DDIT3 expression constructs, we show distinct gene expression profiles for cells with cytoplasmic and nuclear localized DDIT3. Of 175 target genes identified, only 3 were regulated by DDIT3 in both cellular localizations. More than two thirds of the genes were downregulated, supporting a role for DDIT3 as a dominant negative factor that could act by either cytoplasmic or nuclear sequestration of dimer-forming transcription factor partners. Functional annotation of target genes showed cell migration, proliferation and apoptosis/survival as the most affected categories. Cytoplasmic DDIT3 affected more migration-associated genes, while nuclear DDIT3 regulated more cell cycle controlling genes. Cell culture experiments confirmed that cytoplasmic DDIT3 inhibited migration, while nuclear DDIT3 caused a G1 cell cycle arrest. Promoters of target genes showed no common sequence motifs, reflecting that DDIT3 forms heterodimers with several alternative transcription factors that bind to different motifs. We conclude that expression of cytoplasmic DDIT3 regulated 94 genes. Nuclear translocation of DDIT3 regulated 81 additional genes linked to functions already affected by cytoplasmic DDIT3. Characterization of DDIT3-regulated functions helps in understanding its role in stress response and its involvement in cancer and degenerative disorders.


Supporting Materials and Methods S2
Permutation test for TFBS enrichment
The test to detect enrichment of transcription factor binding sites (TFBS) among the regulated genes was based on a weighted statistic, with significance assessed by permutation. We assume that we have expression values for a set of genes in two conditions. The genes are ranked for differential expression using, for example, the log2-fold change or the moderated t-statistic. These gene-level statistics are denoted by d_g. We also have a set of scores for the occurrence of motifs in the promoter of each gene. The indicator I_gj equals 1 if gene g contains motif j in its promoter and 0 otherwise. We use a weighted test statistic u with weights w_gj. The weights map the gene-level statistics to values between 0 and 1: a gene that is highly differentially expressed receives a weight close to 1, and any other gene receives a weight close to 0. We use a logistic curve for the weights, for which the location and scale parameters can be varied according to the gene expression data.
If a motif is present in the promoter of several differentially expressed genes, the weights will be closer to 1 for these genes, and the test statistic u should be large. Conversely, if a motif is rarely seen in the promoters of the differentially expressed genes, it results in a small value of u.
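The explicit form of the statistic is not reproduced in this text. One form consistent with the description above, given here only as an assumption, is a weighted sum of the motif indicators with logistic weights. The symbols a (location) and b (scale) are hypothetical names introduced for this sketch, and the weight is assumed to depend on the gene only through d_g.

```latex
% Assumed form of the weighted statistic for motif j (not taken from the source)
u_j = \sum_{g} w_{gj}\, I_{gj},
\qquad
w_{gj} = \frac{1}{1 + \exp\!\left(-\dfrac{d_g - a}{b}\right)}
```

Under this assumption, a small b makes the curve switch sharply from 0 to 1 around d_g = a, so that essentially only genes with d_g above a contribute to u_j.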
The significance of the association between motif occurrence and high differential expression is tested by permuting the indicators I_gj. The motif occurrence is permuted 1000 times and the value of the test statistic u is calculated for each permutation. The p-value for enrichment of motif j among the differentially expressed genes is calculated from the permutation distribution, where u_p denotes the value of the statistic in permutation p.
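The permutation procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the paper: the statistic is assumed to be the sum of logistic weights over the genes that carry the motif, and the p-value is assumed to be the fraction of permutations whose statistic is at least as large as the observed one. The function names `logistic_weights` and `motif_enrichment_pvalue` are hypothetical.

```python
import numpy as np


def logistic_weights(d, location, scale):
    """Map gene-level statistics d to weights in [0, 1] with a logistic curve.

    Uses the identity 1 / (1 + exp(-x)) = 0.5 * (1 + tanh(x / 2)),
    which avoids overflow for large |x|.
    """
    x = (np.asarray(d, dtype=float) - location) / scale
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def motif_enrichment_pvalue(d, has_motif, location, scale, n_perm=1000, seed=0):
    """Return (u_obs, p) for one motif.

    ASSUMPTIONS (not confirmed by the source text):
      * u is the sum of the weights over genes that contain the motif;
      * p is the fraction of permutations with u_p >= u_obs.
    """
    rng = np.random.default_rng(seed)
    w = logistic_weights(d, location, scale)
    indicator = np.asarray(has_motif, dtype=float)

    u_obs = float(np.sum(w * indicator))

    # Permute the motif indicator across genes; the weights stay fixed.
    u_perm = np.array(
        [np.sum(w * rng.permutation(indicator)) for _ in range(n_perm)]
    )
    p = float(np.mean(u_perm >= u_obs))
    return u_obs, p


# Toy usage: absolute gene-level statistics for 500 hypothetical genes.
demo_rng = np.random.default_rng(1)
d_demo = np.abs(demo_rng.normal(size=500))
loc_demo = np.quantile(d_demo, 0.80)   # location at the 80% quantile
motif_demo = d_demo > loc_demo         # motif placed on the top-ranked genes
u_demo, p_demo = motif_enrichment_pvalue(d_demo, motif_demo, loc_demo, scale=0.1)
```

In the toy usage the motif is deliberately placed on the most differentially expressed genes, so the observed statistic lies in the upper tail of the permutation distribution and the p-value is small.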
The values of the location and scale parameters in the weight function have to be chosen by the user, but we recommend setting the location parameter to roughly the 80% quantile of the gene-level statistics.
We compared our method with another common permutation procedure, Gene Set Enrichment Analysis (GSEA) [1], using a previously described simulation study [2]. Briefly, the expression of 600 genes in 20 samples was simulated using a multivariate normal distribution (all with variance 1). The background set comprised 520 genes, simulated with mean µ = 0 and correlation ρ = 0. The remaining 80 genes were simulated with combinations of the means µ = (0.75, 1, −1) and correlations ρ = (0, 0.6, −0.6). Nine sets were used to test the enrichment methods, of which sets 1, 2, 6, and 7 should be detected by any well-working method, and sets 4, 5, 8, and 9 should ideally also be detected (although only half of the genes were differentially expressed in these sets). Set 3 serves as a negative control [2]. We simulated 100 data sets, ranked the genes in each data set by absolute log2-fold change as well as by the absolute moderated t-statistic, and tested each method on these sets. Our method was tested with three different values of the location parameter, corresponding to the 75th, 80th, and 85th percentiles of the gene-level statistics. The scale parameter was set to 0.1.
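The simulated data can be illustrated with a simplified Python sketch. It is not the simulation design of reference 2: it assumes two groups of 10 samples each, places a single mean shift µ on all 80 non-background genes in the second group, and uses one common positive correlation ρ among those genes. The nine gene sets and the negative-correlation case are not reproduced. The function name `simulate_expression` is hypothetical.

```python
import numpy as np


def simulate_expression(n_genes=600, n_set=80, n_samples=20,
                        mu=0.75, rho=0.6, seed=0):
    """Simulate a genes x samples matrix with unit-variance normal values.

    ASSUMPTIONS (illustrative only): samples are split into two equal
    groups; the last n_set genes share pairwise correlation rho (rho >= 0)
    and are shifted by mu in the second group; all other genes are
    independent with mean 0.
    """
    rng = np.random.default_rng(seed)
    n_bg = n_genes - n_set

    # Background genes: independent N(0, 1).
    x = rng.standard_normal((n_genes, n_samples))

    # Non-background genes: compound-symmetric covariance, variance 1.
    cov = np.full((n_set, n_set), rho)
    np.fill_diagonal(cov, 1.0)
    x[n_bg:] = rng.multivariate_normal(np.zeros(n_set), cov, size=n_samples).T

    # Mean shift in the second group of samples.
    group = np.repeat([0, 1], n_samples // 2)
    x[n_bg:, group == 1] += mu
    return x, group


# Gene-level statistic: absolute difference of group means, which equals
# the absolute log2-fold change if the values are taken to be on a log2 scale.
x_sim, group_sim = simulate_expression()
lfc = x_sim[:, group_sim == 1].mean(axis=1) - x_sim[:, group_sim == 0].mean(axis=1)
d_sim = np.abs(lfc)
location_sim = np.quantile(d_sim, [0.75, 0.80, 0.85])  # candidate location values
```

The three quantiles in `location_sim` correspond to the 75th, 80th, and 85th percentiles of the gene-level statistics, which is how the location values were chosen in the comparison.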
We observe that the permutation test performs slightly better than GSEA on all data sets. The results seem to be quite robust to the choice of the location parameter. The scale parameter can also be varied, and influences how sharply the logistic curve switches from values close to zero to values close to one. The results are also quite robust to the choice of this parameter (data not shown), but we recommend values in the range 0.05-0.2. According to our simulations, a good choice for the location parameter is in the range given above (75th-85th percentiles of the gene-level statistics).
Our permutation method is very easy to implement, and the better performance of our statistic u compared with the running sum in GSEA is probably due to the fact that our statistic is not sensitive in the same way to the absolute gene ranking. Although the extra location and scale parameters have to be chosen, our method offers more versatility in how the expression values are allowed to influence the results (we can choose to use only highly differentially expressed genes, or be more liberal and allow genes with moderate expression values to also influence the statistic). We can choose to apply the weights (the logistic curve) to absolute values of the gene-level statistics, or to the original values.
For the motif enrichment p-values presented in the paper (see Table S4 in supplemental material), we ranked the genes by absolute log2-fold change and chose the values 0.75 for the location parameter and 0.1 for the scale parameter. We also tested the method on the downregulated genes, with similar negative results (data not shown).
Table 2: Enrichment results from our permutation test. The values correspond to the proportion of p-values < 0.05 in the 100 data sets. The location parameter was set to 0.6, 0.68, and 0.77 for the data ranked by log2-fold change and to 1.35, 1.52, and 1.73 for the data ranked by the moderated t-statistic. The scale parameter was set to 0.1.