Figure 1.
Receiver operating characteristic (ROC) curves of mono- and di-nucleotides (a), and histogram of α-scores in promoter regions (b).
(a) Several di-nucleotides act as positive or negative classifiers that distinguish 1,746 experimentally confirmed promoters of genes active in human cerebellum tissue from random sequences; CpG, GpC and CpC/GpG are the strongest positive predictors, and ApT, TpA and ApA/TpT the strongest negative ones. Also shown are G/C content and the α-scores summed over the promoter regions. (b) Distribution of regions over the sums of α-scores: 1,746 experimentally confirmed promoters of genes active in human cerebellum tissue in red (i), promoters chosen from the UCSC gene set [24] in blue (ii), randomly chosen sequences as negative control in green (iii), and the respective overlaps between the distributions in purple, dark chartreuse and black.
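As an illustration of how a single di-nucleotide can act as a positive or negative classifier in panel (a), the following minimal sketch scores each sequence by the frequency of one di-nucleotide (pooled with its reverse complement, as in the CpC/GpG pairing) and computes the area under the ROC curve. The function names and the toy sequences are hypothetical and are not taken from the study; the actual scoring and data set may differ. An area above 0.5 indicates a positive predictor and an area below 0.5 a negative one.

```python
# Minimal sketch: ROC area for a single di-nucleotide used as a classifier.
# All names and toy sequences are illustrative assumptions, not the study's data.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def dinucleotide_frequency(seq, dinuc):
    """Fraction of overlapping di-nucleotide positions that match `dinuc`
    or its reverse complement (e.g. CC is pooled with GG)."""
    seq = seq.upper()
    targets = {dinuc, reverse_complement(dinuc)}
    positions = len(seq) - 1
    if positions <= 0:
        return 0.0
    hits = sum(1 for i in range(positions) if seq[i:i + 2] in targets)
    return hits / positions


def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve, computed as the probability that a random
    positive scores higher than a random negative (ties count as one half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy data (hypothetical): two "promoter-like" and two "random" sequences.
promoters = ["CGCGCGAT", "GCGCCGCG"]
randoms = ["ATATTAAT", "TTAACGAT"]

for dinuc in ("CG", "AT"):
    pos = [dinucleotide_frequency(s, dinuc) for s in promoters]
    neg = [dinucleotide_frequency(s, dinuc) for s in randoms]
    print(dinuc, roc_auc(pos, neg))
```

On this toy input, CpG separates the two groups perfectly, while ApT falls well below 0.5, that is, it behaves as a negative predictor.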
Figure 2.
Human promoter CTAG1A and modified constructs.
(a) The 535 bp promoter region of the human gene CTAG1A is rich in CpGs and exhibits α-scores higher than the genomic distribution, with pronounced peaks. Shown are the composite αk-scores (top), the individual αk-scores for different sizes of k (middle; colour coded, blue = negative, red/orange = positive), and CpGs in yellow (bottom). The three strongest regions are marked by red bars. (b) In-vitro activity of the original CTAG1A promoter (hCTAG1A Promoter), the promoter with the three strongest α-score regions deleted (hCTAG1A delta), with the three strongest α-score regions replaced by sequences from the genomic concatemer (hCTAG1A replace), and with the three strongest α-score regions replaced by sequences from the promoter-like concatemer (hCTAG1A UP). Also shown are results without any promoter (Negative CO) and with the SV40 core promoter (SV40 Promoter AVG).
Figure 3.
In-vitro promoter activity driven by artificial constructs.
Artificial constructs ArS110, ArS300, ArS201 and ArS232 exhibit strong promoter activity driving a reporter gene (firefly luciferase, internally normalized by Renilla luciferase) in mammalian cell lines: (a) CHO/hamster, (b) P19/mouse, (c) VERO/monkey, (d) HEK293/human, but not in (e) the insect cell line Sf9/armyworm. Also shown are the negative control (−) and the SV40 core promoter activity (+). (f) TATA-boxes 1 (left) and 2 (right) were deleted from construct ArS232: deletion of TATA-box 1 only (dT1) results in a lack of activity, deletion of TATA-box 2 only (dT2) does not change expression levels, and deletion of both (dT1&2) results in slightly increased expression levels.
Figure 4.
Binding affinity of artificial promoter constructs to the transcription factors TFIIB and TBP.
Binding, expressed as Δnm (y-axis), was monitored in real time (x-axis, in seconds) using the ForteBio Octet QK instrument. Binding was conducted in four phases: (i) loading of biotinylated DNA fragments onto the streptavidin biosensor tip, (ii) washing in Kinetics Buffer, (iii) association of the transcription factor and (iv) dissociation of the transcription factor. (a) The promoter constructs ArS110, ArS201, ArS232 and ArS300 show similar binding affinities for the TFIIB protein. (b) The promoter constructs ArS232, ArS232 dT1, ArS232 dT2 and ArS232 dT12 exhibit sequence-specific binding to the TBP protein. ArS232 dT12, which lacks both TATA-boxes, shows the lowest binding affinity of these constructs. (c) TFIIB binding vs. a negative control, for which we chose an 85 bp long sequence from inside the coding region of the luciferase gene (pGL3-Basic Promoter Promega: 1314 bp–1399 bp).
Table 1.
Calculation of the binding constants kon, koff, and KD for the TFIIB binding assays.
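For orientation, the three constants in Table 1 are related under a simple 1:1 binding model by KD = koff / kon. The sketch below assumes that model; the function name and the numerical values are hypothetical and are not taken from Table 1.

```python
# Minimal sketch, assuming a 1:1 binding model: K_D = k_off / k_on.
# The rate constants below are hypothetical placeholders, not measured values.


def dissociation_constant(k_on, k_off):
    """Equilibrium dissociation constant K_D (in M) from the association
    rate k_on (in 1/(M*s)) and the dissociation rate k_off (in 1/s)."""
    if k_on <= 0:
        raise ValueError("k_on must be positive")
    if k_off < 0:
        raise ValueError("k_off must be non-negative")
    return k_off / k_on


# Hypothetical example: k_on = 1e5 1/(M*s), k_off = 1e-3 1/s
kd = dissociation_constant(1e5, 1e-3)
print(f"K_D = {kd:.2e} M")  # about 1e-8 M, i.e. roughly 10 nM
```

A smaller KD corresponds to tighter binding, so constructs can be compared directly on this value when kon and koff are fitted under the same model.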