Mammalian TBX1 Preferentially Binds and Regulates Downstream Targets Via a Tandem T-site Repeat

Haploinsufficiency or mutation of TBX1 is largely responsible for the etiology of physical malformations in individuals with velo-cardio-facial/DiGeorge syndrome (VCFS/DGS/22q11.2 deletion syndrome). TBX1 encodes a transcription factor protein that contains an evolutionarily conserved DNA binding domain termed the T-box that is shared with other family members. All T-box proteins, examined so far, bind to similar but not identical consensus DNA sequences, indicating that they have specific binding preferences. To identify the TBX1 specific consensus sequence, Systematic Evolution of Ligands by Exponential Enrichment (SELEX) was performed. In contrast to other TBX family members recognizing palindrome sequences, we found that TBX1 preferentially binds to a tandem repeat of 5′-AGGTGTGAAGGTGTGA-3′. We also identified a second consensus sequence comprised of a tandem repeat with a degenerated downstream site. We show that three known human disease-causing TBX1 missense mutations (F148Y, H194Q and G310S) do not alter nuclear localization, or disrupt binding to the tandem repeat consensus sequences, but they reduce transcriptional activity in cell culture reporter assays. To identify Tbx1-downstream genes, we performed an in silico genome wide analysis of potential cis-acting elements in DNA and found strong enrichment of genes required for developmental processes and transcriptional regulation. We found that TBX1 binds to 19 different loci in vitro, which may correspond to putative cis-acting binding sites. In situ hybridization coupled with luciferase gene reporter assays on three gene loci, Fgf8, Bmper, Otog-MyoD, show that these motifs are directly regulated by TBX1 in vitro. Collectively, the present studies establish new insights into molecular aspects of TBX1 binding to DNA. 
This work lays the groundwork for future in vivo studies, including chromatin immunoprecipitation followed by next generation sequencing (ChIP-Seq) to further elucidate the molecular pathogenesis of VCFS/DGS.

Introduction
T-box genes encode a large family of transcription factors that are required during embryonic development. Brachyury, the founding member of this family, was first identified due to the presence of a short tail phenotype found in heterozygous mice and lack of axial development with early lethality in null mutant embryos [1][2][3]. Brachyury has an evolutionarily conserved DNA binding domain, termed the T-box, and can regulate transcription of a reporter gene in cell culture [4,5]. Since the original discovery of Brachyury, nineteen different T-box genes have been identified and are evolutionarily conserved from flies to humans [6][7][8]. Most T-box genes are dispersed on different chromosomes. They are classified based upon sequence homology to each other and are members of five different subfamilies [6][7][8]. As for Brachyury, most T-box transcription factors are required for embryonic development and many are sensitive to altered gene dosage for biological function. The T-box family has received notoriety as mutations have been identified in the etiology of several congenital malformation disorders. For example, mutations in TBX3 lead to Ulnar Mammary Syndrome, and mutations in TBX5 cause Holt-Oram Syndrome, both of which present with disease-specific limb and heart defects [9,10], among others [11].
All T-box family members share an evolutionarily conserved DNA binding domain comprising approximately 180 amino acids. The Brachyury protein binds as a homodimer to a palindrome of two AGGTGTGA "half-sites" [4]. Brachyury can also bind as a monomer to a single half-site, but with 20 fold lower binding affinity [12]. Molecular biological methods have been used to identify the consensus sequence for other T-box proteins and most can bind to the Brachyury palindrome or half-site [4,12,13], but they have their own preferential binding site, as in the case of TBX5, TBX6, TBX15 and TBX18 [14][15][16]. Among other T-box proteins tested, Brachyury, TBX15 and Eomes can bind to a direct repeat [16][17][18].
The TBX1 gene encodes a T-box transcription factor that maps to the 22q11.2 region, which is hemizygously deleted in individuals with velo-cardio-facial syndrome and DiGeorge syndrome (VCFS/DGS; MIM #: 192430/188400). Since most have a typical 3 million base pair deletion, it is also referred to as 22q11.2 deletion syndrome (22q11DS). Historically, TBX1 was found to bind to the palindromic T-site, but unlike other transcription factors, it did not significantly activate or repress transcription of a reporter construct [12]. Heterozygous mutations in TBX1 have been reported in rare non-deleted patients with physical defects related to those of VCFS/DGS. It is believed that these are loss of function mutations resulting in haploinsufficiency [19][20][21].
To expand the repertoire of direct transcriptional downstream target genes, we performed Systematic Evolution of Ligands by Exponential Enrichment (SELEX) to identify the mouse TBX1 consensus site [4][37][38][39][40]. We found that TBX1 binds to two different consensus sequences, one that is a perfect tandem repeat of the Brachyury half-site and the other that is an imperfect tandem repeat. TBX1 can activate transcription via these novel sites in luciferase reporter assays in cell culture. Using these new consensus TBX1 sites, we found that the TBX1 mutations previously reported alter transcriptional activity. Next, we wanted to use the new consensus sequences to identify potential downstream transcriptional target genes. After performing an in silico genome wide search for these motifs, we tested 27 and validated 19 putative direct binding sites, including sites in the Fgf8, Bmper and Otog-MyoD genomic loci. These and others are strong candidates to be pursued as direct downstream targets in future in vivo functional experiments.

Ethics Statement
Animal studies were carried out in strict accordance with the recommendations in the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health. The protocol was approved by the Albert Einstein College of Medicine Animal Institute Committee (Protocol Number: 2013-0405; Protocol Name: Mouse Models of 22q11 Rearrangement Disorders). All embryo dissections were conducted after euthanizing mice by direct inhalation of CO2.

Recombinant GST-TBX1 Fusion Protein
The T-box region (amino acids 90-303) of mouse Tbx1 was PCR amplified from cDNA with the flanking restriction enzyme sites of EcoRI and XhoI. These sites were used to subclone the DNA fragment into the bacterial expression vector, pGEX4t3 (GE Healthcare), to generate a GST-TBX1 fusion protein. The vector was transformed into BL21(DE3)LysS competent cells (Stratagene) and grown on LB ampicillin agar plates. Colonies were picked and grown in 2XYT media, 10 mg/ml Amp, 1 M MgCl2 and 20% glucose. Cultures were grown at 29°C and protein expression was induced by the addition of 100 mM IPTG (isopropyl-beta-D-thiogalactopyranoside). After induction with IPTG, protein was detected via Coomassie blue staining and the fusion protein was subsequently purified with glutathione Sepharose 4B beads (GE Healthcare) and detected via western blot. The same protocol was followed when inducing the F148Y, H194Q and G310S mutated TBX1 proteins.

In vitro Selection (SELEX)
A 76-mer single-stranded library of oligonucleotides, 5′-GTAACGTCGAGACGGAATTCGCGGCCGCN18CTCGAGGATCCGTGCTCAGTCCCTATCG-3′, in which a random 18-mer sequence (N18) is flanked by two 28-mer fixed sequences used for sequencing, was synthesized by Fisher Scientific (HPLC purification) as previously described [41]. The second strands were generated using Klenow enzyme (NEB) at 25°C for 3 hrs with the primer 5′-CGATAGGGACTGAGCACGGATCCCT-3′. The dsDNA samples were separated on a 4.5% UltraPure Agarose 1000 gel (Invitrogen) and purified with the Qiaquick Gel Extraction Kit (Qiagen). PCR was performed to amplify the dsDNA products. After six rounds of selection with recombinant GST-TBX1 protein, the PCR products from round 0 (original dsDNA randomers), two, four and six were labeled with [α-32P] dCTP (PerkinElmer, Cat# NEG513H250UC) by Taq DNA polymerase. Oligonucleotides from each round were captured using glutathione Sepharose beads (GE Healthcare). Each labeled round of oligonucleotides was tested via EMSA to determine the round with the highest enrichment. Amino acids 90-303 were digested from a plasmid containing full length Tbx1 cDNA (Tbx1-pCDNA3.1) and subcloned into the pGEX4t3 vector for bacterial induction (Figure S1a). Protein induction by IPTG was detected via Coomassie blue staining as well as by western blot analysis. The SELEX procedure was carried out for six rounds, enriching the pool for the oligonucleotides with the highest binding affinity with each subsequent round. The PCR products from round six were cloned into the pSC-A vector for sequencing. In total, 60 colonies were picked, and plasmid DNA was extracted and subjected to Sanger sequencing (Einstein Genomics Core Lab). The sequences were aligned using the WebLogo program (http://weblogo.berkeley.edu) and two motifs were generated.

Western Blot
Proteins that were induced by IPTG and visualized with Coomassie staining were also tested via western blot with specific antibodies. Proteins were denatured in 6x Laemmli loading buffer at 95°C for 5 minutes. Samples were then loaded onto a 10% acrylamide/bisacrylamide gel, run for 1 hour at 120 volts and subsequently transferred onto a PVDF membrane (BioRad). Antibodies used were: primary (1°), rabbit polyclonal anti-mouse Tbx1, 1:500 (Zymed), and rabbit polyclonal anti-GST, 1:500 (Abcam); secondary (2°), ECL donkey anti-rabbit IgG, horseradish peroxidase linked whole antibody, 1:10,000 (Amersham Biosciences).

Reporter Constructs
Two oligonucleotides containing six copies in tandem of the tandem repeat, AGGTGTGAAGGTGTGA (6x TR), and of the half site partial site, AGGTGTGATCGCGTCAT (6x KSPS), respectively, were synthesized by Genscript. The 6x tandem oligonucleotides were digested with XhoI and NheI and subcloned into a pGL3-promoter vector (Promega) [43]. Each binding site was separated from the next by a spacer of two random nucleotides (6xTR-pGL3p and 6xKSPS-pGL3p). After testing a concentration gradient of full length Tbx1-pCDNA3.1, we determined that 100 ng of this plasmid led to the highest fold change (27-fold) when compared to the control experiment where the empty activating vector (Empty pCDNA3.1) was co-transfected with the reporter. Mutated binding sites were also synthesized by Genscript. The AGGTGTGA sequence was mutated to AATTTTGA (Mutated 6xTR-pGL3p and Mutated 6xKSPS-pGL3p) [31]. In addition, mutations at positions P8 (A→T), P11 (G→C) and P13 (G→T) in the KSPS were also generated (P8 6xKSPS-pGL3p; P11 6xKSPS-pGL3p; P13 6xKSPS-pGL3p).
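The layout of the multimerized reporter inserts can be illustrated with a short sketch. It assumes only what is stated above (six copies of a site, two random spacer nucleotides between copies, and the AGGTGTGA to AATTTTGA mutation); the function names and the fixed random seed are hypothetical, the flanking XhoI/NheI sites are omitted, and the actual synthesized spacer sequences are not given in the text.

```python
import random

HALF_SITE = "AGGTGTGA"
MUTATED_HALF_SITE = "AATTTTGA"
TR = HALF_SITE * 2               # AGGTGTGAAGGTGTGA
KSPS = "AGGTGTGATCGCGTCAT"       # half site followed by the partial site


def build_tandem_insert(site, copies=6, spacer_len=2, seed=0):
    """Join `copies` of `site`, separated by random spacers (illustrative only)."""
    rng = random.Random(seed)    # fixed seed so the sketch is reproducible
    parts = []
    for i in range(copies):
        parts.append(site)
        if i < copies - 1:       # spacers go between sites, not after the last
            parts.append("".join(rng.choice("ACGT") for _ in range(spacer_len)))
    return "".join(parts)


def mutate_half_sites(site):
    """Replace every intact AGGTGTGA half site with AATTTTGA."""
    return site.replace(HALF_SITE, MUTATED_HALF_SITE)


if __name__ == "__main__":
    print(build_tandem_insert(TR))
    print(build_tandem_insert(mutate_half_sites(KSPS)))
```

Note that the mutation changes both half sites of the TR but only the 5′ half site of the KSPS, because the 3′ partial site does not contain an intact AGGTGTGA.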

Site Directed Mutagenesis
The QuikChange Lightning Site-Directed Mutagenesis Kit (Agilent Technologies) was used to generate mutations in TBX1. Full length Tbx1-pCDNA3.1 was used as a template for the PCR reaction. Primers were designed with the nucleotide changes: F148Y Sense, 5′-CCCCACGTTCCAAGTGAAGCTTATGGAATGGATCC-3′; H194Q Sense, 5′-CTGGCCGAGTACAGTACCACCCGGACT-3′; G310S Sense, 5′-AACCACCGGCCCAGTGCGCTGCCGCTC-3′. After DpnI digestion, plasmids were transformed into XL10-Gold Ultracompetent cells and plated overnight on LB-ampicillin plates, per the instructions supplied with the kit. Colonies were picked and grown in liquid culture overnight. Plasmid DNA was isolated using the Qiagen mini-prep kit and subjected to Sanger sequencing.

Bioinformatic Analysis
An in-house bioinformatic program was created to search the mouse genome for the consensus T-box motif, AGGTG(T/C)(G/T)A, identified by the SELEX experiment. These sites were then compared to a list of the most conserved elements produced by the phastCons database based on whole-genome alignment of placental mammals [44] from the UCSC genome browser (http://genome.ucsc.edu). Motifs were then assigned to the nearest RefSeq genes and were then grouped based on conservation and distance to transcriptional start sites (TSS), defined as -100 kb to +1 kb and -1 kb to +100 kb. Gene ontology software tools, GREAT (Genomic Regions Enrichment of Annotations Tool; http://bejerano.stanford.edu/great/public/html/index.php) and DAVID (http://david.abcc.ncifcrf.gov), were used to generate functional groups of genes harboring the motifs, by inputting the chromosomal positions of the putative TBX1 motifs.
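The in-house program itself is not described further, so the following is only a minimal sketch of how a scan for the degenerate half site AGGTG(T/C)(G/T)A could be written. It assumes that both strands are searched and that overlapping matches should be reported; the function names are hypothetical and coordinates are 0-based positions on the forward strand.

```python
import re

# Degenerate half site AGGTG(T/C)(G/T)A; the lookahead allows overlapping hits.
HALF_SITE_PATTERN = re.compile(r"(?=(AGGTG[TC][GT]A))")
MOTIF_LEN = 8
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_half_sites(seq):
    """Return sorted (start, strand, matched_sequence) tuples for both strands."""
    seq = seq.upper()
    n = len(seq)
    hits = [(m.start(1), "+", m.group(1)) for m in HALF_SITE_PATTERN.finditer(seq)]
    for m in HALF_SITE_PATTERN.finditer(reverse_complement(seq)):
        # Convert the position on the reverse strand back to forward coordinates.
        start = n - (m.start(1) + MOTIF_LEN)
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    # The Brachyury palindrome carries one half site on each strand.
    for hit in find_half_sites("TCACACCTAGGTGTGAA"):
        print(hit)
```

In a genome-wide setting the same function would be applied chromosome by chromosome, and the resulting coordinates intersected with the conserved-element intervals.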

Whole-Mount RNA In Situ Hybridization
Embryos were fixed in 4% paraformaldehyde overnight at 4°C. The embryos were then serially dehydrated to 100% methanol and stored at -20°C. On day 1 of the protocol, the embryos were rehydrated to 1x PBS/0.01% Tween-20 and the in situ hybridization assay was carried out as previously described (Franco et al., 2001). Anti-sense digoxigenin labeled RNA probes to Tbx1 and Fgf8 [45] were generated from plasmids via standard methods. The Bmper probe was generated from templates amplified from E10.5 mouse cDNA using the following primers: Fwd (5′-AGTCCTTGACTTGGCTTATC-3′); Rev (5′-GCACTTGGACATTATACTTGC-3′). Each RNA template was created by PCR with a T3 RNA polymerase binding site at the 5′ end and a T7 RNA polymerase binding site at the 3′ end. Embryos were dissected at E9.5 and E10.5. Mice were maintained in a 12 hour dark/12 hour light cycle in compliance with the Albert Einstein College of Medicine of Yeshiva University Institutional Animal Care and Use Committee (IACUC).

SELEX Identifies the Optimal Binding Sites of GST-TBX1
To determine the optimal binding site of mouse TBX1, an in vitro selection method termed Systematic Evolution of Ligands by Exponential Enrichment (SELEX) was performed. We created a GST-TBX1 fusion protein containing the T-box DNA binding domain [12] and ten amino acids on either side (90-303) (Fig. S1). The GST-TBX1 protein was able to bind to the Brachyury palindromic sequence as determined by electrophoretic mobility shift assay (EMSA) (Fig. S1). The validated GST-TBX1 protein was then subjected to the in vitro SELEX selection method and after six rounds of selection, we identified a clearly distinguishable protein-DNA complex (Fig. 1A). Both protein-DNA complexes were specific because binding of the radiolabeled oligonucleotides was competed with non-radiolabeled oligonucleotides obtained from round six of selection (Fig. 1B). We found two different sized protein-DNA complexes in gel assays, suggesting that TBX1 binds in two conformations (Fig. 1B). A total of 60 separate bacterial clones containing enriched oligonucleotides were sequenced to obtain a consensus sequence. Among them, 55 sequences were selected and aligned to generate two 16 bp DNA consensus sequences, containing one or two repeated GTGT "core" motifs in a tandem orientation (Figs. 1C,D). The GTGT core motif is part of the consensus binding site for T-box protein family members, indicating overlap between the TBX1 motif and that of other members. One consensus closely resembles a tandem repeat of the Brachyury half-site, AGGTGTGA. We termed this motif the TBX1-TR (Fig. 1C). The second consensus sequence is also a tandem repeat; however, the 3′ site consists of a degenerate sequence with only two highly conserved positions (13 and 16, Fig. 1C). We termed the second consensus sequence TBX1-KSPS (half site partial site) (Fig. 1C). In total, 62% of 55 sequences closely resembled the TR site and 34% had the KSPS site consensus (Fig. 1D).
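Deriving a consensus from aligned SELEX clones amounts to counting bases column by column, which is also what a sequence logo displays. The sketch below shows that step under simple assumptions: the four toy sequences are invented for illustration and are not the 55 sequenced clones, and the function names are hypothetical rather than part of WebLogo.

```python
from collections import Counter


def position_counts(aligned):
    """Count the bases observed at each column of equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    return [Counter(seq[i] for seq in aligned) for i in range(length)]


def consensus(aligned):
    """Return the most frequent base at each aligned position."""
    return "".join(col.most_common(1)[0][0] for col in position_counts(aligned))


if __name__ == "__main__":
    # Hypothetical 16 bp clones, invented for this example only.
    toy_clones = [
        "AGGTGTGAAGGTGTGA",
        "AGGTGTGAAGGTGTGA",
        "AGGTGCGAAGGTGTTA",
        "AGGTGTGATGGTGTGA",
    ]
    print(consensus(toy_clones))
```

Positions where one base dominates the column correspond to the tall letters in a logo, while degenerate positions, such as most of the 3′ site of the KSPS, show no clear majority.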

Two Identified Motifs are Specifically Bound by GST-TBX1
To test for binding specificity, oligonucleotides were designed that contained a single copy of the TR and KSPS, where each nucleotide corresponds to the most highly selected base at each position: TR: 5′-AGGTGTGAAGGTGTGA-3′ and KSPS: 5′-AGGTGTGATCGCGTCAT-3′. The GST-TBX1 protein was able to bind to both motifs by EMSA (Fig. 2A). Two concentrations of protein were tested, showing the same proportion of protein-DNA complexes (Fig. 2A). Binding was competed with 100X excess non-radiolabeled oligonucleotide of the same sequence. After exposing the film for an extended period of time, a second more slowly migrating protein-DNA complex appeared that was similar to that present in Fig. 2A (data not shown). The shifted protein-DNA complex using the KSPS appeared to be weaker in intensity on the gel as compared to the TR (Fig. 2A). In the same experiment, the 8 bp half-site (T-site), 5′-AGGTGTGA-3′, and the Brachyury palindrome, 5′-TCACACCTAGGTGTGAA-3′, were also tested. A protein-DNA complex was never observed when GST-TBX1 was incubated with the half-site, and only after overexposing the film for 24 hours was a protein-DNA complex observed with the full Brachyury palindrome (data not shown). To further test the specificity of the newly derived motifs, gradients of both poly(dI-dC) and specific cold-competitor were generated and tested by EMSA (Fig. S2A). As the concentration of poly(dI-dC) increased, the binding intensity decreased; however, the protein-DNA complex was located at the same position in the gel, indicating that specific binding occurred (Fig. S2A).

TBX1 Transcriptionally Activates Reporter Genes
To test whether TBX1 could activate transcription of a reporter by binding to the newly identified consensus sequences, we performed luciferase assays using Jeg3 cells. Jeg3 cells have successfully been used previously to test TBX1 activation of reporter constructs harboring endogenous gene loci, suggesting that they have the necessary co-factors for TBX1 to bind and regulate transcription [21]. Reporter constructs containing six tandem copies of the TR or KSPS were generated and tested in luciferase assays (6xTR-pGL3p and 6xKSPS-pGL3p). After testing the full length TBX1 protein in pCDNA3.1 at varying concentrations, we found that 100 ng of the expression vector yielded the highest fold change (27 fold) when compared to empty pCDNA3.1 vector (data not shown). Based upon this, we used 100 ng of the Tbx1-pCDNA3.1 construct for all subsequent luciferase assays. The consensus sites and mutated versions of these sites (AGGTGTGA to AATTTTGA) [31] were evaluated simultaneously in the same experiment. As a control, for each binding site reporter construct, the Tbx1-pCDNA3.1 transfection was compared to the simultaneous transfection of empty pCDNA3.1 vector. The mutated 6xTR-pGL3p showed a dramatic decrease in transcriptional activation when compared to the wild-type (WT) reporter construct (Fig. 2B). The 6xKSPS-pGL3p also showed activation in the presence of Tbx1-pCDNA3.1 and this activation was only partially disrupted when the binding site was mutated (Fig. 2B).
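The fold changes reported for the reporter assays are ratios of mean luciferase activity with Tbx1-pCDNA3.1 to mean activity with the empty vector. A minimal sketch of that calculation follows, together with the two-sample Student's t statistic (equal variances assumed). The replicate values are invented for illustration, the function names are hypothetical, and converting the t statistic to a p-value would additionally require the t distribution, which is not implemented here.

```python
from math import sqrt
from statistics import mean, variance


def fold_activation(with_tbx1, empty_vector):
    """Ratio of mean reporter activity with TBX1 to mean activity with empty vector."""
    return mean(with_tbx1) / mean(empty_vector)


def student_t_statistic(a, b):
    """Two-sample Student's t statistic using the pooled (equal-variance) estimate."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))


if __name__ == "__main__":
    # Hypothetical normalized luciferase readings, three replicates each.
    tbx1_readings = [270, 290, 310]
    empty_readings = [9, 10, 11]
    print(f"fold activation: {fold_activation(tbx1_readings, empty_readings):.1f}")
    print(f"t statistic: {student_t_statistic(tbx1_readings, empty_readings):.2f}")
```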

Surrounding Nucleotides Outside the Half Site are Crucial For Binding
To demonstrate the importance of the 3′ half of the TBX1-KSPS (1/2 SPS) site and to further define essential nucleotides for binding, we generated various mutations of position 8 (P8), 11 (P11) and 13 (P13), as these nucleotides seemed to vary the most when comparing complex intensities on gel shift assays (data not shown) (Fig. 3A). When the nucleotide at P8 was mutated from A to T (A→T), binding of the lower, main protein-DNA complex (complex 1; Fig. 1A) was abolished. In contrast, the upper, less prominent protein-DNA complex in the gel (complex 2; Fig. 1A) remained unchanged (Fig. 3A). Mutation at P11 from G to C (G→C) resulted in reduction of the upper protein-DNA complex (Fig. 3A). When the P13 nucleotide was mutated from G to T (G→T), binding of GST-TBX1 was lost (Fig. 3A). The effects of these three mutations in the second half-site demonstrate the importance of these nucleotides in binding. We then proceeded to test these mutations in luciferase reporter assays in cell culture. We generated luciferase reporter constructs containing six copies of the KSPS mutated at P8, P11 or P13. For all three, we observed reduced activation when compared to the WT KSPS consensus sequence. We concluded that the surrounding nucleotides are necessary for the activation of the reporter in tissue culture (Fig. 3B; statistical values in Figure legend).

Mutations in TBX1 Lead to a Decrease in Activation
Human mutations in TBX1 have been previously identified in a subset of patients with VCFS/DGS but with no deletion. These mutations, including F148Y, H194Q and G310S, were previously tested in transcription reporter assays in cell culture using the Brachyury palindrome sequence to determine whether they altered transcription [19][20][21]. Increase in transcriptional activation with F148Y, H194Q and G310S mutations versus WT TBX1 using Jeg3 cells was previously observed [21]. To further test this using the new consensus sequences, we generated the same point mutations in GST-TBX1 and evaluated their DNA-binding and transcriptional activation in cell culture (Fig. 4A, 4B).

Figure 2 (legend, partial). ... For comparison, the palindromic T-site was also tested but binding was very weak as compared to the newly identified binding sites and was only observed after extensive overexposure of the autoradiogram (not shown). B: Luciferase reporter constructs containing 6 copies (6x) of the TR and 6x of the KSPS, respectively, were co-transfected with full length Tbx1-pCDNA3.1 and compared to the empty pCDNA3.1 transfection to determine if TBX1 could activate transcription of a reporter via these sites. A significant increase of luciferase activity was observed in the presence of full-length TBX1 for both the 6x TR and the 6xKSPS when compared to transfection of the empty pCDNA3.1 vector (TR: 29 fold; Student's t-test, *p < 0.001; KSPS: 5.6 fold; Student's t-test, *p < 0.02). The mutations analyzed were those previously tested in a half site where AGGTGTGA was mutated to AATTTTGA [31]. When these nucleotide changes were present in the TR, there was a dramatic decrease in activation by Tbx1-pCDNA3.1 (7.4 fold; Student's t-test, ¤p < 0.001). The same mutation in the 6xKSPS construct did not show a significant change when compared to the normal KSPS (n.s., not significant). All data are presented as means ± SD; n ≥ 3. doi:10.1371/journal.pone.0095151.g002

TBX1 DNA Binding Site PLOS ONE | www.plosone.org
Protein-DNA complexes were formed at the same position as for the WT protein (shift 1; Fig. 1A). Binding to DNA was similar with both consensus sequence probes as determined by EMSA (Fig. 4C and 4D). As for the wild-type protein, mutant TBX1 proteins could not bind to the half site (data not shown). In addition, luciferase reporter assays were carried out to determine if these mutations in TBX1 could lead to a change in the activation of the reporter constructs harboring the 6xTR or 6xKSPS. Cells were initially transfected with the test construct (WT or mutated TBX1) and the reporter construct. Mutated protein activation values were compared to the WT protein values. We observed a statistically significant decrease in activation in the presence of two mutations, F148Y and G310S; the effect was more dramatic with the F148Y mutation, which showed no activation when compared to WT values (statistical values are presented in the Figure legend). Interestingly, we did not observe a statistically significant change in activation when we tested the H194Q mutation. Because these are heterozygous mutations in human patients, a reporter assay was carried out to test whether adding in one wild type copy of Tbx1 would suppress the effect of the mutated allele. Jeg3 cells were co-transfected with 50 ng of Tbx1-pCDNA3.1 and 50 ng of either F148Y, H194Q or G310S Tbx1-pCDNA3.1 as we described earlier. Although activation increased when the WT protein was present together with the F148Y or G310S mutant protein, the values for the F148Y mutation were still lower than those for the WT protein. Again, there was no change observed when the WT TBX1 protein was cotransfected with the H194Q mutant protein (Fig. 4E). Because these recombinant proteins are not endogenously expressed in Jeg3 cells, we examined whether the ectopic TBX1 proteins were localized to the nucleus and not in the cytoplasm.
Immunofluorescence was performed to visualize this set of four proteins and their nuclear localization was confirmed, using DAPI as a nuclear stain (Fig. 4F). We concluded that transfection conditions that mimic TBX1 haploinsufficiency due to the F148Y mutation resulted in reduced activation of both reporter constructs.

In Silico Genome-Wide Screen for T-sites in the Mouse Genome
A series of bioinformatic approaches were undertaken to identify potential direct downstream transcriptional target genes by examining annotated mouse genome sequence data (UCSC genome browser, mm9). The first screen was done to detect binding sites in blocks of evolutionarily conserved sequences. A total of 235,414 sequences matching half sites were found. Among them, 12,659 (5.4%) half sites were found to overlap with conserved elements (Fig. 5A). These were then assigned to the nearest RefSeq genes within ±100 kb distance of transcriptional start sites (TSS). The first screen identified a total of 187 motifs matching the half site consensus sequence AGGTG(T/C)(G/T)A within highly conserved elements (Logarithm of the Odds score, LOD > 500) (motifs and corresponding gene names can be found in Table S1). We found 425 motifs within moderately conserved elements (LOD score of 200-500) (Table S1). These could be putative binding sites for any T-box gene. We then searched the half sites that contained the second partial site, to identify KSPSs, which would be more selective for binding of TBX1.

Figure 3. Surrounding nucleotides outside of the half site are crucial for binding. A: EMSA was performed on mutated sequences that were generated in the second half site to test if variation at positions P8 (A→T), P11 (G→C) and P13 (G→T) affects binding. When P8 was changed, the faster migrating binding conformation was lost. When P11 was changed, most of the slower migrating binding conformation was lost. Almost all binding was lost when P13 was changed. B: Luciferase reporter assays were performed to test their effect on transcription. Constructs harboring 6 copies of the mutated binding site (mutated at either P8, P11 or P13, respectively) were co-transfected with the full length Tbx1 gene. Mutation of these nucleotides affected luciferase activation. Data are presented as means ± SD; n ≥ 3. Student's t-test, *p < 0.02. doi:10.1371/journal.pone.0095151.g003
We also examined whether any of the half sites had a second direct tandem repeat. None of the sites within evolutionarily conserved blocks from the search above had a second direct repeat (TR). Therefore, a second bioinformatic screen was done to search for TR sites anywhere in the genome, irrespective of evolutionary conservation (Fig. 5B). A total of 302 TR sites were found throughout the genome (Table S2). We searched for gene ontology groups for all of the genes harboring putative T-half sites and TR sites. Most of the groups that were identified were those involved in embryonic developmental processes and mRNA transcription regulation (Fig. 5C,D) [46,47]. We then examined each gene for its known function or expression pattern, using the literature and the MGI JAX database (www.informatics.jax.org), to ascertain whether any could be a putative TBX1 downstream transcriptional target (data not shown).
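Two steps of the screen lend themselves to a compact illustration: grouping motifs by the conservation score of the element they fall in, and assigning each motif to the nearest gene within 100 kb of a TSS. The sketch below uses the LOD thresholds stated in the text (above 500 for highly conserved, 200-500 for moderately conserved); the gene names and coordinates are hypothetical, a single chromosome is assumed, and the function names are not taken from the in-house program.

```python
def conservation_tier(lod):
    """Classify a conserved element by its phastCons LOD score."""
    if lod > 500:
        return "high"
    if lod >= 200:
        return "moderate"
    return "low"


def nearest_gene(motif_pos, tss_by_gene, max_distance=100_000):
    """Return (gene, distance) for the closest TSS within max_distance, else None.

    Assumes motif_pos and all TSS coordinates lie on the same chromosome.
    """
    best = None
    for gene, tss in tss_by_gene.items():
        distance = abs(motif_pos - tss)
        if distance <= max_distance and (best is None or distance < best[1]):
            best = (gene, distance)
    return best


if __name__ == "__main__":
    # Hypothetical TSS coordinates, for illustration only.
    tss = {"GeneA": 1_000_000, "GeneB": 1_200_000}
    print(nearest_gene(1_050_000, tss), conservation_tier(650))
    print(nearest_gene(5_000_000, tss), conservation_tier(150))
```

The resulting gene lists, or the motif coordinates themselves, are the kind of input that tools such as GREAT and DAVID take for gene ontology analysis.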
Candidate TBX1 binding sites near genes with known expression patterns in mouse embryos similar to that of Tbx1, or with phenotypes similar to that in Tbx1-/- mutant embryos, were of particular interest to pursue. To further narrow the list of possible genes regulated by Tbx1, we focused on those containing KSPS or TR sites. We first checked whether the expression of any was altered in previous gene expression arrays in experiments where Tbx1+/+ versus Tbx1-/- embryonic tissues were compared [26,28]. Twenty-seven genes were initially selected to determine if GST-TBX1 could bind.
Electrophoretic mobility shift assays were performed on motifs near candidate downstream target gene loci to determine if GST-TBX1 could bind to them. GST-TBX1 formed protein-DNA complexes with 19 of the 27 motifs, with three distinct intensities of protein-DNA complexes referred to as high (similar to TBX1-TR), medium and low (Table S3). The positions of the complexes in the gel were all the same, suggesting similar binding conformations. Reporter constructs were generated to include the motif to be tested with approximately 200 bp flanking either side. Of the 19 motifs that were bound, we chose three of the strongest binding candidates for additional studies: Fgf8, Bmper and the Otog-MyoD locus.
The Fgf8 gene encodes a secreted fibroblast growth factor (FGF) that is required for craniofacial and heart development [45]. Relevant to Tbx1, a genetic interaction between Fgf8 and Tbx1 has been found [29,45]. The Fgf8 locus has a KSPS located 4 kb downstream of the transcriptional stop site (Fig. 6A). This site falls in a highly evolutionarily conserved sequence block, across mammals and vertebrates. Interestingly, this is a known Fgf8 regulatory region for somite and tail bud mRNA expression, conserved from zebrafish to mouse [48,49]. GST-TBX1 was able to bind to the KSPS motif (Fig. 6B) and this was at a similar intensity as compared to the consensus KSPS (AGGTGTGATCGCGTCAT) (data not shown). As expected from the EMSA, the transcription reporter assay in Jeg3 cells showed activation at a level similar to the KSPS consensus (5 fold change) (Fig. 6C). Whole mount RNA in situ hybridization comparing Tbx1+/+ and Tbx1-/- embryos at E10.5 shows a decrease in expression of Fgf8 in the pharyngeal arch endoderm (Fig. 6D), as previously reported [29,50]. We did not detect a change in somite or tail bud expression, suggesting possible functional redundancy with other T-box genes with similar expression patterns.
The Bmper gene encodes a secreted protein that inhibits bone morphogenetic protein (BMP) function. The Bmper locus has a TR binding site in the intron lying between exons 13 and 14, sharing evolutionary conservation only with rat and opossum (Fig. 7A). GST-TBX1 was able to bind strongly to the 40 bp element harboring the motif (Fig. 7B) and this was similar to that of the TR consensus (data not shown). There was a small 1.8 fold increase in transcription activation in the presence of TBX1 (Fig. 7C). We suggest that the small fold activation here, compared to the experiments using the 6x TR consensus sequence, could be due to the fact that we used a 400 bp sequence element that might harbor inhibitory sites surrounding the single TBX1 binding site. Bmper expression is lost in part of the first pharyngeal arch in Tbx1-/- embryos at E10.5, and expression in the inner ear is altered as well (Fig. 7D), suggesting that it could be a direct downstream transcriptional target.
The third site that was evaluated was a TR site in the Otog gene in intron 53 (of a total of 56 exons). The Otog gene encodes an Nglycosylated protein present in the acellular membranes of the sensory epithelia patches of the inner ear, important for hearing [51]. Otog and MyoD are neighboring genes, however the TR motif within the regions tested in EMSA and luciferase assays, is 70 kb from the MyoD TSS (Fig. 8A). The GST-TBX1 fusion protein can bind to the Otog-MyoD sequence block containing the TR motif, and it was competed with unlabeled DNA of the same sequence. Although binding appeared to be strong by EMSA, transcription was only activated 2.5 fold (Fig. 8B and 8C). The MyoD gene, encoding a basic helix-loop-helix myogenic regulatory transcription factor lies adjacent to Otog. We were not able to generate a specific probe for in situ hybridization analysis of Otog. As has been reported, RNA expression of MyoD is lost in the first pharyngeal arch core mesoderm in Tbx1 -/null mutant embryos [52] (Fig. 8D). We conclude that this endogenous TR site is a possible candidate for MyoD regulation by TBX1. after IPTG induction. C, D: EMSAs of the WT and three mutated proteins using the radiolabeled TR (C) and KSPS motifs (D). E: Luciferase assays were performed after co-transfecting the reporter construct (6x TR or 6x K SPS) with WT or each mutated full length Tbx1-pCDNA3.1 construct. The F148Y mutation led to a decrease in activation (TR-17 fold decrease, ¤p,0.0005; KSPS-4.5 fold decrease, ¤p,0.0003) when compared to the WT transfection. The H194Q mutation did not lead to a statistically significant change in activation but the trend was in the direction of decreased activity (not significant-n.s). The G310S mutation led to a smaller but still significant decrease in activation (TR-2.2 fold decrease, ¤p,0.01; KSPS-1.3 fold decrease, ¤p,0.05). 
Equal amounts of WT and mutated Tbx1 were co-transfected with the respective reporter constructs to determine if there was suppression of the mutant phenotype. There was a slight increase in activation when compared to transfection of the mutated F148Y alone (TR: 4-fold, N p<0.006; KSPS: 1.9-fold, N p<0.006). Under these conditions, the F148Y mutated TBX1 with WT protein still showed reduced activation when compared to WT TBX1 (TR: * p<0.0003; KSPS: * p<0.006). The H194Q+WT combination did not show any significant change (TR: p<0.4; KSPS: p<0.1). The G310S+WT combination showed a significant increase in activation (TR: 2-fold, N p<0.001; KSPS: 1.5-fold, N p<0.02) when compared to the G310S mutant alone. All data are presented as means ± SD; n ≥ 3. p-values were determined using the Student's t-test. F: Immunofluorescence experiments were performed with antibodies to TBX1 on transfected Jeg3 cells to validate that the mutated constructs were localized to the nucleus (green). Nuclear localization was confirmed by observing expression in DAPI-stained nuclei, shown in blue. doi:10.1371/journal.pone.0095151.g004

Discussion
The T-box family of transcription factors is important in vertebrate development and human disease. The preferential binding sites of a number of T-box proteins, including Brachyury, TBX2, TBX5, TBX6, TBX15 and TBX18, were previously identified by either in vitro or in vivo approaches [4,12,14-16]. Most can bind as monomers to a Brachyury consensus half site, or as dimers to a palindrome, while few can also bind to a tandem repeat. In this study we used an in vitro selection method (SELEX) to identify the preferential binding site of mammalian TBX1. We found two classes of binding sites: a perfect direct repeat (TR), consisting of two classic Brachyury half sites, and a second, imperfect direct repeat (KSPS), in which the 5′ site is similar to the Brachyury half site but the 3′ half is different. One important possibility is that differences in the amino acid composition of TBX1 confer a binding preference different from that of some other T-box proteins. For example, TBX1 appears to bind strongly to the TR, but weakly to the Brachyury palindrome and not at all to the half site motif. Although the DNA binding domain is highly conserved among different T-box proteins, some differences may contribute to specificity of binding, and the sequences might affect the orientation in which various T-box proteins bind to DNA. Interestingly, a few amino acids that are important for Brachyury dimers to bind to the palindrome are not conserved in TBX1 [20,53]. This may explain the difference in binding preference. Using the Brachyury crystal structure, amino acids important for both dimerization and DNA binding have been mapped [53]. Six amino acids important for binding and dimerization, distributed throughout the protein, differ between Brachyury and TBX1. Three of these amino acids are important for dimerization (M87D, N131A, F132K) and three are important for DNA binding (K103R, K151N, A216G; Brachyury to TBX1 amino acid change, respectively).
Perhaps these differences at crucial positions lead to a conformational change in secondary structure that allows TBX1 molecules to bind preferentially in a head-to-tail orientation. The crystal structure of the TBX1 protein bound to DNA has been published, but it was solved using the palindromic Brachyury binding site [54]. That group found that two TBX1 proteins can bind as monomers to the palindromic sequence. In our gel shift assays, TBX1 and Brachyury formed similar-sized protein-DNA complexes (data not shown), suggesting that TBX1 might bind as a dimer to the TBX1 TR, since it consists of two half sites. Now that the TBX1 TR has been identified as the preferential binding site, a new crystal structure might lead to further understanding of the key residues of TBX1 required for binding to DNA.
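The distinction between a direct (head-to-tail) repeat and a palindromic (inverted) repeat of half sites can be made concrete with a short script. The following is only an illustrative sketch, not part of the analysis reported here: it assumes the half site AGGTGTGA taken from the TR consensus, uses exact matching, and the function names are hypothetical. The palindromic probe is the Brachyury probe listed in the legend of Figure S1.

```python
# Illustrative sketch only: classify how two T-box half sites are arranged.
# Assumption: the half site is AGGTGTGA (one unit of the TR consensus),
# and only exact matches are counted.
HALF_SITE = "AGGTGTGA"
TR_CONSENSUS = "AGGTGTGAAGGTGTGA"
PALINDROME_PROBE = "CTAGATTTCACACCTAGGTGTGAAATCTAG"  # probe from Figure S1

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def half_site_hits(seq, half_site=HALF_SITE):
    """Return (position, strand) for each exact half-site match on either strand."""
    seq = seq.upper()
    minus = reverse_complement(half_site)
    width = len(half_site)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if window == half_site:
            hits.append((i, "+"))
        if window == minus:
            hits.append((i, "-"))
    return hits


def classify(seq):
    """Label a sequence by the relative orientation of its half sites."""
    hits = half_site_hits(seq)
    if len(hits) < 2:
        return "single or no half site"
    strands = {strand for _, strand in hits}
    if len(strands) == 1:
        return "direct (tandem) repeat"
    return "inverted repeat (palindrome-like)"


if __name__ == "__main__":
    for name, seq in [("TR", TR_CONSENSUS), ("Brachyury probe", PALINDROME_PROBE)]:
        print(name, half_site_hits(seq), classify(seq))
```

In this sketch both half sites of the TR lie on the same strand, whereas the Brachyury probe carries one half site on each strand, which is the orientation difference discussed above.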
We found some inconsistencies between apparent binding affinities to DNA in EMSAs and transcriptional activity in luciferase assays in cell culture. For example, although the P13 mutation in the KSPS led to the greatest loss of binding in vitro, it had the least effect on transcription (2.3-fold decrease). One possible reason is that we used only the DNA binding domain of TBX1 for EMSAs but used the full-length protein for luciferase assays. There are multiple examples of a lack of direct correlation between relative binding affinity and transcriptional activation of a cis-acting motif [55]. For example, the ETS-1 DNA binding domain (DBD) undergoes minimal secondary structural changes in the presence of DNA, but binding of the full-length protein to DNA does induce changes in secondary structure at a distance from the protein-DNA interface [56]. Glucocorticoid receptor binding also affects the structure and activity of the protein on DNA: sites that activated more strongly in luciferase assays were bound equally well in gel shift assays as sites that activated more weakly [57]. It was also noted that changing even one nucleotide in the binding sequence could affect binding and transcriptional activation. DBDs are important not only for protein-DNA interactions, but for protein-protein interactions as well. Perhaps GST-TBX1, in the presence of the P13 binding site, has a more open interface that allows it to interact with other cofactors that provide a more stable activation of transcription than the P8 or P11 nucleotide changes.

Mutations in TBX1
Previous studies have tested whether TBX1 mutations affect transcriptional regulation of reporter constructs, but these used palindromic sequences as the binding motif and had conflicting results [20,21]. Since the consensus binding sites identified in the SELEX assay showed a roughly 20-fold increase in binding and transcriptional activation by TBX1, we reasoned that they would provide a more sensitive indicator of any change in binding or transcription caused by missense changes in TBX1. Previous studies did not perform any in vitro DNA binding assays. We found that the F148Y, H194Q and G310S mutant proteins could strongly bind to the two consensus sequences we identified (TR and KSPS sites).
In contrast to what has been previously reported, where the F148Y and G310S mutations showed no effect on transcription [19] or the three mutations (F148Y, H194Q, G310S) showed an increase in transcription of reporters using the palindrome site [21], we found a decrease in reporter gene activation, in particular for the F148Y mutation. This suggests that activation or repression might be DNA binding motif-specific. One possible explanation for this difference from previous reports [19][20][21] is that the new consensus sequences provide greater sensitivity in measuring changes in the mutant proteins. Based on the published TBX1 crystal structure, F148 and H194 are involved in neither DNA binding nor dimer formation, but it was noted that F148 lies at the surface of the protein [54]. The authors explain that this residue may therefore affect protein-protein interactions with other co-factors necessary for transcription to occur [54]. This is consistent with our data, in which the F148Y mutation does not affect binding to DNA (Figs. 4C, D) but does lead to a loss of activation in reporter assays (Fig. 4E).
In addition to understanding the effects of mutations on gene function, one major goal is to identify direct transcriptional target genes required for embryonic development. Using various bioinformatic selection methods, we identified and validated DNA binding to 19 different motifs present in gene loci of interest, and confirmed transcriptional activation for 11 of them (data not shown), including Fgf8, Bmper and Otog-MyoD.
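As a rough illustration of what an in silico search for candidate TR sites involves, the sketch below scans a sequence on both strands for the TR consensus, optionally tolerating a few mismatches. It is a minimal example under stated assumptions, not the selection pipeline used in this study: the mismatch threshold, the function names and the toy sequence are all hypothetical, and no conservation or distance-to-TSS filtering is applied.

```python
# Minimal sketch, not the authors' pipeline: find TR-like sites on both strands.
# Assumptions: TR consensus from the abstract; Hamming-distance matching;
# the toy sequence below is invented purely for demonstration.
TR_CONSENSUS = "AGGTGTGAAGGTGTGA"

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def count_mismatches(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def scan(seq, motif=TR_CONSENSUS, max_mismatches=0):
    """Return motif hits on the + and - strands as a list of dicts.

    Each hit records the 0-based start on the given sequence, the strand,
    and the number of mismatches to the motif.
    """
    seq = seq.upper()
    targets = {"+": motif, "-": reverse_complement(motif)}
    width = len(motif)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        for strand, target in targets.items():
            mm = count_mismatches(window, target)
            if mm <= max_mismatches:
                hits.append({"start": start, "strand": strand, "mismatches": mm})
    return hits


# Hypothetical toy sequence: one TR on each strand, separated by filler bases.
TOY = "CCCC" + TR_CONSENSUS + "TTTT" + reverse_complement(TR_CONSENSUS) + "GG"

if __name__ == "__main__":
    for hit in scan(TOY):
        print(hit)
```

Raising `max_mismatches` would admit degenerate sites such as those with an altered downstream half, at the cost of more false positives; any real screen would need additional filters and experimental validation such as the EMSAs described here.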

Fgf8
The pharyngeal apparatus is an embryonic structure that becomes remodeled to form the face, neck and cardiac outflow tract [58]. Tbx1 is expressed in mouse embryos in the endoderm and mesoderm of the pharyngeal arches as well as the ectoderm of the distal pharyngeal apparatus, with some localized expression in the somites [26,45]. Tbx1 and Fgf8 are coexpressed in the pharyngeal endoderm and a subset of the pharyngeal mesoderm, and they genetically interact in mouse embryos, implicating them in the same genetic pathway [29][50]. An evolutionarily conserved element has been identified downstream of the Fgf8 gene locus; it drives expression of a reporter in forebrain, somites and tail bud but not in the pharyngeal apparatus [48]. Tbx1 and Fgf8 are also expressed in the somites and presomitic mesoderm (PSM). However, inactivation of Tbx1 does not result in loss of Fgf8 expression in these tissues, nor does it affect development of these structures. The most parsimonious explanation is that Tbx1 acts redundantly with other T-box genes, upstream of Fgf8. Several T-box genes are expressed in the somites and tail bud [59][60][61], and one of these may in fact regulate transcription: possibly Brachyury, which can also bind to direct repeats, or TBX6, whose preferential binding site has some resemblance to the KSPS [15].

Bmper, Otog-MyoD and transcription regulation
Although many known regulatory regions show evolutionary conservation, not all follow this pattern. We found putative TBX1 protein binding sites in the Bmper and Otog-MyoD gene loci; however, they are not in regions of high evolutionary conservation. There have been a number of reports examining regulatory regions that are not in conserved elements. For example, many of the p300 sites found by ChIP-Seq (chromatin immunoprecipitation followed by next-generation sequencing) were not in evolutionarily conserved blocks, yet they did drive expression of reporters in vivo [62]. The same was true for PHOX2B direct downstream target genes found in zebrafish [63]. A recent investigation of specific binding by liver transcription factors in five vertebrate species has shown that occupancy of only a small minority (10%-22%) of the binding sites on DNA is conserved among mammalian species [64]. Non-conserved, biologically functional enhancers have also been identified upstream of pax9 and otx1b in zebrafish [65]. Changes in transcriptional programs arising from changes within non-conserved regions are thought to drive evolution [66]. Therefore, we suggest that the TR sites warrant careful investigation. In this regard, we analyzed such sites in the Bmper and the Otog-MyoD loci.
The core mesoderm of the pharyngeal arches forms the muscles of the craniofacial region and neck, required for chewing and swallowing [58]. We found that the Bmper gene, encoding a BMP antagonist [67][68][69], is strongly expressed in the central core mesoderm region of the pharyngeal arches. The first pharyngeal arch forms, but the distal arches do not form, in Tbx1-/- embryos. We found that Bmper mRNA expression is lost in the first pharyngeal arch in Tbx1-/- embryos.
Similar to Bmper, another gene, MyoD, is also expressed in the core mesoderm of the pharyngeal arches, and its expression is also reduced in Tbx1-/- embryos. Inactivation of MyoD and other basic helix-loop-helix regulatory transcription factors results in loss of craniofacial muscle formation [70,71]. Similarly, inactivation of Tbx1 results in loss of development of craniofacial muscles [52,72]. The motif we identified is within the Otog gene body, but near the 3′ end of the gene. Otogelin (Otog) encodes a glycoprotein present in the acellular gelatinous structures covering the sensory epithelia of the inner ear [51,73]. It is known that Otog is expressed in the inner ear as early as E10. Mutations in Otog lead to autosomal-recessive sensorineural nonsyndromic hearing loss, showcasing the importance of this protein in inner ear development and hearing [74]. Unfortunately, we were not able to generate an RNA anti-sense probe to Otog to determine whether it is colocalized with Tbx1. It remains possible that this site is important for regulation of MyoD expression in vivo, for regulation of Otog, or for neither. Enhancers located within neighboring genes have been found to regulate genes at a distance. This is the case for Dlx5, with two enhancers being exons of the neighboring gene Dync1i1 [75]. Only future ChIP and in vivo reporter assays in transient transgenic mouse models can validate this hypothesis.

In conclusion, we used an in vitro SELEX selection process to identify two novel TBX1 consensus sequences, the TBX1-TR and the TBX1-KSPS. We found that TBX1 can activate reporter constructs harboring the newly identified binding sites in tissue culture. In addition, we have demonstrated that in the presence of the F148Y human mutation in TBX1, activation of reporter constructs was strongly diminished. Detecting this was only possible because we had a highly active consensus site for transcription reporter assays in cell culture.
Finally, as a prelude to future ChIP-seq and other biochemical studies, we provide an in silico list of possible direct downstream target genes, some of which may be biologically relevant to TBX1 function, such as Fgf8, Bmper and Otog-MyoD.

Supporting Information
Figure S1 Cloning of GST-Tbx1 (T-box) Construct. A: The T-box region (amino acids 90-303) of mouse Tbx1 was PCR amplified from cDNA with flanking EcoRI and XhoI restriction enzyme sites. These sites were used to subclone the fragment into the pGEX4t3 vector (GE Healthcare) to generate a GST-TBX1 fusion protein. B: GST-TBX1 was detected via western blot with an anti-GST antibody, with an approximate molecular weight of 52 kD. C: EMSA showing that recombinant GST-TBX1 (90-303) binds to the published palindromic Brachyury motif [12]. Protein dilution, 1:1; exposure time, 6 hrs. Probe: CTAGATTTCACACCTAGGTGTGAAATCTAG. (TIF)
Figure S2 Testing the binding specificity of TBX1 binding motifs. Gradients of both poly dI-dC (left) and specific cold competitor (right). As the concentration of poly dI-dC increased (0.5-2 µg), the binding decreased in intensity, but protein-DNA complexes still formed at the same position. Increasing amounts of specific cold competitor (25x-200x) were used to demonstrate the specificity of binding. (TIF)
Table S1 Sites and nearest genes with T half site. The sites listed are half sites that lie in evolutionarily conserved regions across the mouse genome (mm9). (XLS)
Table S2 Sites and nearest genes with tandem repeat site. The sites listed are TR sites across the mouse genome (mm9), irrespective of evolutionary conservation. (XLSX)
Table S3 TR and KSPS endogenous sites in the mouse genome. The sites listed are the endogenous sites that were bound by GST-TBX1 in EMSA experiments. Also listed are the distance to the TSS and the comparative gel shift band intensity. *Gel shift bands were compared to the binding of TBX1 to the TR TBX1 site (High). (TIF)