Targeted Resequencing and Analysis of the Diamond-Blackfan Anemia Disease Locus RPS19

  • Alvaro Martinez Barrio,

    Contributed equally to this work with: Alvaro Martinez Barrio, Oskar Eriksson

    Affiliation The Linnaeus Centre for Bioinformatics Uppsala University/Swedish University of Agricultural Sciences, Uppsala University, Uppsala, Sweden

  • Oskar Eriksson,

    Contributed equally to this work with: Alvaro Martinez Barrio, Oskar Eriksson

    Current address: Department of Medical Science, Uppsala University, Uppsala, Sweden

    Affiliation Department of Genetics and Pathology, The Rudbeck Laboratory, Uppsala University, Uppsala, Sweden

  • Jitendra Badhai,

    Affiliation Department of Genetics and Pathology, The Rudbeck Laboratory, Uppsala University, Uppsala, Sweden

  • Anne-Sophie Fröjmark,

    Affiliation Department of Genetics and Pathology, The Rudbeck Laboratory, Uppsala University, Uppsala, Sweden

  • Erik Bongcam-Rudloff,

    Affiliations The Linnaeus Centre for Bioinformatics Uppsala University/Swedish University of Agricultural Sciences, Uppsala University, Uppsala, Sweden, Department of Animal Breeding and Genetics, Uppsala University, Uppsala, Sweden

  • Niklas Dahl,

    Affiliation Department of Genetics and Pathology, The Rudbeck Laboratory, Uppsala University, Uppsala, Sweden

  • Jens Schuster

    jens.schuster@genpat.uu.se

    Affiliation Department of Genetics and Pathology, The Rudbeck Laboratory, Uppsala University, Uppsala, Sweden


Abstract

Background

The Ribosomal protein S19 gene locus (RPS19) has been linked to two kinds of red cell aplasia, Diamond-Blackfan Anemia (DBA) and Transient Erythroblastopenia in Childhood (TEC). Mutations in RPS19 coding sequences have been found in 25% of DBA patients, but not in TEC patients. It has been suggested that non-coding RPS19 sequence variants contribute to the considerable clinical variability in red cell aplasia. We therefore aimed at identifying non-coding variations associated with DBA or TEC phenotypes.

Methodology/Principal Findings

We targeted a region of 19'980 bp encompassing the RPS19 gene in a cohort of 89 DBA and TEC patients for resequencing. We provide here a catalog of the considerable, previously unrecognized degree of variation in this region. We identified 73 variations (65 SNPs, 8 indels), all located outside the RPS19 open reading frame, of which 67.1% are classified as novel. We hypothesize that specific alleles in non-coding regions of RPS19 could alter the binding of regulatory proteins or transcription factors. Therefore, we carried out an extensive analysis to identify transcription factor binding sites (TFBS). A series of putative interaction sites coincide with detected variants. Sixteen of the corresponding transcription factors are of particular interest, as they are housekeeping genes or show a direct link to hematopoiesis, tumorigenesis or leukemia (e.g. GATA-1/2, PU.1, MZF-1).

Conclusions

Specific alleles at predicted TFBSs may alter the expression of RPS19, modify an important interaction between transcription factors with overlapping TFBS or remove an important stimulus for hematopoiesis. We suggest that the detected interactions are of importance for hematopoiesis and could provide new insights into individual response to treatment.

Introduction

Diamond Blackfan Anemia (DBA) is a congenital pure red cell aplasia (OMIM 205900) typically presenting within the first year of life [1], [2]. The gene encoding ribosomal protein S19 (RPS19) [3], [4] has been shown to be mutated in 25% of DBA patients [5]. Recently, mutations in several other ribosomal protein genes have been identified in approximately 10% of DBA patients [6]–[9].

Transient erythroblastopenia of childhood (TEC; OMIM 227050) is a transient red cell aplasia with clinical similarities to DBA [10], [11]. Linkage analysis has indicated an association between TEC and the region encompassing RPS19 but no structural mutations have been identified so far [12].

Until now, more than 70 RPS19 mutations have been reported in DBA patients [5]. Mutations are spread out over the entire gene, including non-sense and mis-sense mutations as well as deletions and insertions. DBA is characterized by a marked clinical heterogeneity without correlations to any specific mutation [2], [13]. At least 30% of DBA patients respond to steroid treatment and patients carrying RPS19 mutations display a poorer response [5]. The marked clinical heterogeneity strongly implies the involvement of genetic and/or environmental tissue specific modulators [2]. It has been suggested that the red cell aplasia is caused by ribosomal protein haploinsufficiency. Consequently, the expression level of a specific ribosomal protein becomes critical for the disease. The expression may be influenced by non-coding variations resulting in a decrease in the amount of protein available below a critical threshold [14]. Moreover, the clinical variability associated with a mutation in a specific structural ribosomal protein gene may be related to non-coding variants on the non-mutant allele. Studies so far have focused on protein coding parts and no detailed catalog of non-coding genetic variation is available. The identification of putative regulatory sequence elements in non-coding regions (such as transcription factor binding sites) is therefore of importance for future research. Many transcription factors (TFs) have been implicated in human disorders, for example HNF4alpha in diabetes [15], [16], USF1 in familial combined hyperlipidemia [17] and AP2alpha in cleft palate [18].

We therefore aimed at identifying non-coding variations that are associated with the DBA or TEC phenotypes. Here, we report on the targeted resequencing of the entire RPS19 locus in 77 DBA and 12 TEC patients not carrying a mutation in exons of the RPS19 gene and we provide a catalog of the genetic variations identified. Furthermore, we searched the entire region for putative transcription factor binding sites (TFBS) some of which are presumably altered by the variations identified. We suggest that gene variants at TFBSs influence the expression of RPS19 with a resulting effect on disease pattern and response to treatment. These findings are important to clarify the regulation of RPS19 gene expression and for our understanding of the pathobiology behind DBA.

Results

Genetic catalog of the RPS19 locus

We initially targeted a region of 14.7 kbp on human chromosome 19 (chr19:47,053,043–47,068,080), encompassing RPS19 and 2 kbp of flanking region (Accession numbers: RSG_JCVI|RPS19-004110_004111-C_G, RSG_JCVI|RPS19-005767_005771-D_CTAA). Variations were assigned in all individuals to provide a genetic map of the RPS19 locus. In addition, we analyzed a region upstream of the initial sequencing effort in a subset of patients (chr19:47,048,100–47,053,043); together, the two analyzed regions comprised 19'980 bp (figure 1). We detected a total of 73 variations, of which 65 were single nucleotide polymorphisms (SNPs; 89.1%) and 8 were insertion/deletion variants (indels; 10.9%). Forty-three SNPs (66.2%) and 6 indels (75.0%) were not previously described and could be classified as novel (figure 1 and table 1; SNPs identified in this study are referred to as “novel” throughout this report; their subsequently assigned database identifiers are listed in table 1). Altogether, 49 of the variations identified (67.1%) were novel. One of the novel indels overlaps with a known SNP (rs725332; table 1). All variants are located outside of the protein coding sequence of RPS19. Interestingly, the density of variations (SNP or indel) is one per 273.7 bp (3.6 variations per kbp), including one SNP per 293.8 bp (3.4 SNPs per kbp). This is in contrast to the expected density of one SNP per 1.9 to 2.18 kbp that has been estimated for human chromosome 19 in previous studies [19], [20]. Several of the detected SNPs show a high frequency within the patient material (table 1). However, a considerable number of variations (30 out of 73) show frequencies of less than 1%, and the prevalence of these “private” or rare variations is high compared to previous estimates of 7% [21].
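The density figures above follow from simple division of region length by variant count. The sketch below recomputes the all-variant spacing from the counts stated in the text; the function names are illustrative and not part of the study. It deliberately does not recompute the SNP-only figure, because the text does not state which denominator was used for it (the upstream segment was analyzed only in a subset of patients), so a naive recomputation over the full 19'980 bp need not reproduce the reported 293.8 bp.

```python
# Counts and region length as reported in the text; names are illustrative.
REGION_BP = 19_980
N_SNPS = 65
N_INDELS = 8


def bp_per_variant(region_bp: int, n_variants: int) -> float:
    """Average spacing: base pairs of sequence per observed variant."""
    return region_bp / n_variants


def variants_per_kbp(region_bp: int, n_variants: int) -> float:
    """Density: variants per 1,000 bp of sequence."""
    return 1000 * n_variants / region_bp


n_total = N_SNPS + N_INDELS
print(f"all variants: one per {bp_per_variant(REGION_BP, n_total):.1f} bp")
print(f"all variants: {variants_per_kbp(REGION_BP, n_total):.2f} per kbp")
```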

Figure 1. Schematic view of the RPS19 locus on chromosome 19.

The genomic region targeted by the resequencing analysis (chr19:47'048'100–47'068'200) is shown as a snapshot using the UCSC Genome Browser (http://www.genome.ucsc.edu/). Amplicons and mammalian conservation are indicated, as well as detected variations (novel and known polymorphisms, respectively). SNPs contained in dbSNP (version 128) are shown next to our reported variants. The six exons of the RPS19 gene and the 3′-end of the DMRTC2 gene located upstream are shown in grey. A more detailed view presenting all available information compiled on the targeted region is shown in supplementary figure S2. The whole analyzed region encompasses 19'980 bp. (A) Detailed picture of the overlapping TFBSs and functional data extracted from EnsEMBL and Transfac (supplementary text S2) next to discovered variation in the upstream area towards the DMRTC2 gene. A more detailed section (1) describes the multiple alignment of a region comprising 476 bp upstream of the RPS19 start codon (ATG) located in the second exon. For the 7 species selected, the following are highlighted: five different SNPs (in red, with red arrows pointing to the SNP position in the sequence), a transcription start site (TSS) from the Fantom database (presented as an arrow indicating transcriptional direction), several interesting TFBSs overlapping highly conserved SNPs (in blue, with blue stars indicating important positions; table 2), and the four highly conserved regions reported by DaCosta et al. (containing the detected n-Myc motif).

http://dx.doi.org/10.1371/journal.pone.0006172.g001

Table 1. Variation detected in the DBA/TEC patient cohort within the resequenced region on chromosome 19.

http://dx.doi.org/10.1371/journal.pone.0006172.t001

Comparative genomic sequence analysis of the RPS19 locus

Human RPS19 has homologs in eukaryotes and archaebacteria but no eubacterial counterparts [22]. RPS19 is a component of the 40S subunit of the ribosome, which is important for regulation of translation of mRNAs into polypeptides [23]. From all the mammalian sequences available, we selected those assembled into chromosomes with high coverage and lack of gaps in the targeted genomic region, assuring gene synteny and that the original structure of the human RPS19 gene is conserved (i.e. number and order of exons). We obtained and aligned syntenic genomic regions of 200 kb around the orthologous RPS19 gene from 6 species (mouse, dog, cow, orangutan, macaque and chimpanzee; supplementary figure S1 and supplementary text S1). Infocon [24] identified a total of 161 blocks of high information content (BHICs) with highly conserved multi-species alignments within the 200 kb region. They averaged 20 bp in size and their distribution in protein coding, non-coding and untranslated regions is shown in supplementary figure S2. A high information content block is a cluster of conservation between species where the alignment contains information for every species represented. Because this alignment is so highly conserved at almost every position, the consensus sequence for each BHIC defaults to the reference genome used in our alignment. Twelve SNPs were contained in BHICs; 10 of them are detailed in supplementary table S1. If we consider the conservation of the polymorphic nucleotide, 7 of them are totally conserved across species (novel-12, novel-14, novel-15, novel-17, novel-18, rs2075749, rs2075750), 3 present a mismatch in one of the species (novel-13, mouse; novel-40, dog; rs1366610, mouse) and another 3 present two mismatches (novel-3, cow-dog; novel-42, cow-mouse; rs930102, human-cow). With this analysis, we discovered that in many cases the human variation is found across species.

Additionally, we downloaded the 29-way eutherian mammals Enredo-Pecan-Ortheus (EPO) alignment track containing ultra-conserved elements from the EnsEMBL database and compared this to our alignment results (supplementary figure S1). Of the nine conserved elements defined as EPO, five were entirely contained in our 7-way species alignment. Surprisingly, two were not contained at all. None of the novel SNPs was contained in a defined EPO region.

All information obtained in our report and from external resources was converted into *.gff files in an effort towards improving the annotation of the RPS19 locus (*.gff files can be imported to the UCSC genome browser for visualization and are compiled in supplementary text S2; see also figure S2).

Identified putative transcription factor binding sites that superimpose with novel SNPs

We used the resulting multi-species alignment to analyze whether any of the detected variation marks out any sequence element important for regulation by modulators or regulating factors. We searched selected genomic regions for putative transcription factor binding sites (TFBS) focusing on regions with a high degree of conservation (figure 1). Our aim was to identify whether any of the detected SNPs coincide with predicted TFBS. A number of detected variations fall within putative TFBS (supplementary table S1). Additionally, to further narrow down the number of identified TFBS, we asked whether the identified TFBS are likely to be functionally relevant for adaptations in expression of RPS19 protein in the context of DBA/TEC. We searched the literature for association of the corresponding transcription factors to general transcription, tumorigenesis and hematopoiesis. Sixteen different transcription factors (GATA-1 and -2, CDC5, Ebox, HOXA3, MSX-1, MZF1, PAX-2, -5 and -6, PBX-1, PPARalpha, PPARgamma, PU.1, SP1, YY1) are of particular interest with possible link to the DBA and TEC phenotypes (e.g. important for hematopoiesis, implicated in cancer development, strong general transcription factor). The putative TFBS coincide with 23 of the detected variants (table 2). The corresponding transcription factors bind to 15 possible sites upstream of the RPS19 coding region (TFBS encompassing 19 SNPs), at one position within the second intron (one detected SNP) and to three sites located in intron 4 (three SNPs). In some cases, the TFBS was independently identified by two different tools/databases (e.g. PBX-1 binding site in upstream region; table 2).

Comparative analysis of the proximal promoter region of RPS19

Previous studies have tried to identify the promoter region of RPS19. Our study confirms that the RPS19 promoter region shares typical features with other mammalian ribosomal protein genes (e.g. absence of a canonical TATA-box). We also detected an accompanying non-consensus CCATT-box 72 to 83 bp upstream of the transcription start site commonly described in public databases (according to EnsEMBL). Three different transcription start sites (TSS) have been described (BC018616; BC000023; D28389) which differ only in the length of the 5′UTR of the resulting mRNA. We have observed additional 5′UTR variants differing in length between 33 to 467 nucleotides (unpublished data). The observed spread of the TSS together with the absence of a canonical TATA-box classify the RPS19 promoter as “broad type promoter” according to Sandelin and colleagues [25]. Interestingly, the TSS stretch encompasses regions important for expression of RPS19 described previously by DaCosta et al. (figure 1) [26]. The authors identified regions of high conservation between mouse and human in the putative promoter region of RPS19, regions we also detect in our analysis (figure 1A). They predicted a promoter sequence and subsequently showed that the predicted promoter sequence and one of the conserved regions are important for expression of a reporter construct [26].

We did not detect any variation in the 1.5 kb proximal region upstream of the first exon. This region is additionally characterized by a high degree of conservation (figure 1, section 1). These findings underscore the importance of this region for expression of RPS19. DaCosta and colleagues identified several putative TFBS within the upstream region of RPS19 [26]. We checked whether we could reproduce their predictions of TFBS. In several cases, we obtained an even finer consensus motif (e.g. n-Myc, figure 1). In their study, DaCosta and coworkers define a strong C-Rel/Rel-A site that coincides with an NF-κB site. This is not surprising, because NF-κB is known to bind C-Rel in transcriptional regulatory systems [27]. The matrix we used is capable of detecting such a consensus site but cataloged it as NF-κB. Finally, we detected the SP1 motif in the first conserved region and an SP1 instead of the CACCC-Bf binding site [26]. This particular SP1 binding site (CCACCC) has been described as a regulatory switch element that stimulates SP1/GATA1 cooperation, and the consensus sequence is similar to the CACCC-Bf sequence [28].

Discussion

Association studies take advantage of the known variations throughout the human genome including SNPs and microsatellites. It has been suggested that the identification of all the potential risk-conferring variations within one disease associated gene is important for appropriate genotype-phenotype correlations [29], [30]. Targeted resequencing studies are therefore an important step that may provide detailed catalogs of genomic variations to further studies of the mechanisms underlying diseases and pharmacogenetic responses [31]. We focused our efforts on the resequencing analysis of the RPS19 gene locus, a region that has been linked to two forms of anemia, namely Diamond-Blackfan Anemia (DBA) and Transient Erythroblastopenia of childhood (TEC) [12], [32]. Our initial goal was to identify non-coding polymorphisms that are associated with either disease.

In the present resequencing study, we show a considerable and previously unrecognized variation within the RPS19 gene locus. Estimates have approximated the degree of variation for chromosome 19 to 1 SNP in about 2 kbp of sequence [19], [21]. We show here that the genetic variation at the RPS19 locus in our patient cohort is significantly higher, with 1 SNP per 294 bp, and provide a catalog of additional variations associated with DBA and TEC. A large proportion of the presented variations consists of “private” SNPs. Interestingly, independent resequencing studies of e.g. the innate immunity genes and the APO gene cluster have also detected an unexpectedly high degree of variation [20], [29]. The high degree of variation in this study and the fact that RPS19 seems to play a central role in a large proportion of DBA patients suggest that regulatory networks altered by one or the other SNP may have implications for RPS19 expression.

However, our results revealed no clear correlation between any of the identified SNPs and either of the DBA or TEC phenotypes. Linkage analyses have previously indicated co-segregation of the two disorders suggesting they are allelic variants [4], [12], [32]. This lack of phenotype-genotype correlation may indicate that there exists an as yet unidentified sequence element in this region responsible for the regulation of RPS19 expression. Indeed, it has been described that regulatory elements may be situated far away from the actual gene. Mutations in such elements have previously been implicated in human diseases [33]. Alternatively, the observed linkage of this region to TEC patients is not due to mutations in RPS19, but to a different gene within the 1 Mbp region described previously [4]. Although the region contains a number of genes, no other ribosomal protein gene is located within this 19q13.2 region and no candidate gene of known relevance for erythropoiesis could be identified.

Consequently, we hypothesized that mutations in non-coding regions of RPS19 could disrupt the binding of regulatory proteins. We aimed to identify new regulatory modulators and carried out a bioinformatics analysis of the locus to identify putative transcription factor binding sites (TFBS). As a result, we obtained a catalog of variations within our patient cohort and we provide a map of putative transcription factor binding sites (table 2 and supplementary table S1). Several of the corresponding transcription factors (TF) are of particular interest. Ten of the identified TFs are ubiquitously or widely expressed (i.e. general transcription factors) and important for regulation of development, cell cycle and cell division, and cell plasticity (Cell division control protein 5 (CDC5); Homeobox cluster protein A3 (HOXA3); Msh-like homeobox protein 1 (MSX-1); Paired box transcription factors (PAX); Peroxisome proliferator-activated receptor (PPARalpha and gamma); SP1 transcription factor (SP1); YY1 transcription factor (YY1)) [34]–[36]. Variations in the TFBS of these factors could possibly lead to altered transcriptional activity of the RPS19 gene, as has been described previously for transcription factors and the Ebox module [37]–[40]. A marked reduction in the transcriptional activity of RPS19 may have effects similar to that observed for haploinsufficiency. Strikingly, the non-reference allele of one SNP (rs3214574) deletes the putative binding site for CDC5. Instead, a strong TFBS for GATA-1/2 is created. This suggests a significant change for tissue-specific expression. Moreover, the general TFs as well as the Ebox binding site could be pivotal for cellular response to extracellular stimuli and this may explain individual response to treatment or endogenous cytokines [41]–[43]. Furthermore, several of these general transcription factors have been shown to play a role in cell proliferation and tumorigenesis for which ribosomal protein genes are essential [44]–[47].

Six TFBS identified in our study are even more interesting (GATA binding proteins 1 and 2 (GATA-1 and GATA-2); Myeloid zinc finger 1 (MZF-1); Pre-B-cell leukemia homeobox 1 (PBX-1); hematopoietic transcription factor PU.1 (PU.1); Ebox binding site). They belong to factors with a direct link to hematopoiesis [48]–[53]. Several of these factors are involved in the progression to leukemia and they are essential for normal hematopoiesis (e.g. PU.1). We speculate that these factors may play a crucial role in the transcription of RPS19 during hematopoiesis, and alterations in the respective TFBS could lead to diminished RPS19 expression. This might render erythroid precursors less capable of proliferating, which has been suggested as a mechanism underlying DBA in patients with mutations in the coding sequence of RPS19 [14]. On the other hand, alterations in TFBS could also lead to increased levels of RPS19 and in the best case promote remission, which is seen to occur spontaneously.

Another possible mechanism is that specific alleles in SNPs overlapping with the TFBS for PU.1, GATA-1/2 or PBX-1 might be of importance for the development of hematopoietic stem cells by altering their capacity of self-renewal, expansion and quiescence [50], [53], [54]. These factors are candidates in the mechanism underlying a block in erythroblast expansion and differentiation in DBA patients [55].

In summary, we report here on the considerable individual variation detected in our resequencing study of the disease locus RPS19 in DBA and TEC patients. Furthermore, we identified a series of transcription factors putatively involved in the regulation of RPS19 expression and implicated in the pathobiology of DBA and TEC. Functional follow-up studies are needed to further investigate the predicted interactions described in this report.

Methods

Ethics Statement

The study was approved by the Regional Ethical Review Board of Uppsala (Diary Number 2006/118). Informed consent of patients or their parents was obtained and has been documented in the patient files by the responsible clinician following routines approved by the Regional Ethics Board and according to Swedish legislation.

Patient cohort

We analyzed DNA prepared from peripheral blood of 77 DBA and 12 TEC patients of Caucasian origin. For all patients included in the study, structural mutations in RPS17, RPS19 and RPS24 had been excluded by sequencing (primer sequences available on request). Most of the patients are sporadic cases, except for 10 of the DBA patients and eight of the TEC patients who previously showed association with this genomic region. All patients were ascertained by hematologists in their respective countries according to the criteria for DBA or TEC, and have been described previously [4], [12], [32].

Resequencing

Resequencing was carried out as described (METHOD-A; dbSNP (http://www.ncbi.nlm.nih.gov/SNP/)). Additional sequence analysis was performed by sequencing standard PCR products (from approximately 2 µg genomic DNA) in both directions on an ABI PRISM® 3700 DNA Analyzer (AppliedBiosystems) according to manufacturer's protocol and using Sequencher® Programme for analysis of the resulting sequences. Primer sequences are listed in supplementary table S2.

Comparative genomic sequence analysis

200 kb of the human genome sequence around the RPS19 gene locus (hg18; chr19:47,048,239–47,068,000) as well as orthologous sequences for six mammalian species in different orders (rodents [mouse], canines [dog], ungulates [cow], primates [orangutan, macaque and chimpanzee]) were retrieved from EnsEMBL (http://www.ensembl.org/Homo_sapiens/) [56], assuring gene synteny was conserved and gaps were not extensive (supplementary text S1 and supplementary figure S1). MultiPipMaker aligned these orthologous sequences with the ‘single coverage’ option to eliminate matches caused by duplications and the ‘search both strand’ option [57]. The identified multi-species conserved sequences were analyzed by virtue of the Infocon program [24]. Infocon identifies blocks of high information content (BHIC) in parts of the alignment and optionally calculates a consensus sequence in each block. A BHIC is a cluster of conservation between species in which the alignment contains information for every species considered for the alignment.
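To make the notion of a block of high information content more concrete, the following is a minimal sketch that scores each alignment column by its Shannon information content and reports runs of fully informative columns. It is not the Infocon algorithm, whose internals are not described here; the 2-bit scale, the handling of gaps, the `min_bits` and `min_len` parameters and the toy alignment are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def column_information(column: str) -> float:
    """Information content (bits) of one DNA alignment column.

    Returns 0.0 if any species has a gap or ambiguous base, mirroring the
    requirement that a block carries information for every species.
    """
    column = column.upper()
    if any(base not in "ACGT" for base in column):
        return 0.0
    n = len(column)
    counts = Counter(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 2.0 - entropy  # 2 bits is the maximum for a 4-letter alphabet


def conserved_blocks(alignment, min_bits=2.0, min_len=5):
    """Return half-open (start, end) column ranges of conserved runs."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    blocks, start = [], None
    for i in range(length + 1):
        informative = i < length and column_information(
            "".join(row[i] for row in alignment)) >= min_bits
        if informative and start is None:
            start = i
        elif not informative and start is not None:
            if i - start >= min_len:
                blocks.append((start, i))
            start = None
    return blocks


# Hypothetical three-species toy alignment (not data from the study).
toy_alignment = ["ACGTACGTAA", "ACGTACGTAC", "ACGTAC-TAA"]
print(conserved_blocks(toy_alignment))
```

Lowering `min_bits` would tolerate columns with a mismatch in one species, which is one way the SNP positions with one or two mismatches discussed in the Results could still fall inside a block.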

In order to obtain a more informative alignment of the mammalian clade, the multi-species alignment EPO track was downloaded from EnsEMBL containing elements conserved along 29 eutherian mammals and subsequently converted into a *.gff file (supplementary texts S1 and S2). For a description of the *.gff file format see http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml.

Prediction of transcription factor binding motifs

Sequences of the whole human RPS19 upstream gene region (hg18; chr19:47,048,239–47,056,684) as well as the 2nd and 4th introns (chr19:47,056,756–47,057,020 and chr19:47,065,125–47,065,608, respectively), together with the corresponding sequences of the 6 mammalian species already mentioned, were searched for transcription factor binding sites (TFBS) using several programs: MotifScanner, MotifLocator, MotifSampler, MATCH and pMATCH.

These programs identify over-represented motifs in a sequence data set and annotate putative binding sites consulting libraries of position weight matrices (PWMs). PWMs represent the intrinsic sequence variability of TFBS in the form of a matrix. Each matrix stores the frequency for each nucleotide at every position of the putative motif in order to summarize the alignment information for the TF with a binding site. The libraries of PWMs used in this study were Jaspar [58] and TRANSFAC Professional (releases 7.0 (public), 11.2 and 12.1) [59]. The professional version of TRANSFAC requires licensing.
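As an illustration of what such a matrix contains, the sketch below builds a per-position frequency matrix from a handful of aligned binding sites. The sites are invented GATA-like examples rather than entries from Jaspar or TRANSFAC, and the helper names are hypothetical.

```python
def frequency_matrix(sites):
    """Per-position nucleotide frequencies from equal-length aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for pos in range(length):
        column = [site[pos].upper() for site in sites]
        matrix.append({base: column.count(base) / len(sites) for base in "ACGT"})
    return matrix


def consensus(matrix):
    """Most frequent base per position (ties resolved in A, C, G, T order)."""
    return "".join(max("ACGT", key=lambda base: col[base]) for col in matrix)


# Hypothetical aligned binding sites, for illustration only.
toy_sites = ["AGATAA", "TGATAG", "AGATAA", "AGATAG"]
toy_matrix = frequency_matrix(toy_sites)
print(toy_matrix[0])
print(consensus(toy_matrix))
```

Each column of the matrix sums to 1, so positions that tolerate several bases (here the first and last) appear as spread-out frequencies, while invariant core positions carry a single frequency of 1.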

MotifScanner uses different orders of Markov chains as background model for matching PWMs. A parameter called a ‘prior’ assigns an a priori probability of TFs binding to a distinct sequence. In MotifLocator the ‘prior’ is substituted by a posterior threshold for filtering matching PWMs. The resulting scores, in an absolute scale without log correction, represent the likelihood ratio of a certain PWM match versus a random match. MotifSampler [60] implements a stochastic model of a Gibbs sampler to detect “over-represented” motifs not matching any known PWM. We always used the default parameters and third order vertebrate and human models (Eukaryotic Promoter Database and dbTSS) for MotifScanner and MotifLocator when searching against TRANSFAC and Jaspar libraries. All these programs are contained in a workbench for regulatory sequence analysis called TOUCAN [61].
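The likelihood ratio described above can be sketched in a few lines. This example is not MotifScanner or MotifLocator code: it assumes a zero-order (position-independent) background rather than the third-order models used in the study, and the matrix, background and the two alleles compared are hypothetical. A higher-order background would condition each base on the preceding ones instead of using a single frequency per base.

```python
def likelihood_ratio(site, pwm, background):
    """P(site | PWM) / P(site | background), on an absolute (non-log) scale."""
    if len(site) != len(pwm):
        raise ValueError("site and PWM must have the same length")
    ratio = 1.0
    for base, column in zip(site.upper(), pwm):
        ratio *= column[base] / background[base]
    return ratio


# Hypothetical 4-position matrix (each column sums to 1) and uniform background.
PWM = [
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

# Compare a reference allele with an alternative allele at the third position.
ref_score = likelihood_ratio("GATA", PWM, BACKGROUND)
alt_score = likelihood_ratio("GACA", PWM, BACKGROUND)
print(ref_score, alt_score)
```

Scoring both alleles of a SNP against the same matrix in this way is the basic idea behind reporting a reference and a non-reference score for each predicted site.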

MATCH [62] and pMATCH [63] are closely interconnected with the TRANSFAC database. MATCH was executed to minimize the false positives error (minFP) to guarantee specificity, whilst pMATCH was selected with minimization for FP and combined with the sum of both false positive and false negative errors (minSUM) to increase sensitivity.

Other recognition tools for promoter and regulatory motif analysis are available: RSAT (Universite Libre de Bruxelles), TESS (University of Pennsylvania), TSSG/TSSW (Baylor College of Medicine), MatInspector (Genomatix GmbH), SiteSeer (University of Manchester), AliBaba2 (BioBase GmbH), FUNSITE (ICG), Footprinter (University of Washington). However, we used the tools best integrated with the libraries used in the study.

Additionally, several Perl scripts were created to perform data analysis in the suitable tools, parse TFBS annotations into *.gff files and filter overlapping SNPs within selected TFBS. All available annotations of the locus were manually formatted into *.gff files (supplementary text S2). Images of the RPS19 locus containing variations, conserved areas, annotations and TFBS provided within this report were created using the UCSC Genome Browser (http://www.genome.ucsc.edu/) [64].
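The filtering step, selecting SNPs that fall within annotated TFBS, amounts to an interval-overlap test on GFF records. The sketch below shows one way this could be done in Python; it is not a translation of the Perl scripts used in the study, and the records, coordinates and labels are invented for illustration. It assumes the standard tab-separated GFF layout with 1-based, inclusive start and end columns.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    seqname: str
    feature: str
    start: int       # 1-based, inclusive
    end: int         # 1-based, inclusive
    attributes: str


def parse_gff(text):
    """Parse tab-separated GFF lines, skipping blank lines and comments."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            raise ValueError(f"expected at least 8 columns, got {len(cols)}")
        attributes = cols[8] if len(cols) > 8 else ""
        features.append(
            Feature(cols[0], cols[2], int(cols[3]), int(cols[4]), attributes))
    return features


def snps_in_tfbs(snps, sites):
    """Return (SNP label, TFBS label) pairs whose intervals overlap."""
    hits = []
    for snp in snps:
        for site in sites:
            if (snp.seqname == site.seqname
                    and snp.start <= site.end
                    and site.start <= snp.end):
                hits.append((snp.attributes, site.attributes))
    return hits


def gff_line(*cols):
    """Helper for building toy records without literal tab characters."""
    return "\t".join(str(c) for c in cols)


# Hypothetical records; coordinates and labels are not from the study.
toy_gff = "\n".join([
    "# toy annotation",
    gff_line("chr19", "toy", "TFBS", 100, 110, ".", "+", ".", "GATA-1"),
    gff_line("chr19", "toy", "TFBS", 200, 206, ".", "-", ".", "PU.1"),
    gff_line("chr19", "toy", "SNP", 105, 105, ".", "+", ".", "snp-a"),
    gff_line("chr19", "toy", "SNP", 150, 150, ".", "+", ".", "snp-b"),
    gff_line("chr19", "toy", "SNP", 206, 206, ".", "+", ".", "snp-c"),
])
records = parse_gff(toy_gff)
toy_hits = snps_in_tfbs(
    [r for r in records if r.feature == "SNP"],
    [r for r in records if r.feature == "TFBS"],
)
print(toy_hits)
```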

Supporting Information

Figure S1.

Detailed view of the analyzed region. Every *.gff file available has been imported into the UCSC Genome Browser (Kuhn et al, 2009) for visualisation (see supplementary text S2).

doi:10.1371/journal.pone.0006172.s001

(1.71 MB PDF)

Figure S2.

Gene structures of the orthologous RPS19 loci of the species selected for comparative analysis taken from EnsEMBL (Hubbard et al, 2009).

doi:10.1371/journal.pone.0006172.s002

(1.09 MB PDF)

Table S1.

Full list of all detected TFBS overlaying identified variations.
a Location of the SNP/TFBS with respect to the RPS19 open reading frame.
b Database identifier (dbSNP) for previously described SNPs or number of novel SNP, respectively. We complement this field with a small registry of which TFBSs are created (+) or destroyed (−) in a certain SNP (the actual SNP if not stated) and the detection score for this binding site if applicable.
c SNP alleles, indicating reference and alternative (replace) allele.
d Name of transcription factor or motif recognized, as named in the databases.
e Library, version of the library and identifier to which a TFBS is associated under a PWM database release.
f Motif recognized (5′ to 3′ sequence). Allele positioning marked in bold. Alleles falling exactly adjacent to the end of a motif highlighted in bold and italics. Capital letters highlight important positions for the putative binding strength of a motif. Chromosome, start, end position and strand within the human reference genome (hg18, build 36.1).
g Score for the reference genome allele. MATCH and pMATCH use 1.000 as maximum score between an optimal binding site match and matrix power of detection. TOUCAN detection tools (MotifScanner, MotifLocator, MotifSampler) do not use global maximum or matrix scoring, but the higher the numbers, the better the predicted site. The a priori probability or threshold is stated under the score when any of the TOUCAN tools has been used.
h Score for non-reference allele.
i Program used for detection of a motif. Error minimization criteria stated when applicable.
j Transcription factor binding site contained in any of the multi-conserved sequence (MCS) regions of the multi-species alignment. The first value belongs to the Infocon program and the second to the EPO EnsEMBL track. ‘Y’ indicates that the TFBS motif is totally contained in the area (see supplementary figure S1 and text S2), ‘y+’ that a major part overlaps (>75%), ‘y’ that a significant part overlaps (<75%, >50%), ‘y−’ that only a minor part overlaps (<50%) and ‘N’ that there is no overlap.

doi:10.1371/journal.pone.0006172.s003

(0.16 MB PDF)
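
The overlap codes used in column j of Table S1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `overlap_code`, the use of 1-based inclusive coordinates, and the handling of fractions of exactly 75% or 50% (which the legend leaves undefined) are assumptions made here for illustration only.

```python
def overlap_code(tfbs_start, tfbs_end, mcs_start, mcs_end):
    """Classify how much of a TFBS motif lies inside a conserved (MCS) region.

    Coordinates are assumed to be 1-based and inclusive (an assumption,
    not stated in the legend). Returns one of 'Y', 'y+', 'y', 'y-', 'N';
    the ASCII hyphen in 'y-' stands for the minus sign used in the legend.
    """
    motif_len = tfbs_end - tfbs_start + 1
    # Number of motif positions shared with the conserved region.
    shared = min(tfbs_end, mcs_end) - max(tfbs_start, mcs_start) + 1
    if shared <= 0:
        return "N"   # no overlap
    if shared == motif_len:
        return "Y"   # motif totally contained
    fraction = shared / motif_len
    if fraction > 0.75:
        return "y+"  # major part overlaps
    if fraction > 0.50:
        return "y"   # significant part overlaps
    # Assumption: exactly 50% is grouped with the minor overlaps,
    # and exactly 75% with the significant overlaps.
    return "y-"


# Hypothetical 10 bp motif at positions 10-19, conserved region starting at 12.
print(overlap_code(10, 19, 12, 100))
```

In Table S1 each TFBS carries two such codes, one for the Infocon regions and one for the EPO EnsEMBL track, so the function would be applied once per set of conserved regions.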

Table S2.

Primer sequences for analysis of the 5′ upstream region.

doi:10.1371/journal.pone.0006172.s004

(0.16 MB PDF)

Text S1.

Sequences collected for the bioinformatic analyses (in fasta format).

doi:10.1371/journal.pone.0006172.s005

(0.07 MB TXT)

Text S2.

Compressed archive with the files used in the study for visualization in the UCSC Genome Browser (in *.gff format).

doi:10.1371/journal.pone.0006172.s006

(0.16 MB ZIP)

Acknowledgments

We thank all the patients. We acknowledge C. Bäcklin for assistance with figures and thank G. Andersson for fruitful discussions.

Author Contributions

Conceived and designed the experiments: AMB OE EBR JS. Performed the experiments: AMB OE. Analyzed the data: AMB OE EBR ND JS. Contributed reagents/materials/analysis tools: JB ASF. Wrote the paper: JS.

References

  1. Vlachos A, Ball S, Dahl N, Alter BP, Sheth S, et al. (2008) Diagnosing and treating Diamond Blackfan anaemia: results of an international clinical consensus conference. Br J Haematol.
  2. Ellis SR, Lipton JM (2008) Chapter 8 diamond blackfan anemia: a disorder of red blood cell development. Curr Top Dev Biol 82: 217–241.
  3. Draptchinskaia N, Gustavsson P, Andersson B, Pettersson M, Willig TN, et al. (1999) The gene encoding ribosomal protein S19 is mutated in Diamond-Blackfan anaemia. Nat Genet 21: 169–175.
  4. Gustavsson P, Garelli E, Draptchinskaia N, Ball S, Willig TN, et al. (1998) Identification of microdeletions spanning the Diamond-Blackfan anemia locus on 19q13 and evidence for genetic heterogeneity. Am J Hum Genet 63: 1388–1395.
  5. Campagnoli MF, Ramenghi U, Armiraglio M, Quarello P, Garelli E, et al. (2008) RPS19 mutations in patients with Diamond-Blackfan anemia. Hum Mutat.
  6. Farrar JE, Nater M, Caywood E, McDevitt MA, Kowalski J, et al. (2008) Abnormalities of the large ribosomal subunit protein, Rpl35a, in Diamond-Blackfan anemia. Blood 112: 1582–1592.
  7. Gazda HT, Sheen MR, Vlachos A, Choesmel V, O'Donohue MF, et al. (2008) Ribosomal protein L5 and L11 mutations are associated with cleft palate and abnormal thumbs in Diamond-Blackfan anemia patients. Am J Hum Genet 83: 769–780.
  8. Cmejla R, Cmejlova J, Handrkova H, Petrak J, Pospisilova D (2007) Ribosomal protein S17 gene (RPS17) is mutated in Diamond-Blackfan anemia. Hum Mutat.
  9. Gazda HT, Grabowska A, Merida-Long LB, Latawiec E, Schneider HE, et al. (2006) Ribosomal protein S24 gene is mutated in Diamond-Blackfan anemia. Am J Hum Genet 79: 1110–1118.
  10. Alter BP (1996) Aplastic Anemia, Pediatric Aspects. Oncologist 1: 361–366.
  11. Skeppner G, Wranne L (1993) Transient erythroblastopenia of childhood in Sweden: incidence and findings at the time of diagnosis. Acta Paediatr 82: 574–578.
  12. Gustavsson P, Klar J, Matsson H, Forestier E, Henter JI, et al. (2002) Familial transient erythroblastopenia of childhood is associated with the chromosome 19q13.2 region but not caused by mutations in coding sequences of the ribosomal protein S19 (RPS19) gene. Br J Haematol 119: 261–264.
  13. Willig TN, Draptchinskaia N, Dianzani I, Ball S, Niemeyer C, et al. (1999) Mutations in ribosomal protein S19 gene and diamond blackfan anemia: wide variations in phenotypic expression. Blood 94: 4294–4306.
  14. Ellis SR, Massey AT (2006) Diamond Blackfan anemia: A paradigm for a ribosome-based disease. Med Hypotheses 66: 643–648.
  15. Todd JA (1996) Transcribing diabetes. Nature 384: 407–408.
  16. Silander K, Mohlke KL, Scott LJ, Peck EC, Hollstein P, et al. (2004) Genetic variation near the hepatocyte nuclear factor-4 alpha gene predicts susceptibility to type 2 diabetes. Diabetes 53: 1141–1149.
  17. Shoulders CC (2004) USF1 on trial. Nat Genet 36: 322–323.
  18. Rahimov F, Marazita ML, Visel A, Cooper ME, Hitchler MJ, et al. (2008) Disruption of an AP-2alpha binding site in an IRF6 enhancer is associated with cleft lip. Nat Genet 40: 1341–1347.
  19. Sachidanandam R, Weissman D, Schmidt SC, Kakol JM, Stein LD, et al. (2001) A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409: 928–933.
  20. Bairagya BB, Bhattacharya P, Bhattacharya SK, Dey B, Dey U, et al. (2008) Genetic variation and haplotype structures of innate immunity genes in eastern India. Infect Genet Evol 8: 360–366.
  21. Miller RD, Phillips MS, Jo I, Donaldson MA, Studebaker JF, et al. (2005) High-density single-nucleotide polymorphism maps of the human genome. Genomics 86: 117–126.
  22. Dresios J, Panopoulos P, Synetos D (2006) Eukaryotic ribosomal proteins lacking a eubacterial counterpart: important players in ribosomal function. Mol Microbiol 59: 1651–1663.
  23. Mauro VP, Edelman GM (2002) The ribosome filter hypothesis. Proc Natl Acad Sci U S A 99: 12031–12036.
  24. Stojanovic N, Florea L, Riemer C, Gumucio D, Slightom J, et al. (1999) Comparison of five methods for finding conserved sequences in multiple alignments of gene regulatory regions. Nucleic Acids Res 27: 3899–3910.
  25. Sandelin A, Carninci P, Lenhard B, Ponjavic J, Hayashizaki Y, et al. (2007) Mammalian RNA polymerase II core promoters: insights from genome-wide studies. Nat Rev Genet 8: 424–436.
  26. Da Costa L, Narla G, Willig TN, Peters LL, Parra M, et al. (2003) Ribosomal protein S19 expression during erythroid differentiation. Blood 101: 318–324.
  27. Grilli M, Chiu JJ, Lenardo MJ (1993) NF-kappa B and Rel: participants in a multiform transcriptional regulatory system. Int Rev Cytol 143: 1–62.
  28. Fischer KD, Haese A, Nowock J (1993) Cooperation of GATA-1 and Sp1 can result in synergistic transcriptional activation or interference. J Biol Chem 268: 23915–23923.
  29. Fullerton SM, Buchanan AV, Sonpar VA, Taylor SL, Smith JD, et al. (2004) The effects of scale: variation in the APOA1/C3/A4/A5 gene cluster. Hum Genet 115: 36–56.
  30. Neale BM, Sham PC (2004) The future of association studies: gene-based analysis and replication. Am J Hum Genet 75: 353–362.
  31. Wood LD, Parsons DW, Jones S, Lin J, Sjoblom T, et al. (2007) The genomic landscapes of human breast and colorectal cancers. Science 318: 1108–1113.
  32. Gustavsson P, Willig TN, van Haeringen A, Tchernia G, Dianzani I, et al. (1997) Diamond-Blackfan anaemia: genetic homogeneity for a gene on chromosome 19q13 restricted to 1.8 Mb. Nat Genet 16: 368–371.
  33. Enattah NS, Sahi T, Savilahti E, Terwilliger JD, Peltonen L, et al. (2002) Identification of a variant associated with adult-type hypolactasia. Nat Genet 30: 233–237.
  34. Blake JA, Thomas M, Thompson JA, White R, Ziman M (2008) Perplexing Pax: from puzzle to paradigm. Dev Dyn 237: 2791–2803.
  35. Hall BK, Miyake T (1995) Divide, accumulate, differentiate: cell condensation in skeletal development revisited. Int J Dev Biol 39: 881–893.
  36. Pearson JC, Lemons D, McGinnis W (2005) Modulating Hox gene functions during animal body patterning. Nat Rev Genet 6: 893–904.
  37. Chaudhary J, Skinner MK (1999) Basic helix-loop-helix proteins can act at the E-box within the serum response element of the c-fos promoter to influence hormone-induced promoter activation in Sertoli cells. Mol Endocrinol 13: 774–786.
  38. Larsson L, Johansson P, Jansson A, Donati M, Rymo L, et al. (2008) The Sp1 transcription factor binds to the G-allele of the -1087 IL-10 gene polymorphism and enhances transcriptional activation. Genes Immun.
  39. Gingras ME, Masson-Gadais B, Zaniolo K, Leclerc S, Drouin R, et al. (2009) Differential binding of the transcription factors Sp1, AP-1, and NFI to the promoter of the human alpha5 integrin gene dictates its transcriptional activity. Invest Ophthalmol Vis Sci 50: 57–67.
  40. Sekido R, Murai K, Funahashi J, Kamachi Y, Fujisawa-Sehara A, et al. (1994) The delta-crystallin enhancer-binding protein delta EF1 is a repressor of E2-box-mediated gene activation. Mol Cell Biol 14: 5692–5700.
  41. Miltenberger RJ, Sukow KA, Farnham PJ (1995) An E-box-mediated increase in cad transcription at the G1/S-phase boundary is suppressed by inhibitory c-Myc mutants. Mol Cell Biol 15: 2527–2535.
  42. Solomon SS, Majumdar G, Martinez-Hernandez A, Raghow R (2008) A critical role of Sp1 transcription factor in regulating gene expression in response to insulin and other hormones. Life Sci 83: 305–312.
  43. Kronke G, Kadl A, Ikonomu E, Bluml S, Furnkranz A, et al. (2007) Expression of heme oxygenase-1 in human vascular cells is regulated by peroxisome proliferator-activated receptors. Arterioscler Thromb Vasc Biol 27: 1276–1282.
  44. Ruggero D, Pandolfi PP (2003) Does the ribosome translate cancer? Nat Rev Cancer 3: 179–192.
  45. Pozzi A, Ibanez MR, Gatica AE, Yang S, Wei S, et al. (2007) Peroxisomal proliferator-activated receptor-alpha-dependent inhibition of endothelial cell proliferation and tumorigenesis. J Biol Chem 282: 17685–17695.
  46. Gordon S, Akopyan G, Garban H, Bonavida B (2006) Transcription factor YY1: structure, function, and therapeutic implications in cancer biology. Oncogene 25: 1125–1142.
  47. Nan H, Qureshi AA, Hunter DJ, Han J (2008) A functional SNP in the MDM2 promoter, pigmentary phenotypes, and risk of skin cancer. Cancer Causes Control.
  48. Ryan DP, Duncan JL, Lee C, Kuchel PW, Matthews JM (2008) Assembly of the oncogenic DNA-binding complex LMO2-Ldb1-TAL1-E12. Proteins 70: 1461–1474.
  49. Kastner P, Chan S (2008) PU.1: a crucial and versatile player in hematopoiesis and leukemia. Int J Biochem Cell Biol 40: 22–27.
  50. Ficara F, Murphy MJ, Lin M, Cleary ML (2008) Pbx1 regulates self-renewal of long-term hematopoietic stem cells by maintaining their quiescence. Cell Stem Cell 2: 484–496.
  51. Hromas R, Davis B, Rauscher FJ 3rd, Klemsz M, Tenen D, et al. (1996) Hematopoietic transcriptional regulation by the myeloid zinc finger gene, MZF-1. Curr Top Microbiol Immunol 211: 159–164.
  52. Steidl U, Steidl C, Ebralidze A, Chapuy B, Han HJ, et al. (2007) A distal single nucleotide polymorphism alters long-range regulation of the PU.1 gene in acute myeloid leukemia. J Clin Invest 117: 2611–2620.
  53. Wickrema A, Crispino JD (2007) Erythroid and megakaryocytic transformation. Oncogene 26: 6803–6815.
  54. Arinobu Y, Mizuno S, Chong Y, Shigematsu H, Iino T, et al. (2007) Reciprocal activation of GATA-1 and PU.1 marks initial specification of hematopoietic stem cells into myeloerythroid and myelolymphoid lineages. Cell Stem Cell 1: 416–427.
  55. Ohene-Abuakwa Y, Orfali KA, Marius C, Ball SE (2005) Two-phase culture in Diamond Blackfan anemia: localization of erythroid defect. Blood 105: 838–846.
  56. Hubbard TJ, Aken BL, Ayling S, Ballester B, Beal K, et al. (2009) Ensembl 2009. Nucleic Acids Res 37: D690–697.
  57. Schwartz S, Elnitski L, Li M, Weirauch M, Riemer C, et al. (2003) MultiPipMaker and supporting tools: Alignments and analysis of multiple genomic DNA sequences. Nucleic Acids Res 31: 3518–3524.
  58. Sandelin A, Alkema W, Engstrom P, Wasserman WW, Lenhard B (2004) JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res 32: D91–94.
  59. Matys V, Kel-Margoulis OV, Fricke E, Liebich I, Land S, et al. (2006) TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res 34: D108–110.
  60. Thijs G, Lescot M, Marchal K, Rombauts S, De Moor B, et al. (2001) A higher-order background model improves the detection of promoter regulatory elements by Gibbs sampling. Bioinformatics 17: 1113–1122.
  61. Aerts S, Van Loo P, Thijs G, Mayer H, de Martin R, et al. (2005) TOUCAN 2: the all-inclusive open source workbench for regulatory sequence analysis. Nucleic Acids Res 33: W393–396.
  62. Kel AE, Gossling E, Reuter I, Cheremushkin E, Kel-Margoulis OV, et al. (2003) MATCH: A tool for searching transcription factor binding sites in DNA sequences. Nucleic Acids Res 31: 3576–3579.
  63. Chekmenev DS, Haid C, Kel AE (2005) P-Match: transcription factor binding site search by combining patterns and weight matrices. Nucleic Acids Res 33: W432–437.
  64. Kuhn RM, Karolchik D, Zweig AS, Wang T, Smith KE, et al. (2009) The UCSC Genome Browser Database: update 2009. Nucleic Acids Res 37: D755–761.