Genome-Wide Expression Analysis in Down Syndrome: Insight into Immunodeficiency

Down syndrome (DS) is caused by triplication of Human chromosome 21 (Hsa21) and associated with an array of deleterious phenotypes, including mental retardation, heart defects and immunodeficiency. Genome-wide expression patterns of uncultured peripheral blood cells are useful to understanding of DS-associated immune dysfunction. We used a Human Exon microarray to characterize gene expression in uncultured peripheral blood cells derived from DS individuals and age-matched controls from two age groups: neonate (N) and child (C). A total of 174 transcript clusters (gene-level) with eight located on Hsa21 in N group and 383 transcript clusters including 56 on Hsa21 in C group were significantly dysregulated in DS individuals. Microarray data were validated by quantitative polymerase chain reaction. Functional analysis revealed that the dysregulated genes in DS were significantly enriched in two and six KEGG pathways in N and C group, respectively. These pathways included leukocyte trans-endothelial migration, B cell receptor signaling pathway and primary immunodeficiency, etc., which causally implicated dysfunctional immunity in DS. Our results provided a comprehensive picture of gene expression patterns in DS at the two developmental stages and pointed towards candidate genes and molecular pathways potentially associated with the immune dysfunction in DS.


Introduction
Down syndrome (DS; trisomy 21) is characterized by a complete, or occasionally partial, triplication of Hsa21. With an incidence of about one in 750 births [1], DS is the most common autosomal abnormality affecting live-born infants. More than 80 clinical features with variation in number and in severity are reported in DS [2].
DS patients are clinically associated with multiple blood cellrelated phenotypes, including increased risk to develop leukemia, decreased lymphocyte counts, markedly enhanced incidence of autoimmune disorders as well as vulnerability to recurrent bacterial and viral infections [3,4,5,6]. Moreover, pneumonia and other types of respiratory infections are the most common causes of death in DS children and early adults [7]. The abnormalities highlight that DS individuals are very likely associated with intrinsic defects of the immune system [8]. However, the molecular mechanisms by which trisomy 21 leads to the immune system disorders in DS remain poorly investigated. Transcriptome of peripheral blood cells from DS would provide a unique molecular window into immunodeficiency relevant to DS. Several gene-expression studies in T lymphocytes [9] and blood cells [10] from DS patients reported dysregulated expression of some immune-associated genes, yet very small sample size or ageunmatched controls restricted statistical analysis.
Here, we characterized gene-expression in uncultured blood cells from DS and age-matched control samples in neonate and child group. The dysregulated genes in DS at the two developmental stages were identified and showed a comprehensive picture of gene expression patterns. Furthermore, classification of dysregulated genes based on their known functions provided insight into immunodeficiency in DS patients.

Identification of differentially expressed genes between DS and control samples
Using the Affymetrix GeneChip Human Exon 1.0 ST Array containing ,20,000 known human genes, we performed gene expression analysis in DS peripheral blood cells and age-matched controls from N and C group. Of 17,626 core transcript clusters with RefSeq-supported annotation, 13,027 in C group and 13,168 in N group (corresponding to 12,876 and 13,017 well-characterized genes, respectively) were reliably expressed. In RNA-Seq analysis of endothelial progenitor cells from DS and control, 13,144 active RefSeq genes were shared by both the cells [11]. This suggests that the number of the expressed genes in DS and control cells was comparable in the two studies.
To identify differentially expressed genes between DS and controls in each group, we used ANOVA in combination with fold change in gene expression. Consistent with the observations of Prandini et al. [12], gender and DS6gender effects were not significant for expressed Hsa21 genes in DS individuals; however, expression of some non-Hsa21 genes were significantly affected by the two effects in each group. Therefore, the two effects were kept in our analysis. In N group, 174 (1.3%) of the 13,168 transcript clusters, containing 113 (64.9%) up-regulated genes and 61 (35.1%) down-regulated genes, were differentially expressed between DS and controls ( Figure 1A, Table S1). In C group, 383 (2.9%) transcript clusters displayed differential expression between DS and controls. Pearson's correlation test showed no significant results between the child age and expression of the genes (smallest q value was 0.99). Among the 383 transcript clusters, 312 (81.5%) were up-regulated and 71 (18.5%) were down-regulated in DS children ( Figure 1B, Table S2). The higher percentage of up-regulated genes in N and C group was similar to prior report [13]. In the two groups, only 22 dysregulated genes, including six transcript clusters on Hsa21, were shared (Table 1). Elevated expression of the six transcript clusters was also implicated in other DS cells or tissue (Table 1). For the 22 transcript clusters, the extent of fold change between the two groups was significantly correlated (Pearson's cor = 0.92, P = 1.28610 29 ), suggesting their importance in DS phenotypes at the two developmental stages.
Furthermore, the dysregulated genes in each group were divided into four intervals according to fold change ( Figure 2). The genes that were mildly altered in expression, with fold change between 0.5,0.77 or 1.3,2.0, accounted for the majority of the dysregulated genes in each group. The genes whose expression was intensively varied (fold change ,0.5 or .2) were in the minority.

Expression variation of Hsa21 genes in DS
Given trisomy of Hsa21, we assessed expression variation of genes on this chromosome. There are 237 transcript clusters on the array, corresponding to 234 known genes on Hsa21. Of the transcript clusters, 157 (66%) and 146 (62%) were expressed in N and C group, respectively. There were 5.1% (8/157) in N group and 38.3% (56/146) in C group with significant expression difference between DS and age-matched controls ( Figure 1C, D). Of the expressed Hsa21 genes in the two groups, no significantly down-regulated genes was detected in DS samples ( Figure 1C and D), which is consistent with other reports [12,14] and reflects the dosage effects of trisomy. The average expression ratios (DS/ control) of expressed Hsa21 genes were 1.2960.43 (mean 6 sd) in N group and 1.4060.35 in C group, revealing an overall upregulation of Hsa21 genes which is also observed in previous data [12,15,16,17].

QPCR validation
To confirm changes in gene-expression levels detected by the arrays and assess our statistical methods, we conducted approximately 2,500 QPCRs of all 37 cDNA samples from DS and control individuals. We first checked our definitions about a reliably expressed gene. To this end, several genes, whose normalized expression signals on the array in some individuals were close to the threshold according to our definition, were analyzed in the assay. These genes included VAV2 in N group and BTG3, PDGFD and PDGFRB in C group. Their signals, albeit very weak relative to the endogenous genes, were detected by QPCR in all samples, thereby indicating the reasonable threshold defined by our criterions. We next selected a subset of the genes from each group and measured their expression levels. These genes included Hsa21 and non-Hsa21 genes that were statistically up-regulated, unaltered or down-regulated in analysis of the array. DS/control ratio for each gene was calculated and was very consistent with fold change obtained from the array (Table S3, Figure S1), with a Pearson correlation coefficient of 0.91 (P = 2.2610 214 ) ( Figure 3).

Chromosomal distribution of the dysregulated genes
We analyzed chromosomal distribution of the dysregulated genes in N and C group to evaluate effects of trisomy 21 on the whole genome. Of the dysregulated transcript clusters, 326 and 166 were mapped to non-Hsa21 in N and C group, respectively. These data are in agreement with the prior data [13,15,18], providing further evidence for the idea that both expression changes of Hsa21 and non-Hsa21 genes contribute to the etiology of DS. Chromosomal distribution of these dysregulated genes in N group and C group were showed in Figure 4. Figure 5 illustrated the percentage of the dysregulated genes in the expressed genes at individual chromosome and the whole genome level (corresponding to ''all''). Through binomial tests, chromosome 21 in each group was significantly overrepresented with Pc = 0.027 in N group and Pc,10 26 in C group after Benjamini-Hochberg (BH) correction [19]. None of the remaining chromosomes was significantly overrepresented or underrepresented at Pc,0.05.

Hierarchical cluster analysis of the dysregulated genes
In Figure 6, we illustrated the results of hierarchical clustering on the dysregulated transcript clusters in DS. Cluster analysis clearly separated the DS neonates from controls ( Figure 6A). The child samples were divided into two major distinguishable groups, leaving one DS exception (D6) grouped into the control group ( Figure 6B). This exception is not surprising, because although some phenotypes frequently occur in DS patients, the degree to which individual is affected varies. It is likely that the misclassification of the DS child is a reflection of mild phenotypic abnormalities caused by a combination of environmental and genetic variation.
We further asked whether hierarchical clustering could distinguish DS from control individuals based on the dysregulated non-Hsa21 genes. DS neonates were also separated from the matched controls ( Figure S2A), which is similar to the observation that hierarchical clustering distinguishes DS from control fetuses based on the non-Hsa21 altered genes in amniotic fluid cell-free mRNA [18]. Unlike N group, DS and control children were clearly divided into three groups ( Figure S2B). The first group included 11 controls and sample D6 that showed different expression patterns from the other DS samples in Figure 6B. The second group included the remaining 9 DS samples. The third group, including control C1, 3, 5 and 8, seemed to display intermediate patterns between the other two groups. The four controls were also grouped together as shown in Figure 6B, showing less well-defined expression patterns in these samples.

Effects of transcription factors on disruption of non-Hsa21 genes in DS
The dysregulated non-Hsa21 genes in each group demonstrated the pervasive effects of trisomy 21 on the whole genome. One hypothesis suggests that disruption of non-Hsa21 genes in DS is through modulation of transcription factors (TFs) [20]. Therefore, we searched the dysregulated genes in each group for TFs. Expectedly, ten TFs in C group and two in N group were deregulated ( Table 2). We next asked whether targets of the TFs were accordingly altered in expression. We first checked Pax5 which was down-regulated in both of the groups. Six genes, CD19, CD79A, BLNK, EBF1, FCER2 and SPIB, are activated by Pax5 [21,22,23]. Expression of these targets was down-regulated in DS children and neonates, however, only CD79A, BLNK, FCER2 and SPIB were expressed with significant alteration in DS children (Table S6). We further examined expression pattern of the known targets of the other TFs in our array data. A total of 31 genes can be activated or repressed by the TFs (Table S6); nonetheless, expression of the targets was not significantly altered accordingly. The reason for this could be that dysregulation of target gene expression also depends on other deregulated TFs or coactivators. These data indicate that deregulation of these TFs plays limited roles in massive disruption of non-Hsa21 gene expression in DS.

Functional analysis of the dysregulated genes
To explore biological functions of the dysregulated genes in each group, we analyzed the biological process and KEGG pathways through Onto-tools. The significantly enriched GO biological processes and KEGG pathways (Pc,0.05 after BH correction) were showed in Table 3. One GO biological process and two KEGG pathways were found to be enriched in N group. Likewise, one GO biological process and six KEGG pathways were enriched in C group.

Discussion
In present study, we analyzed gene expression difference between uncultured blood cells from DS and controls at the two age stages. A total of 383 and 174 dysregulated transcripts were identified in DS children and neonates, respectively and the array data were validated by approximately 2,500 QPCRs. Using functional profiling analysis, we identified significantly disrupted biological pathways which were relevant to immunodeficiency observed in DS.
The frequency (35.1%) of the down-regulated genes in DS neonates was higher than that (18.5%) in DS children. In amniotic fluid cell-free mRNA [18] and heart tissue [24] from DS fetuses, 46% and 59% of the dysregulated genes showed down-regulation, respectively. However, the reason for these observations is unclear. Given that TFs usually can activate or repress multiple targets, we postulated that the deregulated TFs exclusively in N or C group could contribute to the frequency difference observed in our study. GRHL1 expression was up-regulated exclusively in DS neonates, however, its target, P450scc [25], showed normal expression. Similarly, the known targets of the altered TFs only in DS children were normally expressed (Table S6). The data point to limited contribution of TFs to the frequency difference between N and C   group. Other mechanism responsible for the observations remains further postulated.
In DS neonates, only eight transcript clusters on Hsa21 were significantly up-regulated in expression. The finding is compatible with the previous data that a quite small number of Hsa21 genes are dysregulated in DS fetus samples [16,18,26]. We thus hypothesize that triplication of Hsa21 per se induces a modest dysregulation of Hsa21 genes during early developmental stages. The modest dysregulation might partially account for the phenotypes of DS individuals and mouse models at early stages. It has been reported that the basal forebrain cholinergic system is apparently normal in DS fetuses and infants, based on both neuronal numbers and choline acetyltransferase activity [27,28]. In Ts65Dn mice, the cerebellum has normal size at birth as compared to wild type littermates [29].
We analyzed functions of the 22 common dysregulated genes in both age groups. Among the 22 genes, ITGB2, HLA-DOA, HLA-DOB, Pax5 and CD22 are more associated with the immune system based on published data. ITGB2 encodes b chain of the integrin lymphocyte functional antigen-1 (LFA-1) and its overexpression increases aggregation of DS lymphoblastoid cell lines (LCLs) [30]. HLA-DOA and HLA-DOB form HLA-DO which functions as a modulator of Ag presentation [31]. Their down-regulation in DS might affect efficiency of Ag presentation. Both Pax5 and CD22 were down-regulated in DS children and neonates. Pax5 plays an essential role in B-lineage commitment [32]. Strikingly, Pax5(2/ 2) cells show slower growth, decreased surface IgM expression, and total loss of B cell receptor signaling [33]. CD22-deficient mice exhibits a reduced number of mature B cells in circulation and a significant diminution of surface Ig levels in these B cell subpopulations [34,35]. Deregulation of the five genes suggests their critical roles in immunodeficiency in DS patients.
We further explored biological functions of the dysregulated genes in the two age groups at a genome-wide level. In N group, innate immune response and two KEGG pathways, systemic lupus erythematosus (SLE) and leukocyte trans-endothelial migration, were enriched. The process or pathways are directly linked to the immune system, thereby supporting the hypothesis that the immune system in DS is intrinsically deficient from the very beginning [8]. Leukocyte trans-endothelial migration is vital for immune surveillance and inflammation [36]. In this pathway, Vav2 expression in DS neonates was down-regulated (Table S3). The gene is an activator of Cdc42, Rac1 and RhoA [37] which regulate actin dynamics and gene expression. Down-regulation of Vav2 could disturb leukocyte migration, for knockdown of Vav2 prevents Rac activation in some cells [38].
In C group, one biological process and six pathways were significantly deregulated in PBMCs of DS patients. In the process of leukocyte adhesion, overexpression of ITGB2 and ITGAL whose products form integrin LAF-1 has been reported in LCLs from DS patients and increase adhesiveness of these cells [30]. It is worthy of note that product of ITGB1 can form 11 heterodimeric integrins through pairing with alpha chain of integrins [39]. Up-regulated expression of the gene could have an impact on balance of the associated integrins. The first three pathways are focal adhesion, cell adhesion molecules (CAMs) and regulation of actin cytoskeleton which are implicated in cell growth, survival and mobility [40,41,42]. It has been shown that absolute total lymphocytes are significantly lower in DS children than controls [43,44,45]. Also, T lymphocyte maturation is impaired in DS individuals [46]. Our findings could be used to evaluate these observations associated with DS. The three remaining pathways are B cell receptor signaling pathway, primary immunodeficiency and natural killer cell mediated cytotoxicity. Individuals with DS display increased susceptibility to recurrent infections [4] and significantly reduced IgG2 levels [45]. Moreover, lower NK cytotoxic activity compared to controls has been observed in DS children and adults [47,48]. Disturbance of the three pathways could contribute to immune dysfunction observed in DS.
There are several potential limitations in this work. One limitation is that total RNA was isolated from distinct cell types and cell composition between N and C group. Another is smaller sample size in N group which could reduce statistical power. For a more comprehensive analysis of gene expression in DS, one would need to consider cell type difference and sample size.

Ethics statement
The Ethical Committee of the Chinese National Human Genome Center at Shanghai approved this project for the involvement of human subjects (approval ID: 201201) and parents of all the individuals provided written informed consent.

Samples and groups
All samples, including 15 DS patients and 22 controls, were collected at the Shanghai Children's Medical Centre from peripheral blood of the participants. All DS patients were confirmed by karyotyping. The samples were divided into two groups: N group (5 DS versus 7 control individuals, age: 3 days to 38 days) and C group (10 DS versus 15 control individuals, age: 1 year to 13 years). In each group, age and cell type were matched between DS and controls. Although the mean age of all the DS individuals was less than that of all control individuals, the difference was not statistically significant (p = 0.27 and p = 0.20 for C and N group, respectively, Student's t test) (Table S4).

Total RNA isolation
From the 25 child samples (C group), peripheral blood mononuclear cells (PBMCs) were isolated from peripheral blood using Lympholyte H-H (CEDARLANE), and total RNA was extracted using the mirVana TM miRNA Isolation Kit (Invitrogen)   according to the manufacturer's protocol. The remaining 12 neonate samples (N group) had limited blood volume; therefore, total RNA was extracted from peripheral blood cells (PBCs). RNA concentration and purity was tested using a Nanodrop 2000c spectrophotometer (Thermo Scientific). RNA quality was further assessed with the Agilent 2100 Bioanalyzer using RNA 6000 Nanochips (Agilent Technologies). None of the 37 RNA samples had DNA contamination or RNA degradation. RNA samples were stored at 280uC for hybridization.

Screening of GATA1 mutation in DS samples
DS neonates are at an increased risk of transient leukemia (TL) [49] and as many as 10% of them are affected by transient myeloid disorder [50]. Somatic mutations of GATA1 exon 2 including insertions, deletions, and point mutations are found in almost all cases of DS-associated TL [51,52]. GATA1 encodes hematopoietic transcription factor [52,53] and its mutation in DS samples could affect transcriptomes and lead to an unfair comparison between DS and controls. Therefore, we checked for the presence of GATA1 mutation in DS neonates and children. Genomic DNAs from the DS samples were isolated with QIAamp Blood DNA Mini kits (Qiagen) according to the manufacturer's protocol.  Amplicons including the whole GATA1 exon 2 were obtained by PCR using forward (59-TGTCTGAGGACCCCTTCTGT-39) and reverse (59-AAGCTTCCAGCCATTTCTGA-39) primer and then sequenced. Sequence comparisons were performed by use of programs available from NCBI. However, no mutation of GATA1 exon 2 was found in the DS samples used in this study (Files S 1).

Array hybridization
For each sample, ribosomal RNA was removed from 1 mg of total RNA with the RiboMinus Human/Mouse Transcriptome Isolation kit (Invitrogen) [54,55]. A similar approach for the depletion of abundant rRNA molecules was used in RNA-seq analysis [11,56], which could investigate a plethora of nonpolyadenilated mRNA. cDNA was synthesized using the Gene-Chip WT (Whole Transcripts) cDNA Synthesis and Amplification Kit (Affymetrix) according to manufacturer's instructions. The sense cDNA was fragmented and end labeled by use of the GeneChip WT Terminal Labeling Kit (Affymetrix). Approximately 5.5 mg of biotin-labeled DNA target was hybridized to the Affymetrix GeneChip Human Exon 1.0 ST Array at 45uC for 16 hr following manufacturer's protocol (Affymetrix). Arrays were washed after hybridization, stained on a GeneChip Fluidics Station 450 and scanned on a GCS3000 Scanner (Affymetrix). Raw data were extracted from the scanned images and the Expression Console software (Affymetrix) was used for data analysis.

Normalization and summarization of array hybridization data
Expression Console software (Affymetrix) was used to quantilenormalize the probe fluorescence intensities over all 37 samples with PM-GCBG background correction. An iterative probe logarithmic intensity error (IterPlier) model (http://www. affymetrix.com/support/technical/technotes/plier_technote.pdf) was used to summarize the meta-probe set (representing gene expression) intensities. To stabilize variance, a constant of 16 was added to all probe set intensities, and then the signal values were log 2 transformed. Removing the transcript clusters without any RefSeq-supported annotation, we generated the expression signals of 17,626 transcript clusters with core sets. A transcript cluster was considered to be reliably expressed when the log 2 -transformed expression signal was larger than 6.0 in at least 90% of the all samples in each group [54]. The remaining transcript clusters in each group, which met this criterion as determined by using R software (http://www.r-project.org/), were utilized for further analysis.
Raw microarray data have been deposited with GEO (Accession No. GSE35665).

Identifying differentially expressed genes in each age group
In this study, gender was less well matched between DS and controls in each group. Gender and interaction of DS and gender effects could have an impact on gene expression variation. To identify differentially expressed genes between DS and control individuals in each group which are independent of the two effects, we used analysis of variance (ANOVA) at an individual gene level.
For each gene, the following linear model was used: Where y ijk is the normalized expression of the gene in log2 for ith disease type (DS or control), jth gender (male or female), and kth sample in each group; the symbols D, G and DG represent the fixed effects due to the disease, gender as well as interaction of disease6gender, respectively; the error for each gene for sample ijk is designated as eijk. ANOVA was performed using R software. Multiple testing (BH correction) was performed on P values after ANOVA, no gene in DS neonates and only eight genes in DS children were significantly dysregulated (q,0.05). Thus, we referred to the method of Conti et al. [15] and applied P value after ANOVA and fold change (DS/control) for differentially expressed genes between DS and controls in each group. Specifically, genes with significantly different expression between DS and control individuals met following criteria: P value,0.01 and fold change (DS/control).1.30 (up-regulated) or ,0.77 (down-regulated). Here, fold change is equal to 2 ' delta mean value, which is the error of the mean of log 2 (DS) and mean of log 2 (control). Due to broad age range (from 1 year to 13 years) in C group, we determined correlation of differentially expressed genes with age and assessed whether the variable had effects on gene expression. Pearson's correlation test was performed for age, as described by Lockstone et al [13] and P values were adjusted by use of BH correction.

Quantitative real-time PCR
For validation of expression ratios between DS and control samples obtained from the array, total RNA was reverse transcribed into cDNA using PrimeScript TM RT reagent Kit (TaKaRa). Quantitative real-time PCR (QPCR) was performed on 11 genes from N group and 25 genes from C group using the LightCycler 480 (Roche Diagnostics) or the StepOne Plus (Applied Biosystems) systems with SYBRH Premix Ex Taq TM II (Perfect Real Time) (TaKaRa). Each gene was amplified in replicates of three. The genes and their primer sequences, designed by Primer 3 software [57], were described in Table S5. UBA7 (NCBI Gene ID 7318) and RNF4 (NCBI Gene ID 6047) were used as endogenous control genes [14]. For each sample, we normalized the mean cycle threshold value (C t ) by subtracting the mean of the C t generated from the two reference genes. The same ANOVA as the array analysis above was used to test statistical significance of difference in gene expression between DS and controls.

Chromosomal distribution of significantly dysregulated genes in DS
Distribution of significantly dysregulated genes in DS individuals was tested against the null chromosomal distribution of the expressed transcript clusters in N and C group, respectively. Significant chromosomes were identified with binomial test (Pc,0.05 after BH correction), as described by Zhang et al [54]. STRIPE software was used to plot the chromosomal distribution of the dysregulated transcript clusters in each group [58].

Cluster analysis
For the transcript clusters in each group that were differentially expressed between DS and control samples, Euclidean distance of expression levels was computed and hierarchical clustering was performed using complete-linkage method (the hclust function in the stats package) in R software. Heatmaps were plotted using the heatmap.2 function in the gplots package with the ''scale = 'row''' option set to ''z-score normalize the rows''.

GO and KEGG pathway analysis
Onto-Express and Pathway-Express in Onto-tools [59] were used to identify enriched Gene Ontology (GO) [60] terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) [61] pathways, respectively. Among the differentially expressed genes in each group, GO terms or KEGG pathways that were overrepresented relative to the gene set on the Affymetrix Human Exon 1.0 ST Array were selected (four or more hits [54], hypergeomatric test Pc,0.05 after BH correction ).