Gene Expression Profiling during Early Acute Febrile Stage of Dengue Infection Can Predict the Disease Outcome

Background We report the detailed development of biomarkers to predict the clinical outcome under dengue infection. Transcriptional signatures from purified peripheral blood mononuclear cells were derived from whole-genome gene-expression microarray data, validated by quantitative PCR and tested in independent samples. Methodology/Principal Findings The study was performed on patients of a well-characterized dengue cohort from Recife, Brazil. The samples analyzed were collected prospectively from acute febrile dengue patients who evolved with different degrees of disease severity: classic dengue fever or dengue hemorrhagic fever (DHF) samples were compared with similar samples from other non-dengue febrile illnesses. The DHF samples were collected 2–3 days before the presentation of the plasma leakage symptoms. Differentially-expressed genes were selected by univariate statistical tests as well as multivariate classification techniques. The results showed that at early stages of dengue infection, the genes involved in effector mechanisms of innate immune response presented a weaker activation on patients who later developed hemorrhagic fever, whereas the genes involved in apoptosis were expressed in higher levels. Conclusions/Significance Some of the gene expression signatures displayed estimated accuracy rates of more than 95%, indicating that expression profiling with these signatures may provide a useful means of DHF prognosis at early stages of infection.


Introduction
The dengue virus is a member of Flaviviridae family, genus flavivirus with four antigenically distinct serotypes (DENV-1 to DENV-4). Dengue virus infection is a global public health concern, with an estimated incidence of 50-100 million cases of dengue fever (DF), resulting in 500,000 clinical cases of lifethreatening dengue hemorrhagic fever syndrome (DHF) and 24,000 deaths per year [1]. DHF is characterized by vasculopathy, which results in sudden plasma leakage that reduces the blood volume and may result in hypovolemic shock, known as dengue shock syndrome (DSS). There is no antiviral therapy to treat dengue infection, neither are there means to prevent the development of DHF.
During the acute febrile phase of infection, DF and DHF patients display a very similar clinical picture. However at defervescence (after 4 to 7 days of the beginning of the symptoms), DHF patients start to present signs of circulatory disturbance [2], which makes medical management a major challenge in endemic areas. This is especially true during outbreaks when dengue cases typically over saturates the capacity of all medical points-of-care, and results in shortage on the capacity to attend the regular demand for medical assistance causing major disruptions on the public health systems.
The World Health Organization [2] has established clinical criteria to define DHF cases, but the difficulties of both fulfilling these criteria and early identification of severe cases, make the clinical management of severe forms of the disease even a greater challenge [3,4]. Therefore, a search for new tools to predict patient outcome is essential to facilitate the assessment of the need for medical interventions.
The current concept underlining DHF immunopathology relies on epidemiological evidence indicating an increased risk of developing DHF in secondary dengue infections. This concept led to the identification of biological mechanisms involving antibody mediated enhancement (ADE), mediated by nonneutralizing antibodies [5,6], as well as exacerbated activation of cross-reactive T cell clones [7,8,9,10], both acquired after primary infection. Some of the ''markers'' of dengue severity found on peripheral blood that were correlated with plasma leakage on DHF include several inflammatory cytokines, chemokines and adhesion molecules that are expressed at high levels [11,12,13,14,15,16,17], complement activation products [18,19], increased frequency of activated cells [20,21] and large number of cells undergoing apoptosis [9,22]. All these findings have helped to advance the understanding of the immune mechanisms involved in the development of DHF but none of them were proven to be reliable or useful as biological markers for clinical use.
DNA microarrays have been used as a tool to identify ''signature genes'' and predict successfully, patient outcome for cancer [23,24] as well as for bacterial and viral infections [25,26]. Using this approach, some studies have shown that at the peak of the disease, several genes are differentially expressed in dengue patients presented with the more severe disease [27,28]. Hence, we believe that an early gene expression signature differentiating the mild from the severe clinical outcomes of dengue could be a useful tool for developing biomarkers to predict clinical outcome, which will facilitate the clinical management of dengue infected patients.
Here we report the analysis of gene-expression microarray data of PBMC samples collected from DF and DHF patients during the febrile phase of the disease. These data were used as the basis for the development of reliable biomarkers to predict the clinical outcome of dengue infection.

Dengue Cohort Design and Strategy for Functional Genomic Studies
A cohort of acute febrile patients admitted on three hospitals in the city of Recife, state of Pernambuco, Brazil, was established and described elsewhere [29,30]. Briefly, sequential blood samples were obtained at the day of admission, day 1, and at days 3, 5, 7, 15 and 30. Dengue cases (confirmed by either serology, RT-PCR or virus isolation) were clinically classified according to the WHO criteria into two classes: Dengue Fever (DF) and Dengue Hemorrhagic Fever (DHF) [31]. All participants signed an informed consent. This study, that included several methods suitable for studies related to dengue immunopathology including functional immunomics, was reviewed and approved by ethics committee of Brazilian Ministry of Health CONEP: 4909; Process nu 25000.119007/2002-03; CEP: 68/02. In addition, the Johns Hopkins IRB also reviewed this study as protocol JHM-IRB-3: 03-08-27-01.
The inclusion criterion for this functional genomic study was: All the subjects enrolled had to have at least three medical visits within the first two weeks of study enrollment. The dengue patients had to have confirmed acute dengue 3 infection based on RT-PCR/virus isolation and serology, be febrile at the time of the first hospital visit (temp. above 38.5uC) and with more than 10610 6 PBMCs available for microarray analysis collected at the first visit. Moreover, for DHF group, the samples must be collected prior the onset of circulatory disturbances (hematocrit and levels of serum albumin normal) and no signs of bleeding (tourniquet test negative). Samples had to have clear definition of the clinical outcome of either DF or DHF. A non-dengue group of patients (ND) consisting of individuals with febrile infection of unknown etiology with negative tests results for dengue by RT-PCR, virus isolation and serology after at least 3 samples collected within the first two weeks after enrollment. This group includes suspected dengue cases collected during the same period as the dengue cases, but for which dengue infection was not confirmed through either RT-PCR/virus isolation or serology in at least three blood samples collected at different days. The samples from DF, DHF and ND patients were matched to avoid spurious associations with patient age, gender, dengue infection history and days of symptoms between the groups.
The functional genomic studies were performed on total RNA extracted from PBMC purified from blood samples collected from febrile patients at the time of their first medical visit. The samples selected for this study were collected from 18 confirmed dengue 3, genotype III cases and 8 control samples (ND group). None of the DHF patients presented vasculopathy signs and symptoms at the time the samples used in the functional genomic characterization were collected. At the time of collection the patients referred approximately 5 days of disease and the absence of fever was reported two to three days after enrollment. Among the dengue confirmed cases, 8 patients were characterized as DF and 10 patients were classified as DHF (Table 1).

Sample Processing for Genechip Hybridization
Blood samples from patients enrolled in this study were collected in heparin vacutainer tubes (BD Vacutainer) and within 2 hours from the collection, PBMC samples were separated by gradient density using Ficoll-Paque (Amersham Biosciences) and cryopreserved in 10% (v/v) Dimethyl sulfoxide (DMSO; Sigma-Aldrich) in inactivated fetal bovine sera (FBS; Hyclone). Four million frozen cells were thawed and immediately lysed with Trizol (Invitrogen) for total RNA extraction through chloroform extraction and isopropyl alcohol precipitation following manufacturer's protocol [gene expression using either fresh or frozen PBMC were compared and shown to be similar, (data not shown)]. The total RNA was purified by using the RNeasy MinElute Cleanup Kit (Qiagen) according to the manufacturer protocol and quantified by spectrophotometry at 260 nm and 280 nm (UV spectrum). Total RNA was used for cRNA synthesis through twocycle target labeling kit (Affymetrix) according to the manual manufacturer. Briefly, RNA isolated from the PBMC was biotinlabeled and hybridized to human oligonucleotide microarrays (Affymetrix) by using a two-cycle methods of cDNA synthesis as follow. On the first cycle, first-strand cDNA was prepared by using a T7-(dT) primer and Superscript II reverse transcriptase (Invitrogen) from 10 to 100 ng of cellular RNA. Second strand synthesis was performed by using E. coli DNA polymerase I and the double-stranded cDNA was used for in vitro transcription (IVT) for cRNA amplification by using Megascript T7 kit (Ambion). The product of this first cycle of reaction (cRNA) was used for reverse transcription for synthesis of first and second

Microarray Data Quality Control
The quality of the microarray data was assessed using several criteria: visual inspection, noise/efficiency measurements, spike-in controls, housekeeping gene expression, and RNA degradation plots. Visual inspection based on the high-resolution. DAT files did not reveal smudges or streaks, and the B2-oligo probes (e.g., chessboard patterns around edges and name of array) were also visible. Noise/Efficiency measurements made by the Affymetrix MAS 5.0 software that can be used to evaluate the quality of the arrays are displayed in the Supplement material S1. Noise Q factors, background, scaling factors, and the percentages of present calls were similar across all arrays. Average background values were below 100 for 20 of the 26 microarrays; scaling factors were within three folds of each other, for 25 of the 26 microarrays; and the percentages of present calls were around 40% or higher for 25 of the 26 arrays. None of the six arrays that presented high background had a rate of present calls significantly below 40%. All 39-probe sets and middle probe sets for all 4 spike-in poly-A control genes (dap, thr, phe, lys) were called present in all arrays, as were all probe sets for the four prokaryotic hybridization control genes (bioB, bioC, bioD, cre) and the human housekeeping genes GAPDH (human glyceraldehyde-3-phosphate dehydrogenase) and Beta-Actin. Furthermore, RNA degradation plots showing the average intensity of probes indicate an acceptable levels in all probe sets displayed (see in Supplement material S2).

Quantitative Real Time PCR for Microarray Validation
Four DF/DHF differentially expressed genes (MT2A, PSMB9, C3aR1 and HLA-F) were selected for quantification by quantitative real time PCR (qPCR). Genes were amplified and detected using TaqManH gene expression assays with primers and probes commercially designed (Applied Biosystems). RNA extraction was performed according to the manufacturer's manual for the RNeasy Kit (Qiagen). Total RNA was reverse transcribed to cDNA using SuperScript III First-strand Synthesis System for qPCR (Invitrogen) using random hexamer primers according to the manufacturer's instructions. Quantitative real time PCR was carried out on the ABI PRISM 7500 device (Applied Biosystems). Beta-Actin was selected as endogenous control and all data were normalized against it. The reactions were performed in triplicates and included 2 ml of cDNA, primers (20 mM each) and 6.25 mM of the specific probe or commercially pre-designed Gene Expression Assay Mix (Applied Biosystems), Human Beta-Actin (Applied Biosystems), TaqMan Universal PCR Master Mix (Applied Biosystems) and water added to a final volume of 25 ml. Triplicates of non-template controls (NTC) were included each time qPCR was undertaken. Cycle conditions were as follows: after an initial 2-min hold at 50uC and 10 minutes at 95uC, the samples were cycled 40 times at 95uC for 15 sec and 60uC for 1 min. Baseline and threshold for cycle threshold (Ct) calculation were set manually with Sequence Detection Software version 1.4. The efficiency of amplification (E) of a target molecule was calculated from the slope of the standard curve (plot of Ct versus the negative log10 concentration of the target) derived from the slopes (E = 10ˆ(21/Slope)21). For relative calculation the Delta Ct method was used [ABI PRISMH 7000 Sequence Detection System and Applied Biosystems 7500 Real-Time PCR System -User Bulletin, Applied Biosystems] once all assays met the amplification efficiency criteria of 100%610% [Application Note 127AP05-02]. Samples used in the qPCR assays are described on Table 1  (samples of ND patients and DHF patients number 105 and 112 were not used).

Statistical Analysis
Patient data quality-control, statistical analysis, and plotting were carried out using Affymetrix MAS 5.0 software [32] and the open source R statistical package, version 2.5.0 [33], and add-on libraries, in particular the BioConductor library, version 1.16 [34]. Dendrograms and MDS plots were produced with the R functions ''hclust'' and ''isoMDS'', respectively, whereas heatmaps were obtained with the functions ''hset.emap'' and ''heatmap''. All microarray data is MIAME compliant and the raw data has been deposited in a MIAME compliant database as accession number # GSE18090 and it is available at http://www.ncbi.nlm.nih.gov/geo/query/acc. cgi?token = lpofdqcuugmwsng&acc = GSE18090. P-values corresponding to two-tailed Welch's two-sample t-tests were obtained with the function ''t.test''. Functional category overrepresentation analysis was performed with the performed with the EASE program (Expression Analysis Systematic Explorer) at the DAVID Bioinformatics resource website (http://david.abcc.ncifcrf.gov/) [35]. The Linear Discriminant Analysis classification method implemented by directly estimating the sample means and covariance matrices for each diagnostic class [36]. Classification accuracy was estimated by the method of bolstered resubstitution [37], which displays good properties for gene-set selection in small-sample situations [38,39]

Detection and Variance Filters
After careful quality control analyses of each genechip, Affymetrix MAS 5.0 software was used to analyze the data. From 54,675 gene transcripts on each of the 26 arrays, 12,299 were called present (p-value ,0.04) on all arrays, whereas 20,365 were not called present on any of the arrays. In order to retain only promising genes and enable significant statistical analysis, we chose to analyze the genes that were called present on at least 24 of the 26 arrays used in this study. This filter resulted in a total of 15,848 genes and among those, 3,549 genes (15,848212,299 = 3,549) that were called absent on one or two arrays. Subsequently we applied a variance filter to eliminate constitutive or housekeeping genes, which retained the top 1/8 gene in variance out of 15,848 previous genes, which resulted in 1981 ''best'' genes. We can see that the samples fell into two major groups; the one on the right contains only ND samples, and half (4) of the DF samples, while the group on the left contains all the DHF samples, a couple of ND samples (including one outlier, ND_199), and the other half of the DF samples. This agrees with intuition, since the two most different groups should be ND and DHF, with the DF samples forming an intermediary group. This is confirmed by the 2-D and 3-D multidimensional scaling (MDS) plots for the 1981 selected genes, displayed in Figures 2A and 2B. The dissimilarity measure used was 12r, where r is Pearson correlation of log-transformed values. Arrays were colored according to diagnostic class. It can be seen that the DHF samples constitute a tighter cluster than the ND samples, as expected since ND samples were obtained from patient with fever of unknown etiology, whereas the DF samples could be divided in two groups, one similar to the DHF samples, and another not similar to DHF, and in fact closer in expression to some of the ND samples (these are the two groups marked in Figure 2A). The small stress values (13.52% in the 2-D case and 7.32% in the 3-D case), particularly in the 3D case, mean that the underlying structure of the data is intrinsically a low-dimensional one, which indicates that a small number of signature genes might be able to account for the discriminatory content in the data.

Differential Expression Analysis
Statistical tests of differences between means (Welch two-sample t-tests) were performed to show the most differentially expressed mRNAs among the diagnostic groups. We considered four comparisons among groups: DF vs. DHF samples; Dengue (DF+DHF) vs. ND samples; DF vs. ND samples; DHF vs. ND samples. The first comparison corresponds to the 18 samples that come from dengue patients and is the most important comparison for our purposes, as it discriminates between the severe hemorrhagic form of dengue and the benign one. Important genes used to characterize these two clinical outcomes were gathered on Table 2. The ''volcano plot'' on Figure 3 and the correspondent gene list on Table 3, show that p-values were well correlated to fold changes in DF vs. DHF samples as well as for the other comparisons (Supplement material S3). The largest differences were observed in the comparison of DHF vs. ND, as expected, followed by differences in (DHF+DF) vs. ND and DF vs. ND. The critical differences in gene expression between DF and DHF are the least pronounced. The top 40 genes with the most significant p-values among the 1981 genes previously selected, for each of the four comparisons, along with fold change values, average signal intensity, and annotation from the DAVID Bioinformatics Database at NIAID/NIH (http://david.abcc. ncifcrf.gov/) can be accessed in the Supplement material S4. Figure 4 shows the heatmap expression for the 40 top genes that discriminate the DF and DHF samples, according to the Welch ttest criterion, as well as the dendrogram of hierarch clustering and MDS plot for the 40 top DF-DHF discriminatory genes. Table 4 displays the results of functional overabundance analysis of the list of top 40 DF-DHF discriminatory genes performed with the EASE program (Expression Analysis Systematic Explorer) at the DAVID Bioinformatics resource website. The EASE analysis indicated the enrichment of certain categories of genes. Five categories presented significant scores (p,0.05) after Bonferroni correction for multiple tests, including the ones involved on immune and defense responses, response to biotic stimulus, copper/cadmium binding and copper ion homeostasis. The EASE results for all 4 comparisons can be accessed in the Supplement material S5.

Identification of Classifier Genes
In addition to univariate gene selection by t-tests, we did exhaustive feature selection (all possible combinations) of single, pairs, and triplets of genes out of the prefiltered set of 1981 genes, using Linear Discrimination Analysis as the classification rule, and bolstered resubstitution as the error estimator (see Methods section). Table 5   pairs (the error margins above refer to a 95% confidence interval). Feature selection with two genes and three genes has the potential of ''fetching'' genes that cannot otherwise be found by using univariate methods (such as t-tests). This can be seen from the list of top two-gene classifiers. For example, we can see the gene for the HLA-F (which is lower expressed in DHF than in DF, data not shown). Figure 5 displays the plot of the best 2-gene classifier found by exhaustive feature selection, consisting of the pair of genes PSMB9 and MT2A. The estimated probability of error on future data for this classifier, as determined by bolstered     resubstitution, is only about 5.38%. In this case, lower expression of both genes is a signature for DHF, whereas higher expression of both genes is a signature for DF.

Validation of Microarray Data Using Quantitative Real Time PCR
Quantitative real time PCR assays were performed in order to validate the results seen on the microarray assay. The following genes that were selected: PSMB9, MT2A, HLA-F and C3aR1, displayed expression levels that were predominantly increased in DF samples compared to DHF samples. Quantitative PCR of the genes cited above was performed using eight DHF and eight DF samples obtained from the same patients tested in microarray experiments. According to the Figure 6A, the qPCR quantifications showed a very good correlation with the microarray data.
In addition, qPCR was performed in a separate set of eight DHF and eight DF independent samples collected from a different set of patients from the same cohort. The qPCR results indicated that the level of expression of the genes selected was expressed at lower levels in DHF than in DF patients, in agreement with the results obtained with the 2D LDA model based on the microarray data shown in Figure 5. Hence, each of the four genes measured were expressed at lower level in DHF and they classified correctly the samples analyzed ( Figure 6B). These results are very promising. We are confident that they can be the basis for a successful development of a reliable classifier to predict the clinical outcome of dengue infection.

Discussion
A functional genomic study was performed in order to obtain insights about the early immune mechanisms associated with dengue severity and to identify biomarkers to predict the infection outcome. Initial analysis resulted in the selection of 1981 candidate genes to be statistically evaluated. The analysis of degree of similarity between the samples in a 2 and 3-dimensional spaces has indicated that the DHF group formed a very tight pattern, whereas the remaining groups were more dispersed in the plot. In the 3-dimensional MDS plot the non-dengue samples (green spheres) are grouped close together, with only one sample outlier. The DHF samples (red spheres) are all at the far right side, while the DF samples (blues spheres) are more spread apart. These observations suggest the presence of a specific pattern of gene regulation against a dengue virus infection when compared to non-specific febrile disease. Welch's two-sample t-tests were used for comparisons among the diagnostic groups: Non-Dengue vs. Dengue (DF+DHF) samples; DF vs. DHF samples and so on Table 3. List of genes shown on the ''Volcano'' plot according to the fold-change.  S100A9 S100A9 S100A9 TNFSF12-TNFSF13  TNFSF12-TNFSF13  TNFSF12-TNFSF13   TNFSF13  TNFSF13  TNFSF13   TNFRSF17  TNFRSF17  TNFRSF17   TYROBP  TYROBP   (Supplement material S4) but we will focus the discussion on the comparison between DHF versus DF. The top 200 most differentially expressed genes between DF and DHF cases according to Welch's t-test was contrasted with the list of the top 200 genes according to largest fold change of average expression mRNA levels, resulting in 73 genes differentially expressed among the dengue clinical manifestations that were common to both lists. The list including all the 73 genes was further analyzed using the EASE algorithm in order to identify which categories of genes are overrepresented in this group of genes. The statistically significant overrepresented functional categories included the groups involved in immune and defense responses ( Table 2). Among the nineteen genes included in the immune response category were the HLA-DPab genes, complement factor D, CX3CR, MyD88, TNFSF17 (BCMA), TNFSF13 (APRIL); and among the genes included in the defense response were Mixovirus resistant gene (Mx), 29, 59-oligoadenylate synthetase (OAS1 and OAS2) and interferon response factors (Supplement material S7), which were all less expressed in DHF. While the immune and defense response genes were expressed at lower levels in DHF, several genes associated with apoptosis responses (PDCD4, PACAP, Tumor protein D52, MAGED1, proapoptotic caspase adaptor protein and TNF ligand super family) were up regulated. Interestingly, CD53, a tetraspanin produced by monocytes and B cells that prevents cells from undergoing apoptosis [40] was down regulated in DHF patients, reinforcing the indication of a pro-apoptotic environment in DHF.
It is not unrealistic to expect that some of these genes could be mechanistically involved in the DHF immunopathogenesis or might be used as the basis for the prediction of dengue infection severity. Using this technology, a gene expression pattern was identified in patients suffering the most severe forms of the disease. Simmons et all [27] have shown a molecular signature on PBMCs discriminating early and late phases of DSS and they reported that genes transcripts of IFN-stimulated genes were less abundant in DSS patients than in patients without DSS, whereas the genes involved on apoptosis were up-regulated in the DSS patients. Some of the genes differentially expressed that were found by Simmons et al [27], such as MX, IFIT, pro-apoptotic caspase adaptor protein and OAS, were also found in this study. However, they were not among the most significant differentially expressed genes according to p-value in our study, perhaps because of differences on the stage and severity of the disease. In this study DHF patients were grades I or II and were compared to DF,  whereas in the Simmons study the patients were DHF/DSS, grade IV and were compared with DHF patients without DSS. In a separate study, Ubol et al [28] using samples from a cohort of children from Thailand obtained similar results associating decreased innate immune response and increased apoptosis with development of DHF. However, the individual genes identified were quite different from the ones found in this and other studies.
One possible reason might be the age group used, since its known that immune response repertoire differs during early ages, where innate response predominates, as compared to adults [41]. In another interesting study, Kruif et al (2008) reported a general association of dengue severity and up regulation of genes involved on innate immune response during acute phase of infection in children [42]. However, in addition to the age group bias present Figure 6. Expression levels of genes discriminating DF and DHF patients. A-For all tested genes, qPCR assays were performed using a mix of eight DF or eight DHF samples used in microarray assays. B-For all tested genes, qPCR assays were performed for a set of eight DF and eight DHF samples used in microarray assays (white solid columns), or a set of eight DF and eight DHF independent samples (grey solid columns). The experiment was performed twice and each group was analyzed in triplicates. doi:10.1371/journal.pone.0007892.g006 in the Kruif study, the authors used RNA extracted from total leucocytes, which is composed predominantly by granulocytes, and this most have contributed for the difference in findings. This result suggests that granulocytes may be up-regulating the expression of innate immune response genes whereas the monocytic cells are suppressing it, but more detailed studies need to be done simultaneously on specific cell populations of the same patient. In summary, despite differences between the study designs and differences in the genetic background of our populations our results are consistent with the similar studies reported by Simmons, Ubol and Kruif [27,28].
As reviewed by Clementini and Gianantonio [43], there is much evidence that genetic factors, involved on susceptibility/resistance to infectious disease, influence the immune response in humans. These factors are complex traits modulated by environmental factors, such as previous dengue infection, and do not follow Mendelian inheritance pattern. The differential expression of some of these markers are possibly due to genetic polymorphisms that can interfere with mRNA expression levels, either directly by sitting at the promoter or indirectly by interfering on the pathway that modulates the transcription of those genes. Host polymorphisms present in genes involved in dengue immune responses have been correlated with altered gene expression and susceptibility to DHF [44]. Among these genes are: the TNF-308 allele, which is associated with increased levels of TNFa, is correlated with a greater susceptibility to developing DHF [45,46]; the wildtype AA MBL2 genotype, which is correlated with increased levels of MBL and increased risk factors for the development of denguerelated thrombocytopenia [47] and the polymorphism of the CD209 promoter [48], which is associated with a decreased expression of DC-SIGN and possibly with a lower susceptibility of dendritic cells to be infected by the dengue virus. Thus, searching for gene expression alteration among different dengue clinical manifestation using microarray technology can suggest in a high throughput fashion, genetic factors and immunopathology mechanisms involved on dengue severity.
Moreover, during the acute phase of infection, patients suffering from DF or DHF have a very similar clinical picture. However, the disease defervescence period (after 4 to 7 days of the beginning of the symptoms) is accompanied by severity-varying circulatory disturbance signs [2]. Thus, it seems that the events preceding the defervescence may determine the outcome of the disease severity and a question of interest is the selection of a small set of the best DHF-prognosis gene markers. The MDS plot (Figure 2) including the top 40 most discriminatory genes according to p-value shows that expression patterns of DF and DHF patients are quite different and appear to be distributed into three groups; one DHF (red) group very distinct from the DF (blue) cases, and a third group, which DF and DHF are closer. This result is not surprising and it suggests that DF and DHF are extremes of a continuum spectrum of the same disease, as suggested by Sierra et al (2007), and not two separate diseases [49]. In addition, this result suggests the potential of designing a reliable classification marker based on the quantification of few gene products by qPCR or any other method to quantify RNA or protein products. Towards this goal, we selected the best multivariate sets of candidate genes for prognosis, by means of exhaustive feature selection (all possible combinations) of single, pairs and triplets of genes out of the prefiltered set of 1981 genes, using Linear Discrimination Analysis as the classification rule, and bolstered resubstitution as the error estimator (see the Methods section). According to our results, the top 3-gene classifiers displayed an estimated rate of more than 95% chance of correct classification. We selected a few genes (PSMB9, MT2A, HLA-F and C3aR1) to test by qPCR assays.
Initially, the qPCR assays were performed in the same blood samples used for the microarray study. All gene expression levels determined by qPCR were consistent with the results obtained with the microarray. Subsequently, qPCR quantification was performed in eight DF and eight DHF samples collected from an independent set of patients ( Figure 6). The qPCR quantification showed that the genes (PSMB9, MT2A, HLA-F and C3aR1) were expressed at higher levels in DF than in DHF and confirmed the expression levels seen on the first set patient samples used in the microarray study and were in agreement with the 2D LDA model shown in Figure 4. Hence, each of the four genes measured were expressed at lower level in DHF and they classified correctly the samples analyzed ( Figure 5B). Thus, the qPCR assay results confirmed that quantification of those genes in samples collected on the first medical visit of a dengue infected patient can be used to predict whether the individual will develop DHF symptoms two or three days later.
Our data indicates that the classification of patients into DF and DHF on the basis of gene profiling is feasible and may be a useful means of guiding clinical management of dengue patients. Further analyses using additional independent samples are underway to confirm the value of these classifiers. However, some points need to be addressed on future studies, including the validation of the gene markers identified here on samples collected from people infected with other dengue serotypes for ultimately support the development of a qPCR-based kit to predict the clinical outcome of people infected with any of the dengue serotypes during the first days of the symptoms. Besides the patient-management benefits, this study can also help on the characterization of natural dengue infection and hopefully will facilitate the elucidation of the molecular mechanisms involved in DHF.

Supporting Information
Material S1 Noise and efficiency measurements. Arrows indicate the only array, in the non-dengue group,which showed a reduced signal/noise ratio and percentage of present calls.