Transcriptomic Profile of Whole Blood Cells from Elderly Subjects Fed Probiotic Bacteria Lactobacillus rhamnosus GG ATCC 53103 (LGG) in a Phase I Open Label Study

We examined gene expression of whole blood cells (WBC) from 11 healthy elderly volunteers participating on a Phase I open label study before and after oral treatment with Lactobacillus rhamnosus GG-ATCC 53103 (LGG)) using RNA-sequencing (RNA-Seq). Elderly patients (65–80 yrs) completed a clinical assessment for health status and had blood drawn for cellular RNA extraction at study admission (Baseline), after 28 days of daily LGG treatment (Day 28) and at the end of the study (Day 56) after LGG treatment had been suspended for 28 days. Treatment compliance was verified by measuring LGG-DNA copy levels detected in host fecal samples. Normalized gene expression levels in WBC RNA were analyzed using a paired design built within three analysis platforms (edgeR, DESeq2 and TSPM) commonly used for gene count data analysis. From the 25,990 transcripts detected, 95 differentially expressed genes (DEGs) were detected in common by all analysis platforms with a nominal significant difference in gene expression at Day 28 following LGG treatment (FDR<0.1; 77 decreased and 18 increased). With a more stringent significance threshold (FDR<0.05), only two genes (FCER2 and LY86), were down-regulated more than 1.5 fold and met the criteria for differential expression across two analysis platforms. The remaining 93 genes were only detected at this threshold level with DESeq2 platform. Data analysis for biological interpretation of DEGs with an absolute fold change of 1.5 revealed down-regulation of overlapping genes involved with Cellular movement, Cell to cell signaling interactions, Immune cell trafficking and Inflammatory response. These data provide evidence for LGG-induced transcriptional modulation in healthy elderly volunteers because pre-treatment transcription levels were restored at 28 days after LGG treatment was stopped. To gain insight into the signaling pathways affected in response to LGG treatment, DEG were mapped using biological pathways and genomic data mining packages to indicate significant biological relevance. Trial Registration: ClinicalTrials.gov NCT01274598


Introduction
Lactobacillus rhamnosus GG (LGG) isolated from human intestine is a well characterized strain shown to have antimicrobial effects against enteric bacterial pathogens and rotavirus [1] respiratory viruses such as respiratory syncytial virus (RSV) [2], rhinovirus infections [3] and influenza [4,5,6].Immune modulating mechanisms attributed to probiotic bacteria like LGG have been based principally on in vitro cell culture models [4,7], some recently summarized in vivo models [1,8] and limited controlled intervention studies in humans [9].However, there has been no convincing clinical demonstration of LGG-induced immune modulation in human patients given prolonged probiotic consumption [1].
Current evidence indicates that Lactobacillus rhamnosus (L.rhamnosus) can ameliorate intestinal injury and inflammation caused by various stimuli.L. rhamnosus species can specifically exert protective activity against lipopolysaccharide (LPS) induced inflammatory damage in animal models [10,11] or cells lines by blocking TNFα-and LPS-induced IL-8 activation [12,13].It has also been reported that probiotic derived factors can reverse pathogen-induced inflammation.
LGG modulates LPS-induced inflammation by decreasing the activation of proinflammatory transcription factor NF-Kb and IL-6 secretion, while inducing the anti-inflammatory cytokine IL-10 [10].
As one of the most experimentally and commercially used probiotics, LGG, was originally isolated from human intestine and has been extensively characterized [14].L. rhamnosus is among the largest of the lactic acid bacteria that has the ability to persist in human intestinal mucosa displaying functional pili and producing bacteriocins [9].The health benefits of LGG have been demonstrated in human feeding studies with normal populations or subjects suffering from gastrointestinal disorders and allergies [9,15].
Research using in vitro and in vivo animal models have been used to characterize the mechanisms employed by LGG to modulate epithelial barrier function [16], stimulate specific immune cell function [8], and utilize bacteria-host crosstalk to displace pathogenic bacteria from intestinal compartments [17].However, no study has comprehensively evaluated the effect of continuous LGG consumption on changes in human whole blood cell transcriptome as an indicator of safety and immune modulating activity.The primary aim of this Phase I open label study was to provide information on adverse events that may occur in healthy elderly volunteers receiving LGG administered twice a day for 28 days [18].The secondary aim as described in this manuscript was to evaluate potential mechanisms of action of LGG in the healthy elderly by studying their immunologic responses to consumption of LGG for 28 days.

Ethics Statement
This study was approved by the Partners Institutional Review Board (IRB 2010P001695) and was registered at ClinicalTrials.gov(NCT01274598).An Independent Data Safety Monitoring Board reviewed the protocol prior to initiation and throughout study.In addition, the study was monitored by the Center for Biologics Evaluation and Research (CBER) from FDA under IND 14377 and the National Institutes of Health (NIH) Office of Clinical and Regulatory Affairs (OCRA) and National Center for Complementary and Integrative Health (NCCIH).The protocol for this trial and supporting CONSORT checklist are available as supporting information S1 Fig and S1 Table .All data is available for public access through the database of Genotypes and Phenotypes (dbGaP) (www.ncbi.nlm.nih.gov/gap)accession phs000928.v1.p1.

Study design
This is a phase I, open label clinical trial that evaluated the effect of Lactobacillus rhamnosus GG (LGG), ATCC 53103 on the whole blood transcriptome of elderly subjects.Subjects of 65-80 years of age were recruited from the greater Boston Area using email and hard copy advertisements sent to subjects registered in the Massachusetts General Hospital (MGH) database according to IRB approved protocol (S1 Fig) between December 1, 2010 and August 5, 2011 as previously described [18].Interested subjects were asked to call the study telephone number, were informed about the study and pre-screened via questionnaire regarding their general good health, whether they consumed yogurt or probiotic on a daily basis, if they were interested in participating in the study and their availability for the required follow-up period.Those interested were scheduled for a screening visit at MGH's Clinical Research Center (CRC) where subjects completed the consent process, signed the study consent form, gave permission to be tested for HIV, and were asked by study physicians to provide a detailed medical history including current use of medications (prescription and nonprescription), probiotic and dietary supplements.Laboratory tests included complete blood count (CBC), chemistry panel, liver function tests (LFTs), hepatitis B surface antigen, hepatitis C and HIV antibody tests and urine toxicology.At the end of the screening visit, subjects were provided information on foods and probiotic products they should avoid in order to maintain eligibility in the trial.Subjects were contacted by telephone about their eligibility after the lab test results were available, except for those testing positive for HIV, who were asked to return for a follow-up visit at which time the subject was informed of the result, counseled, and referred for further evaluation.Fifteen eligible subjects attended a start up visit where final eligibility criteria were checked and information on the study design, schedule and patient routines and responsibilities were explained prior to the first oral administration of a dose of 1 x 10 10 colony forming units of LGG per capsule twice daily (1 capsule AM and PM for 28 days) (Fig 1).The LGG capsules were provided by Amerifit Brands Inc., Cromwell, Connecticut and were tested for no evidence of bacteria other than LGG [18].The first dose was administered under observation at the CRC.Subjects were evaluated during the study at Day 0 (baseline), Day 28 (+/-2 days), and Day 56 (+/-1 week), as well as via telephone calls on Days 3 (+/-1 day),7 (+/-2 days), 14 (+/-2days) to record any possible adverse events to the treatment.Compliance with LGG consumption was calculated as the percentage of pills dispensed that were not returned on day 28 [18].Compliance was also estimated based on relative abundance of LGG DNA copies detected in fecal samples of patients throughout the study.

Clinical sample collection and handling
Venous blood samples were drawn from non-fasted participant (n = 15) at CRC on day 0 (baseline), day 28, and day 56.At each time, blood was collected directly into PAXgene Blood RNA tubes (Preanalytix, Qiagen BD, Valencia, CA) to stabilize blood RNA.After a four hour stabilization period at room temperature, PAXgene tubes with collected blood were frozen at -80°C until further processing.Fecal samples were collected by participants in sterile plastic containers that rested in an H frame that fit into the toilet seat.Subjects were asked to collect samples within 24 hrs of their visits at days 0, 28, and 56 and to place the plastic container with the sample into a styrofoam container surrounded by four ice packs to cool and maintain the specimen at 4°C.Upon arrival, study staff immediately processed the fecal samples into one gram aliquots that were snap frozen at -80°C until further processing.Once all clinical sample collection was completed samples were shipped on dry ice to the USDA/ARS, Beltsville Human Nutrition Research Center, Diet, Genomics and Immunology Laboratory, in Beltsville MD for nucleic acid isolation and processing.

Isolation of RNA from whole blood samples
RNA was isolated from whole blood using the PAXgene Blood RNA kit from PreAnalytiX [19].Paxgene tubes were thawed at room temperature for at least three hours.After tubes were centrifuged for 15 min at 4,000 x g the supernatant was discarded and 4 mL of RNAse-free water was added to lyse cells in the pellet.After further centrifugation, pellet matter was treated with different buffers, purified and subjected to on-column DNAse I treatment according to the manufacturer's instructions.Integrity and quantity of purified RNA was determined via the Experion Automated Electrophoresis Station (Hercules, CA).RNA quality was reported as a score from 1-10 referred to as the RNA Quality Indicator (RQI).RNA samples falling below an RQI threshold of 8.0 were omitted from the study.

Globin depletion
Following isolation, total RNA samples were depleted of globin mRNA using the GLOBINclear Human Kit as recommended by the manufacturer's protocol (Ambion, Austin TX) [20].One microgram of purified RNA was mixed with biotinylated -Globin Capture Oligonucleotides and incubated for 15 min to allow for hybridization.Streptavidin magnetic beads were then used to capture and remove globin mRNA via a magnetic separation.Globin-depleted mRNA was further purified with additional washes using a rapid magnetic bead-based purification method.Quantity and quality of globin-depleted RNA was re-determined using the Experion platform.

TruSeq Library Prep and Sequencing
The Illumina TruSeq RNA Sample Prep v2 kit (Ilumina, San Diego, USA) was used to prepare the RNA samples for sequencing.Due to limited quantities of high quality RNA available for sequencing, a trial was performed to determine and confirm the minimum quantity of RNA that could be used as input for the TruSeq protocol.RNA inputs of 100, 250, 500 and 1000 ng originating from a single participant were sequenced and gene counts were analyzed for statistical similarity using a matched pair analysis.Conversion of RNA to sequencing libraries involved purifying poly-A containing mRNAs using magnetic beads, fragmenting the molecules, and converting them into cDNA.The cDNA was then subject to end repair, 3' end adenylation, ligation of Illumina indexing adapters, and PCR enrichment.Libraries were validated for average fragment size and quantified on the Experion Automated Electrophoresis Station using DNA 1K chips.Three libraries were prepared from each subject from samples collected before treatment (Day 0), twenty eight days into daily probiotic consumption (day 28) and after probiotic consumption had been suspended for 28 days (Day 56).Libraries were brought to equimolar concentrations (3-5pM) for cluster generation on Illumina's cBot prior to being run on the Hi-Seq 2000 sequencer (Illumina,San Diego, CA) for 100 cycles in single-read format.

Sequence Trimming and Alignment
FASTQ files generated from sequencing were imported into CLC Bio's Genomics Workbench (v6.5,Aarhus,Denmark).Sequences below a length of 80bp and below a PHRED quality score of 30 were trimmed to ensure 99.9% base call accuracy.Sequences were then aligned to the human reference genome (GRCh37.64)via CLC's RNA-Seq module with a maximum number of two mismatches, minimum length fraction of 0.95, and a minimum similarity fraction of 0.95, so that at least 95% of bases would map with 95% similarity (http://www.ensembl.org/Homosapiens/Info/Index). Mapped reads for each sample were summarized into gene level expression counts that were used as input for gene expression analysis.

RNA-Seq Data analysis
Determination of differentially expressed genes (DEG) required an analytical approach tailored to RNA-Seq datasets.For this study we used three statistical tools including Bioconductor packages: edgeR [21], DESeq2 [22], and TSPM.The first two are based on negative binomial generalized linear models (glm) but differ in their normalization and filtering procedures [23].The third method is based on a two-stage Poisson model (TSPM) [24] that analyzes over-dispersed genes separately from genes that did not exhibit variation significantly greater than the mean (i.e.Poisson distribution).Gene counts representing unique exon reads were chosen for analysis.The time effect was tested using likelihood-ratio statistics to compare data from days 28 and 56 against day 0. By using subject as a blocking variable the time effect was assessed for each patient separately ensuring that baseline differences between subjects were subtracted out.Output from statistical packages included log-fold change (log 2 ), log counts per million (or mean by time point), the likelihood ratio statistic (for GLM-based analyses), p-values and FDR-adjusted p-values.Differential expression was determined by fitting a glm using the Cox-Reid profile-adjusted likelihood method for estimating dispersions followed by the likelihood ratio test.P values were corrected using the Benjamini-Hochberg false discovery rate adjustment [25].In addition, the probability of any specific gene being a false discovery (q-value) was also calculated with the TSPM method [26].DEGs generated from each analysis were compared and used to determine which common genes were differentially expressed.A difference in gene expression was considered significant if the adjusted FDR p-value was < 0.1.
Quality of reads was also checked using a quality control pipeline SolexaQA [27] where nucleotides of each read were scanned for low quality and trimmed.Processed reads were then mapped to the human reference genome using TopHat 2 [28].SAM output files from TopHat alignment, along with the GTF file from ENSEMBL human genebuild v69.0, were analyzed using Cuffdiff-Cufflink (v1.3.0) to test for differential expression.Mapped reads were normalized based on upper quartile normalization method (-N/-upper-quartile-norm). Cuffdiff models the variance in fragment counts across replicates using the negative binomial distribution as described [29].

Gene Enrichment
Interpretation of high-throughput gene expression data is facilitated by the consideration of prior biological knowledge [30,31,32,33].Biological network analysis was performed using Ingenuity Pathway Analysis (IPA) (v 9.0,Ingenuity Systems, Mountain View, CA, USA) to predict potential biological processes, pathways and molecules affected by DEGs.This web-based tool facilitated the association of changes in gene expression with related biological pathways based on a gene's functional annotation and known molecular interactions.Focus genes were overlaid onto a global molecular network developed from information contained in the IPA Knowledge Base (KB), a large structured collection of observations in various experimental contexts with nearly 5 million findings manually curated and updated from the biomedical literature.The reference network contains ~40,000 nodes that represent mammalian genes and their products, chemical compounds, microRNA molecules and biological functions.Nodes are connected by ~1480000 edges representing experimentally observed cause-effect relationships that relate to expression, transcription, citation, molecular modification, and transport as well as binding events [34].Networks of these focus genes are algorithmically generated based on their connectivity and number of focus genes.The more focus genes involved, the more likely the association is not due to random chance.In order to identify the networks that are highly expressed, IPA computes a score according to the fit of the genes in the data set.This score is generated using a p-value calculation determined by a right-tailed Fisher's exact test, and it is displayed as the negative log of that p-value.This score indicates the likelihood that the fit of the focus genes in the network could be explained by chance alone.A score of 2 indicates that there is a 10 −2 chance that the focus genes are grouped together in a network by chance.A high number of focus genes within a dataset leads to a higher network score.To identify molecules upstream of the affected genes in the dataset, that potentially explained the observed expression changes, the 'Upstream Regulator Analysis' (URA) tool within IPA was used.This tool predicted upstream regulators and inferred their activation state by calculating a Z-score to assess the match of observed and predicted up/down regulation patterns.Z-score is particularly suited for pathway analysis since it serves as both a significance measure and a predictor of the activation state of the regulator: activated (Z value >2) or inhibited (Z value <2) [34].The Downstream Effects Analysis (DEA) was applied and used the methodology of URA for the inference and impact on biological functions and diseases that are down-stream of the genes with altered expression.The goal was to identify those biological processes and functions that were likely to be casually affected by up-and down-regulated genes of our dataset.Graphical presentation of gene-gene interactions and de-regulated genes for enriched pathways are visualized in networks that contain up to 35 genes with an associated score derived from a p-value, indicating the expected likelihood of the genes being present in a network compared to that expected by chance.
To further interpret the biological meaning of DEGs induced in whole blood after Lactobacillus rhamnosus consumption for 28 days, we compared the overlap between our gene dataset and Hallmark gene sets from the Molecular Signature Database (MSigDB) [35] so common processes, pathways and underlying biological themes could be identified.The gene sets in the collection that best overlap with the query genes were supported by an FDR adjusted p-value generated from the hypergeometric distribution for the number of genes in the intersection of the query set with a set from MSigDB [35].To link transcriptome changes induced by probiotic treatment with corresponding patterns produced by human cells in response to biologically active compounds a cross-database analysis using Connectivity-map, C-MAP (build02, http:// www.broad.mit.edu/cmap/) was done.The C-MAP is a collection of over 7,000 genome-wide transcriptional expression profiles from cultured human cells treated with over 1300 bioactive small molecules and simple pattern-matching algorithms that together enable the discovery of functional connections between drugs, genes and diseases through the transitory feature of common gene-expression changes [36].

Fecal DNA RT-PCR analysis
DNA from stool samples provided by participants on days 0, 28, and 56 was isolated using the QIAamp DNA Stool mini-kit (Qiagen, Valencia, CA) [37].Briefly, 250 mg of a homogenized one-gram fecal sample was weighed and immediately re-suspended with lysis buffer.After heating the suspension at 95°C to increase DNA yield, removal of inhibitors, and proteinase K digestion was done before DNA was bound to a column, washed, and eluted in TE buffer.DNA concentration was determined by the NanoDrop method (Thermo Fisher Scientific, CA).Briefly, 40ng of fecal DNA per sample was used as a template for real time PCR amplification using primers and probes that differentially amplify variable regions within the 16S ribosomal DNA specific for total bacteria [38], Bifidobacterium species [39], and Lactobacillus species from the casei [40] and non-casei subgroups [39].Similarly, relative quantification of LGG abundance was done using a set of primers and probe designed to amplify a highly conserved and ubiquitous tuf-gene expressed as a single copy and universally distributed in Lactobacillus species [41,42] and used to determine bacterial abundance marker within other probiotic species [37].The C T values that were generated expressing the target gene's copy quantity were converted to number of gene copies using standard curves constructed by serially diluting purified fragments of each bacterial gene target.The size of the fragment was verified and molarity was determined by DNA 1K chip using the Experion Automated Electrophoresis System (Biorad, Hercules, CA).A linear relationship was established between the C T value and number of target gene copies ranging between10 1 to 10 10 copies/mL and this relationship was subsequently used to estimate values of log 10 target gene copy numbers in fecal samples [43].All molecular assays were performed on the 7500-Real time PCR System(Perkin Elmer) using a 25 μL PCR amplification mixture containing 1X Thermo-start QPCR master mix with ROX (Abgene, Rochester, NY), forward, reverse, probe and an equivalent of 20 ng of DNA per reaction.The amplification conditions were 50°C for 2 min, 95°C for 10 min, and 40 cycles at 95°C for 15 seconds, and 60°C for 1 min.Mean copy number (expressed as log 10 target gene copies per gram of feces) was calculated and compared among treatment groups.A one-way repeated measures analysis of variance (ANOVA) model was fit to analyze different bacterial species expressed as copies per gram of feces (cpg) using SAS v9.3 PROC GLIMMIX to specify a lognormal distribution and heterogeneous compound symmetric covariance structure to model correlations among days measured on the same subject and to obtain pair-wise means comparisons among days.Statistical significance among days was reported when p<0.05.

LGG treatment compliance and clinical signs
Compliance with LGG based on day 28 (range day 24-day 32) capsule count was 100% in 11 (73%) subjects; between 90-99% in 2 (13%) subjects and 84% in 1 (7%) subjects.Compliance for the final subject could not be estimated because the subject did not return her capsules [18].LGG treatment compliance was also verified by monitoring changes in Lactobacillus rhamnosus abundance in patient fecal samples.A species specific real time PCR assay against a 106 base pair (bp) fragment of the tuf gene was designed for identification of Lactobacillus rhamnosus species after alignment and comparison with closely related Lactobacillus species using the Clustal alignment program [44] (S2 Fig) .Forward and reverse primer and probe reagents for LGG detection were tested for specificity using DNA from bacterial reference strains as templates for real time PCR analysis and construction of standard curves as previously described [37].After 28 days of LGG treatment, there was greater than a three hundred fold increase in LGG copies per gram (cpg) in feces collected (42.05 x 10 5 ±8.18) when compared to baseline (0.12x 10 5 ±0.08) levels or a seven hundred fold increase when compared to day 56 (0.06 x10 5 ±0.04) levels (P<0.05).Significant differences in LGG copies were not detected between baseline and Day 56.Relative abundance of Lactobacillus species from the casei group were also significantly increased at day 28 (12.87x 10 5 ±2.24) when compared to baseline (0.98x 10 5 ±0.27) or Day 56 (1.58x 10 5 ±0.46)(P<0.05).No other differences were detected in total bacterial counts (Eubacteria), or in Bifidobacterium species or Lactobacillus species from non-casei group (Table 1).Distribution of blood cell differential data and complete plasma chemistry panels for each participant at baseline (Day 0), day 28 and day 56 were within normal range.No outliers or abnormal patterns were observed at baseline or during LGG feeding (D28 and D56) [18].

Whole blood RNA analysis
Individual gene levels expressed as reads per kilo base per million (RPKM) were compared in a preliminary test among RNA input levels of 100, 250, 500 and 1000ng from a single patient.A matched paired analysis was performed between different RNA input levels and only at 100ng were the count data statistically different from the other input levels (p<0.001).RPKM values were shown to be statistically similar between the 250, 500 and 1000 ng levels, suggesting that a minimum input of 250ng RNA could be used with as much confidence as at the level of 1000 ng (S3 Fig) .Based on available RNA quantities, an input of 500 ng was chosen for library preparation and sequencing, if participants had the complete three time point set of high quality RNA (RQI > 8.0) samples.From the fifteen study participants, three samples (401-57 from day 28, 402-28 from day 56 and 409-45 from day 0) were discarded due to low quality, one due to low RNA yield (406-76 day 28) and an additional fourth subject (430-82) was not included in the sequencing analysis due to lack of clinical compliance (S2 Table ).Therefore, thirty-two high quality RNA samples from 11 participants were used for the final sequencing analysis (10 participants x 3 time points/subject, 1 participant X 2 time points/subject).Sample randomization of all RNA samples consisted of including an equal number of different time points on each flow cell so as not to repeat the same subject on one flow cell.A mean average of 127.8 ± SD 55.7 million reads per sample was generated.Alignment results showed an average of 76.2±SD 33.7 million unique exon reads from each sample mapped to the human genome similarly to what has been described in other experiments with human blood samples [45] (S3 Table ).Reads that uniquely mapped to the reference genome were summarized into gene level expression counts before statistical analysis on platforms edgeR, DESeq2 and TSPM, for the detection of differentially expressed genes.

Differential Expression of Genes (DEG)
Our study design had two experimental factors: Subjects (11 levels) and time (three levels per subject).The study was analyzed using a paired sample model in which subjects were used as the blocking factor.Our main goal was to identify genes that were differentially expressed between baseline (day 0) and day 28 after probiotic consumption and between base line and day 56 when probiotic consumption had been suspended for 28 days to see any possible residual probiotic effect.Differential expression analysis was performed on 25,990 annotated genes using the edge-R and DESeq2 Bioconductor packages, the two stage-Poisson model (TSPM), R Script and Cuffdiff analysis tool from Cufflinks.Volcano plots illustrate the general gene expression pattern detected by edgeR, DESeq2 and TSPM using a threshold log fold change of 0.6 (absolute fold change 1.5), with an adjusted FDR p-value<0.05or <0.1 to capture highly abundant marginal changes in gene expression depending on the analysis platform used (Fig 2).All platforms normalized the count data for library size and removed genes with zero counts across all samples.For edgeR, count data from each gene was run unfiltered (n = 25,990 genes) and also with an inclusion filter of at least 0.1 counts per million (cpm) (n = 13,891 genes), representing a minimum gene count of at least 3 (depending on the library size) in all samples (S4 Table ) as suggested in other studies in order to improve statistical power by decreasing the number of multiple comparisons to adjust for and to reduce the possible bias of very small counts with no biological significance [20,46,47,48].EdgeR-generated DEG using non-filtered data (DEG = 2, FDR p-value<0.1),and with 0.1cpm inclusion filter in all samples (DEG = 139, FDR p-value<0.1)indicated that the gene encoding the low affinity receptor for Fc fragment of Immunoglobulin E (IgE), FCER2, was the top common DEG detected in edgeR analyses platforms with a significant 1.7 fold decrease in expression at day 28 (FDR p-value<0.05)(Table 2).Lymphocyte antigen 86 gene, LY86, was also down-regulated at day 28 in edge-R analyses with a lower FDR p-value = 0.05 only in 0.1 cpm filtered dataset.An additional group of 137 DEG (111 down, 26 up) with an adjusted FDR p-value<0.1 were only detected in filtered edgeR-dataset (Table 2).DEG were not detected in either edge-R analyses between day 56 and day 0 after LGG consumption had ceased for 28 days (data not shown  2).Cuffdiff differential expression analysis also detected similar fold changes as DESeq2 for common DEG; however none reached statistical significance (data not shown).
To relate gene expression changes to previously described functional profiles, DEG were also overlapped with 50 richly annotated gene sets from the MSigDB database (http://www.broadinstitute.org/gsea/msigdb/index.jsp) which are used as hallmark gene sets that summarize and represent specific well defined biological states or processes [35].Our dataset presented a significant overlap with 16 down-regulated genes encoding proteins involved in oxidative phosphorylation, 7 genes encoding proteins in response to IL-2 and 5 genes coding for proteins in response to IFNg stimulation (S5 Fig) .In addition, genes typically upregulated in adipogenesis and transplant rejection were also down-regulated in our dataset, indicating that dietary consumption with Lactobacillus rhamnosus is predicted to induce a down regulation of genes involved in response to these biological processes.To find correlations between our intervention with L. rhamnosus and its similarity at the transcriptional level to response profiles associated with pharmaceutical and other biologically active compounds, the Connectivity map (C-MAP) database was also used [36].C-MAP results showed that the in vivo transcriptome obtained after a 28-day LGG intervention shared a large similarity to the transcriptome obtained after exposing human cell lines to compounds with anti-neoplastic effects (i.e.MG-132, demecolcine, decitabine, tyrphostin), anti-inflammatory action (proteasome inhibitors MG-132 and MG-262, 1-5-isoquinolinediol) for management of hypertension (sulmazole, chlortalidone),vomit inducers (i.e.emetine, cephaeline) or compounds that control apoptosis (H-7 and other topoisomerase inhibitors) (Table 4).

Discussion
This study provides the first transcriptomic sequencing effort to determine gene expression changes in human WBC from healthy elderly individuals after daily consumption of probiotic Lactobacillus rhamnossus GG-ATCC53103 (LGG).Bioinformatics analysis identified a discrete set of LGG-induced DEG in WBC of elderly patients consuming LGG that returned to baseline levels after 28 additional days without LGG consumption.Monitoring the presence of LGGderived DNA in the feces as a measure of compliance confirmed a significant increase of LGG following 28 days of consumption and a return to baseline levels after consumption was discontinued.These data suggest a LGG-dependent modulation of the WBC transcriptome in healthy elderly humans.Lactobacillus species have been extensively studied for their immune modulating activities [1,8].Different studies have shown variable effects on immunity and inflammation using a variety of Lactobacillus rhamnosus strains which has made a generalized interpretation of results difficult [2,4,6,65].L. rhamnosus bacterial cells and components have been shown to interact with a wide variety of host cells present in blood and intestinal tract such as epithelial and dendritic cells, macrophages and neutrophils [10,11,66,67] resulting in the secretion of pro-and anti-inflammatory cytokines.The response of explanted human peripheral blood mononuclear cells from normal or probiotic fed humans to bacterial products and immune simulators in vitro [68,69,70], or studies using animal models [2,71,72] has suggested some regulatory function activated by Lactobacillus species for modulating immunity and inflammation.However, a more robust transcriptomic evaluation of WBC from humans consuming probiotics for a prolonged time has not been previously completed.Thus, it was the aim of this study to identify DEG in human WBC from an open label Phase I study of elderly subjects participating in daily LGG consumption for a period of 28 days followed by a period equally as long without the probiotic consumption.
Increasing sequencing depth and ever-expanding coverage of next generation sequencing technology has made RNA-Seq an attractive approach for the identification of DEG in response to several different stimuli [73,74].Molecular profiling of circulating blood cells has been associated with physiological, toxicological and pathological events originating from different tissues and organs in the body making it a rich source for potential biomarker identification [33,75,76,77,78,79] for the evaluation of treatment responses [45,80,81].Our study consisted of whole blood RNA samples averaging 70M reads.This degree of depth is well beyond previous recommendations of 20M reads for detection of differentially expressed genes in a species with fully annotated genome [82].The number of biological replicates used in this study per time point (n = 11) is considered to be relatively high for achieving a statistically powerful analysis when compared to the minimum of 3-6 replicates recommended for minimal inference [23,83].Overall, more power is gained by increasing the number of biological replicates relative to technical replication and sequencing depth due to the improved estimation of sample variance [23].Appropriate handling of RNA-Seq data is essential to account for the presence of systematic variation between samples as well as differences in library composition.There is no general consensus on which method performs best when analyzing data from human WBC generated by RNA-Seq.Selecting an optimal analysis method was a challenging task as this field of research is actively growing and ongoing efforts to assess and cross-validate the different available analysis methods are being made [23,84,85,86,87,88].
We opted to use a multiple platform approach that incorporated four of the most popular statistical methods, a practice that has been recommended in several recent RNA-Seq studies to control for false discoveries [88].A comprehensive evaluation of these packages along with a handful of other studies that have analyzed DEG in PBMC of healthy [89] or sick subjects [45], and from isolated human B-cell subsets [90], neutrophils [91] human derived cell lines [92] or human skin biopsies [33] indicate that DESeq2 and edgeR are both well equipped to account for differences in library size and composition; features that are typical of RNA-Seq data [84].
It has been suggested that high variability between biological replicates (over-dispersion) necessitates the use of a distribution model that incorporates mean and dispersion parameters to better model the mean-variance relationship such as the negative binomial model [93], that is implemented in DESeq2 and edgeR [22].Our data is in agreement with prior observations that show edge-R performing better when analyzing data with larger fold changes.The low expressing genes (<3 counts) that were designated as differentially expressed by edge-R, but exhibited large fold changes (>1.5 in 15 genes) likely do not have a biological significance due to their very low counts.DESeq2 treated these genes as outliers and omitted them from the analysis (Table 2).Alternatively, the TSPM package, which operates on a per gene basis and the Cufflinks module "Cuffdiff" that uses RPKM (Reads per Kilobase per million base reads) [94]transformation, partially coincided with edge-R and DESeq2 but fold changes were considerably lower or no statistical inferences could be made, likely due to differences in how these methods account for biological variability [93] Thus, only DEG data produced by edgeR and DESeq2 was further used for data mining and elucidation of affected biological pathways.RNA-seq derived expression patterns have previously shown to provide considerable high sensitivity and accuracy and to be consistent with gene detection by quantitative PCR (QPCR) as the gold standard method for validation of changes in gene expression [48,78].In our study, QPCR of DEG identified by RNA-seq analysis was not performed as sufficient RNA from all subjects was not available after globin depletion.However, the relatively modest changes found in gene expression were provided with biological context after they were related to functional changes that reflect which cellular pathways and processes were modulated by transcriptional networks and if these changes have any clinical or pharmaceutical relevance.DEG data was used to 1) reconstruct pathways and regulatory networks using Ingenuity pathway analysis (IPA); 2) compute overlaps with hallmark gene sets that represent specific well defined biological processes in the Molecular signature database (MSigDB) and 3) find functional connections among drug, genes and diseases using the Connectivity Map (C-MAP).Comparison of gene counts revealed distinct gene expression profiles only when day 28 samples were compared against day 0. No changes were detected when day 56 was compared to day 0 or day 28.From the 25,990 genes detected by RNA-Seq, a small subset was differentially expressed in response to LGG treatment: 0.5% (DEG = 139) and 3.8% (DEG = 987) by edge-R and DESeq2 respectively ranging from log 2 fold change of 0.5 to 1.8 (absolute fold change 1.4-3.5)(FDR<0.1)(Table 2 and S5 Table).When we compared DEG lists generated by all platforms, the top down-regulated DEG were FCER2, encoding the low affinity receptor for immunoglobulin E (IgE) [49] and LY86, that encodes a glycoprotein physically associated with RP105 (a toll like family protein) to form a RP105/MD-1 complex expressed in immune cells that has most recently been involved in the patho-physiological regulation of the innate immune system and inflammation [95].Interestingly, consumption of LGG has been associated with reduced allergic symptoms in a randomized placebo control trial of atopic eczema in neonatal and infants [15,96] possibly by the induction of regulatory cytokines [97].LGG has also been shown to decrease synthesis of OVA specific IgE and IgG2a levels with induction of regulatory T-cells and suppression of OVA induced airway hyper responsiveness in a murine model [65,98].Possible mechanisms of action that have been proposed include a suppression of the Th2 response in respiratory organs mediated by probiotic induced T-regulatory cells or dendritic cells [65,99].Based on our findings, the possibility that LGG ameliorates the allergic hypersensitivity response through the down regulation of FCER2 receptor should be considered as an alternate mechanism to explore.
An additional common pool of 93 DEG (75 down-and 18 up-regulated) identified by all platforms included several transcriptional regulators, lectins, ribosomal proteins, and receptors among several molecules with similar fold changes but different statistical significance (Table 2).Data mining of DEG by prediction of functional responses based on known molecular interactions previously published was used to understand the biological impact of LGGinduced DEG [30,33,81].Downstream transcriptomic analysis identified myeloid cell activation, and cell chemotaxis as the prominent processes predicted to be inhibited by LGG treatment.Data mining with IPA incorporated expression of downstream target genes from experimental data and compiled knowledge or reported relationships between regulators and their known targets to infer the underlying causes of their observed transcriptional changes and likely outcomes [34].There was consensus among the different analysis platforms on the significantly activated networks that were identified (Table 3).The genetic network with the highest score (46), identified as Cell to cell signaling Interaction and Inflammatory response contained a series of down-regulated genes encoding transmembrane receptors-TNFRSF17 and OLR1, extracellular enzymes LTF and ELAINE, lectins CLEC11A, and binding proteins: S100A8 and S100A12 that have been associated with induction of transcription factor NF-Kap-paB and additional down regulated protein coding genes RNASE1, PF4 and CAMP known to have a direct effect on the expression of monocyte chemoattractant, CCL2 (Fig 3).Additional biological processes identified by downstream effect analysis included the decreased activation and chemotaxis of myeloid cells including phagocytes and neutrophils and a decrease in many genes coding for pro-inflammatory chemokines: CXC-motif ligand 3(CXCL3), pro-platelet basic protein CXC-motif ligand-7 (PPBP), chemokine C-C motif ligand 2 (CCL2), platelet factor 4 (PF4); antimicrobial peptides: defensin alpha 1 (DEFA1), azurocidin 1(AZU1), cathelicidin antimicrobial peptide (CAMP), cathepsin G (CTSG); S-100 calcium binding proteins: (S1000 A12 and S100A8), and lectin galactoside-binding soluble 1 (LGALS1) involved in chemotaxis and activation of myeloid cells (S7 Table ).The most inhibited upstream regulators of inflammation (negative Z-score) were the transmembrane receptor, CD40 and pro-inflammatory cytokine TNFa, known to be associated with the initiation of inflammation.The miRNA-146a-5p microRNA, an important negative regulator of inflammation, was also predicted to be increased (positive Z-score) [100].Further comparison of our transcriptomic data with existing annotated gene sets from the MSigDB database also supported a down regulation of genes involved in processes like oxidative phosphorylation and response to pro-inflammatory IL-2 and IFN-g cytokine stimulation, indicating that LGG is capable of affecting genes associated with the establishment of the inflammatory response albeit a low level of induction.
Our study identified a discrete set of DEG with small changes in the WBC transcriptome of elderly subjects between the ages of 65 to 80 years consuming a daily ration of LGG for 28 days.
Our analysis was based on the RNA extracted from WBC.This approach included cell analysis of neutrophils that seem to be population particularly responsive to LGG.The anti-inflammatory activity of LGG on myeloid cells has been shown by inhibition of both PMA and Staphylococcus aureus induced formation of neutrophil extracellular traps (NETs), production of reactive oxygen species and phagocytic capacity of neutrophils while protecting against cell cytotoxicity [67].However, these results raise intriguing questions regarding the immune modulating effects of LGG in subjects facing an infection where an inflammatory response is required.Here, we predicted functional responses based on known molecular interactions previously published, in a group of healthy elderly patients with no associated clinical effects during the intervention period [18].The relatively modest changes in gene expression and the absence of any significant changes in clinical parameters [18] indicate that LGG is a safe product when used under the conditions delivered.
To further evaluate the biological impact of host trancriptome changes induced by 28-days of LGG consumption, we compare our in vivo transcriptomic changes with existing data in the Connectivity map pipeline that describes cell transcriptional responses of human cell lines to bioactive molecules that play a role in disease prevention or host immune stimulation.Interestingly, our results indicated that LGG consumption induced transcriptomic changes in WBC that mimic the response induced by proteasome inhibitors [101] which anti-inflammatory effect have been attributed mainly to attenuated activation of pro-inflammatory Nuclear Factor Kappa-light-chain enhancer of activated B cells (NF-κβ), a transcription factor that positively regulate many genes that encode pro-inflammatory cytokines [102].A high connectivity score was also found with compounds with anti-neoplasic effects and compounds that are effective against amoebal infection and control apoptosis as previously described for human intestinal mucosa responses after short term exposure to Lactobacillus rhamnosus [8].

Conclusions
The analysis of WBC may provide a more robust and comprehensive approach for detecting changes in the transcriptome of circulating inflammatory and immune cells that are also representative of other tissues sites in the body.The current study indicated that whole genome expression analysis can be used to identify important pathways, functions and networks in response to probiotic consumption in humans.Although the modulation of the WBC transcriptome by LGG was modest, the data suggested than an anti-inflammatory effect of LGG could be induced by daily probiotic consumption over a period of four weeks.The changes in gene expression and subsequent analysis of functionally related pathways indicated activation of molecular circuits that could modulate host inflammation.However, such predictions will need to be validated in future studies involving placebo-fed control groups, with consumption of LGG in the presence of a provocation such as an infection, and with the inclusion of other subject populations.

Fig 3 .
Fig 3.  Ingenuity top gene network interaction reflecting immune response-related transcriptome changes after consumption of LGG.Nodes in the interaction network are encoded by differentially expressed genes detected by edge-R in blood from subjects consuming LGG for 28 days, up-regulated genes are depicted in shades of green and down-regulated genes are in shades of red.Transcriptional information derived from IPA knowledge database on interactions between the nodes (activation, expression, molecular cleavage or phosphorylation) was projected onto the interaction map with predicted downregulation effects represented with blue dashed lines and upregulation effects with orange lines.From this interaction map it can be seen that several downstream genes including growth factors, peptidases, Gcoupled receptors and cytokines that are known to be regulated by NF-KB transcription factor are downregulated.

Fig 4 .
Fig 4. Downstream effect analysis (DEA) on whole blood cells of subjects consuming LGG for 28 days.(A).The visualization is a hierarchical heatmap generated from edgeR analysis with filtered data where the major boxes represent a family (or category) of related functions.Each individual colored rectangle is a particular biological function or disease and the color indicates its predicted state: Increasing (orange), or decreasing (blue).Darker colors indicate higher absolute Z-scores.In this view the size of the rectangle is correlated with increasing overlap significance (p-value).The image has been cropped for better readability.(B) Heat-map comparison of Diseases and Biofunctions affected across all 4 analysis (edgeR 0.1 cpm/all, edgeR 0.1cpm/ 22, DESEq2, TSPM).Similarly color represents predicted state.(C).Individual Z-scores and mean Z-scores per each Bio Function affected.The Z-score algorithm is designed to reduce the chance that random data will generate significant predictions.Negative Z-scores indicate a down-regulation of Biofunction, positives Z-scores indicate an up-regulation of function.Absolute Z-score values higher than 2.0 can be used to make biological predictions.doi:10.1371/journal.pone.0147426.g004

Table 1 .
Relative abundance of bacterial species in fecal samples after LGG treatment.
).The DESeq2 package detected a larger number of DEG (282 down-regulated, 51 up-regulated) changing by at least 1.2 fold with a FDR adjusted p-value<0.05,includingFCER2andLY86among the top four genes with an additional 654 DEG (412 down-regulated, 242 up-regulated) at a higher FDR adjusted p-value threshold of <0.1 (Fig2) (S5 Table).Similar to edgeR, no DEG were detected with DESeq2 analysis at day 56 when compared to baseline levels (data not shown).
Genes that met the count abundance criteria with mean counts of at least 1 in a minimum of 2 samples with non-zero counts (n = 19,575) were used for TSPM analysis.A total of 890 and 63 DEGs were identified with an over-dispersed and Poisson gene distribution, respectively.At day 28 -, 953 DEG (574 down-regulated, 379 up-regulated) with adjusted FDR p-value<0.1 were identified, only 29 with a FDR-adjusted p-value <0.05 (S6 Table), however, most of the changes were less than the 0.6 log fold cutoff (Fig 2).At day 56, only a few DEG with Poisson distribution were detected (adjusted FDR p-value <0.1, log fold <0.6) (data not shown).When edgeR, DESeq2 and TSPM DEG lists were compared 95 common DEG (77 down-regulated, 18 up-regulated) (FDR p-value <0.1) were identified across all three analysis platforms (S4 Fig).Several DEG (n = 19) with very low cpm were detected by edgeR and TSPM but not by DESeq2 (i.e, RNASE1, SIGLEC11,C1orf132, ZNF593, SFTPD, CBLN3, SLC35E2,GLIS3, PXMP2, C10orf98, FUT10, COCH, ESM1, LYPD2, CLEC11A, LIPC, SYCE1L, LBRC24,PLEKHM3) (Table (46)highest IPA network score corresponded to edgeR results when the 22 sample filter was applied.Lower scores with less focus molecules were generated from TSPM results.Molecular networks with scores > 20 (p-value<1E -20), involving processes such as Cellular movement, Immune Cell Trafficking, Hematological system development and function, Cell to Cell Signaling and Interaction, and Inflammatory Response, were identified as the top common networks in response to LGG treatment (Table3).The molecular network with the highest score(46)related to Cell to Cell Signaling and Interaction and Inflammatory [49,50]e included the top down-regulated DEG identified across platforms, FCER2 (CD23) (FDR adjusted p<0.05) (Fig3) that encodes the low affinity transmembrane glycoprotein receptor that modulates IgE synthesis and homeostasis in B cells[49,50].Potential stimulatory signals for FCER2 expression from other molecules such as RNASE1 and human BCR complex

Table 2 .
Common whole blood DEG identified by different RNA-seq analysis platform in elderly subjects after a 28 day treatment with LGG.

Table 3 .
Predicted top molecular networks affected by LGG treatment after 28 day intervention.

Table 4 .
Connectivity-map analysis results for the interventions of healthy adults with Lactobacillus rhamnosus GG.