A Module of Human Peripheral Blood Mononuclear Cell Transcriptional Network Containing Primitive and Differentiation Markers Is Related to Specific Cardiovascular Health Variables

Peripheral blood mononuclear cells (PBMCs), including rare circulating stem and progenitor cells (CSPCs), have important yet poorly understood roles in the maintenance and repair of blood vessels and perfused organs. Our hypothesis was that the identities and functions of CSPCs in cardiovascular health could be ascertained by analyzing the patterns of their co-expressed markers in unselected PBMC samples. Because gene microarrays had failed to detect many stem cell-associated genes, we performed quantitative real-time PCR to measure the expression of 45 primitive and tissue differentiation markers in PBMCs from healthy and hypertensive human subjects. We compared these expression levels to the subjects' demographic and cardiovascular risk factors, including vascular stiffness. The tested marker genes were expressed in all of samples and organized in hierarchical transcriptional network modules, constructed by a bottom-up approach. An index of gene expression in one of these modules (metagene), defined as the average standardized relative copy numbers of 15 pluripotency and cardiovascular differentiation markers, was negatively correlated (all p<0.03) with age (R2 = −0.23), vascular stiffness (R2 = −0.24), and central aortic pressure (R2 = −0.19) and positively correlated with body mass index (R2 = 0.72, in women). The co-expression of three neovascular markers was validated at the single-cell level using mRNA in situ hybridization and immunocytochemistry. The overall gene expression in this cardiovascular module was reduced by 72±22% in the patients compared with controls. However, the compactness of both modules was increased in the patients' samples, which was reflected in reduced dispersion of their nodes' degrees of connectivity, suggesting a more primitive character of the patients' CSPCs. In conclusion, our results show that the relationship between CSPCs and vascular function is encoded in modules of the PBMCs transcriptional network. Furthermore, the coordinated gene expression in these modules can be linked to cardiovascular risk factors and subclinical cardiovascular disease; thus, this measure may be useful for their diagnosis and prognosis.


Introduction
Circulating stem/progenitor cells (CSPCs) contribute to the maintenance of the normal functions of blood vessels and tissues and their repair and regeneration [1]. These cells may also promote tumor growth by facilitating neovascularization or the development of tumor stroma [2]. CSPCs and other leukocytes mediate these actions through the release of paracrine factors [3] and occasionally by transdifferentiation [4]. The numbers and functions of CSPCs are impaired by exposure to cardiovascular risk factors, such as aging, diabetes, hyperlipidemia, or hypertension (for a review, see [5]). Moreover, the frequency of CSPCs was inversely related to subclinical vascular diseases, including endothelial dysfunction and arterial stiffness [6].
A major obstacle to progress in this field has been a lack of consensus regarding the precise molecular markers that define these regenerative pathways [7]. This problem is compounded by the limited accuracy and reproducibility of the current methods used to quantitate CSPCs, such as flow cytometry [8] and in vitro colony formation assays for 'early' [9] or 'late' [10] progenitor cells. Potential novel tools that may be used to address these issues include the emerging network sciences as applied to biology and medicine [11].
Transcription progressively enables primitive cells to acquire a differentiated phenotype [12], whereas the expression of primitive genes is indicative of cell stemness in both bone marrow and blood [13]. However, the reason for the presence of mRNA for tissuespecific differentiation genes in circulating cells is less clear. Because more than 80% of the genes expressed in human peripheral blood are also expressed in other body tissues [14], mRNA profiling of leukocytes has been proposed as an accessible window to the multi-organ transcriptome [15]. Additionally, the transcriptional landscape, including those of adult hematopoietic stem cells and adult leukocytes, is organized as a modular network of co-expressed genes [16]. Cardiovascular disease-associated transcriptomic signatures are known to exist in peripheral blood [17]; however, none has yet been found to specifically contain CSPC markers or be directly relevant to vascular function in healthy subjects.
Our hypothesis was that the origins of primitive and tissuespecific mRNAs in peripheral blood mononuclear cell (PBMC) samples would be primarily, although not exclusively, in CSPCs. If supported by data, then the coordinated expression of CSPCderived mRNAs should be detectable in peripheral blood transcriptional profiles and reflect the function of the corresponding tissues, similar to the actual tissue-specific CSPCs. Here, we developed and functionally validated such a method, which applies network science to transcriptomic analyses.
Because the high-throughput charting of a transcriptome either produces many irrelevant hits or is often too insensitive for specific targets [18], predesigned gene panels are increasingly used for the detection of gene expression signatures in tissues [19] and the assessment of pluripotency [20] or differentiation hierarchy in stem cells [21]. To detect rare transcripts, the most reliable technique to date remains quantitative real-time PCR (qRT-PCR), which is accurate, precise, more sensitive than microarrays, and more specific for mature transcripts than RNA sequencing [18]. qRT-PCR has been used to generate transcriptional networks from as few as 18 transcription factors [22] to as many as 280 of the 'most-used' hand-picked stem cell markers [21]. The transcriptional signatures of individual CSPC-associated markers have been previously detected using qRT-PCR in human and animal peripheral blood at baseline and after major cardiovascular and neurovascular events [23].
As reported here, microarrays (specifically, Affymetrix Gene-Chips) were unable to detect many of the stem cell-related genes in PBMCs samples, a finding that confirmed previous studies [21]. Thus, to create a sensitive and efficient assay, we designed a gene panel that included the most-recognized stem/progenitor cell and tissue differentiation markers (Table S1). These markers were simultaneously measured in PBMCs using qRT-PCR and then subjected to a network analysis and validated for vascular function.

Subjects
Institutional Review Board (IRB) approvals were obtained from The Ohio State University and Emory University, and all subjects signed written informed consent forms. We recruited 26 healthy volunteers at Ohio State University and 20 hypertensive subjects at Emory University (Table 1 and Table 2, respectively).

Blood pressure and pulse wave measurements
Brachial blood pressure (BP) and radial pulse wave values were measured in the healthy subjects using a manual oscillometric monitor with a standard adult cuff and the SphygmoCor device (AtCor Medical, Sydney, Australia), respectively. The latter provides a validated generalized transfer function to convert the peripheral radial arterial pulse wave into the equivalent central aortic arterial pulse wave, which was used to analyze the derived aortic pressure waveform and extract the aortic pulse pressure (AoPP) and the augmentation index (AIx) of the pulse, a surrogate measure of vascular stiffness [24].

Magnetic resonance elastography (MRE)
MRE is a magnetic resonance imaging (MRI)-based method that directly determines organ stiffness and has been validated in a study of hypertensive patients [25]. In aortic MRE, vibrations are applied to the abdomen by an electro-mechanical driver, which transmits mechanical waves to the aorta. A phase-contrast MRI sequence synchronized to externally applied vibrations measures the wave displacement field, which is mathematically converted to stiffness [26].

Isolation of PBMCs
Blood was collected into a BD Vacutainer K 2 EDTA (BD Bioscience, Franklin Lakes, NJ), diluted 1:1 with Hanks balanced salt solution (HBSS) (Invitrogen Life Technologies, Grand Island, NY), layered onto one volume of Lymphocyte Separation Medium (Cellgro Mediatech Inc., Manassas, VA), and centrifuged at 7006g for 30 min at room temperature. The mononuclear cells were collected, diluted 1:1 with washing buffer (PBS supplemented with 2 mM EDTA and 2% FBS) (Gemini Bio-Products, West Sacramento, CA), and centrifuged at 3006g for 7 min. The pellet was resuspended in 5 mL of 0.8% ammonium chloride solution (STEMCELL Technologies Inc., Vancouver, Canada) for 5 min to lyse any remaining erythrocytes. To remove as many platelets as possible, the cells were washed two more times as described above with two volumes of washing buffer.

RNA extraction and qRT-PCR
Total RNA was isolated using the RNeasy Mini Kit (Qiagen, Valencia, CA) according to the manufacturer's protocol, tested for quality, and stored at 280uC until use. The VILO kit (Life Technologies/Invitrogen, Grand Island, NY) was used to reverse transcribe 150 ng of total RNA. Primers (Qiagen) were diluted 1:20 with molecular-grade water, and 5 mL/well were added to 384-well plates using a Biomek FX Laboratory Automation Workstation (Beckman Coulter, Inc., Brea, CA). The plates were left to dry overnight in a sterile hood and stored covered at 220uC until use. qRT-PCR was performed using SYBR Green (Qiagen) and a 7900HT Real-Time PCR System (Life Technologies/ Applied Biosystems, Foster City, CA) operated in standard mode. All of the runs contained a dissociation step. The samples were amplified in duplicate in a total volume of 5 mL. The results are expressed as the relative copy number (RCN), defined as RCN = 2 2DCq 6100, where DCq is the difference Cq(target) -Cq(reference) [27]. As a reference for normalization, we used the median Cq values of three endogenous controls (beta-2 microglobulin, GAPDH and RPL13). Data were were uploaded to Gene Expression Omnibus (GSE56327).

Fluorescence RNA in situ hybridization (ISH)
Cells (up to 1610 6 ) from eight volunteers were fixed in suspension (4% formaldehyde in PBS for 1 h) and deposited onto Superfrost Plus microscope slides (Fisher Scientific, Pittsburg, PA) by cytospin centrifugation (1806g, 5 min) in a Cytospin 2 Shandon Centrifuge (Block Scientific, Inc., Bohemia, NY). The slides were dried at 37uC for 1 h and stored in 100% ethanol at 280uC until use. ISH was performed using the QuantiGene ViewRNA kit (Affymetrix/Panomics Solutions, Santa Clara, CA) according to the manufacturer's protocol with the following probe sets: FSHR-FITC, NES-Cy3, and KDR-Cy5, and the nuclei were counterstained with DAPI (Sigma-Aldrich, St. Louis, MO). In preliminary tests, we found that a 1:1000 dilution of proteinase K was optimal. The samples were analyzed with an Olympus FB1000 confocal microscope (Olympus America Inc., Melville, NY). The 40x objective and 3x zoom were used to acquire Zstacks from 5-8 fields per sample, for a total of 100-600 cells per subject. The acquisition settings were adjusted based on the negative control slides (no hybridization probe). The images were analyzed with CellProfiler 2.0 software [28] (http://www. cellprofiler.org/), as follows. (i) Using FluoView software (Olympus), all of the images for each Z-stack per each channel were exported as .tiff files. (ii) Using MetaMorph software (Molecular Devices, Sunnyvale, CA), all of the images from the Z-stacks were assembled into a separate .stk file for each channel. (iii) Using CellProfiler, we built four pipelines, one for each channel (blue: DAPI/nuclei, green: FSHR-FITC, red: NES-Cy3, and cyan: KDR-Cy5), which uploaded the respective stacks and generated a projection. Steps 1-3 were necessary because the CellProfiler software could only recognize MetaMorph stacks. (iv) Next, we built a pipeline for the analysis of these projections. An important step was the setting of thresholds. To detect nuclei ('primary objects'), we entered a valid range of diameters (in pixels), and the threshold was set such that .95% of nuclei (including clumped nuclei) were detected accurately. Cell boundaries ('secondary objects') were estimated by moving 10 pixels outward from the cell nucleus (Distance-N method). For RNA quantification, the thresholds for each channel were set individually based on negative controls. We measured spot numbers, spot intensities, and a variety of nuclear parameters (such as intensity, area, and texture) offered by the software. (v) Finally, the data were exported into Excel and pooled from all of the images/subjects. At this point, we manually removed all nuclear objects that were inaccurately detected such as clumps that could not be segmented into individual nuclei and small fragments that clearly corresponded to cellular debris based on size (the 'nuclear area' parameter). Finally, each cell (and its associated measurements) was given a unique ID. All of the subsequent statistical analyses and visualizations were performed using various statistical packages (vide infra).

Immunocytochemistry (ICC)
For ICC, PBMCs from four subjects were fixed in 1% formaldehyde in PBS for 1 h, cytospun as described, and used immediately. The cells were permeabilized with Triton X-100 (Fisher Biotech, Pittsburgh, PA) for 10 min at room temperature and blocked with 5% BSA + FcR Binding Inhibitor Purified (eBioscience, Inc., San Diego, CA) for 30 min at room temperature. The cells were incubated sequentially with the following antibodies: (i) rabbit polyclonal anti-FSHR antibody; (ii) Alexa Fluor-594 goat anti-rabbit antibody; and (iii) Alexa Fluor-488 monoclonal anti-human nestin antibody (Abcam, Cambridge, MA), along with Alexa Fluor-647 mouse anti-human CD309 (BD Pharmingen, San Jose, CA). DAPI was used as a nuclear stain, and the cells were mounted in Fluoromount-G (SouthernBiotech, Birmingham, Al). Alexa Fluor-488, Alexa Fluor-647 mouse IgG 1 isotype, and Alexa Fluor-594 goat anti-rabbit IgG (BD) were used as controls. The preparations were imaged with an Olympus FV 1000 confocal microscope using the 40x objective and 2x zoom, and negative controls were used for setting the lasers as described for ISH. We acquired 5-6 random images per slide, for a total of 300-700 cells per subject. In each fraction of cells (including most of those that were single-and double-positive for KDR and/or FSHR), NES and FSHR were localized to a few, usually 1-4, compact circular structures that were intimately associated with the nucleus or nuclear grooves (possibly representing spurious cross-reactions of our antibodies with either storage or degradation compartments). Therefore, we excluded these features from quantification. The images were analyzed with the CellProfiler software using a different pipeline. To adjust the threshold correction factors for each channel, we used the negative controls. The data were exported into Excel and further processed as described for ViewRNA, step 5. Both pipelines were uploaded on the CellProfiler website as VR-3Plex-FNK.cp and ICC-NFK.cp, respectively.

In silico gene expression analysis
Studies were selected from the National Center for Biotechnology Information's Gene Expression Omnibus (GEO) according to the following criteria: (i) they employed Affymetrix Human Genome U133 Plus 2.0 Arrays (platform GPL570); (ii) they used isolated PBMCs; and (iii) they included healthy subjects as controls. We chose eight studies (GSE: 8507, 10041, 11761, 14642, 19743, 21942, 27034, and 46480) and only included the GSM files that corresponded to controls in the analysis (a total of 274 GeneChips), and a study on isolated bone marrow-derived CD34 + cells (GSE 23025) [29]. All of the .cel files were imported into Expression Console (Affymetrix), and we used the MAS5 algorithm for normalization and signal and presence call detection. We then extracted the information pertaining to the genes of interest. For genes represented by several probe sets on the array, we calculated the median signal value for each probe set over all the arrays and retained the sets that had the highest value. Finally, the 'Present', 'Marginal', and 'Absent' calls were recorded for each gene, and the 'Presence Score' was calculated as a percentage of all 274 arrays. Because genes with 'Marginal' scores were found only on a small number of arrays, all of these were considered 'Absent'.

Pattern analysis and network visualization
To mine patterns from the gene co-expression matrix data, we followed the network mining and merging workflow described by Xiang et al. [30]. First, we converted a gene co-expression dataset into a unweighted graph by creating an edge between any two genes with an absolute correlation value greater than 0.7. After the graph was created, we applied the Bron-Kerbosch algorithm [31] to generate all of the maximal cliques. We then applied the  Fig. 2A). The data are expressed as the means 6 SD. Inset. Cq values for housekeeping genes used as endogenous controls. Of note, the large SD displayed by CXCR4 was due not to outliers but to the skewness of the data distribution. B. Actual RCN values of the 45 tested genes in 26 healthy subjects, indicating the coordinated expression of the majority of the genes (conventional color coding). doi:10.1371/journal.pone.0095124.g001 network merge approach [30] to these cliques under a density threshold 0.8, which guaranteed that each resulting sub-network induced a sub-matrix with an average correlation value greater than 0.8 on the original gene co-expression matrix. Finally, we visualized the discovered sub-networks using Gephi [32] (www. gephi.org).

Data analysis
ANOVA, t-test, Mann-Whitney test, and various correlation statistics (Pearson correlation, linear regression, principal component analysis, hierarchical clustering, Cronbach's alpha) were performed using JMP 10

Limitations of the microarrays in detecting mRNAs with low expression levels
We first attempted to determine the transcriptional signature of CSPCs in blood using the Affymetrix GeneChips, as expression levels of a panel of the most common stem/progenitor cell and differentiation marker genes (Table S1, with annotations). We analyzed a set of 274 microarrays compounded from public databases, representing 7 studies of PBMCs and one study of purified CD34 + stem cells, all from healthy human subjects. Gene detection, determined as a 'Present' call by the MAS5 algorithm, varied as follows. (i) Other than housekeeping genes (B2M, GAPDH, and RPL13), only leukocyte genes (CD14, CD79A, CD3E, ITGAM, PECAM1, NT5E/CD73, and PTPRC/CD45) and a few others (ACTA2, ALDH1A1, BGLAP/osteocalcin, CX3CR1, and CXCR4) had a 'Present' call on more than 90% of the arrays. (ii) Several primitive (GATA4, NANOG, NES, NKX2-5, POUF5F1/OCT4, and THY1) and differentiation (CAV3, CDH5, CNN1, FSHR, KRT14, NOS3, and TEK/TIE2) genes were completely undetected (i.e., 0% 'Present' call). (iii) The other tested genes had a variable representation on the arrays (Table  S2). Surprisingly, even the mRNA for CD34, the marker for which the cell suspension had been enriched using magnetic immuno- selection, was detected on only 80% of the arrays. These findings confirmed the low sensitivity and/or poor reliability of the GeneChip microarrays for the detection of rare transcripts, as previously noted [33].

Quantification by qRT-PCR and organization of a PBMC transcriptional sub-system
In contrast, all of the tested mRNAs were detected in PBMCs from all of the healthy adults using qRT-PCR ( Fig. 1 and Table   Figure 4. Modular organization of a PBMC gene sub-network. A. Network representation of the genetic covariation. The thickness of each connecting line is proportional to the absolute value of the respective Pearson's correlation coefficient. Genes that were significantly correlated with the age, AIx, AoPP and BMI of the subjects are encircled (cf. Table 3). Color coding identifies the participation of genes in separate underlying clusters [30]. B. Scaling of nodes' clustering coefficient C(k) with their connectivity degrees k, as a signature of hierarchical networks. Note that the data spontaneously split into two subpopulations, suggesting distinctly organized modules (for clarity, the leukocyte genes were omitted). Members of Module 1 (right), corresponding to the functionally filtered group in A (same color convention), had higher clustering values for similar k values than those in Module 2 (left), indicating stronger transcriptional coupling. C. Genes connected to the KDR node. Note that these connections perfectly overlap those of Module 1, while PROM1/CD133 serves as a link with Module 2, arguing that KDR is a hub node of Module 1. D. Connections of another hub node, NES. The images in A, C and D are based on Pearson correlation coefficients r.|0.8| and were obtained using the software Gephi 0.8 beta (www.gephi.org). The data shown in B were also obtained using Gephi, based on the network analysis in A. doi:10.1371/journal.pone.0095124.g004 S1). Genes with the smallest relative copy numbers (RCNs) represented the primitive and tissue-specific (including cardiovascular) genes, whereas the leukocyte-associated genes had much larger RCNs (Fig. 1A). We also observed a pattern suggestive of covariation between many of the mRNA markers (Fig. 1B). Regression analysis demonstrated that indeed the expression levels of cardiovascular genes were highly correlated with each other and with those of several primitive genes ( Fig. 2A and 2B, respectively). A strong covariation was demonstrated as well between several tissue-associated genes and a different set of primitive markers (Fig. 2C). In these comparisons, we also observed poor and/or even negative correlations, e.g., between primitive or cardiovascular genes and leukocyte markers, such as CD45 (Fig. 2D).
These results were centralized in a covariation matrix (heatmap) using the hematopoietic/endothelial progenitor marker KDR/VEGFR2 as a reference [1] (Fig. 3A). The associated statistics matrix, after adjustment for multiple comparisons, revealed that a large majority of the Pearson coefficients were also statistically significant (Fig. 3B). The majority of primitive genes co-segregated with KDR in a group that contained most of the cardiovascular genes (Fig. 3A, B). This information was retrieved using unsupervised hierarchical clustering, thus objectively confirming that the majority of the primitive and cardiovascular genes aggregated together (Fig. 3C, solid bracket), while the leukocyte genes were less organized, and the tissue differen-tiation markers formed a separate cluster (Fig. 3C, dashed bracket).

Network analysis of gene markers
Because the known CSPC types are characterized by multiple combinations of molecular markers [7], which were not well captured by dendrograms due to their linear character, we also used a network representation [34]. In this analysis, genes are considered nodes connected to other genes via links (or edges), based on a preset level of probabilistic significance (in our case, Pearson correlation coefficients larger than 0.8 at p,0.05). The number of edges between a node and the other genes to which it is connected is called its degree k, and the probability of its connections with all the other genes in the network is measured by a clustering coefficient C(k) [34]. This bottom-up analysis retrieved the grouping of marker genes shown in Fig. 3A as tightly interconnected gene communities [35] (Fig. 4A). To validate the modular properties of these communities, we displayed each node's clustering coefficient C(k) as dependent on its degree k. A negative relationship on a log-log scale between k and C(k) is considered the signature of hierarchical modularity in a network [34]. In this respect, our data objectively revealed the existence of two distinctly organized modules (Fig. 4B), each composed of the same assortment of primitive and differentiation markers as the clusters described in Fig. 3. The genes in Module 1 had higher Table 3. Module 1 gene correlations with physiologic parameters.
(3) Significant correlations with BMI were found only in females. (4) Pearson's correlation coefficient. (5) Genes that were significantly correlated with all four parameters are in bold. To further confirm the modular nature of this transcriptional network, some nodes (called hubs) were directly connected with all of the others in the corresponding module and with key nodes in the neighboring communities [34]. Based on this criterion, two nodes (KDR and NES, Fig. 4C, D) were connected with all of their neighbors, and both extended to PROM1/CD133 as a common contact node in Module 2, a property that prompted us to further investigate these nodes at the single-cell level (see below).

Cardiovascular functional validation of genes from Module 1
To determine the physiological significance of the genes contributing to these co-expression patterns, we studied their individual relationships with the following characteristics of the blood donors: (i) age; (ii) the augmentation index (AIx) of their pulse, a surrogate measure of vascular stiffness [36]; (iii) blood pressure parameters (measured using applanation tonometry [37]); and (iv) body mass index (BMI). Significant correlations (p,0.05) were observed between these physiological variables and a group of 15 genes (Table 3), all of which belonged to the same module (Module 1, Fig. 3C, bracket; Fig. 4A, encircled; and Fig. 4B).
These 15 genes were the following (as annotated in Table S1): (i) primitive: CD117/KIT, CD338/ABCG2, NANOG, NOTCH4, and POU5F1/OCT4; (ii) primitive/endothelial: CD34, KDR/ VEGFR2, NES, NOS3/eNOS, OLR1/LOX-1, and TEK/ TIE2, as well as ALPL (alkaline phosphatase, also mesenchymal), CNN1, and FSHR (neovascular); and (iii) primitive/cardiac: NKX2-5 and CAV3. The relationship between the Module 1 genes and BMI showed marked differences between the sexes; in females only, these genes were positively correlated with BMI. Additional Module 1 genes (and a few genes from Module 2) were significant in only two or three of the four tests (Table 2) and/or exhibited lower network connectivity at the interface of the two modules (Fig. 4B) and were thus not included in the definition of Module 1.

Personalized representation of gene analysis
We sought to consolidate the individual gene expression information into a parameter that would combine the contributions of all of the 15 Module 1 genes that exhibited significance in the above correlations with the physiological variables ( Table 2). For each healthy subject, we calculated a modular index (MI) (also known as a metagene) [17], the average of the standardized RCN values of the module's genes (Fig. 5). As expected from the dependence of individual genes on the respective physiological variables, the MI values of all the subjects exhibited negative correlations with age (Fig. 5A), vascular stiffness (Fig. 5B), and aortic pulse pressure (Fig. 5C), as well as a positive correlation with BMI in females (Fig. 5D).
Additionally, we investigated whether these 15 genes were sufficient for the ability of MI to correctly characterize the correlation between the metagene and the respective physiological variable (e.g., age, AIx, and blood pressure). Therefore, we calculated the Cronbach's alpha coefficient for internal consistency, defined as the correlation between different test items (in our case, the genes), that determines whether these items collectively represent the same general construct [38]. High Cronbach's alpha coefficients are generally desirable, but if too high (conventionally .0.95), they may indicate item redundancy [38]. For the genes contained in the metagene MI, the Cronbach's coefficient was indeed .0.95, indicating that this set of variables was not only consistent but also saturated.
For a more detailed representation of gene-specific information, we also generated radial diagrams that displayed the individual values of analyzed genes as the percentages vs. the median levels in  Fig. 2A. Note the pattern differences in Module 1 genes between females with normal BMIs (upper row) and those with higher BMIs (lower row). B. Association of aortic stiffness (left-side images) with radial gene profiles (right-side graphs) of two subjects: left, female, 56 years, average stiffness of 5.25 kPa; right: male, 52 years, average stiffness of 6.17 kPa. Aortic stiffness was measured by MRE, as described the Methods section; the local elasticity distribution is color coded, as shown in the scale at right (in kPa). doi:10.1371/journal.pone.0095124.g006 the population. For example, we compared the personalized diagrams of female blood donors with normal weights against those with BMIs in the overweight range (Fig. 6A). We noted that the sector of the diagram containing Module 1 genes was better represented in the latter group (in agreement with the positive correlation with BMI found for Module 1 genes in women, Fig. 5D), while the relative amplitude of Module 2 members decreased progressively.
The benefit of this representation was also illustrated by comparing gene expression patterns in subjects with various degrees of vascular stiffness, objectively determined by MRE. The results showed a reduction of gene expression in both modules with increasing aortic stiffness in these otherwise healthy subjects (Fig. 6B).

Modifications of PBMC gene expression in hypertensive patients
The genes from Module 1 had inverse correlations with several cardiovascular risk factors (age, blood pressure, and aortic stiffness), which suggested that even stronger alterations might be found in patients with established vascular pathologies. To test this hypothesis, we utilized a cohort of cardiovascular patients treated for hypertension in an outpatient cardiovascular clinic (Table 3). We found that the expression levels of the tested genes in PBMCs from these patients closely paralleled those in healthy subjects (Fig. 7A). However, with a few exceptions involving non-significant differences (Fig. 7A), the RCNs were reduced compared with the healthy population, an observation also captured using their metagene MI (Fig. 7B). Covariation heat-maps revealed obvious disruptions in the co-expression patterns of these genes in the patients compared with the healthy controls (Fig. 8A, B; compare with Fig. 3A, B). The hierarchical clustering was also sensitive to these rearrangements (Fig. 8C). The network representation showed that the connectivity of nodes within and between both modules was largely increased (Fig. 8D), as they collapsed in a common, highly coupled sub-network (Fig. 8E). Remarkably, the range of connectivity degree k values of these module's nodes in patients was reduced (23-26, Fig. 8E) compared with that in the healthy subjects (17-25, Fig. 4B), arguing for lower connection variability among node genes.
The radial diagrams indicated that in addition to the conspicuous reduction in expression levels in both modules (Fig. 9A), the patients treated with the diuretic thiazide displayed gene levels closer to normal than those taking other medications (Fig. 9B). The patients with hypertension exhibited unchanged levels of ALPL, a tissue-type alkaline phosphatase gene expressed by many non-vascular cells (including osteoblasts [39]) but also by vascular lineage CSPCs [40]. The following genes also exhibited module-independent variations among patients: (i) ALDH1A1, a marker of primitivity with numerous cardiovascular implications [41]; (ii) CXCR4, the receptor for SDF-1, which is essential for CSPC recruitment [42]; and (iii) ST3GAL2, a transcriptional marker of SSEA-4 [43], which has been associated with adult bone marrow-derived mesenchymal stem cells [44], and with the controversial [45] very small embryonic-like stem cells [46].

Cellular origins of gene co-expression
To identify the origins of gene expression we analyzed mRNA and protein expression at the single-cell level. We used cytospun PBMCs probed for the Module 1 hub nodes KDR and NES. To these, we added the follicle-stimulating hormone receptor (FSHR), a marker of neovascular endothelium [47], which was strongly correlated with the other two nodes ( Fig. 2A and Table S1). For mRNA detection, we performed in situ hybridization, followed by an automated image analysis. In all of the tested samples, we found multiple cells expressing these marker mRNAs in various proportions (Fig. 10A, B). To determine whether the Module 1 covariation pattern was derived from the coordinated presence in the circulation of cells separately expressing these genes, we performed two-by-two comparisons of the frequencies of singlepositive cells in the same individuals, which did not show a correlation (Fig. 10C-E, insets). Instead, in support of a genuine within-cell co-expression mechanism, we found a high proportionality of the signals from all three mRNAs only within triplepositive cells (Fig. 10C-E). Given that nuclear morphology is sensitive to a cell's transcriptional activity [48], we examined whether nuclear parameters obtained by image analysis could distinguish the cells co-expressing the Module 1-related mRNAs form the other PBMCs. Indeed, nuclear shape and texture analysis showed that the nuclei of cells triple-positive for KDR/NES/FSHR exhibited a distinct morphology (Fig. 10F).
Finally, we searched for the proteins encoded by the mRNAs of interest by performing immunocytochemistry on parallel slides and found cells expressing all possible combinations of these antigens (Fig. 11A). The nuclear morphology of the triple-positive cells revealed by immunocytochemistry was similar to that detected by in situ hybridization, i.e., a larger apparent nuclear area and less textured compared with the other cells (Fig. 11B), thus confirming that the two methods detected the same cellular populations. In support of this observation, the frequency distributions of the antigen expression classes (i.e., single, double, or triple expressers) were also similar to those detected using mRNA in situ hybridization in blood samples from the same individuals (Fig. 11C).

Discussion
Here, we report several novel observations derived from an analysis of transcriptional activity in PBMCs: (i) unlike GeneChips, which failed to reliably detect approximately half of the gene targets, qRT-PCR detected all 45 members of a panel of primitive and differentiation marker genes in all the tested human blood samples; (ii) based on their strong covariation, the target genes segregated into two major clusters, which exhibited the connectivity properties of modules in a bottom-up reconstituted hierarchical transcriptional network; (iii) one of modules contained most of the primitive and cardiovascular differentiation markers; (iv) this module also correlated with several cardiovascular risk factors in the healthy blood donors, contributing to a cardiovascular-specific metagene; (v) the origin of genetic covariation was  found within individual cells of a subpopulation with distinctive nuclear properties; (vi) the levels of gene expression in both modules were significantly reduced in hypertensive patients; and (vii) the connectivity within the PBMC gene modules was largely amplified in hypertensive patients, leading to the fusion of these modules into a common sub-network.
The properties of these gene modules were consistent with what would be expected if they derived from CSPCs; in particular, Module 1 could be the signature of circulating endothelial progenitor cells. The dependence of Module 1 genes on BMI (in women) is in agreement with the finding that CSPCs are under the influence of adipose tissue, both as a source of pro-angiogenic chemokines (e.g., adiponectin, known to induce the release of CSPCs from bone marrow [49]) and as a possible direct originator of CSPCs [50]. However, these gene modules were not equated with any known underlying cellular classes for the following reasons: (i) the stochastic nature of gene expression [51,52]; (ii) cell plasticity, manifested as a transcriptionally dependent propensity for transdifferentiation [53]; and (iii) the possible horizontal redistribution of mRNA among different cell classes, possibly via extracellular vesicles acting as intercellular RNA carriers [54], which are particularly active among bone marrow stem cells [55]. Instead, we consider that although the Module 1 markers are primarily expressed by EPCs [7], they also receive contributions from other leukocytes with roles in maintaining vascular function, such as angiogenic (TEK/TIE2 + ) monocytes [56] or angiogenic T cells [57]. Other circulating bone marrow-derived cells of less certain nature also contribute to the maintenance of microvascular tone and normal blood pressure [58].
Equally important, many T lymphocytes positive for CD31 and CXCR4 were found at the core of the in vitro-formed colonies known as 'early' EPCs [59], which are bordered by KDR/ VEGFR2 + monocytic cells [60]. Despite their obvious nonendothelial nature, the numbers of these heterogeneous, bloodderived cell aggregates were highly correlated with the vascular function of the blood donors [61]. In a larger context, leukocytes Mean Int., mean intensity; Med. Int., median intensity; TDV, texture difference variance; TC, texture contrast; TV, texture variance; TSA, texture sum average. The data represent the means of standardized values 6 SEM. doi:10.1371/journal.pone.0095124.g010 Figure 11. Detection of cells expressing representative Module 1 node proteins using immunocytochemistry (ICC). A. Cells expressing various levels of KDR (white), NES (green), and/or FSHR (red). Nuclei are blue (DAPI). a, b: NES-FSHR double-positive cells; c: a triple-positive cell. Bars: 5 mm. B. A nuclear morphology analysis revealed alterations in the triple-positive cells detected using ICC that were comparable to those found using ISH (see Fig. 10 for abbreviations). The data represent the means of standardized values 6 SEM; a total of 1655 cells were analyzed. C. Frequencies of cells positive for the three marker genes, detected using ISH (gray bars) and ICC (black bars) (n = 2094 and 1655 cells, respectively; none of the intermethod comparisons were significant, demonstrating that they detected the same cell populations). The data represent the means of individual blood donors 6 SEM. doi:10.1371/journal.pone.0095124.g011 often express other tissue-specific genes, revealing a refined but poorly understood transcriptional cross-talk between blood and perfused tissues [15]. Thus, we consider that the gene expression pattern detected using our method diverges from the simple notion of circulating progenitor cells into a more complex underlying biological reality that is highly meaningful for vascular function. In addition, we assume that the Module 1 genes do not derive from circulating adult endothelial cells detached from the vascular intima, because the endothelial functional marker von Willebrand factor was not present among these genes. The fact that a class of cells with distinctive nuclear properties, suggestive of a transcriptionally active euchromatin, coordinately express the neovascular markers KDR, NES and FSHR, provides additional evidence that Module 1 genes derive from primitive rather than differentiated cells. However, we consider highly unlikely that all the 15 members of Module 1 to be simultaneously expressed by only one cell category, and thus we maintain that the origin of Module 1 is an emergent property of the CSPCs system. Vascular stiffness determines an individual's susceptibility to atherosclerotic plaque formation by predisposing the intimal endothelium to increased permeability to lipoproteins and the accumulation of monocytes [62]. Therefore, the presence of endothelial differentiation genes associated with primitive genes in the structure of Module 1 and their inverse correlation with AIx indicate a protective role of the respective blood cells in vessel health. Of note, AIx did not exhibit a direct relationship with the age of the blood donors in our limited population, arguing that the dependence of AIx on Module 1 components is not indirectly mediated through the effect of age on the expression of these components. The decrease in the levels of gene expression in PBMCs with an increase in the blood donor's age may be due to variations in the number of gene-expressing cells, changes in these cells' transcriptional activity (such as the known aging-sensitive dependence of transcription on the methylation of CpG islands in gene regulatory elements [63]), or both. Additionally, the inverse association between Module 1-derived MI and central aortic pulse pressure (AoPP), which was recently shown to predict the future development of hypertension in healthy human populations [64], suggests a potential role of Module 1 genes in protecting small resistance vessels.
Decreased collective expression of the genes in both modules was observed among the patients with established (and treated) hypertension. This finding is consistent with the reduction of CSPCs, including CD34-positive cells [65], in patients with hypertension, possibly as a direct effect of angiotensin [66] or in response to anti-hypertensive treatments [67]. Other novel observations of this study are that in the hypertensive patients, there is increased covariation of the genes in both modules and the distinction between these two modules vanishes. The increased network connectivity in the patients, despite the overall reduction in expression levels, suggests an amplified transcriptional coordination. In terms of gene network organization, the observed reduced informational heterogeneity is the signature of a lessdifferentiated state of the marker's originating cells, which is consistent with the higher transcriptional network entropy of primitive cells [68]. In support of this possibility, hematopoiesis is amplified in cardiovascular patients, based on blood gene expression signatures [17].
Among the limitations of this study is the relatively small population sample size. Despite this limitation, the correlations between genes were strong and significant, highlighting the power of our bottom-up network reconstruction method to extract meaningful information from small human populations. Admittedly, the set of genes examined here could still be an incomplete representation of the actual underlying module from where our target markers were extracted. This topic is worthy of future exploration, although we demonstrated that the number of components was sufficient for the use of Module 1 in the current form as an aggregate biomarker (or metagene) of vascular function. Finally, the other module that surfaced from our analysis (Module 2 in normal subjects) remains to be explored to determine its significance for the repair capacity of blood, which may occur in an organ-and/or disease-specific manner.
In conclusion, the results reported herein constitute the proof of concept for a novel bottom-up approach that is more sensitive and more accurate than the currently used high-throughput methods for the generation of a peripheral blood transcriptional network module useful for studying the collective contribution of circulating cells to vascular and tissue maintenance and repair.   (1) In those instances when a gene was represented by several probe sets on the array, we calculated the median signal value for each probe set over all of the arrays and retained the sets that had the highest value. (2) Presence Score was calculated as percentage from all 274 arrays, using MAS5 algorithm (Affymetrix Expression Console). The following GEO datasets were used to generate these data: GSE8507, GSE10041, GSE11761, GSE14642, GSE19743, GSE21942, GSE27034, and GSE46480 (3) Percentages in this column are calculated only from the arrays used in this study [1]. Reference