Systems Biology Approach Reveals Genome to Phenome Correlation in Type 2 Diabetes

Genome-wide association studies (GWASs) have discovered association of several loci with Type 2 diabetes (T2D), a common complex disease characterized by impaired insulin secretion by pancreatic β cells and insulin signaling in target tissues. However, effect of genetic risk variants on continuous glycemic measures in nondiabetic subjects mainly elucidates perturbation of insulin secretion. Also, the disease associated genes do not clearly converge on functional categories consistent with the known aspects of T2D pathophysiology. We used a systems biology approach to unravel genome to phenome correlation in T2D. We first examined enrichment of pathways in genes identified in T2D GWASs at genome-wide or lower levels of significance. Genes at lower significance threshold showed enrichment of insulin secretion related pathway. Notably, physical and genetic interaction network of these genes showed robust enrichment of insulin signaling and other T2D pathophysiology related pathways including insulin secretion. The network also overrepresented genes reported to interact with insulin secretion and insulin action targeting antidiabetic drugs. The drug interacting genes themselves showed overrepresentation of insulin signaling and other T2D relevant pathways. Next, we generated genome-wide expression profiles of multiple insulin responsive tissues from nondiabetic and diabetic patients. Remarkably, the differentially expressed genes showed significant overlap with the network genes, with the intersection showing enrichment of insulin signaling and other pathways consistent with T2D pathophysiology. Literature search led our genomic, interactomic, transcriptomic and toxicogenomic evidence to converge on TGF-beta signaling, a pathway known to play a crucial role in pancreatic islets development and function, and insulin signaling. Cumulatively, we find that GWAS genes relate directly to insulin secretion and indirectly, through collaborating with other genes, to insulin resistance. This seems to support the epidemiological evidence that environmentally triggered insulin resistance interacts with genetically programmed β cell dysfunction to precipitate diabetes.


Introduction
Our understanding of genetic basis of disease risk has greatly improved in recent years owing to the advent of genome-wide association studies (GWASs) [1,2]. The genes influencing common complex or multifactorial diseases and quantitative traits were largely unknown before GWASs came into being in the year 2006 [2,3]. Results obtained from these studies suggest that multiple genetic architectures, including common genetic variants with small effects and rare variants with large effect sizes, underlie susceptibility to common diseases [1]. Nearly 1,300 GWASs covering more than 650 diseases and traits have been reported over the past several years [4]. A typical GWAS involves typing hundreds of thousands of single nucleotide polymorphisms (SNPs) in thousands of control and affected individuals, and identifying SNPs that differ significantly between the two groups in terms of allele frequency as disease or trait associated [5][6][7].
Type 2 diabetes mellitus (T2D) is a common complex disease whose pathogenic mechanisms are known to a considerable extent [8,9]. Several organs including pancreatic islets, liver, skeletal muscle, adipose tissues, gut, hypothalamus and the immune system play a role in its pathogenesis [10]. Numerous multifactorial mechanisms that include genetic and environmental factors related to obesity are involved in the development of insulin resistance and impaired insulin secretion [8,9]. Insulin resistance is associated with inactivity, obesity and ageing [8]. The insulin secreting pancreatic islet b cells respond to insulin resistance by enhancing their mass and metabolic function. T2D however develops when increase in insulin secretion by b cells is not able to keep pace with the increase in insulin resistance [8,11]. The latter thus characterizes both prediabetic condition and T2D. Prediabetic insulin resistance state however does not always lead to diabetes; enhanced secretion of insulin by b cells compensates for deficient insulin action in a considerable proportion of prediabetic individuals who do not develop T2D. Though the inability of b cells to secrete enough insulin primarily typifies T2D, the dysfunction can also be demonstrated in normoglycemic subjects [12]. Therefore, derangements in both insulin secretion and Figure 1. Schematic representation of the workflow. T2D GWAS genes do not directly relate (indicated by 'X' on the left side) to pathways associated with disease pathophysiology. Conspicuously, effect of identified risk variants on continuous glycemic measures in nondiabetic subjects chiefly explains only perturbation of insulin secretion, not insulin resistance. Further, the genes found as associated with the disease do not clearly relate to processes and pathways consistent with the known aspects of T2D pathophysiology. The main aim of the present study was to ask the question (indicated by '?' on the right side) if GWAS data when considered in conjunction with interactome, toxicogenome and disease transcriptome data reveal genome to phenome correlation in T2D. Data available in public domain for GWAS, interactome and toxicogenome was used in the analysis. For disease transcriptome, new experimental data was generated. We specifically examined if interaction network of genes reported in T2D GWAS, genes showing altered expression after treatment with various antidiabetic drugs, and genes that are differentially expressed in insulin responsive tissues in male and female T2D patients do converge on insulin secretion, insulin resistance and other T2D associated pathophysiological pathways. doi:10.1371/journal.pone.0053522.g001 insulin signaling, involved in the regulation of several processes including glucose uptake into cells, seem necessary but not sufficient in causing T2D. Based on epidemiological findings, it has been proposed that interaction between environmentally triggered insulin resistance and genetically programmed pancreatic b cell dysfunction leads to the development of T2D [12][13][14].
Over 20 major GWASs for T2D have been performed and several are underway at present [2]. A majority of identified loci have been found consistently across studies [15]. However, effect of the risk variants on continuous glycemic measures in nondiabetic subjects shows that T2D susceptibility is primarily mediated through perturbation of insulin secretion rather than insulin signaling [2,12,[16][17][18][19][20][21]. Also, genes associated with T2D poorly represent established pathways of insulin signaling [12]. Global approaches to find statistically significant overrepresentation of functional categories in T2D associated genes have, although not provided clear evidence of the potential disease mechanisms, nonetheless identified enrichment of cell cycle regulation [2,17,19]. Considering that T2D associated genes representing cell cycle regulation are expressed in pancreatic islets, and that their disease association is mediated mainly through b cell dysfunction, the genetic evidence in the disease may seem to converge, to some extent, on insulin secretion [19]. Other than this limited convergence, the associated genes do not clearly confirm other known aspects of T2D pathophysiology including insulin signaling. This has led to the suggestion that either the disease is markedly heterogeneous or the critical aspects of disease pathophysiology are insufficiently captured by the presently available databases [2,17,19].
Genes identified in GWASs when evaluated in the context of complementary systems level data such as that related to proteinprotein interactions and to and gene expression can provide insights into the mechanisms underlying pathogenesis of complex traits [22][23][24]. Here, we have combined these approaches toward deciphering genome to phenome correlation in T2D ( Figure 1). Given that T2D GWAS genes do not directly relate to disease pathophysiology, our main aim was to examine if this genome to phenome correlation gap can be abridged by considering GWAS genes in conjunction with physical and genetic interaction, and gene expression data.

GWAS Genes
A catalog of SNP associations up to a p value cutoff of 1610 25 , a threshold commonly used for preliminary selection of SNPs in GWASs, exists in public domain [25]. As genes at this cutoff are considered meaningful for enrichment analysis [26], we retrieved genes reported in T2D GWASs at p value thresholds up to 10 25 (Dataset S1) and examined enrichment of Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways therein. Maturity onset diabetes of the young (MODY), a Mendelian form of diabetes in  T2D, was the only pathway that showed enrichment in this analysis ( Figure 2). Whereas 10 28 cutoff genes showed enrichment of MODY only at a borderline significance, those at 10 25 threshold showed a robust overrepresentation. Of the five 10 25 cutoff genes in MODY, HNF1A and HNF1B were absent in 10 28 set while the latter alone was missing from 10 27 and 10 26 gene lists. These genes, like most of the other MODY genes, encode transcription factors that directly or indirectly affect expression of insulin and other proteins related to pancreatic b cell development and/or glucose metabolism [27]. Association of HNF1A and/or HNF1B with T2D has earlier been reported in candidate gene studies [28][29][30]. We thus focused on the gene list comprising 93 genes established from a GWAS cutoff of 10 25 . Dubbed ''T2D genome'' henceforth, we used this gene set in subsequent analysis.

Drug-gene Interactions
We used a toxicogenomic approach to further investigate if T2D interactome represents pathophysiology related pathways including insulin signaling. The Comparative Toxicogenomics Database (CTD) documenting gene-environment relationships has been used previously towards understanding T2D etiology [24]. We searched for genes that interact with antidiabetic drugs using CTD (Dataset S3). We then examined if T2D interactome is enriched in genes interacting with these drugs. For enrichment analysis to be statistically meaningful, gene lists associated with only those drugs were considered which had a minimum of five genes in common with T2D interactome. Notably, significant enrichment or a trend for the same was observed for all the drugs tested ( Figure 3). Further, genes interacting with all the antidiabetic drugs combined showed enrichment of various pathways relevant in T2D pathophysiology ( Table 2), as observed previously for T2D interactome ( Table 1). Cumulatively, the toxicogenomic analysis supported the importance of T2D interactome in unraveling genome to phenome correlation.

Genome-wide Expression Profiling
Next, we generated microarray expression profiles of skeletal muscle, visceral adipose and subcutaneous adipose from male and/or female T2D patients ( Figure 4). For each tissue tested, expression profiles were generated from three diabetic and three non-diabetic individuals. Also, each individual was profiled four times. The differentially expressed genes between diabetic and nondiabetic control groups were identified ( Figure 5 and Dataset S4). Significant genes at adjusted p value were subjected to pathway enrichment analysis. For all comparisons except female subcutaneous adipose, these genes showed enrichment of one or more pathways ( Table 3). Female visceral adipose showed a large number of enriched pathways including ErbB, Wnt, MAPK, T cell receptor, B cell receptor and Toll-like receptor signaling, and regulation of actin cytoskeleton, which, as mentioned above, have all been implicated in T2D. Enrichment of valine, leucine and isoleucine degradation in male visceral adipose is also consistent with metabolomic studies in T2D [50,51]. Similarly, enrichment of ECM-receptor interaction in male skeletal muscle is in consonance with known changes in the composition of the extracellular matrix in insulin-resistant muscle [52][53][54][55][56][57]. Quantitative real time PCR (qRT-PCR) confirmed differential expression in the same direction, or a trend for that, of all the genes tested to validate microarray results ( Figure 6). The genes used in qRT- PCR validation represented all the enriched pathways ( Table 3) besides others. These results demonstrated the robustness of our genome-wide expression profiling.

Convergent Pathway
Although we found enrichment of several pathways consistent with T2D pathophysiology in our microarray analysis, insulin signaling was conspicuous by its absence in the list of overrepresented pathways ( Table 3). Statistical corrections may be overly conservative to the point of being counterproductive in interpreting microarrays with genetic knowledgebases [58]. We thus explained this counterintuitive result by arguing that application of such corrections for identifying differentially expressed genes in microarrays and for pathway enrichment analysis of these genes may be together responsible for causing false negatives in our results. Since our main goal was to decipher genome to phenome correlation, we examined if T2D interactome (Dataset S2) is enriched for differentially expressed genes in various samples at unadjusted p value threshold ( Figure 5 and Dataset S4), called ''T2D transcriptome'' from now on. Enrichment will be expected if there is a genome to phenome correlation. Also, if enrichment is indeed observed at this level then genes overlapping between T2D interactome and T2D transcriptome will be expected to overrepresent pathways consistent with the disease pathophysiology.
Indeed, the overlaps between interactome and transcriptome gene sets were found to be statistically significant even after adjusting the p values for multiple testing ( Figure 7). Also, the overlapping genes were enriched in T2D pathophysiology related pathways including insulin signaling in female visceral adipose and male skeletal muscle ( Table 4). In male visceral adipose and female subcutaneous adipose, insulin signaling was however not enriched. Twelve pathways were common to all tissues and both genders ( Table 4). The common pathways were related to cancer, cell cycle, adherens junction, focal adhesion, pathogenic Escherichia coli infection and TGF-beta signaling. To examine known role of these pathway(s) in T2D pathophysiology, the PubMed was searched using the key words ''diabetes AND type AND 2 AND insulin AND (signaling OR action OR resistance OR sensitivity) AND (secretion OR pancreatic OR islets OR beta)'' in combination with term(s) representing each of these common pathway. The retrieved abstracts and/or papers and the references therein were manually curated to identify a pathway that is known to play a role in both insulin secretion and insulin signaling related  function. Literature search revealed that of these pathways TGFbeta signaling is particularly notable in that it is clearly known to play a crucial role in both insulin signaling as well as in insulin gene expression and pancreatic b cell function [36,[59][60][61]. Given this, we examined if removal of TGF-beta signaling genes from the list of T2D interactome-T2D transcriptome commonality genes negatively affect enrichment of insulin signaling in female visceral adipose and male skeletal muscle. There were 10 and 7 TGF-beta signaling genes in female visceral adipose and male skeletal muscle ( Table 5), in that order. We removed these genes from 200 and 137 commonality genes ( Figure 7) in T2D interactome-female visceral adipose and T2D interactome-male skeletal muscle comparisons and examined pathway enrichment in the resulting gene sets. Remarkably, compared to complete gene sets, 190 and 130 genes that remained after removing TGF-beta genes, in that order, showed less pronounced enrichment of pathways in general, and insulin signaling in particular ( Table 6). Importantly, T2D genome includes one of the TGF-beta signaling genes, CDKN2B. This gene is associated with T2D in diverse populations at genome-wide significance level (Dataset S1). Cumulatively, our analysis identified TGF-beta signaling as a connecting link between genome and phenome in T2D ( Table 7 and Figure 8).

Discussion
Candidate T2D genes identified in GWASs combined does not clearly confirm known aspects of disease pathophysiology. Our systems level analyses bridge this genome to phenome correlation gap. Bioinformatic analyses of disease associated genes using interactome and toxicogenome data first led us to connect T2D candidate genes identified in GWASs with disease pathophysiol- Correlations between all the eight groups of samples analyzed in microarrays are plotted as a dendrogram. As expected, muscle and adipose form separate clusters. Also, in adipose cluster, subgroups of adipose type and gender are observed. Globally normalized data was used for constructing the dendrogram. Con, control subjects; T2D, diabetic patients; SA, subcutaneous adipose; VA, visceral adipose; SM, skeletal muscle. doi:10.1371/journal.pone.0053522.g004 Figure 5. Numbers of up-and down-regulated genes in multiple insulin responsive tissues in T2D patients. Expression profiles of skeletal muscle, visceral adipose and subcutaneous adipose from male and/or female subjects were generated using Illumina HumanHT-12 v3 Expression BeadChip arrays that contain more than 25,000 annotated genes. IIlumina custom error model was used to identify up-and downregulated genes in T2D as compared to controls, with or without Benjamini and Hochberg correction for multiple hypotheses testing. The differentially expressed genes were identified at 613 Diff score threshold of Illumina custom algorithm, corresponding to a p value of 0.05. doi:10.1371/journal.pone.0053522.g005 ogy including aberrant pancreatic b cell development and function, and insulin sensitivity. We then experimentally validated this connectivity using transcriptomic analysis of multiple insulin responsive tissues from male and female diabetic patients. Our simple, intuitive and straightforward approach has been remarkably successful in uncovering genome to phenome correlation in diabetes. In general, we find that candidate T2D genes identified in GWASs can explain disease pathophysiology when the associated genes are considered together with their protein and functional level interactors. In other words, the physical and genetic interaction network of T2D associated genes overall relates well with the disease pathophysiology. Our toxicogenomic analysis supports this. This evidence was finally validated in our transcriptomic analysis. It is a long standing debate whether impaired insulin action or insulin secretion deficiency is the primary defect in T2D [12,16]. Epidemiological evidence has previously suggested that genetically programmed pancreatic b cell dysfunction interacts with environmentally triggered insulin resistance to cause T2D [12][13][14]. Our results may seem consistent with this notion.
We find tissue and gender differences in genome-wide expression profile in T2D. In males, differentially expressed genes in visceral adipose showed enrichment of valine, leucine and isoleucine degradation, and propanoate metabolism, whereas those in skeletal muscle showed enrichment of extracellular matrix-receptor interaction. In females, diverse pathways including that related to neurotrophin signaling, cancer, immune response, intercellular communication, and pathogenic Escherichia coli infection are enriched in differentially expressed genes in visceral adipose, whereas no pathway show enrichment in subcutaneous adipose. As such, convergence of pathways is largely absent if gene expression profiling is analyzed in isolation. Fold change was calculated for two to six technical replicates, each representing three biological replicates. The source of RNA used in qRT-PCR analysis was same as in microarray profiling. A total of 47 genes were used for validation. The rationale behind the subsets of genes selected was two fold. One, the genes should represent, wherever applicable, one or more enriched pathways (Table 3) in a given condition. Second, the genes should maximally represent those which are differentially expressed at adjusted p value cutoff (Dataset S4) in more than one condition, so that validation of microarrays can be examined more widely. The selected genes, besides others, covered all the pathways that were enriched in the above microarray gene lists. Notably, up-or down-regulation observed in qRT-PCR, shown in red and green, respectively, was consistent with microarrays, for all comparisons. doi:10.1371/journal.pone.0053522.g006 However, when genes that are both differentially expressed in T2D as well as are known to interact with GWAS signals are analyzed, we find enrichment of numerous pathways in all the tissues and both genders, with several enriched pathways in common to all the conditions. Also, statistical significance of enrichment in general is greatly increased in the latter set of genes than the former. This clearly demonstrates the advantage of convergent analysis in examining genome to phenome correlation in complex disease like diabetes. Whereas interactome guided analysis of differentially expressed genes uncovered several pathophysiologically relevant pathways in female visceral adipose and male skeletal muscle, only one of them, TGF-beta signaling, was revealed, based on a literature search, in female subcutaneous adipose and male visceral adipose. This literature based analysis was supported when representatives of TGF-beta pathway were removed from the transcriptome-interactome intersection genes related to female visceral adipose and male skeletal muscle, and the resulting set was subjected to pathway enrichment analysis.
Compared to complete set of intersection genes, the TGF-beta deleted list showed less pronounced enrichment of pathways including insulin signaling. Although this literature based analysis may not be very robust due to inherent limitations, and hence other pathways may also possibly be important, it at least identifies TGF-beta signaling as one that can connect genome to phenome in T2D. Of all the candidate genes identified in T2D GWASs at acceptable significance level, CDKN2B is the only one that represents this signaling pathway. It remains a possibility that several other genes in TGF-beta signaling are associated with the disease but we are not clear about them because they are either below the significance level used in GWAS reporting, are not yet discovered in disease association studies due to technical limitations or are not characterized and codified well enough in terms of function. A notable example here is that of SMAD3, a TGF-beta signaling gene. A SNP in SMAD3 though did not show evidence of association in the original GWAS in T2D (nominal p = 0.0006) the disease candidacy of the gene was nonetheless uncovered in a subsequent pathway enrichment analysis [62]. This supports our genome to phenome correlation analysis. The interactome network and transcriptomic analysis provided here offer novel means to mine GWAS data that is not available in public domain and identify novel candidate genes in T2D. We anticipate that new evidence for association of genes in TGF-beta pathway will emerge from data mining.
Already, some of the most recent findings do seem consistent with our systems level analysis. For example, a newly developed joint meta-analysis approach has recently identified additional loci associated with fasting insulin and other insulin resistance related traits [20]. Interestingly, genes localized nearby some of the associated SNPs are known to play a functional role in insulin signaling. Remarkably, one of the strongest positional candidates, PPP1R3B, is known to interact at protein level with a single gene, PPP1CA, which in turn is involved physically in SMAD signaling protein-protein interactions and functionally in TGF-beta signaling pathway [63]. Furthermore, we find that T2D interactome does not only include PPP1CA but also several of its interactors. Twenty of the total 111 PPP1CA interactors in the BioGRID are present in our T2D interactome of 561 genes. Given the total BioGRID space of 14,306 genes, the T2D interactome is highly enriched for PPP1CA interactors (p = 0.000000008). This demonstrates the power of our systems model. Another support for our model comes from a recently reported functional study in which the alpha-2-HS-glycoprotein (ASHG) has been identified as an adaptor protein that links saturated fatty acids to toll-like receptor 4 thus stimulating inflammatory pathways leading to insulin resistance [64]. Interestingly, ASHG is a known antagonist of TGF-beta cytokines including TGF-beta1 [65,66]. Of the nine protein interactors of ASHG in the BioGRID, one, SMAD3, is present in our T2D interactome. Although small numbers preclude any statistical analysis, it is tempting to find this gene overlap notable.
The above discussion remarkably converges on the TGF-beta signaling effector SMAD3. TGF-beta signaling is involved in the regulation of insulin gene transcription, pancreatic islets b cell function, and glucose tolerance and energy homeostasis [36,[59][60][61]. SMAD3 is known to localize at insulin gene promoter and repress insulin gene transcription [61]. SMAD3 knock-out mice are associated with improved glucose tolerance and insulin sensitivity [36]. Exhibiting altered expression of genes related to adipogenesis, lipid accumulation, and fatty acid b oxidation, these mice show resistance to obesity and insulin resistance induced by high fat diet [36,59]. Further, levels of TGF-beta1 have been found to positively correlate with adiposity in human subjects [59]. Also, systemic blockade of TGF-beta signaling has been found to protect mice from obesity, diabetes and hepatic steatosis [59]. Indeed, pharmacological manipulation of TGF-beta signaling is considered to offer a potential therapeutic strategy in obesity and diabetes [59,60].
Most recently, one of the eight new susceptibility loci reported in a large scale association analysis in T2D at genome-wide significance, the top signal maps to ZMIZ1 [67]. Notably, ZMIZ1 is known to interact with SMAD3 at protein level [68]. In the association study [67], the authors used an expanded set of susceptibility loci to define pathways and networks underlying T2D pathogenesis. Remarkably, the most connected gene that the authors found in their protein-level interaction analysis is that encoding CREBBP, a co-activator known to regulate SMAD3dependent transcription [69]. Furthermore, the top four previously unreported T2D genes identified in a recently conducted T2D GWAS in an Indian population include the TGF-beta signaling gene TGFBR3 [70]. Known to affect phosphorylation and nuclear localization of SMAD3, TGFBR3 is involved in TGF-beta/ SMAD3 dependent signaling [71]. The above results demonstrate the robustness of our systems model identifying TGF-beta/ SMAD3 signaling as central to genome to phenome correlation in T2D. Above all, SNPs associated with gene expression in liver, visceral adipose and subcutaneous adipose and with T2D in GWASs have previously been found to be enriched in various R2D candidate pathways including TGF-beta signaling [72]. Overall, our global analyses seem consistent with the accumulating evidence implicating TGF-beta signaling in T2D related pathophysiology.
The newly arrived technology of whole genome/exome sequencing is expected to accelerate identification of rare variants with large effect sizes, thus helping us achieve in near future an even greater understanding of the genetic basis of complex phenotypes [2,6]. The recently accomplished deep sequencing of human exomes has indeed suggested that rare variations contribute substantially to human phenotypic variation and disease susceptibility [73]. Availability of post-GWASs era data for T2D will be crucial in examining genome to phenome correlation in greater details. Emerging methods in pathway-wide analysis and integrative network based analysis of genetic association data in complex disorders will further help accelerate identification of variations that are causally linked to phenotypes [20,22]. It will be interesting to see if newer results support our genome to phenome correlation analysis in general and candidacy of TGF-beta signaling in particular.
In conclusion, genetic association evidence in T2D correlates well with disease pathophysiology including insulin secretion deficiency and insulin resistance. The systems biology framework that has emerged from the present analysis may prove valuable in further refining our understanding of genetic determinants and molecular pathways in the pathogenesis of T2D.

Ethics Statement
All study participants provided written, informed consent under protocols specifically approved by the ethics committee of Sawai Man Singh Medical College, Jaipur.  [4,25]. It presently includes 1,300 publications and 6,581 SNPs, and contains gene names reported by the authors in the original paper. GWASs that attempted to assay at least 100,000 SNPs in the initial stage are only included in the catalog. In our analysis, we retrieved the ''reported genes'' in T2D GWASs using p value thresholds of ,10 28 , 10 27 , 10 26 and 10 25 . As our main objective was to examine correlation between genes already implicated in T2D with disease pathophysiology, we based our analysis on reported candidate genes instead of genes in linkage disequilibrium around each associated SNP.

Pathway Enrichment
The freely available Database for Annotation, Visualization, and Integrated Discovery (DAVID) was used for examining enrichment of KEGG pathways in gene sets. DAVID, that integrates data from multiple functional databases, is frequently used for revealing biological themes underlying large gene sets (http://david.abcc.ncifcrf.gov/) [74][75][76][77]. Standard method of analysis was followed, with Homo sapiens as background and EASE score enrichment p values globally corrected using Benjamini-Hochberg technique [78]. A modified Fisher Exact p value, EASE score relate to gene-enrichment in annotation terms. KEGG, a public domain database commonly used for gene enrichment analysis and pathway visualization [22,23,72,79], houses a total of 199 unique human pathways representing 5197 unique genes/ proteins (http://www.genome.jp/kegg/pathway.html) [72].

Interactome Network
The public database BioGRID extracts and annotates in-depth physical and genetic interactions reported in the primary peerreviewed literature and houses the data that is explicitly corroborated by experimental evidence in an organized form to enable various tasks such as computational analysis of biological networks and prediction of gene/protein function (http:// thebiogrid.org/) [80][81][82][83]. We used physical and genetic interaction data sets for Homo sapiens in BioGRID v.3.1.89. The total number of unique nodes and edges in these data sets were 14,306 and 67,659, respectively. To visualize BioGRID networks off-line and retrieve genes therein, we used the public domain tool Osprey (v.1.2.0) (http://biodata.mshri.on.ca/osprey/servlet/Index) [84]. Only direct interactions were used in the present analysis.

Drug-gene Interactions
Genes that interact with antidiabetic drugs pioglitazone, troglitazone, rosiglitazone, metformin, tolbutamide, glyburide, glipizide, gliclazide, nateglinide, repaglinide, sitagliptin, saxagliptin, bromocriptine, acarbose, vildagliptin, liraglutide and exenatide were identified using the publicly available CTD, Mount Desert Island Biological Laboratory, Salisbury Cove, Maine (http:// ctdbase.org/) in June, 2012. CTD is a repository of manually curated chemical-gene, chemical-disease and gene-disease relationships from the literature [85][86]. The database, documenting over 200,000 gene-environment relationships from over 26,000 publications, has been used previously towards understanding    [24]. We used chemical-gene interaction query for each drug in CTD, and retrieved genes using the default settings ''increases, decreases, affects'' and ''any'' interaction. These interactions are of various types such as expression, abundance and activity.

Surgical Tissue Samples
Biopsies were obtained from abdominal subcutaneous and visceral fat tissue and skeletal muscles of patients undergoing abdominal surgery under general anesthesia. The muscle samples were obtained from the vastus lateralis by Bergstrom needle biopsy. Primary indications of surgery were non infective and non malignant conditions, namely, cholelethiasis, hernia and trauma. The average age of the patients undergoing surgery was 58 years (range 37 to 85 years). Antidiabetic therapy mostly included sulphonylurea or insulin treatment. None of the patients were on pioglitazone ever. Once surgery was planned, all the diabetic patients were put on insulin therapy. No patient took metformin after surgery was planned. The levels of glycated hemoglobin (%HbA 1c ) determined in nondiabetic and diabetic patients were 5.7560.33 and 9.4460.82, respectively. The difference between the two groups was significant (p = 0.003). The BMI of nondiabetic and diabetic patients were 24.4861.2 and 25.0061.81, respectively. The difference between the two groups was insignificant (p = 0.81). The biopsy samples were obtained from tissue exposed and wasted during surgery. The sample was immediately rinsed in saline and stored in RNA later (Ambion, Austin, TX) solution, initially at 4uC and later stored at 280uC till further use. All study participants provided written, informed consent under protocols originally approved by the ethics committee of Sawai Man Singh Medical College, Jaipur.

RNA Isolation
The total RNA from biopsy samples was isolated using the mirVana TM miRNA Isolation Kit (Ambion). The quantity and quality of the isolated RNA were determined Nanodrop-1000 (Thermo Fischer Scientific) and Agilent 2100 Bioanalyzer (Agilent Technologies), respectively. RNA with RIN (RNA integrity number) value in the range of 5 to 8 was used for further analysis. These RNA samples had both 260/280 nm absorbance and 28S/ 18S rRNAs peak ratio of two. Although RIN of 7 or more would have been ideal, RNA isolated from surgical samples may not always be of very high integrity. As biopsy specimens were limited,

Microarrays
Three biological replicates with four technical replicates each were used to generate expression profiles. The biological replicates were matched with respect to gender, average age, average height, average weight, average edible oil consumption, and vegetarian/ non-vegetarian diet. Also, all the patients were free from alcohol use and smoking. Starting with 500 ng of total RNA, Illumina TotalPrepTM RNA Amplification Kit (Ambion) was used for preparing first and second strand cDNA, purification of cDNA, in vitro transcription to synthesize biotin labeled cRNA, and  purification of the labeled cRNA, in that sequence. The quantitation of cRNA was performed using Nanodrop-1000. Illumina HumanHT-12 v3 Expression BeadChip arrays, containing more than 48,000 probes representing more than 25,000 annotated genes, were hybridized with 750 ng of labeled cRNA samples. Hybridization and washing were performed according to the manufacturer's protocol. The arrays were scanned and read using Illumina iScan System, and the data extraction, average normalization and downstream analysis performed using Illumina GenomeStudio V2010.1. IIlumina custom error model was applied to identify differentially expressed genes, with or without Benjamini and Hochberg correction for multiple hypotheses testing, as mentioned in the results section. The upregulated or downregulated genes were retrieved using 613 Diff score threshold of Illumina custom algorithm. This Diff score corresponds to a p value of 0.05. Dendrogram of sample groups was constructed in GenomeStudio using globally normalized gene expression data. Correlation algorithm was used to construct the dendrogram.

Quantitative Real Time PCR
Three biological and two to six technical replicates were used for quantitative real time PCR (qRT-PCR) analysis. The same source of RNA that was used previously for microarray analysis was used for qRT-PCR. Notably, validation of same samples used for microarray analyses is typically reported [89,90]. The amplification reactions were carried out in 7500 Real Time PCR System (Applied Biosystems). RNA was reverse transcribed into cDNA using High capacity cDNA Archive kit (Applied Biosystems) following manufacturer's recommendations. With 50 ng of RNA, PCR was carried out using Custom TaqManH Array Standard 96 well Plates. Each assay consisted of two sequence-specific PCR primers and a TaqMan assay-FAM TM dye-labeled MGB probe. 18S rRNA was used as an endogenous control. Data was generated using software SDS 2.1 and C T values were calculated. All genes were detectable under the detection thresholds (C T #36) recommended by Applied Biosystems. To compare 18S rRNA and target gene, relative quantification was performed using comparative C T method. Briefly, this method involved averaging duplicate samples of each target and endogenous control in both calibrator (i.e. control) and treatment samples [i.e. DC T (absolute C T value -endogenous control C T value) and DD C T (DC T for each gene -DC T for a common reference gene)]. The fold change was calculated according to the formula 2 2(DDCT) , where DDCT was the difference between DC T target and the DC T calibrator value. ABI gene expression assay IDs and the corresponding genes used in qRT-PCR validation experiments are listed in Dataset S5.

Gene Overlap Significance Test
Common genes between two sets were identified using freely available Bioinformatics & Research Computing tool (http://jura. wi.mit.edu/bioc/tools/compare.php). Hypergeometric distribution probability was used for testing significance of gene matching. The p values for gene overlaps were obtained using Microsoft Office Excel 2007. Bonferroni correction was applied for adjusting p values for multiple hypotheses testing.

Supporting Information
Dataset S1 Genes reported in T2D GWASs at different levels of significance. This table lists all the genes that are reported in various GWASs for T2D at p value thresholds of 10 28 , 10 27 , 10 26 and 10 25 . There are 46, 62, 71 and 93 genes in these lists, in that order. The gene lists were retrieved from the catalog of published GWASs made freely available by NHGRI.

(XLS)
Dataset S2 Genes in the total interactome and T2D interactome. This table lists all the genes that constitute the total interactome and the direct interaction network of genes reported in T2D GWASs at p value cutoff of 10 25 . These genes are 14,306 and 561 in number, in that order. The interactome data for Homo sapiens in BioGRID was used in combination with the visualization tool Osprey to retrieve gene lists. (XLS) Dataset S3 Antidiabetic drug interacting genes. This table provides list of genes which interact with antidiabetic drugs pioglitazone, troglitazone, rosiglitazone, metformin, tolbutamide and glyburide. There are 121, 172, 518, 76, 17 and 23 genes in these lists, in that order. The gene lists were retrieved from Comparative Toxicogenomics Database (CTD).

(XLS)
Dataset S4 Up-and down-regulated genes in T2D patients compared to nondiabetic controls. This table lists genes that showed up-or down-regulation in genome-wide expression analysis in various tissues in male and/or female T2D patients. Lists of differentially expressed genes in male skeletal muscle, female subcutaneous adipose, female visceral adipose and male visceral adipose are given for both adjusted and unadjusted Diff score cutoff of 6 13 which corresponds to a p value of 0.05. At Table 7. TGF-beta signaling genes in T2D genome, interactome and transcriptome, and antidiabetic drug toxicogenome.