Detection of Independent Associations of Plasma Lipidomic Parameters with Insulin Sensitivity Indices Using Data Mining Methodology

Objective Glucolipotoxicity is a major pathophysiological mechanism in the development of insulin resistance and type 2 diabetes mellitus (T2D). We aimed to detect subtle changes in the circulating lipid profile by shotgun lipidomics analyses and to associate them with four different insulin sensitivity indices. Methods The cross-sectional study comprised 90 men with a broad range of insulin sensitivity including normal glucose tolerance (NGT, n = 33), impaired glucose tolerance (IGT, n = 32) and newly detected T2D (n = 25). Prior to oral glucose challenge plasma was obtained and quantitatively analyzed for 198 lipid molecular species from 13 different lipid classes including triacylglycerls (TAGs), phosphatidylcholine plasmalogen/ether (PC O-s), sphingomyelins (SMs), and lysophosphatidylcholines (LPCs). To identify a lipidomic signature of individual insulin sensitivity we applied three data mining approaches, namely least absolute shrinkage and selection operator (LASSO), Support Vector Regression (SVR) and Random Forests (RF) for the following insulin sensitivity indices: homeostasis model of insulin resistance (HOMA-IR), glucose insulin sensitivity index (GSI), insulin sensitivity index (ISI), and disposition index (DI). The LASSO procedure offers a high prediction accuracy and and an easier interpretability than SVR and RF. Results After LASSO selection, the plasma lipidome explained 3% (DI) to maximal 53% (HOMA-IR) variability of the sensitivity indexes. Among the lipid species with the highest positive LASSO regression coefficient were TAG 54:2 (HOMA-IR), PC O- 32:0 (GSI), and SM 40:3:1 (ISI). The highest negative regression coefficient was obtained for LPC 22:5 (HOMA-IR), TAG 51:1 (GSI), and TAG 58:6 (ISI). Conclusion Although a substantial part of lipid molecular species showed a significant correlation with insulin sensitivity indices we were able to identify a limited number of lipid metabolites of particular importance based on the LASSO approach. These few selected lipids with the closest connection to sensitivity indices may help to further improve disease risk prediction and disease and therapy monitoring.


Introduction
Metabolic syndrome represents a cluster of metabolic and cardiovascular abnormalities including obesity, hypertension, hyperglycemia and dyslipidemia. A common pathophysiological mechanism linking these traits is insulin resistance which in turn is closely associated with abnormalities in glucose and lipid metabolism.
Intracellular lipid oversupply may lead to cellular damage that underlies diabetes [1][2][3]. Socalled glucolipotoxicity with accompanying insulin resistance is followed not only by disturbances in tissue fat metabolism but is also reflected in alterations of various circulating plasma/ serum lipid subspecies [4][5][6][7]. Recently, we demonstrated that hypertension, an integral part of the metabolic syndrome, was specifically associated with decreased levels of free cholesterol and ether lipids [8].
The objective of this study was to apply shotgun mass spectrometric analysis [9,10] for the comprehensive and quantitative estimation of the plasma lipidomic profile in a total of 90 male individuals with a broad range of insulin sensitivity including NGT (n = 33), IGT (n = 32) and newly detected T2D (n = 25). The collection of high dimensional data which contains a large number of potential covariates, yet a very limited sample size is often called "small n and large p problem". This requires specific statistical data mining methods for detecting independent associations between molecular lipid species and insulin sensitivity indices including HOMA-IR, GSI, ISI, and DI. Here, we demonstrate that LASSO selection procedure was a valuable tool to detect distinct lipidomic signatures of the four investigated insulin sensitivity indexes. Their close association with different insulin sensitivity indices may link these molecular markers with subtle changes of glucose tolerance.

Research Design and Methods Subjects
The study comprised 90 white male subjects who were attending the PRAEDIAS (Prevention of diabetes) study at the Dresden University Hospital Carl Gustav Carus. In brief, subjects who were at risk for development of diabetes owing to a family history of T2D, obesity and/or hyper-/dyslipoproteinemia were examined. Clinically overt diabetes was an exclusion criterion.
The study was conducted in accordance with the guidelines proposed in the Declaration of Helsinki. All subjects signed a written consent to participate in the study. The study was approved by the local competent authority the Ethik-Kommission an der Technischen Universität Dresden (EK 139092001).
After overnight fasting subjects underwent a comprehensive clinical and metabolic characterization including an oral glucose tolerance test (OGTT, 75g oral glucose challenge). Accordingly, study participants were grouped: plasma glucose at time point 0 (PG0) <6.1 mmol/l and 2h post challenge (PG120) <7.8 mmol/l-NGT; PG120 = 7.8-11.1 mmol/l-IGT; PG120>11.1 mmol/l-T2D. Subjects included into the latter group were diagnosed by their first pathological OGTT. Notably, they had no clinical symptoms and no medication of diabetes mellitus.
Blood samples for lipid profiling were taken at time point 0 of OGTT. EDTA plasma was prepared by centrifugation at 4°C and 3000g for 10 min, immediately shock-frozen in liquid nitrogen and stored at -80°C until analysis.

Conventional clinical chemistry
Plasma total cholesterol, HDL and LDL cholesterol, triglycerides, free fatty acids, glucose, insulin, C-peptide, leukocytes, C-reactive protein (CRP), and HBA 1c were measured by routine clinical chemistry as previously described [8].

Calculation of insulin sensitivity indices
The sensitivity indices HOMA-IR, ISI, and DI were estimated as described previously [11]. GSI was introduced recently by Kazama et al. [12] and is calculated from plasma glucose and insulin levels at 0, 30, 60, 90 and 120 min OGTT by a formula based on an autoregressive model, see also S1 Table. FT MS/MS+ (AGC = 5e4, IT = 3s) at the targeted resolution of 140k at m/z 200 (as described elsewhere).
Performance control. A series of control blood plasma (sample volume: 1.25, 2.5, 5.0, 10.0, 15.0 μl) was measured to prove detection linearity of quantified lipid species for the expected total lipid concentrations of 2.0-28.3 mM. For the analyzed samples of this screen the total lipid concentration in blood plasma varied from 5.6-18.7 mM. Within the tested concentration range the adjusted R 2 was greater 0.98 for the quantified lipid analytes.
Data processing. FT MS and FT MSMS spectra were interpreted by LipidXplorer software as described [9,14]. 198 species from 13 major lipid classes were quantified across the plasma samples (S2 Table).
Annotation of lipid species. Lipid species were annotated according to common rules (DOI 10.1194/jlr.M033506). Abbreviation of tri-and diacylglycerols were different to avoid confusion with colorimetric specified quantitities for same lipid classes. Further, sphingolipid hydroxyl groups are assigned by separate counter following the double bond counter. Statistical analysis. The initial statistical analysis focused on detecting group effects on the lipid species as well as the lipid classes. As the group variable contains the NGT, IGT and T2D groups an ANOVA was used to identify differences in group means after having log-transformed and normalized the lipid values. In total, 202 p-values were returned that were corrected for multiple testing using the Benjamini-Hochberg procedure which is a preferred correction procedure analyzing high dimensional data. As an ANOVA is only able to determine whether one of the group means significantly differs from the others, these calculations were followed by pairwise t-Tests in order to conclude which of the group means differ significantly. Here we also used a Benjamini Hochberg correction to account for multiple testing. Similarly, correlations of each insulin sensitivity index and lipid species was estimated using the Spearman coefficient followed by correlation tests. As the tests were performed for each of the lipids in combination with each of the insulin sensitivity indices this also produces a multiple testing problem which was again addressed by a Benjamini-Hochberg correction.
In order to estimate the influence of lipids on four insulin sensitivity indices (HOMA-IR, GSI, DI and ISI) in a multivariate sense we only included either all lipid species or all lipid classes as covariates in our models. The analysis was corrected for age, BMI, WHR and systolic blood pressure including these covariates in the models explaining each of the four insulin sensitivity indices.
The huge number of covariates, however, makes it impractical to use standard parametric procedures such as linear regression from a technical point of view having only observed 90 individuals, on the other hand the interpretation of such a model would be almost impractical due to a huge number of estimated regression coefficients. In practice many features may be irrelevant and only a few covariates will actually influence the response. Including irrelevant features to a model would then lead to overfitting as these variables can be considered as noise variables that will cover the signal which would cause better fits on training samples and poorer performances on test samples. Hence, including irrelevant features would lead to a model that would not be able to generalize the trend in the data due to the noise variables.
Thus, data mining tools were applied to the data at hand that permit a fair analysis of such data. We used LASSO [15], Support Vector Regression (SVR, [16]) in combination with a pvalue based feature selection using correlation tests as well as a Benjamini-Hochberg correction prior to applying the SVR and Random Forests for each of the four regression tasks. In order to get an overview on Random Forests as well as variable importance which are major outcomes of this method we refer to Strobl et al. [17]. These three methods were validated using a crossvalidation technique called Monte Carlo Cross Validation with 100 iterations [18]. After validation we selected the model that minimizes the mean absolute error on the test data as the model with the lowest error is the most valid. However, it turned out that the LASSO model either returned the lowest mean absolute error or performed as good as the other methods. In the latter case we decided to proceed with the LASSO model as it was as valid as the other two models, however easier to interpret.
Finally, we extracted the final models for each of the four sensitivity indices by running the method that is the most valid according to the mean absolute error on the entire data set and estimating the final regression coefficients as well as R-Squared statistics for all four regression models.

Results
Baseline anthropometric data and glycemic status are presented in Table 1. Compared to NGT individuals, patients with newly diagnosed T2D were older, had a significantly increased WHR and a higher systolic blood pressure. According to WHO criteria (RR!140/90) fifty one individuals were hypertensive (thirteen with NGT, nineteen with IGT, nineteen with T2D). IGT and diabetic subjects had higher HbA 1c , plasma glucose, insulin and C-peptide levels. Insulin sensitivity indices GSI, ISI and DI decreased, and HOMA-IR increased with increasing insulin resistance among individuals (see Table 1). While all sensitivity indices were significantly different between NGT and T2D subjects, ISI and DI additionally differed significantly between NGT and IGT individuals. As expected by us, the indices are closely related among themselves. Correlation analysis revealed that, except DI and HOMA-IR as well as DI and GSI, the indices were significantly associated (Spearman-rho correlation coefficients from -0.676 to 0.575).
Shotgun  Table 2 displays plasma concentration of lipid classes in plasma of individuals in the NGT, IGT and T2D groups, where the content of an individual lipid class was determined by summing up absolute concentrations of all identified species and is expressed as μmol/l. As shown in Table 2, cholesteryl ester were the most abundant among blood plasma lipids, followed by phosphatidylcholines, triacylglycerols, and free cholesterol consistent with a previous report [19].
Out of 13 lipid classes, eight (TAG, DAG, CE, Cer, PC, PI, PE, Chol) were significantly and gradually increased in IGT and T2D subjects when compared with NGT individuals (Table 2). Additionally, SMs were significantly increased in T2D individuals when compared to NGT subjects. S3 Table shows the plasma concentrations of each lipid species in the three groups with distinct glucose tolerance and the p-values of differences between NGT, IGT, and T2D individuals. Fig 1 presents the relative errors of insulin sensitivity indices using three regression models (LASSO, RF, and SVR) that included the common risk factors age, BMI, WHR, systolic blood pressure, and 198 individual lipid species. As shown in Fig 1, differences between these three approaches were only marginal. Due to better interpretability, however, in the present study the LASSO model was applied for subsequent analyses. To unravel potential associations between circulating lipids and insulin sensitivity markers, we performed correlation analyses between lipids and four insulin sensitivity indices across all individuals using Spearman's nonparametric rank correlation coefficient followed by correlation tests that have been adjusted for multiple testing using Benjamini Hochberg correction. As demonstrated S4 Table, out of 198 lipid molecular species 126 correlated significantly with either HOMA-IR, a marker of insulin resistance, and/or the surrogate sensitivity indexes GSI, ISI and DI. The most striking (as evaluated by rho, p-value, and number of lipid species within one lipid class) associations were found between HOMA-IR and TAGs (35 significant positive correlations), and ISI and TAGs (33 significant negative correlations). GSI and DI were significantly negatively associated with 21 TAGs and 14 TAGs, respectively. Therefore, it is not surprising that a considerable amount of lipid classes was either significantly positively (HOMA-IR) or significantly negatively (ISI, DI) related to insulin sensitivity indices, see S1 It shows the tendency that TAGs of lower carbon number and double bond content were associated with increased positive correlation to insulin resistance (HOMA-IR) and increased negative association to insulin sensitivity (ISI). A similar relationship of TAG carbon number and acyl chain number with insulin sensitivity surrogates was also found for GSI but not for DI. Moreover, correlation analyses revealed a close positive association between ISI and ether lipids, LPCs and SMs and predominantly negative relations of HOMA-IR to LPCs and SMs (see S4 Table). Insulin resistance is often accompanied with changes in desaturase activities In the present study, D5D tended to decrease from NGT to T2D individuals while D6D and D9D were increased significantly in IGT and T2D subjects ( Table 3). All desaturase activities showed significant associations to three or more insulin sensitivity indices Table 4.
To identify a lipidomic signature of each investigated insulin sensitivity index, we applied LASSO selection procedure. This model considered all measured lipid species simultaneously; age, BMI, WHR, and systolic blood pressure were included in the model. After LASSO selection, the plasma lipidome contributed to 3% variability in DI, 45% variability in GSI, 52% variability in ISI, and to maximal 53% variability in HOMA-IR which was calculated using R-Squared statistics.   For the four investigated insulin sensitivity indexes LASSO selected lipid species from 8 lipid classes: most represented were TAG species (10), followed by DAGs (3) For HOMA-IR ten distinct lipid species were selected, the highest positive regression coefficient was obtained for TAG 54:2, the highest negative regression coefficient for LPC 22:5. For GSI eight molecular species were selected including PC O-32:0, highest positive regression coefficient, and TAG 51:1, highest negative regression coefficient. LASSO procedure selected eight species for ISI with SM 40:3:1 having highest positive regression coefficient and TAG 58:6 having highest negative regression coefficient. LASSO selected only one lipid species for DI-TAG 46:1, regression coefficient: -0.097. Age, systolic blood pressure and BMI were selected for HOMA-IR, GSI, and ISI, while BMI and systolic blood pressure were selected for DI.

Discussion
Lipid abnormalities have been shown to be causally involved in the pathogenesis of T2D and its cardiovascular complications. In this respect, the key concept of glucolipotoxicity is increasingly recognized as a potential molecular mechanism mediating insulin resistance and its clinical consequences [20,21]. Up to now, clinicians traditionally operate with integrated indices of lipid metabolism including circulating free fatty acids, total cholesterol, and triglycerides. Accurate plasma lipidomic profiling, however, suggests that the interaction between lipid composition and disease is subtle and might contribute to disease risk prediction and therapy monitoring [8,[22][23][24].
Male subjects investigated in our study, showed a broad range of insulin sensitivity, that was identified by four distinct surrogate markers. HOMA-IR is a simple but effective measure of insulin resistance in fasting steady state conditions while ISI is a more complex estimation of insulin sensitivity incorporating glucose and insulin levels at the beginning (0 min) and end (120 min) of OGTT and body weight. ISI and GSI have been shown to correlate well with the insulin sensitivity index obtained from the euglycemic hyperinsulinemic clamp, the gold standard for measurement of insulin resistance [12,25]. DI has been shown to be a valuable surrogate measure of ß-cell function reflecting the ability of the ß-cell to compensate for insulin resistance. While the static sensitivity index HOMA-IR mainly reflects hepatic insulin sensitivity, dynamic indices, including both fasting and stimulated glucose and insulin levels, reflect hepatic as well peripheral insulin sensitivity [26]. Consequently, different surrogate indices capture different pathophysiological aspects of glucose intolerance [27,28]. These differences might be an explanation for the fact that LASSO selection procedure included distinct individual lipid species for each investigated insulin sensitivity index. Only DAG 38:5 was included in three of four sensitivity indexes models: HOMA-IR (positive regression coefficient), GSI and ISI (negative regression coefficients). Accumulating evidence suggests that lipid-induced insulin resistance is at least partially mediated by diacylglycerols that activate novel protein kinase C isoforms with subsequent inhibition of insulin action in liver and skeletal muscle [29,30]. Interestingly, DAG species, specifically DAG 16:0/22:5 and DAG 16:0/22:6, were also associated with increased blood pressure and the liability of incident hypertension, an integral part of the metabolic syndrome [31]. Hypertriglyceridemia is an independent risk factor of T2D. In support of this, the present study demonstrates a close association between TAG species and insulin sensitivity indices, particularly HOMA-IR and ISI. We observed a signature in which TAGs with lower carbon number and double bond content were associated with higher positive correlation to HOMA-IR and negative association to ISI. A similar lipid pattern of TAG and HOMA-IR association was also reported previously [32]. Rhee et al. [32] identified 6 TAG species that were associated with increased risk of diabetes and 3 TAGs that were related to decreased risk of T2D. In the present study, TAGs were among the most frequently selected lipid species after LASSO procedure, especially for HOMA-IR (five positive regression coefficients-TAG 54:2, TAG 53:5, TAG 46:0, TAG 46:1, TAG 46:2) and for ISI (four negative regression coefficients-TAG 58:6, TAG 55:6, TAG 46:0, TAG 52:7) thus emphasizing the pathophysiological importance of specific TAGs for insulin resistance. In conclusion, TAG and DAG plasma profiling provides a more differentiated view on changes in lipid homeostasis of individuals with glucose intolerance and, moreover, may improve diabetes risk prediction [33,34].
Insulin resistance is characterized by specific changes in the fatty acid profile in plasma and skeletal muscle membranes including increased palmitic (FA 16:0) and decreased linoleic (FA 18:2 n-6) acid [35,36]. One mechanism for the shifted fatty acid pattern may include activation of Δ6 and Δ9 desaturase by insulin [37]. Typically, insulin resistance is associated with an increase of D6D and D9D activity as well as a decrease in D5D activity [36]. Similar results were obtained in the present study. Consequently, D6D and D9D were significantly positively related to HOMA-IR, while D5D was negatively associated to HOMA-IR (Table 4).
SMs are structural components of tissue membranes, important cellular messengers, and are abundant in nerve cells. In the present study, a substantial part of SM molecular species showed a strong negative relation to HOMA-IR and a significant positive association to ISI and DI suggesting a positive role for SMs in insulin sensitivity. Accordingly, SM 40:3:1 had the highest positive regression coefficient for ISI after LASSO selection. Our findings confirm previous investigations showing that that several SM species were downregulated in diabetes [33]. In contrast, Hanamatsu et al. [38] reported a positive correlation of several SM species in serum with HOMA-IR in obese individuals.
Ether phosphatidylcholines (PC O-) are suggested to have antioxidant and cardioprotective properties. Downregulation of PC O-s has been reported in hypertension [8] and Crohn's disease [39]. In addition, higher levels of PC O-s were associated with familial longevity [40]. Only sparse previous studies document an involvement of ether lipids in the pathogenesis of T2D. Pietilainen et al. [41] found that PC O-s were associated with better insulin sensitivity. In a targeted lipidomics study of a prediabetic pig model Renner et al. report a decrease of several circulating PC O-species with time [42]. In support of this, our study documents the highest positive regression coefficient of PC O-32:0 for GSI. However, PC O-s did not play a significant positive or negative role for the other insulin sensitivity indices HOMA-IR, ISI, and DI.
In summary, using sensitive shotgun lipidomics, we demonstrate in the present study varying relationships of plasma lipid species to four distinct insulin sensitivity indices. After LASSO selection the plasma lipidome explained up to 53% variability of the sensitivity indices. Close association of insulin sensitivity indices with a large number of molecular lipid species reflects the importance of changes in lipid homeostasis in the pathogenesis of T2D.