First-trimester proteomic profiling identifies novel predictors of gestational diabetes mellitus

Background: Gestational diabetes mellitus (GDM) is a common pregnancy complication associated with adverse outcomes including preeclampsia, caesarean section, macrosomia, neonatal morbidity and future development of type 2 diabetes in both mother and child. Current selective screening strategies rely on clinical risk factors such as age, family history of diabetes, and macrosomia or GDM in a previous pregnancy, and they have relatively low specificity. Here we hypothesize that novel first trimester protein predictors of GDM can contribute to the current selective screening strategies for early and accurate prediction of GDM, thus allowing for timely interventions.

Methods: A proteomics discovery approach was applied to first trimester sera from obese (BMI ≥27 kg/m2) women (n = 60) in a nested case-control study design, utilizing tandem mass tag labelling and tandem mass spectrometry. A subset of the identified protein markers was further validated in a second set of serum samples (n = 210) and evaluated for their contribution as predictors of GDM in relation to the maternal risk factors, by use of logistic regression and receiver operating characteristic analysis.

Results: Serum proteomic profiling identified 25 proteins with significantly different levels between cases and controls. Three proteins (afamin, serum amyloid P-component and vitronectin) could be further confirmed as predictors of GDM in a validation set. Vitronectin was shown to contribute significantly to the predictive power of the maternal risk factors, indicating that it is a novel independent predictor of GDM.

Conclusions: Current selective screening strategies can potentially be improved by addition of protein predictors.



Introduction
Gestational diabetes mellitus (GDM), defined as glucose intolerance diagnosed during pregnancy, is a common complication of pregnancy associated with preeclampsia, caesarean section, macrosomia, neonatal morbidity and future development of type 2 diabetes (T2D) in both mother and child [1][2][3][4]. The prevalence of GDM depends on the population and diagnostic criteria used. In Denmark the prevalence is close to 2% by the current diagnostic criteria [5,6]; however, the implementation of the World Health Organization 2013 (WHO13) guideline is expected to increase the prevalence of GDM in Denmark substantially, posing a major challenge to the healthcare system and economy [7][8][9]. Obesity is a major risk factor for GDM, and obese pregnant women have an up to eight times higher risk of developing GDM compared with normal-weight pregnant women [10]. Thus, the increasing prevalence of overweight and obesity worldwide is a major driver of the future incidence of GDM. In addition to obesity, a number of well-described risk factors based on maternal characteristics and history are associated with GDM, including diabetes in first degree relatives, ethnicity, previous GDM pregnancy and previous macrosomia [11]. Different combinations of the risk factors have been used to develop various prediction models, some of which have been implemented for routine selective GDM screening [12].
The quest for effective screening tools for early and accurate prediction of GDM has intensified within the last decade, as such tools would open a window for preventive treatment in the form of lifestyle interventions. To this end, a number of biochemical markers have been investigated for their potential use as biomarkers of GDM, as recently reviewed by Powe [13]; however, so far none of them has proven powerful enough for clinical use in terms of diagnostic sensitivity and specificity. The majority of the studies have centred on individual protein markers of insulin resistance and inflammation such as adiponectin, sex hormone binding globulin (SHBG) and C-reactive protein [14][15][16][17][18][19][20][21][22], with a few studies also investigating the potential of combining biochemical markers with maternal risk factors [17,23,24].
In a recent study, we used targeted mass spectrometry [25] to assess the GDM predictive power of a panel of proteins that had previously been suggested as markers of either GDM or T2D. In that study, the multimarker models only marginally improved on the performance of adiponectin alone, indicating a need for identification of new, superior markers.
In the present nested case-control study we report the findings of an elaborate proteomics discovery study. A total of 548 proteins were quantitated by shotgun proteomics in first trimester serum samples for identification of potential protein biomarkers that are completely novel in relation to GDM. A selection of potential markers was further measured in a second set of samples and evaluated for their individual contributions as predictors of GDM when combined with maternal risk factors. This procedure revealed vitronectin as a novel independent predictor of GDM.

Samples and clinical data
The samples used in this nested case-control study were procured from a biobank of ~20,000 first trimester serum samples collected from the routine screening for Down syndrome at Odense University Hospital (2008-2012), as described in our previous work [25]. All 270 samples included in the current study were taken between gestational week 8 + 3 days and week 13 + 6 days. Inclusion criteria for patients were: singleton pregnancy, body mass index (BMI) ≥27 kg/m2 and HbA1c values of <6.5% (48 mmol/mol) at the time of GDM diagnosis. The BMI was recorded at the first ultrasound screening, which took place between gestational week 11 + 0 days and week 14 + 0 days, and the BMI ≥27 kg/m2 cut-off was chosen in accordance with the Danish GDM screening guidelines [11]. The 135 GDM cases were matched to 135 controls based on the year of sampling and BMI. In addition to the already collected clinical data [25], information on the maternal risk factors (previous GDM, family history of diabetes and previous birth of a child with macrosomia) was manually retrieved from patient medical records. All aspects of the study were approved by the local ethics committee (S-20130092) and an exemption from obtaining written informed consent was granted.
Additional access to clinical data requires the approval of a data processing agreement between the data controller (MO) and an external data processor. Upon request to Odense University Hospital, Office of Legal Affairs, ouh.pd@rsyd.dk, such agreement will be processed by MO and the core research team for further application and approval by the Danish Data Protection Agency.

Discovery proteomics; sample preparation and nanoLC-MS/MS analysis
A proteomics discovery workflow was applied to 60 serum samples (30 GDM cases and 30 controls, selected at random) and a reference pool of first-trimester serum from >10 individuals. Of each sample, 15 μl serum was depleted of the 14 most abundant proteins using an Agilent Human 14 Multiple Affinity Removal Spin Cartridge (Agilent Technologies, Santa Clara, CA, USA) according to the manufacturer's manual. Of the reference pool, 240 μl serum was depleted. The depleted serum samples were transferred to 100 mM triethylammonium bicarbonate (TEAB) buffer by spin filtration on an Amicon Ultra-4 Centrifugal Filter Device (Merck Millipore Ltd., Cork, Ireland), dried and reconstituted in a fixed volume (30 μl) of 100 mM TEAB prior to denaturation, reduction, alkylation and trypsinization using a protocol modified from Overgaard et al [26] (for a detailed description see S1 Supporting Methods). Peptides were purified using Oasis HLB 10 mg cartridges (Waters, Milford, MA, USA), dried and reconstituted in 40 μl 100 mM TEAB. The peptide concentration of each sample was measured on a NanoDrop (Thermo Scientific) and 50 μg was labelled with tandem mass tag (TMT) 10-plex (Thermo Scientific, Waltham, MA, USA) according to the manufacturer's manual. The amine-reactive TMT 10-plex targets all peptides. It consists of 10 separate isobaric mass tags, which upon fragmentation in MS/MS give rise to reporter ions of 10 different masses. Relative quantitation is achieved based on the ion intensity of each of the reporter ions normalized to that of a common reference pool. For each TMT 10-plex experiment, 2 replicates of the reference pool, 4 samples from GDM cases and 4 samples from control subjects were labelled. The experiment was repeated 8 times to accommodate all 60 samples.
For each TMT 10-plex experiment all samples were pooled in 1:1 ratios and 50 μg was purified on custom-made Poros R2/R3 (Thermo Scientific) micro columns, dried, reconstituted and subjected to hydrophilic interaction liquid chromatography (HILIC) fractionation (13-17 fractions) on a TSK Amide-80 3 μm column (Tosoh Bioscience, Stuttgart, Germany) using an Agilent 1200 series high performance liquid chromatography (HPLC) system (Agilent, Santa Clara, CA, USA) [27,28]. The lyophilized fractions were reconstituted in 0.1% formic acid (FA) and analysed by liquid chromatography tandem mass spectrometry (LC-MS/MS) on an Easy 1000 nano-flow LC/Orbitrap Q Exactive HF system (Thermo Scientific) using a custom-made 2 column setup (Reprosil-Pur 120 C18-AQ, Dr. Maisch, Ammerbuch-Entringen, Germany), a two-hour gradient and a top 20 shotgun proteomics setup. Raw data were exported to Proteome Discoverer 2.1 (Thermo Scientific) and searched against the Swiss-Prot human database using Mascot (1% false discovery rate (FDR) and 5 ppm). For relative quantitation, samples were normalized to the average of the 2 replicates of the reference pool included in each TMT 10-plex experiment.
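The reference-pool normalization described above can be illustrated with a minimal Python sketch. The channel labels are the standard TMT 10-plex reporter names, but which two channels carried the reference pool is an assumption made here (the text does not say), and the intensities are invented for illustration. It is not the Proteome Discoverer implementation.

```python
from statistics import mean

# Assumption: the two reference-pool replicates occupy channels 126 and 131.
REFERENCE_CHANNELS = ("126", "131")


def normalize_to_reference(intensities, reference_channels=REFERENCE_CHANNELS):
    """Return sample/reference ratios for one protein in one TMT 10-plex run.

    `intensities` maps a channel label to its reporter-ion intensity.
    Each sample channel is divided by the mean of the reference channels.
    """
    reference = mean(intensities[ch] for ch in reference_channels)
    if reference <= 0:
        raise ValueError("reference pool intensity must be positive")
    return {
        ch: value / reference
        for ch, value in intensities.items()
        if ch not in reference_channels
    }


# Invented intensities for a single protein (only two sample channels shown).
example = {"126": 1000.0, "131": 1200.0, "127N": 550.0, "127C": 2200.0}
ratios = normalize_to_reference(example)
print(ratios)  # {'127N': 0.5, '127C': 2.0}
```

Because every 10-plex experiment contains the same reference pool, ratios computed this way are expressed on a common scale across the 8 separate runs.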

Targeted proteomics; sample preparation and MRM-MS analysis
For validation by targeted proteomics, the remaining 210 samples (105 cases and 105 controls) were subjected to multiple reaction monitoring mass spectrometry (MRM-MS) analysis. Here, only pre-specified peptides were subjected to relative quantitation, by normalizing the MS peak area of selected precursor/product ion pairs to that of the corresponding spiked-in heavy isotope labelled peptides. In short, 15 μl of diluted serum (1:20 in 50 mM ammonium bicarbonate) was denatured, reduced, alkylated and trypsinized essentially as described by Overgaard et al [26] (for a detailed description see S1 Supporting Methods). Individually adjusted amounts of 9 heavy isotope labelled standard peptides, SpikeTides_L (JPT Peptide Technologies, Berlin, Germany), were added to each sample to approximate a 1:1 ratio to the endogenous light peptides. Peptide purification was done using Oasis HLB 10 mg cartridges (Waters, Milford, MA, USA). Samples were dried, reconstituted in 0.1% FA and run on an Easy-nLC II nano LC system equipped with a 2 column setup (C18, 2 cm, i.d. 100 μm and C18, 10 cm, i.d. 75 μm [Thermo Scientific]). Peptides were eluted with a four-step 60 min gradient of 0.1% FA in acetonitrile at a flow rate of 300 nl/min and analysed on a TSQ Vantage triple quadrupole mass spectrometer, equipped with a Nanospray Flex ion source (Thermo Scientific), in selected reaction monitoring mode. Coefficient of variation (CV) calculations (intra- and inter-assay) were done by including a serum pool in triplicate in the analysis described above. Each triplicate was analysed twice on the nanoLC-MS/MS system. Dilution curves (S1 Fig) for each peptide were made in triplicate by adding different concentrations of the heavy isotope labelled peptide to the same pool of serum, which was then processed as described above. The lower limit of quantification for each peptide was derived from the calibration standard curves (S1 Fig and S1 Table).
MRM-MS data are represented as the ratio of endogenous light peptide to heavy isotope labelled peptide (S2 Table).
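As a minimal illustration of the two quantities used in this section, the following Python sketch computes a light-to-heavy peak-area ratio and a coefficient of variation over replicate ratios. The peak areas are invented, and the use of the sample standard deviation is an assumption; the text does not specify how the CV was calculated in Pinpoint or SPSS.

```python
from statistics import mean, stdev


def light_to_heavy_ratio(light_area, heavy_area):
    """Ratio of endogenous (light) to spiked-in heavy peptide peak area."""
    if heavy_area <= 0:
        raise ValueError("heavy peptide peak area must be positive")
    return light_area / heavy_area


def coefficient_of_variation(values):
    """CV in percent, using the sample standard deviation (an assumption)."""
    return 100.0 * stdev(values) / mean(values)


# Invented (light, heavy) peak areas for one peptide measured in triplicate.
replicates = [(9500.0, 10000.0), (10000.0, 10000.0), (10500.0, 10000.0)]
ratios = [light_to_heavy_ratio(light, heavy) for light, heavy in replicates]
cv = coefficient_of_variation(ratios)
print(ratios, round(cv, 2))  # CV is about 5 % for these invented values
```

Spiking the heavy peptide close to a 1:1 ratio, as described above, keeps both peak areas in a similar intensity range.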

Data analysis and statistics
Orbitrap Q Exactive HF and TSQ Vantage raw files were processed using Proteome Discoverer 2.1 (Thermo Scientific) and Pinpoint 1.3 (Thermo Scientific), respectively. Proteomics data were sorted in Excel 2010 and transferred to an SPSS 21.0 (IBM) database, along with the clinical data, for statistical analysis. The statistical tests used were: Mann-Whitney U test, Fisher's exact test, ROC analysis and binomial logistic regression. A significance level of p <0.05 was applied to all statistical tests in this study.
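Two of the methods listed above are closely related: for a single marker, the area under the ROC curve equals the Mann-Whitney U statistic divided by the number of case-control pairs. The Python sketch below shows this relationship on invented values. It does not compute a p-value and is not the SPSS procedure used in the study.

```python
def mann_whitney_u(cases, controls):
    """U statistic for the case group.

    Counts (case, control) pairs in which the case value is higher;
    tied pairs contribute 0.5.
    """
    u = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def auc_from_u(u, n_cases, n_controls):
    """Area under the ROC curve: the fraction of pairs ranked correctly."""
    return u / (n_cases * n_controls)


# Invented relative protein levels.
cases = [3.1, 2.8, 3.5]
controls = [2.0, 2.9, 2.5]
u = mann_whitney_u(cases, controls)
auc = auc_from_u(u, len(cases), len(controls))
print(u, round(auc, 3))  # 8.0 0.889
```

An AUC of 0.5 corresponds to no discrimination between the groups. A value below 0.5 means the marker tends to be lower in cases than in controls.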

Results
The experimental setup is illustrated in Fig 1 and consisted of a proteomics discovery part for large scale identification of novel potential protein biomarkers of GDM (n = 60) and a targeted proteomics part for validation of selected candidate markers from the discovery part (n = 210).

Clinical data
In addition to the clinical data previously described [25], the maternal risk factors (previous GDM, previous birth of a child with macrosomia and family history of diabetes) were included in this study to assess the individual predictive potential of the protein biomarkers when combined with these predictors (Table 1). In both the discovery and validation set, women with GDM gave birth significantly earlier than the control women. This is explained by the routine use of labour induction for women with GDM two weeks before term, and it translates into the significantly lower birth weight and length and, for the discovery set, also abdominal circumference. For the validation set, women with GDM were older and had a higher rate of caesarean sections. In both the discovery set and the validation set, there were significantly more women with previous GDM (p = 0.025 and p = 8.0×10⁻⁶) and a family history of diabetes (p = 0.012 and p = 1.3×10⁻¹¹) in the case group as compared to the control group, whereas no such differences were seen for previous macrosomia. As previously reported [25], maternal age was also significantly different between cases and controls for the validation set.

Identification and validation of novel potential protein biomarkers of GDM
The proteomics discovery approach, utilizing TMT labelling, was applied to individual first trimester serum samples from 30 GDM cases and 30 control patients. Initially 1015 proteins were identified (1% FDR, 5 ppm), of which 548 proteins had a Mascot score above 22, two or more unique peptides and were present in more than 73% of the samples. The relative serum levels of these 548 proteins were subjected to Mann-Whitney U testing and receiver operating characteristic (ROC) analysis, revealing a significant difference between GDM cases and controls for 25 proteins (Table 2). None of the significant markers remained so after FDR correction by the Benjamini-Hochberg method for 548 variables. The best performing single protein was secreted phosphoprotein 24, with an uncorrected p-value of 0.0003 and an area under the curve (AUC) of 0.770.
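To illustrate why an uncorrected p-value of 0.0003 may not survive correction across 548 tests, the Python sketch below implements the Benjamini-Hochberg step-up adjustment. The list of p-values is hypothetical: only the smallest value (0.0003) and the number of tests (548) come from the text, while the remaining values are spread evenly to mimic tests with no signal.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    The i-th smallest p-value is scaled by m / i, and a running minimum
    taken from the largest rank downwards keeps the result monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values: the reported top hit plus 547 evenly spread values.
hypothetical = [0.0003] + [i / 548 for i in range(2, 549)]
adjusted = benjamini_hochberg(hypothetical)
print(round(adjusted[0], 4))  # 0.1644
```

Under these assumed values the top hit has an adjusted p-value of 0.0003 × 548 = 0.1644, well above 0.05. The actual adjusted values depend on the full set of 548 p-values, which is not given in the text.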
In order to explore the putative predictive potential of the protein markers identified by the proteomics discovery approach, an MRM-MS assay was developed, ultimately capable of measuring the relative serum levels of 6 of the original 25 proteins (see S1 Supporting Methods, S1 and S2 Tables, S1 and S2 Figs). The 6-plex assay was applied to the 210 remaining validation serum samples, and here the levels of three proteins (afamin, serum amyloid P-component (SAMP) and vitronectin) could be confirmed as significantly different between GDM cases and controls (Table 3).

Logistic regression of protein biomarkers and clinical risk factors
To further evaluate the biomarker potential of afamin, SAMP and vitronectin as compared to maternal age and the maternal risk factors currently used in routine GDM screening, a number of binomial logistic regression analyses were performed using the data from the validation set (n = 210). Previous macrosomia was excluded from this analysis as it showed no significant difference in the univariate analysis. BMI was also excluded, as cases and controls were matched on this parameter. The logistic regression analyses were evaluated by ROC analysis and are listed in Table 4 and S3 Table (extended). When family history of diabetes, previous GDM and maternal age were combined in multivariate models, the predictor previous GDM no longer contributed significantly and was therefore omitted from the remaining analyses (S3 Table). Consequently, each of the three proteins was assessed for its individual contribution as a predictor of GDM by combining it with maternal age (model 1), family history of diabetes (model 2) or both (model 3) (Tables 3 and 4). When combined with maternal age alone, all three proteins contributed significantly to the model. In addition, vitronectin remained a significant contributor when combined with family history of diabetes alone or together with maternal age. Taken together, this indicates that vitronectin is an independent predictor of GDM despite the very modest increment in model AUC (0.806 versus 0.798). Finally, we performed a logistic regression analysis, initially

Discussion
Here we have used proteomic profiling of first-trimester sera from obese women with or without GDM to obtain a catalogue of novel biomarker candidates. Of the 548 proteins originally quantified in the discovery study, 25 were identified as significantly different between cases and controls. Six of these were further measured in the validation set, where afamin, SAMP and vitronectin could be confirmed as potential predictors of GDM, with only vitronectin adding significantly to the maternal risk factors already used for routine screening. Afamin has recently been associated with GDM by Tramontana et al [29,30], whereas vitronectin, to our knowledge, is new in this respect. Higher blood levels of vitronectin have been associated with the risk of metabolic syndrome and T2D [31], making it a plausible candidate biomarker of GDM. Increased levels of vitronectin have previously been identified as a risk factor for preeclampsia [31,32], and preeclampsia could thus be considered a potential confounder in our study. However, the number of patients with preeclampsia was not significantly different between our case and control groups. Moreover, when the individuals with preeclampsia were removed from the data set (n = 12 + n = 5), vitronectin remained a significant contributor in all of the models 1, 2 and 3, with p = 0.002, p = 0.040 and p = 0.046, respectively. As for the credibility of the discovery study, it is noteworthy that adiponectin and SHBG were among the 25 proteins showing initially significantly different levels by the Mann-Whitney U test (Table 2). Both proteins have been intensively investigated as biomarkers of GDM [15,20,21,33], and in our previous study we found adiponectin and SHBG to be significantly different between GDM cases and controls in the same cohort using different assay methods [25].
Furthermore, the confirmation of afamin, SAMP and vitronectin as novel predictors of GDM, by validation with a different method in a second set of samples, also supports the validity of the discovery data as a source of potential GDM markers, despite the lack of significant FDR-corrected p-values. By contrast, the fact that three proteins could not be confirmed as markers of GDM in our validation set further emphasizes the requirement for subsequent verification in omics-type studies. Due to the sensitivity limit of targeted MS, we were unable to verify the top candidate, secreted phosphoprotein 24 (SPP24), using the multiplex validation assay. Further studies are underway to address the potential of this candidate in GDM prediction.
To our knowledge, the work presented in this study is the most comprehensive proteomics-based study in GDM so far. In our biomarker workflow we have performed in-depth serum proteomic profiling by depletion of the 14 most abundant serum proteins, TMT labelling, HILIC fractionation and Orbitrap-based LC-MS/MS. By using a fairly large number of individual samples, as opposed to pooled samples [34] or a very small (n<10) sample size [35], the discovery study captured the inter-individual proteome differences and thereby enabled the identification of protein biomarkers associated with small but consistent significant differences between cases and controls. Likewise, the inclusion of maternal risk factors allowed us to better evaluate the actual contribution of the identified protein predictors in comparison to the GDM screening strategy already clinically employed. In our previous study [25] we showed that maternal age could aid in discriminating GDM cases from controls and suggested that it should be included in the Danish screening strategy. This suggestion is made further relevant by the present study, where the maternal risk factors have been included and maternal age has been shown to be an independent discriminator, adding to the predictive power of the maternal risk factors separately or combined.
As for the limitations of our study, a well-known weakness of the nested case-control study design is the potential inflation of biomarker performance as compared to population-based studies. This, however, is a compromise that must be made to accommodate the rather laborious analytical protocols. In our proteomics workflow, the use of immuno-affinity depletion of the most abundant serum proteins is a known source of experimental bias. While the method allows for a more in-depth proteomic profile (more proteins identified), it will also inadvertently remove some untargeted low-abundance proteins due to unspecific co-depletion.
While evidence-based strategies for GDM screening, diagnosis and treatment continue to be highly debated [36,37], a number of clinical applications for early pregnancy blood-based biomarkers are now emerging. Firstly, universal screening based on a fasting or non-fasting oral glucose challenge test is cumbersome for both patients and healthcare providers, and it is performed late in pregnancy. Thus, early blood-based screening in combination with risk factors could pave the way for a much better alternative, not only for GDM risk stratification but also for stratification of GDM-related outcomes such as macrosomia (large for gestational age), preterm birth, caesarean section, hypertensive disorders and shoulder dystocia. Secondly, in Denmark, where a selective screening model based on maternal characteristics and history is recommended, as many as 40% of pregnant women are referred to an oral glucose tolerance test (OGTT), implying the need for a more specific screening model. To this end, addition of one or more biomarkers seems attractive and adaptable to a clinical setting.
In conclusion, this study has provided a comprehensive overview of potential serum protein biomarkers for early GDM prediction and identified vitronectin as a novel independent predictor. Evidently, multivariate models using maternal risk factors in combination with biochemical predictors may improve the discriminative power compared to risk factors alone. However, the GDM field still awaits the magic bullets to appear from biomarker research; a quest that is long but exciting and driven by novel technological developments.