The blood metabolome of incident kidney cancer: A case–control study nested within the MetKid consortium

Background
Excess bodyweight and related metabolic perturbations have been implicated in kidney cancer aetiology, but the specific molecular mechanisms underlying these relationships are poorly understood. In this study, we sought to identify circulating metabolites that predispose to kidney cancer and to evaluate the extent to which they are influenced by body mass index (BMI).

Methods and findings
We assessed the association between circulating levels of 1,416 metabolites and incident kidney cancer using pre-diagnostic blood samples from up to 1,305 kidney cancer case–control pairs from 5 prospective cohort studies. Cases were diagnosed on average 8 years after blood collection. We found 25 metabolites robustly associated with kidney cancer risk. In particular, 14 glycerophospholipids (GPLs) were inversely associated with risk, including 8 phosphatidylcholines (PCs) and 2 plasmalogens. The PC with the strongest association was PC ae C34:3, with an odds ratio (OR) per 1 standard deviation (SD) increment of 0.75 (95% confidence interval [CI]: 0.68 to 0.83, p = 2.6 × 10⁻⁸). In contrast, 4 amino acids, including glutamate (OR for 1 SD = 1.39, 95% CI: 1.20 to 1.60, p = 1.6 × 10⁻⁵), were positively associated with risk. Adjusting for BMI partly attenuated the risk association for some, but not all, metabolites, whereas other known risk factors for kidney cancer, such as smoking and alcohol consumption, had minimal impact on the observed associations. A mendelian randomisation (MR) analysis of the influence of BMI on the blood metabolome highlighted that some metabolites associated with kidney cancer risk are influenced by BMI. Specifically, elevated BMI appeared to decrease levels of several GPLs that were also found to be inversely associated with kidney cancer risk (e.g., −0.17 SD change [βBMI] in 1-(1-enyl-palmitoyl)-2-linoleoyl-GPC (P-16:0/18:2) levels per SD change in BMI, p = 3.4 × 10⁻⁵). BMI was also associated with increased levels of glutamate (βBMI: 0.12, p = 1.5 × 10⁻³). While our results were robust across the participating studies, they were limited to study participants of European descent, and it will therefore be important to evaluate whether our findings can be generalised to populations with different genetic backgrounds.

Conclusions
This study suggests a potentially important role of the blood metabolome in kidney cancer aetiology by highlighting a wide range of metabolites associated with the risk of developing kidney cancer and the extent to which changes in levels of these metabolites are driven by BMI, the principal modifiable risk factor for kidney cancer.

underlying the Mendelian randomization analyses are available at the University of Bristol data repository, data.bris, at https://doi.org/10.5523/bris.33bq35s9lbos026r1xukxijoqu. Individual level data and GWAS results from EPIC-Norfolk can be requested by bona fide researchers for specified scientific purposes via the study website (https://www.mrc-epid.cam.ac.uk/research/studies/epicnorfolk/). Data will either be shared through an institutional data sharing agreement or arrangements will be made for analyses to be conducted remotely without the necessity for data transfer. For information about accessing individual level data and GWAS results from the INTERVAL BioResource, please contact helpdesk@intervalstudy.org.uk. For information about accessing individual level data from the Fenland Study, please contact datasharing@mrcepid.cam.ac.uk. The Biocrates GWAS results from the Fenland Study and the z-score-based meta-analysis are available at: https://omicscience.org/apps/crossplatform/. Metabolon GWAS results will be made available via www.omicscience.org and can until then be requested by contacting omicscience.org@gmail.com.

Introduction
Kidney cancer is the 14th most common cancer worldwide, with renal cell carcinoma (RCC) making up the majority of cases [1]. There are important geographical variations in kidney cancer incidence that are only partly understood [2]. Excess bodyweight and related conditions, such as hypertension, diabetes, and related metabolic perturbations, are among the most robustly implicated risk factors for kidney cancer, with support from both traditional observational studies and genetic studies [2][3][4][5][6][7]. For instance, in the United Kingdom, an estimated 24% of kidney cancer cases are attributable to overweight and obesity, making this the leading modifiable risk factor for the disease [8]. Germline mutations responsible for an inherited predisposition to kidney cancer (a small proportion of kidney cancer cases) have a key role in regulating cellular metabolism [9], and this, together with evidence of extensive metabolic reprogramming within tumours themselves [10], has led to the characterisation of kidney cancer as a metabolic disease. However, the molecular mechanisms predisposing to kidney cancer remain largely unknown. Given the likely metabolic underpinnings of kidney cancer, studies of circulating metabolites, the downstream products of cellular regulatory processes, may improve our understanding of pathways relevant to kidney cancer aetiology [11].
Metabolite variations are the result of genetic and nongenetic factors and provide a readout of physiological functions [12]. Metabolomics technologies based on mass spectrometry (MS) and nuclear magnetic resonance (NMR) have enabled the systematic quantification of hundreds of metabolites (the "metabolome") from a single biological sample. The analysis of metabolites has enabled a more thorough exploration of an individual's metabolic status, providing important insights into the biological pathways leading to diseases such as cancer [11,13,14], and has enabled the discovery and development of new drug targets [15]. Already, global metabolic profiling of blood [16][17][18][19], urine [20][21][22][23][24], and tissue samples [24][25][26][27] has been used to characterise kidney cancer and identify novel potential diagnostic biomarkers. However, because of the cross-sectional or retrospective design of these studies, they could not inform the identification of biomarkers for incident disease development. Prospective cohort studies, where healthy individuals initially donate blood at recruitment and are longitudinally followed over time for incident disease, can circumvent many of the problems of retrospective study designs, particularly where the focus is on identifying risk factors for disease onset.
The aim of this study was to identify circulating metabolites associated with the development of kidney cancer in a prospective case-control framework. We used 2 complementary metabolomics platforms [28] to quantify over 1,000 metabolites in blood samples donated by research participants later diagnosed with kidney cancer along with matched control participants. In a series of follow-up analyses, including a 2-sample mendelian randomisation (MR) analysis, which uses genetic variants as proxies for an exposure of interest [29], we evaluated the extent to which the metabolomic signature of disease risk could be explained by body mass index (BMI), the leading modifiable risk factor for kidney cancer.

Analytical strategy (Fig 1)
The primary analysis was predefined and involved investigating the association between circulating levels of metabolites and kidney cancer risk using pre-diagnostic metabolomics measurements in a case-control study nested within multiple large-scale prospective cohorts (the MetKid consortium). Adjustment for known risk factors for kidney cancer (BMI, hypertension, alcohol consumption, and smoking) [2] was then carried out to evaluate the extent to which these could explain the associations between blood metabolites and kidney cancer risk.
A natural complementary analysis would have been to interrogate the potentially causal role of the identified risk-associated metabolites in kidney cancer aetiology through MR analyses. However, given the methodological constraints of MR in this context, specifically widespread pleiotropy among the instruments, which would violate the MR assumptions, we chose not to pursue this analysis. Our analysis plan was therefore revised, and as a secondary analysis, we instead used a 2-sample MR approach to estimate the causal effect of BMI on the blood metabolome. This analysis complemented the main risk analysis by quantifying the extent to which BMI, the central risk factor for kidney cancer, influenced the identified risk metabolites. This study is reported as per the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) and STROBE-MR guidelines (S1 and S2 Tables) [30,31].

Table; details of the cohorts are described in the S1 Methods). Cases were defined as participants diagnosed with incident malignant neoplasm of the kidney or renal pelvis (International Classification of Diseases for Oncology, 3rd Edition [ICD-O-3] code C64/C65) who gave a blood sample at recruitment. In each independent cohort, one randomly selected control without history of kidney cancer was matched to each case based on age, sex, and date of blood collection. There were small variations between the cohorts in how tightly controls were matched to cases on age and date of blood draw (see S1 Methods), owing to inherent differences in demography and availability of controls. The study was approved by the International Agency for Research on Cancer (IARC) Ethics Committee.

Metabolite data acquisition and quality control
Plasma and serum samples from 2,614 participants (1,307 cases and 1,307 controls) were analysed. Samples from all cohorts were analysed using the Biocrates targeted MS assay. Samples from EPIC and NSHDS (n = 1,596) were additionally analysed using Metabolon's untargeted MS platform. Samples from matched case-control pairs were assayed in adjacent wells (in random order) and in the same analytical batch. Laboratory personnel were blinded to case-control status of the samples. An overview of the quality control (QC) pipeline is shown in S1 Fig. All the QC steps were performed for each cohort separately before pooling the data.
Targeted metabolomics: Biocrates. All samples from EPIC and MCCS were assayed at the IARC, while samples from NSHDS, HUNT, and the Estonian BB were assayed by the Metabolomics Core Facility of the Genome Analysis Center of the Helmholtz Zentrum München [32]. The targeted metabolomics approach was based on LC-ESI-MS/MS and FIA-ESI-MS/MS measurements using the AbsoluteIDQ p180 Kit (BIOCRATES Life Sciences, Innsbruck, Austria). The assay allows simultaneous quantification of 188 metabolites using 10 μL of plasma or serum. Sample preparation and MS measurements were performed as described in S1 Methods. The median intra- and inter-batch coefficients of variation (CV) were 5.6% and 6.9%, respectively (interquartile range = 1.7% and 2.8%, respectively). The lower limits of detection (LODs) were set to 3 times the values of the zero samples (PBS solution).
Values below the lower limit of quantification (LLOQ) or, for semiquantified compounds (acylcarnitines, glycerophospholipids [GPLs], and sphingolipids), below the batch-specific LOD were imputed with half of the LLOQ or LOD, and values above the upper limit of quantification (ULOQ) were imputed with the ULOQ. For NSHDS, metabolites with internal standard out of range were left as missing (n = 205). Metabolites with fewer than 100 values above LOD/LLOQ in any individual cohort were excluded from the analyses. In our samples, a total of 164 metabolites were retained for statistical analyses (30 acylcarnitines, 21 amino acids, 10 biogenic amines, 88 GPLs, 14 sphingolipids, and the sum of hexoses). In addition to individual metabolites, 22 ratios or sums selected for their capacity to provide detailed insight into a wide range of disorders of the metabolic disease spectrum were computed (listed in Table B in S3 Table). Among them, the Fischer ratio, a clinical indicator of liver metabolism and function, was calculated as the molar ratio of branched chain amino acids (leucine + isoleucine + valine) to aromatic amino acids (phenylalanine + tyrosine). Lower Fischer ratio values are associated with liver dysfunction.

Untargeted metabolomics: Metabolon. Untargeted metabolomic analyses were performed at Metabolon (Durham, North Carolina, United States of America) on a platform consisting of 4 independent ultra-high performance liquid chromatography-tandem mass spectrometry (UPLC-MS/MS) methods. Detailed descriptions of the platform and workflow to identify features, including extraction of raw data, peak identification, and internal QC processes, can be found in the S1 Methods and in published work [33][34][35]. Samples from EPIC and NSHDS were processed as 2 independent experimental batches. The median intra-batch CVs were 5% and 4% for EPIC and NSHDS, respectively, while the median inter-batch CV was 11% for both EPIC and NSHDS.
A variety of curation procedures were carried out by Metabolon to ensure that a high-quality data set was made available for statistical analysis and data interpretation (S1 Methods). Each metabolite was rescaled to set the median equal to 1, and missing values were imputed with the minimum observed value. Data returned for EPIC comprised a total of 1,308 metabolite features, 982 of known identity (named biochemicals) and 326 compounds of unknown structural identity (unnamed biochemicals). Data returned for NSHDS comprised a total of 1,302 metabolite features, 979 of known identity (named biochemicals) and 323 compounds of unknown structural identity (unnamed biochemicals). A total of 1,275 metabolites were available in both data sets, and the total number of unique metabolites reached 1,335. Metabolites were categorised by Metabolon as belonging to 1 of 8 mutually exclusive chemical classes: amino acids and amino acid derivatives (subsequently referred to as "amino acids"), carbohydrates, cofactors and vitamins, energy metabolites, lipids, nucleotides, peptides, or xenobiotics. An asterisk (*) at the end of the metabolite name indicates that the metabolite identity has not been confirmed by comparison with an authentic chemical standard.
After the exclusion of metabolites for which fewer than 100 participants had values recorded (86 and 176 for EPIC and NSHDS, respectively), 1,230 metabolite features remained for analysis (1,222 and 1,126 for EPIC and NSHDS, respectively; 1,118 in common).

Fig 1. Conceptual framework of the study design. This study includes 3 main analytical steps: (i) the investigation of the associations between circulating levels of metabolites and kidney cancer risk using pre-diagnostic measurements in a case-control study nested within multiple large-scale prospective cohorts; (ii) the assessment of the causal effect of BMI, the leading modifiable risk factor for kidney cancer, on circulating metabolite levels; and (iii) the evaluation of the overlap between the metabolic footprint of BMI and that of kidney cancer risk. The orange X's indicate the time at which a participant is diagnosed with kidney cancer, at which point their follow-up stops.
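The data-preparation rules described above can be sketched in a few lines of Python. This is an illustrative reading of the text and not the study's pipeline: the function names, the use of `None` for missing values, and the toy numbers are all assumptions made for the example.

```python
from statistics import median

def impute_out_of_range(values, lower, upper):
    """Biocrates-style rule (one reading of the text): values below the lower
    limit become half that limit, values above the upper limit become the
    upper limit. None marks a missing value and is left as missing."""
    out = []
    for v in values:
        if v is None:
            out.append(None)
        elif v < lower:
            out.append(lower / 2)
        elif v > upper:
            out.append(upper)
        else:
            out.append(v)
    return out

def fischer_ratio(leu, ile, val, phe, tyr):
    """Molar ratio of branched chain to aromatic amino acids."""
    return (leu + ile + val) / (phe + tyr)

def rescale_and_impute_min(values):
    """Metabolon-style rule: rescale so the median is 1, then fill missing
    values with the minimum observed (rescaled) value."""
    observed = [v for v in values if v is not None]
    med = median(observed)
    scaled = [None if v is None else v / med for v in values]
    floor = min(v for v in scaled if v is not None)
    return [floor if v is None else v for v in scaled]

def keep_metabolite(values, min_observed=100):
    """Keep a metabolite only if at least `min_observed` participants have a
    recorded value (the threshold of 100 is taken from the text)."""
    return sum(v is not None for v in values) >= min_observed
```

For example, `rescale_and_impute_min([2.0, 4.0, None, 8.0])` rescales by the observed median (4.0) and fills the gap with the smallest rescaled value. In practice the filter would be applied to the raw data before imputation, since imputation removes the missing values it counts.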

Statistical analysis
Primary statistical analysis: Prospective observational analysis of circulating metabolites and kidney cancer risk. Log-transformed and standardised (z-score) metabolite concentrations were used in all analyses. Crude conditional logistic regressions were performed to estimate the odds ratio (OR) for kidney cancer per 1 standard deviation (SD) increment in log-transformed metabolite concentrations, conditioning on the individual case-control sets. To account for multiple comparisons while allowing for the correlation between the different metabolites, we estimated the effective number of independent tests (ENT) as the number of principal components explaining more than 95% of the variance in our metabolite matrices. Metabolites with p-values equal to or below 0.05/ENT in the pooled analyses and equal to or below 0.05 in at least 2 cohorts independently were deemed robustly associated with kidney cancer risk. For these metabolites, we carried out additional conditional logistic regressions adjusted for BMI, smoking history (smoking status: never, former, current smokers, and pack years of smoking), lifetime alcohol consumption (in g/day), and hypertension (ever/never). To avoid comparing different sets of participants due to missingness in risk factor data, we restricted these analyses to study participants with complete risk factor information.
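As an illustration of the ENT idea, the sketch below counts the principal components of a metabolite correlation matrix needed to explain more than 95% of the variance and derives the corresponding p-value threshold. It assumes NumPy is available; the function name, the use of the correlation (rather than covariance) matrix, and the simulated data are assumptions for the example, not details confirmed by the study.

```python
import numpy as np

def effective_number_of_tests(X, variance_threshold=0.95):
    """X is a samples-by-metabolites array of (log-transformed) values.

    Returns the smallest number of principal components whose cumulative
    share of variance exceeds `variance_threshold`.
    """
    corr = np.corrcoef(X, rowvar=False)        # metabolite correlation matrix
    eigvals = np.linalg.eigvalsh(corr)[::-1]   # eigenvalues, largest first
    eigvals = np.clip(eigvals, 0.0, None)      # guard against tiny negatives
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # index of the first component at which the cumulative share exceeds the threshold
    return int(np.searchsorted(cumulative, variance_threshold, side="right") + 1)

# Toy data: 6 measured features that are really 3 independent signals, duplicated.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
X = np.hstack([base, base])

ent = effective_number_of_tests(X)
threshold = 0.05 / ent   # Bonferroni-style threshold on the effective number of tests
```

Here the six correlated columns collapse to three effective tests. By the same arithmetic, the thresholds reported later in the Methods (p < 8.3 × 10⁻⁴ and p < 1 × 10⁻⁴) would correspond to roughly 60 and 500 effective tests for Biocrates and Metabolon, respectively.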
To further characterise the epidemiological properties of the association between metabolites and kidney cancer risk, we also carried out conditional logistic regression stratified by age at blood collection, sex, country, BMI, waist-to-hip ratio, smoking status, alcohol consumption, hypertension, and time to diagnosis (number of years between blood draw and diagnosis).
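With 1:1 matching, the conditional likelihood has a simple closed form: each pair contributes 1 / (1 + exp(−β·d)), where d is the case-minus-control difference in the (standardised) metabolite level. The sketch below fits that model for a single metabolite by Newton-Raphson. It is a minimal illustration of the technique under these assumptions, not the study's code; adjusted models need a multivariable version, for which a statistical package would normally be used, and the function and variable names are hypothetical.

```python
import math

def conditional_logit_1to1(case_values, control_values, max_iter=50, tol=1e-10):
    """Crude conditional logistic regression for 1:1 matched pairs, one covariate.

    Returns the OR per unit increase in the covariate, a Wald 95% CI,
    and a two-sided Wald p-value.
    """
    d = [c - k for c, k in zip(case_values, control_values)]

    def score_and_info(beta):
        p = [1.0 / (1.0 + math.exp(-beta * di)) for di in d]
        score = sum(di * (1.0 - pi) for di, pi in zip(d, p))
        info = sum(di * di * pi * (1.0 - pi) for di, pi in zip(d, p))
        return score, info

    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_info(beta)
        step = score / info            # Newton-Raphson update
        beta += step
        if abs(step) < tol:
            break

    _, info = score_and_info(beta)     # information at the final estimate
    se = 1.0 / math.sqrt(info)
    z = beta / se
    return {
        "or": math.exp(beta),
        "ci95": (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)),
        "p": math.erfc(abs(z) / math.sqrt(2.0)),
    }

# Toy example: in 3 of 4 pairs the case has the higher value.
result = conditional_logit_1to1([1.0, 1.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0])
```

In this toy example the estimate is the familiar discordant-pair ratio (3 pairs to 1). The sketch does not handle degenerate inputs, such as pairs that are all concordant or data in which every case exceeds its control.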
Secondary statistical analysis: mendelian randomisation and profile comparison analyses. We initially investigated pleiotropy among potential SNP instruments for the circulating metabolites associated with kidney cancer risk in prospective analyses (Biocrates and Metabolon) with a view to conducting a 2-sample MR analysis for metabolites (as the exposure) and kidney cancer risk (as the outcome). SNP-metabolite associations were extracted from the largest genome-wide association studies (GWASs) currently available for circulating metabolites and included summary statistics for 174 Biocrates metabolites [36] (N ranged from 8,569 to 56,040 for different metabolites, depending on the platform used in each contributing study) and 913 Metabolon metabolites (N = 14,296). Specifically, pleiotropy was assessed by estimating the variance explained in all metabolites by the single nucleotide polymorphisms (SNPs) (i.e., the potential "instruments") associated with each of our candidate risk metabolites (see S1 Methods for more details of how instruments were selected). Where the variance explained in other metabolites (i.e., those not associated with risk in the prospective analysis) was similar to that explained in the candidate risk metabolite, we inferred low metabolite specificity for current GWAS results, and thus violation of the MR assumptions necessary to infer potential single exposure causality.
To evaluate the extent to which the metabolomic signature of disease risk could be explained by BMI, we first conducted a 2-sample MR analysis to provide estimates of the causal relationships between BMI and circulating metabolites (Biocrates and Metabolon). A total of 549 independent SNPs (R² < 0.01) that were robustly associated with BMI at genome-wide significance were selected as instruments from the largest GWAS meta-analysis for BMI from the Genetic Investigation of Anthropometric Traits (GIANT) consortium (n = approximately 700,000 [37]; see Table C in S3 Table). SNP-exposure associations were extracted from the BMI GWAS meta-analysis [37], and SNP-outcome associations were extracted from the metabolite GWAS described above. A BMI effect estimate was generated for each metabolite measured and calculated as an SD unit increase in log-transformed metabolite level per SD increment in BMI. The primary MR analysis was conducted using the inverse-variance weighted (IVW) method [38]. We performed the following sensitivity analyses to attempt to account for potential unbalanced horizontal pleiotropy: (1) MR-Egger regression to test overall directional pleiotropy and provide a valid causal estimate, taking into account the presence of pleiotropy [39]; and (2) weighted median [40], which provides a consistent estimate of causal effect if at least 50% of the information in the analysis comes from variants that are valid instrumental variables. To account for multiple testing, we used the same p-value threshold as used in our observational analyses (p < 8.3 × 10⁻⁴ and p < 1 × 10⁻⁴ for Biocrates and Metabolon, respectively).
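For readers unfamiliar with the IVW estimator, the sketch below shows its standard fixed-effect form from summary statistics: per-SNP ratio estimates are combined with weights proportional to the squared SNP-exposure effect divided by the variance of the SNP-outcome effect. The function name and toy numbers are assumptions for illustration; the study's actual software and any random-effects variant are not specified here, and the MR-Egger and weighted median sensitivity analyses are not covered.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate.

    beta_exposure: SNP effects on the exposure (here, BMI in SD units)
    beta_outcome:  SNP effects on the outcome (here, a metabolite in SD units)
    se_outcome:    standard errors of the SNP-outcome effects
    """
    numerator = sum(bx * by / (s * s)
                    for bx, by, s in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx * bx / (s * s)
                      for bx, s in zip(beta_exposure, se_outcome))
    beta = numerator / denominator
    se = math.sqrt(1.0 / denominator)
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))   # two-sided Wald p-value
    return beta, se, p

# Toy summary statistics (made up): every SNP implies the same effect of 0.5.
beta_ivw, se_ivw, p_ivw = ivw_estimate(
    beta_exposure=[0.1, 0.2, 0.3],
    beta_outcome=[0.05, 0.10, 0.15],
    se_outcome=[0.01, 0.01, 0.01],
)
```

The resulting estimate is read as the SD change in metabolite level per SD increment in BMI, and its p-value would then be compared with the platform-specific threshold given above.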
To examine the extent to which kidney cancer-associated metabolites are driven by BMI, we assessed the correlation between the kidney cancer-associated metabolite profile (metabolites associated with kidney cancer risk in the prospective observational analyses) and the BMI-associated metabolite profile (metabolites associated with BMI levels in the MR analyses) using Spearman rank correlation analyses. Effect estimates from both the prospective and MR analyses were divided by the standard error of the estimate before conducting the correlation analyses.
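The profile comparison can be sketched as follows: each effect estimate is divided by its standard error, and the two resulting vectors are compared by Spearman rank correlation (the Pearson correlation of the ranks). The sketch uses only the standard library. The function names and toy values are hypothetical, and it assumes that the prospective estimates are on the log OR scale.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def profile_correlation(est_a, se_a, est_b, se_b):
    """Spearman correlation between two sets of estimate/SE ratios,
    one pair of values per metabolite, in the same metabolite order."""
    z_a = [e / s for e, s in zip(est_a, se_a)]
    z_b = [e / s for e, s in zip(est_b, se_b)]
    return pearson(average_ranks(z_a), average_ranks(z_b))

# Toy example with 4 metabolites: log ORs from the prospective analysis
# and BMI effect estimates from the MR analysis (all values made up).
rho = profile_correlation(
    est_a=[0.3, -0.2, 0.1, -0.4], se_a=[0.1, 0.1, 0.1, 0.1],
    est_b=[0.12, -0.17, 0.02, -0.05], se_b=[0.04, 0.04, 0.04, 0.04],
)
```

Dividing by the standard error puts estimates from the two analyses on a comparable scale, and the rank-based correlation limits the influence of a few metabolites with extreme values.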
Negative control analyses. The presence or absence of overlap between metabolite profiles flagged by prospective analysis and those derived from BMI MR is only informative in the context of a null or negative control comparator. To allow this, we repeated the profile comparison analysis described above (with BMI as the exposure) using dental disease as a negative control exposure (i.e., an exposure not likely to be a risk factor for kidney cancer), one that we would therefore expect to deliver a null result. This strategy of repeating an experiment under conditions that are expected to deliver a null result has previously been advocated within observational epidemiology [41]. In our analysis of the causal relationship between dental disease and circulating metabolites, 47 independent (R² < 0.01) SNPs that were robustly associated at genome-wide significance (p < 5 × 10⁻⁸) were selected from the largest GWAS for dental disease (n = 487,823) (detailed information on the instrumental variables for dental disease is presented in Table D in S3 Table). SNP-exposure associations were extracted from the largest dental disease GWAS meta-analysis [42], and SNP-outcome associations were extracted from the metabolite GWAS described above. Effect estimates were calculated as SD unit increase in metabolite levels per logOR increase in dental disease. Methods used in the 2-sample MR analyses were as described above.

Population characteristics and metabolites overview
Demographic and baseline characteristics for the 1,305 cases and 1,305 matched controls are presented in Table 1. The mean age at diagnosis for cases was 65.6 years (SD = 9.79), and cases were diagnosed on average 8 years after blood collection. The majority (58%) of samples were collected after fewer than 6 hours of fasting. Overall, 186 metabolites or ratios/sums of metabolites were measured using the Biocrates assay on 2,610 samples (all cohorts), and 1,230 metabolites were measured using the Metabolon platform on 1,596 samples (EPIC and NSHDS cohorts). Mean concentrations of the 1,416 metabolites by case-control status are shown in Table E in S3 Table.

Prospective observational analysis of circulating metabolites and kidney cancer risk
We identified 25 metabolites robustly associated with kidney cancer risk (i.e., metabolites associated with risk after correction for multiple testing in the pooled analysis and nominally significant in at least 2 cohorts; Fig 2, Table 2). Among these metabolites, 12 were measured with the Biocrates assay, and 13 were measured with the Metabolon platform. Two metabolites, glutamate and 1-linoleoyl-GPC (18:2) (known as lysoPC a C18:2 in Biocrates), were measured on both platforms and yielded similar risk association estimates (for glutamate OR:

Among 274 metabolites involved in amino acid metabolism, we found 4 positively associated with kidney cancer risk, including glutamate, formiminoglutamate, hydantoin-5-propionate, and the Fischer ratio (p-values between 1.25 × 10⁻⁴ and 5.11 × 10⁻⁷). For example, the relative odds of kidney cancer associated with an SD increment in log-transformed glutamate levels was estimated at 1.39 (95% CI: 1.20 to 1.60) when measured on the Metabolon platform. Another amino acid, cysteine-glutathione disulphide, was inversely associated with risk (OR: 0.77, 95% CI: 0.69 to 0.86, p = 7.42 × 10⁻⁶). The 2 peptides gamma-glutamylvaline (p = 1.22 × 10⁻⁷) and gamma-glutamylisoleucine (p = 1.07 × 10⁻⁶) were positively associated with risk. Finally, we found beta-cryptoxanthin negatively associated with kidney cancer risk (OR: 0.73, 95% CI: 0.65 to 0.83, p = 4.83 × 10⁻⁷), while an unidentified metabolite (X-12096) was positively associated (OR: 1.33, 95% CI: 1.17 to 1.51, p = 9.97 × 10⁻⁶). Adjusting for the fasting status of the samples (more versus less than 6 hours) did not modify the OR estimates for the identified risk metabolites (Table F in S3 Table).

The influence of kidney cancer risk factors on kidney cancer-associated metabolites
We assessed the extent to which known modifiable risk factors could explain the observed associations by multivariable analyses. For the 25 metabolites found to be associated with risk in the primary analysis, adjustment for BMI partly attenuated the OR estimates for some metabolites, although all remained at least nominally significant (i.e., p-value below 0.05, Table 2). The association most modified by adjustment for BMI was that of glutamate (from 1.34, 95% CI: 1.17 to 1.53, p = 1.62 × 10⁻⁵ to 1.24, 95% CI: 1.08 to 1.42, p = 2.46 × 10⁻³), followed by PC ae C42:3 and PC aa C42:1 (OR increased by 6% for both metabolites: from 0.82, 95% CI: 0.74 to 0.92, p = 4.17 × 10⁻⁴ to 0.87, 95% CI: 0.78 to 0.98, p = 1.75 × 10⁻² and 0.83, 95% CI: 0.75 to 0.93, p = 6.27 × 10⁻⁴ to 0.88, 95% CI: 0.79 to 0.99, p = 2.59 × 10⁻² for PC ae C42:3 and PC aa C42:1, respectively). Conversely, the association for PC ae C38:6 was not influenced by adjustment for BMI (OR: 0.85, 95% CI: 0.77 to 0.93, p = 5.06 × 10⁻⁴ to 0.86, 95% CI: 0.78 to 0.95, p = 1.85 × 10⁻³). Results adjusted for all individual risk factors on participants with complete information on these risk factors are shown in Table H in S3 Table (N = 1,162 and 996 for Biocrates and Metabolon, respectively). Adjustment for smoking and alcohol consumption did not modify any OR by more than 1.5% and 1.2%, respectively, whereas adjusting for hypertension partly attenuated the associations of lysoPC a C18:1 and lysoPC a C18:2, albeit to a lesser extent than BMI (5% change for both). In fully adjusted models, risk associations remained nominally significant (p-value below 0.05) for 10 out of 25 metabolites with all effect estimates in the same direction as in the primary analysis, although, due to missing data for some risk factors, this analysis included only 581 and 498 case-control pairs for Biocrates and Metabolon, respectively.
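The percentage changes quoted above are consistent with a simple relative change in the OR between the crude and adjusted models. The short sketch below reproduces that arithmetic from the rounded ORs in the text. Reading the percentages this way is an assumption, and the original figures may have been computed from unrounded estimates.

```python
def percent_change(or_crude, or_adjusted):
    """Relative change in the OR after adjustment, in percent."""
    return 100.0 * (or_adjusted - or_crude) / or_crude

# Rounded ORs before and after adjustment for BMI, as reported in the text.
changes = {
    "glutamate": percent_change(1.34, 1.24),
    "PC ae C42:3": percent_change(0.82, 0.87),
    "PC aa C42:1": percent_change(0.83, 0.88),
    "PC ae C38:6": percent_change(0.85, 0.86),
}
```

On this reading, an inverse association moving towards the null shows up as a positive change (the OR rises towards 1), while a positive association moving towards the null shows up as a negative change.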

PLOS MEDICINE
measured by Biocrates, showed a stronger association when alcohol consumption was above the median compared to lower (heterogeneity p = 0.03) (Fig D in S3 Fig); this pattern was evident for the same metabolite measured in Metabolon but was not statistically significant (heterogeneity p = 0.3) (Fig P in S3 Fig).

Two-sample mendelian randomisation and profile comparison analyses
We identified genetic instruments for 17 of the 25 risk metabolites but observed substantial pleiotropy for the instruments defined for 16 of the 17 instrumented metabolites. The total variance explained by a risk metabolite's instruments was typically similar across classes of metabolite (lipids and 1-(1-enyl-palmitoyl)-2-linoleoyl-GPC (P-16:0/18:2), for example) and far from specific to the given risk metabolite being instrumented. Further, the variance explained was often higher for an alternative metabolite compared to the risk metabolite (see Figs A-Q in S4 Fig). Following these observations, we chose not to carry out a formal MR analysis of the relation between individual metabolites and kidney cancer risk because the profound pleiotropy across metabolites clearly violates the MR assumptions. Rather, to complement the risk analyses, and to gain further understanding of how BMI, the leading modifiable risk factor for kidney cancer, might explain our findings, we conducted a 2-sample MR analysis to evaluate the extent to which the measured metabolites are driven by differences in BMI. Using the IVW method, 60 metabolites (22 Biocrates and 38 Metabolon) were associated with BMI. In an MR framework, there was consistent evidence between both platforms that BMI was associated with decreased concentrations of many GPLs and increased concentrations of several amino acids and nucleotides, as well as acylcarnitines, sphingomyelins, and several metabolites of unknown identity (S5 Fig). Estimates from MR-Egger and weighted median analyses were consistent with the IVW estimates (Tables I and J in S3 Table).
When comparing the metabolic profile of kidney cancer (metabolites associated with kidney cancer risk in the prospective analyses) and BMI (metabolites associated with BMI levels in the MR analyses), we observed moderate correlation between the BMI-driven metabolite profile and the metabolite profile associated with kidney cancer risk (Fig 3) (r = 0.53, p = 2.2 × 10⁻⁶ for Biocrates metabolites and r = 0.36, p = 2.2 × 10⁻⁶ for Metabolon metabolites). Specifically, elevated BMI appeared to decrease levels of several GPLs that were also found to be inversely associated with kidney cancer risk, including 1-(1-enyl-palmitoyl)-2-linoleoyl-GPC (P-16:0/18:2)*, 1-linoleoyl-GPC (18:2) (lysoPC a C18:2), lysoPC a C18:1, and PC ae C34:3. For instance, a 1 SD increment in BMI was associated with a 0.17 SD decrease in 1-(1-enyl-palmitoyl)-2-linoleoyl-GPC (P-16:0/18:2) levels (βBMI = −0.17, p = 3.4 × 10⁻⁵). We also found that BMI was associated with increased levels of glutamate (βBMI: 0.12, p = 1.5 × 10⁻³), which was positively associated with kidney cancer risk. Several metabolites associated with kidney cancer risk in our prospective analysis did not appear to be strongly influenced by BMI, but we note that for all but 2 metabolites (PC ae C32:2 and PC ae C42:3), estimates were directionally concordant (i.e., positively correlated), with the effect size estimates from the BMI MR being closer to the null than those seen in the observational analysis. Conversely, some of the metabolites that were most strongly affected by BMI (e.g., phenylalanine and valine) were not associated with kidney cancer risk.

Negative control analyses
There was little evidence that genetic predisposition to dental disease influenced circulating metabolite levels, with no metabolites reaching our predetermined threshold for a statistically significant association (Tables K and L in S3 Table). We observed low correlation between the dental disease metabolite estimates from MR analyses and the kidney cancer metabolite estimates from the prospective analysis for both Biocrates (r = 0.15, p = 0.06) and Metabolon metabolites (r = 0.12, p = 0.002) (S5 Fig). None of the 25 metabolites that were associated with kidney cancer risk in prospective analyses were associated with dental disease in the MR analyses (S5 Fig). These findings suggest that when the profile comparison analysis is conducted using a hypothetically unrelated exposure (dental disease), we see no meaningful relationship between metabolite associations from the prospective analysis and the MR.

Discussion
This study describes the relationship between the pre-diagnostic blood metabolome and risk of developing kidney cancer based on data from 5 longitudinal population cohorts. This is the first comprehensive metabolomics analysis of incident kidney cancer to be conducted using a prospective design, and as such, complements existing work characterising the metabolic profile (in tissue and biofluids) of the disease itself [16][17][18][19][20][21][22][23][24][25][26]. We investigated 1,416 metabolites in relation to the occurrence of kidney cancer using 2 complementary analytical methods and observed 25 metabolites to be robustly associated with risk. These metabolites included 14 GPLs inversely associated with risk, 5 amino acids positively associated, and 1 inversely associated with risk, as well as risk associations for a carotenoid, 2 peptides, a nucleotide, and an unidentified feature. Results of an MR analysis designed to evaluate the extent to which BMI influences the key risk-associated metabolites suggest that differences in BMI may be responsible for part of the metabolite profile associated with the development of kidney cancer.

Fig 3. Scatter plot comparing the metabolite profile associated with kidney cancer from prospective observational analyses with the BMI-driven metabolite profile from MR analyses. Metabolites that are labelled have a p-value below the threshold (p < 0.05/ENT) in the prospective pooled analyses and are nominally significant in at least 2 cohorts separately. Metabolites measured by the Biocrates platform that are below the p-value threshold are represented by triangles, those measured by the Metabolon platform that are below the p-value threshold are represented by dots, and those measured by either platform that are above the p-value threshold are represented by an x. * Metabolite identity not yet confirmed by comparison with an authentic chemical standard. On the y-axis, the OR and SE were derived from the logistic regression analyses conditioned on case set estimating the associations between circulating metabolites and kidney cancer risk in 5 prospective cohorts. On the x-axis, the beta and SE were derived from the MR analyses evaluating the effect of BMI on circulating metabolite levels. BMI, body mass index; ENT, effective number of tests; MR, mendelian randomisation; OR, odds ratio; SE, standard error. https://doi.org/10.1371/journal.pmed.1003786.g003
The majority of metabolites found to be associated with kidney cancer risk in this study can be classified as GPLs. GPLs are the main component of cell membranes and are essential for maintaining cellular structure and for regulating cell signalling. The circulating metabolite associations we see here pre-diagnosis appear to intersect with the known cellular metabolic programming observed within kidney tumour tissue. For example, it has been proposed that clear cell RCC cells use exogenous lipids for membrane formation and cell signalling [44]. The relationship between lipid metabolites and prospective kidney cancer risk reported in our study could, theoretically, reflect increased uptake of lipid metabolites during preclinical kidney carcinogenesis.
GPLs can be broadly classified into 2 types based on their biochemical structure, diacyl (aa) or acyl-alkyl (ae), and can be further characterised according to their lipid side chain composition, specifically the number of carbons and their degree of (un)saturation (number of double bonds). The association of a subset of long chain unsaturated (mainly acyl-alkyl) PCs, lysophosphatidylcholines (LPCs), and plasmalogens with reduced kidney cancer risk is consistent with some limited existing literature. Specifically, lower levels of total PC/choline have been reported in the serum of diagnosed kidney cancer patients compared to control participants [17], and numerous studies have found decreased LPCs in both tumour and normal kidney tissues [27,45,46], as well as in the circulation of kidney cancer patients [18,47]. The mechanisms underpinning these associations are not well understood, but some of these molecules (e.g., plasmalogens) have been proposed as antioxidants [48]. Low levels of plasmalogens in cancer patients have been proposed as a potential mechanism by which increased oxidative stress could drive cancer progression [49].
We assessed the extent to which known risk factors could explain the observed metabolite associations and observed that adjusting for BMI, the main modifiable risk factor for kidney cancer, partially attenuated (less than 9% change in OR) the risk association for some specific metabolites. To further understand the relation with BMI for the kidney cancer risk-associated metabolites, we estimated the causal influence of BMI on metabolite levels using MR. This analysis clearly demonstrated that some (but not all) metabolites inversely associated with kidney cancer risk are also decreased by elevated BMI (e.g., several GPLs), whereas other metabolites positively associated with risk (e.g., glutamate) are also increased by elevated BMI. The association of long chain unsaturated (mainly acyl-alkyl) GPLs with both lower risk of RCC and lower BMI is consistent with extensive literature linking lower levels of these and similar molecules to a range of common diseases that include a metabolic component such as obesity and hypertension [50][51][52], type 2 diabetes [53], type 1 diabetes development [54], and nonalcoholic fatty liver disease [55].
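The attenuation reported above is a relative change in the OR after adjustment. The sketch below shows one common way to express it, assuming the change is computed relative to the crude OR (the exact definition used in the study is not given in this section). The function name `percent_change_in_or` and the two OR values are hypothetical.

```python
def percent_change_in_or(or_crude, or_adjusted):
    """Relative change (%) in the odds ratio after adjustment, versus the crude OR."""
    if or_crude <= 0 or or_adjusted <= 0:
        raise ValueError("odds ratios must be positive")
    return 100.0 * (or_adjusted - or_crude) / or_crude


# Hypothetical odds ratios per 1 SD increment, for illustration only.
or_crude = 1.40      # unadjusted estimate (assumed value)
or_bmi_adj = 1.33    # estimate after adjusting for BMI (assumed value)

change = percent_change_in_or(or_crude, or_bmi_adj)
print(f"change in OR after BMI adjustment: {change:.1f}%")
```

A negative value for a positively associated metabolite means the adjusted OR moved toward the null, that is, the association was attenuated.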
Glutamate was found to be positively associated with both kidney cancer risk and BMI and was also the metabolite for which adjusting for BMI resulted in the greatest attenuation in its OR estimate. Glutamate and glutamine are both found to be increased in kidney tumour tissue [44]. This observation provides further evidence of overlap between metabolites relevant to disease development and those whose levels are perturbed in the disease state [10,56]. Consistent with our findings, glutamate has previously been shown to be increased in visceral obesity [57,58], and glutamine-derived glutamate has been linked to tumour cell metabolism [59], with RCC being no exception [60]. α-Ketoglutarate, generated from glutamine-derived glutamate, enters the tricarboxylic acid (TCA) cycle providing both energy and biosynthetic intermediates [61]. A large intracellular glutamate pool is also important for nonessential amino acid synthesis in addition to cellular redox regulation [61]. Two previous NMR-based studies found lower levels of glutamine in serum of kidney cancer cases taken at diagnosis compared to controls [16,17]. While we did not identify a robust association of glutamine in our study, the point estimate was consistent with a weak inverse association with risk of kidney cancer.
A final overarching observation was that in comparison with previously published prospective metabolomics analyses on other cancer sites [62][63][64], the sheer number of metabolites found to be associated with risk in the current study suggests that the blood metabolome is particularly important in the aetiology of kidney cancer.

Strengths, limitations, and prospects for future studies
The chief strength of our study was the design of the primary risk analysis wherein control participants were individually matched to incident kidney cancer cases with pre-diagnostic blood samples from 5 independent population cohorts, a design that minimised differential bias and allowed for identification of novel and robust risk metabolites of kidney cancer. The use of 2 complementary metabolomics platforms also increased the overall coverage of the metabolome. The well-characterised cohorts offered the opportunity to carefully assess the influence of known kidney cancer risk factors (i.e., potential confounders) on identified risk-associated metabolites, as well as the robustness of their risk associations across the independent cohort studies. Well-designed prospective studies can provide compelling evidence in favour of a role of molecular risk factors in cancer aetiology, but residual confounding from imperfectly measured risk factors may still bias the association estimates. We therefore complemented the main risk analysis with a genetic analysis to assess the influence of BMI on the identified risk metabolites. We believe that this analysis provided important independent evidence when interpreting the relation between the identified risk metabolites and kidney cancer risk in the context of BMI, the principal risk factor of kidney cancer.
Limitations of our study include the presence of measurement error in the (semi-)quantification of metabolites. However, by using well-established platforms with built-in validation procedures along with randomisation schemes to ensure any batch variation was orthogonal to the outcome of interest (in this case kidney cancer case status), we can be confident there was no systematic bias in our estimates as a result of measurement error. In addition, the consistency in estimates we see for metabolites that appear on both platforms provides increased confidence in our results, but we note that statistical power to identify risk metabolites exclusive to the Metabolon platform was lower than for metabolites exclusive to the Biocrates platform due to the lower sample size. In this study, we focused on those metabolites that demonstrated consistency in risk associations across the 5 participating cohorts. While this approach ensured the robustness of the estimates, any risk marker present only in specific populations would not be highlighted. Although we only measured metabolite levels at a single time point, we do not believe this represents a major limitation as the majority of measured metabolites have a high within-person stability over time (stable over 4 months to 2 years) [65][66][67]. Another limitation of our study is the lack of detailed data on body composition. It is possible that some individual risk markers may reflect a certain adiposity distribution that is particularly strongly associated with kidney cancer risk. While the current literature on kidney cancer aetiology does not highlight any specific aspect of obesity as being particularly important, evaluating the identified risk markers in relation to detailed body composition (e.g., using DEXA scan data) represents an appealing future focus of our kidney cancer research. The remaining limitations relate to the generalisability of our findings.
Given evidence for specific metabolic alterations by kidney cancer histotype [10], it is possible that kidney cancer subtypes have different dependencies on circulating metabolites. In this case, findings from this study are likely most relevant to the major histological subtype, clear cell RCC, which made up 71% of kidney cancer cases. Furthermore, our study does not inform on the extent to which the identified risk markers translate to populations of non-European descent. Addressing these limitations should constitute an important focus for future studies of the role of the blood metabolome in the aetiology of kidney cancer.
While the results of our prospective risk analysis are consistent with circulating metabolites playing an important role in kidney cancer aetiology, it is appealing to complement such observational analyses with MR studies to further inform causal inference. However, we chose not to carry out an MR analysis on kidney cancer risk for individual metabolites for a number of reasons related to characteristics specific to circulating metabolites. Firstly, owing to the high correlational structure of many metabolites, few SNPs have been found associated with specific metabolites, leading to pleiotropic instruments for most metabolites [36]. Secondly, there is a high degree of pleiotropy for metabolite-associated SNPs with modifiable risk factors and other disease endpoints. That few metabolites have a sufficient number of instruments is particularly problematic as applying statistical methods aiming to correct for these biases is not possible (e.g., MR-Egger and MR-PRESSO), nor is the use of techniques designed to evaluate the effect of multiple correlated exposures (e.g., multivariable MR [68]). While the genetic architecture of blood metabolites is complicated for the reasons outlined above, there are hundreds of independent SNPs robustly associated with BMI [37], and this gave us greater confidence in the application of this analysis [69]. Better characterisation of the genetic architecture of circulating metabolites together with methodological advancements may allow for more robust causal inference in future metabolomics studies.
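To illustrate why the number of available instruments matters, the sketch below combines per-SNP summary statistics with a standard fixed-effect inverse-variance weighted (IVW) estimator. This is a generic textbook estimator shown under stated assumptions, not necessarily the method used in the study. The function name `ivw_estimate` and all input values are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate and standard error from per-SNP summary statistics.

    beta_exposure: SNP-exposure effects (e.g., SD change in BMI per allele)
    beta_outcome:  SNP-outcome effects (e.g., SD change in metabolite per allele)
    se_outcome:    standard errors of the SNP-outcome effects
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    if len(beta_exposure) == 0:
        raise ValueError("at least one instrument is required")
    # Each SNP is weighted by beta_exposure^2 / se_outcome^2.
    denom = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome))
    numer = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    return numer / denom, math.sqrt(1.0 / denom)


# Hypothetical summary statistics for three instruments (not study data).
bx = [0.10, 0.08, 0.12]
by = [-0.020, -0.016, -0.024]
se = [0.01, 0.01, 0.01]

estimate, std_error = ivw_estimate(bx, by, se)
print(f"IVW estimate = {estimate:.3f} (SE {std_error:.3f})")
```

Because the standard error shrinks as instruments accumulate in the denominator, an exposure with hundreds of independent SNPs (such as BMI) supports far more precise estimates, and more sensitivity analyses, than a metabolite with only a handful.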

Conclusions
This study points to a particularly important role of the blood metabolome in kidney cancer aetiology, specifically by identifying positive risk associations for several amino acids, as well as negative risk associations with multiple lipids, including PCs, LPCs, and plasmalogens. Downstream analyses indicated that some (but not all) risk metabolites are influenced by BMI, which partly explains their associations with kidney cancer risk, whereas the risk associations for other metabolites could not be explained by known risk factors. These results provide important insight into the metabolic pathways underpinning the central role of obesity in kidney cancer aetiology and clues to novel pathways involved in kidney cancer aetiology.

Supporting information
S1 Methods. Supporting information methods. (DOCX)
S1 Table. Table A: Population characteristics of the kidney cancer cases and controls from 5 independent cohorts with pre-diagnostic blood samples included in our analyses, by cohort. Table B: Description of the predefined sums and ratios of metabolites measured with the Biocrates assay as described in the manufacturer's documentation. Table C: Genetic instruments associated with BMI used in 2-sample MR analyses. Table D: Genetic instruments associated with dental disease used in 2-sample MR analyses. Table E: Mean concentration of metabolites by case or control status (in μmol/L and raw area counts for Biocrates and Metabolon, respectively). Table F: Associations between levels of circulating metabolites and kidney cancer risk adjusted for fasting status. Table G: Crude associations between levels of circulating metabolites and kidney cancer risk. Table H: Associations between levels of circulating metabolites and kidney cancer risk adjusted for risk factors (for metabolites robustly associated with kidney cancer risk in the crude analyses and restricted to participants with complete risk factor information).
Forest plots depicting the kidney cancer risk association for 1-linoleoyl-GPC (18:2), stratified by risk factors. Fig Q: Forest plots depicting the kidney cancer risk association for beta-cryptoxanthin, stratified by risk factors. Fig R: Forest plots depicting the kidney cancer risk association for cysteine-glutathione disulphide, stratified by risk factors. Fig S: Forest plots depicting the kidney cancer risk association for formiminoglutamate, stratified by risk factors. Fig T: Forest plots depicting the kidney cancer risk association for gamma-glutamylisoleucine*, stratified by risk factors. Fig U: Forest plots depicting the kidney cancer risk association for gamma-glutamylvaline, stratified by risk factors. Fig V: Forest plots depicting the kidney cancer risk association for glutamate (Metabolon), stratified by risk factors. Fig W: Forest plots depicting the kidney cancer risk association for hydantoin-5-propionate, stratified by risk factors. Fig X: Forest plots depicting the kidney cancer risk association for N1-methyladenosine, stratified by risk factors.
Fig G: Scatter plots of the cumulative variance explained by the genome-wide significant (p < 5 × 10−8) independent (R² < 0.01) SNPs for N1-methyladenosine (Metabolon). Fig H: Scatter plots of the cumulative variance explained by the genome-wide significant (p < 5 × 10−8) independent (R² < 0.01) SNPs for PC ae C34:3 (Biocrates). Fig I: Scatter plots of the cumulative variance explained by the genome-wide significant (p < 5 × 10−8) independent (R² < 0.01) SNPs for lysoPC a C18:2 (Biocrates). Fig J: Scatter plots of the cumulative variance explained by the genome-wide significant (p < 5 × 10−8) independent (R² < 0.01) SNPs for PC ae C34:2 (Biocrates). Fig K: Scatter plots of the cumulative variance explained by the genome-wide significant (p < 5 × 10−8) independent (R² < 0.01) SNPs for lysoPC a C18:1 (Biocrates). Fig L: Scatter plots of the cumulative variance explained by the genome-wide significant (p < 5 × 10−8) independent (R² < 0.01) SNPs for PC ae C40:1 (Biocrates). Fig M: Scatter plots of the cumulative variance explained by the genome-wide significant (p < 5 × 10−8) independent (R² < 0.01) SNPs for PC ae C32:2 (Biocrates). Fig N: Scatter plots of the cumulative variance explained by the genome-wide significant (p < 5 × 10−8) independent (R² < 0.01) SNPs for PC ae C36:3 (Biocrates). Fig O: Scatter plots of the cumulative variance explained by the genome-wide significant (p < 5 × 10−8) independent (R² < 0.01) SNPs for PC ae C42:3 (Biocrates). Fig P: Scatter plots of the cumulative variance explained by the genome-wide significant (p < 5 × 10−8) independent (R² < 0.01) SNPs for PC ae C38:6 (Biocrates).