PAIRUP-MS: Pathway analysis and imputation to relate unknowns in profiles from mass spectrometry-based metabolite data

doi:10.1371/journal.pcbi.1006734

Table 1.

Characteristics of study cohorts and corresponding metabolomics datasets.

Fig 1.

PAIRUP-MS is a suite of computational methods for analyzing metabolite signals in untargeted metabolomics data.

(a) Overview of matching method: unknown (or unshared known) signals are imputed across datasets using shared known metabolites as predictors and then paired up based on m/z and correlation across samples. Genetic data can be used to validate matched signal pairs that share similar genetic associations. Matched signals can be used to perform combined association analyses across datasets (e.g. identifying trait-associated signals in discovery and replication cohorts). (b) Overview of pathway method: using binary metabolic pathway annotations and untargeted profiling data as input, a pathway reconstitution procedure is performed to construct a numeric annotation matrix, where each signal (known or unknown) gets a membership score in each pathway (or “metabolite set”, see Methods). Genetic data can be used to validate that signals associated with a specific gene are enriched for reconstituted pathways related to the gene. The annotation matrix can be used to perform pathway analyses (e.g. identifying pathways enriched for a list of trait-associated signals).
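The pairing step in (a) can be illustrated with a minimal sketch. It assumes the imputation has already been done, so each signal from the second dataset has values in the first dataset's samples. The function names, the data layout, and the m/z tolerance of 0.005 are illustrative assumptions, not the PAIRUP-MS implementation or its published settings.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def match_signals(measured, imputed, mz_tol=0.005):
    """Pair each measured signal with its best-correlated imputed signal.

    measured: {signal_id: (mz, values)} for signals measured in dataset A.
    imputed:  {signal_id: (mz, values)} for dataset B signals imputed into
              dataset A's samples (hypothetical layout).
    Only candidates whose m/z differs by at most mz_tol are considered.
    """
    matches = {}
    for a_id, (a_mz, a_vals) in measured.items():
        best = None
        for b_id, (b_mz, b_vals) in imputed.items():
            if abs(a_mz - b_mz) > mz_tol:
                continue  # m/z too far apart to be the same signal
            r = pearson(a_vals, b_vals)
            if best is None or r > best[1]:
                best = (b_id, r)
        if best is not None:
            matches[a_id] = best
    return matches


# Toy data: two measured signals and three imputed candidates.
measured = {
    "A1": (180.063, [1.0, 2.0, 3.0, 4.0]),
    "A2": (255.232, [4.0, 1.0, 3.0, 2.0]),
}
imputed = {
    "B1": (180.064, [1.1, 1.9, 3.2, 3.9]),
    "B2": (180.066, [4.0, 3.0, 2.0, 1.0]),
    "B3": (300.100, [4.0, 1.0, 3.0, 2.0]),
}
matches = match_signals(measured, imputed)
```

In this toy example, A1 has two candidates within the m/z window and is paired with the more strongly correlated one. A2 has no candidate within the window and stays unmatched. Running the same procedure in the opposite direction and keeping only pairs that select each other would be one way to obtain "reciprocal" matches.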

Fig 2.

Genetic validation of OE-MCDS matched signals.

GWAS were performed for “multiple” matched signal pairs (a subset of which were “reciprocal”), shared known pairs (“Shared Known”, positive control), and randomly matched pairs (average statistics shown as “Random”, negative control) in OE and MCDS, followed by a meta-analysis that ignored direction of effect. For each pair, the SNP with the best meta-analyzed p-value was selected to assess the directional consistency of its association in the two cohorts. (a) “Total # of Pairs”: number of signal pairs with best SNPs below the p-value threshold; (b) “# of Consistent Pairs” and (c) “Fraction of Consistent Pairs”: number and fraction of pairs with directionally consistent best SNPs below the p-value threshold; (d) “# of True Positive Pairs” and (e) “Fraction of True Positive Pairs”: number and fraction of true positive pairs, estimated as described in Methods. X-axes start at the genome-wide significance threshold (p < 5 × 10⁻⁸). Error bars for “Random” pairs are omitted because they are too small to be visible (all values lie close to the average).
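The counts in panels (a) to (c) can be sketched as follows, assuming each signal pair is summarized by the meta-analysis p-value of its best SNP and that SNP's effect estimates in the two cohorts. The field names (`meta_p`, `beta_oe`, `beta_mcds`) and the toy values are hypothetical. The true positive estimate in panels (d) and (e) follows the Methods and is not reproduced here.

```python
def consistency_summary(pairs, p_threshold=5e-8):
    """Count pairs whose best SNP passes the threshold and agrees in direction.

    pairs: list of dicts with hypothetical keys
        "meta_p"    - meta-analysis p-value of the pair's best SNP
        "beta_oe"   - effect estimate of that SNP in the first cohort
        "beta_mcds" - effect estimate of that SNP in the second cohort
    Returns (total, n_consistent, fraction_consistent).
    """
    selected = [p for p in pairs if p["meta_p"] < p_threshold]
    # Directionally consistent: both effect estimates have the same sign.
    consistent = [p for p in selected if p["beta_oe"] * p["beta_mcds"] > 0]
    total = len(selected)
    fraction = len(consistent) / total if total else float("nan")
    return total, len(consistent), fraction


# Toy example with four pairs, three of which pass the threshold.
pairs = [
    {"meta_p": 1e-12, "beta_oe": 0.40, "beta_mcds": 0.35},
    {"meta_p": 3e-9, "beta_oe": -0.20, "beta_mcds": -0.25},
    {"meta_p": 2e-8, "beta_oe": 0.15, "beta_mcds": -0.10},
    {"meta_p": 1e-5, "beta_oe": 0.05, "beta_mcds": 0.04},
]
total, n_consistent, fraction = consistency_summary(pairs)
```

Sweeping `p_threshold` over a range of stricter values, starting from genome-wide significance, would give the curves plotted along the x-axes.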

Table 2.

Top loci identified during genetic validation of BioAge pathway annotations.

Fig 3.

Clustered correlation heat map of 107 metabolite sets enriched (5% FDR) for both OE and MCDS BMI-associated signals.

Correlations between metabolite sets were calculated using the BioAge annotation matrix. The color key indicates the correlation value for each pair of metabolite sets. Distinct clusters (blue boxes labeled A-J) are highlighted, with representative pathway names from the clustered metabolite sets shown to the right of the heat map.
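As a rough illustration of the correlation step, the sketch below treats each metabolite set as a column of membership scores across signals and computes pairwise Pearson correlations between columns. The set names and scores are invented toy values, and the choice of Pearson correlation is an assumption. The hierarchical clustering used to order the heat map is omitted.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


# Hypothetical annotation matrix: one list of membership scores per
# metabolite set, with one score per signal (same signal order in each list).
annotation = {
    "set_A": [2.1, 0.3, -1.0, 1.5],
    "set_B": [1.9, 0.1, -0.8, 1.7],
    "set_C": [-1.2, 0.4, 2.0, -0.9],
}

# Pairwise correlation between metabolite sets, keyed by (set, set).
corr = {
    (a, b): pearson(annotation[a], annotation[b])
    for a in annotation
    for b in annotation
}
```

Here `set_A` and `set_B` have similar score profiles and would land in the same cluster, whereas `set_C` is negatively correlated with both.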
