Common data models to streamline metabolomics processing and annotation, and implementation in a Python pipeline
Fig 5
Applications of pcpfm to analyzing biological datasets.
A) In the Bowen 2023 cardiomyocyte dataset, the pcpfm identifies most of the reported sunitinib-related features in both cell pellets and media using a standard workflow. Asari and the pcpfm each output both a preferred feature table and a full feature table; the former contains higher-quality features and the latter is more inclusive. B) The mass track for the sole feature undetected in the Bowen 2023 cell dataset. The suspected undetected peak (M2_2) is marked by the red box; it fails to pass Asari’s quality requirement. C) Significant differential metabolite features between sunitinib exposure groups in cell pellets. ANOVA p-values are corrected for multiple testing by the Benjamini-Hochberg method. D) Both the pcpfm and MetaboAnalystR were used to extract features from a subset of the CheckMate study. Of 202 compounds in their authentic standard library, MetaboAnalystR identified 167, while the full table from the pcpfm identified 198 of the confirmed features. E) Clustering pattern of the Ansone 2021 cohort using features differentially abundant between treatment groups. F) Example boxplots of differentially abundant features in the Ansone 2021 cohort. F201235 and F201855 (top) were mapped to the same empirical compound, which was tentatively annotated by its sole level 4 annotation as 1,2-DPPC, a pulmonary surfactant. Significance was evaluated using ANOVA and post-hoc Tukey’s HSD test in E and F.
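The Benjamini-Hochberg correction mentioned for panel C can be illustrated with a minimal sketch. This is not the pcpfm's own code: the function name `benjamini_hochberg`, the feature IDs, the p-values, and the 0.05 threshold are all assumptions chosen for illustration only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that each
    # adjusted value never exceeds the one ranked just above it.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-feature ANOVA p-values (illustrative, not from the study).
example_p = {"feat_a": 0.001, "feat_b": 0.008, "feat_c": 0.039,
             "feat_d": 0.041, "feat_e": 0.60}

adjusted = benjamini_hochberg(list(example_p.values()))

# Keep features whose adjusted p-value falls below an assumed 0.05 cutoff.
significant = [name for name, q in zip(example_p, adjusted) if q < 0.05]
print(significant)
```

In this toy input, two features have raw p-values just under 0.05 but adjusted values above it, which is the situation the correction is meant to handle when many features are tested at once.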