Modeling and Classification of Kinetic Patterns of Dynamic Metabolic Biomarkers in Physical Activity

The objectives of this work were the classification of dynamic metabolic biomarker candidates and the modeling and characterization of kinetic regulatory mechanisms in human metabolism with response to external perturbations by physical activity. Longitudinal metabolic concentration data of 47 individuals from 4 different groups were examined, obtained from a cycle ergometry cohort study. In total, 110 metabolites (within the classes of acylcarnitines, amino acids, and sugars) were measured through a targeted metabolomics approach, combining tandem mass spectrometry (MS/MS) with the concept of stable isotope dilution (SID) for metabolite quantitation. Biomarker candidates were selected by combined analysis of maximum fold changes (MFCs) in concentrations and P-values resulting from statistical hypothesis testing. Characteristic kinetic signatures were identified through a mathematical modeling approach utilizing polynomial fitting. Modeled kinetic signatures were analyzed for groups with similar behavior by applying hierarchical cluster analysis. Kinetic shape templates were characterized, defining different forms of basic kinetic response patterns, such as sustained, early, late, and other forms, that can be used for metabolite classification. Acetylcarnitine (C2), showing a late response pattern and having the highest values in MFC and statistical significance, was classified as late marker and ranked as strong predictor (MFC = 1.97, P < 0.001). In the class of amino acids, highest values were shown for alanine (MFC = 1.42, P < 0.001), classified as late marker and strong predictor. Glucose yields a delayed response pattern, similar to a hockey stick function, being classified as delayed marker and ranked as moderate predictor (MFC = 1.32, P < 0.001). These findings coincide with existing knowledge on central metabolic pathways affected in exercise physiology, such as β-oxidation of fatty acids, glycolysis, and glycogenolysis. The presented modeling approach demonstrates high potential for dynamic biomarker identification and the investigation of kinetic mechanisms in disease or pharmacodynamics studies using MS data from longitudinal cohort studies.

The objectives of this work were the classification of dynamic metabolic biomarker candidates and the modeling and characterization of kinetic regulatory mechanisms in human metabolism with response to external perturbations by physical activity. Longitudinal metabolic concentration data of 47 individuals from 4 different groups were examined, obtained from a cycle ergometry cohort study. In total, 110 metabolites (within the classes of acylcarnitines, amino acids, and sugars) were measured through a targeted metabolomics approach, combining tandem mass spectrometry (MS/MS) with the concept of stable isotope dilution (SID) for metabolite quantitation. Biomarker candidates were selected by combined analysis of maximum fold changes (MFCs) in concentrations and P-values resulting from statistical hypothesis testing. Characteristic kinetic signatures were identified through a mathematical modeling approach utilizing polynomial fitting. Modeled kinetic signatures were analyzed for groups with similar behavior by applying hierarchical cluster analysis. Kinetic shape templates were characterized, defining different forms of basic kinetic response patterns, such as sustained, early, late, and other forms, that can be used for metabolite classification. Acetylcarnitine (C2), showing a late response pattern and having the highest values in MFC and statistical significance, was classified as late marker and ranked as strong predictor (MFC = 1.97, P < 0.001). In the class of amino acids, highest values were shown for alanine (MFC = 1.42, P < 0.001), classified as late marker and strong predictor. Glucose yields a delayed response pattern, similar to a hockey stick function, being classified as delayed marker and ranked as moderate predictor (MFC = 1.32, P < 0.001). These findings coincide with existing knowledge on central metabolic pathways affected in exercise physiology, such as β-oxidation of fatty acids, glycolysis, and glycogenolysis. The presented modeling approach demonstrates high potential for dynamic biomarker identification and the investigation of kinetic mechanisms in disease or pharmacodynamics studies using MS data from longitudinal cohort studies.

Metabolite kinetics-biochemical aspects
Basic principles in reaction kinetics of biomolecules were described by the work of Guldberg & Waage [1][2][3] more than 150 years ago and recently resumed by Voit et al., 2015 [4] in their perspective article "150 years of mass action". The underlying concept is the law of mass action, describing the quantitative aspects of a chemical reaction under ideal conditions. If a substance C is formed by the reaction of substance A and substance B, the production of C can be described by the following equation where A, B, and C are concentrations changing over time, and k is a rate constant describing the speed of the reaction. Probably the most widely known and used modification of the original model in biochemistry is the Michaelis-Menten rate law (MMRL) introduced by Michaelis & Menten in 1913 [5] v ¼ where v is the reaction rate, V max the maximum reaction rate, S the concentration of the substrate, and K m the Michaelis constant (the substrate concentration at half of the maximum reaction rate). The Michaelis-Menten model describes the reaction kinetics of an enzyme-catalyzed singlesubstrate reaction, in which the conversion of a substrate S into a product P takes place via the formation of an intermediate complex ES, where k 1 , k 2 and k 3 denote reaction rates [4] E þ S k2 Guldberg and Waage also examined the fact that biochemical systems tend to remain in homeostasis, which is described by the equilibrium constant [6] K eq is the equilibrium constant in the general reaction aA + bB$cC + dD, where a, b, c, d are the number of molecules of A, B, C, D participating, and [A], [B], [C], [D] are the molar reaction concentrations of the reaction components at equilibrium [7]. When analyzing regulatory mechanisms of metabolite kinetics, a key question addresses the effect of external perturbations disturbing homeostasis, e.g., caused by environmental influences, nutrition, drug interventions (pharmacodynamics) or physical activity (studied in this work through clinical exercise testing). These effects can be measured and examined by longitudinal cohort studies, which investigate dynamic changes in metabolite concentrations over time.
In chronic toxicity testing, which occupies a central position in the analysis of dynamic time-course metabolic data, studies are performed to explore the influences of toxic substances on human or animal metabolism. Mechanisms of metabolite kinetics are analyzed, e.g., by investigating the effect of pesticide exposure on children [8,9], by in-vitro examination of drug induced effects in neurotoxicity using brain cell cultures [10], or by analysis of toxic effects of polymers or nanoparticles to the water flea daphnia magna [11,12]. In biotechnological process monitoring, metabolic interactions are analyzed, e.g., in studying the sensitivity of the biocatalyst Clostridium thermocellum to ethanol stress [13], in exploring the forced ageing process of Port wine [14] or by the examination and optimization of cell culture media, as e.g., of Chinese hamster ovary (CHO) cells [15][16][17]. In pharmacodynamics, time-course data are collected, e.g., to study the effect of continuous exposure of breast cancer cells to an anti-cancer chemotherapy drug on the metabolic level [18] or to explore the metabolism of albumin in patients with systemic inflammatory response syndrome after continuous venovenous hemofiltration [19]. Research questions on kinetic mechanisms in physical exercise cover fundamental work, e.g., on studying the influence of improved metabolic health on patterns in plasma metabolites [20] or analyzing the effects of aerobic exercise on oral glucose tolerance [21].
In this work, in response to an incrementally increased physical load by cycle ergometry and depending on the underlying metabolic regulatory mechanisms, metabolites are expected to show specific kinetic signatures and shape patterns. Expected kinetic response patterns include: a sustained response (mainly constant concentration over time, overlaid with biological or instrumental noise), an early response (main decrease/increase of concentrations shortly after start of activity), a halving interval response (major change in concentration at half time of physical activity, e.g., a sigmoid behavior with a plateau), a late response (strongest decrease/ increase of concentration towards the end of physical activity), and a delayed response pattern (first mainly sustained metabolite concentration, then showing a strong reaction after the end of activity during the recovery phase, respectively).

Modeling of kinetic mechanisms-computational aspects
Regarding computational and mathematical aspects of characterizing kinetic regulatory mechanisms, different approaches of fundamental models for the analysis of metabolic processes have been described in the literature: qualitative models for topological network analysis, models of flux balance analysis using stoichiometric network construction and detailed kinetic models representing metabolic processes using ordinary differential equations (ODEs) [22,23]. Furthermore, different intermediate approaches do exist, e.g., the approach of structural kinetic modeling (SKM), approximating local biochemical mechanisms within a metabolic network by a parametric linear representation [23]. An overview on different "approximative kinetic formats used in metabolic network modeling" is given by Heijnen, 2005 [24]. An example for the dynamic simulation of kinetic mechanisms in metabolism-by simulating the mitochondrial fatty acid βoxidation-is presented by Modre et al., 2009 [25]. Further examples for theoretical network models as well as dynamic kinetic simulations can be found in the context of the e-cell project [26], e.g., models for drosophila [27] or the metabolic simulation of red blood cell storage [28].
With respect to the analysis of dynamic metabolic data, Smilde et al., 2010 [29] distinguish between six groups of methods: methods based on fundamental models, predefined basic functions, dimension reduction, multivariate time series models, analysis-of-variance (ANOVA) type models, and methods based on imposing smoothness. Analyses of periodic or oscillating data can be performed using methods such as Fourier analysis, wavelet transformation or principal component analysis (PCA) with wavelets [29,30]. Hidden Markov models were presented as a way for using basic functions, allowing flexibility and adaptation in modeling [29,31]. In particular, in gene-expression analysis orthogonal polynomials were introduced for qualitative and quantitative modeling [32,33].
Alternative methods for the analysis of longitudinal metabolic data, typically used in nuclear magnetic resonance (NMR) spectroscopy, comprise weighted principal component analysis (WPCA) [34] or analysis of variance (ANOVA) simultaneous component analysis (ASCA) [35]. A statistical framework for metabolic biomarker discovery in NMR data was presented by Berk et al., 2011 [36], introducing a smoothing spline mixed effects (SME) model, combined with an associated functional test statistic. Mishina et al., 1993 [37] suggested analyzing the kinetics of biomolecules by fitting differential equations for the application in pharmacodynamics. A method for investigating between-metabolite relationships by simultaneous component analysis with individual differences constraints (SCA-IND) was presented by Jansen et al., 2012 [38]. A new method for combined analysis of proteomics and metabolomics data using integrative pathway analysis was introduced by Stanberry et al., 2013 [39]. As an example for a web-based, freely accessible online service, Metaboanalyst [40] offers the profiling of longitudinal time-course data on the basis of a multivariate empirical Bayes approach.

Metabolic biomarker discovery
Metabolic biomarkers play an essential role in clinical diagnostics because of their ability to provide specific insights by being functional endpoints of human molecular interactions [41]. The general process for the discovery, verification, and validation of metabolic biomarker candidates was described by Baumgartner & Graber, 2008 [42]. This process ranges from experimental study design, over clinical study execution, execution of bioanalytical methods and acquisition of data, consolidation and integration of data, application of bioinformatics algorithms and data mining methods for the identification of biomarker candidates, up to an independent validation of putative biomarkers by clinical trials. In their review article, Baumgartner et al., 2011 [43] give a comprehensive survey of computational data analysis strategies for the discovery of biomarker candidates from metabolic data.
A milestone in clinical application of metabolic biomarkers was set by establishing routine newborn screening programs for inherited metabolic disorders [44]. The search for novel metabolic biomarkers in disease covers a wide range of clinical application areas, e.g., the identification of metabolic markers in prostate cancer by a rule-based feature selection algorithm [45], the search of early markers as well as late markers in planned and spontaneous myocardial infarction [46,47], the investigation of metabolic mechanisms in diabetes [48][49][50] or the discovery of putative biomarker candidates in chronic kidney disease [51][52][53].

Research objectives
In this article, we present a computational methodology, aimed at the modeling and characterization of kinetic regulatory mechanisms and the discovery of dynamic metabolic biomarker candidates in physical activity. Dynamic time-course metabolic concentration data are generated from a longitudinal biomarker cohort study by standardized cycle ergometry experiments. In total, 110 metabolites (including metabolite classes of acylcarnitines, amino acids and sugars) are quantitated by a targeted metabolomics approach utilizing mass spectrometry. After a thorough examination of the measured concentration data in terms of data quality assurance and reliability, we selected a set of 30 metabolites relevant in exercise physiology and considered them for data analysis and modeling in this work.
Metabolite concentrations of 47 individuals, showing different lengths in their concentration time curves (depending on the individual's maximum physical load), are made comparable by means of data preprocessing. Biomarker candidates are selected depending on maximum fold changes (MFCs) (the amplitude of changes in concentrations) and the corresponding P-values resulting from statistical hypothesis testing. Kinetic signatures of metabolites are quantified by a mathematical modeling approach using polynomial fitting, specifying the dynamic response patterns of analyzed metabolites during physical activity. A similarity measure for characterized metabolite kinetic signatures is obtained through specification of groups of metabolites by hierarchical cluster analysis. Kinetic shape templates are identified, specifying common kinetic response patterns and enabling the classification of dynamic metabolic biomarker candidates according to their kinetic patterns. Findings are verified and interpreted through biochemical and metabolic pathway analyses associated with physical activity.

Selection of dynamic biomarker candidates
Putative dynamic biomarker candidates are selected from the pool of analyzed metabolites by combined analysis of MFCs in concentrations and corresponding P-values from statistical hypothesis testing (see section Maximum fold changes and statistical hypothesis testing). Results for this data analysis step are visualized as a volcano plot (Fig 1). The plot demonstrates log 2 values of MFCs compared to-log 10 values of P-values. A significance level of 0.001 was chosen for the selection of statistical hypothesis testing results (horizontal blue line). Moderate biomarker candidates are classified with a MFC greater than 1.20 (vertical blue line). Strong biomarker candidates are classified with a MFC greater than 1.40 (vertical green line). Detailed data of all analyzed metabolites, including MFCs, log 2 (MFCs), P-values, and-log 10 (P-values), are summarized in Table 1.
For the analyzed classes of metabolites, putative biomarker candidates could be selected and ranked according to MFCs and P-values. . For standardized visualization of profiles, the vertical axis is normalized to a range of-20% to 40% of relative concentration. Note that acetylcarnitine (C2) exceeds this specified range, showing a maximum increase in relative concentration of 67%.
Different kinetic response patterns were observed. The majority of metabolites show a sustained response, e.g., threonine, with basically constant behavior in concentration over time, however overlaid with biological or instrumental noise. An early response pattern is shown with valerylcarnitine (C5) with an early decrease in relative concentration (of approx. -16%) after starting exercise followed by an increase in relative concentration (to a maximum of 13%). Methionine could be identified as a metabolite showing a halving interval response pattern with characteristics similar to a sigmoid function, showing first a sustained reaction, then an increase in relative concentration at half time of physical activity (by approx. 13%) and followed by a plateau (at approx. 9% of relative concentration) towards the end of physical exercise. Metabolites showing a late response pattern are e.g., acetylcarnitine (C2) with a slight decrease (-10%) and then a strong continuous increase in relative concentration (up to 67%) or alanine with a continuous increase (of approx. 32%) up to the end of exercise. Glucose shows a delayed response pattern (similar to a L-curve / hockey stick function, see section Mathematical modeling) with a minor increase in relative concentration (approx. 2%) at the beginning of exercise, followed by a continuous decrease (down to-12%) and a major steep increase in relative concentration (up to 13%) after the end of exercise during the recovery phase.    Table 2. Cluster 1 consists of the two amino acids alanine and arginine. Cluster 2 and cluster 3 comprise a multitude of metabolites of similar metabolite kinetics, which show roughly sustained response patterns. In cluster 4, the metabolites octadecadienylcarnitine (C18:2) and glucose are clustered together. Cluster 5 consists of only acetylcarnitine (C2), the metabolite showing the strongest response. In cluster 6, propionylcarnitine (C3) and butyrylcarnitine (C4) are grouped together. Cluster 7 represents valerylcarnitine (C5), a biomarker candidate showing an early response pattern.

Specification of kinetic shape templates
Kinetic shape templates, serving for the classification of similar metabolite dynamics, could be specified based on the median concentration curves of each identified cluster (see section Hierarchical cluster analysis). Identified shapes and their characteristics are summarized in Fig 5, based on relative concentration changes in reference to the initial concentration at rest. Identifiers of kinetic shape templates hereby correspond to identifiers of resulting metabolite clusters from hierarchical cluster analysis. Templates for sustained response patterns, observed in the majority of metabolites, are specified by shapes 2 and 3. Shape 7 specifies a template for dynamic biomarker candidates, showing an early response pattern (valerylcarnitine (C5)). Shape 1 describes a template showing a late response pattern with a continuous increase in concentration (alanine and arginine). Shapes 5 (acetylcarnitine (C2)) and 6 (propionylcarnitine (C3) and butyrylcarnitine (C4)) define further templates for late response patterns, differing in their dynamics in concentration time courses and maximum concentration changes. Shape 4 demonstrates a template for a delayed response pattern, showing characteristics similar to a L-curve / hockey-stick function (glucose and octadecadienylcarnitine (C18:2)).

Classification of dynamic biomarker candidates
Dynamic metabolic biomarker candidates are identified and classified through a two-step analysis procedure: first, by analysis of MFCs in concentrations and statistical hypothesis testing, and second, by reviewing and characterizing specified metabolic response patterns and kinetic shape templates. The majority of metabolites show a sustained response pattern, staying within an interval of relative MFC of less than 20%, being ineligible as putative biomarker candidates. Valerylcarnitine (C5), yielding an early response pattern, was classified as early marker and moderate predictor (MFC = 1.38, P < 0.001). Methionine shows a halving interval response pattern with a sigmoid behavior, but having a moderate amplitude in concentration (MFC = 1.16, P > 0.001) and was therefore not selected as a biomarker candidate. A late response pattern with weak early decrease in concentration was observed with propionylcarnitine (C3) (strong predictor, MFC = 1.52, P < 0.001), and butyrylcarnitine (C4) (moderate predictor, MFC = 1.27, P < 0.001), both classified as late biomarker candidates. Alanine (strong predictor, MFC = 1.42, P < 0.001) and arginine (moderate predictor, MFC = 1.36, P < 0.001) showed a late response pattern with a continuous increase in concentration from the beginning of exercise and were classified as late markers. Highest concentration changes yielded acetylcarnitine (C2), demonstrating a late response pattern with a very strong increase towards the end of exercise. C2 was ranked as strong predictor (MFC = 1.97, P < 0.001) and classified as late marker. Showing basic delayed response patterns, glucose (MFC = 1.32, P < 0.001) and octadecadienylcarnitine (C18:2) (MFC = 1.21, P < 0.001) were identified as moderate predictors and classified as delayed markers.

Biochemical interpretation of findings
Thanks to an elaborate body of knowledge in biochemistry, a peculiarity within the process of data analysis in metabolomics lies in the dedicated biochemical interpretation of results [54]. This knowledge is nowadays annotated in public databases, e.g., the Kyoto Encyclopedia of Genes and Genomes (KEGG) [55], and eases the interpretation of findings in the context of annotated biochemical pathways.
In exercise physiology, various biochemical reactions in metabolism play an essential role, predominantly in carbohydrate metabolism (glycolysis and glycogenolysis), in lipid metabolism (β-oxidation of free fatty acids) and amino acid metabolism [6]. During a cycle ergometry stress test, an individual increasingly consumes adenosine triphosphate (ATP); to compensate this energy consumption and maintain homeostatic levels of ATP, its production is up-regulated, first primarily by aerobic processes (respiration), and then anaerobic fermentation. Under the low-impact conditions chosen in this study (low initial output of 50 Watt (W) and slow increase of 25 W every three minutes), the metabolic data demonstrate that the body uses both glycolysis and β-oxidation of fatty acids as readily available energy sources, before protein catabolism contributes in a substantial manner. Of course, the pools of monosaccharides and free fatty acids have to be replenished by glycogenolysis and lipolysis, respectively.
The findings of this work, i.e., identified biomarker candidates of exercise metabolism, and characterized metabolite signatures via specified kinetic shape templates, can be explained through the metabolic regulatory mechanisms in physical activity. Significant changes in concentrations of acetylcarnitine (C2) and closely related short-chain acylcarnitines (C3, C4, and C5) arise from their involvement in the β-oxidation of free fatty acids with acetylcarnitine (the single most significant finding) representing the actual end-point of the β-oxidation of evennumbered fatty acids which constitute the vast majority of dietary fatty acids and of fatty acids in the body's adipose tissue. The strong increase in concentrations of alanine and arginine are representative for an increased production of glucogenic amino acids through high glycolytic activity. This connection is most obvious for alanine, which is the corresponding amino acid of the alpha-keto acid pyruvate and is, thus, a direct mirror of glycolytic or gluconeogenetic flux [48]. The third major finding, the overproduction of glucose after the end of the exercise, is due to the inertia of metabolic regulation. In order to supply the glycolysis with enough fuel, glucose has to be released from its storage by glycogenolysis. At the abrupt end of the exercise, the increased activity of the glycogenolytic machinery cannot be stopped immediately and, therefore, leads to overcompensated glucose levels.

Discussion Summary
In this work we have presented a computational modeling and statistics approach for the identification of dynamic metabolic signatures through characterization of kinetic patterns of circulating metabolites from a physical exercise study. Dynamic time-course metabolic concentration data were obtained through clinical exercise testing using a cycle ergometry stress test.
The data of 47 individuals from four different groups were analyzed: male and female test persons, with either average physical activity or competitive athletic activity. Lactate concentrations were measured for all individuals as a gold standard for profiling physical activity. Metabolite concentrations were quantitated by a targeted metabolomics approach, combining mass spectrometry analytics with the concept of stable isotope dilution. From the initial set of 110 metabolites (including classes of acylcarnitines, amino acids and sugars), we selected a reliable and quality assured set of 30 metabolites for data analysis playing a possible role in exercise physiology.
Based on the generic process for biomarker discovery in metabolomics, a computational approach for the analysis of longitudinal metabolic concentration data was developed. Computational tools were implemented in R [56] for automating the data analysis workflow. The source file (R script file) and the underlying dataset (Microsoft Excel file) are provided as supporting information (S1 File and S2 File). Individual workload curves, differing in the number of measurements due to variability in the individual's physical capacity and exertion, were made comparable by data preprocessing steps including rescaling and linear interpolation of concentration-time curves.
Putative dynamic biomarker candidates for physical activity were selected by combined analysis of MFCs in concentrations and P-values of statistical hypothesis testing. Kinetic patterns of analyzed metabolites were characterized based on a mathematical modeling approach utilizing polynomial fitting as the method of choice. Metabolite groups, showing similar kinetic response patterns, were obtained by applying hierarchical cluster analysis to the set of characterized metabolite kinetic patterns. Kinetic shape templates could be specified according to the identified clusters, defining basic kinetic response patterns used for classification of dynamic biomarker candidates.
The following kinetic response patterns could be defined: sustained response (basically constant concentration over time, overlaid with biological and instrumental noise), early response (significant change in concentration at the beginning of exercise), late response (continuous decrease/increase towards the end of activity), and delayed response (first basic sustained response, with a strong response and steep decrease/increase in concentration after the end of the exercise during the recovery phase).
The selected two-step data analysis and modeling strategy including MFCs in concentrations and statistical hypothesis testing, and the modeling of kinetic shape templates led to the identification and classification of dynamic metabolic biomarker candidates for profiling physical activity. The highest values for MFCs and P-values in the analyzed set of metabolites were shown for acetylcarnitine (C2) (MFC = 1.97, P < 0.001), yielding a late response pattern, and being classified as strong predictor and late marker. Alanine showed the highest values in the class of amino acids (MFC = 1.42, P < 0.001) and yielded a late response pattern, being classified as strong predictor and late marker. The only considered sugar, glucose, yet playing a key role in physical activity, yielded a delayed response pattern classified as moderate predictor (MFC = 1.32, P < 0.001) and delayed marker.
In terms of biochemical interpretation, findings were verified and interpreted according to their function in metabolic pathways, associated primarily with physical exercise (β-oxidation of fatty acids, glycolysis, and glycogenolysis). Interestingly, biomarker candidates, identified with the highest predictive value, yielded late response patterns. This might be seen in the context that lactate (also a key indicator for profiling physical activity) first shows an almost sustained response pattern before yielding an exponential increase in concentration up to a maximum physical load. The primary occurrence of late response patterns can be interpreted as a consequence of evolutionary developed regulatory mechanisms in metabolism to keep the individual's metabolic system in homeostasis after external perturbations such as spontaneous physical activity.
Using our computational approach we were able to select and classify dynamic metabolic biomarker candidates and to characterize physiologically plausible metabolite kinetic patterns in physical activity, combining the strengths of statistical testing (hypothesis testing), mathematical modeling (curve fitting) and empiric data analysis (hierarchical cluster analysis).

Methodology
Experimental limitations and confounders in the analyzed data may result from uncertainties about the nutrition of test persons before exercising (recorded in questionnaires but not objectively verifiable), varying individual motivations and consequently different levels of exertion, potential issues during sample taking (e.g., incomplete removal of sweat at the point of puncture), or from general limitations of the analytical approach based on dried blood spots [57]. It should be noted, that at least two test persons obviously consumed nutritional supplements in the form of branch-chained amino acids, influencing the measurement values of xleucine (sum of leucine and isoleucine).
With reference to the selected cohort, it should be noted that the study participants formed a heterogeneous group, i.e., they differed in their level of physical activity and status of training. Therefore, the baseline concentrations and the kinetic patterns may, to a certain extent, depend on the volunteers' differences in physical fitness, or other confounding factors such as anthropometric measures or dietary habits. Although this paper is primarily focused on the methodology for deriving kinetic patterns and not so much on the discovery of exercise-related biochemical mechanisms, the results should be seen with these limitations in mind.
In terms of data preprocessing, the presented data analysis strategy reveals strong indifference towards the handling of outliers because median concentration values are selected from rescaled and interpolated concentration curves. Cut-off values for the selection of metabolic biomarker candidates (utilizing MFCs and P-values) were chosen empirically by reviewing obtained results and assuming that responses, showing changes in concentration within a range of-10% to +10%, are accepted as biochemically and analytically-caused data variability. For kinetic modeling, an empirical approach (instead of applying pre-defined mathematical basic functions) based on polynomial fitting was chosen, allowing for a more physiological characterization of metabolite kinetics. Looking at the complete set of characterized metabolite kinetic signatures, the user can choose an appropriate polynomial degree after visual inspection or by developing a proper statistical quality measure e.g., based on an estimation of the residuals. In a few cases, minor artifacts of approx. 3% in concentration values of the fitted polynomials do occur, obviously resulting from a slight overfitting of curves due to the chosen polynomial degree.
Identification of groups of similarly behaving metabolites by hierarchical cluster analysis is somewhat affected by the number of interpolated points in the concentration curves and by the degree chosen for fitted polynomials. A higher number of interpolation points as well as different degrees of polynomials were tested, showing high stability in cluster analysis, however, at a lower node height of the dendrogram the arrangement of single metabolites changes slightly between the clusters. Note that the selection of clusters basically depends on the chosen height (cut-off) of the hierarchical tree. Depending on the selected cut-off value, two metabolites, i.e., methylmalonylcarnitine (C3-DC-M) and hydroxyvalerylcarnitine (C5-OH) might also be classified as additional biomarker candidates, interesting for further investigation. Specification of kinetic shape templates finally builds upon the number of specified clusters, depending computationally on the selected cut-off in hierarchical cluster analysis and biochemically on the eligibility and meaningfulness of clustered templates in terms of metabolite kinetics.

Previous work
Metabolic concentration data used in this study have served as a database for the development and validation of novel data mining and biomarker discovery strategies in previously published studies by our group. In Netzer et al., 2011 [58] we presented a two-step network-based approach for the identification of metabolic biomarkers, classifying alanine, acetylcarnitine (C2), propionylcarnitine (C3), and glycine, as strong, and arginine, citrulline, and lysine as moderate biomarker candidates, represented as major hubs in the dynamic network. These findings show high accordance with identified dynamic metabolic biomarker candidates in physical activity using the approach presented in this work, again selecting alanine, acetylcarnitine (C2), propionylcarnitine (C3) as strong predictors, and arginine as moderate marker candidate.
In a second paper [59] we introduced a method for biomarker identification by inferring two different types of networks, i.e., correlation networks and ratio networks. This more theoretical approach calculates scores to prioritize features using topological descriptors. Groups of obese test persons (with a body mass index (BMI) > 30) and healthy controls were compared in this study, which identified highly discriminatory biomarker candidates, i.e., histidine, ornithine, acetylcarnitine (C2), and proline.

Conclusions
In this article, we have presented a computational methodology for dynamic biomarker classification and modeling of kinetic metabolic patterns in physical activity. Insight into kinetic regulatory mechanisms could be provided by characterizing specific kinetic signatures for selected key metabolites within the groups of acylcarnitines and amino acids, and for glucose. A new data analysis strategy for the characterization and classification of dynamic biomarker candidates was introduced. We were able to specify common kinetic shape templates, identified from groups of metabolites showing a similar characteristic in dynamic time-course responses. Findings demonstrated high accordance with previously published data and established biochemical knowledge, e.g., the response of glucose, showing a behavior similar to a hockey stick function with a delayed increase in concentration after the end of physical exercise during the recovery phase.
Due to the selected study design of a cycle ergometry experiment, in which physical exercise was increased incrementally (every 3 minutes by 25 W), known kinetic patterns could be partly confirmed by our observations, in particular in response to the selected workload protocol. Major impact of the presented methodology can be seen in the fact that kinetic mechanisms in metabolism could be qualified and quantified not only through a "strong" mathematical model, but by an empiric deduction and description of de facto kinetic response patterns from quantitated metabolic time-course concentration data, measured under in-vivo experimental conditions. A further direction of research might be the analysis of additional classes of metabolites and the description and interpretation of kinetic patterns subsequent to active exercise in the recovery phase. Especially for glucose-which increases rapidly in concentration within the analyzed interval of the recovery phase-a prolonged examination time would be highly interesting, since glucose might be expected to be classified as strong predictor. From the computational viewpoint, a very challenging task would be the development of in-silico pathway models, integrating the identified kinetic signatures into a theoretical mathematical model for hypothesis generation and verification (see e.g., Teusink et al., 2000 [60]). The development of a kinetic model based on an ordinary differential equations (ODEs) description including kinetic parameters selected from our research might be an aim for additional research which, however, is beyond the scope of this article.
The approach presented in this work also shows high potential for contributing to other application areas such as pathophysiology and pharmacodynamics. In pharmacodynamics and toxicology (particularly in chronic toxicity testing), for instance, it might be applied to assess treatment effects more accurately by profiling metabolite levels over time instead of looking at end-points only (see [29]). In many complex diseases, the dynamic analysis may well identify more meaningful biomarkers and reveal a deeper insight into the actual pathomechanisms. To name one important example that is actually very close to the present study: in chronic obstructive pulmonary disease (COPD), physical exercise-and bicycle ergometry in particular-is commonly used to assess the severity of the disease and also to model exacerbations of the patients' condition [61,62]. In this setting, a dynamic depiction of the metabolic changes clearly has the potential to resolve regulatory mechanisms and distinguish cause and effect of the observed alterations (Christian Schudt, personal communication). This is especially plausible for the pathway leading to the synthesis of inflammation mediators such as prostaglandins, leukotrienes, thromboxanes etc., which is closely associated with the pathology of the disease and depends on the release of polyunsaturated fatty acids from phospholipids in a stoichiometric manner [63][64][65][66].
In this article, main focus was put on the development of a computational methodology to examine longitudinal metabolic concentration data and to present a basic approach for the mathematical modeling and statistical analysis of dynamic kinetic metabolic mechanisms. As previously stated (see section Methodology), individual metabolic response patterns are partly influenced by different factors such as physical fitness and training status, anthropometric parameters or dietary habits. Because of limitations in the specification and verification of the observed metabolic kinetic patterns, a further research goal might be to systematically investigate the underlying metabolic and physiological regulatory mechanisms by conducting additional hypothesis-driven prospective cohort studies. Furthermore, an extension of this paper is planned that will compare specific groups of interest, e.g., defined with regard to training status (response in lactate increases) or anthropometric characteristics.
Referring to the practical execution of exercise physiology experiments, it should be noted that most commonly only one blood sample is collected, usually after the end of exercise. However, the results of the presented work clearly demonstrate that characterized metabolites show a very differential kinetic characteristic during physical activity. Consequently one-point measurements may lead to misinterpretations and emphasize an obvious need for multiple measurements in exercise physiology (typically before, multiple times during, and after exercise).

Ethics statement
This study was conducted in full accordance with the principles expressed in the Declaration of Helsinki. Written informed consent was obtained from all study participants, together with a detailed questionnaire on nutrition and health status. In addition, a physician subjected all individuals to a detailed examination to ensure that they could undergo the cycle ergometry test without health risks, and this physician was also present at all times during the exercise to monitor the electrocardiogram (ECG) that was continuously recorded. All laboratory work and data analysis was conducted anonymously.

Experimental study design
In this work, longitudinal metabolic concentration data were obtained through clinical exercise testing using a cycle ergometry stress test. General guidance for clinical exercise testing can be found in "Guidelines for Clinical Exercise Testing Laboratories" [67] and "Recommendations for Clinical Exercise Laboratories" [68]. General recommendations for cycle ergometry studies were described by Driss & Vandewalle, 2013 [69], providing technical and clinical protocols including limitations for study design and execution. The overall cycle ergometry experiment was designed as a longitudinal biomarker cohort study, with 47 persons divided into 4 different groups, i.e., male and female individuals, with either average physical activity or competitive athletic activity. Study participants included amateur endurance athletes (16 males / 8 females) and professional alpine skiers (11 males / 12 females). The anthropometric characteristics of the study participants (age, body mass index (BMI), height, and weight) are summarized in Table 3. Detailed information on anthropometric data, the general training status, and the physical load during the cycle ergometer experiment are provided as supporting information (S3 File).

Clinical study execution
The workload of the cycle ergometry test was increased incrementally by 25 W every 3 minutes up to the individual's maximum physical load (the basic scheme of the study protocol is depicted in S1 Fig). The initial workload level of 25 W was skipped for all individuals, starting the exercise with a workload of 50 W. The lowest observed maximum workload was 150 W (one individual), and the highest workload level was 425 W, also reached by one individual. From each individual blood samples for metabolite profiling were taken (i) at rest (directly before starting the exercise), (ii) with each new workload level up to individual's maximum physical load, and (iii) after a short recovery phase of five minutes after the maximum workload (highest Watt level). In addition, for all test subjects, lactate concentrations were measured as a gold standard and reference for assessing physical activity. Concentration-time curves of preprocessed lactate data are visualized in S2 Fig. Median values of lactate concentrations were roughly 1.2 mM at rest, 8.5 mM at maximum workload, and 7.2 mM after recovery. According to the study protocol, lactate samples were taken at 1:30 min, samples for metabolite profiling at 2:30 min after starting a new ergometry workload level. All samples were taken from the earlobe, collected as dried blood spots (DBS) [57] and analyzed under standardized study conditions.

Targeted metabolite quantitation
In metabolomics, two basic conceptual approaches are used: untargeted and targeted metabolite profiling methods. Untargeted metabolomics seeks to create a holistic picture of metabolism by trying to identify a comprehensive set of metabolites as a snapshot of a metabolic state, while targeted metabolomics aims at a quantitation of pre-selected metabolites defined by a priori knowledge [70,71]. The two state-of-the-art technologies for analyzing metabolites are nuclear magnetic resonance (NMR) spectroscopy [72] and mass spectrometry (MS) [73]. Dynamic time-course metabolic concentration values, building the basis for data analysis and modeling in this work, were gathered from a targeted metabolomics approach [70,74,75], using triple quadrupole tandem mass spectrometry (MS/MS) [76] coupled with the concept of stable isotope dilution (SID) [77] for metabolite quantitation.

Data analysis workflow
Central steps of the selected data analysis workflow include the technical validation of raw data, preprocessing of data, selection of putative dynamic biomarker candidates, mathematical modeling and characterization of metabolite kinetic patterns, identification of metabolite groups with similar kinetic behavior, specification of observed kinetic shape templates, classification of dynamic biomarker candidates and the biochemical interpretation of findings. A flowchart of the used data analysis workflow is shown in Fig 6, representing the whole datadriven process for the discovery of biomarkers in metabolomics. Results from the different steps of the data analysis workflow are exemplarily shown and visualized for glucose, a key analyte, playing a central role in metabolism of exercise physiology and demonstrating a very specific kinetic pattern in response to physical activity.

Raw data-descriptive analysis and visualization
Raw data of the cycle ergometry experiment were test-wise reviewed and visualized in two different basic ways to obtain a better understanding about the nature of the metabolic timecourse data. Concentration data were initially analyzed by building subsets of data, referring to the levels of each individual's maximum physical load. For each subset a box plot was generated, visualizing the specific measurements (data points of the horizontal axis) versus the metabolite concentrations (see section Clinical study execution). In S3 Fig, resulting

Data preprocessing
Different lengths of metabolic concentration-time curves, resulting from the variability of each individual's maximum physical load, were made comparable by rescaling the data in time Measurement at rest was defined as 0%, maximum workload of each individual as 100% and recovery value as median value of 117%-resulting in an aligned workload curve to a uniform length. This requires additional data points added to the concentration curves using a linear interpolation approach (Fig 7A). Metabolic concentration-time curves underwent simple descriptive analysis by generating a box plot representation from rescaled concentration curves (Fig 7B). In a next step, median concentration values were extracted from interpolated concentration curves (S6 Fig), serving as a basis for mathematical modeling by curve fitting (see section Mathematical modeling). This approach perfectly treats the problem of outliers in the data without the need of applying additional methods for outlier detection and removal. However, a small set of extreme outliers was observed that was manually removed after careful visual inspection (in test person no. 14 all data points at recovery, in test person no. 35 all data points at 175 W and in test person no. 42 the data point for glucose at rest). Regarding missing concentration values it should be noted that our dataset was almost complete, except missing values at individuals' maximum workload in 12 test persons and at 150 W in one individual (no. 9), representing the last data points in the concentration time curves.

Maximum fold changes and statistical hypothesis testing
Maximum fold changes in metabolite concentrations and P-values of statistical hypothesis testing serve as a score for the ranking of putative biomarker candidates (see section Selection of dynamic biomarker candidates). The combination of fold changes and P-values, visualized using a volcano plot, is described in the literature as method of choice for the analysis and visualization of significant changes (e.g., on microarray data [78,79] or in diverse metabolomics applications [80][81][82]). As a general approach, especially in genomics studies, this method is usually used for data comparing the starting and end point of dynamic processes such as regulation of gene expression.
In this work, utilizing longitudinal time course concentration data, MFCs are calculated by the difference between observed minimum and maximum concentration values of a metabolite, independently from their timely occurrence. MFCs are calculated based on median Interpolated minimum and maximum concentration values of all 30 metabolites were assessed with respect to their density distribution by visual inspection using graphical methods such as histogram analysis [83] and quantile-quantile plots [84]. A Shapiro-Wilk Normality test was applied for normality testing of both minimum and maximum concentration data (significance level P = 0.01) [85][86][87]. Metabolites hereby yielded inhomogeneous distributions (e.g., normal distribution for histidine, octadecanoylcarnitine (C18) or glycine, non-normal distribution e.g., for xLeucine, citrulline or proline, and partly differences in distributions between minimum and maximum concentrations, e.g., for arginine). To ensure comparability between metabolites, a Wilcoxon Signed Rank Test [88] (non-parametric hypothesis test for paired samples) was used for the calculation of P-values (significance level P = 0.001). Since ranks are used for paired hypothesis testing, identical P-values are partly shown for some metabolites. Finally, calculated P-values were adjusted according to the false discovery rate (FDR) correction for multiple comparisons [89].

Mathematical modeling
The initial goal of our work was the mathematical modeling of metabolite kinetic patterns and shape templates, utilizing a set of predefined mathematical basis functions [29]. However, the introduction of predefined basic functions for the analysis of dynamic metabolomics data is and remains a challenge as also discussed by Smilde et al., 2010 [29]. Note that putative basic functions in this work are associated with kinetic patterns in response to linear increasing physical activity and can be basically classified into the following set of shape templates: a. a sustained response pattern, showing a mainly constant concentration over time, overlaid with biological or instrument noise, represented e.g., by a constant function f ðxÞ ¼ c ð7Þ b. an early response pattern, with an early significant change in concentration, mathematically described e.g., by a logarithmic function c. a halving interval response, showing a change in concentration at the half time of exercise with a following plateau, represented e.g., by a sigmoid basis function d. a late response, with a continuous decrease/increase up to the end of physical activity, described e.g., by a linear, a quadratic, or an exponential function e. a delayed response pattern, showing first a sustained characteristic followed by a strong response after the end of physical exercise during the recovery phase, mathematically represented by a so-called L-curve or hockey stick function [90,91].
Fitting the above-introduced basic functions to measured concentration-time curves was thoroughly examined and tested with the goal to characterize kinetic response patterns according to these theoretical models. In this analysis curve fitting was performed using median metabolite concentration values, extracted from interpolated concentration-time curves (see section Data preprocessing). Our preliminary results demonstrated that the approach of fitting the pre-defined set of mathematical basis functions was not feasible for the measured response curves caused by an incremental increase of physical workload. We therefore revised our initial concept by utilizing polynomial fitting of preprocessed data. This modality enables us to design kinetic response patterns that are physiologically reasonable and relevant. Polynomials of degree n are defined by following equation: After testing different polynomial degrees, we decided for a degree of nine, showing the best results in terms of curve/shape representation and smoothness (S7 Fig). To ensure comparability of analyzed metabolites after polynomial fitting, relative concentration values were calculated (in percentage of concentration changes with respect to the initial concentration at rest) (see Fig 4).
Note that there are multiple applications in metabolomics using polynomial fitting, e.g. for baseline correction [92,93], prediction of germination curves [94], calculation of mass correction profiles [95] or in spectrum deconvolution [96].

Hierarchical cluster analysis
Metabolite groups, showing similar kinetic response patterns, were examined and identified using hierarchical cluster analysis [97]. Cluster analysis was performed based on the concentration values of the fitted polynomials of 9 th degree (see section Mathematical modeling).
Results are visualized as a heatmap in