A Linear Mixed Model Spline Framework for Analysing Time Course ‘Omics’ Data

doi:10.1371/journal.pone.0134540

Fig 1.

Overview of the analysis framework.

The proposed framework consists of three stages: quality control and filtering; serial modelling of profiles; and analysis with clustering to identify similarities between profiles or with hypothesis testing to identify differences over time, between groups, and/or in group and time interactions.

Fig 2.

Examples of ‘noisy’ and differentially expressed profiles.

For profiles that change over time (blue), the mean of the standard deviations per time point (s_T) is smaller than the mean of the standard deviations per molecule (s_M), whereas the two means are similar for noisy molecules (brown). In both cases the mean of the standard deviations per subject (s_I) is similar to s_M.
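
The three quantities in this caption can be illustrated with a minimal sketch for a single molecule. It is not the authors' implementation. The function names, the input layout (one list of measurements per subject, all on the same time points), the use of the sample standard deviation, the treatment of s_M as the standard deviation over all measurements of the molecule, and the ratio definitions R_T = s_T / s_M and R_I = s_I / s_M are all assumptions made for illustration.

```python
from statistics import mean, stdev


def filter_stats(profile):
    """Return (s_T, s_I, s_M) for one molecule.

    `profile` maps a subject id to that subject's measurements, one value
    per time point (hypothetical layout; no missing values are handled).
    """
    subjects = list(profile)
    n_time = len(profile[subjects[0]])
    # s_T: SD across subjects at each time point, averaged over time points.
    s_t = mean(stdev([profile[s][t] for s in subjects]) for t in range(n_time))
    # s_I: SD across time for each subject, averaged over subjects.
    s_i = mean(stdev(profile[s]) for s in subjects)
    # s_M (assumed): SD over every measurement of the molecule.
    s_m = stdev([value for s in subjects for value in profile[s]])
    return s_t, s_i, s_m


def filter_ratios(profile):
    """Return (R_T, R_I), assuming R_T = s_T / s_M and R_I = s_I / s_M."""
    s_t, s_i, s_m = filter_stats(profile)
    return s_t / s_m, s_i / s_m


# A profile with a shared time trend and small between-subject offsets.
changing = {
    "a": [0.0, 1.0, 2.0, 3.0],
    "b": [0.1, 1.1, 2.1, 3.1],
    "c": [-0.1, 0.9, 1.9, 2.9],
}
# A profile with no consistent trend across subjects.
noisy = {
    "a": [1.0, -1.0, 0.5, -0.5],
    "b": [-1.0, 0.5, -0.5, 1.0],
    "c": [0.5, -0.5, 1.0, -1.0],
}

print(filter_ratios(changing))
print(filter_ratios(noisy))
```

In this toy example the changing profile gives a small R_T and an R_I close to 1, while the noisy profile gives an R_T near or above 1, which matches the pattern described in the caption.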

Fig 3.

Workflow for the profile cluster analysis.

Trajectories derived from Linear Mixed Model Spline (LMMS) and Derivative Linear Mixed Model Spline (DLMMS) were compared to trajectories derived either from the mean or from Smoothing Splines Mixed Effects (SME) models. Five clustering algorithms, namely hierarchical clustering (HC), k-means (KM), Self-Organizing Maps (SOM), model-based clustering (model) and Partitioning Around Medoids (PAM), were then applied to the modelled trajectories using a range of two to nine clusters. The performance of each algorithm was assessed using the Dunn index. Gene Ontology (GO) term enrichment analysis was performed on each of the obtained clusters.
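
As a rough illustration of the assessment step, the Dunn index can be sketched as the smallest distance between points in different clusters divided by the largest distance between points in the same cluster, so higher values indicate compact, well-separated clusters. The sketch below assumes Euclidean distance and this standard variant of the index; the function name and the toy data are hypothetical, and the caption does not state which implementation the authors used.

```python
from math import dist


def dunn_index(points, labels):
    """Dunn index: min between-cluster distance / max within-cluster diameter."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    groups = list(clusters.values())
    if len(groups) < 2:
        raise ValueError("the Dunn index needs at least two clusters")
    # Largest pairwise distance inside any single cluster.
    max_diameter = max(dist(p, q) for g in groups for p in g for q in g)
    if max_diameter == 0:
        raise ValueError("every cluster has zero diameter")
    # Smallest pairwise distance between points of two different clusters.
    min_separation = min(
        dist(p, q)
        for i, a in enumerate(groups)
        for b in groups[i + 1:]
        for p in a
        for q in b
    )
    return min_separation / max_diameter


# Toy trajectories summarised as 2-D points (illustrative only).
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
good = dunn_index(points, [0, 0, 1, 1])  # clusters follow the two groups
poor = dunn_index(points, [0, 1, 0, 1])  # clusters cut across the groups
print(good, poor)
```

Selecting the algorithm and number of clusters then amounts to computing this value for each candidate partition and keeping the one with the highest index.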

Fig 4.

Clustering of filter ratios on proteomic datasets.

Scatterplots of the filter ratios R_T (x-axis) against R_I (y-axis) for A) the iTraq breast cancer dataset, and for the iTraq kidney rejection dataset in B) the Allograft Rejection (AR) group and C) the Non-Rejection (NR) group. Colors indicate clusters from a 2-cluster model-based clustering: red squares mark molecules that cluster as ‘informative’ and remain in the analysis, and blue circles mark ‘non-informative’ molecules that are removed prior to analysis.

Fig 5.

Filtering ratios of the Mus musculus data.

The filter ratios R_T and R_I were calculated for every molecule. Colors indicate in A) the -log10(p-values) for differential expression over time and in B) the proportion of missing values. C) shows the ratios after discarding profiles with more than 50% missing values, with colors as in A).

Table 1.

Types of models used to summarize profiles.

The number (proportion) of profiles modelled with each model selected by our proposed LMMS approach. Models are abbreviated as linear (LIN), spline (SPL), subject-specific intercept (SSI), and subject-specific intercept and slope (SSIS). Models were applied to cell line breast cancer data (Cell), Saccharomyces paradoxus evolution data (Yeast), Mus musculus chemotherapy data (Mouse), and Homo sapiens kidney rejection Non-Rejection (NR) data (Human). The row ‘Removed’ indicates the percentage of profiles filtered out using the 2-cluster model-based clustering on R_T and R_I.

Fig 6.

Clustering of the iTraq breast cancer dataset.

Clustering was performed on the summarized profiles obtained from A) Linear Mixed Model Spline (LMMS), B) Derivative Linear Mixed Model Spline (DLMMS), C) the mean and D) Smoothing Splines Mixed Effects (SME). The best clustering algorithm and the best number of clusters were chosen according to the Dunn index. In A), B) and D) we used hierarchical clustering and in C) Partitioning Around Medoids (PAM) clustering. The x-axis represents time (in hours) and the y-axis represents intensity as log₂-transformed protein abundance.

Table 2.

Simulation results.

Sensitivity of LMMSDE and LIMMA, averaged over 100 simulations. Differential expression between groups and/or over time was tested with increasing noise and fold change (FC) levels.
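
For readers unfamiliar with the metric, sensitivity is the fraction of truly differentially expressed molecules that a method calls significant, TP / (TP + FN). The sketch below averages it over simulation runs. The function names and the two toy runs are hypothetical and do not reproduce the simulation design or any result from the table.

```python
def sensitivity(truth, called):
    """TP / (TP + FN) for one simulation run.

    `truth[i]` is True if molecule i was simulated as differentially
    expressed; `called[i]` is True if the method flagged it.
    """
    true_pos = sum(1 for t, c in zip(truth, called) if t and c)
    false_neg = sum(1 for t, c in zip(truth, called) if t and not c)
    if true_pos + false_neg == 0:
        raise ValueError("no truly differentially expressed molecules")
    return true_pos / (true_pos + false_neg)


def mean_sensitivity(runs):
    """Average sensitivity over a list of (truth, called) runs."""
    return sum(sensitivity(t, c) for t, c in runs) / len(runs)


# Two made-up runs with four molecules each (illustrative only).
runs = [
    ([True, True, False, False], [True, False, False, True]),
    ([True, True, False, False], [True, True, False, False]),
]
average = mean_sensitivity(runs)
print(average)
```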

Table 3.

iTraq kidney rejection dataset: Gene Ontology (GO) term enrichment analysis.

GO term enrichment analysis based on the proteins identified by LMMSDE as differentially expressed between Allograft Rejection (AR) and Non-Rejection (NR) patients after filtering using a 2-cluster model-based clustering based on R_T and R_I. The top GO biological processes are listed along with their FDR-adjusted p-value and log odds ratio (OR).
