Fig 1.
Overview of the analysis framework.
The proposed framework consists of three stages: quality control and filtering; serial modelling of profiles; and analysis with clustering to identify similarities between profiles or with hypothesis testing to identify differences over time, between groups, and/or in group and time interactions.
Fig 2.
Examples of ‘noisy’ and differentially expressed profiles.
Profiles changing over time (blue) have a mean of the standard deviations per time point (sT) smaller than the mean of the standard deviations per molecule (sM), while these means have similar values for noisy molecules (brown). In both cases the mean of the standard deviations per subject (sI) is similar to sM.
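The three summary statistics in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes sT is the mean over time points of the standard deviation across subjects, sI is the mean over subjects of the standard deviation across time points, and sM is the standard deviation of all measurements of the molecule. The function name `profile_sds` and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

def profile_sds(measurements):
    """Return (s_t, s_i, s_m) for one molecule.

    measurements: list of (subject, time, value) tuples (assumed layout).
    """
    by_time = defaultdict(list)
    by_subject = defaultdict(list)
    for subject, time, value in measurements:
        by_time[time].append(value)
        by_subject[subject].append(value)
    # Mean of the per-time-point standard deviations (across subjects).
    s_t = mean(stdev(v) for v in by_time.values() if len(v) > 1)
    # Mean of the per-subject standard deviations (across time points).
    s_i = mean(stdev(v) for v in by_subject.values() if len(v) > 1)
    # Standard deviation of all measurements of the molecule.
    s_m = stdev([value for _, _, value in measurements])
    return s_t, s_i, s_m

# Toy profile that changes over time: 3 subjects, 3 time points,
# a strong time trend and small between-subject offsets.
changing = [(s, t, 2.0 * t + 0.1 * s) for s in range(3) for t in range(3)]
s_t, s_i, s_m = profile_sds(changing)
```

For this toy profile sT is much smaller than sM, while sI stays close to sM, which matches the pattern described for the blue profiles.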
Fig 3.
Workflow for the profile cluster analysis.
Trajectories derived from Linear Mixed Model Spline (LMMS) and Derivative Linear Mixed Model Spline (DLMMS) were compared to trajectories derived either from the mean or from Smoothing Splines Mixed Effects (SME) models. Five clustering algorithms, namely hierarchical clustering (HC), k-means (KM), Self-Organizing Maps (SOM), model-based clustering (model) and Partitioning Around Medoids (PAM), were then applied to the modelled trajectories using a range of two to nine clusters. The performance of each algorithm was assessed using the Dunn index. Gene Ontology (GO) term enrichment analysis was performed on each of the obtained clusters.
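To illustrate how a clustering can be scored, the sketch below computes the Dunn index in one common form: the smallest distance between two points in different clusters divided by the largest within-cluster distance (the cluster diameter). Other variants of the index exist, and the Euclidean distance, the function name `dunn_index` and the toy trajectories are assumptions made for this example only.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+

def dunn_index(points, labels):
    """Dunn index: min between-cluster distance / max cluster diameter."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    # Largest distance between two members of the same cluster.
    max_diameter = max(
        (dist(a, b)
         for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,
    )
    # Smallest distance between members of two different clusters.
    min_separation = min(
        dist(a, b)
        for m1, m2 in combinations(clusters.values(), 2)
        for a in m1
        for b in m2
    )
    if max_diameter == 0.0:
        return float("inf")  # every cluster is a single point or duplicates
    return min_separation / max_diameter

# Four toy trajectories, each summarized at three time points.
trajectories = [(0, 0, 0), (0, 1, 0), (5, 5, 5), (5, 6, 5)]
good = dunn_index(trajectories, [0, 0, 1, 1])  # compact, well separated
bad = dunn_index(trajectories, [0, 1, 0, 1])   # mixes the two groups
```

A higher value indicates more compact and better separated clusters, so under this criterion the first partition would be preferred. The same score could be computed for each algorithm and each number of clusters from two to nine.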
Fig 4.
Clustering of filter ratios on proteomic datasets.
Scatterplots of the filter ratios RT (x-axis) against RI (y-axis) for A) the iTraq breast cancer dataset and B), C) the iTraq kidney rejection dataset for the Allograft Rejection (AR) and Non-Rejection (NR) groups, respectively. Colors indicate clusters from a 2-cluster model-based clustering: red squares mark molecules that cluster as ‘informative’ and remain in the analysis, and blue circles mark ‘non-informative’ molecules that are removed prior to analysis.
Fig 5.
Filtering ratios of the Mus musculus data.
The filter ratios RT and RI were calculated for every molecule. Colors indicate in A) the -log10(p-values) for differential expression over time and in B) the proportion of missing values. C) shows the data after discarding profiles with more than 50% missing values, with colors as in A).
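The missing-value step behind panel C) can be sketched as below. This is an illustrative example under stated assumptions: each profile is a flat list of measurements with `None` marking a missing value, and the names `missing_proportion`, `discard_sparse` and the molecule labels are hypothetical.

```python
def missing_proportion(values):
    """Fraction of measurements in a profile that are missing (None)."""
    return sum(v is None for v in values) / len(values)

def discard_sparse(profiles, max_missing=0.5):
    """Keep profiles whose missing proportion does not exceed max_missing."""
    return {
        name: values
        for name, values in profiles.items()
        if missing_proportion(values) <= max_missing
    }

# Toy data: three molecules measured at four time points.
profiles = {
    "mol_a": [1.2, 1.4, 1.9, 2.3],      # complete
    "mol_b": [0.8, None, None, 1.1],    # exactly 50% missing, kept
    "mol_c": [None, None, None, 0.5],   # 75% missing, discarded
}
kept = discard_sparse(profiles)
```

With the default threshold, a profile with exactly 50% missing values is retained and only profiles above that proportion are dropped.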
Table 1.
Types of models used to summarize profiles.
The number (proportion) of profiles modelled with each model selected by our proposed LMMS approach. Models are abbreviated as linear (LIN), spline (SPL), subject-specific intercept (SSI), and subject-specific intercept and slope (SSIS). Models were applied to cell line breast cancer data (Cell), Saccharomyces paradoxus evolution data (Yeast), Mus musculus chemotherapy data (Mouse), and Homo sapiens kidney rejection Non-Rejection (NR) data (Human). The row ‘Removed’ indicates the percentage of profiles filtered out using the 2-cluster model-based clustering on RT and RI.
Fig 6.
Clustering of the iTraq breast cancer dataset.
Clustering was performed on the summarized profiles obtained from A) Linear Mixed Model Spline (LMMS), B) Derivative Linear Mixed Model Spline (DLMMS), C) mean and D) Smoothing Splines Mixed Effects (SME). The best clustering algorithm and the best number of clusters were chosen according to the Dunn index. In A), B) and D) we used hierarchical clustering and in C) Partitioning Around Medoids (PAM) clustering. The x-axis represents time (in hours) and the y-axis intensity in terms of log2 transformed protein abundance.
Table 2.
Averaged sensitivity for LMMSDE and LIMMA after 100 simulations. Differential expression between groups and/or time was tested with increasing noise and fold change (FC) levels.
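As a reminder of the quantity reported, the sketch below computes sensitivity in its standard sense (the fraction of truly differentially expressed molecules that a method calls significant) and averages it over simulation runs. The data layout, the function names and the two toy runs are assumptions for illustration and do not reproduce the simulation design of the study.

```python
from statistics import mean

def sensitivity(truly_de, called_de):
    """True positives divided by all truly differentially expressed molecules."""
    true_positives = len(truly_de & called_de)
    return true_positives / len(truly_de)

def averaged_sensitivity(runs):
    """Mean sensitivity over runs; each run is a (truly_de, called_de) pair of sets."""
    return mean(sensitivity(truth, called) for truth, called in runs)

# Two toy simulation runs with four truly differentially expressed molecules.
truth = {"a", "b", "c", "d"}
runs = [
    (truth, {"a", "b", "x"}),       # 2 of 4 detected, one false positive
    (truth, {"a", "b", "c", "d"}),  # all 4 detected
]
avg = averaged_sensitivity(runs)
```

False positives such as `"x"` do not affect sensitivity, so it is usually read alongside a measure of false discoveries.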
Table 3.
iTraq kidney rejection dataset: Gene Ontology (GO) term enrichment analysis.
GO term enrichment analysis based on the proteins identified by LMMSDE as differentially expressed between Allograft Rejection (AR) and Non-Rejection (NR) patients after filtering using a 2-cluster model-based clustering based on RT and RI. The top GO biological processes are listed along with their FDR-adjusted p-value and log odds ratio (OR).