PIIKA 2: An Expanded, Web-Based Platform for Analysis of Kinome Microarray Data

Kinome microarrays are comprised of peptides that act as phosphorylation targets for protein kinases. This platform is growing in popularity due to its ability to measure phosphorylation-mediated cellular signaling in a high-throughput manner. While software for analyzing data from DNA microarrays has also been used for kinome arrays, differences between the two technologies and associated biologies previously led us to develop Platform for Intelligent, Integrated Kinome Analysis (PIIKA), a software tool customized for the analysis of data from kinome arrays. Here, we report the development of PIIKA 2, a significantly improved version with new features and improvements in the areas of clustering, statistical analysis, and data visualization. Among other additions to the original PIIKA, PIIKA 2 now allows the user to: evaluate statistically how well groups of samples cluster together; identify sets of peptides that have consistent phosphorylation patterns among groups of samples; perform hierarchical clustering analysis with bootstrapping; view false negative probabilities and positive and negative predictive values for t-tests between pairs of samples; easily assess experimental reproducibility; and visualize the data using volcano plots, scatterplots, and interactive three-dimensional principal component analyses. Also new in PIIKA 2 is a web-based interface, which allows users unfamiliar with command-line tools to easily provide input and download the results. Collectively, the additions and improvements described here enhance both the breadth and depth of analyses available, simplify the user interface, and make the software an even more valuable tool for the analysis of kinome microarray data. Both the web-based and stand-alone versions of PIIKA 2 can be accessed via http://saphire.usask.ca.


PCA biosub
This directory will be present only if a file is uploaded for the "treatment-control combinations" field.
Contains files related to PCA of the various treatment-control combinations.
• PCA biosub.txt-A table in tab-delimited text format containing the values for the first three principal components for each treatment-control combination.
• PCA biosub.vrml-Contains a 3D visualization of the PCA for each treatment-control combination in Virtual Reality Modeling Language (VRML) format. To view this file, you can use a VRML viewer such as Instant Player (http://www.instantreality.org).
• PCA PC1 PC2.pdf-A two-dimensional scatterplot depicting the first two principal components, with the coordinates coming from PCA biosub.txt.
• PCA PC2 PC3.pdf-A two-dimensional scatterplot depicting the second and third principal components, with the coordinates coming from PCA biosub.txt.
• PCA PC1 PC2 PC3.pdf-A three-dimensional scatterplot depicting the first three principal components, with the coordinates coming from PCA biosub.txt.

biological reproducibility
This directory will be present only if the "Perform F test?" option is set to "Yes".
Contains files relating to the biological reproducibility of the array data (i.e. the consistency of the phosphorylation signal for each peptide in the different animals (biological replicates) for which the experiment was performed).
• F test consistent peptides.txt-For each peptide, its value will be "TRUE" if that peptide is consistent according to the F test for all treatments, and "FALSE" otherwise.
• F test pvalues.txt-Contains the P-value according to the F test for each peptide for each treatment.
• biological reproducibility summary.txt-Gives the number of peptides that were biologically consistent according to the F test for each treatment, as well as the range of values and average of these values.

distances
Contains files giving numeric representations of the similarity of pairs of samples.
• distances euclidean.txt-For each pair of samples, contains the Euclidean distance between that pair. Let n represent the number of peptides. Then the Euclidean distance is calculated as $\sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$, where $x_i$ is the averaged (among all technical and biological replicates) intensity level for peptide i for the first sample, and $y_i$ is the corresponding value for the second sample.
• distances pearson.txt-For each pair of samples, contains the value (1 - Pearson correlation) for that pair. This is calculated using the cor function in R with method = "pearson".
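The two distance measures described above can be illustrated with a minimal Python sketch. This is not PIIKA 2's own code (the text states that the Pearson value is computed with the cor function in R); the function names and the toy intensity profiles are hypothetical and are used only to show the arithmetic.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length intensity profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must cover the same peptides")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def pearson_distance(x, y):
    """1 - Pearson correlation between two equal-length intensity profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must cover the same peptides")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - cov / math.sqrt(var_x * var_y)


# Toy averaged intensities for four peptides (illustrative values only).
sample_a = [1.0, 2.0, 3.0, 4.0]
sample_b = [2.0, 4.0, 6.0, 8.0]

print(euclidean_distance(sample_a, sample_b))
print(pearson_distance(sample_a, sample_b))
```

In this toy case the two profiles are perfectly correlated, so the Pearson-based distance is zero even though the Euclidean distance is not, which shows why the two files can rank pairs of samples differently.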

distances biosub
This directory will be present only if a file is uploaded for the "treatment-control combinations" field.
Contains files giving numeric representations of the similarity of pairs of treatment-control combinations.
• distances biosub euclidean.txt-For each pair of treatment-control combinations, contains the Euclidean distance between that pair. Let n represent the number of peptides. Then the Euclidean distance is calculated as $\sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$, where $x_i$ is the averaged (among all technical and biological replicates) intensity level for peptide i for the first treatment-control combination, and $y_i$ is the corresponding value for the second treatment-control combination.
• distances biosub pearson.txt-For each pair of treatment-control combinations, contains the value (1 - Pearson correlation) for that pair. This is calculated using the cor function in R with method = "pearson".

distances significant
Contains files giving numeric representations of the similarity of pairs of samples, but taking into account only the peptides that have a statistically significant difference in phosphorylation for that pair.
• distances euclidean.txt-For each pair of samples, contains the Euclidean distance between that pair, taking into account only the peptides for which the P-value from the paired t-test is less than the user-specified threshold. So that different pairs of samples can be compared, this value is then normalized by the number of significant peptides for that pair. Let n represent the number of peptides for which the paired t-test gives a P-value less than the specified threshold. Then the normalized Euclidean distance is calculated as $\frac{1}{n}\sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$, where $x_i$ is the averaged (among all technical and biological replicates) intensity level for peptide i for the first sample, and $y_i$ is the corresponding value for the second sample.
• distances pearson.txt-For each pair of samples, contains the value (1 - Pearson correlation) for that pair. This is calculated using the cor function in R with method = "pearson". As with the Euclidean distance, only peptides for which the P-value from the paired t-test is less than the user-specified threshold are used in the calculation, and the resulting value is divided by the number of significant peptides so that different pairs of samples can be compared on the same scale.
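As a rough illustration of the restriction to significant peptides, the following Python sketch filters by P-value and then divides the Euclidean distance by the number of peptides retained. It assumes that "normalized by the number of significant peptides" means dividing the distance by n; the function name, the default threshold of 0.05, and the toy values are hypothetical and not taken from PIIKA 2.

```python
import math


def normalized_significant_distance(x, y, p_values, threshold=0.05):
    """Euclidean distance over significant peptides, divided by their count.

    x, y      -- averaged intensities per peptide for the two samples
    p_values  -- paired t-test P-value per peptide for this pair
    threshold -- significance cut-off (0.05 is an assumed default)
    """
    keep = [i for i, p in enumerate(p_values) if p < threshold]
    n = len(keep)
    if n == 0:
        return None  # no significant peptides, so the distance is undefined
    distance = math.sqrt(sum((x[i] - y[i]) ** 2 for i in keep))
    return distance / n


# Toy data: only the first two peptides pass the threshold.
x = [0.0, 0.0, 10.0]
y = [3.0, 4.0, 10.0]
p = [0.01, 0.02, 0.90]
print(normalized_significant_distance(x, y, p))
```

Returning None when no peptide passes the threshold is a design choice of this sketch; how PIIKA 2 reports such pairs is not stated in the text.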

distances biosub significant
This directory will be present only if a file is uploaded for the "treatment-control combinations" field.
Contains files giving numeric representations of the similarity of pairs of treatment-control combinations, but taking into account only the peptides that have a statistically significant difference in phosphorylation (after biological subtraction) for that pair.
• distances biosub significant euclidean.txt-For each pair of treatment-control combinations, contains the Euclidean distance between that pair, taking into account only the peptides for which the P-value from the paired t-test is less than the user-specified threshold. So that different pairs of treatment-control combinations can be compared, this value is then normalized by the number of significant peptides for that pair. Let n represent the number of peptides for which the paired t-test gives a P-value less than the specified threshold. Then the normalized Euclidean distance is calculated as $\frac{1}{n}\sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$, where $x_i$ is the averaged (among all technical and biological replicates) intensity level for peptide i for the first treatment-control combination, and $y_i$ is the corresponding value for the second treatment-control combination.
• distances biosub significant pearson.txt-For each pair of treatment-control combinations, contains the value (1 - Pearson correlation) for that pair. This is calculated using the cor function in R with method = "pearson". As with the Euclidean distance, only peptides for which the P-value from the paired t-test is less than the user-specified threshold are used in the calculation, and the resulting value is divided by the number of significant peptides so that different pairs of treatment-control combinations can be compared on the same scale.

hierarchical clustering
Contains files relating to the hierarchical clustering of the samples and peptides. These files are constructed using the distance metric and linkage method chosen by the user, with the defaults being (1 - Pearson correlation) and McQuitty linkage, respectively.
• bootstrap dendrogram.pdf-Contains a dendrogram depicting the hierarchical clustering of the samples, with bootstrap values as calculated using the R package pvclust.
• heatmap.pdf-Contains a heatmap wherein the columns represent samples, the rows represent peptides, and the color of each cell represents the degree of up-phosphorylation (red) or down-phosphorylation (green). The top dendrogram represents the clustering of the samples, and the left dendrogram represents the clustering of the peptides.
• heatmap.sample dendrogram.txt-A text-based version of the sample dendrogram depicted in the file heatmap.pdf.
• heatmap.peptide dendrogram.txt-A text-based version of the peptide dendrogram depicted in the file heatmap.pdf.

hierarchical clustering biosub
This directory will be present only if a file is uploaded for the "treatment-control combinations" field.
Contains files relating to the hierarchical clustering of the treatment-control combinations and peptides. These files are constructed using the distance metric and linkage method chosen by the user, with the defaults being (1 - Pearson correlation) and McQuitty linkage, respectively.
• bootstrap dendrogram biosub.pdf-Contains a dendrogram depicting the hierarchical clustering of the treatment-control combinations, with bootstrap values as calculated using the R package pvclust.
• heatmap biosub.pdf-Contains a heatmap wherein the columns represent treatment-control combinations, the rows represent peptides, and the color of each cell represents the degree of up-phosphorylation (red) or down-phosphorylation (green) after biological subtraction. The top dendrogram represents the clustering of the treatment-control combinations, and the left dendrogram represents the clustering of the peptides.
• heatmap biosub.sample dendrogram.txt-A text-based version of the sample dendrogram depicted in the file heatmap biosub.pdf.
• heatmap biosub.peptide dendrogram.txt-A text-based version of the peptide dendrogram depicted in the file heatmap biosub.pdf.

Contains files giving various intermediate results as the data are processed by PIIKA 2.
• step1 raw data.txt-Contains the raw intensity data for each peptide for each array (foreground and background values), identical to the file uploaded by the user in the "Main input file" field.
• step2 background corrected.txt-Contains the intensity value for each peptide for each array after subtracting the background from the foreground.
• step3 vsn.txt-Contains the normalized intensity value (normalization using the vsn method) for each peptide for each array.
• step4 rearranged.txt-Contains the same data as in step3 vsn.txt, except the matrix has been rearranged such that all of the intensity values corresponding to a particular peptide are in the same row.
• step5 averages.txt-Contains the average normalized intensity value for each treatment for each peptide.
• step5 averages.consistent.txt-Contains the average normalized intensity value for each treatment for each peptide that was consistent for all arrays according to the chi-square test (if applicable), and for all animals according to the F-test (if applicable).
• step6 biosub averages.txt-For each treatment-control combination, this matrix contains the subtracted value (average value for treatment minus average value for control) for each peptide. This file will be present only if a file is uploaded for the "treatment-control combinations" field.
• step6 biosub averages.consistent.txt-For each treatment-control combination, this matrix contains the subtracted value (average value for treatment minus average value for control) for each peptide that was consistent for all arrays according to the chi-square test (if applicable), and for all animals according to the F-test (if applicable). This file will be present only if a file is uploaded for the "treatment-control combinations" field.
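The background-correction, averaging, and biological-subtraction steps listed above can be sketched in Python as follows. This is a simplified illustration only: the vsn normalization and matrix rearrangement steps are omitted, and all function names, treatment names, peptide names, and values are hypothetical.

```python
from statistics import mean


def background_correct(foreground, background):
    """Step 2: subtract the background from the foreground for each spot."""
    return [f - b for f, b in zip(foreground, background)]


def average_by_treatment(replicates):
    """Step 5: average the replicate intensities per treatment and peptide.

    replicates -- {treatment: {peptide: [intensity, ...]}}
    """
    return {
        treatment: {pep: mean(values) for pep, values in peptides.items()}
        for treatment, peptides in replicates.items()
    }


def biological_subtraction(averages, treatment, control):
    """Step 6: average for the treatment minus average for the control."""
    return {
        pep: averages[treatment][pep] - averages[control][pep]
        for pep in averages[treatment]
    }


corrected = background_correct([1000.0, 1500.0], [200.0, 300.0])

# Already-normalized replicate intensities (made-up values).
replicates = {
    "treated": {"pepA": [2.0, 4.0], "pepB": [1.0, 1.0]},
    "control": {"pepA": [1.0, 2.0], "pepB": [3.0, 5.0]},
}
averages = average_by_treatment(replicates)
biosub = biological_subtraction(averages, "treated", "control")
print(corrected, averages, biosub)
```

Under this sketch, a positive subtracted value means the averaged signal is higher in the treatment than in its control, and a negative value means it is lower.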

scatterplots
For each pair of samples, contains a scatterplot depicting the averaged normalized intensity for each peptide for each sample in that pair.
• <sample1> vrs <sample2>.pdf-A scatterplot depicting the relationship between the averaged normalized intensity values for sample 1 and the averaged normalized intensity values for sample 2.

scatterplots biosub
For each pair of treatment-control combinations, contains a scatterplot depicting the averaged normalized intensity for each peptide for each treatment-control combination in that pair.
• <treatment-control combination1> vrs <treatment-control combination2>.pdf-A scatterplot depicting the relationship between the averaged normalized intensity values for the first treatment-control combination and the averaged normalized intensity values for the second treatment-control combination.

t-tests
Contains files relating to the statistical significance of differences in phosphorylation between each treatment and control.
• <sample1> vrs <sample2>.all.txt-A table in tab-delimited text format giving various statistical measures of the difference in phosphorylation of each peptide in sample 1 (treatment) versus sample 2 (control). The peptides are sorted in order of increasing P-value (where this P-value is the smaller of the P-value for up-phosphorylation or down-phosphorylation). The first row contains column headings, the meanings of which are described below.
-ID-The name of the protein from which the peptide is derived.
-Accession-The accession number of that protein.
-FC-The fold-change value for the peptide in the treatment versus the control.
-P up-The P-value for up-phosphorylation in the treatment compared to the control according to the paired t-test.
-P down-The P-value for down-phosphorylation in the treatment compared to the control according to the paired t-test.
-Beta up-The value of β for up-phosphorylation in the treatment compared to the control.
-Beta down-The value of β for down-phosphorylation in the treatment compared to the control.
-Negative predictive value up-The negative predictive value for up-phosphorylation in the treatment compared to the control.
-Negative predictive value down-The negative predictive value for down-phosphorylation in the treatment compared to the control.
• <sample1> vrs <sample2>.consistent.txt-The same as <sample1> vrs <sample2>.all.txt, except that it lists only peptides that are consistently phosphorylated in both the treatment and the control (if the χ² test was done), and which were consistently phosphorylated among the biological replicates for both treatment and control (if the F test was done). This file will be present only if one or both of the "Perform chi-square test?" or "Perform F test?" options are set to "Yes".
• <sample1> vrs <sample2>.volcano.pdf-A volcano plot, which is a scatterplot with fold-change values on the x-axis and P-values on the y-axis.
• <sample1> vrs <sample2>.positive predictive value.txt-Contains the positive predictive value for this treatment-control combination (which is the same for all peptides).
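A table of this kind can be post-processed with a few lines of Python. The sketch below is hedged: it assumes that the header row uses exactly the column names listed above ("ID", "P up", "P down"), which may differ in the actual files, and the function name, threshold, protein names, and numbers are invented for illustration.

```python
import csv
import io


def significant_peptides(table_text, threshold=0.05):
    """Return (ID, P-value) pairs whose smaller one-sided P-value is below threshold.

    The column names "ID", "P up" and "P down" are assumed from the
    description of the .all.txt file; adjust them to the real header row.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    hits = []
    for row in reader:
        p = min(float(row["P up"]), float(row["P down"]))
        if p < threshold:
            hits.append((row["ID"], p))
    return hits


# Made-up tab-delimited content standing in for a real .all.txt file.
example = (
    "ID\tAccession\tFC\tP up\tP down\n"
    "ProteinA\tACC0001\t2.10\t0.01\t0.99\n"
    "ProteinB\tACC0002\t-1.80\t0.97\t0.03\n"
    "ProteinC\tACC0003\t1.05\t0.40\t0.60\n"
)
print(significant_peptides(example))
```

Taking the smaller of the two one-sided P-values mirrors the sort order described for the .all.txt file.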

technical reproducibility
This directory will be present only if the "Perform chi-square test?" option is set to "Yes".
Contains files relating to the technical reproducibility of the array data (i.e. the consistency of the phosphorylation signal for identical peptides replicated multiple times on the same array).
• chi square test consistent peptides.txt-For each peptide, its value will be "TRUE" if that peptide is consistent according to the χ² test for all arrays, and "FALSE" otherwise.
• chi square test pvalues.txt-Contains the P-value according to the χ² test for each peptide for each array.
• technical reproducibility summary.txt-Gives the number of peptides that were technically consistent according to the χ² test for each array, as well as the range of values and average of these values.

random trees
This directory will be present only if the "Perform random tree analysis?" option is set to "Yes".
Contains files related to the random tree analysis described in the main paper, which seeks to answer the question, "Do the samples cluster together better than would be expected by chance?". These files are constructed using the distance metric and linkage method chosen by the user, with the defaults being (1 - Pearson correlation) and McQuitty linkage, respectively.
• heatmap random <n>.averages.txt-For the nth random dendrogram, contains the randomly-rearranged matrix used to generate that dendrogram.
• heatmap random <n>.sample dendrogram.txt-For the nth random dendrogram, contains a text-based version of that dendrogram.
• heatmap random tree pvalue.txt-Contains the P-value, which indicates how likely it is that clustering at least as good as that of the actual tree (the dendrogram found in the hierarchical clustering directory) would be obtained by chance. The P-value is calculated as the proportion of random trees that got scores equal to or greater than the score for the actual tree.
• heatmap random tree scores.txt-Lists the score associated with each random tree.
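The P-value calculation described above (the proportion of random trees scoring at least as well as the actual tree) can be written as a short Python sketch. The tree-scoring function itself is described in the main paper and is not reproduced here; the function name and the scores below are hypothetical.

```python
def random_tree_p_value(actual_score, random_scores):
    """Proportion of random trees with a score >= the actual tree's score."""
    if not random_scores:
        raise ValueError("at least one random tree score is required")
    at_least_as_good = sum(1 for s in random_scores if s >= actual_score)
    return at_least_as_good / len(random_scores)


# Ten made-up random tree scores and a made-up score for the actual tree.
scores = [1, 9, 8, 3, 2, 5, 4, 7, 6, 10]
print(random_tree_p_value(8, scores))
```

With this definition, a small P-value means that few random rearrangements cluster the samples as well as the real data do.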

peptide subset analysis
This directory will be present only if the "Perform peptide subset analysis?" option is set to "Yes".
Contains files related to the peptide subset analysis described in the main paper, which seeks to answer the question, "What subsets of the peptides give perfect or near-perfect clustering of the samples?". These files are constructed using the distance metric and linkage method chosen by the user, with the defaults being (1 - Pearson correlation) and McQuitty linkage, respectively.
• best set <n>.heatmap.pdf-Contains a heatmap generated using the n peptides found to have the best tree score.
• best set <n>.peptides.txt-Contains the n peptides found to have the best tree score.
• best set <n>.sample dendrogram.txt-Contains a text-based version of the sample dendrogram generated using the n peptides found to have the best tree score.
• best set <n>.score.txt-Contains the best tree score when using n peptides.