Time course analysis of large-scale gene expression in incised muscle using correspondence analysis

Studying the time course of gene expression in injured skeletal muscle would help to estimate the timing of injuries. In this study, we investigated large-scale gene expression in incision-injured mouse skeletal muscle by DNA microarray using correspondence analysis (CA). Biceps femoris muscle samples were collected 6, 12, and 24 hours after injury, and RNA was extracted and prepared for microarray analysis. On a 2-dimensional plot by CA, the genes (row score coordinate) located farther from each time series (column score coordinate) had more upregulation at particular times. Each gene was situated in 6 subdivided triangular areas according to the magnitude of the relationship of the fold change (FC) value at each time point compared to the control. In each area, genes for which the ratios of two particular FC values were close to 1 were distributed along the two border lines. There was a tendency for genes whose FC values were almost equal to be distributed near the intersection of these 6 areas. Therefore, the gene marker candidates for estimation of the timing of injuries were detectable according to the location on the CA plot. Moreover, gene sets created by a specific gene and its surrounding genes were composed of genes that showed similar or identical fluctuation patterns to the specific gene. In various analyses on these sets, significant gene ontology term and pathway activity may reflect changes in specific genes. In conclusion, analyses of gene sets based on CA plots is effective for investigation of the time-dependent fluctuation in gene expression after injury.


Introduction
The time course of wound healing or the estimation of wound age is one of the most important research subjects in forensic pathology. A number of research projects have been performed, and the majority of them have dealt with the timing of dermal injuries [1]. Skeletal muscles are, like skin, distributed throughout the whole body, and often affected in cases of fatal and serious injuries such as stab wounds and incisions. In skin wound models, different molecular diagnostic techniques have been used to evaluate the usefulness of many markers for estimating wound age, including cytokines and chemokines. However, the aging of wounds of skeletal a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 muscle caused by sharp force injuries has not been well studied [2]. In previous studies, we devised a mouse skeletal muscle incision injury model and investigated the time-dependent dynamics of some cytokines that were selected based on the results of DNA microarray analysis of specimens 12 h post-injury [2,3]. RNA and protein expressions of the specific cytokines during 6 to 48 h after injury were examined using quantitative reverse transcription PCR (qRT-PCR) and bead-based immunoassay, and some of those molecules were considered as possible markers for estimating wound timing [2,3]. However, microarray analysis showed that an enormous number of biochemical markers were expressed during the early phase of wound recovery. The time course of the vast majority of these genes and the interactions between them remain unclear.
Correspondence analysis (CA) of gene expression has been employed in several microarray data analyses. Fellenberg et al. [4] obtained microarray data from Spellman et al. [5] in which they arrested the cell cycle using several experimental methods and analyzed the relationship between gene expression and each method by CA. The results showed that CA could reveal both relationships among either genes or hybridizations and between genes and hybridizations. In their insulin administration study on diabetic patients and normal controls, Tan et al. [6] found that CA of microarray data from Hansen et al. [7] could successfully divide each time point score into components dependent and independent on the disease status. The purpose of CA is to convert into a simpler data matrix without losing important information from the original data, to clarify the structure of a complex data matrix, and further to present the processing result visually [8]. In microarray research, CA can summarize data of each gene (rows) of each sample (columns) of originally high-dimensional data matrices in a low-dimensional projection as well as principal component analysis (PCA) [9]. CA forms a biplot in which rows and columns are simultaneously projected to subspaces of two or more dimensions, which reveals the association between them.
In this study, we obtained microarray data of incision injury samples of mouse skeletal muscle at 3 different time points post-injury. To visualize the time course fluctuation in gene expression on a plot, and to examine large-scale data using various analytical methods, CA was carried out on microarray data that was converted to a matrix (data type is fold change (FC) values) consisting of data of each time point (columns) by each gene (rows) as variables. Clustering a large number of genes should enable further exploration of injury time markers.

Animal treatment for DNA microarray
Muscle samples were obtained and processed as described in a previous report [2]. Pathogenfree 8-week-old male BALB/c mice were divided into 4 groups (control, 6, 12, and 24 hours (h) after injury: n = 4 each). After nasal anesthesia of mice with isoflurane, the skin on the dorsal side of the left hind limb was shaved, and about a 1-cm incision was made on the skin using sterile straight stainless-steel scissors. Subsequently, a 5-mm incision was made in the biceps femoris muscle, and then the skin incision was closed using a silk suture. After surgery, the animals were allowed free access to food and water. At 6, 12, and 24 h after injury, mice were euthanized with a high concentration of carbon dioxide gas, and then a 3-mm thick sample of injured muscle tissue (about 30 mg) with the injury in the center was excised. As a control sample, biceps femoris muscle was collected from an uninjured mouse that was euthanized without making the injury. The animal experiment was approved by the Nagoya City University (NCU) animal ethics committee (authorization numbers: H25M-22 and H26-M02), and conducted according to the principles of laboratory animal care, and the guidelines for animal experimentation, NCU [10,11].

RNA extraction and DNA microarray
RNA was extracted as described in a previous report [2]. The samples were homogenized using a Taitec bead crusher (TAITEC Co., Saitama, Japan) at 2,500 rpm for over 30 sec. Samples of 6 and 24 h were outsourced to perform the microarray analysis (Oncomics Co. Ltd., Nagoya, Japan). Data of the microarray analysis of the control and 12 h groups had been collected in a previous study [2]. The RNA samples subjected to the present microarray hybridization had a concentration in the range of 29.78 to 281.45 ng/μL. The microarray was scanned using a DNA microarray Scanner (G2505C; Agilent Technologies, Santa Clara, CA).

Normalization and quality control
Microarray data were deposited in the GEO database (NCBI Accession number: Series GSE140517). The signal value of each gene was normalized in the following four steps (TOHOKU CHEMICAL Co., Aomori, Japan). 1) When the signal value was lower than background (negative value), it was adjusted to 1, which meant that gene did not express. 2) The geometric mean of the values of 4 samples at each time was calculated as the representative value of the gene. 3) The signal values were converted to a base 2 logarithm. 4) In order to correct experimental errors between microarrays, the value of the 75th percentile of all gene signals was subtracted from that of each gene with respect to each time point under the assumption that the expression levels of most genes did not fluctuate. Quality control of microarray signal data was performed using settings recommended by Agilent Technologies. In addition, based on the flag information output from Feature Extraction Software v11 (Agilent Technologies), the features were evaluated with five flags, namely "saturated", "uniform", "positive and significant", "well above background", and "population outlier". The results were interpreted as "Not Detected (NDt)" when the flag was "not positive and significant" or "not above background", and as "Compromised (Cm)" when it was "saturated", "not uniform", or "population outlier". All other flags were considered to be compatible with "Detected (Dt)". The feature of a gene was determined as "Dt" only when flags of all 16 arrays (4 samples × 4 time points) were "Dt", and as "NDt" when at least one "NDt" was included in those of all arrays. Also, if at least one "Cm" flag was present, the feature was determined as "Cm". Genes containing only the features determined as "Dt" and "NDt" were used for analysis. The FC of gene expression was calculated using normalized non-logarithmic signal values of each gene of the control and each time point [12].
control. Fisher's exact test (one-sided test) was employed to examine whether the extracted gene sets contained significant numbers of genes prepared from known information (GO terms). Also, the similarity between the extracted gene sets and the gene lists classified according to Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway information was investigated in the same manner. Furthermore, differences between the average fluctuation of expression of all genes included in each of the KEGG pathways and that of all genes that passed quality control were statistically examined with parametric analysis of gene set enrichment (PAGE) [13].

Corresponding analysis and distance calculation
The intensity ratio between the control and each time was calculated using the signal values of the genes that passed quality control (TOHOKU CHEMICAL). An arctangent function was applied to the reciprocal of this ratio [9], which was converted to the radian value in a range of 0 to π/2. Compared to the conventional logarithmic transformation, this conversion method reduces the variance when the intensity ratio is > 10 or < 0.1. The radian value was further converted to degrees (0˚to 90˚). A matrix (data in degrees) of which variables consisted of 3 time series by each gene was prepared, and a biplot was created with scores of two principal components of each gene and time series obtained by CA as two-dimensional coordinates (x, y). Furthermore, two kinds of (Euclidean) distances from each coordinate of the biplot were calculated: Distance 1: The distance between each gene and each time series score; Distance 2: The distance between each gene and the top 5 query genes whose expression was upregulated or downregulated the most at each time point (Table 1). Several sets of top 100, 300, and 1,000 genes close in distance to the principal components score of each time series or query gene were arranged under each setting, and GO and pathway analyses were performed as described above.
In all statistical tests, the obtained results were subjected to multiple test correction according to the Benjamini-Hochberg method, and differences were considered to be significant when the corrected p < 0.05 [12].

DNA microarray analysis
As a result of the quality control, microarray data of a total of 55,527 genes were available for further analyses. At 6 h post injury, the expressions of 7,212 genes were significantly upregulated compared with that of the control (0 h), as were 5,361 at12 h, and 6,675 at 24 h. The numbers of significantly downregulated genes were 11,746 at 6 h, 18,956 at12 h, and 12,421 at 24 h. The number of genes showing each fluctuation pattern among 27 categories according to the fluctuation direction (up-or downregulated, or unchanged) at 3 time points is listed in Table 2. The most common pattern was that of insignificant fluctuation throughout the time, which included 22,872 genes or about 40% of all genes. As for genes with more than 3-or 5-fold change, however, most of them were upregulated. The number of genes with more than 3-or 5-fold downregulation was small until 12 h post-injury, then increased at 24 h (Table 3).

Gene ontology analysis and pathway and gene set analysis
The most expressed category in the upregulated gene sets was GO terms in "biological process (BP)" followed by "cellular component (CC)" and "molecular function (MF)" (Table 4, S1 Table). The downregulated sets had smaller number of significant GO terms than the upregulated sets, although they were increased in number at 24 h. We mainly focused on the GO terms that were related to the processes of inflammation and wound healing involving myoblasts because we assumed that making the incision would initiate such processes. The "BP" category in the upregulated set mainly showed GO terms associated with inflammatory response, cytokines, and wound healing ( Table 5). The GO terms associated with skeletal muscle and myoblasts (e.g., GO: 0035914, GO: 0045445) were significant mainly among the 3or 5-fold upregulated set at 6-12 h. For example, Myod1, which is a gene involved in myotube differentiation in mice and was annotated with GO terms as described above [14], was more than 5-fold upregulated at 6 and 12 h. GO terms that were significant in "CC" and "MF" categories were primarily associated with cell membranes, or receptor activities and binding of cytokines, chemokines, and immunoglobulins. Pathways with significant differences were also predominantly selected in the upregulated gene sets (Table 4), and they were mainly related to inflammatory reactions and cytokines such as mmu04060 (Cytokine-cytokine receptor interaction) (S1 Fig). Several infection pathways such as mmu05140 (Leishmaniasis) were also involved. GO terms that were significant at all time points in both more than 3 and 5-fold upregulated sets included 647 of "BP", 17 of"CC", and 32 of "MF", whereas 17 pathways were shared by both sets (Table 6). Again, most of them were GO terms associated with inflammatory response, cell membrane, and cytokine activity, and other pathways related to inflammatory response. In the downregulated sets, significant pathways were only found in the > 3-fold set at 24 h and were mainly associated with myocardial disease and the endocrine system. Many pathways associated with the inflammatory response were indicated as significant also by PAGE at all time points.

Corresponding analysis
Row scores of 55,527 genes by column scores of 3 time series were computed by CA, and a biplot was created (Fig 1). The cumulative contribution rate reached 100% up to the second principal component (Table 7). Assuming a triangle formed by three time series coordinates (t6-24h), all genes were distributed in an overlapping, similar, and enlarged triangular area   (tAg). Genes of which the FC value was larger than 1 at each time point were relatively evenly distributed within tAg (Fig 2a-2c). However, most genes with an FC value of 5 or more at each time point tended to gather near the vertex of tAg. On the other hand, genes of which the FC value was smaller than 1 at each time point appeared to be distributed around the particular time series scores (Fig 2d-2f). In particular, most genes with an FC value of 0.2 or lower at each time point were localized to a narrower range around the time series score than genes with an FC value of 0.2 to 1. Genes formed fairly clear clusters on the plot according to whether each of 3 FC values was greater than 1 (S2 Fig). In other words, all genes were classified into 27 categories according to the fluctuation patterns at each time point. Genes significantly upregulated at a particular time point were distributed throughout the region of tAg, but appeared to be gathered around the other two time series scores (Fig 3a-3c). In contrast, those that were significantly downregulated at a particular time point were predominantly gathered around the same time series scores (Fig 3d-3f). There were more downregulated genes than upregulated ones at each time point, but the distribution areas were narrower in the downregulated genes. In addition, genes that were upregulated significantly at all time points were distributed all around tAg, whereas genes without fluctuation throughout time points were mainly distributed within the inscribed circle of tAg (Fig 4a and 4b). Genes that were downregulated at all time points were mainly Table 4. The number of GO terms belonging to each GO category and pathways that were significant in the GO and pathway analysis of each set with more than 3or 5-fold fluctuation.

Category
Fold Down 0 0 0 � The number of significant terms or pathways in the GO and pathway analysis of combined sets of the up-and downregulated genes at each time point ("Total" set in Table 3).
https://doi.org/10.1371/journal.pone.0230737.t004 Table 5. The number of genes belonging to specific GO terms in the GO analysis of each set with more than 3-or 5-fold fluctuation.

GO ID GO term 3-fold 5-fold
Both � Upregulated Downregulated Both Upregulated  distributed in the narrower region inside t6-24h (Fig 4c). Because genes that were significantly downregulated or without fluctuation throughout time were distributed at the center of tAg, there was a tendency for many genes with large FC values at each time point to be distributed near the vertex. However, some genes even with small FC values were also located near the vertex. In addition, genes showing other fluctuation patterns were distributed in a characteristic manner. For example, there were some genes that were significantly upregulated at two time points and unchanged at the other time point, or had one time point each where they were significantly upregulated, downregulated, and without fluctuation. Among them, genes with the same fluctuation direction were plotted at symmetrical positions to each other across the center of tAg according to the combination of fluctuation behavior at each time point (Fig 4d and  4e and S3 Fig).

Downregulated 6 h 12 h 24 h 6 h 12 h 24 h 6 h 12 h 24 h 6 h 12 h 24 h 6 h 12 h 24
Based on a report showing that genes with similar expression dynamics tend to be located close to each other on a CA plot [9], the regularity of the distribution was examined in detail.  On the plot, each gene was situated in 6 subdivided triangular areas according to the magnitude relationship of the FC value at each time point (Fig 5a). Each region appeared to be demarcated by 3 straight lines that intersect at one point. The genes for which the FC value at 12 h postinjury (FC12) were smallest and those of 6 h (FC6) were largest (FC12 < FC24 < FC6) were distributed in an area including the intersection coordinates (0, 0) of the first and second principal components; therefore, the 3 lines did not pass through the origin. In each area, genes for which the ratios of two adjacent FC values were close to 1 were distributed along the two border lines (S4 Fig). For example, in the region in which genes with FC6 < FC12 < FC24 were distributed, there was a tendency for genes of which FC6/FC12 was close to 1 to be located at the left end, while those for which FC12/FC24 was close to 1 were at the right end of the region (Fig  5b-5d). Moreover, genes with FC6/FC24 � 1, namely FC6 � FC12 � FC24, were distributed near the intersection of the two straight lines forming the area of A (Fig 5e). As shown in Fig 5a, six subdivided regions of A to F were considered to be separated by three straight lines. Thus, to draw these three approximate straight lines, six genes for which the ratio of two FC values was sufficiently close to 1 were selected under the assumption that they should distribute at the end of each region (Table 8). Based on the coordinates of these genes, three approximate straight lines were created (Fig 6). The detailed relationship between each straight line and each gene was examined and is described in the S1 Appendix.

Analyses of genes close to 3 time series scores (distance 1)
In the gene sets close to the column score coordinates of 6 h, most genes were significantly downregulated or unfluctuating at 6 h (Table 9). There were no upregulated genes except for those with persistent upregulation throughout time. In the top 1,000 set, there were 186 genes whose FC6 > 1, and all of them satisfied FC12 > 1 and FC24 > 1 (Table 10). A similar tendency was observed for the genes sets close to the 12 and 24 h column score coordinates. In each set, significant GO terms and pathways were generally sparse. Among them, relatively many GO terms associated with the organelles in the "CC" category and metabolic processes in the "BP" category were found in the top 1,000 set for 6 h (Table 11). There were no significant pathways in any sets of 6 to 24 h.

Analyses of genes close to query gene scores (distance 2)
The top 5 genes whose expression (FC value) was upregulated or downregulated the most at each time point were selected as query genes (Table 1). For the query genes at each time point, upregulated genes were mainly distributed outside t6-24h on the CA plot, whereas downregulated genes were mainly distributed inside it (Fig 7a-7c). In the query gene upregulated at 24 h, 2 genes of Slpi and Saa3 and 3 genes of S100a8, Cd300lf, and Cxcl5 were distributed apart across the horizontal axis. Although the difference in FC value between Saa3 and S100a8 was smaller than that between Slpi and Saa3 (Table 1), the distance to Saa3 was closer for Slpi than for S100a8 on the plot. This was because the FC values of Slpi and Saa3 satisfied FC6 < FC24 < FC12 (area of B of the Fig 7), while the other 3 genes satisfied FC24 < FC12 < FC6 (area of  Table 1). Table 12 shows the number of GO terms and pathways that were significant in all query gene sets. Similar to the more than 3 and 5-fold-upregulated sets, GO terms related to inflammatory responses and cytokines were significant in many upregulated query gene sets such as Cxcl5. The significant GO terms of "BP" and "MF" series of Slpi and Saa3 were fewer compared with other upregulated query genes, and those of "CC" was relatively large in number (Table 12). Among the sets of the downregulated query genes, there were smaller number of significant GO terms than those of the upregulated genes, at most 154 in the top 1,000 set of Tmem 233. Pathway analysis also detected those related to inflammatory responses and cytokines in many upregulated query gene sets. However, few pathways were detected with the set of Slpi, Saa3, and downregulated query genes. The details on each query genes were examined and described in the S2 Appendix.

Discussion
In previous studies, the post-injury dynamics of 10 cytokines that were selected based on the data from a microarray analysis at 12 h post-injury were investigated with qRT-PCR from 6 to  Table 2). a-c) Gray dots indicate the coordinates of genes with significant upregulation (a), and without fluctuation (b), and with downregulation (c) throughout the time course (presented as (U, U, U), (-, -, -), and (D, D, D) in Table 2, respectively). U, -, and D in the parenthesis are fluctuation statuses at, 6, 12, and 24 h in order. Genes that were upregulated significantly at all time points were distributed over the whole tAg region. Genes without fluctuation throughout the time course were mainly distributed within the inscribed circle of tAg. Genes that were downregulated at all time points were mainly distributed in the narrower region inside t6-24h. 24 or 48 h after injury using the same mouse incised wound model [2,3]. The results of the microarray analyses of the particular genes in this study were essentially consistent with those of qRT-PCR. Thus, we believe that the microarray for each time point in this study had fairly reliable reproducibility.
The basic purpose of most microarray analyses is to elucidate biological processes or pathways that consistently show differential expression between groups of samples [15]. GO annotation and pathway information are important tools in the analysis of microarray experimental  results [16][17][18][19][20]. In many cases, researchers select genes that are upregulated or downregulated more than a specific threshold in a microarray analysis, then perform GO and pathway analysis [20][21][22][23][24][25][26]. However, Yang et al. [27] suggested that genes fluctuating less at a significant level could also be of importance for the understanding of specific reactions. Shen et al. [20] Fig 6. Three straight lines, 1 to 3, calculated from the row scores of the specific genes (see Table 8 Table 9. The number of genes with each fluctuation pattern in the top 100 to 1000 gene sets whose distance was close to the column score of 6 h on the CA plot. performed PCA to visualize time-dependent expression pattern of microarray data of tibialis anterior muscle after peripheral nerve injury in rats. On the other hand, Yano et al. [9] employed CA for microarray analysis to estimate genes related to breast cancer or housekeeping genes by measuring the distances between each gene and artificial marker genes on CA plots, and performed GO analysis on the genes related to breast cancer. They considered that it was better to evaluate genes detected from the entire microarray dataset than to detect and evaluate candidate genes using thresholds. They showed also that up-or down-regulated genes could be predicted only with CA using the arctangent function, and that PCA was not appropriate for the clustering of genes according to their expression patterns using any index [9]. In this study, genes were successfully clustered based on the magnitude relationship of FC values of 3 time series with a CA plot, which effectively visualizes the data of time course experiments. GO and pathway analyses of the query gene set based on the CA plot revealed GO terms and pathways that many genes showing fluctuation patterns similar to the query genes belong to. Most of the upregulated genes shown in Table 1 are involved in inflammation or other immune system functions [28][29][30][31][32][33][34][35][36][37][38][39][40][41][42][43][44]. The large FC values of these genes in our microarray results were considered to be consistent with their bioactivity after injury (except for Gm5483, which is less well studied than other cytokines). In the top 100 to 1,000 gene sets created by designating Cxcl5 as a query gene, there were two types of genes in the set that were included in gene lists of the significant GO terms and pathways, since Cxcl5 is located close to a borderline and around the vertex of tAg as seen in Fig 6. Although the number of genes was small near the vertex of tAg, many GO terms and pathways were significant. This was due to the paucity of significantly downregulated genes. Most of the genes in the set were those with relatively large upregulation at each time point (Figs 2, 3 and 4a-4c). Therefore, almost all GO terms and pathways that were significant in these gene sets were considered to be upregulated

PLOS ONE
Correspondence analysis of gene expression in incised muscle at each time point. However, there were few significant GO terms and pathways in the Ly6f sets that were mapped near the vertex of tAg (Fig 7d and Table 12), suggesting that there were few GO terms and pathways to which the gene with the highest expression at 12 h belonged. On the other hand, the top 300 set of Slpi around which more genes were located than the vertex areas was composed of genes of only a single area (area B (FC6 < FC24 < FC12) shown in Fig 6). Even in the center of tAg, unless a gene located very close to point P is selected as the query gene, the gene set will not include more than two areas in Fig 6. In the central part of tAg, we assumed that many genes had few or no GO term annotations, or that these genes had various and divergent GO term annotations. Therefore, the number of GO terms that were significant in the gene set at the central part of tAg was small, but those terms could have various characteristics in each gene set. Analysis of gene sets based on querying genes can reveal GO terms and pathways that contain many genes that show a similar or identical fluctuation pattern as the query gene. Many of the upregulated query genes were located in area D on the CA plot (Table 1, S6 Fig). Among those genes, Cxcl5 was located close to straight line 1 and far from straight lines 2 and 3 on the CA plot (S6 and S7 Figs). Because each line divided the magnitude of the relationship of the FC values, we estimated that the values of FC6/FC24 and FC12/FC24 of Cxcl5 were large, whereas that of FC6/FC12 was small (In fact, their values were 13.9, 9.9, and 1.4, respectively, calculated from Table 1). Therefore, Cxcl5 and the genes located around it on the CA plot can be gene markers that were upregulated significantly in the early phase after injury.
FC6 of Gm5483, Ccl4, and Il-1β was smaller than that of Cxcl5, but their FC6/FC24 and FC12/FC24 were larger than those of Cxcl5, reflecting their farther location from straight lines 2 and 3 than Cxcl5 (S7 Fig). Ccl4 is involved in myoblast proliferation after skeletal muscle injury [45]. Il-1β is a cytokine produced by macrophages and myogenic cells [46,47], and is upregulated after tissue injury [32][33][34]. Gm5483, a synonym for Cstdc4, had a small number of GO terms such as "cysteine-type endopeptidase inhibitor activity" [14], but its function in wound healing has not been studied in detail. Considering that its FC value was maintained more than 100 even at 24 h after injury, Gm5483 was seemed to play a certain role in both the inflammatory response of early phase and the muscle repair of late phase after injury. Because of their high FC6 to FC24 ratio, these genes may be useful for distinguishing different times after injury. On the other hand, S100A8 and Cd300lf were located closer to straight lines 2 and 3 than Cxcl5 on the CA plot (S8 Fig), and the FC6/FC24 or FC12/FC24 of these genes was small compared to Cxcl5. S100A8 is expressed in differentiating suprabasal wound keratinocytes [48], and S100A8 mRNA levels are upregulated in skeletal muscle tissue during both Il-6 Table 12. The number of GO terms belonging to each GO category and pathways that were significant in the GO and pathway analysis of the top 100 to 1000 gene sets whose distances were close to each query gene on the CA plot. Biological process  Cellular component  Molecular function  Pathway   100 set  300 set  1000 set  100 set  300 set  1000 set  100 set  300 set  1000 set  100 set  300 set  1000 set   Cxcl5  468  391  1374  0  5  14  23  27  100  18  15  30   Gm5483  468  391  1370  0  5  14  23  27  100  18  15  30   Ccl4  468  390  1368  0  5  14  23  27  100  18  15  30   Il-1β  468  391  1370  0  5  14  23  27  100  18  15  30   S100a8  247  505  1266  0  4  13  4  27  92  1  5  17   Clec4d  47  310  1351  0  2  14  7  20  100  1  5  infusion and exercise [49]. Also, Cd300lf induces cell death and promotes phagocytosis [30,31]. These genes may be early, upregulated markers in wounds, although they are unsuitable for distinguishing wounds created within 24 h. Clec4d, which is expressed in monocytes and can induce phagocytosis and proinflammatory cytokine production [28], was located in area C, but close to S100a8 and Cd300lf in area D on the CA plot (S8 Fig), and the values of FC6/FC24 and FC12/FC24 were also small. Therefore, Clec4d may be a gene marker, similar to S100a8 and Cd300lf. Slpi located in area B is expressed by macrophages and neutrophils [36]. The absence of Slpi leads to delayed wound healing, an increased and prolonged inflammatory response, enhanced elastase activity, and delayed matrix accumulation [37]. Saa3 is mainly secreted from the liver and macrophages, stimulates toll-like receptor 4 activity, and induces NF-κB activation, a hallmark of inflammation [41][42][43]. On the CA plot, these genes were located near point P compared to the other upregulated query genes (S9 Fig), and each FC value had a relatively small difference. Slpi and Saa3 may be late markers that were upregulated in the wounds. However, these genes may be more useful as negative markers at the time point when expression returns to baseline. Further microarray investigations after 24 h are needed.

Query gene
In addition, Cd72, which was not designated as a query gene in this study, was located in area A at coordinates (−0.014, 0.558) on the CA plot and is a negative regulator of B-cell responsiveness (S9 Fig) [50]. Cd72 was located near the intersection of the line connecting Arg1-Ly6f and line 1 and far from straight lines 2 and 3, similar to Cxcl5. However, the foldvalues of FC6, FC12, and FC24 were 0.485, 0.611, and 5.266, respectively, which were opposite those of Cxcl5. Therefore, Cd72 and the genes located around it on the CA plot may be gene markers that were upregulated significantly in late wounds. Furthermore, Cd72 was downregulated at 6 and 12 h, unlike Cxcl5, which was upregulated at all time points, suggesting that Cd72 can be used as a marker that is downregulated early after injury. Cd72, a negative regulator of B-cell responsiveness, would have been downregulated until 12 h to promote the inflammatory response, and then turned to upregulation at 24 h to inhibit the excessive inflammatory response. Thus, the CA plot helps identify gene marker candidates that are useful for estimating the timing of injuries.
Most genes that showed a more than 5-fold upregulation in expression had GO terms associated with inflammatory reactions, cytokines, and wound healing. Among them, the GO terms associated with skeletal muscle and myoblasts were mainly significant at 6-12 h of more than 3-fold upregulated sets, which would be an early phase reaction to skeletal muscle injury. On the other hand, some GO terms such as muscle cell differentiation (GO: 0042692) were significant in the downregulated sets. This GO term was not significant in the more than 3-fold upregulated set at 12 h, which contained 47 genes belonging in the term, whereas it was significant in the more than 3-fold downregulated set containing 35 genes (Table 5). This would be due to the difference in the total number of genes in the sets; therefore, it was somewhat difficult to determine whether upregulation or downregulation was dominant. Nonetheless, at least for muscle-related GO terms, the downregulated genes appeared to increase in number at 24 h. In the "CC" category, GO terms associated with the cell membrane were significant, which should reflect, for example, cytokine receptor activation of the cell membrane by inflammation. Also, the GO terms and pathways that were significant in common with all upregulated sets suggests that many genes associated with inflammatory reactions and cell membranes maintained high expression (more than 5-fold) up to 24 h after injury. Only a small number of significant GO terms were present in the downregulated sets; several reasons for this are possible (Table 4). Some sets contained many genes with few or no GO term annotations in the set, or had various and divergent GO term annotations. There were also fewer significant pathways in the downregulated sets, presumably because for similar reasons as for the GO terms (Table 4). In the gene set that showed more than 3-fold downregulation at 24 h, pathways related to myocardial disease and the endocrine system were significant. Many genes related to or characteristic of skeletal muscle injury were also included in the myocardial disease pathway. However, this pathway was only significant at 24 h and is thus less important for the investigation of gene expression after skeletal muscle injury.
Although some GO terms and pathways were significant at a specific time point, when considering the fold changes of gene sets, we consider that analysis with these sets will be insufficient for studying the time-dependent dynamics of gene expression. In GO and pathway analyses of gene sets that changed more than 3-and 5-fold, each analysis was performed on FC values at a single time point. However, some genes were not statistically significant at that time point even though the FC value was up-or down-regulated more than 5-fold, and these were not included in the sets. Also, the large differences in the number of genes belonging to each gene set may make comparisons of each set difficult.
In gene set analysis based on query genes, all genes that pass QC are included in the set, thus there is no possibility that important genes would be excluded by statistical processing. In addition, since the number of genes in each set is all adjusted, it may be convenient to compare GO terms and pathways that were significant among each set. Moreover, GO term and pathway fluctuations throughout the time period can be estimated more easily than by using FC values as thresholds. In the CA plot of this study, genes with FC values far from 1 tended to gather around each vertex or point P of tAg, so it was easy to identify GO terms and pathways that were greatly upregulated or downregulated at each time point. However, for example, not only genes with FC24 � 1, but also genes with FC24 < 1 are included in the Cxcl5 set, thus it is necessary to confirm whether significant GO terms or pathway activity were upregulated or downregulated at 24 h compared to the control. Analyses of gene sets based on CA plots will be useful for investigating time-dependent fluctuations after injury and may compensate for some problems in the gene set based on fold changes.
In order to elucidate factors that could contribute to the formation of the gene sets other than the gene score, we investigated the characteristics of significant GO terms and pathways in the gene sets related to the time series scores. On the biplot, the 3 time series scores located in the area where genes with the lowest FC value at a particular time point (for example, the coordinate score of 6 h was located in area b in Fig 6), and genes with large FC values were sparse in the neighborhood of the time series scores (Fig 2a-2c). As a result, the genes contained in each set should have been unchanged in expression throughout time or downregulated only at particular time points, and there were only a small number of significant GO terms in the gene sets. We estimated that genes with smaller FC values would tend to cluster around the particular time series score (Fig 2d-2f), and that significant GO terms would show a strong decrease in activity at a specific time.
FC values were converted with the arctangent function and indicated as degrees. Therefore, the value of 1 was represented as 45 degrees, and the converted value shifted from 0 to 90 as the FC value became smaller. Because CA was performed on these values of degrees, genes with a small FC or a high degree value at each time point were dominant in the vicinity of the particular time series score (Fig 2d-2f). In particular, genes with an FC value less than 0.2 at each time point were distributed radially from point P toward each time series score, and those at all time points were localized around point P (S5 Fig). Each time series score seemed to be located a little away from point P in the vicinity of the line that bisected the area where the genes with the highest degrees at the same time point gathered. In contrast, genes with a small degree or a large FC value at each time point were distributed throughout the area of tAg (Fig  2a-2c). Further examination is needed to clarify the positional relationship between them, and the time series scores need further examination.

Conclusion
Visualization with CA, which can cluster genes with similar expression dynamics, is an informative method of time course analysis of cytokines. GO and pathway analysis can be applied on selected genes from the plot based on a few marker genes. In this study, GO terms and pathways related to inflammation, muscle, and so on were significant in some query gene sets, so it is useful to analyze microarrays by referring to gene expression dynamics on CA plots. One limitation of this experimental model, which follows the principle of reduction, is that the effects of skin injury could be only assessed at 0 h or in the control. Thus, the post-injury results should include interactions with skin wounds to a certain extent.
Supporting information S1 Fig. Pathway map mmu04060 (Cytokine-cytokine receptor interaction) that was significant in some gene sets. This pathway was also significant in the top 1,000 gene set whose distances are close to Cxcl5 on the CA plot. In the Cxcl5 gene set, the FC values of many genes belonging to this pathway are more than 1 throughout time course (shown in red filled frame), but FC24 of some genes is less than 1 (red frame). This is because Cxcl5 was located near the vertex of tAg where genes with large FC values tended to gather, and this set consisted of genes distributed in the area of C and D in Fig 6 (FC24 Table 2) other than the patterns shown in Fig 4a). Plot of genes that have two time series significantly upregulated and one time series significant downregulated.  5; located in areas A to F). For each gene, (highest FC value among the three time points)/(middle FC value) and (middle FC value)/(lowest FC value) were calculated, and the top 100 genes whose ratio was close to 1 were selected and plotted. Blue dots: FC6/FC12 and FC12/FC24 were close to 1 (area A in Fig 5). Gray dots: FC24/ FC6 and FC24/FC12 (area B). Yellow dots: FC24/FC6 and FC12/FC6 (area C). Red dots: FC24/FC12 and FC12/FC6 (area D). Purple dots: FC24/FC12 and FC24/FC6 (area E). Green dots: FC12/FC6 and FC24/FC6 (area F). In each area, selected genes were distributed along the two border lines. Black dots indicate 3 column scores.  Table. The number of significant GO terms included in each GO term of the next hierarchy of each GO category in GO analysis of each set in Table 3.