Ten genes are considered as potential biomarkers for the diagnosis of dermatomyositis

Objective This study aimed to identify the biomarkers and mechanisms for dermatomyositis (DM) progression at the transcriptome level through a combination of microarray and bioinformatic analyses. Method Microarray datasets for skeletal muscle of DM and healthy control (HC) were downloaded from the Gene Expression Omnibus (GEO) database, and differentially expressed genes (DEGs) were identified by using GEO2R. Enrichment analyses were performed to understand the functions and enriched pathways of DEGs. A protein–protein interaction network was constructed to identify hub genes. The top 10 hub genes were validated by other GEO datasets. The diagnostic accuracy of the top 10 hub genes for DM was evaluated using the area under the curve of the receiver operating characteristic curve. Result A total of 63 DEGs were identified between 10 DM samples and 9 HC samples. Gene Ontology and Kyoto Encyclopedia of Genes and Genomes enrichment analysis indicated that DEGs are mostly enriched in response to virus, defense response to virus, and type I interferon signaling pathway. 10 hub genes and 3 gene cluster modules were identified by Cytoscape. The identified hub genes were verified by GSE1551 and GSE11971 datasets and proven to be potential biomarkers for the diagnosis of DM. Conclusion Our work identified 10 valuable genes as potential biomarkers for the diagnosis of DM and explored the potential underlying molecular mechanism of the disease.

Introduction profiling of DM in the database. Finally, two datasets GSE48280 (GPL6244) and GSE5370 (GPL96), including muscle biopsies from 10 patients with DM and 9 HCs, were selected as test sets [15]. Two datasets GSE1551 (GPL96) and GSE11971(GPL96), which included 32 DM samples and 14 HC samples were selected as validation sets [16] ( Table 1). The overall flowchart of this study is shown in Fig 1.

Identification of DEGs
The online web-based tool GEO2R was used to determine the DEGs of patients with DM. The row expression data of GSE48280 and GSE5730 were analyzed. Adjusted P value< 0.05 and | log2 fold change|>1 were set as the threshold and considered significant between the gene expression difference to identify the DEGs between patients with DM and HCs. Online tool Draw Venn Diagram (http://bioinformatics.psb.ugent.be/webtools/Venn/) was used to detect the overlapping DEGs among the two datasets.

Functional and pathway enrichment analysis
R packages (clusterProfile and ggplot2) were used to perform GO enrichment and KEGG pathway analysis for the identified DEGs [17]. ClusterProfile package was used for analysis, and ggplot2 was applied to visualize the results.

Construction of the PPI network
A PPI network for the DEGs was constructed using the STRING database (http://www.stringdb.org/) which is an online tool to identify and predict interactions between genes and proteins (the cut-off standard as a combined score >0.4) [18]. Then, Cytoscape was used to visualize the results. Molecular Complex Detection (MCODE) V1.5.1, a plug-in of Cytoscape, was applied to identify significant modules (MCODE score �4) [19]. Moreover, Cytohubba, which is another plug-in of Cytoscape, was used to study essential nodes in the network. The nodes with high degrees of interaction were considered as hub genes [20].

Statistical analysis
Statistical analysis was performed using Rstudio. The expression levels of 10 identified hub genes were validated by GSE1551 and GSE11971 using an independent sample t test. The ROC-AUC was performed by GraphPad PRISM 7. It was used to compare the diagnostic performance of different hub genes. The different expression levels of the hub genes were searched in the Series matrix file of GSE1551 and GSE11971. P < 0.05 was considered statistically significant.

Identification of DEGs
Two gene expression datasets containing patients with DM and HCs were analyzed with twogroup comparisons (P < 0.01), and then 2458 and 178 DEGs were identified from GSE48280 and GSE5370 respectively. The volcano plots of the two gene datasets are shown in Fig 2A and 2B. After integration of these DEGs by using bioinformatic analysis, a total of 63 common

PLOS ONE
DEGs were identified in patients with DM ( Fig 2C), including 61 upregulated genes and 2 downregulated genes.

Functional enrichment analysis of DEGs
The 63 overlapping DEGs were analyzed using GO and KEGG enrichment (Fig 3). Based on GO enrichment, the biological process acts primarily on influenza A, measles, and hepatitis C. These proteins are primarily located in blood microparticle, proteasome core complex, and proteasome core complex, beta subunit complex. With regard to molecular functions, these proteins play a role in double-stranded RNA binding, single-stranded RNA binding, and threonine-type endopeptidase activity. KEGG pathway analysis presents that these proteins are primarily involved in response to virus, defense response to virus, and type I IFN signaling pathway.

PPI network analysis, MCODE cluster modules and hub gene identification
After importing the overlapping genes into STRING, the PPI network for the 63 DEGs was constructed ( Fig 4A). The network provides information on the interactions among different genes, and it helps to mine the underlying mechanisms. Based on the information in the STRING database and Cytoscape, PPI relationships were obtained, and the hub genes in the network were identified ( Fig 4B). The top 10 hub genes with high degrees were detected, including myxovirus (influenza virus) resistance 1 (MX1), myxovirus (influenza virus) resistance 2 (MX2), radical S-adenosyl methionine domain containing 2 (RSAD2), STAT1, IFN-  Node color reflects the degree of connectivity. Red represents a higher degree, and yellow represents a lower degree. Two cluster modules extracted by MCODE. Cluster 1(C) had the highest cluster score (MCODE score = 27.2), followed by cluster 2 (D) (MCODE score = 4). https://doi.org/10.1371/journal.pone.0260511.g004

Verification of the identified ten hub genes by datasets GSE1551 and GSE11971
The GSE1551 and GSE11971 both from the GPL96 platform, which included 32 DM samples and 14 HC samples, were used to verify the expression of 10 identified hub genes. The R package ggplot2 was applied to draw boxplots. In addition, the R package ggpubr was used to perform Student's t test statistical analysis. Based on the results, the mRNA expression levels of the 10 hub genes in the DM samples were significantly increased (P < 0.01, Fig 5).

ROC curves of the ten hub genes in DM samples
GraphPad PRISM 7 was used to analyze the expression profiles of 10 hub genes and draw the ROC curves. AUC is an effective and combined measure of sensitivity and specificity for assessing inherent validity of a diagnostic test [21]. The different expression levels of the hub genes were searched in the Series matrix file of GSE1551 and GSE11971. The exact values are presented in the table and imported into the software. The sensitivity, specificity, cut-off value, and AUC are calculated by GraphPad PRISM 7 and listed in Table 2. When the AUC is over 0.8, the sensitivity and specificity are over 80%, indicating that the biomarker has good diagnostic accuracy.  (Fig 6). Given their good diagnostic performance, we hypothesize that the identified hub genes may be potential biomarkers for the diagnosis of DM based on our present samples.

Discussion
In this study, a series of bioinformatic analysis identified 10 hub genes (MX1, MX2, RSAD2, STAT1, IFI35, ISG15, OAS3, OAS1, IFI6, and IFI44L) between skeletal muscle biopsy of DM and HC samples based on the gene expression profiles obtained from GSE48280 and GSE5370 datasets. The results were validated by GSE1551 and GSE1171 datasets. Based on the results of ROC-AUC analysis, the identified 10 hub genes have good diagnostic performance. Meanwhile, we investigated the biological functions of these common DEGs. GO and KEGG analyses revealed that these DEGs were significantly associated with changes in response to virus and type I IFN signaling pathway. MX1 and MX2 are IFN-inducible myxovirus dynamin-like GTPase, which are widely found in many mammals, whereas MX2 is only localized to the nucleus by a basic nuclear localization signal, which is required for antiviral activity [22]. Most of the previous studies have focused on the role of MX2 gene as an IFN-induced restriction factor of different viruses [23,24]. According to our result, MX1 and MX2 are highly expressed in muscle tissue of DM patients. The first case report regarding myxovirus and myositis was in 1979. The researchers

PLOS ONE
isolated influenza A and B viruses in a patient with acute polymyositis [25]. Recently, the sarcoplasmic expression of MX1 has been demonstrated as a hallmark of DM [26,27]. These findings can be considered as a supplement of the role of IFN pathway activation in patients with DM. MDA5, a specific antibody target of DM, plays a vital role in innate immunity by recognizing viral RNA in the cytoplasm and transmitting an induction signal downstream for type-1 IFN production [28]. These data indicate the potential involvement of a viral infection as a trigger moment for this systemic autoimmune disease. However, the relationship between the MX family and DM has not been fully revealed. Therefore, the exact role of the MX family in pathogenesis of DM may need to be further investigated.
STAT1, identified as the mediator of the cellular response to IFN, is activated by cytokine receptors via kinases of the JAK family [29]. Similar to our finding, recent studies have shown that the expression of STAT1 is significantly increased in the muscle tissue of patients with DM than in the normal control group [30]. This year, Julie J. Paik et al. proved that tofacitinib, a JAK-STAT inhibitor, was clinically effective in treating DM [31]. ISG15, a type I IFN-inducible protein, is highly upregulated in muscle, blood, and skin of patients with DM [32,33]. ISG15 production has been well characterized from type I IFN stimulation. However, what drives this production in DM is uncertain. In our study, ISG15 is considered as a good biomarker in diagnosing DM, and the AUC is up to 0.968. OAS proteins are double-stranded RNA-activated enzymes, which are induced by IFNs. The OAS family consists of four members, namely, OAS1, OAS2, OAS3, and OAS-like protein (OASL) [34]. Similar to our result, a high expression level of the OAS3 is detected in muscle biopsy of DM patients according to Musumeci's study in 2018.They concluded that the activation of IFN pathway could result in the upregulation of OAS3. Therefore, the IFN pathway could be the key pathway in the pathogenesis of DM.
Meanwhile, GO and KEGG analyses revealed that DEGs in our analysis were significantly associated with changes in response to virus and type I IFN signaling pathway. Viral antigens might play a triggering role in the onset of DM. This hypothesis is in accordance with numerous clinical observations suggesting a possible link between myopathies and infectious triggering agents [35]. With regard to the type I IFN signaling pathway, it consists of an important group of cytokines that are produced by innate immune cells, acting on adaptive immune cells. Overexpressed type I IFN-inducible transcripts and activated type I IFN signatures have been reported more than once in patients with DM [36,37]. Consistent with previous studies, we discovered that DEGs were enriched in this signaling pathway. Hence, inhibiting type I IFN signatures would be a new treatment target for DM.
In the interpretation of our results, the following limitations require careful discussion. On the one hand, we only focused on the hub genes, whereas the other DEGs, which may also have a diagnostic and therapeutic effect, were ignored. On the other hand, the identified genes and their pathways were not confirmed through in vitro assays or in vivo models. Therefore, precisely designed studies are necessary to verify these potential genes.
Our results could be further used to predict DM-related noncoding-RNAs (ncRNAs), which may be the future direction of our work. MicroRNA is a class of endogenous, small, evolutionarily conserved ncRNAs, normally act as a negative regulator of mRNAs. Several useful databases and tools could be applied for potential microRNA-disease associations [38][39][40][41]. MicroRNA can interact with long non-coding RNAs (lncRNAs) and circular RNAs (cir-cRNAs) in many diseases. Several well-designed algorithms can be used to discover the relationship among ncRNAs [42][43][44][45]. Given that the competing endogenous RNA (ceRNA) has not been established for DM, using these valuable tools, we hope to construct the ceRNA network in our future work.

Conclusion
In conclusion, based on integrated bioinformatic analyses, we identified differences in biological functions in DM compared with HC samples. In particular, our work identified 10 valuable genes in muscle tissue as potential biomarkers for the diagnosis of patients with DM. The identified hub genes all have good diagnostic performance, which may serve as new biomarkers for DM. Drugs targeting these hub genes would become new treatment options for this high-mortality disease. Thus, this analysis may guide future experimental research and clinical transformation.