Global gene network exploration based on explainable artificial intelligence approach

In recent years, personalized gene regulatory networks have received significant attention, and interpretation of the multilayer networks has been a critical issue for a comprehensive understanding of gene regulatory systems. Although several statistical and machine learning approaches have been developed and applied to reveal sample-specific regulatory pathways, integrative understanding of the massive multilayer networks remains a challenge. To resolve this problem, we propose a novel artificial intelligence (AI) strategy for comprehensive gene regulatory network analysis. In our strategy, personalized gene networks corresponding specific clinical characteristic are constructed and the constructed network is considered as a second-order tensor. Then, an explainable AI method based on deep learning is applied to decompose the multilayer networks, thus we can reveal all-encompassing gene regulatory systems characterized by clinical features of patients. To evaluate the proposed methodology, we apply our method to the multilayer gene networks under varying conditions of an epithelial–mesenchymal transition (EMT) process. From the comprehensive analysis of multilayer networks, we identified novel markers, and the biological mechanisms of the identified genes and their reciprocal mechanisms are verified through the literature. Although any biological knowledge about the identified genes was not incorporated in our analysis, our data-driven approach based on AI approach provides biologically reliable results. Furthermore, the results provide crucial evidences to reveal biological mechanism related to various diseases, e.g., keratinocyte proliferation. The use of explainable AI method based on the tensor decomposition enables us to reveal global and novel mechanisms of gene regulatory system from the massive multiple networks, which cannot be demonstrated by existing methods. We expect that the proposed method provides a new insight into network biology and it will be a useful tool to integrative gene network analysis related complex architectures of diseases.


Introduction
Gene regulatory networks are crucial for understanding complex mechanisms of diseases. To reveal heterogeneous genetic networks that underlie complex human diseases, various large- scale projects (e.g., The Cancer Genome Atlas and Cancer Genome Project) have been conducted and provided considerable amounts of omics data. The scale of gene networks is increasing, and strategies to comprehensively analyze large-scale gene networks have been claimed. In particular, there is currently substantial discussion regarding integrative analysis of sample-specific gene networks for personalized cancer diagnostics and therapeutics. Shimamura et al. [1] proposed a statistical method for sample-specific network construction, NetworkProfiler, which groups samples according to their specific genomic characteristics (e.g., drug response and survival time) and constructs a network for a target sample based only on samples having characteristics similar to those of the target sample. Thus, we can reveal gene regulatory networks under varying conditions of clinical characteristics of patients. The NetworkProfiler was applied to construct gene networks for 762 cancer cell lines characterized by EMT process, where EMT-related modulators for each cell line were measured based on 50 EMT-related genes labeled in the Molecular Signatures Database. They focused on E-cadherin, which connects epithelial cells at adherens junctions, and identified 24 candidate regulators. Interestingly, the identified genes did not consist of just the 50 genes defining the modulators, i.e., only one of the 24 identified genes was a member of the 50 genes, even though the regulators of E-cadherin were identified from the networks under varying conditions of the EMT modulators computed from the 50 genes. Among the 24 identified genes, approximately half were verified as regulators of E-cadherin from the literature. They selected KLF5 from the remaining genes and performed validation experiments. Through the experiments, the mechanism of KLF5 was demonstrated: knockdown of KLF5 decreased the expression of E-cadherin and led to morphological changes of the characteristics of EMT. Their validation results are also supported by later studies, e.g., Zhang et al. [2]. Although the other half were not verified at that time, a majority of those have been demonstrated as crucial EMT markers in the past decade [3,4]. Shimamura et al. [1] provided crucial indicators and made major contributions to reveal tumor progress related to EMT. Park et al. [5] suggested that cancer characteristics are not uniformly distributed, and the Gaussian kernel function used to control the effect of samples in the NetworkProfiler leads to extremely small amount of weight for modeling a target sample having rare cancer characteristic, because the Gaussian kernel function is based on a constant bandwidth. To address this problem, Park et al. [5] proposed a robust version of NetworkProfier based on an adaptive bandwidth via the k-nearest neighbor rule, and constructed a drug sensitivity-specific gene network based on the Sanger dataset from the Cancer Genome Project.
Although existing studies have provided crucial tools for precision medicine, understanding of large-scale multilayer gene networks is limited. In other words, a significant number of multilayer networks cannot be interpreted comprehensively using existing approaches (e.g., Shimamura et al. focused only on E-cadherin, and other regions could not be revealed).
To resolve this problem, we propose a novel strategy based on an explainable artificial intelligence (AI) methodology using tensor decomposition. Although machine learning and AI methods show remarkable performance in modeling accuracy, most of the existing approaches cannot explain how they obtain results (referred to as the black-box problem). This limits AI usage because their results cannot be verified. Maruhashi et al. [6,7] developed novel explainable AI approaches (i.e., DeepTensor and Tensor Reconstruction-based Interpretable Prediction (TRIP)) for learning multiway relations, which are deep learning approaches using tensor decomposition. Our strategy is based on two stages, i.e., constructing sample-specific gene networks and comprehensive analysis of the constructed multilayer netowrks by using the explainable AI methodology. That is, we construct a personalized gene regulatory network for each patient and the constructed network is considered as a second-order tensor. We then explore the massive multiple gene networks by using the AI method, TRIP. The use of the interpretable AI method based on tensor decomposition enables us to overcome limitation of existing gene network analysis, i.e., narrow angle in regulatory networks, and this leads to a greater understanding of biological systems of the regulatory interactions between genes. To the best of our knowledge, this is the first study on revealing gene regulatory networks using an explainable AI method.
To illustrate our strategy, we apply the proposed method to EMT-related gene regulatory networks constructed using NetworkProfiler [1]. We learn a low-dimensional subspace of 762 network tensors using the TRIP and explore multilayer networks on the constructed subspace. From the comprehensive analysis of multilayer networks, we identified novel markers, and the biological mechanisms of the identified genes and their reciprocal mechanisms are verified through the literature. Although any biological knowledge about mechanism of the identified genes was not incorporated in our analysis, the revealed genes by our method have strong evidences. In other words, our data-driven approach provides biologically reliable results. Although we illustrate our method based only on EMT-related networks, it can be expected that the proposed method will be a useful tool to global explore gene regulatory system involved in various clinical characteristics.
The remainder of this paper is organized as follows. In the Method section, we introduce the proposed a novel strategy for global exploration of multilayer personalized gene networks. In the Results section, we describe the evaluation of our strategy based on the gene regulatory networks varying according to the EMT status. Conclusions are provided in the Discussion section.

Method
Suppose X 1 , . . ., X q is q possible regulators that may control transcription of the l th target gene Y l . Consider the linear regression model for the target gene Y l , where β jl is the regression coefficient that represents the effect of regulator X j on target Y l and ε l is a random error vector ε l = (ε l1 , . . ., ε ln ) T that is assumed to be independently and identically distributed with mean 0 and variance σ 2 . To reveal gene regulatory interactions based on the regression model, various statistical and machine learning methods have been proposed and applied to gene network construction [8,9]. However, patient-specific gene regulatory systems cannot be revealed via the regression model because the strengths of the relationships between genes are given as β jl for all samples. We develop a novel strategy for integrative analysis for multilayer gene networks, which are a crucial tool for precision medicine. In our method, gene regulatory networks are constructed under varying conditions of samples and the multilayer networks are analyzed comprehensively using an explainable AI method. The gene network for a target sample is considered as a second-order tensor, and a deep learning method for tensor decomposition is applied to construct low-dimensional subspace of the multiway interaction between genes. Prediction and interpretation are performed on the constructed human-readable low-dimensional subspace, and thus we can effectively understand the constructed large-scale gene networks. Our strategy consists of two stages of constructing sample-specific gene regulatory networks and global investigating large-scale multiple gene networks based on an explainable AI technology.

Stage 1: Constructing personalized gene regulatory network based on sample-specific analysis
We consider the varying-coefficient structural equation model to construct a sample-specific gene regulatory network [10], β jl (m α ) is the regression coefficient of X j on Y l for the α th target sample of the modulator M = m α , such as drug sensitivity and survival risk of cell lines.
To estimate the varying coefficient β jl (m α ) describing strength of the relationship between the regulator and the target genes for each sample, we considered the the following kernelbased L1-type regularization method [1,5] where P(β lα ) is the recursive elastic net penalty, and is the Gaussian kernel function used to group samples according to specific cancer characteristics (i.e., modulator m i for i = 1, . . ., n). The Gaussian kernel function enables us to estimate β jl (m α ) for the α th sample based only on samples having characteristics of m i similar to the target sample modulator value m α . Thus, we can construct a gene network for a specific clinical status of samples, and it leads to evidences for personalized therapy.
In the first stage, we measure cancer characteristics m i (i = 1, . . ., n) of each sample and construct personalized gene regulatory networks under varying conditions of clinical characteristics using NetworkProfiler. This enables us to reveal patient-specific gene regulatory characteristics that are vital information of precision medicine. To comprehensively analyze the multiway interaction between genes, we proposed the use of the AI method in the second stage.

Stage 2: Extracting knowledge from the multilayer networks by explainable AI
The constructed multilayer networks (targets × regulators × samples) are considered as the input of the explainable AI method developed in our previous study, called TRIP [7]. The TRIP is a deep learning method for tensor decomposition. In this study, we consider a gene network matrix as a second-order tensor for a data point and then estimate projection matrices for the first and second modes based on the tensor decomposition. By using the projection matrices, we construct low-dimensional subspace of the network tensor. Thus, we can reduce dimensionality of large-scale multiple gene networks and extract crucial components to predict EMT-modulators.
For the K-mode tensor X for size I 1 × � � � × I K , the TRIP estimates a projection matrix C ðkÞ 2 R I k �J k and then projects the network tensors onto the constructed subspace by using C (k) . Prediction or classification is conducted on the constructed human-readable low-dimensional subspace. This leads to more explainable and interpretable results of the multilayer gene network analysis, as compared to the results on the complex high-dimensional data space.
The second stage of our strategy is based on the following two problems, • Constructing human-readable low-dimensional subspace and projecting the network tensor onto the subspace, • Predicting a response variable based on the projected network tensor X î where r i ¼ W T i vecðX i Þ, W i is the weight tensor for prediction and θ are the remaining parameters other than W i .
The optimization of the TRIP for the two aforementioned problems (i.e., projection of the network tensor and prediction of the response variable) is based on the following objective function where γ > 0 is the tuning parameter for the projection error and Lðy i ;ŷ i Þ is the loss function of the prediction given in (6). The second term is the loss function for projection of the network tensor onto the constructed subspace given in (5). As shown in the objective function (7), the TRIP estimates the projection matrix C (k) and simultaneously predicts the response variable. This implies that the TRIP enables us to achieve effective prediction results while retaining as much of the original data variance as possible.
The projection matrix C (k) that satisfies the orthonormal condition (i.e., C (k)T C (k) = I) is derived from singular value decomposition (SVD) of a latent variable Z (k) of the same size as C (k) . That is, we first perform SVD of Z (k) , and then set The latent variable Z (k) is estimated from the derivatives of the objective function O T by setting them to zero. Maruhashi et al. [7] showed that the derivatives of the objective function O T with respect to Z (k) are derived from a function of @O T /@C (k) and SVD of Z (k) , and proposed iterative algorithm for optimization problem of the TRIP given in (7). That is, the optimization problem of the TRIP is based on simultaneously taking the derivatives of the objective function O T with respect to C (k) , Z (k) , r n , and θ and setting them to zero. The projection matrices are considered as the regression coefficients of the crucial components. They learnt a multi-linear surrogate modelŷ 0 i ¼ hW; X i i þ b that minimizes the sum of the differences with the results of the prediction model P i kŷ i Àŷ 0 i k 2 2 , by setting the regression coefficient W to be a multi-linear tensor, i.e., W ¼ g ð1Þ � � � � � g ðKÞ , where � denotes outer product. The crucial components in prediction of response variable are extracted by principal component analysis of the vectors That is, the principal components of U (k) having the normalized u ðkÞ i as the column vector are extracted as crucial components for prediction of the response. For details of the notation and algorithm for the TRIP, please refer to Maruhashi et al. [7].
In the second stage, we perform integrative gene network analysis based on the constructed subspace of the network tensors.

Results
To illustrate our strategy, we apply the proposed method to personalized gene regulatory networks varying depending on the EMT process [1]. Because the EMT modulator is uniformly distributed, we consider EMT-related networks constructed by ordinary Networkprofiler instead of the robust NetworkProfiler.

Personalized gene regulatory networks under varying conditions of the EMT process
EMT-related gene networks were constructed based on the expression profiles of 762 cell liens from the Sanger Cell Line Project (http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi). A 13,508 (genes) × 762 (cancer cell lines) gene expression matrix was constructed based on the expression profiles of 13,006 mRNAs from Affymetrix GeneChip Human Genome U133 Array set and 502 microRNAs from bead-based oligonucleotide arrays. A total of 1732 regulator genes consisting of 1183 transcription factors, 47 nuclear receptors, and 502 human miRNA were extracted. The modulator describing the EMT process (epithelial-like or mesenchymal-like) of cell lines, the so-called EMT modulator, was extracted using the module discovery method [11] based on 50 genes labeled as the EMT-related genes (i.e., EMT-UP, EMT-DN, JECHLIN-GER-EMT-UP, and JECHLIN-GER-EMT-DN) in Molecular Signatures Database v2.5 (http://www.broadinstitute.org/gsea/msigdb/index.jsp). In short, 762 EMTrelated gene regulatory networks for 13,508 targets with 1732 regulator genes were constructed under varying conditions of the EMT modulator values. In this study, we consider the network for 13,508 targets × 1762 regulators as a second-order tensor for a specific EMT process and apply the TRIP for integrative gene network analysis of the 762 network tensors.

Comprehensive interpretation of the EMT-related network tensors
We learn a 50 × 50 subspace of the 762 EMT-related gene networks using the TRIP, i.e., J 1 = 50 and J 2 = 50. From the projection matrices of the constructed subspace C (k) k = 1, 2 in (9), each 50 crucial factors describing importance of target and regulator genes to predict EMT-modulator are extracted by PCA of U (1) and U (2) , respectively. In this study, we consider the crucial components of the subspace for regulator genes, i.e., U (2) . Fig 1 shows the variabilities explained by the extracted 50 independent components. As shown in Fig 1, the first component explains more than half of the total variability (i.e., 56%) of the EMT-related gene networks, and the first three components explain approximately 70% of the variability. We focus on the first three components and interpret the EMT-related gene networks based on the three components. Table 1  Colorectal cell lines are concentrated in the region with low values of the first two components. In short, the first and second components can be characterized by the 'Brain with Colorectal' and 'Leukemia with Colorectal', respectively. An independent test (i.e., Chi-squared test) between high/low values of each component (e.g., leukemia cells/nonleukemia cells and high/low values of each component) is conducted, and significant diseases (p < .01) for each component are given in the column χ 2 in Table 1, where high and low regions are classified based on the median value of each component. The brain seemed to be most associated with the first component, whereas Leukemia can be considered as the most significant disease for both the second and third components. The Brain, Leukemia and Colorectal seem to be crucial diseases for all three components. Fig 2 shows the scatter plot of the EMT modulator and extracted three components. As shown in Fig 2, the first component is strongly associated with the EMT modulator (i.e., the correlation coefficient of the EMT modulator and the first component is 0.997). This indicates that the extracted first component can be considered as another EMT modulator, and the second and third components may have information other than the EMT-related mechanism.
In order to globally explore the EMT-related mechanisms from the multilayer networks, we combine our results with the well-known EMT markers (i.e., five EMT transcription factors (EMT-TFs): ZEB1, ZEB2, SNAIL1, SNAIL2, and TWIST1, see Table 2). For each component, the target networks of the five EMT-TFs are constructed in the region of high and low values of each component. From the multiple networks in each region, we first extract target genes (TG.EMT-TFs) of the EMT-TFs and then extracted the target genes of the TG.EMT-TFs, where the genes are extracted as targets in at least one network are considered as target genes. The target networks for each region are give as binary adjacency matrices (1: two genes are associated in at least one cell line, 0: otherwise). Next, we compute compute absolute differences of the two adjacency matrices for genes, then extract 10 genes showing considerably different edges between the two adjacency matrices for high and low regions of each component (i.e., for each genes, sum of the absolute differences are computed for all genes in the EMT target network, and then 10 genes having the largest sum of the absolute differences are extracted). The identified 10 genes can be considered as markers having specific regulatory characteristics depending on the each component. Table 3 shows the identified genes for the three components and their evidence sources. We focus only on the newly identified genes other than the five EMT-TFs, even though the five EMT-TFs are selected for all three components.  Among the newly identified 17 genes, only IRF6 is used for defining the EMT modulator. For the identified genes, we compute the regulatory effect change (REC) for 13,508 target genes [1,5]. In this study, we consider direct connection between genes, i.e., in the 762 EMTrelated gene regulatory networks, the effect of the j th regulator on the l th target gene at the α th sample can be measured by the following regulatory effect (RE),  https://doi.org/10.1371/journal.pone.0241508.g002

EMT-TFs EMT-related mechanism Evidences
ZEB family: ZEB1, ZEB2 : Snail1 and Twist1 can up-regulate and cooperate with ZEB1 to induce EMT. : MYC-or serum-induced EMT were characterized by increased expression of ZEB1, ZEB2, and SNAI1. [3,12] SNAIL1 : mediator in different signaling pathways that induce EMT, such as the NBS1-SNAIL1 axis and the TGF-β/SMADS/HMGA2/SNAIL1 axis. : main role in EMT, the process by which epithelial cells acquire a migratory, mesenchymal phenotype as a result of its repression of E-cadherin. [13,14] SNAIL2 : up-regulated by Notch to induce EMT with an increase of cell migration and loss of cell-cell junctions.
: activates ZEB1 and cooperates with it to promote EMT. : direct induction of SNAIL1 is essential for TWIST1 to induce EMT. [4,15,16] TWIST1 : HIF-1α directly binds to the promoter of TWIST1 to induce EMT in hypoxic microenvironments.
: needs to induce SNAIL1 to suppress the epithelial branch of the EMT program.
: acts together with SNAIL1 to promote EMT. [4,17] https://doi.org/10.1371/journal.pone.0241508.t002 As mentioned above, each 762 cell lines have EMT modulator values and corresponding networks. Fig 3 shows the average of RE of the identified genes other on the 50 EMT-related genes for 10 networks corresponding to the 10 highest (top) and lowest (middle) EMT modulator values, where edges having zero mean for 10 cell lines are deleted. As shown in Fig 3,  FOXF1 regulates the EMT-related genes, COL6A1, COL6A2, HTRA1, IL11, MMP2, PCOLCE, PDGFRA, PDGFRB, and PMP22. The regulation system of FOXF1 with the EMT-related genes is observed in both epithelial-like and mesenchymal-like cell lines. E-cadherin (CDH1), which is one of the important genes for cell-cell adhesion in epithelial cells, is positively regulated by LSR and GRLH2 and negatively regulated by the ZEB family (i.e., ZEB1 and ZEB2). We then focus on genes not discovered as EMT markers in the literature, but selected by our method, AFF1, KANK2, PCBD1, ZNF91, and MAFB, i.e., their EMT-related mechanisms have not yet been revealed. Although a majority of the unrevealed genes regulate the EMT-related genes, their REs are extremely small. From these results, it can be considered that the genes may have mechanisms that are not directly involved in the EMT process.
For the selected 10 genes from the analysis of the three components, we compute REC matrices consisting of 10 columns (extracted 10 genes) and 13,493 rows (13,493 target genes) and then perform PCA of the REC matrix. The identified genes for each component are grouped in the first two PC spaces, and the reciprocal mechanisms between the grouped genes can be found in the published literature. Overall flowchart of our strategy for global exploration of EMT-related networks is given in

PLOS ONE
Global gene network explorer based on explainable AI are identified, i.e., MAFB, GRHL2, ANKRD5, FOXF2, and ZNF91. Fig 5 shows the target networks of the five genes. A relatively sparse network can be seen in the Low.E&Low.C1 region, as compared to the High.E&High.C1 region. That is, only five EMT-TFs have a strong relationship between each other in Low.E&Low.C1, whereas there are many relationships between not only the identified 10 genes, but also their target and regulator genes in the High.E&High. C1 region. As shown in Table 3, more than half of the identified genes are confirmed as EMT markers (i.e., their EMT-related mechanisms have been reported in the literature). For instance, GRHL2 reduces the invasion and migration through the inhibition of TGF-βinduced EMT in gastric cancer [26]; GRHL2 leads to mesenchymal-epithelial transition (MET) through the inhibition of ZEB1 [13]. ANKRD5 plays roles in protocadherin-mediated cell protrusion and adhesion, and participates in cell adhesion [18]. FOXF2 was identified as a novel TF related to EMT-suppressing in basal-like breast cancer (BLBC) FOXF2 negatively targets TWIST1 in the EMT programming and metastasis progress of BLBC [22]. We focus only on the newly discovered five genes because the reciprocal mechanisms between the five EMT-TFs were well known in previous studies. For both high and low regions of the first component, GRHL2 is grouped with well-known EMT marker ZEB1. Their reciprocal mechanisms in the EMT-related process are demonstrated as follows [25,27,28]: GRHL2 suppresses EMT and restores sensitivity to anoikis by repressing ZEB1 expression; Combination of TGF-β and Wnt activation represses GRHL2 expression by direct interaction of ZEB1 with the GRHL2 promoter, inducing EMT; Reciprocal feedback loop between GRHL2 and ZEB1 controls epithelial versus mesenchymal phenotypes and EMT-driven tumor progression; GRHL2 is the main gatekeeper of EMT in EOC via miR-200-ZEB1, and their axis forms the core of EMT signaling. The EMT-related interaction between FOXC2 and TWIST1 is also demonstrated as follows [21,22]: FOXC2 transcriptionally represses the expression of two EMT-TFs TWIST1 and FOXC2; FOXC2 negatively targets TWIST1 in the EMT programming and metastasis progress of BLBC. Genes MAFB and ZNF91 are members of Gene-Ontology-Terms Class: GO:0006355-regulation of transcription, DNA-templated. However, the interaction between MAFB and ZNF91 in diseases has not been yet demonstrated.

PLOS ONE
GLI3 in region for the low component 2, whereas the target network for the high component 2 involves many genes. Fig 8 shows the results of PCA for the REC matrices. We focus on the results of the low EMT region (i.e., Low.E&High.C2 versus Low.E&Low.C2) and group the genes as follows, ZEB1 and OVOL2: OVOL2 is one of the well-known EMT markers, and the interaction in the EMT process of OVOL2 and ZEB1 has been demonstrated in many studies: OVOL-  [39].

FOXF1 and SNAI1
: FOXF1 is also confirmed as an EMT-related marker, and reciprocal mechanisms of genes in this group have been demonstrated: the expression of FOXF1 inhibits cancer cell invasion and migration, whereas the inactivation of FOXF1 stimulates cell invasion and migration (Wei et al., 2014); higher level of FOXF1 is positively associated with enrichment of EMT gene signatures [19]; Overexpression of FOXF1 induces EMT by transcriptionally activating SNAI1 in colorectal cancer metastasis [19]. GLI3 and SNAI2: GLI3 is one of the zinc finger protein and well-known marker of Sonic Hedgehog, and interaction between GLI3 and SNAI2 has been demonstrated: shRNA-GLI3-transfected cells were associated with the decreased expression of stem cell-and EMT-related genes (CD44, BMI1, POU5F1, and SNAI2) [24].  C3 regions, where TP63 and IFI16 are also identified as crucial genes in the analysis of the second component. IRF6 is one of the 50 EMT-related genes defining the EMT modulator. The EMT-related mechanism of IRF6 has been demonstrated in previous studies; especially, there are many studies on the association between well-known EMT markers with IRF6: ectopic expression of IRF6 increases the expression of SNAI2 and diminishes the expression of various epithelial markers (e.g., E-cadherin) in EMT; TGFβ3 increases IRF6 expression, and IRF6 appears to regulate EMT during palatal fusion via SNAI2 [33]; IRF6 is downregulated during the EMT process of breast cancer and prostate cancer [34]. Although EMT-related mechanisms of KANK2 have not yet been revealed, it has been demonstrated that KANK2 concentrates around most mature focal adhesions and binds talin in migrating cells [51]. AFF1 is known as the mixed lineage leukemia fusion-associated gene and plays a role in osteogenic differentiation of human mesenchymal stem cells [52][53][54]. However, EMT-related mechanisms of AFF1 have not yet been demonstrated.
The target networks of the newly identified five genes are presented in Fig 9. There are no significant differences between the two regions, except for the association between TP63 and FOXF2. It can be considered that the target networks related to the third component may be dominated by the mechanism of the high EMT region, and only regulating FOXF2 by TP63 can be considered as a specific characteristic related to component 3. Fig 10 shows the projected 10 regulators and target genes on the first two PC spaces of REC. We group the genes as follows,

Group 1: IRF6 and TP63
IRF6 regulated by TP63 plays a tumor suppressor role in squamous cell carcinomas through a Notch-dependent mechanism, which plays critical roles in EMT pathway [34]. The reciprocal mechanisms of genes in groups 2 and 3 have not yet been demonstrated.

Functional enrichment analysis based on the bioinformatics tool DAVID
To identify biological processes involved in the extracted components, we performed gene enrichment analysis using the bioinformatics tool Database for Annotation, Visualization and Integrated Discovery (DAVID) [55]. We used all regulatory genes as a background and performed functional enrichment analysis for 17 genes (1% of the regulatory genes) other than five EMT-TFs showing different regulatory systems in the high and low regions of each component. Fig 11 shows the functional annotation chart (p < .05) and the corresponding p-value (i.e., −log(p.value)). For the first component, six clusters are found, where the most significant cluster corresponding to the lowest p-value (i.e., the highest score for enrichment) is "transcriptional activator activity, RNA polymerase II transcription regulatory region sequence-specific binding" grouping genes OVOL2, MAFB, FOXF1, FOXF2, GRHL2, and ALX1. The "RNA polymerase II transcription regulatory region sequence-specific binding" is a GO annotation of ZEB1. "Embryonic digestive tract morphogenesis", "disease mutation", "lung lobe morphogenesis", and "palate development" are common factors enriched in the first and second components. Except for "disease mutation", the three clusters are the GO functional terms (http:// www.informatics.jax.org/), • Embryonic digestive tract morphogenesis (GO:0048566): the anatomical structures of the digestive tract are generated and organized during embryonic development. The digestive tract is the anatomical structure through which food passes and is processed. • Lung lobe morphogenesis (GO:0060463): The process in which the anatomical structures of a lung lobe are generated and organized. A lung lobe is a projection that extends from the lung.
• Palate development (GO:0060021: Roof of mouth development): The biological process whose specific outcome is the progression of the roof of the mouth from an initial condition to its mature state. This process begins with the formation of the structure and ends with the mature structure. The roof of the mouth is the partition that separates the nasal and oral cavities.
We focus on sterile alpha motif/pointed domain (grouping FLI1, ELF3, and TP63), which is involved in interactions with proteins, DNA and RNA. In a previous study, it was demonstrated that sterile alpha motif-pointed domain containing ETS TF (SPDEF) negatively regulates CCL2 and the EMT markers in prostate cancer cells, and the interaction between SPDEF and CDH1 (E-cadherin) related to the EMT process was also demonstrated: decreased SPDEF levels significantly induce CCL2 and CDH2 (N-cadherin), and decrease CDH1 (E-cadherin) mRNA and protein expression, confirming the association between SPDEF inhibition and EMT in cells. [56]. Although only ELF3 was demonstrated as a crucial regulator of E-cadherin in Shimamura et al., [1], interactions FLI1 and TP63 with E-cadherin are also verified as follows: Cav1-Snail-E-cadherin pathway plays a central role in the expression of the oncogenic transformation functions of fusion gene EWS/FLI1 [57]; reactivation of 4Np63a is linked to the maintenance of epithelial markers and suggests that E-cadherin has a dual role in lung squamous cell carcinoma [58]. For the second and third components, "transcription activation" (grouping TP63, TGFB1I1, and GRHL2) and "keratinocyte proliferation" (grouping IRF6 and TP63) are the most enriched terms, respectively. keratinocyte is frequently reported to be related to the second and third components in Fig 11. keratinocyte plays a role in cell-cell adhesion [59,60]. We found that gene TP63, which is known as a keratinocyte TF, plays a key role in the EMT process through interaction with the well-known EMT markers, i.e., TGF-β, GRHL2, and miR-200n family: Ectopic 4Np63a expression in normal human epidermal keratinocytes yields the EMT phenotype in a TGF-β-dependent manner; Knockdown of all isoforms of p63 leads to the EMT phenotype through loss of GRHL2 and miR-200 family genes [61].
Furthermore, it has been found that IRF6 (target of TP63) is induced by the NOTCH signaling pathway, which plays vital roles in the development and progression of cancers through regulating ZEB1 expression and EMT pathway, in breast cancer and keratinocytes [34]. Interestingly, the cluster consisting of IRF6 and TP63 related to keratinocyte proliferation was also identified by PCA of REC for the third component (see Fig 8). In addition, it was shown that the interaction of TP63 and FOXF2 is uniquely different in the target networks for the high and low regions of the third component (see Fig 7). In short, the third component and a part of the second component of the EMT-related networks can be described as a keratinocyterelated factor and the interaction of TP63 and IRF6, which may play a vital role in EMT in keratinocytes.

Discussion
We introduced a novel methodology for a comprehensive analysis of large-scale personalized network tensors. In this study, we considered a gene regulatory network as a tensor for a data point, and decompose the multilayer networks represented as tensors by using an AI approach, TRIP. Unlike existing studies for sample-specific gene network construction, our strategy analyzes whole multilayer networks based on tensor decomposition, thus we can perform wide exploration of the large-scale gene regulatory networks for all patients. To illustrate our method, we applied it to personalized networks constructed for 762 cell lines having varying conditions of the EMT process. We identified novel candidate markers and verified biological mechanisms of a majority of the identified markers based on the literature. Although most of the identified markers were found in previous studies, some of the revealed genes could not be verified. Further work is required towards experimental validation of the newly revealed markers. In this study, our strategy was illustrated based only on EMT-related gene regulatory networks. As one of our future works, we consider the comprehensive analysis of dynamic systems of personalized gene networks in accordance with various clinical characteristics (e.g., drug sensitivity of cell lines).
Global exploration of multilayer networks along with clinical characteristics of a patient provides crucial information for evidence-based personalized medicine. We expect that the proposed strategy based on an explainable AI approach will provide a novel insight into network biology.